Add andersen region-aware CI job [AndersenRegionAware] #526

phate · 2024-06-26T05:21:10Z

No description provided.

phate · 2024-06-26T06:04:28Z

2024-06-26T05:41:37.8621716Z **********
2024-06-26T05:41:37.9042968Z ********************
2024-06-26T05:41:37.9045056Z Failed Tests (1):
2024-06-26T05:41:37.9046072Z   test-suite :: SingleSource/Regression/C/gcc-c-torture/execute/GCC-C-execute-pr38819.test
2024-06-26T05:41:37.9046624Z 
2024-06-26T05:41:37.9046629Z 
2024-06-26T05:41:37.9046735Z Testing Time: 662.54s
2024-06-26T05:41:37.9047027Z   Passed: 1775
2024-06-26T05:41:37.9047283Z   Failed:    1

This is the same single source test that also failed for Steensgaard region-aware. Moreover, we did not have any problem in the run when we added the target to the llvm-test-suite. Ultimately, it means we should invest in some bigger/better/heavier guns for debugging.

The problem arises due to the alias analysis in combination with the region-aware encoder relaxing the sequentialization order, and depending on which sequence the back-end chooses, it can happen that we get a valid one according to program semantics, or maybe a faulty one, because of bugs in the alias analysis/provisioning/encoder.

haved

The CI change itself looks good. I definitely agree that some hardening of tests is needed to avoid flakey CI jobs

phate · 2024-06-26T17:02:10Z

@haved I believe that has nothing to do with flaky CI jobs. My suspicion is that this is a bug in the alias analysis and/or the encoding. The region-aware encoding relaxes the sequential order of the operations with side-effects. This means that the jlm back-end could emit a wrong order of operations if there is a bug in any of those passes. This would not happen with the agnostic encoding (at least not without further transformations such as invariant value redirection + dead node elimination).

phate · 2024-06-26T17:03:08Z

Let's not merge this before I have not found out what the root cause is.

haved · 2024-06-26T17:27:53Z

@phate, I agree with your description of the problem. Calling it "flakey CI" was not a good description on my part.

In general there seems to be multiple places in the compiler where non-determinism is introduced due to iterating over sets and maps keyed by addresses. As long as there are no bugs this is not an issue, but when there are, it can make it more difficult to reproduce.

This non-determinism combined with the current bug you describe is why I referred to the potential CI job as flaky. Even if the CI job fails consistently, it is no fun when you run a failing job locally and no issues occur.

To make bugs more reproducible, I see a few options, but there are probably many more:

serialize the program in multiple ways to detect missing state dependencies. Either do N random topological orderings, or do the one that lexographically minimizes the ordering, and one that lexographically maximizes it, in order to make as many operations as possible switch places. If there are bugs, they are hopefully detected consistently (but no guarantee).
change the implementation of util::HashSet and add a new util::HashMap that in debug builds preserve insertion order when iterating, or deterministically randomizes iteration order.
somehow make malloc fully deterministic in debug builds. I'm not sure if this is possible?

phate · 2024-06-26T19:38:45Z

@haved Right, I see your point with flaky CI now. Yes, having the compiler behave differently/non-deterministically is not great for reproducibility. Regarding your mentioned options:

The option of producing multiple program serializations is high on my TODO list. The basic idea is that we simply have a tool that takes an LLVM file and some optimizations as input. It applies the optimizations on the input LLVM file and then just writes several output LLVM files to a folder. Each of these output files has a different permutation of operations with side-effects (everything that consumes/produces state edges). If that single input file is a complete program, then we can compile each of the output files to an executable, run each executable, and compare the outputs between all of them. If the output diverges of a single executable from the other executables, we clearly have a bug in one of the input optimization passes.
This option makes sense as well, but it should not be in debug build I think. We currently use release builds in the CI for performance reasons. We should have another flag that indicates whether the deterministic or non-deterministic versions should be used.
No idea either. I guess it is maybe worth finding out.

@caleridas might have something to say here as well.

caleridas · 2024-06-27T20:42:58Z

Long-term, any indeterminism should be rooted out from the compiler, and not just in debug builds. (I confess to being guilty to introducing indeterministic behavior to begin with, and it was a bad idea). iterating HashSet/HashMap should mostly be "forbidden" unless the iteration has no effect on output (e.g. just deleting from the map, or copying things somewhere else), or the result is brought into a deterministic order afterwards. The strategy that I have observed to work best in various projects to get rid of this problem is to "intentionally" introduce into hash maps and add validation tests that fail if multiple runs fail to produce bit-identical output.
Within limits, malloc can be made deterministic on the same target/machine (and libc configuration, and various other factors) for single threaded programs when disabling ASLR for a process, but that's not an overall strategy -- better to accept that anything that hashes on address "must not" be iterated over assuming any stability as an implementation rule. I think it ought to be possible to do a clang-tidy rule to find & flag places in the code that do this for manual inspection.
The other question is of increasing test coverage by fuzzing and intentionally producing a larger range of "conceptually valid output" subject to given graph ordering constraints. That should however be controlled (pseudo-)randomness: Set up a test using mersenne twister with a fixed seed and have any variation be determined by this.

haved · 2024-06-29T06:41:31Z

Aiming to remove non-determinism would be very nice, and making it a part of automatic testing sounds like a good plan long-term. I agree that it is a better solution to not rely on any ordering of HashSets keyed with addresses.

I have created discussion #529 to not discuss all of this in this PR.

Regarding non-determinism from malloc, I have for some reason gotten very consistent results even though ASLR is enabled. I have had logic bugs that caused every execution to give the wrong result when executed normally, but never give the wrong result when executed in gdb. This was not a memory bug, so the only possible difference was iteration order in sets of addresses.

Add andersen region-aware CI job

4b9e5d5

phate requested review from sjalander and haved June 26, 2024 05:21

phate enabled auto-merge (squash) June 26, 2024 05:22

phate disabled auto-merge June 26, 2024 06:06

haved approved these changes Jun 26, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add andersen region-aware CI job [AndersenRegionAware] #526

Add andersen region-aware CI job [AndersenRegionAware] #526

phate commented Jun 26, 2024

phate commented Jun 26, 2024 •

edited

Loading

haved left a comment

phate commented Jun 26, 2024

phate commented Jun 26, 2024

haved commented Jun 26, 2024

phate commented Jun 26, 2024

caleridas commented Jun 27, 2024

haved commented Jun 29, 2024 •

edited

Loading

Add andersen region-aware CI job [AndersenRegionAware] #526

Are you sure you want to change the base?

Add andersen region-aware CI job [AndersenRegionAware] #526

Conversation

phate commented Jun 26, 2024

phate commented Jun 26, 2024 • edited Loading

haved left a comment

Choose a reason for hiding this comment

phate commented Jun 26, 2024

phate commented Jun 26, 2024

haved commented Jun 26, 2024

phate commented Jun 26, 2024

caleridas commented Jun 27, 2024

haved commented Jun 29, 2024 • edited Loading

phate commented Jun 26, 2024 •

edited

Loading

haved commented Jun 29, 2024 •

edited

Loading