Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: Improve performance of literal de-duplication #1179

Merged
merged 1 commit into from
Jul 26, 2024

Conversation

bclement-ocp
Copy link
Collaborator

This patch improves the different code paths that are used to ensure literal uniqueness. There are a couple of performance bugs here that are fixed by this patch:

  • In Rel_utils, we use a set of Xliteral views, which are converted to actual Xliterals in the comparison function. This means that we hash the literal views repeatedly (each time a comparison is performed).

  • Everywhere, we use sets of literals as temporary caches for uniqueness, where we could use hash maps (with better access times) instead.

  • In Adt_rel, we de-deduplicate input literals, but this is (almost) already done in the Ccx module. Almost, because the Ccx module allows multiple literals as long as they originally come from different expressions in the input problem -- which should occur rarely enough that we don't need the double deduplication.

On a development branch and a specific problem that was creating a lot of literals, the first change was a 3x speedup (from 36s to 12s), and the two second changes combined were an additional 25% speedup (from 12s to 9s).

On our internal dataset, the full patch is a speedup of about 3-4%.

This patch improves the different code paths that are used to ensure
literal uniqueness. There are a couple of performance bugs here that are
fixed by this patch:

 - In Rel_utils, we use a set of Xliteral *views*, which are converted
   to actual Xliterals in the comparison function. This means that we
   hash the literal views repeatedly (each time a comparison is
   performed).

 - Everywhere, we use sets of literals as temporary caches for
   uniqueness, where we could use hash maps (with better access times)
   instead.

 - In Adt_rel, we de-deduplicate input literals, but this is (almost)
   already done in the Ccx module. Almost, because the Ccx module allows
   multiple literals as long as they originally come from different
   expressions in the input problem -- which should occur rarely enough
   that we don't need the double deduplication.

On a development branch and a specific problem that was creating a lot
of literals, the first change was a 3x speedup (from 36s to 12s), and
the two second changes combined were an additional 25% speedup (from 12s
to 9s).

On our internal dataset, the full patch is a speedup of about 3-4%.
@bclement-ocp
Copy link
Collaborator Author

On our internal dataset, the full patch is a speedup of about 3-4%.

And also a tiny performance boost with a +13-5 delta (absolute +8 problems solved) due to less timeouts.
(The -5 can be explained by the fact that we Hashtbl iteration and Set iterations are generally not going to follow the same order, and so we process literals in a slightly different order)

@Halbaroth Halbaroth merged commit a3c0902 into OCamlPro:next Jul 26, 2024
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants