perf: faster `Nat.testBit` #4188

FR-vdash-bot · 2024-05-16T04:45:59Z

`1 &&& n` is faster than `n &&& 1` for big `n`.

Maybe we could have a much faster C implementation in the future.
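
A minimal sketch of what the change relies on: bitwise AND is commutative, so both operand orders extract the same bit and only the running time on large inputs is claimed to differ. The names `testBitLeft` and `testBitRight` are hypothetical and are not part of this PR.

```lean
-- Hypothetical names for illustration; the PR itself only changes `Nat.testBit`.
-- `testBitLeft` uses the operand order from the PR, `testBitRight` the other order.
def testBitLeft (m n : Nat) : Bool := 1 &&& (m >>> n) != 0
def testBitRight (m n : Nat) : Bool := (m >>> n) &&& 1 != 0

-- 5 is 0b101, so bits 0 and 2 are set and bit 1 is clear.
#eval testBitLeft 5 0   -- true
#eval testBitLeft 5 1   -- false
#eval testBitLeft 5 2   -- true

-- Both operand orders agree on these inputs.
#eval testBitRight 5 0  -- true
#eval testBitRight 5 1  -- false
#eval testBitRight 5 2  -- true
```

The examples only check agreement on small values; they say nothing about speed.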

leanprover-community-mathlib4-bot · 2024-05-16T06:50:42Z

Mathlib CI status (docs):

❗ Batteries/Mathlib CI will not be attempted unless your PR branches off the nightly-with-mathlib branch. Try git rebase f74980ccee82ca2abdae65dcbc5571d4640ed076 --onto 3035d2f8f689b52963f49b2414414913ca296953. (2024-05-16 06:50:42)
❗ Batteries/Mathlib CI will not be attempted unless your PR branches off the nightly-with-mathlib branch. Try git rebase f74980ccee82ca2abdae65dcbc5571d4640ed076 --onto 3de60bb1f63efe9bb56380f911f86980b9f3332c. (2024-05-21 14:29:09)

FR-vdash-bot · 2024-05-16T06:51:34Z

awaiting-review

src/Init/Data/Nat/Bitwise/Lemmas.lean

alexkeizer · 2024-05-16T12:30:19Z

src/Init/Data/Nat/Bitwise/Basic.lean

+def testBit (m n : Nat) : Bool :=
+  -- `1 &&& n` is faster than `n &&& 1` for big `n`. This may change in the future.
+  1 &&& (m >>> n) != 0


Do you have numbers to confirm this claim?
Is it faster in kernel reduction or for compiled code? If it really is faster, doesn't the compiler just optimize it?

def test (x : Nat) : Nat := 1 &&& x
def test' (x : Nat) : Nat := x &&& 1

def run_test (n : Nat) (x : Nat) : IO Unit := do
  let mut t := 0
  for _ in [0 : n] do
    t := test x

def run_test' (n : Nat) (x : Nat) : IO Unit := do
  let mut t := 0
  for _ in [0 : n] do
    t := test' x

def n := 10^5
def x := 11^1000000

#eval timeit "1 &&& x" (run_test n x) -- 41ms
#eval timeit "x &&& 1" (run_test' n x) -- 1.5s

That's a surprisingly big difference, almost suspiciously so. I only know enough about micro-benchmarking to know that results can sometimes be misleading, so I tried several variations (e.g., sprinkling no_inline), but everything points towards 1 &&& x indeed being significantly faster.

Admittedly, in my original comment I was only thinking about "small" (up to 64-bit) numbers. Is this x &&& 1 vs 1 &&& x difference a known behaviour of gmp (which I assume is what is going on here, given the big numbers in your test case)?
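
One way such a variation might look, as a sketch only: the two operations are marked `@[noinline]` and each result is folded into an accumulator that the loop returns, so the work is harder to discard. The names `andLeft`, `andRight`, and `runLoop` are hypothetical, and this is not necessarily the variation that was actually tried; no timings are claimed for it.

```lean
-- Hypothetical benchmark variation; names are illustrative, not from the PR.
@[noinline] def andLeft (x : Nat) : Nat := 1 &&& x
@[noinline] def andRight (x : Nat) : Nat := x &&& 1

-- Apply `f` to `x` a total of `n` times and return the sum of the results,
-- so that each result is actually used.
def runLoop (f : Nat → Nat) (n x : Nat) : IO Nat := do
  let mut acc := 0
  for _ in [0:n] do
    acc := acc + f x
  return acc

-- Same sizes as the benchmark above.
#eval timeit "1 &&& x" (runLoop andLeft (10^5) (11^1000000))
#eval timeit "x &&& 1" (runLoop andRight (10^5) (11^1000000))
```

Whether the compiler still simplifies part of this loop is not something the sketch can rule out, so any numbers from it should be read with the same caution as the original benchmark.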

alexkeizer

I'm not knowledgeable enough about micro-benchmarks to say whether the given example is free of weird inlining or other behaviour that might interfere with the numbers, but the difference seems significant enough that it makes sense to switch over.

> Maybe we could have a much faster C implementation in the future.

For 64 bits or fewer, bitwise AND and shifting already have special-cased code generation. I guess for larger numbers you could optimize indexing much more by using knowledge of the gmp representation, but we'd need some good evidence that the benefit is worth the extra maintenance burden. In any case, that's not a discussion relevant to this PR, so maybe just take that comment out of the PR description.

Otherwise, I've got a minor comment about the proof style, but the rest LGTM.

perf: faster Nat.testBit

1092097

Maybe we could have a much faster C implementation in the future.

FR-vdash-bot requested a review from semorrison as a code owner May 16, 2024 04:45

FR-vdash-bot added 2 commits May 16, 2024 14:17

fix

a9a9564

fix

860fcee

github-actions bot added the toolchain-available label (A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN) May 16, 2024

github-actions bot added the awaiting-review label (Waiting for someone to review the PR) May 16, 2024

alexkeizer reviewed May 16, 2024

FR-vdash-bot mentioned this pull request May 18, 2024

chore: upstream Nat.binaryRec #3756

Open

semorrison requested a review from alexkeizer May 21, 2024 06:13

Update Lemmas.lean

2da403e

alexkeizer reviewed May 21, 2024

src/Init/Data/Nat/Bitwise/Lemmas.lean (outdated, resolved)

src/Init/Data/Nat/Bitwise/Basic.lean (outdated, resolved)

FR-vdash-bot added 2 commits May 21, 2024 23:16

docs

2cc3b92

Update Lemmas.lean

56ed4b3

semorrison removed the awaiting-review label (Waiting for someone to review the PR) May 23, 2024

semorrison added this pull request to the merge queue May 23, 2024

Merged via the queue into leanprover:master with commit 93758cc May 23, 2024
13 checks passed
