perf: faster Nat.testBit
#4188
Maybe we could have a much faster C implementation in the future.
src/Init/Data/Nat/Bitwise/Basic.lean
Outdated
def testBit (m n : Nat) : Bool :=
  -- `1 &&& n` is faster than `n &&& 1` for big `n`. This may change in the future.
  1 &&& (m >>> n) != 0
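As a quick sanity check, the reordered operands in the diff above can be compared against the other operand order on a few inputs. This is only a sketch: `testBitOld` and `testBitNew` are hypothetical names introduced here, and the body of `testBitOld` is an assumption about what the definition looked like before this change.

```lean
-- Hypothetical names for illustration; neither is part of the PR.
-- Assumed previous operand order:
def testBitOld (m n : Nat) : Bool := (m >>> n) &&& 1 != 0
-- Operand order from the diff above:
def testBitNew (m n : Nat) : Bool := 1 &&& (m >>> n) != 0

-- 5 is 0b101, so bits 0 and 2 are set and bit 1 is clear.
#eval testBitNew 5 0  -- true
#eval testBitNew 5 1  -- false
#eval testBitNew 5 2  -- true

-- Both operand orders agree on the low 64 bit positions of a sample value.
#eval (List.range 64).all fun n =>
  testBitOld 0xDEADBEEF n == testBitNew 0xDEADBEEF n  -- true
```

Since bitwise and is commutative, the two versions can only differ in performance, never in result.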
Do you have numbers to confirm this claim?
Is it faster in kernel reduction or for compiled code? If it really is faster, doesn't the compiler just optimize it?
def test (x : Nat) : Nat := 1 &&& x
def test' (x : Nat) : Nat := x &&& 1

def run_test (n : Nat) (x : Nat) : IO Unit := do
  let mut t := 0
  for _ in [0 : n] do
    t := test x

def run_test' (n : Nat) (x : Nat) : IO Unit := do
  let mut t := 0
  for _ in [0 : n] do
    t := test' x

def n := 10^5
def x := 11^1000000

#eval timeit "1 &&& x" (run_test n x) -- 41ms
#eval timeit "x &&& 1" (run_test' n x) -- 1.5s
That's a surprisingly big difference, almost suspiciously so. I only know enough about micro-benchmarking to know results can sometimes be misleading, so I tried playing with various variations (e.g., sprinkling `no_inline`) but everything points towards `1 &&& x` indeed being significantly faster.
Admittedly, in my original comment I was only thinking about "small" (up to 64-bit) numbers. Is this `x &&& 1` vs `1 &&& x` difference a known behaviour of gmp (which I assume is what is going on here, given the big numbers in your test case)?
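For reference, one variation of the kind described above might look like the following. It is a sketch only, not the code that was actually run: `andLeft`, `andRight`, and `bench` are hypothetical names, and the idea is simply to mark the two operations `@[noinline]` and to return an accumulated result so the loop body is harder to discard. No timings are claimed for it.

```lean
-- Hypothetical helpers for illustration; not part of the PR.
@[noinline] def andLeft (x : Nat) : Nat := 1 &&& x
@[noinline] def andRight (x : Nat) : Nat := x &&& 1

-- Apply `f` to `x` `iters` times and return the sum of the results,
-- so that every call contributes to a value that is used afterwards.
def bench (f : Nat → Nat) (iters x : Nat) : IO Nat := do
  let mut acc := 0
  for _ in [0:iters] do
    acc := acc + f x
  return acc

def main : IO Unit := do
  let x := 11 ^ 1000000
  let a ← timeit "1 &&& x" (bench andLeft 100000 x)
  let b ← timeit "x &&& 1" (bench andRight 100000 x)
  -- Print the accumulated values so both results are observably used.
  IO.println s!"{a} {b}"
```

Running this as a compiled `main` rather than through `#eval` also separates compiled-code timings from interpreter overhead, which speaks to the earlier question about where the difference shows up.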
I'm not knowledgeable enough about micro-benchmarks to say whether the given example is free of weird inlining or other behaviour that might interfere with the numbers, but the difference does seem so significant that it makes sense to switch over.
> Maybe we could have a much faster C implementation in the future.
For 64 bit or lower, bitwise and and shifting already have special-cased code generation. I guess for larger numbers you could optimize indexing much more by using knowledge of the gmp representation, but we'd need some good evidence that the extra maintenance burden is worth the benefit. In any case, that's not a discussion relevant to this PR, so maybe just take that comment out of the PR description.
Otherwise, I've got a minor comment about the proof style, but the rest LGTM.
`1 &&& n` is faster than `n &&& 1` for big `n`.