Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

<cmath>: Use Clang builtins for classification/comparison functions #4612

Merged
merged 4 commits into from
Apr 26, 2024

Conversation

StephanTLavavej
Copy link
Member

Fixes #4609.

This is limited to functions that the UCRT has templated, where our non-template overloads will be preferred. For example, this PR omits fpclassify() because the UCRT overloads it for float, double, and long double (to call _fdtest(), _dtest(), and _ldtest(), respectively), so even though Clang provides a constexpr-compatible __builtin_fpclassify, the STL can't provide overloads without introducing ambiguity.

Note that we define our overloads in the global namespace and then drag them into namespace std. Therefore, as long as users include <cmath>, their global namespace calls will benefit from this improvement.

Also note an additional limitation: calls to the binary comparison functions with mixed argument types will select the UCRT templates as being better matches.

In the medium future, it looks like we're going to need a major UCRT overhaul of <math.h> to implement all of the constexpr demanded by C++23/26 - at that time we can also untangle the additional overloads that the STL is providing (for both integral and floating-point), and teach the UCRT to use MSVC and Clang builtins when possible.

The MSVC compiler back-end should probably provide similar builtins - I am a lazy kitty 🐈 💤 so I haven't asked them, but again the upcoming constexpr overhaul work will likely result in such machinery becoming available.

C:\Temp>type meow.cpp
#include <cmath>

bool is_this_finite(double dbl) {
    return std::isfinite(dbl);
}
C:\Temp>clang-cl /EHsc /nologo /W4 /MT /O2 /c /FAs meow.cpp
Click to expand codegen before:
C:\Temp>type meow.asm
        .text
        .def    @feat.00;
        .scl    3;
        .type   0;
        .endef
        .globl  @feat.00
.set @feat.00, 0
        .intel_syntax noprefix
        .file   "meow.cpp"
        .def    "?is_this_finite@@YA_NN@Z";
        .scl    2;
        .type   32;
        .endef
        .section        .text,"xr",one_only,"?is_this_finite@@YA_NN@Z"
        .globl  "?is_this_finite@@YA_NN@Z"      # -- Begin function ?is_this_finite@@YA_NN@Z
        .p2align        4, 0x90
"?is_this_finite@@YA_NN@Z":             # @"?is_this_finite@@YA_NN@Z"
.seh_proc "?is_this_finite@@YA_NN@Z"
# %bb.0:
        push    rbx
        .seh_pushreg rbx
        sub     rsp, 48
        .seh_stackalloc 48
        .seh_endprologue
        mov     rax, qword ptr [rip + __security_cookie]
        xor     rax, rsp
        mov     qword ptr [rsp + 40], rax
        movsd   qword ptr [rsp + 32], xmm0
        lea     rcx, [rsp + 32]
        call    _dtest
        test    ax, ax
        setle   bl
        mov     rcx, qword ptr [rsp + 40]
        xor     rcx, rsp
        call    __security_check_cookie
        mov     eax, ebx
        add     rsp, 48
        pop     rbx
        ret
        .seh_endproc
                                        # -- End function
        .section        .drectve,"yni"
        .ascii  " /DEFAULTLIB:libcmt.lib"
        .ascii  " /DEFAULTLIB:oldnames.lib"
        .ascii  " /FAILIFMISMATCH:\"_MSC_VER=1900\""
        .ascii  " /FAILIFMISMATCH:\"_ITERATOR_DEBUG_LEVEL=0\""
        .ascii  " /FAILIFMISMATCH:\"RuntimeLibrary=MT_StaticRelease\""
        .ascii  " /DEFAULTLIB:libcpmt.lib"
        .addrsig
        .globl  _fltused
Click to expand codegen after:
C:\Temp>type meow.asm
        .text
        .def    @feat.00;
        .scl    3;
        .type   0;
        .endef
        .globl  @feat.00
.set @feat.00, 0
        .intel_syntax noprefix
        .file   "meow.cpp"
        .def    "?is_this_finite@@YA_NN@Z";
        .scl    2;
        .type   32;
        .endef
        .section        .text,"xr",one_only,"?is_this_finite@@YA_NN@Z"
        .globl  "?is_this_finite@@YA_NN@Z"      # -- Begin function ?is_this_finite@@YA_NN@Z
        .p2align        4, 0x90
"?is_this_finite@@YA_NN@Z":             # @"?is_this_finite@@YA_NN@Z"
# %bb.0:
        movq    rax, xmm0
        movabs  rcx, 9223372036854775807
        and     rcx, rax
        movabs  rax, 9218868437227405312
        cmp     rcx, rax
        setl    al
        ret
                                        # -- End function
        .section        .drectve,"yni"
        .ascii  " /DEFAULTLIB:libcmt.lib"
        .ascii  " /DEFAULTLIB:oldnames.lib"
        .ascii  " /FAILIFMISMATCH:\"_MSC_VER=1900\""
        .ascii  " /FAILIFMISMATCH:\"_ITERATOR_DEBUG_LEVEL=0\""
        .ascii  " /FAILIFMISMATCH:\"RuntimeLibrary=MT_StaticRelease\""
        .ascii  " /DEFAULTLIB:libcpmt.lib"
        .addrsig
        .globl  _fltused

@StephanTLavavej StephanTLavavej added the performance Must go faster label Apr 21, 2024
@StephanTLavavej StephanTLavavej requested a review from a team as a code owner April 21, 2024 01:05
@frederick-vs-ja
Copy link
Contributor

Also note an additional limitation: calls to the binary comparison functions with mixed argument types will select the UCRT templates as being better matches.

I'm planning to create a follow-up PR addressing this. The changes would be somehow boilerplate though.

@StephanTLavavej StephanTLavavej self-assigned this Apr 26, 2024
@StephanTLavavej
Copy link
Member Author

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

@StephanTLavavej StephanTLavavej merged commit 9c04cbe into microsoft:main Apr 26, 2024
39 checks passed
@StephanTLavavej StephanTLavavej deleted the faster-math branch April 26, 2024 23:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

std::isfinite poor codegen with Microsoft STL + Clang
3 participants