Inline Assembly for detecting CPU features on Arm #143

Open
ashvardanian opened this issue Apr 18, 2024 · 1 comment
Labels
help wanted Extra attention is needed

Comments

@ashvardanian
Owner

Following the discussion in #137, it would be great to make feature detection uniform across x86 and Arm. On Arm, we can't yet use SVE, and we only differentiate between "extended NEON" and serial code. As Arm devices become more widespread, we need to isolate the features we use in different sub-generations of Armv8, and consider bit-level operations from SVE.

https://github.com/ashvardanian/SimSIMD/blob/18d17686124ddebd9fe55eee56b2e0273a613d4b/include/simsimd/simsimd.h#L208-L228

In other libraries, like SimSIMD, I currently use the Linux API to check for those capabilities. That is less portable than inline Assembly, though, and we may also need to detect those features on the upcoming Apple M-series chips.
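For context, the Linux-API route mentioned above can be sketched roughly as follows. This is a minimal illustration, not the code either library ships: the `example_*` names are hypothetical, and it assumes a libc whose `<sys/auxv.h>` exposes the `HWCAP_*` bits on AArch64 (glibc and musl both do). On any other platform the function simply reports nothing.

```c
#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h> // getauxval, AT_HWCAP, and the HWCAP_* bit definitions
#endif

// Hypothetical flag names, for illustration only
enum {
    example_cap_neon_dotprod = 1u << 0, // HWCAP_ASIMDDP: integer dot products
    example_cap_neon_fp16 = 1u << 1,    // HWCAP_ASIMDHP: half-precision AdvSIMD
    example_cap_sve = 1u << 2,          // HWCAP_SVE
    example_cap_mrs_emulated = 1u << 3, // HWCAP_CPUID: kernel emulates `mrs` of ID registers
};

// Query the kernel-reported hardware capabilities.
// Returns 0 when not built for Linux on AArch64.
unsigned example_linux_arm_caps(void) {
    unsigned caps = 0;
#if defined(__linux__) && defined(__aarch64__)
    unsigned long hwcap = getauxval(AT_HWCAP);
    if (hwcap & HWCAP_ASIMDDP) caps |= example_cap_neon_dotprod;
    if (hwcap & HWCAP_ASIMDHP) caps |= example_cap_neon_fp16;
    if (hwcap & HWCAP_SVE) caps |= example_cap_sve;
    if (hwcap & HWCAP_CPUID) caps |= example_cap_mrs_emulated;
#endif
    return caps;
}
```

The `HWCAP_CPUID` bit is worth checking either way: on Linux, user-space `mrs` reads of the ID registers only work because the kernel traps and emulates them, and that bit is how the kernel advertises the emulation.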

@ashvardanian added the "help wanted" label on Aug 4, 2024
@ashvardanian
Owner Author

There is now a better reference for how to implement this, also coming from SimSIMD:

    // Read the AArch64 ID registers directly (the Arm counterpart of x86 CPUID)
    unsigned long id_aa64isar0_el1 = 0, id_aa64isar1_el1 = 0, id_aa64pfr0_el1 = 0, id_aa64zfr0_el1 = 0;

    // Now let's unpack the status flags from ID_AA64ISAR0_EL1
    // https://developer.arm.com/documentation/ddi0601/2024-03/AArch64-Registers/ID-AA64ISAR0-EL1--AArch64-Instruction-Set-Attribute-Register-0?lang=en
    __asm__ __volatile__("mrs %0, ID_AA64ISAR0_EL1" : "=r"(id_aa64isar0_el1));
    // DP, bits [47:44] of ID_AA64ISAR0_EL1
    unsigned supports_integer_dot_products = ((id_aa64isar0_el1 >> 44) & 0xF) >= 1;
    // Now let's unpack the status flags from ID_AA64ISAR1_EL1
    // https://developer.arm.com/documentation/ddi0601/2024-03/AArch64-Registers/ID-AA64ISAR1-EL1--AArch64-Instruction-Set-Attribute-Register-1?lang=en
    __asm__ __volatile__("mrs %0, ID_AA64ISAR1_EL1" : "=r"(id_aa64isar1_el1));
    // I8MM, bits [55:52] of ID_AA64ISAR1_EL1
    unsigned supports_i8mm = ((id_aa64isar1_el1 >> 52) & 0xF) >= 1;
    // BF16, bits [47:44] of ID_AA64ISAR1_EL1
    unsigned supports_bf16 = ((id_aa64isar1_el1 >> 44) & 0xF) >= 1;

    // Now let's unpack the status flags from ID_AA64PFR0_EL1
    // https://developer.arm.com/documentation/ddi0601/2024-03/AArch64-Registers/ID-AA64PFR0-EL1--AArch64-Processor-Feature-Register-0?lang=en
    __asm__ __volatile__("mrs %0, ID_AA64PFR0_EL1" : "=r"(id_aa64pfr0_el1));
    // SVE, bits [35:32] of ID_AA64PFR0_EL1
    unsigned supports_sve = ((id_aa64pfr0_el1 >> 32) & 0xF) >= 1;
    // AdvSIMD, bits [23:20] of ID_AA64PFR0_EL1 can be used to check for `fp16` support
    //  - 0b0000: integers, single, double precision arithmetic
    //  - 0b0001: includes support for half-precision floating-point arithmetic
    unsigned supports_fp16 = ((id_aa64pfr0_el1 >> 20) & 0xF) == 1;

    // Now let's unpack the status flags from ID_AA64ZFR0_EL1
    // https://developer.arm.com/documentation/ddi0601/2024-03/AArch64-Registers/ID-AA64ZFR0-EL1--SVE-Feature-ID-Register-0?lang=en
    if (supports_sve)
        __asm__ __volatile__("mrs %0, ID_AA64ZFR0_EL1" : "=r"(id_aa64zfr0_el1));
    // I8MM, bits [47:44] of ID_AA64ZFR0_EL1
    unsigned supports_sve_i8mm = ((id_aa64zfr0_el1 >> 44) & 0xF) >= 1;
    // BF16, bits [23:20] of ID_AA64ZFR0_EL1
    unsigned supports_sve_bf16 = ((id_aa64zfr0_el1 >> 20) & 0xF) >= 1;
    // SVEver, bits [3:0] can be used to check for capability levels:
    //  - 0b0000: SVE is implemented
    //  - 0b0001: SVE2 is implemented
    //  - 0b0010: SVE2.1 is implemented
    // SVEver only refines the SVE indicator already obtained from ID_AA64PFR0_EL1,
    // because 0b0000 already means that base SVE is implemented:
    //    unsigned supports_sve2 = ((id_aa64zfr0_el1) & 0xF) >= 1;
    //    unsigned supports_sve2p1 = ((id_aa64zfr0_el1) & 0xF) >= 2;
    unsigned supports_neon = 1; // NEON is always supported

    return (simsimd_capability_t)(                                                                    //
        (simsimd_cap_neon_k * (supports_neon)) |                                                      //
        (simsimd_cap_neon_f16_k * (supports_neon && supports_fp16)) |                                 //
        (simsimd_cap_neon_bf16_k * (supports_neon && supports_bf16)) |                                //
        (simsimd_cap_neon_i8_k * (supports_neon && supports_i8mm && supports_integer_dot_products)) | //
        (simsimd_cap_sve_k * (supports_sve)) |                                                        //
        (simsimd_cap_sve_f16_k * (supports_sve && supports_fp16)) |                                   //
        (simsimd_cap_sve_bf16_k * (supports_sve && supports_sve_bf16)) |                              //
        (simsimd_cap_sve_i8_k * (supports_sve && supports_sve_i8mm)) |                                //
        (simsimd_cap_serial_k));
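Every check in the snippet above follows the same pattern: shift the register so the 4-bit field lands in the low nibble, mask it, and compare against the minimum encoding. A small sketch of that pattern as reusable helpers might look like this. The `example_*` names are hypothetical and not part of SimSIMD; the bit positions are the ones quoted in the comments above.

```c
#include <stdint.h>

// Hypothetical helper: return the 4-bit ID field whose lowest bit is `shift`.
static inline unsigned example_id_field(uint64_t reg, unsigned shift) {
    return (unsigned)((reg >> shift) & 0xF);
}

// DP, bits [47:44] of ID_AA64ISAR0_EL1
static inline int example_has_dotprod(uint64_t isar0) {
    return example_id_field(isar0, 44) >= 1;
}

// SVE, bits [35:32] of ID_AA64PFR0_EL1
static inline int example_has_sve(uint64_t pfr0) {
    return example_id_field(pfr0, 32) >= 1;
}

// AdvSIMD, bits [23:20] of ID_AA64PFR0_EL1:
// 0b0000 = no FP16, 0b0001 = with FP16, 0b1111 = AdvSIMD not implemented,
// so an exact comparison is used instead of `>=`.
static inline int example_has_neon_fp16(uint64_t pfr0) {
    return example_id_field(pfr0, 20) == 1;
}

// SVEver, bits [3:0] of ID_AA64ZFR0_EL1: 0b0001 and above means SVE2.
// Only meaningful when SVE itself was reported by ID_AA64PFR0_EL1.
static inline int example_has_sve2(uint64_t zfr0) {
    return example_id_field(zfr0, 0) >= 1;
}
```

Because the helpers take plain integers, they can be exercised with synthetic register values on any host, for example `example_has_dotprod((uint64_t)1 << 44)`, which keeps the decoding logic testable separately from the `mrs` reads.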
