Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SIGILL at vmovdqus #418

Open
jshuhnow opened this issue Nov 17, 2020 · 6 comments
Open

SIGILL at vmovdqus #418

jshuhnow opened this issue Nov 17, 2020 · 6 comments

Comments

@jshuhnow
Copy link
Contributor

Hello. I've been instrumenting quicly.
My HW spec goes:

  • Ubuntu 18.04 LTS
  • Intel(R) Xeon(R) CPU X5650 @ 2.67GHz
  • CPU Flags (from lscpu): fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 popcnt aes lahf_lm epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid dtherm ida arat flush_l1d
  • gcc-/g++ -7.5.0

➜ gdb cli
(gdb) r
Starting program: {...}/quicly/cli 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGILL, Illegal instruction.
main (argc=1, argv=0x7fffffffe028) at {...}/quicly/src/cli.c:1089
1089	    ctx = quicly_spec_context;
(gdb) disas
Dump of assembler code for function main:
   0x000055555555cc40 <+0>:	push   %rbp
   0x000055555555cc41 <+1>:	mov    %rsp,%rbp
   0x000055555555cc44 <+4>:	push   %r15
   0x000055555555cc46 <+6>:	push   %r14
   0x000055555555cc48 <+8>:	push   %r13
   0x000055555555cc4a <+10>:	push   %r12
   0x000055555555cc4c <+12>:	mov    %rsi,%r13
   0x000055555555cc4f <+15>:	push   %rbx
   0x000055555555cc50 <+16>:	mov    $0x1,%esi
   0x000055555555cc55 <+21>:	mov    %edi,%r12d
   0x000055555555cc58 <+24>:	mov    $0x10,%edi
   0x000055555555cc5d <+29>:	sub    $0x5a8,%rsp
   0x000055555555cc64 <+36>:	movl   $0x0,-0x51c(%rbp)
   0x000055555555cc6e <+46>:	mov    %fs:0x28,%rax
   0x000055555555cc77 <+55>:	mov    %rax,-0x38(%rbp)
   0x000055555555cc7b <+59>:	xor    %eax,%eax
   0x000055555555cc7d <+61>:	callq  0x55555555bdb0 <calloc@plt>
=> 0x000055555555cc82 <+66>:	vmovdqu 0x23da16(%rip),%xmm0        # 0x55555579a6a0 <quicly_spec_context>
   0x000055555555cc8a <+74>:	lea    0x23e46f(%rip),%rdi        # 0x55555579b100 <tlsctx>
   0x000055555555cc91 <+81>:	mov    %rax,0x24f258(%rip)        # 0x5555557abef0 <reqs>
   0x000055555555cc98 <+88>:	lea    0x23e449(%rip),%rax        # 0x55555579b0e8 <stream_open>
   0x000055555555cc9f <+95>:	vmovaps %xmm0,0x24f019(%rip)        # 0x5555557abcc0 <ctx>
   0x000055555555cca7 <+103>:	mov    %rdi,0x24f012(%rip)        # 0x5555557abcc0 <ctx>
   0x000055555555ccae <+110>:	vmovdqu 0x23d9fa(%rip),%xmm0        # 0x55555579a6b0 <quicly_spec_context+16>

I tried switching the gcc and g++ version to 5, 6, and 8, as per my colleague's advice; and gdb has given me the same SIGILL at the vmovdqu on other lines.

Maybe this is a machine-specific issue, rather than quicly's; but I'd like to get some advice from the community for further endeavor.

@kazuho
Copy link
Member

kazuho commented Nov 17, 2020

Please specify -DWITH_FUSION=OFF when running CMake.

By default, CMakeLists.txt of quicly assumes that x86-64 has necessary capabilities to use the fusion crypto engine (i.e.-mavx2 -maes -mpclmul).

@jshuhnow
Copy link
Contributor Author

@kazuho Thank you so much. It solved the problem.

Although, it is noted that it did compile but failed "during the execution". There seems room for improving the build script, for this specific case. What do you say? Feel free to keep open or close the issue in either case.

@jshuhnow
Copy link
Contributor Author

jshuhnow commented Nov 17, 2020

Side note: it resulted in halving the performance, without considering the difference of the CPUs used. (Still huge)
For a single 1GB file throughput, 2.91Gbps (with i5-7600 @3.50GHz) -> 1.61Gbps (with the aforementioned spec @2.67Ghz)

Clarification: [Not to be addressed as an issue.]

@jshuhnow
Copy link
Contributor Author

I found setting the processor-specific flags is far more complicated matter than I initially thought.
Maybe we should not handle it on a cmake-level.
For those interested in solving this, please go read https://gitlab.kitware.com/cmake/cmake/-/issues/18445.

@jshuhnow
Copy link
Contributor Author

jshuhnow commented Nov 18, 2020

Still if one wants to check the support of the CPU flags at a runtime,
my code snippet might help:

# At L105
IF (WITH_FUSION)
    LIST(APPEND CLI_FILES deps/picotls/lib/fusion.c)

    # Reference - https://cmake.org/cmake/help/v3.18/command/cmake_host_system_information.html
    cmake_host_system_information(RESULT HAS_MMX QUERY "HAS_MMX")

    if(HAS_MMX)
        MESSAGE("HAS_MMX")
        set(CLI_COMPILE_FLAGS "-mavx2 ${CLI_COMPILE_FLAGS}")
    endif()
    
ENDIF ()

But this may be not a judicious way in the scope of the entire project as mentioned in the previous comment and the linked discussion.

@jshuhnow
Copy link
Contributor Author

jshuhnow commented Nov 19, 2020

I've put many thoughts into this issue last night.

First,
The current way of checking local parameters

IF ((CMAKE_SIZEOF_VOID_P EQUAL 8) AND
    (CMAKE_SYSTEM_PROCESSOR STREQUAL "x86_64") OR
     (CMAKE_SYSTEM_PROCESSOR STREQUAL "amd64") OR
     (CMAKE_SYSTEM_PROCESSOR STREQUAL "AMD64"))
    SET(WITH_FUSION_DEFAULT "ON")
ELSE ()
    SET(WITH_FUSION_DEFAULT "OFF")
ENDIF ()
OPTION(WITH_FUSION "whether or not to use the Fusion AES-GCM engine in the cli binary" ${WITH_FUSION_DEFAULT})

is, as shown above, not 100% rigorous. This can be improved.

Second,
A proper way of abstraction is, to catch the two birds, local- and cross- compilation, we should stream CPU flags to the the cmake, rather than the current way. In this case, it's better to turn off WITH_FUSION by default and let those with specific needs to supply appropriate flags.

So in code-level, there can be two ways.
To stick to local-compilation only,

IF (CHECK_ALL_LOCAL_ENV_OPTIONS)
    SET(WITH_FUSION_DEFAULT "ON")
ELSE ()
    SET(WITH_FUSION_DEFAULT "OFF")
ENDIF()

...

To embrace cross-compilation,

(erase the whole, thus WITH_FUSION is not set by default)

and move the abstraction to the CI/CD pipeline.

@kazuho what do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants