Evaluate stack guard implementation(s) #8666

mcspr · 2022-09-02T05:43:39Z

Right now we have at least three ways to ensure stack-smash does not happen.

CONT guard check, with two u32 at both start and the end our struct. Additionally, we implement stack size check and fill our local array with the same value. ref. https://gitter.im/esp8266/Arduino?at=630eddaff4d7a323dead0fcf and https://github.com/esp8266/Arduino/blob/master/cores/esp8266/cont_util.cpp cont_... functions related to checks and 're-painting'
-fstack-protector that instruments functions (also __attribute__((stack_protect)) to manually inject such checks) with guard variables and a special __stack_chk_{guard,fail()}; ref. https://gcc.gnu.org/onlinedocs/gccint/Stack-Smashing-Protection.html and https://github.com/esp8266/Arduino/blob/master/cores/esp8266/core_esp8266_postmortem.cpp implementations
StackThunk also implements guards for its stack through a similar mechanisms (ref. stack_thunk_fatal_overflow() and main implementation in https://github.com/esp8266/Arduino/blob/master/cores/esp8266/StackThunk.h)

Could GCC implementation supplement both StackThunk and CONT? Or, replace it? (see __attribute__ above)
Should CONT checks randomize its guard value to separate stack contents themselves from structure guard members?
Could we add something like address sanitizer that verifies that we don't go over the stack boundaries, not just protect us from writing things over one specific u32 value?
w

The text was updated successfully, but these errors were encountered:

earlephilhower · 2022-09-02T08:47:24Z

The CONT and StackThunk guards and -fstack-protector are protecting against two different problems, I believe.

AIUI stack-protector checks for things like writing a local stack array beyond bounds to, say, inject a different return address or a higher-level call's locals. It has no idea what the real total stack limits are, only where the current function call's place on the stack it.

The CONT and StackThunk guards check for total stack overflow, after the fact. Too many levels of calls or too large locals. (i.e. every funciton was well behaved, but they needed, say, 5K of stack)

GCC never knows the real total stack limits, that's more of the OS's job to handle.

One idea toyed with before (there should be a PR or discussion w/a code sample in fact) was to set the HW data breakpoint to the last 16 (or other power-of-two) bytes of the stack. With that you would get the Cont/Thunk stack checker behavior, instantly.

A pie in the sky idea would be to adjust the GCC xtensa machine description to check any stack allocations at runtime against some well-known "current stack limits". Pretty massive performance penalty and it would not, obviously, do anything with the blob code.

TD-er · 2022-09-02T08:48:09Z

Could we add something like address sanitizer that verifies that we don't go over the stack boundaries, not just protect us from writing things over one specific u32 value?

Just curious.
What would you expect the code should do then?
I get the idea of preventing to overwrite stuff outside some range instead of detecting already corrupted data.
But what would be the alternative then? It is not like the compiler can detect all runtime situations where entering a function might cause a stack overflow, so then it will be a runtime thing. But then what? Throw an exception which in almost any project equals a crash anyway.

Edit:
N.B. we already have some checks somewhere I guess, since I hit them every now and then when writing a proper new bug, where addressing something outside an allocated array causes a crash (load prohibited kind of error) so why not using the same for the stack implementation? Or is it already implemented as such?

mcspr · 2022-09-02T11:15:02Z

Every variant of stack smashing protection seem to utilize a canary value that is either nearby the stack memory or in case of protector just added as an additional stack variable that is created on stack immediately as function is called. I think I misunderstood the way boundaries are set, so it does not seem that CONT could interact with the protector code that well just by checking that nearby memory is untouched :/

The repainting still seems suspect, don't either guards or paint value need to change so we don't mistake structure with the array itself when both values are equal?

The mentioned code that does HW breakpoint is #5158
(not sure which one of those gists is the correct one, and what are the downsides)
Xtensa arch docs say both reads and writes are monitored, and it could work with up to a 64bytes range.

Machine-definition way seems to be related to -fstack-check as yet another instrumentation option
https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html#index-fstack-check
https://gcc.gnu.org/onlinedocs/gccint/Stack-Checking.html
https://www.adacore.com/uploads/techPapers/Stack_Analysis.pdf
Which splits into either machine specific way or a library-implemented one.

Option adds extra code at the start of the func and checks whether sp stays within bounds. ref. In GCC code I see probe_stack_range has some calls to stack_check_libfunc, which in case of ADA is _gnat_stack_check. But it does not look like it is something overridable, like stack-protector func is
https://github.com/gcc-mirror/gcc/search?q=probe_stack_range

(or any other GCC instrumentation does not seem to help; even patchable function entry will inject nops just before func sets up any stack)

jjsuwa-sys3175 · 2022-09-03T13:14:42Z

First, any memory guard methods other than breakpoint only affect artifacts compiled with them enabled.
Second, breakpoint is powerless for memory access violations other than its specified "the point".
Last, as already pointed out, what should we do after detecting it? It's hard to do anything useful under conditions where the stack is exhausted, and of course virtually impossible to recover from that situation.

mcspr · 2022-09-03T13:54:20Z

Second, breakpoint is powerless for memory access violations other than its specified "the point".

True, but isn't it enough for functions that allocate some stack and try to operate it in the 'forbidden' region? Just another layer on top of existing debugging options.

Last, as already pointed out, what should we do after detecting it? It's hard to do anything useful under conditions where the stack is exhausted, and of course virtually impossible to recover from that situation.

Coherent error reporting, at the very least. Like in the mentioned error sanitizer example or the way stack-check does it. The point is not to recover, but to reduce the time from getting the error and noticing 'oops this is a stack overflow / smashing'

(which btw is also mis-named in StackThunk)

jjsuwa-sys3175 · 2022-09-03T14:15:03Z

Second, breakpoint is powerless for memory access violations other than its specified "the point".

True, but isn't it enough for functions that allocate some stack and try to operate it in the 'forbidden' region? Just another layer on top of existing debugging options.

Be careful that memory regions can often be accessed discontinuously with stride of tens (or more) of bytes.

Last, as already pointed out, what should we do after detecting it? It's hard to do anything useful under conditions where the stack is exhausted, and of course virtually impossible to recover from that situation.

Coherent error reporting, at the very least.

And that is all, I guess. It is much better to be interrupted by the stack guard in a potentially problematic place than by the WDT in a completely unrelated one.

TD-er · 2022-09-03T14:22:41Z

Just one thing you might want to look into, just to be sure about whether the current "stack painting" is working as it should.
The compiler may sometimes optimize away code like memset or memzero if the same memory address is going to be changed right after such a call.
There are lots of issues about this to be found online.
A number of bugs (or attack vectors) in various crypto related code often can be related to such issues where memory isn't actually cleared even though the code would suggest it should be.

jjsuwa-sys3175 · 2022-09-03T14:48:03Z

The compiler may sometimes optimize away code like memset or memzero if the same memory address is going to be changed right after such a call.
There are lots of issues about this to be found online.

Hmm, didn't GCC have a counterpart to SecureZeroMemory() in VC++?
Well anyway, the portable way to deal with that kind of problem is to explicitly clear the region in its own code using volatile pointer.

jjsuwa-sys3175 · 2022-09-03T15:29:14Z

Well anyway, the portable way to deal with that kind of problem is to explicitly clear the region in its own code using volatile pointer.

In GCC, we can use extended asm to force dependency injection:

__builtin_memset (array, 0, sizeof (array));
__asm__ volatile (""::"m"(array));

Now GCC recognizes that all the elements of array after __builtin_memset() are referenced in the no-operation but volatile (cannot be optimized away) asm statement.

mcspr · 2022-09-06T11:09:24Z

I don't think we have that issue with 'repaints' themselves? It is a separate func call, plus code does not seem to optimize to memset and simply uses loop.

Notably, we have explicit_bzero in libc for memset-0 case
https://github.com/earlephilhower/newlib-xtensa/blob/xtensa-4_0_0-lock-arduino/newlib/libc/string/explicit_bzero.c

Comparing some code, FreeBSD does not employ the any barriers
https://www.leidinger.net/FreeBSD/dox/libkern/html/d5/da7/explicit__bzero_8c_source.html
(edit: and there's a discussion - https://reviews.freebsd.org/D4964 - as to why not having it may cause LTO eliminate the call)

glibc source does
https://code.woboq.org/userspace/glibc/string/explicit_bzero.c.html

Will update with the stack errors reporting, so cont and bssl attach our postmortem stack pointer error via this, plus rename to separate smash from overflow

Arduino/cores/esp8266/core_esp8266_postmortem.cpp

Line 315 in 313b3c0

s_stacksmash_addr = (uint32_t)__builtin_return_address(0);

Arduino/cores/esp8266/core_esp8266_postmortem.cpp

Lines 156 to 157 in 313b3c0

    
           else if (rst_info.reason == REASON_USER_STACK_SMASH) { 
        
               ets_printf_P(PSTR("\nStack overflow detected.\n"));

btw I noticed stack watchpoint / breakpoint is also implemented in esp-idf for tasks stack. Each time they switch, this check is set up
https://github.com/espressif/esp-idf/blob/fde4afc67a2035a1f461dc511cd85dde1ef90368/components/freertos/FreeRTOS-Kernel/portable/riscv/port.c#L368-L373
https://github.com/espressif/esp-idf/search?q=vPortSetStackWatchpoint
But, will still need to think on how it is going to be enabled, though. Maybe something for later.

Wire everything that relies on stack smashing detection to call `__stack_chk_fail()` (aka what libssp / ssp / stack-protector uses) Expose it in our debugging header Rename overflow -> smashing, as these are different things we are trying to detect (meaning, that we check for things writing there, not some kind of `alloca` issue or the way `-fstack-check` would have worked) ref. esp8266#8666 `-fstack-protector` continues to work as it always did CONT replaces `abort()`, also moves its check to the loop wrapper to avoid dumping otherwise useless SYS context memory StackThunk replaces a similar `abort()` call

Wire everything that relies on stack smashing detection to call `__stack_chk_fail()` (aka what libssp / ssp / stack-protector uses) Expose it in our debugging header Rename overflow -> smashing, as these are different things we are trying to detect (meaning, that we check for things writing there, not some kind of `alloca` issue or the way `-fstack-check` would have worked) ref. #8666 `-fstack-protector` continues to work as it always did CONT replaces `abort()`, also moves its check to the loop wrapper to avoid dumping otherwise useless SYS context memory StackThunk replaces a similar `abort()` call

mcspr added type: question component: core labels Sep 2, 2022

mcspr mentioned this issue Sep 9, 2022

Nice stack smashing postmortem message #8670

Merged

mcspr mentioned this issue Jan 19, 2024

ESP8266 Crashing During Prints randomly #9069

Open

6 tasks

mcspr mentioned this issue Feb 5, 2024

CONT stack overflow postmortem #9083

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Evaluate stack guard implementation(s) #8666

Evaluate stack guard implementation(s) #8666

mcspr commented Sep 2, 2022

earlephilhower commented Sep 2, 2022

TD-er commented Sep 2, 2022 •

edited

Loading

mcspr commented Sep 2, 2022

jjsuwa-sys3175 commented Sep 3, 2022

mcspr commented Sep 3, 2022

jjsuwa-sys3175 commented Sep 3, 2022

TD-er commented Sep 3, 2022

jjsuwa-sys3175 commented Sep 3, 2022

jjsuwa-sys3175 commented Sep 3, 2022 •

edited

Loading

mcspr commented Sep 6, 2022 •

edited

Loading

Evaluate stack guard implementation(s) #8666

Evaluate stack guard implementation(s) #8666

Comments

mcspr commented Sep 2, 2022

earlephilhower commented Sep 2, 2022

TD-er commented Sep 2, 2022 • edited Loading

mcspr commented Sep 2, 2022

jjsuwa-sys3175 commented Sep 3, 2022

mcspr commented Sep 3, 2022

jjsuwa-sys3175 commented Sep 3, 2022

TD-er commented Sep 3, 2022

jjsuwa-sys3175 commented Sep 3, 2022

jjsuwa-sys3175 commented Sep 3, 2022 • edited Loading

mcspr commented Sep 6, 2022 • edited Loading

TD-er commented Sep 2, 2022 •

edited

Loading

jjsuwa-sys3175 commented Sep 3, 2022 •

edited

Loading

mcspr commented Sep 6, 2022 •

edited

Loading