Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Evaluate stack guard implementation(s) #8666

Open
mcspr opened this issue Sep 2, 2022 · 10 comments
Open

Evaluate stack guard implementation(s) #8666

mcspr opened this issue Sep 2, 2022 · 10 comments

Comments

@mcspr
Copy link
Collaborator

mcspr commented Sep 2, 2022

Right now we have at least three ways to ensure stack-smash does not happen.

Could GCC implementation supplement both StackThunk and CONT? Or, replace it? (see __attribute__ above)
Should CONT checks randomize its guard value to separate stack contents themselves from structure guard members?
Could we add something like address sanitizer that verifies that we don't go over the stack boundaries, not just protect us from writing things over one specific u32 value?
w

@earlephilhower
Copy link
Collaborator

The CONT and StackThunk guards and -fstack-protector are protecting against two different problems, I believe.

AIUI stack-protector checks for things like writing a local stack array beyond bounds to, say, inject a different return address or a higher-level call's locals. It has no idea what the real total stack limits are, only where the current function call's place on the stack it.

The CONT and StackThunk guards check for total stack overflow, after the fact. Too many levels of calls or too large locals. (i.e. every funciton was well behaved, but they needed, say, 5K of stack)

GCC never knows the real total stack limits, that's more of the OS's job to handle.

One idea toyed with before (there should be a PR or discussion w/a code sample in fact) was to set the HW data breakpoint to the last 16 (or other power-of-two) bytes of the stack. With that you would get the Cont/Thunk stack checker behavior, instantly.

A pie in the sky idea would be to adjust the GCC xtensa machine description to check any stack allocations at runtime against some well-known "current stack limits". Pretty massive performance penalty and it would not, obviously, do anything with the blob code.

@TD-er
Copy link
Contributor

TD-er commented Sep 2, 2022

Could we add something like address sanitizer that verifies that we don't go over the stack boundaries, not just protect us from writing things over one specific u32 value?

Just curious.
What would you expect the code should do then?
I get the idea of preventing to overwrite stuff outside some range instead of detecting already corrupted data.
But what would be the alternative then? It is not like the compiler can detect all runtime situations where entering a function might cause a stack overflow, so then it will be a runtime thing. But then what? Throw an exception which in almost any project equals a crash anyway.

Edit:
N.B. we already have some checks somewhere I guess, since I hit them every now and then when writing a proper new bug, where addressing something outside an allocated array causes a crash (load prohibited kind of error) so why not using the same for the stack implementation? Or is it already implemented as such?

@mcspr
Copy link
Collaborator Author

mcspr commented Sep 2, 2022

Every variant of stack smashing protection seem to utilize a canary value that is either nearby the stack memory or in case of protector just added as an additional stack variable that is created on stack immediately as function is called. I think I misunderstood the way boundaries are set, so it does not seem that CONT could interact with the protector code that well just by checking that nearby memory is untouched :/

The repainting still seems suspect, don't either guards or paint value need to change so we don't mistake structure with the array itself when both values are equal?

The mentioned code that does HW breakpoint is #5158
(not sure which one of those gists is the correct one, and what are the downsides)
Xtensa arch docs say both reads and writes are monitored, and it could work with up to a 64bytes range.

Machine-definition way seems to be related to -fstack-check as yet another instrumentation option
https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html#index-fstack-check
https://gcc.gnu.org/onlinedocs/gccint/Stack-Checking.html
https://www.adacore.com/uploads/techPapers/Stack_Analysis.pdf
Which splits into either machine specific way or a library-implemented one.

Option adds extra code at the start of the func and checks whether sp stays within bounds. ref. In GCC code I see probe_stack_range has some calls to stack_check_libfunc, which in case of ADA is _gnat_stack_check. But it does not look like it is something overridable, like stack-protector func is
https://github.com/gcc-mirror/gcc/search?q=probe_stack_range

(or any other GCC instrumentation does not seem to help; even patchable function entry will inject nops just before func sets up any stack)

@jjsuwa-sys3175
Copy link
Contributor

  • First, any memory guard methods other than breakpoint only affect artifacts compiled with them enabled.
  • Second, breakpoint is powerless for memory access violations other than its specified "the point".
  • Last, as already pointed out, what should we do after detecting it? It's hard to do anything useful under conditions where the stack is exhausted, and of course virtually impossible to recover from that situation.

@mcspr
Copy link
Collaborator Author

mcspr commented Sep 3, 2022

Second, breakpoint is powerless for memory access violations other than its specified "the point".

True, but isn't it enough for functions that allocate some stack and try to operate it in the 'forbidden' region? Just another layer on top of existing debugging options.

Last, as already pointed out, what should we do after detecting it? It's hard to do anything useful under conditions where the stack is exhausted, and of course virtually impossible to recover from that situation.

Coherent error reporting, at the very least. Like in the mentioned error sanitizer example or the way stack-check does it. The point is not to recover, but to reduce the time from getting the error and noticing 'oops this is a stack overflow / smashing'

(which btw is also mis-named in StackThunk)

@jjsuwa-sys3175
Copy link
Contributor

Second, breakpoint is powerless for memory access violations other than its specified "the point".

True, but isn't it enough for functions that allocate some stack and try to operate it in the 'forbidden' region? Just another layer on top of existing debugging options.

Be careful that memory regions can often be accessed discontinuously with stride of tens (or more) of bytes.

Last, as already pointed out, what should we do after detecting it? It's hard to do anything useful under conditions where the stack is exhausted, and of course virtually impossible to recover from that situation.

Coherent error reporting, at the very least.

And that is all, I guess. It is much better to be interrupted by the stack guard in a potentially problematic place than by the WDT in a completely unrelated one.

@TD-er
Copy link
Contributor

TD-er commented Sep 3, 2022

Just one thing you might want to look into, just to be sure about whether the current "stack painting" is working as it should.
The compiler may sometimes optimize away code like memset or memzero if the same memory address is going to be changed right after such a call.
There are lots of issues about this to be found online.
A number of bugs (or attack vectors) in various crypto related code often can be related to such issues where memory isn't actually cleared even though the code would suggest it should be.

@jjsuwa-sys3175
Copy link
Contributor

The compiler may sometimes optimize away code like memset or memzero if the same memory address is going to be changed right after such a call.
There are lots of issues about this to be found online.

Hmm, didn't GCC have a counterpart to SecureZeroMemory() in VC++?
Well anyway, the portable way to deal with that kind of problem is to explicitly clear the region in its own code using volatile pointer.

@jjsuwa-sys3175
Copy link
Contributor

jjsuwa-sys3175 commented Sep 3, 2022

Well anyway, the portable way to deal with that kind of problem is to explicitly clear the region in its own code using volatile pointer.

In GCC, we can use extended asm to force dependency injection:

__builtin_memset (array, 0, sizeof (array));
__asm__ volatile (""::"m"(array));

Now GCC recognizes that all the elements of array after __builtin_memset() are referenced in the no-operation but volatile (cannot be optimized away) asm statement.

@mcspr
Copy link
Collaborator Author

mcspr commented Sep 6, 2022

I don't think we have that issue with 'repaints' themselves? It is a separate func call, plus code does not seem to optimize to memset and simply uses loop.

Notably, we have explicit_bzero in libc for memset-0 case
https://github.com/earlephilhower/newlib-xtensa/blob/xtensa-4_0_0-lock-arduino/newlib/libc/string/explicit_bzero.c

Comparing some code, FreeBSD does not employ the any barriers
https://www.leidinger.net/FreeBSD/dox/libkern/html/d5/da7/explicit__bzero_8c_source.html
(edit: and there's a discussion - https://reviews.freebsd.org/D4964 - as to why not having it may cause LTO eliminate the call)

glibc source does
https://code.woboq.org/userspace/glibc/string/explicit_bzero.c.html

Will update with the stack errors reporting, so cont and bssl attach our postmortem stack pointer error via this, plus rename to separate smash from overflow

s_stacksmash_addr = (uint32_t)__builtin_return_address(0);

else if (rst_info.reason == REASON_USER_STACK_SMASH) {
ets_printf_P(PSTR("\nStack overflow detected.\n"));

btw I noticed stack watchpoint / breakpoint is also implemented in esp-idf for tasks stack. Each time they switch, this check is set up
https://github.com/espressif/esp-idf/blob/fde4afc67a2035a1f461dc511cd85dde1ef90368/components/freertos/FreeRTOS-Kernel/portable/riscv/port.c#L368-L373
https://github.com/espressif/esp-idf/search?q=vPortSetStackWatchpoint
But, will still need to think on how it is going to be enabled, though. Maybe something for later.

mcspr added a commit to mcspr/esp8266-Arduino that referenced this issue Sep 9, 2022
Wire everything that relies on stack smashing detection to call
`__stack_chk_fail()` (aka what libssp / ssp / stack-protector uses)
Expose it in our debugging header

Rename overflow -> smashing, as these are different things we are trying
to detect (meaning, that we check for things writing there, not some
kind of `alloca` issue or the way `-fstack-check` would have worked)
ref. esp8266#8666

`-fstack-protector` continues to work as it always did
CONT replaces `abort()`, also moves its check to the loop wrapper to
avoid dumping otherwise useless SYS context memory
StackThunk replaces a similar `abort()` call
mcspr added a commit that referenced this issue Oct 31, 2022
Wire everything that relies on stack smashing detection to call
`__stack_chk_fail()` (aka what libssp / ssp / stack-protector uses)
Expose it in our debugging header

Rename overflow -> smashing, as these are different things we are trying
to detect (meaning, that we check for things writing there, not some
kind of `alloca` issue or the way `-fstack-check` would have worked)
ref. #8666

`-fstack-protector` continues to work as it always did
CONT replaces `abort()`, also moves its check to the loop wrapper to
avoid dumping otherwise useless SYS context memory
StackThunk replaces a similar `abort()` call
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants