PoC cache configuration control #7060

Merged on Dec 6, 2020 (89 commits)

@mhightower83 (Contributor) commented Feb 4, 2020

MMU - Adjust the Ratio of ICACHE to IRAM

The Arduino IDE Tools menu has a new option, MMU.

Possible selections are:

  1. 32KB cache + 32KB IRAM (balanced)
  2. 16KB cache + 48KB IRAM (IRAM)
  3. 16KB cache + 48KB IRAM and 2nd Heap (shared)
    • The 2nd heap size will vary with free IRAM
    • Enables Non-32-Bit Access for IRAM by default. Required for umm_malloc library to work.
    • The 2nd heap is supported by the standard malloc APIs.
    • Heap selection is handled through a HeapSelect class.
  4. 16KB cache + 32KB IRAM + 16KB 2nd Heap (not shared)
    • Not managed by umm_malloc library
    • Non-32-Bit Access for IRAM must be enabled separately

New build defines and possible values. These are the results of the menu options described above:

| #define | balanced | IRAM | IRAM and Heap, shared | IRAM and Heap, not shared |
|---|---|---|---|---|
| MMU_IRAM_SIZE | 0x8000 | 0xC000 | 0xC000 | 0x8000 |
| MMU_ICACHE_SIZE | 0x8000 | 0x4000 | 0x4000 | 0x4000 |
| MMU_IRAM_HEAP | -- | -- | defined, enables umm_malloc | -- |
| MMU_SEC_HEAP | -- | > _text_end ** | > _text_end ** | 0x40108000 |
| MMU_SEC_HEAP_SIZE | -- | variable ** | variable ** | 0x4000 |

** These defines resolve to inline functions that calculate the values based on unused code space.
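
As a quick sanity check on the values listed above, the following sketch verifies that each menu option accounts for the same 64KB block and that the fixed second heap begins right after the first 32KB of IRAM. The struct and constant names are hypothetical, chosen only for illustration; the IRAM base address 0x40100000 is the ESP8266's documented instruction RAM base, and the variable-size heaps are modeled as 0 because their size depends on unused code space.

```cpp
#include <assert.h>
#include <stdint.h>

// Hypothetical container for one column of the table of MMU defines.
struct MmuLayout {
    uint32_t iram_size;        // MMU_IRAM_SIZE
    uint32_t icache_size;      // MMU_ICACHE_SIZE
    uint32_t fixed_heap_size;  // MMU_SEC_HEAP_SIZE when it is a fixed block, else 0
};

constexpr uint32_t kIramBase = 0x40100000u;  // ESP8266 instruction RAM base
constexpr uint32_t kTotal = 0x10000u;        // 64KB shared by IRAM, ICACHE, and fixed heap

constexpr MmuLayout kBalanced{0x8000u, 0x8000u, 0u};
constexpr MmuLayout kIram{0xC000u, 0x4000u, 0u};
constexpr MmuLayout kShared{0xC000u, 0x4000u, 0u};  // 2nd heap lives inside the 48KB of IRAM
constexpr MmuLayout kNotShared{0x8000u, 0x4000u, 0x4000u};

constexpr uint32_t total_bytes(const MmuLayout& m) {
    return m.iram_size + m.icache_size + m.fixed_heap_size;
}

// Every option partitions the same 64KB.
static_assert(total_bytes(kBalanced) == kTotal, "balanced");
static_assert(total_bytes(kIram) == kTotal, "IRAM");
static_assert(total_bytes(kShared) == kTotal, "shared");
static_assert(total_bytes(kNotShared) == kTotal, "not shared");

// The fixed 16KB heap starts immediately after the first 32KB of IRAM.
static_assert(kIramBase + kNotShared.iram_size == 0x40108000u, "MMU_SEC_HEAP");
```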

IRAM, unlike DRAM, must be accessed as aligned, full 32-bit words; byte and short accesses are not supported.
I assume pgm_read macros would work for reads; however, store operations would remain an issue. ets_memcpy appears to work well as long as the byte count is rounded up to a multiple of 4.
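
To illustrate the restriction, here is a minimal sketch of how a byte can be read or written using only aligned 32-bit accesses, plus the round-up-to-4 calculation for copy lengths. The function names are hypothetical and are not the helpers added by this PR. Both accessors use the same byte-lane convention (lowest address in the least significant byte, which matches the little-endian ESP8266), so they stay consistent with each other on any host.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Read one byte using only an aligned 32-bit load of the containing word.
inline uint8_t read_byte_aligned(const void* p) {
    const uintptr_t addr = reinterpret_cast<uintptr_t>(p);
    const volatile uint32_t* word =
        reinterpret_cast<const volatile uint32_t*>(addr & ~uintptr_t{3});
    const unsigned shift = static_cast<unsigned>(addr & 3u) * 8u;
    return static_cast<uint8_t>(*word >> shift);
}

// Write one byte as a read-modify-write of the containing aligned word.
// Not atomic: an interrupt between the load and the store could lose an update.
inline void write_byte_aligned(void* p, uint8_t value) {
    const uintptr_t addr = reinterpret_cast<uintptr_t>(p);
    volatile uint32_t* word =
        reinterpret_cast<volatile uint32_t*>(addr & ~uintptr_t{3});
    const unsigned shift = static_cast<unsigned>(addr & 3u) * 8u;
    const uint32_t mask = uint32_t{0xFF} << shift;
    *word = (*word & ~mask) | (uint32_t{value} << shift);
}

// Round a byte count up to the next multiple of 4, e.g. before a word-wise copy.
constexpr size_t round_up_4(size_t n) {
    return (n + 3u) & ~size_t{3};
}
```

For a 33-byte buffer, `round_up_4(33)` gives 36, so the destination must have room for the three extra bytes.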

Non-32-Bit Access

Pulled in work from earlephilhower's PR #6978, updated/refactored to handle writes to iRAM and more. This allows word and byte access to iRAM through a load/store exception handler. It is best used for infrequently accessed data. Expect it to be very slow: each character access requires a complete save and restore of all 16+ registers.

The Arduino IDE Tools menu has a new option, Non-32-Bit Access.

Selections are:

  • Use pgm_read macros for IRAM/PROGMEM
  • Byte/Word access to IRAM/PROGMEM (very slow)

To get a sense of how memory access time is affected, see examples MMU48K and irammem in ESP8266.

Miscellaneous

For calls to umm_malloc with interrupts disabled:

  • malloc will always allocate from DRAM when called with interrupts disabled.
  • realloc will fail if not built with USE_ISR_SAFE_EXC_WRAPPER defined.
    • The current build has USE_ISR_SAFE_EXC_WRAPPER defined in mmu_iram.h.
  • realloc requests that require malloc to complete will allocate from DRAM.

ISR/Exception Handler Issue

The non-32-bit exception handler is called by a "C" wrapper function in ROM. This ROM function enables interrupts before calling our registered handler. Defining USE_ISR_SAFE_EXC_WRAPPER in mmu_iram.h will install a replacement that does not enable interrupts (now default). The effects on Network performance are unknown.

To keep ISR execution time with interrupts disabled to a minimum, avoid using IRAM from ISRs, especially non-32-bit reads and writes to IRAM.

How to Select Heap

The MMU selection 16KB cache + 48KB IRAM and 2nd Heap (shared) allows you to use the standard heap API function calls (malloc, calloc, free, ...) to allocate memory from DRAM or IRAM. The selection is made by instantiating the class HeapSelectIram or HeapSelectDram. The usage is similar to that of the InterruptLock class. The default/initial heap source is DRAM. The class is in umm_malloc/umm_malloc.h

...
    char *bufferDram;
    bufferDram = (char *)malloc(33);
    char *bufferIram;
    {
        HeapSelectIram ephemeral;
        bufferIram = (char *)malloc(33);
    }
...
    free(bufferIram);
    free(bufferDram);
...
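
The scoped selection shown above can be modeled on a host as a small RAII guard: the constructor records the current heap and switches to the requested one, and the destructor restores the previous choice when the scope ends. This is only a sketch of the pattern under that assumption; `HeapId`, `current_heap()`, and `ScopedHeap` are hypothetical stand-ins, not the core's actual HeapSelectIram implementation.

```cpp
#include <assert.h>

// Hypothetical heap identifiers for the model.
enum class HeapId { Dram, Iram };

// Stand-in for the allocator's notion of the "current heap"; DRAM is the default.
inline HeapId& current_heap() {
    static HeapId id = HeapId::Dram;
    return id;
}

// RAII guard: select a heap for the lifetime of the object, then restore.
class ScopedHeap {
public:
    explicit ScopedHeap(HeapId id) : saved_(current_heap()) { current_heap() = id; }
    ~ScopedHeap() { current_heap() = saved_; }

    // Copying would restore the saved heap twice, so forbid it.
    ScopedHeap(const ScopedHeap&) = delete;
    ScopedHeap& operator=(const ScopedHeap&) = delete;

private:
    HeapId saved_;
};
```

Because restoration happens in the destructor, guards nest naturally and the previous selection returns even on an early exit from the scope. The selection only affects which heap new allocations come from; as in the example above, buffers can be freed after the guard has gone out of scope.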

Low level functions for selecting a heap. These are used by the above Classes:

  • umm_get_current_heap_id()
  • umm_set_heap_by_id( ID value )
  • Possible ID values
    • UMM_HEAP_DRAM
    • UMM_HEAP_IRAM
    • UMM_HEAP_EXTERNAL (code present in umm_malloc only, not enabled)

Also, APIs added from earlephilhower's PR #6978 are:

  • ESP.setIramHeap() Pushes current heap onto a stack and sets IRAM heap.
  • ESP.setDramHeap() Pushes current heap onto a stack and sets DRAM heap.
  • ESP.resetHeap() Restores previously pushed heap.
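
The push/restore behavior described for these three calls can be sketched as a small fixed-depth stack. This is a host-side model only: the class name, the depth limit of 4, and the boolean return values are assumptions made for illustration, not the signatures of the ESP methods.

```cpp
#include <assert.h>
#include <stddef.h>

// Hypothetical heap identifiers for the model.
enum class Heap { Dram, Iram };

class HeapStack {
public:
    // Save the current heap and switch to `next`; fails when the stack is full.
    bool push(Heap next) {
        if (depth_ == kMaxDepth) return false;
        saved_[depth_++] = current_;
        current_ = next;
        return true;
    }

    // Restore the most recently saved heap; fails when nothing was pushed.
    bool pop() {
        if (depth_ == 0) return false;
        current_ = saved_[--depth_];
        return true;
    }

    Heap current() const { return current_; }

private:
    static constexpr size_t kMaxDepth = 4;  // assumed limit for this model
    Heap saved_[kMaxDepth] = {};
    Heap current_ = Heap::Dram;  // DRAM is the default heap
    size_t depth_ = 0;
};
```

In this model, a set-IRAM call corresponds to `push(Heap::Iram)`, a set-DRAM call to `push(Heap::Dram)`, and a reset to `pop()`. Every push should be paired with a pop, and a failed push should be checked rather than ignored.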

Updated to reflect current features in the PR.

Expanded boards.txt.py to allow new MMU options and create revised .ld's
Updated eboot to pass 48K IRAM segments.
Added Cache_Read_Enable intercept to modify call for 16K ICACHE
Update platform.txt to pass new mmu options through to compiler and linker preprocessor.
Added quick example: esp8266/MMU48K
Added MMU_ qualifier to new defines.
Moved changes into their own file.
Don't know how to fix platformio issue.
Updated tools/sizes.py to report correct IRAM size and indicate ICACHE size.
Merged in earlephilhower's work on unaligned exception. Refactored and added
support for store operations and changed the name to be more closely aligned
with its function. Improved crash reporting path.
@earlephilhower
Collaborator

Very cool, and probably of much more general use than my virtual memory setup (although being able to malloc an 8MB block was kind of neat...). I heard the impact to performance was minimal in your tests which is even more amazing.

One thing I see you didn't pull in is the changes I did to make the memory accessible. When using the non32b exception handler, UMM can manage the memory as just another heap, allowing users to malloc() and new() with impunity. I think that would make it much more usable for general folks.

The interface could be the stack one I tried (worked well for me and let me hack libs to silently use internal/fast or external/slow), or @devyte was thinking more of a single flag (normal/iram/external) for simplicity.

Any thought from you on malloc interfaces?

Also, PlatformIO is not liking the .ld->.h conversions you've done. I can't quite grok what's wrong and have never been able to get a working general PIO install running on my system, but you might want to take a look.

@mhightower83
Contributor Author

@earlephilhower Thanks for the kind words.

> One thing I see you didn't pull in is the changes I did to make the memory accessible. When using the non32b exception handler, UMM can manage the memory as just another heap, allowing users to malloc() and new() with impunity. I think that would make it much more usable for general folks.

Yes, that was in a different PR that I meant to go back and look at. I was hoping that I could adapt UMM to do that when I added the non32b; however, I think I see an issue in the ROM's _xtos_c_wrapper_handler for processing exceptions. It does a rsil 0 a little bit before calling the registered exception handler.

I also had trouble with the Hello Bear example when I fixed it up with an iRAM stack directly. My memory has faded on the specifics, I don't remember the kind of crash. I suppose it could be interrupt related. --- Just looked through your PR --- Not sure why I had trouble and you didn't with the Bear stuff. I'll have to try that again.

I am also thinking that UMM should revert back to internal DRAM, anytime it is called with interrupts disabled. That would result in all the iRAM allocations occurring in the foreground and maybe for exception processing disable IRQs at the start and leave it to the exit logic to restore PS.

> Any thought from you on malloc interfaces?

Not sure. While simplicity is usually best, this is not looking very simple. DRAM, IRAM, FLASH, and external SRAM each have different performance properties and concerns. For now, I'll look at merging in your umm_malloc changes.

PlatformIO

Yea I didn't know what to do about that. I have an idea now that might work better w/o all the renaming to .h, I'll push it in a couple of days.

Added some inline functions to aid in byte and short access to iRAM.
 * only byte read has been tested
Updated .ld file to work better with platform.io; however, I am still
missing some steps, so platformio will still fail.
@earlephilhower
Collaborator

> I also had trouble with the Hello Bear example when I fixed it up with an iRAM stack directly. My memory has faded on the specifics, I don't remember the kind of crash. I suppose it could be interrupt related. --- Just looked through your PR --- Not sure why I had trouble and you didn't with the Bear stuff. I'll have to try that again.

I was unable to get the SSL stack in external SRAM working, and I doubt that you'll be able to make it work in IRAM, either. There is no problem other than it's too slow and accessed on non-word boundaries, and accessed too intensely. I get a WDT w/external SRAM for the BSSL stack, and I would bet that's what you're seeing too.

I actually moved the 17KB SSL buffer (i.e. the per-connection info) to SRAM and that ran fine and any perf. difference was undetectable. I also moved the String() allocator to it, too, and also had no issue in anything I tested.

It's probably a matter of adding more optimistic_yield()s in the library, but I didn't look into it.

master was missing new additions added by boards.txt.py in the PR,
which the CI flags when it rebuilds boards.txt.
Adapted changes to umm_malloc, Esp.cpp, StackThunk.cpp,
WiFiClientSecureBearSSL.cpp, and virtualmem.ino to irammem.ino from
@earlephilhower PR esp8266#6994.

Reworked umm_malloc to use context pointers instead of copy context.
umm_malloc now supports allocations from IRAM. Added class
HeapSelectIram, ... to aid in selecting alternate heaps,
modeled after class InterruptLock.
Restrict alloc request from ISRs to DRAM.

Never ending improvements to debug printing.

Sec Heap option now pulls in free IRAM left over in the 1st 32K block.
Managed through umm_malloc with HeapSelectIram.

Updated examples.
Don't know what to do with platformio; it doesn't like my .S file.
ifdef out USE_ISR_SAFE_EXC_WRAPPER to block the new assembly module
from building on platformio only.
@mcspr
Collaborator

mcspr commented Mar 6, 2020

Re eb9882e. I don't see any errors showing up in the CI... If that's the reason for confusion about PIO issues, here are some from a local rollback to 91fc391. Paths are cleaned up for readability:

cores/esp8266/exc-c-wrapper-handler.S: Assembler messages:
cores/esp8266/exc-c-wrapper-handler.S:99: Error: literal pool location required for text-section-literals; specify with .literal_position
cores/esp8266/exc-c-wrapper-handler.S:123: Error: literal pool location required for text-section-literals; specify with .literal_position
cores/esp8266/exc-c-wrapper-handler.S:99: Error: literal pool location required for text-section-literals; specify with .literal_position

@mhightower83
Contributor Author

@mcspr Thank you! That was my problem.

@d-a-v removed the merge-conflict label Nov 23, 2020
Limited access to some detailed typedefs/prototypes to .cpp
modules, to avoid future build conflicts.

Completed TODO for verifying that the "C" structure struct __exception_frame
matches the ASM version.

Fixed some typos, code rot, and added some more cases in example irammem.ino.
Refactored a little and reordered printing to ease comparison between methods.

Corrected `#ifdef __cplusplus` coverage area. Cleaned up `extern "C" ...` usage.
Fixes issues with including mmu_iram.h or esp8266_undocumented.h in .c files.
Collaborator

@d-a-v d-a-v left a comment

Default menu option changes nothing so it should be safe to merge.
Changes are numerous but quite clear if time is spent to read them.
In my humble opinion, this feature will give a second life to esp8266/arduino where memory is sometimes tight.
I tried it with esp8266Audio which requires a fair amount of cpu power for decoding, and all went well with the shared second heap allowing a bigger audio buffer and much more free DRAM.
Thank you for this huge work @mhightower83 !

Some comment tuning.

In the context of _xtos_set_exception_handler and the functions it registers,
changed to type int for exception cause type. This is also the type used by gdbstub
and some other Xtensa files I found.
@mhightower83
Contributor Author

@devyte

#7060 (comment)

  uint32_t v;
}  mmu_cre_status_t;

extern mmu_cre_status_t mmu_status;

Does this still need to be cleaned up?

That has been removed.

@d-a-v d-a-v merged commit 8b662ed into esp8266:master Dec 6, 2020
@devyte
Collaborator

devyte commented Dec 6, 2020

Congratulations @mhightower83, this is an awesome feature!

@mhightower83
Contributor Author

@devyte Thanks! It looks like that little POC for 16K/32K cache selection grew a little bit.

I want to thank @earlephilhower for his work on a virtual memory PR, which I leveraged to get started with the exception handler and 1st Heap selection API.

@mhightower83 mhightower83 deleted the poc-cache-config branch December 7, 2020 02:07
jjsuwa-sys3175 added a commit to jjsuwa-sys3175/Arduino that referenced this pull request Dec 11, 2020
* add double-quotes to `compiler.S.flags`

* fix windows-specific processes (`recipe.hooks.linking.prelink.[12].pattern.windows`)

* rewrite processing of "mkdir" and "cp" in python because of platform-independence
earlephilhower pushed a commit that referenced this pull request Dec 14, 2020
* Fix: cannot build after #7060 on Win64

* add double-quotes to `compiler.S.flags`

* fix windows-specific processes (`recipe.hooks.linking.prelink.[12].pattern.windows`)

* rewrite processing of "mkdir" and "cp" in python because of platform-independence

* make consistent with the use of quotation marks in other *.py files
davisonja added a commit to davisonja/Arduino that referenced this pull request Dec 28, 2020
…lash

* upstream/master: (72 commits)
  Typo error in ESP8266WiFiGeneric.h (esp8266#7797)
  lwip2: use pvPortXalloc/vPortFree and "-free -fipa-pta" (esp8266#7793)
  Use smarter cache key, cache Arduino IDE (esp8266#7791)
  Update to SdFat 2.0.2, speed SD access (esp8266#7779)
  BREAKING - Upgrade to upstream newlib 4.0.0 release (esp8266#7708)
  mock: +hexdump() from debug.cpp (esp8266#7789)
  more lwIP physical interfaces (esp8266#6680)
  Rationalize File timestamp callback (esp8266#7785)
  Update to LittleFS v2.3 (esp8266#7787)
  WiFiServerSecure: Cache SSL sessions (esp8266#7774)
  platform.txt: instruct GCC to perform more aggressive optimization (esp8266#7770)
  LEAmDNS fixes (esp8266#7786)
  Move uzlib to master branch (esp8266#7782)
  Update to latest uzlib upstream (esp8266#7776)
  EspSoftwareSerial bug fix release 6.10.1: preciseDelay() could delay() for extremely long time, if period duration was exceeded on entry. (esp8266#7771)
  Fixed OOM double count in umm_realloc. (esp8266#7768)
  Added missing check for failure on umm_push_heap calls in Esp.cpp (esp8266#7767)
  Fix: cannot build after esp8266#7060 on Win64 (esp8266#7754)
  Add the missing 'rename' method wrapper in SD library. (esp8266#7766)
  i2s: adds i2s_rxtxdrive_begin(enableRx, enableTx, driveRxClocks, driveTxClocks) (esp8266#7748)
  ...
@Frtrillo

> @devyte Thanks! It looks like that little POC for 16K/32K cache selection grew a little bit.
>
> I want to thank @earlephilhower for his work on a virtual memory PR, which I leveraged to get started with the exception handler and 1st Heap selection API.

Thank you for this amazing option. It really comes in handy, especially when using SSL, and lets me keep using the ESP8266 instead of replacing it with an ESP32.

As a side note for newcomers: when you use this option, the extra heap won't show up in the usual freeHeap function, but it's there and working.

@mhightower83
Contributor Author

@efitrillo That is good to hear.

The well-established Heap APIs, like freeHeap, work with the current Heap selected.
Do something like this to see the free IRAM Heap:

#include <umm_malloc/umm_heap_select.h>

#ifdef UMM_HEAP_IRAM
  {
    // Note, the current heap does not change if the IRAM Heap was not in the 
    // build option. In that case, ESP.getFreeHeap() will report free DRAM space.
    HeapSelectIram ephemeral;  
    Serial.printf("IRAM free: %6d\r\n", ESP.getFreeHeap());
  }
#else 
  Serial.printf("IRAM free: 0\r\n");
#endif 
