Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ota commands in flash #6538

Open
wants to merge 28 commits into
base: master
Choose a base branch
from

Conversation

davisonja
Copy link

@davisonja davisonja commented Sep 20, 2019

Currently OTA updates involve the use of (volatile) RTC memory to keep track of update commands across a chip reset. If there is a powerloss part-way through the flashing process the result is likely to be corrupt flash. Ideally these commands would be persisted in something less volatile - like flash.

This PR adds that functionality, and resolves #905.

Update commands are stored in a reserved area of flash that has been inserted just before the start of the FS in flash. This ensures that other things don't end up moving (like sdkwifi pages), that we don't lose FS space, but does cost 2 pages of potential sketch space. This may be an issue on smaller sized devices and it is possible to make the reservation optional - though the last consensus was it was a small enough loss so as not to be an issue.

A pointer to the location of the flash store has been added to the start of the flash, along with a magic number that the bootloader can use to recognise whether or not flash-storage should be checked. In this way it's possible to use the updated bootloader with older sketches without any issues (and means that control over using flash storage should be kept within the sketch).

As it stands the location of the update data hasn't been changed, and so is still in the first 1MB of flash, meaning you have a maximum sketch size half of that (less reserved areas). Once this has received some meaningful testing beyond my boards I anticipate updating the system to allow update data to be stored further on in flash, thus allowing larger OTA updates - assuming it's not already been done; I've not been through the current state beyond what was required to merge in the flash update solution.

boards.txt.py now generates .ld files that should include all the necessary information, and in the right places.

There is now a README.md file in the eboot folder as the start of some descriptions of how it works. It still needs some details added, but realistically all that info is already in the source itself.

I've now tested it on real hardware, which works, after a few fixes. 😄

@davisonja
Copy link
Author

@d-a-v @Androbin a draft PR if you're interested.

@OpenUAS
Copy link

OpenUAS commented Sep 20, 2019

Good useful PR, so a CI that does not fail and the PR goes through would be great...

@devyte devyte changed the title Ota commands in flash WIP - Ota commands in flash Sep 20, 2019
@davisonja
Copy link
Author

davisonja commented Sep 20, 2019 via email

@Androbin
Copy link
Contributor

What would happen if I deployed the current bootloader and had it update itself to the new version later? Would it brick itself because the running code would be overwritten?

@davisonja
Copy link
Author

What would happen if I deployed the current bootloader and had it update itself to the new version later? Would it brick itself because the running code would be overwritten?

While I don't have my notes handy at the moment I'm fairly sure that self-updating was supported, if not already (I forget whether the writing in eboot is already totally run-from-RAM) then certainly as an achievable improvement - switching to the flash-using bootloader entirely OTA is an intended use-case.

…ault off.

Readded the start of the eboot docs in README.md (still WIP, but it's a start)
Adjusted the default number of commands that are cycled through for wear levelling to 32
Fixed a 🤦 bug in the loop to find the first available flash-based command block
@davisonja davisonja changed the title WIP - Ota commands in flash Ota commands in flash Oct 3, 2019
@lrodorigo
Copy link
Contributor

lrodorigo commented Oct 16, 2020 via email

earlephilhower added a commit that referenced this pull request Oct 16, 2020
Instead of using either a series of etc_putc or setting a series of bytes
one by one, use a simple macro to define 32b constants to build up strings.

Saves ~30 bytes of program code in eboot for #6538 to work with.
@devyte
Copy link
Collaborator

devyte commented Oct 17, 2020

@lrodorigo it is always a good thing to have other testers confirming that something works!

@devyte
Copy link
Collaborator

devyte commented Oct 17, 2020

@davisonja that's only 152 bytes over, not a lot. I don't think it makes sense to keep both rtc and flash commands, so I'd say you're on the right path.

@drzony
Copy link
Contributor

drzony commented Oct 17, 2020

@devyte @davisonja This code exploits the same write without erase "feature" that SPI does, it may stop working at some point with new flashes. from what I can see it "should" work on PUYA because of read-then-write, but we need to be careful here.

…lash

* upstream/master: (72 commits)
  Typo error in ESP8266WiFiGeneric.h (esp8266#7797)
  lwip2: use pvPortXalloc/vPortFree and "-free -fipa-pta" (esp8266#7793)
  Use smarter cache key, cache Arduino IDE (esp8266#7791)
  Update to SdFat 2.0.2, speed SD access (esp8266#7779)
  BREAKING - Upgrade to upstream newlib 4.0.0 release (esp8266#7708)
  mock: +hexdump() from debug.cpp (esp8266#7789)
  more lwIP physical interfaces (esp8266#6680)
  Rationalize File timestamp callback (esp8266#7785)
  Update to LittleFS v2.3 (esp8266#7787)
  WiFiServerSecure: Cache SSL sessions (esp8266#7774)
  platform.txt: instruct GCC to perform more aggressive optimization (esp8266#7770)
  LEAmDNS fixes (esp8266#7786)
  Move uzlib to master branch (esp8266#7782)
  Update to latest uzlib upstream (esp8266#7776)
  EspSoftwareSerial bug fix release 6.10.1: preciseDelay() could delay() for extremely long time, if period duration was exceeded on entry. (esp8266#7771)
  Fixed OOM double count in umm_realloc. (esp8266#7768)
  Added missing check for failure on umm_push_heap calls in Esp.cpp (esp8266#7767)
  Fix: cannot build after esp8266#7060 on Win64 (esp8266#7754)
  Add the missing 'rename' method wrapper in SD library. (esp8266#7766)
  i2s: adds i2s_rxtxdrive_begin(enableRx, enableTx, driveRxClocks, driveTxClocks) (esp8266#7748)
  ...
@davisonja
Copy link
Author

@devyte @davisonja This code exploits the same write without erase "feature" that SPI does, it may stop working at some point with new flashes. from what I can see it "should" work on PUYA because of read-then-write, but we need to be careful here.

Are there flash systems where this doesn't apply, @drzony ? If I follow your comment it's the behaviour where we're modifying already written blocks to mark commands as needing action?

@davisonja
Copy link
Author

@davisonja that's only 152 bytes over, not a lot. I don't think it makes sense to keep both rtc and flash commands, so I'd say you're on the right path.

Finally actually had a first run at removing the RTC stuff, and it looks like that does the trick, so compressed images and flash storage can both run, without the RTC support (and also without serial debugging on, but with #7545 applied more liberally through the other debug code it might yet be possible to get that in too.

I've yet to verify the code actually works after all these random code removals, but the fact that it fits is a good start :)

@drzony
Copy link
Contributor

drzony commented Dec 28, 2020

@davisonja Currently I don't think there are any flashes that we don't handle. But the workaround is based on flash ID. If another flash comes out that works in the same manner as PUYA ones, then it will break silently (without read-then-write PUYA flashes corrupt the bytes written).
Yes, I mean writing to same bytes of flash twice without erase for command storage.

@earlephilhower
Copy link
Collaborator

Your CI failures are related to you having a ESP8266SDFat submodule in your commit. The easiest thing at this point would be to just update to the proper ESP8266SDFat commit ESP8266SdFat @ 0a46e4e and git add that submodule again.

@davisonja
Copy link
Author

@drzony I might need to find the PUYA info on a computer - I had a quick look last night from my phone.

My impression of flash systems were they (all) require an erase cycle (usually a sequence of consecutive addresses - a page or block) which resets the storage to a known value (typically 1's IIRC); from there you can write actual values which is actually achieved by updating bits from the erase-value. The write process can be repeated as many times as you like, but you can only ever toggle a bit from the erase-value to the other one. So in the case of an erase-value of 1, you can only ever clear bits, but you can do so at anytime - there's no concept of write-count, so 'twice' doesn't really apply.

From a (fairly quick) look around this seems to be fairly standard, though the exact amount you have to 'write' varies (as in, a byte, or several bytes) the ability to update any 1 to a 0 holds. It's definitely a common 'trick'. Worth a note in the info on the eboot docs, tho.

@drzony
Copy link
Contributor

drzony commented Dec 29, 2020

@davisonja
Yes, this is a standard, but spi_flash_write does not erase anything.
So in eboot_command_write_to_flash there is a second write to the same address without erase. This works since most SPI flashes have undocumented feature that you can write to the same address as long as it only has 1->0 transitions. (FYI spi_flash_write writes always in 4-byte increments). There is no write-count, but PUYA flashes corrupt data on second write if the address was not read from before the write. So on "standard" flashes you can write bit-by-bit 1->0 transitions, but on PUYA flashes (and probably other in the future) you can only write each 4 bytes once (and then erase is required) unless you do read-then-write trick.

From a (fairly quick) look around this seems to be fairly standard, though the exact amount you have to 'write' varies (as in, a byte, or several bytes) the ability to update any 1 to a 0 holds. It's definitely a common 'trick'. Worth a note in the info on the eboot docs, tho.

That's the problem I'm highlighting, that this is a 'common trick', but it's not really documented by any manufacturer and may not work on all flashes.

Some of it is already described in #7644
There was a lenghty discussion in #7514

@davisonja
Copy link
Author

@drzony that's an interesting collection of reading, ta.

Do you think the current scheme (as in this incomplete PR) is going to be ok to proceed with given that it looks as though it's compatible with the flash implementations we've dealt with so far? I'm not clear whether 'be careful here' is aimed at that being kept in mind, or in changing the process the code is using to mark commands...

@drzony
Copy link
Contributor

drzony commented Dec 31, 2020

@davisonja I would change the way of writing the commands. The simplest idea would be to have each flag as a separate uint32_t (this will waste some space, but will make sure that the bytes are written only once) and do not write them when saving the command (only when marking). Then 0xFFFFFFFF would mean flag not set and 0x00000000 flag set.

@davisonja
Copy link
Author

@drzony so the considered wisdom is that a full 32-bits is safest - is that for flash in general, or the current scheme of writing used in this project?

@drzony
Copy link
Contributor

drzony commented Jan 1, 2021

@davisonja It's the limitation of spi_flash_write from Espressif SDK:

  1. Memory passed to spi_flash_write must be 4-byte aligned i.e. (uint32_t)buffer % 4 == 0
  2. It always writes 4 bytes even if passed size is less than 4 bytes
  3. If starting flash offset is not 4-byte aligned, you cannot pass page (256 bytes) boundary

I wrote workarounds for this, you can see them here

@zhfei1979
Copy link

Poor bug, still not fixed

@davisonja
Copy link
Author

davisonja commented Apr 9, 2022 via email

@d-a-v
Copy link
Collaborator

d-a-v commented Apr 9, 2022

Unfortunately the code base has moved significantly since it was first
built

We can help with this if needed

@d-a-v d-a-v modified the milestones: 3.1, 4.0.0 Dec 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merge-conflict PR has a merge conflict that needs manual correction
Projects
None yet
Development

Successfully merging this pull request may close these issues.

OTA which survives power failure
9 participants