Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WString: Optimize a bit #7553

Merged
merged 4 commits into from
Nov 16, 2020
Merged

WString: Optimize a bit #7553

merged 4 commits into from
Nov 16, 2020

Conversation

jjsuwa
Copy link
Contributor

@jjsuwa jjsuwa commented Aug 27, 2020

  • move bodies of dtor, init() and charAt() to .h (implicitly inlined)

  • unify the description of initialization into one: init() (literally), and called from each ctors, invalidate() and move().

  • invert the SSO state logic in order to make init state zeroed (as a result, inlined init() insns: 4 to 3)

IROM size @ FSBrowser: 304836 -> 304596 (saved 240 from master)

Copy link
Collaborator

@earlephilhower earlephilhower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, but the PR has some problems. I'm very wary of changing String because it can end up affecting a lot of things down the line if we mess anything up.

Do you have any numbers for the changes here, in terms of code size or String ops/sec? I really doubt these changes have an appreciable size or performance difference.

cores/esp8266/WString.h Outdated Show resolved Hide resolved
cores/esp8266/WString.cpp Outdated Show resolved Hide resolved
cores/esp8266/WString.cpp Show resolved Hide resolved
cores/esp8266/WString.cpp Show resolved Hide resolved
@earlephilhower
Copy link
Collaborator

Can you please give some hard #s on the size difference for a few examples using this vs. master, and the runtime differences, too?

You could do a build and then objdump -T to get the function sizes to do it at a mcro-level, or you could build FSBrowser.ino which makes extensive use of Strings and check total bin size.

@jjsuwa
Copy link
Contributor Author

jjsuwa commented Aug 28, 2020

FSBrowser.ino build results:

  • commit e53aa87 (master in Aug 26, 2020):
    FSBrowser.ino.bin : 338304 bytes
    • IROM : 305020
    • IRAM : 26340
    • DATA : 1272
    • RODATA : 1512
    • BSS : 25576
      objdump -t (excerpt) :
00000008 g     F .text._ZN6StringC2Ec	00000031 _ZN6StringC2Ec
00000008 g     F .text._ZN6StringC2Ec	00000031 _ZN6StringC1Ec
0000000c g     F .text._ZN6StringC2Ehh	0000003d _ZN6StringC2Ehh
0000000c g     F .text._ZN6StringC2Ehh	0000003d _ZN6StringC1Ehh
00000014 g     F .text._ZN6StringC2Eih	0000004f _ZN6StringC2Eih
00000014 g     F .text._ZN6StringC2Eih	0000004f _ZN6StringC1Eih
0000000c g     F .text._ZN6StringC2Ejh	0000003b _ZN6StringC2Ejh
0000000c g     F .text._ZN6StringC2Ejh	0000003b _ZN6StringC1Ejh
00000014 g     F .text._ZN6StringC2Elh	0000004f _ZN6StringC2Elh
00000014 g     F .text._ZN6StringC2Elh	0000004f _ZN6StringC1Elh
0000000c g     F .text._ZN6StringC2Emh	0000003b _ZN6StringC2Emh
0000000c g     F .text._ZN6StringC2Emh	0000003b _ZN6StringC1Emh
00000010 g     F .text._ZN6StringC2Efh	0000005b _ZN6StringC2Efh
00000010 g     F .text._ZN6StringC2Efh	0000005b _ZN6StringC1Efh
0000000c g     F .text._ZN6StringC2Edh	0000004e _ZN6StringC2Edh
0000000c g     F .text._ZN6StringC2Edh	0000004e _ZN6StringC1Edh
00000004 g     F .text._ZN6StringD2Ev	00000013 _ZN6StringD1Ev
00000004 g     F .text._ZN6StringD2Ev	00000013 _ZN6StringD2Ev
00000010 g     F .text._ZN6String4copyEPKcj	0000005b _ZN6String4copyEPKcj
00000010 g     F .text._ZN6String4copyEPK19__FlashStringHelperj	0000005b _ZN6String4copyEPK19__FlashStringHelperj
  • commit f4352b1d72180401f0669054f860d150cd025712 (this PR, latest):
    FSBrowser.ino.bin : 338320 bytes
    • IROM : 305036
    • IRAM : 26340
    • DATA : 1272
    • RODATA : 1512
    • BSS : 25576
      objdump -t (excerpt) :
00000008 g     F .text._ZN6StringC2Ec	0000002a _ZN6StringC2Ec
00000008 g     F .text._ZN6StringC2Ec	0000002a _ZN6StringC1Ec
00000010 g     F .text._ZN6StringC2Ehh	0000003d _ZN6StringC2Ehh
00000010 g     F .text._ZN6StringC2Ehh	0000003d _ZN6StringC1Ehh
00000018 g     F .text._ZN6StringC2Eih	00000051 _ZN6StringC2Eih
00000018 g     F .text._ZN6StringC2Eih	00000051 _ZN6StringC1Eih
00000010 g     F .text._ZN6StringC2Ejh	0000003b _ZN6StringC2Ejh
00000010 g     F .text._ZN6StringC2Ejh	0000003b _ZN6StringC1Ejh
00000018 g     F .text._ZN6StringC2Elh	00000051 _ZN6StringC2Elh
00000018 g     F .text._ZN6StringC2Elh	00000051 _ZN6StringC1Elh
00000010 g     F .text._ZN6StringC2Emh	0000003b _ZN6StringC2Emh
00000010 g     F .text._ZN6StringC2Emh	0000003b _ZN6StringC1Emh
00000014 g     F .text._ZN6StringC2Efh	00000055 _ZN6StringC2Efh
00000014 g     F .text._ZN6StringC2Efh	00000055 _ZN6StringC1Efh
00000010 g     F .text._ZN6StringC2Edh	00000045 _ZN6StringC2Edh
00000010 g     F .text._ZN6StringC2Edh	00000045 _ZN6StringC1Edh
00000004 g     F .text._ZN6StringD2Ev	0000001c _ZN6StringD1Ev
00000004 g     F .text._ZN6StringD2Ev	0000001c _ZN6StringD2Ev
00000010 g     F .text._ZN6String4copyEPKcj	00000059 _ZN6String4copyEPKcj
00000010 g     F .text._ZN6String4copyEPK19__FlashStringHelperj	00000059 _ZN6String4copyEPK19__FlashStringHelperj

@devyte
Copy link
Collaborator

devyte commented Aug 28, 2020

That shows that your optimization attempts are going the wrong way: 16 bytes increase in IROM.

@jjsuwa
Copy link
Contributor Author

jjsuwa commented Aug 28, 2020

  • commit cb6dca9df0e677f336f842660de1a61eed705d75 (this PR: latest):
    FSBrowser.ino.bin : 338288 bytes
    • IROM : 305004
    • IRAM : 26340
    • DATA : 1272
    • RODATA : 1512
    • BSS : 25576
      objdump -t (excerpt) :
00000008 g     F .text._ZN6StringC2Ec	00000029 _ZN6StringC2Ec
00000008 g     F .text._ZN6StringC2Ec	00000029 _ZN6StringC1Ec
0000000c g     F .text._ZN6StringC2Ehh	00000033 _ZN6StringC2Ehh
0000000c g     F .text._ZN6StringC2Ehh	00000033 _ZN6StringC1Ehh
00000014 g     F .text._ZN6StringC2Eih	00000047 _ZN6StringC2Eih
00000014 g     F .text._ZN6StringC2Eih	00000047 _ZN6StringC1Eih
0000000c g     F .text._ZN6StringC2Ejh	00000031 _ZN6StringC2Ejh
0000000c g     F .text._ZN6StringC2Ejh	00000031 _ZN6StringC1Ejh
00000014 g     F .text._ZN6StringC2Elh	00000047 _ZN6StringC2Elh
00000014 g     F .text._ZN6StringC2Elh	00000047 _ZN6StringC1Elh
0000000c g     F .text._ZN6StringC2Emh	00000031 _ZN6StringC2Emh
0000000c g     F .text._ZN6StringC2Emh	00000031 _ZN6StringC1Emh
00000010 g     F .text._ZN6StringC2Efh	0000004b _ZN6StringC2Efh
00000010 g     F .text._ZN6StringC2Efh	0000004b _ZN6StringC1Efh
0000000c g     F .text._ZN6StringC2Edh	0000003b _ZN6StringC2Edh
0000000c g     F .text._ZN6StringC2Edh	0000003b _ZN6StringC1Edh
00000004 g     F .text._ZN6StringD2Ev	0000001c _ZN6StringD2Ev
00000004 g     F .text._ZN6StringD2Ev	0000001c _ZN6StringD1Ev
00000010 g     F .text._ZN6String4copyEPKcj	00000059 _ZN6String4copyEPKcj
00000010 g     F .text._ZN6String4copyEPK19__FlashStringHelperj	00000059 _ZN6String4copyEPK19__FlashStringHelperj

Copy link
Collaborator

@devyte devyte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing jumps out at me.

HOWEVER: we've been bitten before by innocent-seeming minor tweaks here. Please go over the current unit tests covering the String class and see if you can find any uncovered code that is affected by your changes, and report back here.
If you find uncovered code, or uncovered corner cases, please add tests to ensure correct behavior.

union {
struct _ptr ptr;
struct _sso sso;
};
static constexpr auto CAPACITY_MAX = (1U << sizeof(ptr.cap) * 8) - 1;
Copy link
Contributor

@TD-er TD-er Aug 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not used in the .h file, so why not move it to the .cpp?
Or could it be used from other classes? (it is protected, so only those inheriting from this class)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(it is protected, so only those inheriting from this class)

just only, but still possible.
if so, we should warn: [API change] (of course .isNotSSO too)

i'd like to know why these internal implementation members are protected but not private.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well the init() function was really needed for proper functioning of the class, so if someone wants to make a derived class, this function must be protected (or public).
I can imagine it is not needed for using the String class, so then protected makes perfect sense.
Now init() is no longer needed in the current implementation, but it may indeed break API design if the function is removed. The same for the CAPACITY_MAX value.

@TD-er
Copy link
Contributor

TD-er commented Aug 31, 2020

Is the init() function still needed?
Its declaration is already protected and it is only used now in the invalidate function.

@@ -159,7 +159,8 @@ unsigned char String::changeBuffer(unsigned int maxStrLen) {
// Fallthrough to normal allocator
size_t newSize = (maxStrLen + 16) & (~0xf);
// Make sure we can fit newsize in the buffer
if (newSize > CAPACITY_MAX) {
size_t maxCapacity = (1U << sizeof(ptr.cap) * 8) - 1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please be careful when refactoring, the original definition was static constexpr auto. At least make this constexpr.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the advice, @devyte

i have periodically checked the disassembled binaries using objdump -d.
in this case, the same as before.

cores/esp8266/WString.cpp Outdated Show resolved Hide resolved
unsigned char isSSO : 1;
unsigned char len : 7; // Ensure only one byte is allocated by GCC for the bitfields
unsigned char isNotSSO : 1;
_sso() : len(0), isNotSSO(0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has been proposed to move in the opposite direction: direct member init instead of constructor member init sequence.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

direct member init for bitfield requires c++20, will bring some of CI tests failure (possibly host emulation)

Copy link
Collaborator

@devyte devyte Aug 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not what I meant. You should be able to do direct member init below in the union in some form or other, i. e. {0}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For bitfield members that's not possible, as that's one of the new features of C++20.
So for this special case of the bitfields you can only initialize them in the constructor of the _sso struct.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a quick test:

  union
  {
    _sso sso = {{0},{0}};
    _ptr ptr;
  };

works fine with C++11.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thx @devyte:

that's surely fine, in terms of the logic: wholly zeroed, due to funccall of memset(this, 0, sizeof(_sso)).
zeroing _sso.buff[0] and _sso.bitfields would be sufficient (and of course most efficient).

both class member initializers (char member = 0;) and braces initialization (struct _foo foo = { 0, ... };) cause whole initialization
(assumed that each of unspecified members has default value).

member initializer (: bar(0), ...) seems to be only way to realize sparse(holed) initialization.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sso struct is 12 bytes in size, yet you seem keen on doing microoptimizations here. I don't consider this worthwhile, the only case where this would be significant is in a tight loop where >90% of the time is spent in initing a String object, and in such a case I would likely yell at the author. Personally, I think there are other places where spending time and effort would bring far better returns.
If you decide to pursue this, please:

  • find bin size for initialization with {{0},{0}}
  • compare against bin size with your member initializer
  • compare runtime performance of both cases
  • report back here

BTW, we're already on gcc 10.x. Although we have C++17 enabled, in theory we have access to C++20 experimental features.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, we're already on gcc 10.x. Although we have C++17 enabled, in theory we have access to C++20 experimental features.

It was already mentioned before, this class is also ported to ESP32.
Are they also using gcc 10.x?
To be honest, if the braces initialization also works, then I would say to stick with what's working and not forcing to only be able to compile with the newest C++ standard.

What I'm more worried about is how the initialization is done when in a union.
In other words, what will be guaranteed to be initialized with the isSSO set to 1 (or the inverted one set to 0)?
We can talk about semantics and how the code may be the most aestetic pleasing to the reader.
But in the end it is important this one flag is always default set to mean isSSO is active.
If that should be done via automatic zero'ed initialization (thus inverting the flag) or the suggested:

  union
  {
    _sso sso = {{0},{0}};
    _ptr ptr;
  };

That's all fine by me.
Most important is that this isSSO is in its 'enabled' state (which isn't with this suggested code by the way) as default member value.

unsigned char len : 7; // Ensure only one byte is allocated by GCC for the bitfields
unsigned char isSSO : 1;
unsigned char len : 7; // Ensure only one byte is allocated by GCC for the bitfields
unsigned char isNotSSO : 1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this? Why the change to reverse logic? I see in addition that this is used as if(! isSSO()) in code, so this is a reverse of a reversed name that reverses a bit.
If there is a good reason for reversing, then please change the name to something that reflects the new meaning without the Not. Having to look at things like e. g. !isSSO() and then bool isSSO() {return !isNotSSO;} is just painful.
And I won't even mention the setSSO() body ....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, unsigned char disabled : 1; or something, might be unambiguous.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you need a better term, perhaps "heapBuffer" ? (isSSO was about not using a heap allocated buffer)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before moving forward with any change, please explain why it needs to change at all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its easy: the default constructors are setting "SSO" to true. by making the default "false" or "0", the initialization can be similified because almost all compilers and all architectures have some kind of loophole optimisation to find the best possible code pattern for initializing a variable or a set of variables to zero.

I think instead of using "SSO" which is an acronym that is hard to understand in the first place, I would suggest to call it "HeapString" or "isHeapString" or "isLargeString" . "isnotSSO" which is "is not small string optimisation" is really awkward english.

char * buff;
uint16_t cap;
uint16_t len;
};
// This allows strings up up to 11 (10 + \0 termination) without any extra space.
enum { SSOSIZE = sizeof(struct _ptr) + 4 - 1 }; // Characters to allocate space for SSO, must be 12 or more
static constexpr auto SSOSIZE = sizeof(struct _ptr) + 4 - 1; // Characters to allocate space for SSO, must be 12 or more
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for the enum was that we were certain that way didn't create a constant with the value with old gcc compiler, even when using the value in code. We checked with sizeof and looked at the bin dump.
Are you positive that there is no case where this new definition creates a constant that increases the mem footprint?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as you indicated, this change has no performance effect (either better or worse).
according to the principle of minimum action, leaving it alone is probably the right way, i understood.

buf[0] = c;
buf[1] = 0;
*this = buf;
uint16_t buf = __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ ? (uint16_t)c : (uint16_t)c << 8;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks duplicated, please refactor.

buf[0] = c;
buf[1] = 0;
return concat(buf, 1);
uint16_t buf = __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ ? (uint16_t)c : (uint16_t)c << 8;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate, please refactor

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think also that most compilers perform this optimisation automatically. there might not be an observable behavior difference between old and new code.

@earlephilhower
Copy link
Collaborator

Guys, I'm seeing this PR morphing into two things.

  1. An explicit init of the SSO/PTR union to catch potential OOM invalid frees (ie. "bug fix")
  2. Some tweaks to the constructors to save +/- a few bytes but with no logical changes

Number 1 seems like a fine change and should be a 2-line PR, while number 2 is big and introduces issues of taste ans style.

Maybe the best way forward is to drop any functional changes from this and open a new one with only the "init sso/ptr union" change? That init PR should be easily approvable and get us runtime in end users to see if it has any effect.

@devyte
Copy link
Collaborator

devyte commented Sep 2, 2020

I agree, let's split the "bug fix" off into its own pr to discuss.

struct {
char _offset0 = 0; // wbuffer()[0] = 0;
char _offset1[SSOSIZE - 1];
char _offset11 = 0; // setSSO(true); setLen(0);
Copy link
Contributor

@TD-er TD-er Sep 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get why you want to do this, but I think the naming is a bit strange here.
Also, why not just use a constructor in one (not both) of the existing structs _ptr or _sso ?

I think _sso is the best choice here for having the constructor to zero all values.
And since it has to be done in the body of the constructor anyway, you can also make a function in that struct and call it both from its constructor and from the String::init() function.

As done here, using an unnamed struct with _offsetNN names is confusing.
For example _offset11 is actually the same address as used by the _sso flags.

@@ -68,7 +71,9 @@ class String {
explicit String(unsigned long, unsigned char base = 10);
explicit String(float, unsigned char decimalPlaces = 2);
explicit String(double, unsigned char decimalPlaces = 2);
~String(void);
inline ~String() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, but what extra does inline add to a function implemented in a .h file?
I always thought implementing a function in a .h file always implies inline.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as you said, function bodies inside a class definition (typically in .h) are implicitly marked as inline.

like extra arithmetic parentheses, may be removed later.

Copy link
Collaborator

@mcspr mcspr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jjsuwa also note of the #7611 / d3b3a43

Adding this test to tests/host/core/test_string.cpp fails on master, but not with this patch:

...

TEST_CASE("String::move", "[core][String]")
{
    const char buffer[] = "this string goes over the sso limit";

    String target;
    String source(buffer);

    target = std::move(source);
    REQUIRE(source.c_str() != nullptr);
    REQUIRE(target == buffer);
}

...

Perhaps you could add this test (or my commit) here?

re ALL: all existing change requests seem resolved, please re-review? I do believe initialization issue discussed above was resolved through the use of init(). Naming might not, but idk if it is still relevant.

@earlephilhower
Copy link
Collaborator

What is the new code size difference with these changes, @jjsuwa ?

The changing polarity of the isSSO bit into isHeap does not affect any actual code, it's just an unnecessary change IMO which makes things obtuse since the code is still using the name SSO in method names. Saving 3-4 bytes of flash would not be worth that obfuscation IMO (at the level of build noise). If the change saves 100 bytes or more, then it's meaningful.

Copy link
Collaborator

@earlephilhower earlephilhower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also please add in @mcspr 's add'l String test which passes w/the fixed/minimized ::move(), too?

rhs.setLen(0);
rhs.setBuffer(nullptr);
invalidate();
sso = rhs.sso; // memcpy(this, &rhs, sizeof(*this));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment would be best removed.

@@ -225,32 +215,9 @@ String & String::copy(const __FlashStringHelper *pstr, unsigned int length) {

#ifdef __GXX_EXPERIMENTAL_CXX0X__
void String::move(String &rhs) noexcept {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice cleanup of code here!

mcspr added a commit to mcspr/esp8266-Arduino that referenced this pull request Sep 27, 2020
based on the esp8266#7553 without isSSO -> isHeap rename and inline optimizations
additionally, remove useless pre-c++11 preprocessor checks

Co-authored-by: Takayuki 'January June' Suwa <[email protected]>
earlephilhower pushed a commit that referenced this pull request Sep 27, 2020
* (test) WString: c_str() returns null pointer

target = std::move(source) does not reset buffer pointer back to the sso

* wstring: correctly do move invalidation & copy

based on the #7553 without isSSO -> isHeap rename and inline optimizations
additionally, remove useless pre-c++11 preprocessor checks

Co-authored-by: Takayuki 'January June' Suwa <[email protected]>
Copy link
Collaborator

@d-a-v d-a-v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes do not reflect the long discussion in this PR.
However approving the proposed changes as they are today.

@Jason2866
Copy link
Contributor

Nice one. Saves 544 bytes when used with Tasmota. Without 609844 bytes. With 609300 bytes.

Copy link
Contributor

@dirkmueller dirkmueller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me. Oh please, please pretty please.

Jason2866 referenced this pull request in arendst/Tasmota Oct 22, 2020
Tasmota Arduino Core v2.7.4.5 allowing webpassword over 47 characters (#9687)
Copy link
Collaborator

@earlephilhower earlephilhower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-edit- latest push has add'l changes that need to be reviewed before re-approval.

While I'm not that big of a fan of the isSSO->isHeap conversion, what's left here does seem legit, minimal, functional, and did have a measurable impact on other large projects, so approving. Thx!

inline const char *buffer() const { return (const char *)(isSSO() ? sso.buff : ptr.buff); }
inline char *wbuffer() const { return isSSO() ? const_cast<char *>(sso.buff) : ptr.buff; } // Writable version of buffer
const char *buffer() const { return isSSO() ? sso.buff : ptr.buff; }
char *wbuffer() const { return const_cast<char *>(buffer()); } // Writable version of buffer
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of const_cast<> I would do the more readable other way around:

const char* buffer() const { return wbuffer(); }
char* wbuffer() const { return isSSO() ? sso.buff : ptr.buff; } // Writable version of buffer

(I wonder why the compiler don't complain against the const in char* wbuffer() const)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I wonder why the compiler don't complain against the const in char* wbuffer() const)

You don't change the pointer, only where the pointer points to?
Otherwise you would need something like const char* const wbuffer() const, which is not open for debate on ugliness IMHO ;)

cores/esp8266/WString.cpp Show resolved Hide resolved
wbuffer()[fromIndex + 1] = '\0';
char* temp = strrchr(wbuffer(), ch);
wbuffer()[fromIndex + 1] = tempchar;
char *wbuffer = this->wbuffer();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The local variable wbuffer is the same as the class method wbuffer() so I'd really ask to change this name to something else.

I think the logic is still correct, it's just a readability and sanity thing.

wbuffer()[right] = '\0';
out = wbuffer() + left; // pointer arithmetic
wbuffer()[right] = temp; //restore character
char *wbuffer = this->wbuffer();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above.

Also, for these two changes have you seen any code size difference? wbuffer() is a pretty simple call and I think it has a good chance of being inlined to a simple reference to the string array...

@Jason2866
Copy link
Contributor

@earlephilhower the last change does indeed save some bytes more.
With this change we have a 72 bytes smaller Tasmota build

@d-a-v
Copy link
Collaborator

d-a-v commented Nov 10, 2020

In order for this PR to finally reach merging state, and thanks for this, can you please, as suggested by @earlephilhower rename char *wbuffer = this->wbuffer(); to avoid name collision ?
I'd suggest char *writableBuffer. But it could be writableb, or wb, writb, iamdifferentfromwbuffer or even just wbuf.
Well... I guess you got it @jjsuwa :)

edit Sorry I have been confused by the "forced-push".

@earlephilhower @devyte I guess your reviews were addressed.

* move bodies of dtor, `init()` and `charAt()` to .h (implicitly inlined)

* unify descriptions of the initialization into one: `init()` (literally), that is called from each ctors, `invalidate()` and `move()`

* invert the SSO state logic in order to make init state zeroed (as a result, each inlined `init()` saves 1 insn)

* detab and trim

* remove `inline` from .h

* cosmetics

* optimize the non-SSO -> SSO transition part of `changeBuffer()`

* remove duped body of `operator =(StringSumHelper &&rval)`

* remove common subexpressions from `lastIndexOf()` and `substring()`

* eliminate `strlen(buf)` after calling `sprintf(buf, ...)` that returns # of chars written

* eliminate `len()` after calling `setLen(newlen)`

* make ctor`(char c)` inlineable

* optimize `setLen()`

* replace constant-forwarding overload functions with default argument ones

* optimize `concat(char c)`

IROM size @ FSBrowser: 304916 -> 304372 (saved 544 from master)
Copy link
Collaborator

@earlephilhower earlephilhower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the local variable wbuffer was renamed, thanks! We should put this in and then see if we have any issues at end users, then in a month or so submit the same change to the ESP32 repo.

IROM size @ FSBrowser: 304372 -> 304356 (saved 560 from master)
sso.len = 0;
sso.isHeap = 0;
void init(void) __attribute__((always_inline)) {
sso.buff[0] = 0; // In the Xtensa ISA, in fact, 32-bit store insn ("S32I.N") is one-byte shorter than 8-bit one ("S8I").
Copy link
Contributor

@TD-er TD-er Nov 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About the comment; One byte or one bit shorter?
Or do you mean the generated binary code?

@d-a-v d-a-v merged commit 8fe80f1 into esp8266:master Nov 16, 2020
@jjsuwa jjsuwa deleted the WString_cutWaste branch November 16, 2020 14:43
@devyte
Copy link
Collaborator

devyte commented Nov 16, 2020

@jjsuwa What was the total gain from this optimization effort? Flash size, IRAM, heap, speed etc. Could you please redo your checks from end of August and report here for completeness?

@jjsuwa-sys3175
Copy link
Contributor

sorry, now @jjsuwa is unable to login due to my mistake (DNS MX expiration)...

@devyte :

Could you please redo your checks from end of August and report here for completeness?

end of August... such a long ago :)

FSBrowser.ino build results:

  • commit faf59f5, "Update to GCC 10.2 (Update to GCC 10.2 #7607)" (Sep 24, 2020):
    FSBrowser.ino.bin : 338512 bytes

    • IROM : 305252
    • IRAM : 26324
    • DATA : 1272
    • RODATA : 1512
    • BSS : 25552
  • commit faf59f5 + this PR (replacing core/esp8266/WString.*):
    FSBrowser.ino.bin : 337744 bytes

    • IROM : 304484
    • IRAM : 26324
    • DATA : 1272
    • RODATA : 1512
    • BSS : 25552

for speed... i guess, at least it shouldn't become slow by this PR because of code shrinkage, funccall overhead elimination and so on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants