Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use 32b loads to set print strings #7545

Merged
merged 4 commits into from
Oct 16, 2020

Conversation

earlephilhower
Copy link
Collaborator

Instead of using either a series of etc_putc or setting a series of bytes
one by one, use a simple macro to define 32b constants to build up strings.

Saves ~30 bytes of program code in eboot

Instead of using either a series of etc_putc or setting a series of bytes
one by one, use a simple macro to define 32b constants to build up strings.

Saves ~30 bytes of program code in eboot
// 4 chars at a time. Smaller code than byte-wise assignment.
uint32_t fmt[2];
fmt[0] = S('v', '%', '0', '8');
fmt[1] = S('x', '\n', 0, 0);
ets_printf((const char*) fmt, ver);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more readable solution I guess is:

#define TINYSTR(str) [__builtin_strlen(str) + 1] = str
#define TINYCHARS(chars) [__builtin_strlen(chars)] = chars

and then,

    const char fmt TINYSTR("v%08x\n");
    ets_printf(fmt, ver);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OMG, greater then 4 bytes string literals will be placed into RODATA (and memcpy will copy them)...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry,

#define S(str) ((uint32_t)(str"\0\0\0")[0] | ((uint32_t)(str"\0\0\0")[1] << 8) | ((uint32_t)(str"\0\0\0")[2] << 16) | ((uint32_t)(str"\0\0\0")[3] << 24))

and

    const uint32_t fmt[] = { S("v%08"), S("x\n") };
    ets_printf(fmt, ver);

are OK (now placed into .text).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, there aren't any really good solutions. We don't allow RODATA because we don't have the C init stuff in the bootloader. Can't use PSTR because ets_printf can't handle PROGMEM byte accesses. GCC's multichar literal puts things in the wrong order for strings (endian issue). So we end up with either a 4-param or 1-param marco w/array indexing.

Honestly, I like the existing setup more than the string 4-chars because with the current macro, if you accidentally put in a >4 char element it will error (while the string[index] will silently allow it).

But, your idea to make it a const array to get in .text, that's pretty cool. Let me see what it does to the size, might buy 4-8 bytes per instance...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argh, looks like that's not going to work. I see .rodata appear when I do const uint32_t s={S(),S()}; Back where we were...

Copy link
Collaborator Author

@earlephilhower earlephilhower Aug 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, anyway, the string needs to be in RAM since it will be byte accessed. So if it was reduced to just a pointer to ICACHE (where eboot lives)/PROGMEM it would end up SEGV'ing. Has to be a copy to RAM step no matter what.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha, get to be quite weird definition...
( #include <assert.h> needed )

#define S(x, str) \
    do { \
        const unsigned int l = (__builtin_strlen(str) + 1 + 3) / 4; \
        static_assert(sizeof(x) / sizeof(*x) == l); \
        unsigned int i = 0; \
        while(i < l) \
            x[i] = ((uint32_t)(str"\0\0\0")[i * 4 + 0] | ((uint32_t)(str"\0\0\0")[i * 4 + 1] << 8) | ((uint32_t)(str"\0\0\0")[i * 4 + 2] << 16) | ((uint32_t)(str"\0\0\0")[i * 4 + 3] << 24)), ++i; \
    } while(0)

and

void test(void) {
    uint32_t fmt[3];
    S(fmt, "foo bar baz");

    extern int write(const void*);
    write((const char*)fmt);
}

emits the below (good result)

	.file	"test.c"
	.text
	.literal_position
	.literal .LC0, 544173926
	.literal .LC1, 544366946
	.literal .LC2, 8020322
	.align	4
	.global	test
	.type	test, @function
test:
	l32r	a2, .LC0
	addi	sp, sp, -32
	s32i.n	a2, sp, 0
	l32r	a2, .LC1
	s32i.n	a0, sp, 28
	s32i.n	a2, sp, 4
	l32r	a2, .LC2
	s32i.n	a2, sp, 8
	mov.n	a2, sp
	call0	write
	l32i.n	a0, sp, 28
	addi	sp, sp, 32
	ret.n
	.size	test, .-test
	.ident	"GCC: (GNU) 10.1.0"

but this

void test(void) {
    uint32_t fmt[4];
    S(fmt, "foo bar baz ");  /* 1 whitespace appended */

    extern int write(const void*);
    write((const char*)fmt);
}

emits bad...

	.file	"test.c"
	.text
	.section	.rodata
.LC0:
	.string	"foo bar baz "
	.string	""
	.string	""
	.string	""
	.text
	.literal_position
	.literal .LC1, .LC0
	.align	4
	.global	test
	.type	test, @function
test:
	addi	sp, sp, -32
	l32r	a3, .LC1
	s32i.n	a0, sp, 28
	movi.n	a4, 0
.L2:
	l8ui	a2, a3, 1
	l8ui	a5, a3, 2
	slli	a2, a2, 8
	slli	a5, a5, 16
	or	a2, a2, a5
	l8ui	a5, a3, 0
	add.n	a6, sp, a4
	or	a2, a2, a5
	l8ui	a5, a3, 3
	addi.n	a4, a4, 4
	slli	a5, a5, 24
	or	a2, a2, a5
	s32i.n	a2, a6, 0
	addi.n	a3, a3, 4
	bnei	a4, 16, .L2
	mov.n	a2, sp
	call0	write
	l32i.n	a0, sp, 28
	addi	sp, sp, 32
	ret.n
	.size	test, .-test
	.ident	"GCC: (GNU) 10.1.0"

@mhightower83
Copy link
Contributor

@earlephilhower, At the moment I am having trouble remember the exact specifics; however, earlier this year (feels like years ago) I was able to get the Boot ROM loader to handle a special eboot, with two segments one for code and one for data which got loaded to DRAM. It also required changes to elft2bin.py. It was one of many tangents/attempts I went on when doing the HWDT Stack Dump. There were some issues in getting the checksum correct for each segment. Only the 1st segment is biased by 0xFE the additional segments use 0. My notes indicate there are also some payload length tweaks that are needed in creating the .bin. If you are interested I can hunt down and update the branch, I think there are changes in eboot and elft2bin.py since then that would need to be incorporated. It might take me a while, I am a bit distracted.

@earlephilhower
Copy link
Collaborator Author

Thanks, @mhightower83 . For now, though, I'd really like to just make it simple and save a few bytes because I know there are add'l changes coming in (fixed verify, eboot protect, OTA command in flash, etc.).

The way it works now the ROM copies the first 4K to IRAM (per the flash copy structure which elf2bin.py generates and the ROM parses) and jumps to the start of code. In eboot.c it's definitely possible to manually copy from IRAM->DRAM for RODATA/etc., but that's additional code which eats up any savings. I don't remember if the ROM can do multiple region copies, bu if it does then we could do the same stuff for RODATA (just align start to 4K somewhere, not freely assigned as it would be normally). But, again, that's way more work than I'm thinking of right now, for little gain.

@mhightower83
Copy link
Contributor

I don't remember if the ROM can do multiple region copies

@earlephilhower Yes, it does; however, I could not find it documented. I ended out using the empirical method to figure out what it wanted.

Copy link
Collaborator

@d-a-v d-a-v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does elf file need to be regenerated since aug24 ?

@earlephilhower earlephilhower merged commit bbc14c0 into esp8266:master Oct 16, 2020
@earlephilhower earlephilhower deleted the bootmin branch October 16, 2020 17:52
@dirkenstein
Copy link
Contributor

Hello, I think this should be:

// 4 chars at a time.  Smaller code than byte-wise assignment.
 uint32_t fmt[2];
 fmt[0] = S('v', '%', '0', '8');
 fmt[1] = S('x', '\n', 0, 0);
 ets_printf((const char*) &fmt, ver);

I think the cast was using the actual text as the pointer value, not the location of fmt. Have I misunderstood something?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants