thello.s: sizes of constant strings should use .equ, not loading a byte from data memory, and various other optimizations. #2

Open
pcordes opened this issue May 18, 2021 · 1 comment

pcordes commented May 18, 2021

Someone linked https://github.com/robohack/experiments/blob/430b5ea22bc2f4f697c659aeb399e938d09744c1/thello.s for an example of a BSD build command, which is why I'm randomly looking at it.

It has one bug (in a comment): syscall definitely can't take an arg in RCX, because the syscall instruction itself overwrites RCX (with the return address) before the kernel gets control. (https://stackoverflow.com/questions/32253144/why-is-rcx-not-used-for-passing-parameters-to-system-calls-being-replaced-with) Linux uses R10 instead of RCX, with the rest of the convention matching the function-calling convention. I'd guess most other x86-64 SysV OSes do the same, but I don't know for sure.

# SYSCALL ARGS
# rdi rsi rdx r10 r8 r9
  # wrong original: # rdi rsi rdx rcx r8 r9    # that's the function-calling convention.

Separately from that:

	andq $-16, %rsp		# clear the 4 least significant bits of stack pointer to align it

RSP is already aligned by 16 on process entry, as guaranteed by the x86-64 System V ABI.

	mov $4, %rax		# SYS_write
	mov $1, %rdi

You can mov $4, %eax to do this more efficiently (writing a 32-bit register implicitly zero-extends to 64-bit), especially if you're later trying to optimize by merging a length into the low byte of RDX (which most kernels zero on process entry). Also, you can #include <sys/syscall.h> to get call numbers as CPP macro #defines, so you can mov $SYS_write, %eax. (Give your file a .S extension so gcc will run it through CPP first.)

You can use as -O2 or -Os to do simple optimizations, such as turning mov $4, %rax into mov $4, %eax like NASM does, because the architectural effect is identical. (If you assemble through GCC, pass -Wa,-O2; gcc -O2 does not affect the assembler.)
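
A minimal sketch of the CPP approach described above. The file name exit0.S is made up for illustration, and it assumes <sys/syscall.h> contains only preprocessor definitions (so it is safe to include from assembly) and that the OS uses the syscall instruction with the call number in RAX. Some BSDs may also want an OS identification note section in a static executable; that is left out here.

```asm
/* exit0.S (hypothetical name): exit(0) using a SYS_ macro from CPP */
#include <sys/syscall.h>        /* provides SYS_exit as a #define */

	.text
	.globl _start
_start:
	xor  %edi, %edi             /* exit status 0; also zeroes all of RDI */
	mov  $SYS_exit, %eax        /* 32-bit write zero-extends into RAX */
	syscall
```

Something like gcc -nostdlib -static -Wa,-O2 exit0.S -o exit0 should build it, with the .S extension making gcc run CPP before the assembler.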

	mov $hello, %rsi

Using a 32-bit sign-extended immediate for an absolute address is possible but inefficient. Normally you'd use lea hello(%rip), %rsi, or mov $hello, %esi. (If 32-bit sign-extended works, so does zero-extended, assuming user-space addresses are in the bottom of the virtual address space, not the top.) See https://stackoverflow.com/questions/57212012/how-to-load-address-of-function-or-label-into-register

	mov $1, %rax		# SYS_exit
	xor %rdi, %rdi

Again, 32-bit operand-size is 100% fine, especially for the xor since exit() takes an int arg. See my answer on https://stackoverflow.com/questions/33666617/what-is-the-best-way-to-set-a-register-to-zero-in-x86-assembly-xor-mov-or-and

Putting a constant byte in static storage is just silly; make it an assemble-time constant that you can use as an immediate, like mov $hello_len, %edx (or %rdx if you want).

.section .data       # could be .section .rodata

hello:
	.ascii "Hello world!\n"
hello_len = . - hello
# .equ hello_len,  . - hello     # alternative using .equ

	#.byte . - hello

So

	mov hello_len, %dl	# Note: does not clear upper bytes. Use movzxb (move zero extend) for that

becomes

	mov $hello_len, %edx       # zero-extends to fill RDX
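
Putting the suggestions together, the whole program might look roughly like this. It is a sketch, not the actual thello.s: it assumes a .S file preprocessed by CPP, stdout on fd 1, and a kernel that takes write and exit via the syscall instruction with args in RDI, RSI, RDX. Any OS-specific note section the original file may need is omitted.

```asm
#include <sys/syscall.h>        /* SYS_write, SYS_exit */

	.section .rodata
hello:
	.ascii "Hello world!\n"
	.equ hello_len, . - hello   /* assemble-time constant, takes no storage */

	.text
	.globl _start
_start:
	/* RSP is already 16-byte aligned at process entry, so no andq is needed */
	mov  $SYS_write, %eax       /* writing EAX zero-extends into RAX */
	mov  $1, %edi               /* fd 1 = stdout */
	lea  hello(%rip), %rsi      /* RIP-relative address of the string */
	mov  $hello_len, %edx       /* length as an immediate, zero-extends into RDX */
	syscall

	mov  $SYS_exit, %eax
	xor  %edi, %edi             /* exit status 0 */
	syscall
```

Because hello_len is defined before its use and both . and hello are in the same section, the assembler resolves it to a plain number, so the mov encodes it as an immediate with no load from memory.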
@robohack robohack self-assigned this May 18, 2021
robohack added a commit that referenced this issue May 18, 2021
- add attribution (mostly stolen from https://polprog.net/blog/netbsdasmprog/)
- replace syscall args comment with the actual one from syscall.c
- rename $hello to $hello_str

(in response to first part of issue #2)
@robohack
Owner

Hi Peter,

Thanks very much for your detailed comments and analysis!

I've dealt with the first item (the bug in the comment), and noted the origin of this example -- that's what I get for copy&paste!

It has been a long time since I did any Intel assembly coding, and this is actually my first x86_64-specific toy. Most of my practical experience with assembler is way back when on pdp11, vax, 6502, 1802, etc. and ancient x86, so I definitely appreciate your insight!

BTW, I like the idea of storing the length of the string in memory for other purposes, i.e. not just having a constant in the current assembly unit, so I'll probably keep that as an example, but I'll add a comment about avoiding the storage and using a constant instead.
