rewrite code generation #21

Merged: 61 commits, Aug 7, 2022

lucalewin
Owner

Reason

The previous code generation was messy, which made it hard to implement new features like #11 or #12.
Therefore, I decided to rewrite the code generation for x86-64 machines completely.

Difference between the old and new generation

The old method was based on functions that returned the code they generated, which the caller then concatenated with its own code. This, combined with some badly written utility functions, resulted in a lot of memory leaks.

The new method uses an additional data structure to store the generated instructions during generation. After generation finishes, the data structure is stringified, which requires far less string concatenation.
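
A minimal sketch in C of the idea described above, not the compiler's actual implementation. The names `InstrList`, `instr_list_add`, `instr_list_stringify`, and `instr_list_free` are hypothetical. Instructions are collected in a growable list, and the final text is built once with a single allocation, so no intermediate strings are passed between functions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical instruction buffer: one owned string per instruction. */
typedef struct {
    char **lines;
    size_t count;
    size_t capacity;
} InstrList;

static void instr_list_init(InstrList *l) {
    assert(l != NULL);
    l->lines = NULL;
    l->count = 0;
    l->capacity = 0;
}

/* Appends a copy of `text`. Returns 0 on success, -1 if allocation fails. */
static int instr_list_add(InstrList *l, const char *text) {
    assert(l != NULL && text != NULL);
    if (l->count == l->capacity) {
        size_t cap = l->capacity ? l->capacity * 2 : 8;
        char **grown = realloc(l->lines, cap * sizeof *grown);
        if (grown == NULL) return -1;
        l->lines = grown;
        l->capacity = cap;
    }
    size_t len = strlen(text) + 1;
    char *copy = malloc(len);
    if (copy == NULL) return -1;
    memcpy(copy, text, len);
    l->lines[l->count++] = copy;
    return 0;
}

/* Joins all instructions, one per line, into a single string.
 * The size is measured first so only one allocation is needed.
 * The caller owns the result and must free() it. */
static char *instr_list_stringify(const InstrList *l) {
    assert(l != NULL);
    size_t total = 1; /* terminating NUL */
    for (size_t i = 0; i < l->count; i++)
        total += strlen(l->lines[i]) + 1; /* text plus newline */
    char *out = malloc(total);
    if (out == NULL) return NULL;
    char *p = out;
    for (size_t i = 0; i < l->count; i++) {
        size_t n = strlen(l->lines[i]);
        memcpy(p, l->lines[i], n);
        p += n;
        *p++ = '\n';
    }
    *p = '\0';
    return out;
}

/* Releases every stored instruction and resets the list. */
static void instr_list_free(InstrList *l) {
    assert(l != NULL);
    for (size_t i = 0; i < l->count; i++) free(l->lines[i]);
    free(l->lines);
    instr_list_init(l);
}
```

Because the list owns every string it stores, there is a single place (`instr_list_free`) where that memory is released, which is one way to avoid the kind of leaks mentioned for the old approach.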

String Literals

The main reason for this work was to resolve #11. As stated above, it was nearly impossible to implement in the old code generation system, so the new system was designed to support local string literals.

This old example

import sys.io::print;

const str: string = "Hello, World!";

function main(): i32 {
    print(str);
    return 0;
}

can now be rewritten in a much simpler form

import sys.io::print;

function main(): i32 {
    print("Hello, World!");
    return 0;
}

which previously was impossible.

String Table

This is possible thanks to a new String Table, which stores every string literal that is defined.
The String Table also holds an array of labels so that multiple variables can reference the same string literal, and it ensures that a string literal defined multiple times is generated only once.

Example:

const str0: string = "Test";
const str1: string = "Different String Literal";
const str2: string = "Test";

function main(): char {
    return str0[2];
}

The string literal "Test" is defined twice, but in the generated assembly code:

global _var_str0__char@1
global _var_str1__char@1
global _var_str2__char@1
global _func_main_
global _start
_var_str0__char@1:
_var_str2__char@1:
_string_0: db `Test`,0
_var_str1__char@1:
_string_1: db `Different String Literal`,0
section .data
section .bss
section .text
_func_main_:
	mov rax, 2
	movsx rax, byte[_var_str0__char@1+1*rax]
	ret
_start:
	call _func_main_
	mov rdi, rax
	mov rax, 60
	syscall

it is only generated once.
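
The deduplication shown above can be sketched in C as follows. This is an illustration under assumptions, not the compiler's code: `StringTable`, `StringEntry`, `string_table_add`, `string_table_emit`, and the fixed capacities are hypothetical, and the sketch does not escape backticks inside literals. The label names and the `_string_N: db` output format are taken from the generated assembly above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_STRINGS 64 /* assumed capacity, for the sketch only */
#define MAX_LABELS 8   /* assumed labels per literal */

/* One unique literal plus every label that refers to it. */
typedef struct {
    const char *value;
    const char *labels[MAX_LABELS];
    size_t label_count;
} StringEntry;

typedef struct {
    StringEntry entries[MAX_STRINGS];
    size_t count;
} StringTable;

/* Registers `value` under `label` (label may be NULL for an anonymous
 * literal). An existing entry with the same text is reused.
 * Returns the entry index, or -1 if the table or label array is full. */
static int string_table_add(StringTable *t, const char *value, const char *label) {
    assert(t != NULL && value != NULL);
    size_t i;
    for (i = 0; i < t->count; i++)
        if (strcmp(t->entries[i].value, value) == 0) break;
    if (i == t->count) {
        if (t->count == MAX_STRINGS) return -1;
        t->entries[i].value = value;
        t->entries[i].label_count = 0;
        t->count++;
    }
    if (label != NULL) {
        if (t->entries[i].label_count == MAX_LABELS) return -1;
        t->entries[i].labels[t->entries[i].label_count++] = label;
    }
    return (int)i;
}

/* Writes all labels followed by one `db` line per unique literal.
 * Returns the number of characters written, or -1 if `buf` is too small. */
static int string_table_emit(const StringTable *t, char *buf, size_t size) {
    assert(t != NULL && buf != NULL);
    size_t used = 0;
    if (size > 0) buf[0] = '\0';
    for (size_t i = 0; i < t->count; i++) {
        const StringEntry *e = &t->entries[i];
        for (size_t j = 0; j < e->label_count; j++) {
            int n = snprintf(buf + used, size - used, "%s:\n", e->labels[j]);
            if (n < 0 || (size_t)n >= size - used) return -1;
            used += (size_t)n;
        }
        int n = snprintf(buf + used, size - used,
                         "_string_%zu: db `%s`,0\n", i, e->value);
        if (n < 0 || (size_t)n >= size - used) return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

Adding "Test" for `str0` and again for `str2` returns the same index both times, so both labels end up in front of a single `db` line, as in the assembly listing.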

Tests

This pull request also contains new tests for conditional expressions and string literals.
Two tests (one new, one old) currently fail. I'm going to fix them in another pull request.

Tags

#11 #12

This should help to better understand the structure of the compiler
this file is replaced by `inc/assembly/registers.h`
functionality is replaced in `src/generation/arch/x86-64/assembly/registers.c`
for future implementations
to match the structure of the `src` directory
limitations:
- only `number`, `boolean`, and `character` literals are supported
- not all binary expressions work as expected --> will be fixed later
When dividing, RAX contains the quotient and RDX contains the remainder. Before the division, RDX also holds the high bits of the dividend (RDX:RAX).
The problem was that the compiler did not know that RDX could be nonzero.
This led to cases where RDX should have been zero but still held the remainder of an earlier division, which produced a wrong result or a floating-point exception that crashed the program.
Implemented solution: clear RDX before every division with XOR (`xor rdx, rdx`).
Better solution: the compiler should track whether RDX could be nonzero and clear it before the division only when needed.
Previously, an error was thrown when a variable was declared with the same identifier as a variable in the parent scope.
This is now fixed.
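
One way to get this behavior, sketched in C under the assumption that scopes form a parent chain: a declaration is checked only against the current scope, while lookups walk outward. `Scope`, `scope_declare`, and `scope_resolve` are hypothetical names, and the fixed capacity is only for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_VARS 32 /* assumed capacity, for the sketch only */

typedef struct Scope {
    struct Scope *parent; /* NULL for the outermost scope */
    const char *names[MAX_VARS];
    size_t count;
} Scope;

/* Returns 1 if `name` is declared directly in `s`, ignoring parents. */
static int scope_declares(const Scope *s, const char *name) {
    for (size_t i = 0; i < s->count; i++)
        if (strcmp(s->names[i], name) == 0) return 1;
    return 0;
}

/* Declares `name` in `s`. Only a duplicate in the same scope is an error;
 * a variable with the same name in a parent scope is shadowed.
 * Returns 0 on success, -1 on redeclaration or if the scope is full. */
static int scope_declare(Scope *s, const char *name) {
    assert(s != NULL && name != NULL);
    if (scope_declares(s, name) || s->count == MAX_VARS) return -1;
    s->names[s->count++] = name;
    return 0;
}

/* Returns the innermost scope that declares `name`, or NULL if none does. */
static const Scope *scope_resolve(const Scope *s, const char *name) {
    for (; s != NULL; s = s->parent)
        if (scope_declares(s, name)) return s;
    return NULL;
}
```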
the null terminator was missing
lucalewin self-assigned this on Aug 7, 2022
lucalewin added the documentation, enhancement, feature request, and fix labels on Aug 7, 2022
lucalewin added this to the Version 0.4.0 milestone on Aug 7, 2022
lucalewin merged commit dff23d7 into develop on Aug 7, 2022
lucalewin deleted the lucalewin/remake-code-generation branch on August 7, 2022 17:18
Development

Successfully merging this pull request may close these issues.

restrictions on string literals