
Assembly

k-off edited this page Sep 28, 2019 · 2 revisions

Assembly syntax

The assembly language follows one rule: one instruction per line.

An instruction or statement is the smallest autonomous part of a programming language; a command or set of commands. A program is usually a sequence of instructions.

Source: Statement - Wikipedia

Empty lines, comments, as well as extra tabs or spaces are ignored.

Comment

The constant COMMENT_CHAR in op.h determines which character marks the beginning of a comment.

In the provided file it is the hash sign (#).

That is, everything between the character '#' and the end of the line is considered a comment.

A comment can be located anywhere in the file.

Example #1:

# Codam
# is a programming school

Example #2:

ld %0, r2    # And it is located in the Marine Terrein

Alternative comment

In the provided archive vm_champs.tar in directory champs/examples there is the file bee_gees.s with the champion code, which the original program asm translates into byte-code without errors.

There are two kinds of comments in the code of this champion:

  • standard, which was discussed above;
  • alternative, about which there is no information in subject.

Here ; is used instead of the octothorpe (#).

An example of using a comment of this type:

sti r1, %:live, %1    ; Marine Terrein is placed in Amsterdam, Netherlands

How to deal with it?

This type of comment is not described in the subject, but the original translator supports it. So we most likely do not have to support it.

But let's add it to the file op.h anyway:

# define ALT_COMMENT_CHAR    ';'

Apart from fixes required by the Norm, this will be the only change we make to op.h.
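A minimal sketch of how both comment characters could be handled when reading a line. The function name strip_comment is hypothetical, and skipping comment characters inside quoted strings is an assumption of this sketch, not something the subject states.

```c
#include <stddef.h>

#define COMMENT_CHAR     '#'
#define ALT_COMMENT_CHAR ';'

/* Cut `line` at the first comment character found outside a quoted string.
 * Returns the new length of the line. */
size_t strip_comment(char *line)
{
    int    in_string = 0;
    size_t i = 0;

    while (line[i] != '\0') {
        if (line[i] == '"') {
            in_string = !in_string;     /* assumption: no escaped quotes */
        } else if (!in_string &&
                   (line[i] == COMMENT_CHAR || line[i] == ALT_COMMENT_CHAR)) {
            line[i] = '\0';
            break;
        }
        i++;
    }
    return i;
}
```

After this call the rest of the parser never sees comment text, so both comment styles cost only one extra comparison.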

Champion name

The champion's name must be defined in the .s file. The assembly language has a command for this, stored in the constant NAME_CMD_STRING. In the provided file op.h it is defined as .name.

The .name command must be followed by a string containing the champion's name:

.name    "Batman"

The length of the name must not exceed the value of the constant PROG_NAME_LENGTH. In the provided file it equals 128.

An empty string is a valid champion name:

.name    ""

But a missing string is an error:

.name    

Champion comment

The .s file must also contain the champion's comment.

The command for the comment is defined in the constant COMMENT_CMD_STRING in the file op.h as .comment.

The length of the comment is limited by the value of the constant COMMENT_LENGTH and must not exceed 2048.

The .comment command is very similar to .name and behaves identically for an empty string and for a missing string.
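The rules for .name and .comment can be sketched with one helper. The function parse_string_cmd and its return codes are hypothetical, and the sketch only handles a string that opens and closes on the same line.

```c
#include <stddef.h>
#include <string.h>

#define PROG_NAME_LENGTH 128
#define COMMENT_LENGTH   2048

/* Hypothetical helper: parse `<cmd> "<text>"` from a single line.
 * Copies the text into `out`, which must hold max_len + 1 bytes.
 * Returns 0 on success, -1 if the command, the string or its closing
 * quote is missing, or if the text is longer than max_len. */
int parse_string_cmd(const char *line, const char *cmd,
                     size_t max_len, char *out)
{
    size_t      cmd_len = strlen(cmd);
    const char *start;
    const char *end;

    line += strspn(line, " \t");
    if (strncmp(line, cmd, cmd_len) != 0)
        return -1;
    line += cmd_len;
    line += strspn(line, " \t");
    if (*line != '"')
        return -1;                      /* command with no string */
    start = line + 1;
    end = strchr(start, '"');
    if (end == NULL || (size_t)(end - start) > max_len)
        return -1;
    memcpy(out, start, (size_t)(end - start));
    out[end - start] = '\0';            /* an empty string is accepted */
    return 0;
}
```

The same function serves both commands: pass ".name" with PROG_NAME_LENGTH, or ".comment" with COMMENT_LENGTH.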

Other commands

Some of the provided .s example files contain the command .extend.

This command, like every command other than .name and .comment, is not described in the subject and is reported as an error by the original asm translator.

Executable code

Champion's executable code consists of instructions.

The assembly language has one rule: one instruction per line. The newline character \n marks both the end of the line and the end of the instruction. So where C uses ;, we use \n.

This rule means that there must be a newline character even after the last instruction. Otherwise asm will display an error message.

Each instruction consists of several components:

Label

A label consists of characters defined in the constant LABEL_CHARS. In op.h they are: abcdefghijklmnopqrstuvwxyz_0123456789.

A label cannot contain any characters other than those defined in LABEL_CHARS.

A label must be followed by the character defined in the constant LABEL_CHAR. In op.h it is :.
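A short sketch of this check. The function name is hypothetical, and rejecting an empty label is an assumption of the sketch.

```c
#include <string.h>

#define LABEL_CHARS "abcdefghijklmnopqrstuvwxyz_0123456789"
#define LABEL_CHAR  ':'

/* Returns 1 if `token` has the form "<label>:" with a non-empty label made
 * only of characters from LABEL_CHARS, and 0 otherwise. */
int is_label_definition(const char *token)
{
    size_t n = strspn(token, LABEL_CHARS);  /* length of the valid prefix */

    return n > 0 && token[n] == LABEL_CHAR && token[n + 1] == '\0';
}
```

Here "loop:" and "live_2:" are accepted, while "Loop:" fails because uppercase letters are not in LABEL_CHARS.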

Why do we need labels?

A label points to the instruction that immediately follows it. It points to that single instruction, not to a block of instructions.

.name       "Batman"
.comment    "This city needs me"

loop:
        sti r1, %:live, %1    # <-- operation 'sti' is pointed to by the label 'loop'
live:
        live %0               # <-- operation 'live' is pointed to by the label 'live'
        ld %0, r2             # <-- and this operation is not pointed to by any labels
        zjmp %:loop

Labels make our life easier by simplifying the coding process.

How do labels work?

As we know, code written in assembly language is translated into byte-code. The virtual machine works with the byte-code.

Let's assume we want to create a loop in which the operation live is performed. We have the operation zjmp, which can send us a few bytes forward or backward from the current position.

To create a loop we have to give zjmp a value. But what value? To find out, we have to calculate how many bytes of byte-code the operation code and its argument occupy.

As we know, the operation code is always 1 byte, and the operation live has a single argument 4 bytes long.

So live occupies 1 + 4 = 5 bytes, and we need to jump 5 bytes back:

live %1
zjmp %-5

Not so difficult, but these calculations take time and it would be much easier to "switch to the operation live". This is what labels are made for.

We simply create a label loop and pass it as the argument of the operation zjmp:

loop:    live %1
         zjmp %:loop

Now it is the translator's job to calculate the 'distance' between the operations. It works out how many bytes to jump back and inserts this value into the byte-code.
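The calculation the translator performs can be sketched as follows. The structure and function names are hypothetical. Offsets are counted in bytes from the start of the executable code.

```c
#include <string.h>

/* Hypothetical symbol-table entry: a label name and the byte offset of the
 * instruction it points to. */
struct label {
    const char *name;
    int         offset;
};

/* Distance to write into the byte-code for an argument that refers to the
 * label `name`, used by the instruction that starts at `instr_offset`.
 * Stores the distance in *out and returns 0, or returns -1 if the label
 * is not in the table. */
int resolve_label(const struct label *table, int count,
                  const char *name, int instr_offset, int *out)
{
    for (int i = 0; i < count; i++) {
        if (strcmp(table[i].name, name) == 0) {
            *out = table[i].offset - instr_offset;
            return 0;
        }
    }
    return -1;
}
```

For the loop above, the label loop sits at offset 0 and zjmp starts at offset 5, so the stored distance is 0 - 5 = -5, the same value as in the hand-written version.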

Examples

Valid labels:

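# Label on its own line
marker:
        live %0

# Blank lines between the label and the operation
marker:


        live %0

# Label and operation on the same line
marker: live %0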

All labels from the examples above have the same value and meaning. You may choose any style you want.

Many labels for one operation

Another valid example:

marker:

label:
        live %0

This means that the labels marker and label both point to the same operation.

Pointing to nothing

Valid example:

marker: 
# End of file

In this case the label points to the end of the executable code.

It is important to have a \n at the end of the line. Otherwise the translator will print an error message.

Operations and their arguments

The assembly language has 16 operations. Each operation takes one to three arguments.

The operation names and the arguments they accept are listed in the file op.c.

| Operation code | Operation name | Argument #1 | Argument #2 | Argument #3 |
|---|---|---|---|---|
| 1 | live | T_DIR | | |
| 2 | ld | T_DIR / T_IND | T_REG | |
| 3 | st | T_REG | T_REG / T_IND | |
| 4 | add | T_REG | T_REG | T_REG |
| 5 | sub | T_REG | T_REG | T_REG |
| 6 | and | T_REG / T_DIR / T_IND | T_REG / T_DIR / T_IND | T_REG |
| 7 | or | T_REG / T_DIR / T_IND | T_REG / T_DIR / T_IND | T_REG |
| 8 | xor | T_REG / T_DIR / T_IND | T_REG / T_DIR / T_IND | T_REG |
| 9 | zjmp | T_DIR | | |
| 10 | ldi | T_REG / T_DIR / T_IND | T_REG / T_DIR | T_REG |
| 11 | sti | T_REG | T_REG / T_DIR / T_IND | T_REG / T_DIR |
| 12 | fork | T_DIR | | |
| 13 | lld | T_DIR / T_IND | T_REG | |
| 14 | lldi | T_REG / T_DIR / T_IND | T_REG / T_DIR | T_REG |
| 15 | lfork | T_DIR | | |
| 16 | aff | T_REG | | |

Understanding the operations and how they work with arguments of different types is the basis for understanding "Corewar".

Operations and their arguments

Arguments

There are three types of arguments:

1. Register - T_REG

A register is a variable in which we can store some data. The size of a register in octets is defined in the file op.h in the constant REG_SIZE and equals 4. Registers belong to a cursor (process), but this will be discussed later.

An octet in computer science is eight binary digits.

Source: Octet — Wikipedia

The number of registers is defined in the constant REG_NUMBER with the value 16, so the available registers are r1, r2, r3 ... r16.

Register values

At the startup of the virtual machine all registers except r1 are initialized with the value 0.

Register r1 contains the negated number of the champion.

This number is unique and is required for the operation live to report that the champion is alive.

A cursor placed at the beginning of player 2's code will have the value -2 in its register r1.

If the operation live is executed with the argument -2, the virtual machine will know that player 2 is alive:

live %-2

2. Direct — T_DIR

A direct argument consists of the special character defined in the constant DIRECT_CHAR (%) followed by a number or a label, which represents a direct value.

If the argument contains a label, the label must be preceded by the character defined in the constant LABEL_CHAR (:):

sti r1, %:marker, %1

What are direct and indirect values?

Let's consider a simple example.

We have a number 5. Direct value means we just use 5 as is. That is, 5 is 5.

If 5 is an indirect value, it represents a relative address pointing five bytes forward from the current position.

Direct and indirect labels

What is the difference between labels and numbers?

Actually there is no difference at all. The translator converts labels into their numerical equivalents at the compilation stage.

This means labels are numbers that are written as words in the .s file and replaced with numerical values by the translator.

The process of label replacement is described in the chapter "How do labels work?".

3. Indirect — T_IND

An argument of type indirect can be a number or a label, which represents an indirect value.

If an argument of type T_IND contains a number, no auxiliary symbols are needed:

ld    5, r7

If an indirect argument has a label as its value, the label must be preceded by LABEL_CHAR (:):

ld    :label, r7

Delimiter character:

Arguments are separated by a delimiter character. It is defined in the constant SEPARATOR_CHAR as a comma (,):

ld    21, r7
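The three argument forms can be told apart by their first character. The sketch below uses its own hypothetical enum rather than the T_REG, T_DIR and T_IND values from op.h, and it assumes the token has already been split at the separator and trimmed.

```c
/* Hypothetical tags for this sketch; op.h defines its own T_REG, T_DIR
 * and T_IND values, which are not reproduced here. */
enum arg_kind { ARG_REG, ARG_DIR, ARG_IND };

#define DIRECT_CHAR '%'

/* Classify one argument token such as "r7", "%5", "%:label", "5", ":label". */
enum arg_kind classify_arg(const char *token)
{
    if (token[0] == DIRECT_CHAR)
        return ARG_DIR;                         /* %5 or %:label */
    if (token[0] == 'r' && token[1] >= '0' && token[1] <= '9')
        return ARG_REG;                         /* r1 ... r16 */
    return ARG_IND;                             /* 5 or :label */
}
```

A real translator would go on to validate the rest of the token (register range, digits, label characters). This sketch only picks the type.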

Operations

Operation live

| Operation code | Operation name | Argument #1 | Argument #2 | Argument #3 |
|---|---|---|---|---|
| 1 | live | T_DIR | | |

Description

Objectives of the operation:

  1. It reports that the cursor which performed this live operation is alive.

  2. If the argument of the operation live matches the player number stored in the cursor's register r1, it also reports that the player is alive. For example, if register r1 of the cursor executing live equals -2 and the argument of live equals -2, the virtual machine counts player 2 as alive.

What is a cursor?

Here is a brief explanation of the notion. It will be described in more detail in the chapter "Virtual machine".

A cursor is a process that executes the instruction on which it stands.

Let's assume we run our virtual machine with players loaded into its memory. Every player gets a cursor (a process), which is placed at the beginning of the player's code.

3 champions. 3 sectors of memory with loaded executable code. 3 cursors (processes).

Each cursor contains some information:

  • PC (Program Counter) - position of the cursor (process) in memory

  • Registers (r1 ... r16) - variables that store the cursor's data; their number is defined by the constant REG_NUMBER

  • Flag carry - a special boolean variable that affects operation zjmp and can take one of the two values 1 and 0.

  • Number of the cycle in which cursor reported live last time - this information is used to check whether cursor is still alive or not.

In fact, cursor contains more information, but it will be discussed later.
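The fields listed above map naturally onto a C structure. This is only a sketch: the field names are illustrative, and starting with carry set to 0 is an assumption the text does not state.

```c
#include <stdint.h>

#define REG_NUMBER 16

/* Field names are illustrative, not taken from the subject. */
struct cursor {
    int     pc;                    /* position of the cursor in memory */
    int32_t regs[REG_NUMBER];      /* r1 ... r16, regs[0] holds r1 */
    int     carry;                 /* 0 or 1, checked by zjmp */
    int     last_live_cycle;       /* cycle of the most recent live */
};

/* Create the initial cursor of player `player_id`, placed at `start`. */
struct cursor new_cursor(int player_id, int start)
{
    struct cursor c = {0};         /* all registers 0, carry 0 (assumed) */

    c.pc = start;
    c.regs[0] = -player_id;        /* r1 = negated player number */
    return c;
}
```

For player 2, new_cursor(2, start) yields a cursor whose r1 is -2, which is the value live %-2 refers to.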

Operation ld

| Operation code | Operation name | Argument #1 | Argument #2 | Argument #3 |
|---|---|---|---|---|
| 2 | ld | T_DIR / T_IND | T_REG | |

Description

This operation loads a value into a register of the cursor. Its behavior depends on the type of the first argument:

  • How it works with the Argument #1 of type T_DIR

If the first argument is of type T_DIR, its value is loaded into the register as is.

Actions performed:

  1. Write the number from the first argument into the register passed as the second argument.

  2. If the written value is 0, set carry to 1; if the written value is not 0, set carry to 0.

  • How it works with the Argument #1 of type T_IND

If the first argument is of type T_IND, it represents an address.

If we receive argument of this type, we must truncate it with a modulo operation: <FIRST_ARGUMENT> % IDX_MOD.

What is IDX_MOD?

IDX_MOD is another constant from the file op.h. Its value is defined as (MEM_SIZE / 8), where MEM_SIZE is the size in bytes of the virtual machine's memory, in which the champions fight.

What is the constant IDX_MOD for? It limits the maximum distance a cursor can reach in memory. In the file op.h MEM_SIZE is defined as (4 * 1024), so IDX_MOD equals 512.

This means a cursor can reach at most 511 bytes forward or backward from its current position.

After argument of type T_IND has been truncated, we use it as a relative address in memory - how many bytes forward or backward relative to the current position of the cursor is the position we need.
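The truncation can be sketched in a few lines of C. The function name is hypothetical, and wrapping the result around the end of memory is an assumption of this sketch (the text above only describes the % IDX_MOD step).

```c
#define MEM_SIZE (4 * 1024)
#define IDX_MOD  (MEM_SIZE / 8)

/* Address referred to by an indirect argument `arg` for a cursor at `pc`.
 * In C99 and later, % truncates toward zero, so a negative `arg` yields a
 * negative offset and the sign of the direction is preserved. */
int idx_address(int pc, int arg)
{
    int addr = pc + arg % IDX_MOD;

    /* Assumption: memory is circular, so wrap into [0, MEM_SIZE). */
    return ((addr % MEM_SIZE) + MEM_SIZE) % MEM_SIZE;
}
```

For example, an argument of 517 behaves like 5 (517 % 512 = 5), and an argument of -600 behaves like -88.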

Actions performed:

  1. Calculate address: current position of the cursor + <FIRST_ARGUMENT> % IDX_MOD.

  2. Read four bytes starting from the obtained address.

  3. Write the value from step 2 into the register passed as the second argument.

  4. If the written value is 0, set carry to 1; if the written value is not 0, set carry to 0.

Why do we read exactly four bytes?

The register size and the direct value size are both defined as 4 in the file op.h:

# define REG_SIZE    4
# define DIR_SIZE    REG_SIZE

We go to the address calculated from the argument of type T_IND to read a value "as is", that is, a value of type T_DIR. We then have to save this value into the register. For the operation to succeed, the size of the number we read must match the size of the register.

Later we will also see that values can be written from a register to an address in memory. That is why the size of a number read from memory and the size of a register must match in both directions.

So we read as many bytes as the register can store.
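The T_IND variant of ld can be sketched as below. The function names are hypothetical, and two things are assumptions of the sketch rather than statements from the text: memory is circular, and a value is stored with its most significant byte first.

```c
#include <stdint.h>

#define MEM_SIZE (4 * 1024)
#define IDX_MOD  (MEM_SIZE / 8)
#define REG_SIZE 4

/* Read REG_SIZE bytes starting at `addr`, wrapping around the memory.
 * Assumption: most significant byte first. */
int32_t read_int32(const uint8_t *mem, int addr)
{
    uint32_t v = 0;

    for (int i = 0; i < REG_SIZE; i++) {
        int pos = (((addr + i) % MEM_SIZE) + MEM_SIZE) % MEM_SIZE;
        v = (v << 8) | mem[pos];
    }
    return (int32_t)v;
}

/* ld with a T_IND first argument: load the value found at
 * pc + arg % IDX_MOD into *reg and return the new carry. */
int ld_indirect(const uint8_t *mem, int pc, int arg, int32_t *reg)
{
    *reg = read_int32(mem, pc + arg % IDX_MOD);
    return *reg == 0;               /* carry is 1 only for a zero value */
}
```

The return value is the new carry, so the caller would store it in the cursor after the load.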

Operation st

| Operation code | Operation name | Argument #1 | Argument #2 | Argument #3 |
|---|---|---|---|---|
| 3 | st | T_REG | T_REG / T_IND | |

Description

This operation writes the value of the register passed as the first argument. The destination depends on the type of the second argument:

  • How it works with the Argument #2 of type T_REG

st    r7, r11

In this case the value of register r7 is written to register r11.

  • How it works with the Argument #2 of type T_IND

The type T_IND refers to memory addresses, so st works as follows:

  1. Truncate the indirect value by modulo: <SECOND_ARGUMENT> % IDX_MOD.

  2. Calculate the address: current position of the cursor + the truncated value from step 1.

  3. Write the value of the register passed as the first argument to the memory address calculated in the previous step.
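The steps above can be sketched as follows. The function name is hypothetical. As assumptions, the sketch treats memory as circular and stores the most significant byte first.

```c
#include <stdint.h>

#define MEM_SIZE (4 * 1024)
#define IDX_MOD  (MEM_SIZE / 8)
#define REG_SIZE 4

/* st with a T_IND second argument: write `value` at pc + arg % IDX_MOD.
 * Assumptions: circular memory, most significant byte first. */
void st_indirect(uint8_t *mem, int pc, int arg, int32_t value)
{
    int      addr = pc + arg % IDX_MOD;
    uint32_t v = (uint32_t)value;

    for (int i = 0; i < REG_SIZE; i++) {
        int pos = (((addr + i) % MEM_SIZE) + MEM_SIZE) % MEM_SIZE;
        mem[pos] = (uint8_t)(v >> (8 * (REG_SIZE - 1 - i)));
    }
}
```

Writing at pc 0 with argument -2 touches the last two bytes of memory and the first two, which is why each byte position is wrapped separately.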

Operation add

| Operation code | Operation name | Argument #1 | Argument #2 | Argument #3 |
|---|---|---|---|---|
| 4 | add | T_REG | T_REG | T_REG |

Description

All arguments of this operation are of the same type, T_REG.

  • How it works:
  1. Add the value of the register passed as the first argument to the value of the register passed as the second argument.

  2. Write the sum into the register passed as the third argument.

  3. If the written value is 0, set carry to 1; if the written value is not 0, set carry to 0.

Operation sub

| Operation code | Operation name | Argument #1 | Argument #2 | Argument #3 |
|---|---|---|---|---|
| 5 | sub | T_REG | T_REG | T_REG |

Description

All arguments of this operation are of the same type, T_REG.

  • How it works:
  1. Subtract the value of the register passed as the second argument from the value of the register passed as the first argument.

  2. Write the difference into the register passed as the third argument.

  3. If the written value is 0, set carry to 1; if the written value is not 0, set carry to 0.
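Both add and sub follow the same three steps, as this sketch shows. The function names are hypothetical, and register numbers are assumed to be validated by the caller.

```c
#include <stdint.h>

#define REG_NUMBER 16

/* regs[0] holds r1. a, b and dst are register numbers in 1..REG_NUMBER,
 * assumed to be validated by the caller. Both functions return the new
 * carry value. Unsigned arithmetic is used so that overflow wraps around
 * instead of being undefined behavior. */
int op_add(int32_t regs[REG_NUMBER], int a, int b, int dst)
{
    regs[dst - 1] = (int32_t)((uint32_t)regs[a - 1] + (uint32_t)regs[b - 1]);
    return regs[dst - 1] == 0;
}

int op_sub(int32_t regs[REG_NUMBER], int a, int b, int dst)
{
    regs[dst - 1] = (int32_t)((uint32_t)regs[a - 1] - (uint32_t)regs[b - 1]);
    return regs[dst - 1] == 0;
}
```

With r1 = 5 and r2 = -5, add r1, r2, r3 writes 0 and sets carry to 1, while sub r1, r2, r3 writes 10 and sets carry to 0.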

Operation and

| Operation code | Operation name | Argument #1 | Argument #2 | Argument #3 |
|---|---|---|---|---|
| 6 | and | T_REG / T_DIR / T_IND | T_REG / T_DIR / T_IND | T_REG |

Description

BITWISE AND - Wikipedia

  • How it works:

It performs a bitwise AND on the values of the first two arguments and writes the result into the register passed as the third argument.

If the written value is 0, set carry to 1; if the written value is not 0, set carry to 0.

First and second arguments can be of different types. Here is how to get values we need:

  • Argument #1 or Argument #2 — T_REG

In this case we take the value of the register passed as the argument.

  • Argument #1 or Argument #2 — T_DIR

In this case we take the value passed as argument.

  • Argument #1 or Argument #2 — T_IND

Calculate the address where to read the value from: current cursor position + <ARGUMENT> % IDX_MOD.

Read four bytes from the memory starting with the address calculated in the previous step. It will be the value we need.
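The three ways of obtaining an argument value can be gathered in one function. Everything named here is hypothetical. The sketch assumes circular memory and most-significant-byte-first storage, and it uses its own enum instead of the T_REG, T_DIR and T_IND values from op.h.

```c
#include <stdint.h>

#define MEM_SIZE (4 * 1024)
#define IDX_MOD  (MEM_SIZE / 8)

enum arg_kind { ARG_REG, ARG_DIR, ARG_IND };   /* hypothetical tags */

/* Read four bytes at `addr`, wrapping around the memory (assumption:
 * most significant byte first). */
static int32_t read_int32(const uint8_t *mem, int addr)
{
    uint32_t v = 0;

    for (int i = 0; i < 4; i++) {
        int pos = (((addr + i) % MEM_SIZE) + MEM_SIZE) % MEM_SIZE;
        v = (v << 8) | mem[pos];
    }
    return (int32_t)v;
}

/* Value of one argument, depending on its type. regs[0] holds r1. */
int32_t arg_value(enum arg_kind kind, int32_t arg,
                  const int32_t *regs, const uint8_t *mem, int pc)
{
    switch (kind) {
    case ARG_REG:
        return regs[arg - 1];               /* arg is a register number */
    case ARG_DIR:
        return arg;                         /* the value as is */
    default:
        return read_int32(mem, pc + arg % IDX_MOD);
    }
}

/* and: store a & b in *dst and return the new carry. */
int op_and(int32_t a, int32_t b, int32_t *dst)
{
    *dst = a & b;
    return *dst == 0;
}
```

Replacing & with | or ^ in a copy of the last function gives the same scheme for or and xor.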

Operation or

| Operation code | Operation name | Argument #1 | Argument #2 | Argument #3 |
|---|---|---|---|---|
| 7 | or | T_REG / T_DIR / T_IND | T_REG / T_DIR / T_IND | T_REG |

Description

BITWISE OR - Wikipedia

  • How it works:

It works the same way as the operation and described above, except that it performs a bitwise OR.

Operation xor

| Operation code | Operation name | Argument #1 | Argument #2 | Argument #3 |
|---|---|---|---|---|
| 8 | xor | T_REG / T_DIR / T_IND | T_REG / T_DIR / T_IND | T_REG |

Description

BITWISE XOR - Wikipedia

  • How it works:

It works the same way as the operation and described above, except that it performs a bitwise XOR.

Operation zjmp

| Operation code | Operation name | Argument #1 | Argument #2 | Argument #3 |
|---|---|---|---|---|
| 9 | zjmp | T_DIR | | |

Description

This is the operation that is affected by the value of the carry flag.

  • How it works:

If carry is equal to 1, this operation performs a 'jump'. It moves cursor to the address: current position + <FIRST_ARGUMENT> % IDX_MOD.

This operation allows us to jump back and forth to different places in memory and not to execute everything by order.

If carry is equal to 0, 'jump' is not performed.

Operation ldi

| Operation code | Operation name | Argument #1 | Argument #2 | Argument #3 |
|---|---|---|---|---|
| 10 | ldi | T_REG / T_DIR / T_IND | T_REG / T_DIR | T_REG |

Description

This operation saves a value into the register passed as the third argument.

  • How it works:

To get the value, it reads four bytes from memory. The address of the bytes to read is calculated as follows:

current position + (<VALUE_OF_FIRST_ARGUMENT> + <VALUE_OF_SECOND_ARGUMENT>) % IDX_MOD.

First and second arguments can be of different types. Here is how to get values we need:

  • Argument #1 or Argument #2 — T_REG

The value of the register whose number is passed as the argument

  • Argument #1 or Argument #2 — T_DIR

Value passed as argument

  • Argument #1 — T_IND

Read four bytes from the address: current position + <FIRST_ARGUMENT> % IDX_MOD.

Operation sti

| Operation code | Operation name | Argument #1 | Argument #2 | Argument #3 |
|---|---|---|---|---|
| 11 | sti | T_REG | T_REG / T_DIR / T_IND | T_REG / T_DIR |

Description

This operation writes the value of the register passed as the first argument to memory.

  • How it works:

Address in memory to write the value to: current position + (<VALUE_OF_SECOND_ARGUMENT> + <VALUE_OF_THIRD_ARGUMENT>) % IDX_MOD.

How to get the argument values is described above, in the section on the operation ldi.

Operation fork

| Operation code | Operation name | Argument #1 | Argument #2 | Argument #3 |
|---|---|---|---|---|
| 12 | fork | T_DIR | | |

Description

The operation fork creates a duplicate of the current cursor (process) and places it at the address: current position + <FIRST_ARGUMENT> % IDX_MOD.

The duplicate cursor is identical to the original one except for its position.
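A sketch of fork under two assumptions: the target address is relative to the current position of the parent cursor, as for the other operations, and memory is circular. The structure and its field names are illustrative.

```c
#include <stdint.h>

#define MEM_SIZE   (4 * 1024)
#define IDX_MOD    (MEM_SIZE / 8)
#define REG_NUMBER 16

/* Illustrative cursor layout, field names are not from the subject. */
struct cursor {
    int     pc;
    int32_t regs[REG_NUMBER];
    int     carry;
    int     last_live_cycle;
};

/* Return a copy of `parent` placed at parent->pc + arg % IDX_MOD. */
struct cursor fork_cursor(const struct cursor *parent, int arg)
{
    struct cursor child = *parent;  /* copies registers, carry, live cycle */
    int           addr = parent->pc + arg % IDX_MOD;

    child.pc = ((addr % MEM_SIZE) + MEM_SIZE) % MEM_SIZE;
    return child;
}
```

Structure assignment copies every field, so only pc has to be changed afterwards.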

Operation lld

| Operation code | Operation name | Argument #1 | Argument #2 | Argument #3 |
|---|---|---|---|---|
| 13 | lld | T_DIR / T_IND | T_REG | |

Description

It is a more powerful version of operation ld (see above).

The only difference between them is that in case of T_IND type for the first argument, we have to read four bytes from the address: current position + <FIRST_ARGUMENT>. No modulo truncation required.

Problems of the original virtual machine

Unfortunately, the original corewar VM malfunctions here and reads two bytes instead of four. Perhaps this bug has the same explanation as the problems in the provided files:

... we might have mistaken a bottle of water for a bottle of vodka.

With the argument of type T_DIR it's identical to operation ld.

Operation lldi

| Operation code | Operation name | Argument #1 | Argument #2 | Argument #3 |
|---|---|---|---|---|
| 14 | lldi | T_REG / T_DIR / T_IND | T_REG / T_DIR | T_REG |

Description

It is a more powerful version of operation ldi (see above).

We have to read four bytes from the address: current position + (<VALUE_OF_FIRST_ARGUMENT> + <VALUE_OF_SECOND_ARGUMENT>). No modulo truncation required.

To get the value of an argument of type T_IND we still need the modulo truncation:

<VALUE_OF_FIRST_ARGUMENT>: read four bytes from the address current position + <FIRST_ARGUMENT> % IDX_MOD
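The difference between ldi and lldi is only in the final address calculation, which the sketch below puts side by side. The function names are hypothetical, the argument values are assumed to be already resolved, and memory is assumed to be circular.

```c
#include <stdint.h>

#define MEM_SIZE (4 * 1024)
#define IDX_MOD  (MEM_SIZE / 8)

/* Assumption: memory is circular, so wrap into [0, MEM_SIZE). */
static int wrap(long long addr)
{
    return (int)(((addr % MEM_SIZE) + MEM_SIZE) % MEM_SIZE);
}

/* ldi: the sum of the two values is truncated with % IDX_MOD. */
int ldi_address(int pc, int32_t first, int32_t second)
{
    long long sum = (long long)first + second;   /* avoid int overflow */

    return wrap(pc + sum % IDX_MOD);
}

/* lldi: the sum is used as is, without % IDX_MOD. */
int lldi_address(int pc, int32_t first, int32_t second)
{
    long long sum = (long long)first + second;

    return wrap(pc + sum);
}
```

With pc = 100 and values 500 and 20, ldi reads from address 108 (520 % 512 = 8), while lldi reads from address 620.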

Operation lfork

| Operation code | Operation name | Argument #1 | Argument #2 | Argument #3 |
|---|---|---|---|---|
| 15 | lfork | T_DIR | | |

Description

It is a more powerful version of operation fork (see above).

The argument is used as is; no modulo truncation is applied.

Operation aff

| Operation code | Operation name | Argument #1 | Argument #2 | Argument #3 |
|---|---|---|---|---|
| 16 | aff | T_REG | | |

Description

This operation takes the value of the register passed as the argument, casts it to the type char and prints it to the standard output.

(char)(value)

aff in original corewar

In the original corewar VM aff is switched off by default. To see its output we have to use the flag -a.

Operations table

| Code | Name | Argument #1 | Argument #2 | Argument #3 | Changes carry | Description |
|---|---|---|---|---|---|---|
| 1 | live | T_DIR | | | No | alive |
| 2 | ld | T_DIR / T_IND | T_REG | | Yes | load |
| 3 | st | T_REG | T_REG / T_IND | | No | store |
| 4 | add | T_REG | T_REG | T_REG | Yes | addition |
| 5 | sub | T_REG | T_REG | T_REG | Yes | subtraction |
| 6 | and | T_REG / T_DIR / T_IND | T_REG / T_DIR / T_IND | T_REG | Yes | bitwise AND (&) |
| 7 | or | T_REG / T_DIR / T_IND | T_REG / T_DIR / T_IND | T_REG | Yes | bitwise OR (\|) |
| 8 | xor | T_REG / T_DIR / T_IND | T_REG / T_DIR / T_IND | T_REG | Yes | bitwise XOR (^) |
| 9 | zjmp | T_DIR | | | No | jump if non-zero |
| 10 | ldi | T_REG / T_DIR / T_IND | T_REG / T_DIR | T_REG | No | load index |
| 11 | sti | T_REG | T_REG / T_DIR / T_IND | T_REG / T_DIR | No | store index |
| 12 | fork | T_DIR | | | No | fork |
| 13 | lld | T_DIR / T_IND | T_REG | | Yes | long load |
| 14 | lldi | T_REG / T_DIR / T_IND | T_REG / T_DIR | T_REG | Yes | long load index |
| 15 | lfork | T_DIR | | | No | long fork |
| 16 | aff | T_REG | | | No | aff |

Cycles before execution (cycles to wait)

One more important parameter of an operation is its cycles to wait.

This is the number of cycles a process waits before it executes the operation.

For example, if a cursor (process) lands on the operation fork, it must stay idle for 800 cycles before it can actually execute the operation.

The operation ld, by contrast, stalls the process for only five cycles.

This parameter shapes the game mechanics: the most effective and useful operations have the highest cost.

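| Code | Name | Cycles to wait |
|---|---|---|
| 1 | live | 10 |
| 2 | ld | 5 |
| 3 | st | 5 |
| 4 | add | 10 |
| 5 | sub | 10 |
| 6 | and | 6 |
| 7 | or | 6 |
| 8 | xor | 6 |
| 9 | zjmp | 20 |
| 10 | ldi | 25 |
| 11 | sti | 25 |
| 12 | fork | 800 |
| 13 | lld | 10 |
| 14 | lldi | 50 |
| 15 | lfork | 1000 |
| 16 | aff | 2 |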
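In a virtual machine the two tables above could be kept as one lookup array. This is a sketch only: the structure, its field names and find_op are hypothetical and are not the layout used in op.c. The values are copied from the tables in this chapter.

```c
#include <stddef.h>

/* Hypothetical record combining the tables above. */
struct op_info {
    const char *name;
    int         code;
    int         cycles;         /* cycles to wait before execution */
    int         changes_carry;  /* 1 if the operation modifies carry */
};

static const struct op_info OPS[] = {
    { "live",   1,   10, 0 }, { "ld",     2,    5, 1 },
    { "st",     3,    5, 0 }, { "add",    4,   10, 1 },
    { "sub",    5,   10, 1 }, { "and",    6,    6, 1 },
    { "or",     7,    6, 1 }, { "xor",    8,    6, 1 },
    { "zjmp",   9,   20, 0 }, { "ldi",   10,   25, 0 },
    { "sti",   11,   25, 0 }, { "fork",  12,  800, 0 },
    { "lld",   13,   10, 1 }, { "lldi",  14,   50, 1 },
    { "lfork", 15, 1000, 0 }, { "aff",   16,    2, 0 },
};

/* Look up an operation by its code; returns NULL for an unknown code. */
const struct op_info *find_op(int code)
{
    if (code < 1 || code > (int)(sizeof(OPS) / sizeof(OPS[0])))
        return NULL;
    return &OPS[code - 1];
}
```

A cursor that lands on a byte with the value 12 would call find_op(12), see 800 cycles, and wait that long before executing fork.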

THE COMPLETE OPERATIONS TABLE