Skip to content

DS ( Data Structure ) AST ( Abstract Syntax Tree )

License

Notifications You must be signed in to change notification settings

erkobridee/ds-ast

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ds-ast

DS ( Data Structure ) AST (Abstract Syntax Tree)

Tip

This knowledge is really useful and it helps whenever you need to process any kind text.

Implementations

Definitions

Lexer

A lexer transforms a sequence of characters into a sequence of tokens.

A compiler lexer is a crucial component in the compilation process of a programming language. It is responsible for breaking down the source code into smaller, meaningful units called tokens or lexemes. These tokens are then fed into the parser, which constructs the abstract syntax tree (AST) of the program.

enum TokenType {
  TOKEN_TYPE = 'TOKEN_TYPE',
  ...
}

type TTokenSpec = [RegExp, TokenType];


/**
 * The line and column values are useful for debugging propose and also
 * to have a better error messaging that points out where the error happens
 *
 * The start and end values are useful to UI implementation that informantion
 * enables to select the given text by its start and end cursor position
 *
 * An example of the start and end cursor position to highlight text could be
 * seen at: https://astexplorer.net/
 */
interface TokenLocation {
  line: number;
  column: number;

  start: number;
  end: number;
}

interface Token {
  type: TokenType;
  lexeme: string;

  /**
   *  This information is useful for debugging and UI implementation of a code editor
   */
  location?: TokenLocation;
}

Key Features of a Compiler Lexer

  • Tokenization: The lexer converts the source code into tokens, which are the smallest syntactic units of the language. These tokens can be identifiers, keywords, literals, operators, or other special characters.

  • Regular Expressions: The lexer uses regular expressions to define the patterns for identifying these tokens. This approach allows for efficient and flexible token recognition.

  • State Transition Table: The lexer can be implemented using a state transition table, which is a table-driven approach that directly jumps to follow-up states via goto statements. This approach can produce faster lexers than hand-coded ones.

Parser

A parser is a software component that takes input data (typically text) and builds a data structure, often a parse tree or abstract syntax tree (AST), giving a structural representation of the input while checking for correct syntax. It is a crucial part of the compilation process, particularly in compiler design.

RDP - Recursive Descent Parser

AST

An Abstract Syntax Tree (AST) is a data structure used in computer science to represent the structure of a program or code snippet. It is a tree-like representation of the source code, abstracting away the syntax and semantics of the programming language. The AST is designed to preserve essential information such as variable types, the location of each declaration, the order of executable statements, left and right components of binary operations, and identifiers and their assigned values.

Tools

References

Theory

The Language of languages

BNF - Backus-Naur Form

EBNF - Extended Backus-Naur Form

Compiler

Lexer

Parser

AST

Implementation

ANTLR - ANother Tool for Language Recognition