New % syntax for directives, clean-up parse.par
- Directives now start with `%`, which is closer to yacc conventions;
  the old `#` style is still allowed, but should no longer be used.
- `parse.par` cleaned up by removing unnecessary parts, e.g. userdef options
- Examples converted into new `%`-directive format
phorward committed Nov 1, 2023
1 parent 69a3b4d commit 7e1ff59
Showing 14 changed files with 8,250 additions and 16,549 deletions.
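
The `#`-to-`%` conversion applied to the examples in this commit can be sketched as a small script. This is a minimal sketch and not part of the commit: the function name `convert` is hypothetical, the keyword list is taken only from the directives visible in the diff below (so it is likely incomplete), and the script assumes that target-language code is always enclosed in `[* ... *]`, so that lines such as `#include <stdio.h>` inside such a block stay untouched.

```python
import re

# Directive keywords seen in this commit's examples (assumed to be incomplete).
KEYWORDS = (
    "whitespaces", "lexeme", "default action", "case insensitive strings",
    "left", "prologue", "prefix", "!language", "!mode",
)

# Matches a leading '#' only when it is followed by a known directive keyword.
OLD_DIRECTIVE = re.compile(
    r"^(\s*)#(?=(?:%s)\b)" % "|".join(re.escape(k) for k in KEYWORDS)
)


def convert(text):
    """Rewrite old-style '#' directives to '%' outside of [* ... *] blocks."""
    out = []
    in_code = False  # True while inside an unclosed [* ... *] code block
    for line in text.splitlines(keepends=True):
        if not in_code:
            line = OLD_DIRECTIVE.sub(r"\1%", line)
        # Track whether this line leaves a code block open or closes one.
        opens, closes = line.count("[*"), line.count("*]")
        if opens > closes:
            in_code = True
        elif closes > opens:
            in_code = False
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    sample = "#prologue [*\n#include <stdio.h>\n*];\n#left '+' '-';\n"
    print(convert(sample), end="")
```

Restricting the rewrite to known keywords is deliberately conservative: any `#` that does not introduce a listed directive is left alone, so the result should still be reviewed by hand.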
3 changes: 3 additions & 0 deletions Makefile.gnu
@@ -33,6 +33,9 @@ clean:
src/proto.h:
lib/pproto src/*.c | awk "/int _parse/ { next } { print }" >$@

src/parse.c src/parse.h: src/parse.par
unicc -o src/parse src/parse.par

make_install:
cp Makefile.gnu Makefile

53 changes: 27 additions & 26 deletions README.md
@@ -16,9 +16,9 @@

**unicc** is a parser generator that compiles an extended grammar definition into program source code that parses the described grammar. Since UniCC is target language independent, it can be configured via template definition files to generate parsers in any programming language.

UniCC natively supports the programming languages **C**, **C++**, **Python** and **JavaScript**. Parse tables can also be generated in **JSON** and **XML**.
UniCC natively supports the programming languages **C**, **C++**, **Python** and **JavaScript**. Parse tables can also be generated to **JSON**. Parsers for other programming languages can be easily adapted.

UniCC can generate both scannerless parsers and parsers with a separate scanner. The more powerful scannerless parsing is the default and allows the barrier between the grammar and its tokens to be broken, leaving the tokens under the full control of the context-free grammar. Scannerless parsing requires that the provided grammar is rewritten internally according to the whitespace and lexeme settings.
UniCC is capable to generate both scannerless parsers and parsers with a separate scanner. The more powerful scannerless parsing is the default and allows the barrier between the grammar and its tokens to be broken, leaving the tokens under the full control of the context-free grammar. Scannerless parsing requires that the provided grammar is rewritten internally according to the whitespace and lexeme settings.

## Examples

@@ -27,10 +27,10 @@ Below is the full definition of a simple, universal grammar example that can be
This example uses the automatic abstract syntax tree construction syntax to define nodes and leafs of the resulting syntax tree.

```unicc
#whitespaces ' \t';
%whitespaces ' \t';
#left '+' '-';
#left '*' '/';
%left '+' '-';
%left '*' '/';
@int '0-9'+ = int;
@@ -53,17 +53,17 @@ add
int (1337)
```

Next is a (more complex) version of the four-function arithmetic syntax including their calculation semantics, for integer values. In this example, the scannerless parsing capabilities of UniCC are used to parse the **int** value from its single characters, so the symbol **int** is configured to be handled as a `lexeme`, which influences the behavior how whitespace is handled.
Next is a (more complex) version of the four-function arithmetic syntax including their calculation semantics, for integer values. In this example, the scannerless parsing capabilities of UniCC are used to parse the **int** value from its single characters, so the symbol **int** is configured to be handled as a `lexeme`, which influences the behavior of how whitespace is handled.

```unicc
#!language C; // <- target language!
%!language C; // <- target language!
#whitespaces ' \t';
#lexeme int;
#default action [* @@ = @1 *];
%whitespaces ' \t';
%lexeme int;
%default action [* @@ = @1 *];
#left '+' '-';
#left '*' '/';
%left '+' '-';
%left '*' '/';
calc$ : expr [* printf( "= %d\n", @expr ) *]
;
@@ -81,12 +81,16 @@ int : '0-9' [* @@ = @1 - '0' *]
;
```

To build and run this example, run the following commands
To build this example, run the following commands

```
```bash
$ unicc expr.par
$ cc -o expr expr.c
./expr -sl
```

Afterwards, you can run the expression parser like this
```bash
$ ./expr -sl
42 * 23 + 1337
= 2303
```
@@ -103,12 +107,9 @@ UniCC provides the following features and tools:
- Generates standalone (dependency-less) parsers in
- C
- C++
- Python 2 (deprecated)
- Python 3
- Python (>= 2.7, tested until 3.11)
- JavaScript (ES2018)
- Provides facilities to generate parse tables as
- JSON
- XML (deprecated)
- Provides facilities to generate parse tables into JSON
- Scannerless parser supported by default
- Full Unicode processing built-in
- Grammar prototyping features
@@ -127,16 +128,16 @@ The [UniCC User's Manual](http://downloads.phorward-software.com/unicc/unicc.pdf

UniCC can be built and installed like any GNU-style program, with

```sh
./configure
make
make install
```bash
$ ./configure
$ make
$ make install
```

Alternatively, the dev-toolchain can be used, by just calling on any recent Linux system.

```sh
make -f Makefile.gnu
```bash
$ make -f Makefile.gnu
```

## License
14 changes: 7 additions & 7 deletions examples/bas.par
@@ -1,13 +1,13 @@
//Some grammar-related directives
#whitespaces ' \t';
#lexeme integer;
#default action [* @@ = @1; *];
#case insensitive strings on;
%whitespaces ' \t';
%lexeme integer;
%default action [* @@ = @1; *];
%case insensitive strings on;

#left '+' '-';
#left '*' '/';
%left '+' '-';
%left '*' '/';

#prologue
%prologue
[*
#include <stdlib.h>
#include <stdio.h>
10 changes: 5 additions & 5 deletions examples/c.par
@@ -8,11 +8,11 @@
Rewritten for UniCC by Jan Max Meyer, 2011
*/

#!mode scannerless ;
#!language "C" ;
%!mode scannerless ;
%!language "C" ;

// Parser Configuration
#whitespaces @white ;
%whitespaces @white ;

@white ' \t\v\r\n\f'+
| "/*" .* "*/"
@@ -221,7 +221,7 @@ type_specifier
| enum_specifier
| TYPE_NAME
;



struct_or_union_specifier
@@ -444,6 +444,6 @@ function_definition
IDENTIFIER
TYPE_NAME
: @IDENTIFIER ;

/* Decision has to be done semantically if
IDENTIFIER or TYPE_NAME is the case. */
4 changes: 2 additions & 2 deletions examples/dates.par
@@ -1,5 +1,5 @@
#whitespaces ' \t';
#lexeme integer title;
%whitespaces ' \t';
%lexeme integer title;

appointment$ : date title
| title date
6 changes: 3 additions & 3 deletions examples/expr.ast.par
@@ -1,7 +1,7 @@
#whitespaces ' \t';
%whitespaces ' \t';

#left '+' '-';
#left '*' '/';
%left '+' '-';
%left '*' '/';

@int '0-9'+ = int;

12 changes: 6 additions & 6 deletions examples/expr.c.par
@@ -1,11 +1,11 @@
#!language C;
%!language C;

#whitespaces ' \t';
#lexeme int;
#default action [* @@ = @1 *];
%whitespaces ' \t';
%lexeme int;
%default action [* @@ = @1 *];

#left '+' '-';
#left '*' '/';
%left '+' '-';
%left '*' '/';

calc$ : expr [* printf( "= %d\n", @expr ) *]
;
14 changes: 7 additions & 7 deletions examples/expr.cpp.par
@@ -1,13 +1,13 @@
#!language "C++";
%!language "C++";

#whitespaces ' \t';
#lexeme int;
#default action [* @@ = @1 *];
%whitespaces ' \t';
%lexeme int;
%default action [* @@ = @1 *];

#left '+' '-';
#left '*' '/';
%left '+' '-';
%left '*' '/';

#prologue [*
%prologue [*
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
12 changes: 6 additions & 6 deletions examples/expr.js.par
@@ -1,11 +1,11 @@
#!language javascript;
%!language javascript;

#whitespaces ' \t';
#lexeme int;
#default action [* @@ = @1 *];
%whitespaces ' \t';
%lexeme int;
%default action [* @@ = @1 *];

#left '+' '-';
#left '*' '/';
%left '+' '-';
%left '*' '/';

calc$ : expr [* console.log("= %d", @expr);
@@ = @expr;
12 changes: 6 additions & 6 deletions examples/expr.py.par
@@ -1,11 +1,11 @@
#!language Python;
%!language Python;

#whitespaces ' \t';
#lexeme int;
#default action [*@@ = @1*];
%whitespaces ' \t';
%lexeme int;
%default action [*@@ = @1*];

#left '+' '-';
#left '*' '/';
%left '+' '-';
%left '*' '/';

calc$ : expr [*print("= %d" % @expr)*]
;
14 changes: 7 additions & 7 deletions examples/xpl.par
@@ -1,22 +1,22 @@
//Meta information
#prefix "xpl";
%prefix "xpl";

//Precedence and associativity
#left "=";
%left "=";

#left "=="
%left "=="
"!="
"<="
">="
'>'
'<'
;

#left '+'
%left '+'
'-'
;

#left '*'
%left '*'
'/'
;

@@ -39,11 +39,11 @@ real : real_integer '.' real_fraction
real_integer : real_integer '0-9'
| '0-9'
;

real_fraction : real_fraction '0-9'
| '0-9'
;

//Whitespace grammar construct
#whitespaces whitespace
;