New % syntax for directives, clean-up parse.par

- Directives now start with `%`, which is more convenient to yacc; the old style `#` is still allowed, but should not be used anymore.
- `parse.par` cleaned up by removing unnecessary parts, e.g. userdef options
- Examples converted into new `%`-directive format
phorward · Nov 1, 2023 · 7e1ff59
1 parent 69a3b4d
commit 7e1ff59

Showing 14 changed files with 8,250 additions and 16,549 deletions.
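
The `#` to `%` rename described in the commit message is mechanical, so grammar files outside this repository could be migrated with a small script. The following is only a minimal sketch, not part of UniCC: the names `DIRECTIVE` and `convert_directives` are hypothetical, and it assumes that directives start a line and that target code (such as `#include` lines in a prologue) sits inside `[* ... *]` blocks, as in the examples changed below. It does not handle a `#` inside strings or comments.

```python
import re

# Hypothetical helper, not part of UniCC.
# Matches a '#' at the start of a line (after optional indentation) that is
# followed by a directive name or by '!' (as in '#!language').
DIRECTIVE = re.compile(r"^(\s*)#(?=[!A-Za-z])")


def convert_directives(text: str) -> str:
    """Rewrite leading '#' directives to '%', skipping [* ... *] code blocks."""
    out = []
    in_code = False  # True while inside a [* ... *] code block
    for line in text.splitlines(keepends=True):
        if not in_code:
            line = DIRECTIVE.sub(r"\1%", line)
        # Update the code-block state from the last delimiter on this line.
        opens = line.rfind("[*")
        closes = line.rfind("*]")
        if opens > closes:
            in_code = True
        elif closes > opens:
            in_code = False
        out.append(line)
    return "".join(out)
```

Applied to the contents of a grammar file, this would turn `#whitespaces ' \t';` into `%whitespaces ' \t';` while leaving a `#include <stdio.h>` inside a `[* ... *]` prologue untouched. Review the result by hand before committing it.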
diff --git a/Makefile.gnu b/Makefile.gnu
@@ -33,6 +33,9 @@ clean:
 src/proto.h:
 	lib/pproto src/*.c | awk "/int _parse/ { next } { print }" >$@
 
+src/parse.c src/parse.h: src/parse.par
+	unicc -o src/parse src/parse.par
+
 make_install:
 	cp Makefile.gnu Makefile
 

diff --git a/README.md b/README.md
@@ -16,9 +16,9 @@
 
 **unicc** is a parser generator that compiles an extended grammar definition into program source code that parses the described grammar. Since UniCC is target language independent, it can be configured via template definition files to generate parsers in any programming language.
 
-UniCC natively supports the programming languages **C**, **C++**, **Python** and **JavaScript**. Parse tables can also be generated in **JSON** and **XML**.
+UniCC natively supports the programming languages **C**, **C++**, **Python** and **JavaScript**. Parse tables can also be generated to **JSON**. Parsers for other programming languages can be easily adapted.
 
-UniCC can generate both scannerless parsers and parsers with a separate scanner. The more powerful scannerless parsing is the default and allows the barrier between the grammar and its tokens to be broken, leaving the tokens under the full control of the context-free grammar. Scannerless parsing requires that the provided grammar is rewritten internally according to the whitespace and lexeme settings.
+UniCC is capable to generate both scannerless parsers and parsers with a separate scanner. The more powerful scannerless parsing is the default and allows the barrier between the grammar and its tokens to be broken, leaving the tokens under the full control of the context-free grammar. Scannerless parsing requires that the provided grammar is rewritten internally according to the whitespace and lexeme settings.
 
 ## Examples
 
@@ -27,10 +27,10 @@ Below is the full definition of a simple, universal grammar example that can be
 This example uses the automatic abstract syntax tree construction syntax to define nodes and leafs of the resulting syntax tree.
 
 ```unicc
-#whitespaces    ' \t';
+%whitespaces    ' \t';
 
-#left           '+' '-';
-#left           '*' '/';
+%left           '+' '-';
+%left           '*' '/';
 
 @int            '0-9'+           = int;
 
@@ -53,17 +53,17 @@ add
  int (1337)
 ```
 
-Next is a (more complex) version of the four-function arithmetic syntax including their calculation semantics, for integer values. In this example, the scannerless parsing capabilities of UniCC are used to parse the **int** value from its single characters, so the symbol **int** is configured to be handled as a `lexeme`, which influences the behavior how whitespace is handled.
+Next is a (more complex) version of the four-function arithmetic syntax including their calculation semantics, for integer values. In this example, the scannerless parsing capabilities of UniCC are used to parse the **int** value from its single characters, so the symbol **int** is configured to be handled as a `lexeme`, which influences the behavior of how whitespace is handled.
 
 ```unicc
-#!language      C;	// <- target language!
+%!language      C;	// <- target language!
 
-#whitespaces    ' \t';
-#lexeme         int;
-#default action [* @@ = @1 *];
+%whitespaces    ' \t';
+%lexeme         int;
+%default action [* @@ = @1 *];
 
-#left           '+' '-';
-#left           '*' '/';
+%left           '+' '-';
+%left           '*' '/';
 
 calc$           : expr                 [* printf( "= %d\n", @expr ) *]
                 ;
@@ -81,12 +81,16 @@ int             : '0-9'                [* @@ = @1 - '0' *]
                 ;
 ```
 
-To build and run this example, run the following commands
+To build this example, run the following commands
 
-```
+```bash
 $ unicc expr.par
 $ cc -o expr expr.c
-./expr -sl
+```
+
+Afterwards, you can run the expression parser like this
+```bash
+$ ./expr -sl
 42 * 23 + 1337
 = 2303
 ```
@@ -103,12 +107,9 @@ UniCC provides the following features and tools:
 - Generates standalone (dependency-less) parsers in
   - C
   - C++
-  - Python 2 (deprecated)
-  - Python 3
+  - Python (>= 2.7, tested until 3.11)
   - JavaScript (ES2018)
-- Provides facilities to generate parse tables as
-  - JSON
-  - XML (deprecated)
+- Provides facilities to generate parse tables into JSON
 - Scannerless parser supported by default
 - Full Unicode processing built-in
 - Grammar prototyping features
@@ -127,16 +128,16 @@ The [UniCC User's Manual](http://downloads.phorward-software.com/unicc/unicc.pdf
 
 UniCC can be build and installed like any GNU-style program, with
 
-```sh
-./configure
-make
-make install
+```bash
+$ ./configure
+$ make
+$ make install
 ```
 
 Alternatively, the dev-toolchain can be used, by just calling on any recent Linux system.
 
-```sh
-make -f Makefile.gnu
+```bash
+$ make -f Makefile.gnu
 ```
 
 ## License

diff --git a/examples/bas.par b/examples/bas.par
@@ -1,13 +1,13 @@
 //Some grammar-related directives
-#whitespaces                ' \t';
-#lexeme                     integer;
-#default action             [* @@ = @1; *];
-#case insensitive strings   on;
+%whitespaces                ' \t';
+%lexeme                     integer;
+%default action             [* @@ = @1; *];
+%case insensitive strings   on;
 
-#left                       '+' '-';
-#left                       '*' '/';
+%left                       '+' '-';
+%left                       '*' '/';
 
-#prologue
+%prologue
 [*
 #include <stdlib.h>
 #include <stdio.h>

diff --git a/examples/c.par b/examples/c.par
@@ -8,11 +8,11 @@
 	Rewritten for UniCC by Jan Max Meyer, 2011
 */
 
-#!mode scannerless ;
-#!language	"C" ;
+%!mode scannerless ;
+%!language	"C" ;
 
 // Parser Configuration
-#whitespaces 			@white ;
+%whitespaces 			@white ;
 
 @white					' \t\v\r\n\f'+
 						| "/*" .* "*/"
@@ -221,7 +221,7 @@ type_specifier
 	| enum_specifier
 	| TYPE_NAME
 	;
-	
+
 
 
 struct_or_union_specifier
@@ -444,6 +444,6 @@ function_definition
 IDENTIFIER
 TYPE_NAME
 	: 	@IDENTIFIER ;
-	
+
 	/* 	Decision has to be done semantically if
 		IDENTIFIER or TYPE_NAME is the case. */
diff --git a/examples/dates.par b/examples/dates.par
@@ -1,5 +1,5 @@
-#whitespaces   ' \t';
-#lexeme        integer title;
+%whitespaces   ' \t';
+%lexeme        integer title;
 
 appointment$   : date title
                | title date

diff --git a/examples/expr.ast.par b/examples/expr.ast.par
@@ -1,7 +1,7 @@
-#whitespaces    ' \t';
+%whitespaces    ' \t';
 
-#left           '+' '-';
-#left           '*' '/';
+%left           '+' '-';
+%left           '*' '/';
 
 @int            '0-9'+           = int;
 

diff --git a/examples/expr.c.par b/examples/expr.c.par
@@ -1,11 +1,11 @@
-#!language      C;
+%!language      C;
 
-#whitespaces    ' \t';
-#lexeme         int;
-#default action [* @@ = @1 *];
+%whitespaces    ' \t';
+%lexeme         int;
+%default action [* @@ = @1 *];
 
-#left           '+' '-';
-#left           '*' '/';
+%left           '+' '-';
+%left           '*' '/';
 
 calc$           : expr                 [* printf( "= %d\n", @expr ) *]
                 ;

diff --git a/examples/expr.cpp.par b/examples/expr.cpp.par
@@ -1,13 +1,13 @@
-#!language      "C++";
+%!language      "C++";
 
-#whitespaces    ' \t';
-#lexeme         int;
-#default action	[* @@ = @1 *];
+%whitespaces    ' \t';
+%lexeme         int;
+%default action	[* @@ = @1 *];
 
-#left           '+' '-';
-#left           '*' '/';
+%left           '+' '-';
+%left           '*' '/';
 
-#prologue		[*
+%prologue		[*
     #include <stdlib.h>
     #include <stdio.h>
     #include <string.h>

diff --git a/examples/expr.js.par b/examples/expr.js.par
@@ -1,11 +1,11 @@
-#!language     javascript;
+%!language     javascript;
 
-#whitespaces    ' \t';
-#lexeme         int;
-#default action	[* @@ = @1 *];
+%whitespaces    ' \t';
+%lexeme         int;
+%default action	[* @@ = @1 *];
 
-#left           '+' '-';
-#left           '*' '/';
+%left           '+' '-';
+%left           '*' '/';
 
 calc$           : expr                 [* console.log("= %d", @expr);
                                             @@ = @expr;

diff --git a/examples/expr.py.par b/examples/expr.py.par
@@ -1,11 +1,11 @@
-#!language      Python;
+%!language      Python;
 
-#whitespaces    ' \t';
-#lexeme         int;
-#default action	[*@@ = @1*];
+%whitespaces    ' \t';
+%lexeme         int;
+%default action	[*@@ = @1*];
 
-#left           '+' '-';
-#left           '*' '/';
+%left           '+' '-';
+%left           '*' '/';
 
 calc$           : expr                 [*print("= %d" % @expr)*]
                 ;

diff --git a/examples/xpl.par b/examples/xpl.par
@@ -1,22 +1,22 @@
 //Meta information
-#prefix             "xpl";
+%prefix             "xpl";
 
 //Precedence and associativity
-#left               "=";
+%left               "=";
 
-#left               "=="
+%left               "=="
                     "!="
                     "<="
                     ">="
                     '>'
                     '<'
                     ;
 
-#left               '+'
+%left               '+'
                     '-'
                     ;
 
-#left               '*'
+%left               '*'
                     '/'
                     ;
 
@@ -39,11 +39,11 @@ real                :       real_integer '.' real_fraction
 real_integer        :       real_integer '0-9'
                     |       '0-9'
                     ;
-                    
+
 real_fraction       :       real_fraction '0-9'
                     |       '0-9'
                     ;
-                    
+
 //Whitespace grammar construct
 #whitespaces        whitespace
                     ;