Constant folding for Python and other runtimes #3701

KvanTTT · 2022-05-07T20:01:29Z

@SimonStPeter I've created a benchmark with respective grammar and have got the following result:

no_const_folding: 2010734 ns
const_folding: 907140 ns
ratio: 0.4511

I haven't tested on real grammars, but It's already signficant improvement, more 2x faster and it should be considered. Also, resulting file > 20 % less than the file with not folded constants.

There is no improvement in speed for JavaScript.

fixes #3698, fixes #3699

@parrt I suggest using constant folding for all runtimes because:

It reduces the size of generated parsers (more than 20% as I've checked for Python and JavaScript)
It eliminates very long lines that are hard to read by human and text editors. Just compare (there are a lot of similar lines for SQL parsers):
- Current: if not(((((_la - 1)) & ~0x3f) == 0 and ((1 << (_la - 1)) & ((1 << (PParser.T0 - 1)) | (1 << (PParser.T1 - 1)) | (1 << (PParser.T2 - 1)) | (1 << (PParser.T3 - 1)) | (1 << (PParser.T4 - 1)) | (1 << (PParser.T5 - 1)) | (1 << (PParser.T6 - 1)) | (1 << (PParser.T7 - 1)) | (1 << (PParser.T8 - 1)) | (1 << (PParser.T9 - 1)) | (1 << (PParser.T10 - 1)) | (1 << (PParser.T11 - 1)) | (1 << (PParser.T12 - 1)) | (1 << (PParser.T13 - 1)) | (1 << (PParser.T14 - 1)) | (1 << (PParser.T15 - 1)) | (1 << (PParser.T16 - 1)) | (1 << (PParser.T17 - 1)) | (1 << (PParser.T18 - 1)) | (1 << (PParser.T19 - 1)) | (1 << (PParser.T20 - 1)) | (1 << (PParser.T21 - 1)) | (1 << (PParser.T22 - 1)) | (1 << (PParser.T23 - 1)) | (1 << (PParser.T24 - 1)) | (1 << (PParser.T25 - 1)) | (1 << (PParser.T26 - 1)) | (1 << (PParser.T27 - 1)) | (1 << (PParser.T28 - 1)) | (1 << (PParser.T29 - 1)) | (1 << (PParser.T30 - 1)) | (1 << (PParser.T31 - 1)) | (1 << (PParser.T32 - 1)) | (1 << (PParser.T33 - 1)) | (1 << (PParser.T34 - 1)) | (1 << (PParser.T35 - 1)) | (1 << (PParser.T36 - 1)) | (1 << (PParser.T37 - 1)) | (1 << (PParser.T38 - 1)) | (1 << (PParser.T39 - 1)) | (1 << (PParser.T40 - 1)) | (1 << (PParser.T41 - 1)) | (1 << (PParser.T42 - 1)) | (1 << (PParser.T43 - 1)) | (1 << (PParser.T44 - 1)) | (1 << (PParser.T45 - 1)) | (1 << (PParser.T46 - 1)) | (1 << (PParser.T47 - 1)) | (1 << (PParser.T48 - 1)) | (1 << (PParser.T49 - 1)) | (1 << (PParser.T50 - 1)) | (1 << (PParser.T51 - 1)) | (1 << (PParser.T52 - 1)) | (1 << (PParser.T53 - 1)) | (1 << (PParser.T54 - 1)) | (1 << (PParser.T55 - 1)) | (1 << (PParser.T56 - 1)) | (1 << (PParser.T57 - 1)) | (1 << (PParser.T58 - 1)) | (1 << (PParser.T59 - 1)) | (1 << (PParser.T60 - 1)) | (1 << (PParser.T61 - 1)) | (1 << (PParser.T62 - 1)) | (1 << (PParser.T63 - 1)))) != 0)):
- Folded: if not((((_la - 1) & ~0x3f) == 0 and (1 << (_la - 1)) & -1) != 0):
Actually such expressions almost don't clarity how parser works in debug mode (if it's important).
It improves compilation speed (for compilable languages) or interpretation performance (for dynamic languages such as Python or JavaScript)

KvanTTT · 2022-05-07T20:17:55Z

CI improvement for Python: 12s vs >13s vs (>10%)

parrt · 2022-05-07T20:25:38Z

Looks cool. I'll check as soon as I can.

SimonStPeter · 2022-05-07T21:52:08Z

Good stuff. I'd be really interested to see how it works on a large grammar. If you've fixed the python perf problem that would be Most Excellent Indeed. Fingers crossed. cheers jan

…

On 07/05/2022 21:02, Ivan Kochurkin wrote: @SimonStPeter <https://github.com/SimonStPeter> I've created a bechmark <https://github.com/KvanTTT/AntlrBenchmarks/blob/master/ConstantFoldingBenchmark/checkParsers.py> with respective grammar and have got the following result: |no_const_folding: 2010734 ns const_folding: 907140 ns ratio: 0.4511 | I haven't tested on real grammars but It's already signficant improvement, more 2x faster and it should be considered. Also, resulting file > 20 % less than the file with not folded constants. There is no improvement in speed for JavaScript. fixes #3698 <#3698>, fixes #3699 <#3699> @parrt <https://github.com/parrt> I suggest use constant folding for all runtimes because: * It reduces the size of generated parsers (more than 20% as I've checked for Python and JavaScript) * It eliminates very long lines that are hard to read by human and text editors. Just compare: o Current: |if not(((((_la - 1)) & ~0x3f) == 0 and ((1 << (_la - 1)) & ((1 << (PParser.T0 - 1)) | (1 << (PParser.T1 - 1)) | (1 << (PParser.T2 - 1)) | (1 << (PParser.T3 - 1)) | (1 << (PParser.T4 - 1)) | (1 << (PParser.T5 - 1)) | (1 << (PParser.T6 - 1)) | (1 << (PParser.T7 - 1)) | (1 << (PParser.T8 - 1)) | (1 << (PParser.T9 - 1)) | (1 << (PParser.T10 - 1)) | (1 << (PParser.T11 - 1)) | (1 << (PParser.T12 - 1)) | (1 << (PParser.T13 - 1)) | (1 << (PParser.T14 - 1)) | (1 << (PParser.T15 - 1)) | (1 << (PParser.T16 - 1)) | (1 << (PParser.T17 - 1)) | (1 << (PParser.T18 - 1)) | (1 << (PParser.T19 - 1)) | (1 << (PParser.T20 - 1)) | (1 << (PParser.T21 - 1)) | (1 << (PParser.T22 - 1)) | (1 << (PParser.T23 - 1)) | (1 << (PParser.T24 - 1)) | (1 << (PParser.T25 - 1)) | (1 << (PParser.T26 - 1)) | (1 << (PParser.T27 - 1)) | (1 << (PParser.T28 - 1)) | (1 << (PParser.T29 - 1)) | (1 << (PParser.T30 - 1)) | (1 << (PParser.T31 - 1)) | (1 << (PParser.T32 - 1)) | (1 << (PParser.T33 - 1)) | (1 << (PParser.T34 - 1)) | (1 << (PParser.T35 - 1)) | (1 << (PParser.T36 - 1)) | (1 << (PParser.T37 - 1)) | (1 << (PParser.T38 - 1)) | (1 << (PParser.T39 - 1)) | (1 << (PParser.T40 - 1)) | (1 << (PParser.T41 - 1)) | (1 << (PParser.T42 - 1)) | (1 << (PParser.T43 - 1)) | (1 << (PParser.T44 - 1)) | (1 << (PParser.T45 - 1)) | (1 << (PParser.T46 - 1)) | (1 << (PParser.T47 - 1)) | (1 << (PParser.T48 - 1)) | (1 << (PParser.T49 - 1)) | (1 << (PParser.T50 - 1)) | (1 << (PParser.T51 - 1)) | (1 << (PParser.T52 - 1)) | (1 << (PParser.T53 - 1)) | (1 << (PParser.T54 - 1)) | (1 << (PParser.T55 - 1)) | (1 << (PParser.T56 - 1)) | (1 << (PParser.T57 - 1)) | (1 << (PParser.T58 - 1)) | (1 << (PParser.T59 - 1)) | (1 << (PParser.T60 - 1)) | (1 << (PParser.T61 - 1)) | (1 << (PParser.T62 - 1)) | (1 << (PParser.T63 - 1)))) != 0)):| o Folded: |if not((((_la - 1) & ~0x3f) == 0 and (1 << (_la - 1)) & -1) != 0):| * Actually such expressions don't clarity how parser works in debug mode. * It improves compilation speed (for compilable languages) or interpretation performance (for Python) ------------------------------------------------------------------------ You can view, comment on, or merge this pull request online at: #3701 <#3701> Commit Summary * f4b2fe0 <f4b2fe0> Fold constants for Python and JavaScript runtimes, fixes #3698, fixes #3699 File Changes (5 files <https://github.com/antlr/antlr4/pull/3701/files>) * *M* runtime-testsuite/test/org/antlr/v4/test/runtime/GeneratedLexerDescriptors.java <https://github.com/antlr/antlr4/pull/3701/files#diff-859a1117c3f13396162959e824858c01d179a61f3280ab64fd68c9c9490b2729> (38) * *M* tool/resources/org/antlr/v4/tool/templates/codegen/JavaScript/JavaScript.stg <https://github.com/antlr/antlr4/pull/3701/files#diff-3a264c84c42865f82b0840e2dbacb7bc141d8bc92b7199f3cc14780063b17813> (4) * *M* tool/resources/org/antlr/v4/tool/templates/codegen/Python2/Python2.stg <https://github.com/antlr/antlr4/pull/3701/files#diff-5704e73d866662c1b798f5f182bc3b42d0d4bd65623c1b6b288acc7fbe510afd> (4) * *M* tool/resources/org/antlr/v4/tool/templates/codegen/Python3/Python3.stg <https://github.com/antlr/antlr4/pull/3701/files#diff-5095eaa7b541d61912c44e0261e6b1907b71b6f48a7b1eb7d997c233d2de0b47> (4) * *M* tool/src/org/antlr/v4/codegen/model/TestSetInline.java <https://github.com/antlr/antlr4/pull/3701/files#diff-8a7295a1afd0032c88ec98f46531cf0c4a8428826e0402329ad916d4c81f2184> (44) Patch Links: * https://github.com/antlr/antlr4/pull/3701.patch <https://github.com/antlr/antlr4/pull/3701.patch> * https://github.com/antlr/antlr4/pull/3701.diff <https://github.com/antlr/antlr4/pull/3701.diff> — Reply to this email directly, view it on GitHub <#3701>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ACNF57LFP2NPDHRHRTYMKZTVI3D4VANCNFSM5VK3FXWQ>. You are receiving this because you were mentioned.Message ID: ***@***.***>

KvanTTT · 2022-05-08T15:41:55Z

Also I've completed the following:

Go: getInlineTestSetWordSize 32 -> 64 (go actually supports long integers)
Dart: get rid of BigInt
Swift: optimize TestSetInline (also should improve performance)

parrt · 2022-06-25T17:17:10Z

Should we look at this again after completing your other testing upgrades? @KvanTTT

KvanTTT · 2022-06-25T17:23:14Z

Yes, I think it makes sense to get rid of very long lines in generated sources.

KvanTTT · 2022-07-03T18:24:30Z

Eventually, it's ready for review.

tool/resources/org/antlr/v4/tool/templates/codegen/Python2/Python2.stg

KvanTTT · 2022-08-23T18:27:50Z

@parrt any chance it will be merged?

tool/src/org/antlr/v4/codegen/model/Choice.java

tool/src/org/antlr/v4/codegen/Target.java

parrt · 2022-08-27T20:01:07Z

Happy to merge once you add the parentheses and change the name of the method! thanks!

tool/resources/org/antlr/v4/tool/templates/codegen/Java/Java.stg

parrt · 2022-08-28T20:58:45Z

tool/src/org/antlr/v4/codegen/target/Python3Target.java

@@ -87,4 +87,7 @@ public boolean wantsBaseVisitor() {
 	public boolean supportsOverloadedMethods() {
 		return false;
 	}
+
+	@Override
+	public boolean supportsConstants() { return false; }


Hang on. I'm just not getting this. Why not just always do this if it's better? All targets can handle bitsets right?

Also, this is turned off for Python but Python is what we want to fix. Maybe flip the boolean sense of the function name so True means do this new stuff.

Yeah, I asked this via email (which got lost), so again:

Hi, silly question - why conditionally fold, why not just always fold for all targets? Seems odd, am I misunderstanding something?

(worth having a 'do not fold' option for debugging purposes, so you can examine the output, but only for human use).

cheers

jan

Further, as I can't find the right post and my other email also disappeared, there's a question about the nomenclature of 'folding' and what a 'constant' is:

Ivan says > "is being folded to constant 0xFFFF. But Python doesn't support constant propagation since Python is unable to determine whether a variable is constant or not."

My take is folding = evaluation and a constant here is more accurately described as a literal.

I love the smell of bikeshedding in the morning...

Constant folding is executed for all runtimes during generation. It's used for folding long expressions with bitwise operations. Expressions like this:

if not(((((_la - 1)) & ~0x3f) == 0 and ((1 << (_la - 1)) & ((1 << (PParser.T0 - 1)) | (1 << (PParser.T1 - 1)) | (1 << (PParser.T2 - 1)) | (1 << (PParser.T3 - 1)) | (1 << (PParser.T4 - 1)) | (1 << (PParser.T5 - 1)) | (1 << (PParser.T6 - 1)) | (1 << (PParser.T7 - 1)) | (1 << (PParser.T8 - 1)) | (1 << (PParser.T9 - 1)) | (1 << (PParser.T10 - 1)) | (1 << (PParser.T11 - 1)) | (1 << (PParser.T12 - 1)) | (1 << (PParser.T13 - 1)) | (1 << (PParser.T14 - 1)) | (1 << (PParser.T15 - 1)) | (1 << (PParser.T16 - 1)) | (1 << (PParser.T17 - 1))

Constant propagation is used only for targets that don't support constant concept (Python, JavaScript), it differs from folding. For example, in the following code (it's not related to bitwise stuff):

self.enterRule(localctx, 0, self.RULE_r)

RULE_r will be replaced with constant literal:

self.enterRule(localctx, 0, 0)

but it will be saved for compilable languages. I thought it would be more clear without affecting performance, but we can use constant propagation for all targets as well (together with constant folding).

Very interesting. Yes I can imagine in python that would be much much slower because it is doing symbol look up. Is there a way to distinguish between singletons like match(C.TYPEDEF) and sets as you have above? In other words how about we do your constant propagation but we limited to sets. Once you go beyond one token it's unlikely people are trying to read that for debugging purposes.

Is there a way to distinguish between singletons like match(C.TYPEDEF) and sets as you have above?

Yes, it's possible, but I implemented constant propagation for such cases because they are located in parser code and may affect the performance in theory as well. I can do it only for sets if you wish.

yeah, readability is important and match(T) should be left as-is. thanks!

Done. I managed to fix it without introducing the new member supportsConstant or similar.

Some tests are failing, I'll take a look.

Go: getInlineTestSetWordSize 32 -> 64 Dart: get rid of BigInt Swift: optimize TestSetInline Python: fixes antlr#3698 JavaScript: fixes antlr#3699 Signed-off-by: Ivan Kochurkin <[email protected]>

Update getMultiTokenAlternativeDescriptor test fixes antlr#3703 Signed-off-by: Ivan Kochurkin <[email protected]>

KvanTTT · 2022-08-31T00:03:22Z

Eventually done. Some fails on C# tests, but are unrelated.

parrt · 2022-08-31T20:53:11Z

Eventually done. Some fails on C# tests, but are unrelated.

rerunning C# now...

parrt · 2022-08-31T21:07:23Z

Nice work! I'll merge it in and then verify the Java gen'd code still looks OK haha

SimonStPeter · 2022-10-11T07:02:56Z

It's arguably being 'evaluated' so perhaps 'folding' -> 'evaluation'. Also these are constants, but more precisely I'd say they are literal constants, or literals for short. So maybe 'constant' -> 'literal' jan

…

On 28/08/2022 12:23, Ivan Kochurkin wrote: ***@***.**** commented on this pull request. ------------------------------------------------------------------------ In tool/src/org/antlr/v4/codegen/model/Choice.java <#3701 (comment)>: > for (IntervalSet s : altLookSets) { - altLook.add(factory.getGenerator().getTarget().getTokenTypesAsTargetLabels(factory.getGrammar(), s.toArray())); + if (target.supportsConstants()) { Not sure it's correct naming. Python supports constant folding, i.e. the following expression |(1 << 1) | (1 << 2) | (1 << 3) | (1 << 4) | (1 << 5) | (1 << 6) | \ (1 << 7) | (1 << 8) | (1 << 9) | (1 << 10) | (1 << 11) | (1 << 12) | \ (1 << 13) | (1 << 14) | (1 << 15) | (1 << 16) | is being folded to constant |0xFFFF|. But Python doesn't support constant propagation since Python is unable to determine whether a variable is constant or not. — Reply to this email directly, view it on GitHub <#3701 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ACNF57NDAG5JGNVYEKXFX53V3ND4BANCNFSM5VK3FXWQ>. You are receiving this because you were mentioned.Message ID: ***@***.***>

parrt · 2022-10-22T01:06:53Z

@KvanTTT I have a warning from C++ where stuff like:

((_la & ~ 0x3fULL) == 0) &&
      ((1ULL << _la) & -41056) != 0 || (((_la - 64) & ~ 0x3fULL) == 0) &&
      ((1ULL << (_la - 64)) & 32767) != 0

gets warnings

$ clang++ -Werror,-Wlogical-op-parentheses t.cc
warning: unknown -Werror warning specifier: '-Werror,-Wlogical-op-parentheses' [-Wunknown-warning-option]
1 warning generated.
parrt-macbookpro2:tmp parrt$ clang++ -Werror -Wlogical-op-parentheses t.cc
t.cc:3:34: error: '&&' within '||' [-Werror,-Wlogical-op-parentheses]
    if (((_la & ~ 0x3fULL) == 0) &&
        ~~~~~~~~~~~~~~~~~~~~~~~~~^~
t.cc:3:34: note: place parentheses around the '&&' expression to silence this warning
    if (((_la & ~ 0x3fULL) == 0) &&
                                 ^

The -41056 makes me nervous. Are you sure that word size is portably correct, given the ULL usage?

parrt · 2022-10-22T01:13:53Z

Seems like C++ would change this template to wrap in parens:

bitsetBitfieldComparison(s, bits) ::= <<
<testShiftInRange({<offsetShift(s.varName, bits.shift)>})> &&
  ((1ULL \<\< <offsetShift(s.varName, bits.shift)>) & <bits.calculated>) != 0
>>

Some companies flip this warning to an error.

…issues where compilers complain about not having parens around higher-precedence && vs || operators. A tweak to antlr#3701 Signed-off-by: Terence Parr <[email protected]>

KvanTTT force-pushed the constant-folding branch from f4b2fe0 to b32214b Compare May 7, 2022 20:09

KvanTTT force-pushed the constant-folding branch 2 times, most recently from 2e2ccb5 to 0b22d5a Compare May 8, 2022 15:21

KvanTTT changed the title ~~Constant folding for Python and JavaScript runtimes~~ Constant folding for Python and other runtimes May 8, 2022

KvanTTT mentioned this pull request May 8, 2022

[Performance] Constants for Python runtime should be folded #3698

Closed

KvanTTT mentioned this pull request Jun 28, 2022

JavaScript target usability and browser support #2477

Closed

KvanTTT force-pushed the constant-folding branch 2 times, most recently from 6caa65d to 2967c8f Compare July 3, 2022 18:23

ericvergnaud reviewed Jul 4, 2022

View reviewed changes

tool/resources/org/antlr/v4/tool/templates/codegen/Python2/Python2.stg Show resolved Hide resolved

parrt reviewed Aug 26, 2022

View reviewed changes

tool/src/org/antlr/v4/codegen/model/Choice.java Outdated Show resolved Hide resolved

parrt reviewed Aug 26, 2022

View reviewed changes

tool/src/org/antlr/v4/codegen/Target.java Outdated Show resolved Hide resolved

parrt mentioned this pull request Aug 27, 2022

Head towards 4.11 release #3840

Closed

10 tasks

parrt reviewed Aug 28, 2022

View reviewed changes

tool/resources/org/antlr/v4/tool/templates/codegen/Java/Java.stg Outdated Show resolved Hide resolved

parrt reviewed Aug 28, 2022

View reviewed changes

KvanTTT force-pushed the constant-folding branch from 2967c8f to 4d57135 Compare August 30, 2022 21:45

KvanTTT added 2 commits August 31, 2022 00:58

Fold constants in generated code for all runtimes

884625d

Go: getInlineTestSetWordSize 32 -> 64 Dart: get rid of BigInt Swift: optimize TestSetInline Python: fixes antlr#3698 JavaScript: fixes antlr#3699 Signed-off-by: Ivan Kochurkin <[email protected]>

Use int literals instead of refs for Python and JavaScript

8bb8a85

Update getMultiTokenAlternativeDescriptor test fixes antlr#3703 Signed-off-by: Ivan Kochurkin <[email protected]>

KvanTTT force-pushed the constant-folding branch from 4d57135 to 8bb8a85 Compare August 30, 2022 23:21

parrt added the code-gen label Aug 31, 2022

parrt added target:java target:javascript target:csharp target:cpp target:go target:dart target:python2 target:python3 target:swift target:php labels Aug 31, 2022

parrt added this to the 4.11 milestone Aug 31, 2022

parrt added the comp:performance label Aug 31, 2022

parrt merged commit 6a2cd79 into antlr:dev Aug 31, 2022

KvanTTT deleted the constant-folding branch August 31, 2022 21:43

parrt mentioned this pull request Oct 22, 2022

Tweak folding to wrap && operations in parentheses #3936

Merged

admin-token-bot mentioned this pull request Jul 7, 2024

[Snyk] Upgrade antlr4 from 4.7.2 to 4.13.1 mongodb-js/compass#6001

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Constant folding for Python and other runtimes #3701

Constant folding for Python and other runtimes #3701

KvanTTT commented May 7, 2022 •

edited

Loading

KvanTTT commented May 7, 2022

parrt commented May 7, 2022

SimonStPeter commented May 7, 2022 via email

KvanTTT commented May 8, 2022 •

edited

Loading

parrt commented Jun 25, 2022

KvanTTT commented Jun 25, 2022

KvanTTT commented Jul 3, 2022

KvanTTT commented Aug 23, 2022

parrt commented Aug 27, 2022

parrt Aug 28, 2022

parrt Aug 28, 2022

SimonSntPeter Aug 28, 2022

SimonSntPeter Aug 28, 2022

KvanTTT Aug 28, 2022 •

edited

Loading

parrt Aug 30, 2022

KvanTTT Aug 30, 2022

parrt Aug 30, 2022

KvanTTT Aug 30, 2022

KvanTTT Aug 30, 2022

KvanTTT commented Aug 31, 2022

parrt commented Aug 31, 2022

parrt commented Aug 31, 2022

SimonStPeter commented Oct 11, 2022 via email

parrt commented Oct 22, 2022

parrt commented Oct 22, 2022

Constant folding for Python and other runtimes #3701

Constant folding for Python and other runtimes #3701

Conversation

KvanTTT commented May 7, 2022 • edited Loading

KvanTTT commented May 7, 2022

parrt commented May 7, 2022

SimonStPeter commented May 7, 2022 via email

KvanTTT commented May 8, 2022 • edited Loading

parrt commented Jun 25, 2022

KvanTTT commented Jun 25, 2022

KvanTTT commented Jul 3, 2022

KvanTTT commented Aug 23, 2022

parrt commented Aug 27, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

KvanTTT Aug 28, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

KvanTTT commented Aug 31, 2022

parrt commented Aug 31, 2022

parrt commented Aug 31, 2022

SimonStPeter commented Oct 11, 2022 via email

parrt commented Oct 22, 2022

parrt commented Oct 22, 2022

KvanTTT commented May 7, 2022 •

edited

Loading

KvanTTT commented May 8, 2022 •

edited

Loading

KvanTTT Aug 28, 2022 •

edited

Loading