Disambiguate between context declarations and context references #58

Open
wants to merge 2 commits into
base: dev

skaupper

This pull request fixes #16, among other things.

The context keyword is used for context declarations (which were already implemented) and context references (which were still missing). To disambiguate between these two (and to avoid having to deal with this issue at a higher level again), I implemented a lookahead mechanism.

Context.StartBlock contains a set of states which only check whether the token stream describes a context declaration, reference or neither. These states do not alter the tokens nor do they generate new blocks by themselves.
As soon as the decision can be made, Context.StartBlock hands control over to either Context.ReferenceStartBlock or Context.DeclarationStartBlock.
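
As an illustration only (this is not the code in this pull request), the lookahead decision can be sketched on plain string tokens. `ContextKind` and `classify_context` are hypothetical names, and the sketch assumes the VHDL-2008 forms `context <name> is ...` for a declaration and `context <lib>.<name> {, ...};` for a reference.

```python
from enum import Enum, auto
from typing import Sequence


class ContextKind(Enum):
    DECLARATION = auto()
    REFERENCE = auto()
    NEITHER = auto()


def classify_context(tokens: Sequence[str]) -> ContextKind:
    """Look ahead from the `context` keyword without changing any token."""
    if not tokens or tokens[0].lower() != "context":
        return ContextKind.NEITHER

    seen_name = False
    for token in tokens[1:]:
        # Whitespace and line comments never influence the decision.
        if token.isspace() or token.startswith("--"):
            continue
        lowered = token.lower()
        if not seen_name:
            # The first code token after the keyword must be a name.
            if lowered == "is" or not token.isidentifier():
                return ContextKind.NEITHER
            seen_name = True
            continue
        if lowered == "is":
            return ContextKind.DECLARATION  # context <name> is ...
        if token in (".", ",", ";"):
            return ContextKind.REFERENCE    # context <lib>.<name>;
        return ContextKind.NEITHER
    # Ran out of tokens before the decision could be made.
    return ContextKind.NEITHER
```

The function only reads the tokens. In this sketch the token after the first name settles the question: `is` means a declaration, while `.`, `,` or `;` means a reference.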

Since the TokenToBlockParser only supported looking at a token a single time, I created a second parser instance, which iterates over everything from TokenMarker (which holds the ContextKeyword token) up to the current token. The method TokenToBlockParser.ReparseFromTokenMarker can be used to do exactly that.
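
The replay idea can be sketched with a toy parser. Everything here is hypothetical (`MiniParser`, `reparse_from_marker`, and the three state functions) and much simpler than the real TokenToBlockParser. It only shows how a second parser instance can walk the tokens from the marker up to the current token again, starting in a state chosen by the lookahead.

```python
from typing import Callable, List, Optional

State = Callable[["MiniParser", str], None]


class MiniParser:
    """Toy stand-in for a token-to-block parser."""

    def __init__(self, start_state: State) -> None:
        self.next_state: State = start_state
        self.marker: Optional[int] = None  # index where the lookahead began
        self.seen: List[str] = []          # every token fed so far
        self.blocks: List[str] = []        # output of committing states

    def feed(self, token: str) -> None:
        self.seen.append(token)
        self.next_state(self, token)

    def reparse_from_marker(self, start_state: State) -> None:
        """Replay the tokens from the marker up to the current token."""
        assert self.marker is not None, "no token marker set"
        replay = MiniParser(start_state)
        for token in self.seen[self.marker:]:
            replay.feed(token)
        # Adopt the results and continue in the state the replay ended in.
        self.blocks.extend(replay.blocks)
        self.next_state = replay.next_state
        self.marker = None


def look_ahead(parser: MiniParser, token: str) -> None:
    """Decide only; emit nothing and leave the tokens untouched."""
    if parser.marker is None:
        parser.marker = len(parser.seen) - 1  # remember the `context` keyword
    elif token == "is":
        parser.reparse_from_marker(emit_declaration)
    elif token == ";":
        parser.reparse_from_marker(emit_reference)


def emit_declaration(parser: MiniParser, token: str) -> None:
    parser.blocks.append(f"declaration:{token}")


def emit_reference(parser: MiniParser, token: str) -> None:
    parser.blocks.append(f"reference:{token}")


parser = MiniParser(look_ahead)
for tok in ["context", "ctx", "is"]:
    parser.feed(tok)
```

Note that the first token handed to the replay is the marked token itself, so the committing state sees the `context` keyword again.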

I also implemented a parser method TokenToBlockParser.HandleNonCodeTokens which can create non-code blocks (i.e. all kinds of whitespace and comments), since the LRM allows them basically everywhere. While the LRM does not technically list delimited comments (/* ... */) as separators (§15.3), they effectively act as separators insofar as they separate adjacent lexical elements.

I would love to know your thoughts about these changes and if/how you want to integrate them!

Sebastian Kaupper added 2 commits September 20, 2023 15:23
…tions

Add functions to the TokenToBlockParser which can be generally useful:

HandleNonCodeTokens creates NonCode (i.e. whitespace and comment)
blocks. The LRM allows these blocks basically everywhere.
Note: Delimited comments are strictly speaking not separators
according to the LRM (§15.3)!

ReparseFromTokenMarker makes it possible to iterate over a set of tokens a second
time. This is useful if you cannot decide the block type based on a
single token. `Context.StartBlock` searches ahead (without modifying
tokens or adding blocks) until it can decide whether the `context`
keyword is used in a context declaration or a context reference.

Additionally, some changes are made to satisfy the static type checker.
# parserState.PushState = ExpressionBlockEndedByLoopORToORDownto.stateExpression
# return
# elif token == ';':
parserState.NewToken = BoundaryToken(fromExistingToken=token)
Author

Do you know why you commented out this block in the first place?

@classmethod
def stateContextKeyword(cls, parserState: TokenToBlockParser):
    cls.stateWhitespace1(parserState)
Author

It seems like you named all your states after the token which was expected in the previous state, which I find a little counter-intuitive.

For the sake of a consistent interface I kept that naming scheme for the first state, and named all subsequent states after the token they are expecting right now.

token = parserState.Token

if isinstance(token, ContextKeyword):
    parserState.NextState = cls.stateWhitespace
Author

The ReferenceStartBlock is deliberately not created here, so that stateContextKeyword behaves the same whether it is called from here or from within a DeclarationBody.

return

# This condition is also guaranteed. Otherwise `StartBlock.stateContextKeyword` would have thrown an error.
assert False, "Expected whitespace after keyword CONTEXT."
Author

I used failing assertions for unreachable code locations. If the interpreter gets there, there is either a bug somewhere, or the assessment that this location is unreachable is wrong.

If you do not want that kind of behaviour we can replace them with BlockParserExceptions for example.
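
The two options can be compared in a small sketch. `BlockParserError` is a hypothetical stand-in for whatever exception type the project would use, and both functions are illustrative only, not code from this pull request.

```python
class BlockParserError(Exception):
    """Hypothetical stand-in for a parser-specific exception type."""


def expect_whitespace_assert(token: str) -> str:
    if token.isspace():
        return "whitespace"
    # Considered unreachable: the preceding lookahead already checked this.
    assert False, "Expected whitespace after keyword CONTEXT."


def expect_whitespace_raise(token: str) -> str:
    if token.isspace():
        return "whitespace"
    # Always raised, regardless of interpreter flags.
    raise BlockParserError("Expected whitespace after keyword CONTEXT.")
```

One difference worth weighing: `assert` statements are removed when Python runs with `-O`, so the first variant would fall through and return `None` there, while the explicit `raise` still fires.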

def stateLibOrContextName(cls, parserState: TokenToBlockParser):
    token = parserState.Token

    if parserState.HandleNonCodeTokens(None):
Author

That's my take on handling non-code tokens without repeating the same 20 lines in every other state. If a block type is passed as the first parameter, a (multi-part) instance of that type is created before any whitespace/comment block.
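
A rough sketch of the idea on plain string tokens follows. `Block`, `ParserState`, `handle_non_code_token` and `state_name` are hypothetical names, and the real HandleNonCodeTokens works on token objects and takes its argument differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Block:
    kind: str
    tokens: List[str]


@dataclass
class ParserState:
    pending: List[str] = field(default_factory=list)  # code tokens not yet in a block
    blocks: List[Block] = field(default_factory=list)

    def handle_non_code_token(self, token: str, block_kind: Optional[str] = None) -> bool:
        """Return True if `token` was whitespace or a comment and has been handled."""
        if token.isspace():
            kind = "whitespace"
        elif token.startswith("--") or token.startswith("/*"):
            kind = "comment"
        else:
            return False  # a code token: the calling state deals with it

        # Optionally close the code collected so far as one part of a multi-part block.
        if block_kind is not None and self.pending:
            self.blocks.append(Block(block_kind, self.pending))
            self.pending = []
        self.blocks.append(Block(kind, [token]))
        return True


def state_name(state: ParserState, token: str) -> None:
    """Example state: one call replaces the repeated whitespace/comment handling."""
    if state.handle_non_code_token(token, "ContextReference"):
        return
    state.pending.append(token)


state = ParserState()
for tok in ["context", " ", "lib", "-- note\n", "."]:
    state_name(state, tok)
```

Each state starts with the same two-line guard and then only has to care about code tokens.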

return

if isinstance(token, WordToken) and (token == "is"):
    parserState.ReparseFromTokenMarker(DeclarationStartBlock.stateFromStartBlock)
Author

The state machine decided that the block can only represent a context declaration. ReparseFromTokenMarker only needs to know in which state the second parsing pass should start. The first token passed to that state is the TokenMarker itself.

skaupper changed the title from "Context references" to "Disambiguate between context declarations and context references" on Sep 20, 2023