Question about handling Hollerith format fields #4
Comments
A problem with Hollerith fields is that regular expressions can't match them in one pass. They take the form of Hollerith fields are only legal as constant arguments in subroutine calls, in |
@aptthorpe:
This seems like a very good strategy! I think it should not interfere with the generator. A character scan should suffice. This area is very performance-sensitive, but it should not incur a huge performance hit, since you are only handling the fallback in the exception handling part. Go for it! The only thing: it would also be good to add a Edited to add: Come to think of it: could you maybe derive from RegexLeger, |
I have a working solution and there'd be no problem toggling it on or off with |
Here's my current modification:

    def line_tokens(self, line, lineno=None):
        """Tokenizes text using the groups in the regex specified

        Iterates through all matches of the regex on `line`, returning the
        highest matching category with the associated token (group text).
        """
        unscanned_line = line
        # Column of unscanned_line[0] within the original line, so that
        # reported positions stay relative to `line`
        offset = 0
        scanmore = True
        c2 = 0
        while scanmore:
            # Assume this for-loop will consume all of unscanned_line
            scanmore = False
            try:
                for match in self._finditer(unscanned_line):
                    cat = match.lastindex
                    yield lineno, offset + match.start(cat), cat, match.group(cat)
                    c2 = match.end(cat)
            except IndexError:
                # Match failed; try to match a Hollerith string
                # Note: Hollerith strings cannot be recognized by a simple
                # or single regex
                # c2 indexes unscanned_line, so slice that rather than `line`
                mhl = re.match(r"(\d+)H", unscanned_line[c2:])
                if mhl:
                    hstr_len = int(mhl.group(1))
                    hstr_start = mhl.end(0)
                    hs1 = c2 + hstr_start
                    hs2 = hs1 + hstr_len
                    # Strings are code 9
                    # Q: Should string extend from c2:hs2 vs hs1:hs2? (include nH prefix?)
                    # TODO: Trap len(line) < hs2
                    yield lineno, offset + hs1, 9, unscanned_line[hs1:hs2]
                    # Grab remainder of line to continue scanning
                    unscanned_line = unscanned_line[hs2:]
                    offset += hs2
                    # The remainder starts at index 0, so reset the match end
                    c2 = 0
                    # Continue scanning unless the Hollerith string consumed the entire
                    # remainder of the line (nothing left to scan)
                    scanmore = (len(unscanned_line) > 0)
                else:
                    raise LexerError(None, lineno, offset + match.start(),
                                     offset + match.end(), line, "invalid token")
`fsource lex` does not recognize the token `1H1` in the line `FORMAT (1H1)`. This should be expected behavior since `fsource` is advertised as handling F77 and later, and Hollerith fields were considered obsolete by F77.

I am working on a project to detect and evaluate problematic constructs in legacy code such as Hollerith fields, alternate return points, `ENTRY` statements, character data stored in non-character variables, etc. My question is where I should focus my efforts to extend `lexer.py` to recognize these difficult and obsolete constructs? After a quick scan of `lexer.py`, it looks like I should modify `formattok` in `get_lexer_regex()` - does this seem reasonable?
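
For reference, the two-step recognition discussed in this thread (read the count, then take exactly that many characters) can be sketched outside the lexer. This is only a minimal illustration, not part of `fsource`: the helper name `match_hollerith` and its return shape are assumptions made up for this example. It uses the same `(\d+)H` prefix pattern as the posted modification and adds the length check that the modification lists as a TODO.

```python
import re

# Same prefix pattern as in the posted modification: a count followed by H.
_HOLLERITH_PREFIX = re.compile(r"(\d+)H")


def match_hollerith(line, pos=0):
    """Return (start, end, text) for a Hollerith field at `pos`, or None.

    Hypothetical helper for illustration only. `start` and `end` delimit
    the string body (without the nH prefix) within `line`.
    """
    prefix = _HOLLERITH_PREFIX.match(line, pos)
    if prefix is None:
        return None
    length = int(prefix.group(1))
    start = prefix.end()
    end = start + length
    if end > len(line):
        # The count promises more characters than the line contains.
        return None
    return start, end, line[start:end]


print(match_hollerith("FORMAT (1H1)", 8))   # -> (10, 11, '1')
print(match_hollerith("FORMAT (5HAB)", 8))  # -> None (line too short)
```

The body is taken by slicing rather than by pattern, since the count in the prefix decides where the field ends and the body itself may contain any character, including `H`, blanks, or parentheses.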