-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Miscenalleous cli improvement for convert #385
Conversation
|
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #385 +/- ##
==========================================
+ Coverage 93.19% 93.22% +0.02%
==========================================
Files 17 17
Lines 2440 2450 +10
Branches 544 547 +3
==========================================
+ Hits 2274 2284 +10
Misses 95 95
Partials 71 71 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, but I think we don't need to implement these checks ourselves!
@roedoejet @dhdaines This PR is ready to (re)review, with the Includes documentation, and an unrelated change to CI I did months ago but that got lost in a PR that never got merged. Plus making |
This replaces the formerly hidden feature of heuristically detecting existing .txt files. Also fix test_update_schema to: - be quiet unless there's an error - correctly catch errors and display the problem filename when there is an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for bringing this up the other day - lgtm @joanise !
lines = sys.stdin | ||
else: | ||
try: | ||
to_close = lines = open(input_text, encoding="utf8") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not use with open(input_text)
? just to prevent the indenting? I prefer using with
just in case something happens later that prevents to_close
from actually being closed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because if the input is a text string, you're not going to be inside that with
block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I always use with
, it has to be impossible to use it before I resolve to using something else.
if tok is None: | ||
tok = True # Tokenize by default | ||
custom_tokenizer = make_tokenizer(tok_lang) if tok_lang else None | ||
# Transduce!!! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol yes. this is the right number of exclamation marks.
tg = transducer(line) | ||
if check: | ||
transducer.check(tg, display_warnings=True) | ||
outputs = [tg.output_string] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assuming this is copy/pasted from the old place, so not checking carefully.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, if you hide whitespace changes you'll see this is old code, just shifted because of the new try:
block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
PR Goal?
misc small improvements:
-
and/dev/stdin
as stdin (the latter on Linux only)Feedback sought?
standard code review - use SemanticDiff to make it easier to read, though:
https://app.semanticdiff.com/gh/roedoejet/g2p/pull/385
Priority?
low
Tests added?
yup
How to test?
g2p convert my_file.txt fra fra-ipa
should continue to work as expected, processing the contents ofmy_file.txt
echo blah blah | g2p convert - fra fra-ipa
should now work (real use case: replace echo by a meaningful process)g2p convert <(echo blah blah) fra fra-ipa
now also worksThe last two cases came up as I was preparing and processing files for EveryVoice, where I wanted to cut a column out of a psv file and pass it to g2p without having the same temporary files on disk. That's what triggered this patch in the first place.
Confidence?
high
Version change?
Possibly a minor since we're adding a feature, but really I think of this as just a patch: this is how it should have worked in the first place.
We've done a bunch of changes since 2.0.0, though, so we're probably due for releasing 2.1.0, especially for api/v2.