Miscellaneous CLI improvements for convert #385

Merged
merged 5 commits into from
Jul 12, 2024

Conversation

Collaborator

@joanise joanise commented Jul 5, 2024

PR Goal?

misc small improvements:

  • accept - and /dev/stdin as stdin (the latter on Linux only)
  • when reading a file or stdin, read and process it line by line; slurping the whole file up front was just me being lazy
  • make some unit tests more quiet
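The first two points can be illustrated with a minimal sketch. This is not the code from the PR: open_input and process_lines are hypothetical names, and the per-line work is a placeholder (upper-casing) standing in for the real conversion.

```python
import sys
from typing import Iterable, Iterator, TextIO


def open_input(name: str) -> TextIO:
    """Return sys.stdin for "-", otherwise open the named file as UTF-8."""
    if name == "-":
        return sys.stdin
    return open(name, encoding="utf8")


def process_lines(stream: Iterable[str]) -> Iterator[str]:
    """Yield one processed line at a time instead of reading everything up front."""
    for line in stream:
        # Placeholder for the real per-line work (hypothetical: upper-case it).
        yield line.rstrip("\n").upper()
```

Because process_lines is a generator over the stream, output can start before the input is fully read, which matters when the input is a pipe.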

Feedback sought?

standard code review - use SemanticDiff to make it easier to read, though:
https://app.semanticdiff.com/gh/roedoejet/g2p/pull/385

Priority?

low

Tests added?

yup

How to test?

  • g2p convert my_file.txt fra fra-ipa should continue to work as expected, processing the contents of my_file.txt
  • echo blah blah | g2p convert - fra fra-ipa should now work (real use case: replace echo by a meaningful process)
  • on Linux only: g2p convert <(echo blah blah) fra fra-ipa now also works

The last two cases came up as I was preparing and processing files for EveryVoice, where I wanted to cut a column out of a psv file and pass it to g2p without having to keep temporary files on disk. That's what triggered this patch in the first place.
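For illustration only, the "cut a column out of a psv file" step could look like the sketch below. It assumes a pipe-separated file with no quoting, and cut_column is a hypothetical helper, not part of g2p or EveryVoice.

```python
import csv
from typing import Iterable, Iterator


def cut_column(lines: Iterable[str], index: int) -> Iterator[str]:
    """Yield the field at position `index` from each pipe-separated line."""
    # QUOTE_NONE makes the reader split on "|" only, like `cut -d'|'`.
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        if index < len(row):
            yield row[index]
```

Since the helper accepts any iterable of lines and yields lazily, its output can be streamed straight into another process's stdin without touching the disk.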

Confidence?

high

Version change?

Possibly a minor since we're adding a feature, but really I think of this as just a patch: this is how it should have worked in the first place.

We've done a bunch of changes since 2.0.0, though, so we're probably due for releasing 2.1.0, especially for api/v2.

Contributor

github-actions bot commented Jul 5, 2024

CLI load time: 0:00.05
Pull Request HEAD: f0cf073d8e6b3a953577eedb381a9bb13290b312
Imports that take more than 0.1 s:
import time: self [us] | cumulative | imported package

codecov bot commented Jul 5, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.22%. Comparing base (e6a1280) to head (f0cf073).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #385      +/-   ##
==========================================
+ Coverage   93.19%   93.22%   +0.02%     
==========================================
  Files          17       17              
  Lines        2440     2450      +10     
  Branches      544      547       +3     
==========================================
+ Hits         2274     2284      +10     
  Misses         95       95              
  Partials       71       71              


Collaborator

@dhdaines dhdaines left a comment

Good idea, but I think we don't need to implement these checks ourselves!

Collaborator Author

joanise commented Jul 11, 2024

@roedoejet @dhdaines This PR is ready to (re)review, with the --file option we agreed on yesterday for g2p convert reading from a file.

Includes documentation, and an unrelated change to CI I did months ago but that got lost in a PR that never got merged.

Plus making test_update_schema() quiet except when there are errors, and making those errors more helpful when there are some.

This replaces the formerly hidden feature of heuristically detecting
existing .txt files.

Also fix test_update_schema to:
 - be quiet unless there's an error
 - correctly catch errors and display the problem filename when there is
   an error.
Owner

@roedoejet roedoejet left a comment

thanks for bringing this up the other day - lgtm @joanise !

        lines = sys.stdin
    else:
        try:
            to_close = lines = open(input_text, encoding="utf8")
Owner

why not use with open(input_text)? just to prevent the indenting? I prefer using with just in case something happens later that prevents to_close from actually being closed.

Collaborator Author

Because if the input is a text string, you're not going to be inside that with block.

Collaborator Author

I always use with; it has to be impossible to use it before I resort to something else.
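As an aside on this thread, the standard library's contextlib.ExitStack is one way to keep with-style cleanup when a file is only sometimes opened. The sketch below is an alternative for illustration, not what was merged; convert_lines and its from_file parameter are hypothetical.

```python
import contextlib
import sys
from typing import Iterable, List


def convert_lines(input_text: str, from_file: bool) -> List[str]:
    """Treat input_text as literal text, or as a path ("-" for stdin) if from_file."""
    with contextlib.ExitStack() as stack:
        if not from_file:
            lines: Iterable[str] = [input_text]
        elif input_text == "-":
            lines = sys.stdin  # not ours to close
        else:
            # Registered with the stack, so it is closed on exit, even on error.
            lines = stack.enter_context(open(input_text, encoding="utf8"))
        # Placeholder for the real per-line conversion.
        return [line.rstrip("\n") for line in lines]
```

The cost is one extra level of indentation around the shared processing code, which is the trade-off the question above was getting at.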

if tok is None:
    tok = True  # Tokenize by default
custom_tokenizer = make_tokenizer(tok_lang) if tok_lang else None
# Transduce!!!
Owner

lol yes. this is the right number of exclamation marks.

tg = transducer(line)
if check:
    transducer.check(tg, display_warnings=True)
outputs = [tg.output_string]
Owner

assuming this is copy/pasted from the old place, so not checking carefully.

Collaborator Author

Yeah, if you hide whitespace changes you'll see this is old code, just shifted because of the new try: block.

@dhdaines dhdaines self-requested a review July 12, 2024 15:32
Collaborator

@dhdaines dhdaines left a comment

Looks good!

@joanise joanise merged commit f0cf073 into main Jul 12, 2024
8 checks passed
@joanise joanise deleted the dev.ej/misc branch July 12, 2024 15:38