Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Print error and stop the script if mandatory Metadata key is not set #1806

Merged
merged 7 commits into from
Apr 5, 2023

Conversation

pavel-karatsiuba
Copy link
Contributor

@pavel-karatsiuba pavel-karatsiuba commented Mar 8, 2023

If the mandatory Metadata key is not set then print Error and stop the script

Fixes: #1805

@codecov
Copy link

codecov bot commented Mar 8, 2023

Codecov Report

Patch coverage: 57.69% and project coverage change: -0.01 ⚠️

Comparison is base (0018c9f) 70.90% compared to head (f070b80) 70.90%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1806      +/-   ##
==========================================
- Coverage   70.90%   70.90%   -0.01%     
==========================================
  Files          23       24       +1     
  Lines        2609     2622      +13     
  Branches      593      594       +1     
==========================================
+ Hits         1850     1859       +9     
- Misses        653      657       +4     
  Partials      106      106              
Impacted Files Coverage Δ
src/mwoffliner.lib.ts 72.04% <54.83%> (+0.67%) ⬆️
src/util/metaData.ts 61.90% <61.90%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@kelson42
Copy link
Collaborator

kelson42 commented Mar 8, 2023

@pavel-karatsiuba It should not be an exception. An exception is a unamanged unexcpected situation in a software. It should be handled properly. That means as early as possible and A proper error message should be given to the user so he understands what is missing.

@pavel-karatsiuba pavel-karatsiuba changed the title throw exception if mandatory Metadata key is not set Print error and stop the script if mandatory Metadata key is not set Mar 9, 2023
Copy link
Collaborator

@kelson42 kelson42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see many weaknesses with your implementation:

  • The checking is done pretty late, it get all the page details downloaded before checking this (which is useless)
  • I know this is not strictly the topic of this ticket... but have you remarked that if the Description comes from the Mediawiki, then nothing stops it to be more than 80 charaters?! I kind of expect that this works fine as well.
  • If error, the MWoffliner exits with code 0 which is wrong. Remark that if I a give a wrong --customZimDescription on the command line then it fails with exit code 2 (which is correct)
  • If there is an error then I see something like this (not only I would just prefer to stop pretty quickly but on the top of this the messages are misleading):
[info] [2023-03-12T13:56:24.324Z] Getting JSON from [https://bm.wikipedia.org/w/api.php?action=query&format=json&prop=redirects%7Crevisions%7Ccoordinates&rdlimit=max&rdnamespace=0&colimit=max&rawcontinue=true&generator=allpages&gapfilterredir=nonredirects&gaplimit=max&gapnamespace=0&gapcontinue=Interlingua]
[log] [2023-03-12T13:56:24.967Z] Got [407 / 407] articles chunk from namespace [0]
[info] [2023-03-12T13:56:24.967Z] Getting JSON from [https://bm.wikipedia.org/w/api.php?action=query&format=json&prop=redirects%7Crevisions%7Ccoordinates&rdlimit=max&rdnamespace=0&colimit=max&rawcontinue=true&generator=allpages&gapfilterredir=nonredirects&gaplimit=max&gapnamespace=0&gapcontinue=Nsoro]
[log] [2023-03-12T13:56:25.585Z] Got [413 / 820] articles chunk from namespace [0]
[log] [2023-03-12T13:56:25.586Z] A total of [1206] articles has been found in namespace [0]
[log] [2023-03-12T13:56:25.586Z] Got ArticleIDs in 2.09 seconds
[log] [2023-03-12T13:56:25.586Z] Total articles found in Redis: 1207
[warn] [2023-03-12T13:56:25.586Z] Couldn't find strings file for [bm], falling back to [en]
[log] [2023-03-12T13:56:25.587Z] Doing dump
[log] [2023-03-12T13:56:25.587Z] Writing zim to [/home/kelson/code/mwoffliner/out/wikipedia_bm_all_2023-03.zim]
[error] [2023-03-12T13:56:25.587Z] Metadata "{key}" is required
[log] [2023-03-12T13:56:25.587Z] Finished dump
[log] [2023-03-12T13:56:25.587Z] Closing HTTP agents...
[log] [2023-03-12T13:56:25.588Z] All dumping(s) finished with success.
[log] [2023-03-12T13:56:25.588Z] Flushing Redis DBs
[info] [2023-03-12T13:56:25.588Z] Finished running mwoffliner after [5s]
[log] [2023-03-12T13:56:25.588Z] Exiting with code [0]
[log] [2023-03-12T13:56:25.589Z] Deleting temporary directory [/dev/shm/mwoffliner-1678629383338]

You need to rethink the architecture of this. First you need to consider that ZIM metadata come AFAIK either from the mediawiki metadata (one short HTTP request at start) or from the command line. Considering this I could propose to:

  • Isolate the string metadata constraints in a form of a const table (mandatory, max length, ...)
  • Use this table in the sanitizing of command line arguments (when necessary)
  • Isolate the metadata sanitizing in a dedicated module (mwoffliner.lib.ts is already quite big)
  • Reuse the string metadata constraints (created earlier) to check if all the metadata are correct
  • Exit in a similar manner if wrong input via command line or just wrong metadata.

@pavel-karatsiuba
Copy link
Contributor Author

@kelson42 If I add sanitizing the Description as we made it for the command line parameter, then many downloads will fail. Maybe I need to trim the Description to 80 characters?

@kelson42
Copy link
Collaborator

kelson42 commented Mar 18, 2023

@pavel-karatsiuba obviously you can... even if is is a super minor improvment.... if an improvment at all (strongly suspect Mediawiki to do it already). Won't change anything related to not passing the metadata sanitation.

@kelson42
Copy link
Collaborator

@pavel-karatsiuba Any news here? Please rebase/fix conflict before doing anything else.

@pavel-karatsiuba pavel-karatsiuba force-pushed the check-metadata-mandatory-keys branch 2 times, most recently from 3ae267e to ebafe88 Compare March 22, 2023 16:48
@pavel-karatsiuba
Copy link
Contributor Author

@kelson42 Ready to review. With the last change, I fixed the test because the new sanitizing rules do not allow Description higher than 80 characters.

@kelson42 kelson42 force-pushed the check-metadata-mandatory-keys branch from ebafe88 to ea927ca Compare March 23, 2023 07:04
@kelson42 kelson42 force-pushed the check-metadata-mandatory-keys branch from ea927ca to 2a0c288 Compare March 30, 2023 10:55
Copy link
Collaborator

@kelson42 kelson42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use this table in the sanitizing of command line arguments (when necessary)

This is not complete, for example max size of LongDescription is not checked

src/util/metaData.ts Outdated Show resolved Hide resolved
Copy link
Collaborator

@kelson42 kelson42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still work left... See https://github.com/openzim/zim-tools/pull/339/files, this is the kind of result I was expected here as well.

@pavel-karatsiuba pavel-karatsiuba force-pushed the check-metadata-mandatory-keys branch 2 times, most recently from ea67452 to 72977cb Compare March 31, 2023 18:03
Copy link
Collaborator

@kelson42 kelson42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kelson42 kelson42 merged commit 8565233 into main Apr 5, 2023
@kelson42 kelson42 deleted the check-metadata-mandatory-keys branch April 5, 2023 12:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

It should not be possible to create a ZIM with and empty Metadata "Description"
2 participants