ProductDataset fails on current dataset due to max field size limit. #258

Open
gitting opened this issue Aug 2, 2024 · 0 comments

What

The openfoodfacts/dataset.py module uses the csv library, whose default maximum field size
(131072) is smaller than some of the fields present in the current database.

Examining the dataset shows the following maximum lengths for the largest fields:

field_name line_num max_len
'ingredients_text' 1553941 4688767
'abbreviated_product_name' 1599144 3853892
'popularity_tags' 2320801 61351
'ingredients_tags' 2243263 8742
'brands' 2033179 5844
'packaging_text' 2855888 4116
'quantity' 3303336 4077
'categories' 3105696 3156
'packaging' 3239679 2525
'origins' 2146147 2486
'packaging_tags' 2313190 2138
'countries_tags' 442421 2113
'url' 945261 2012
'purchase_places' 1313875 2010
'generic_name' 360070 1634
'additives_en' 3224866 1409
'categories_tags' 2790830 1369
'labels_tags' 935752 1348
'labels' 995985 1048
'data_quality_errors_tags' 2360119 993
'traces' 2610037 874
'allergens' 2151395 640
'cities_tags' 2698907 596
'manufacturing_places' 1298082 558
'states' 30 556

The ingredients_text field has the largest value, with a length of 4688767 on line 1553941.
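For reference, a table like the one above could be produced with a short script along these lines. This is only a sketch: the helper name max_field_lengths is hypothetical, and it assumes the dataset is a tab-separated file with a header row. It temporarily raises the csv limit so that the oversized fields can be read at all, and restores the previous limit afterwards.

```python
import csv
import io


def max_field_lengths(handle, delimiter="\t"):
    """Return {field_name: (line_num, max_len)} for a delimited file with a header row.

    Hypothetical helper; the tab delimiter is an assumption about the export format.
    """
    # field_size_limit() returns the previous limit, so it can be restored later.
    previous_limit = csv.field_size_limit(2**31 - 1)
    try:
        reader = csv.DictReader(handle, delimiter=delimiter)
        longest = {}
        for row in reader:
            for name, value in row.items():
                if not isinstance(value, str):
                    continue  # skip missing or surplus columns
                if len(value) > longest.get(name, (0, 0))[1]:
                    # line_num counts lines read from the source, header included
                    longest[name] = (reader.line_num, len(value))
        return longest
    finally:
        csv.field_size_limit(previous_limit)


# Small in-memory sample instead of the real dataset.
sample = io.StringIO("code\tingredients_text\n1\tabc\n2\tabcdefgh\n")
result = max_field_lengths(sample)
for name, (line_num, max_len) in sorted(result.items(), key=lambda kv: -kv[1][1]):
    print(name, line_num, max_len)
```

With the real file, the handle would be the opened dataset rather than the io.StringIO sample.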

Setting the limit to a value larger than that, for example with csv.field_size_limit(5_000_000),
allows the API to work.
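A minimal sketch of that workaround, using only the standard csv module and an in-memory sample rather than the real dataset (the tab delimiter and the column names in the sample are assumptions for illustration):

```python
import csv
import io

# Raise the limit above the longest field reported in this issue (4688767).
# Note that the limit is global to the csv module, so it affects every reader
# created in the process afterwards.
csv.field_size_limit(5_000_000)

# Build one row whose field is as long as the largest ingredients_text value.
big_value = "x" * 4_688_767
sample = io.StringIO(f"code\tingredients_text\n1\t{big_value}\n")

rows = list(csv.DictReader(sample, delimiter="\t"))
print(len(rows[0]["ingredients_text"]))
```

Since the call has to happen before the reader iterates over the file, it can be placed in user code ahead of any use of the dataset until the library handles it itself.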

Steps to reproduce the behavior:

  1. Attempt to run the example
  2. Observe an error similar to:
  File "/private/tmp/food/.venv/lib/python3.12/site-packages/openfoodfacts/dataset.py", line 108, in count
    for _ in self:
  File "/private/tmp/food/.venv/lib/python3.12/site-packages/openfoodfacts/dataset.py", line 102, in _csv_iterator
    for row in reader:
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/csv.py", line 116, in __next__
    row = next(self.reader)
          ^^^^^^^^^^^^^^^^^
_csv.Error: field larger than field limit (131072)
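The same error can also be triggered without downloading the dataset. The sketch below is not the library's code path; it only uses the standard csv module with an in-memory row whose field is one character longer than the current limit (the read_rows helper and the sample column names are hypothetical):

```python
import csv
import io


def read_rows(text, delimiter="\t"):
    """Parse delimited text with the csv module (hypothetical helper)."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# Query the current limit without changing it (131072 unless it was raised).
limit = csv.field_size_limit()
oversized = "x" * (limit + 1)

try:
    read_rows(f"code\tingredients_text\n1\t{oversized}\n")
    error_message = None
except csv.Error as exc:
    error_message = str(exc)

print(error_message)
```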

Platform (Desktop, Mobile, Hunger Games)

  • OS: macOS Sonoma 14.2
  • Platform: Desktop