IndexError: list index out of range #96

vinayak-mehta · 2018-12-11T15:18:08Z

I'm trying to extract tables from the following PDFs: [1] and [2] using the code in this gist. The code fails with the following traceback:

Traceback (most recent call last):
  File "plumb.py", line 10, in <module>
    table = p0.extract_table()
  File "/home/vinayak/.local/share/virtualenvs/camelot-pdfplumber/local/lib/python2.7/site-packages/pdfplumber/page.py", line 185, in extract_table
    largest = list(sorted(tables, key=sorter))[0]
IndexError: list index out of range

I'm on Python 2.7 and pdfplumber 0.5.11. Any pointers on how I can solve this?

jsvine · 2019-04-15T02:50:31Z

Hi @vinayak-mehta, and my apologies for the slowness in responding. It appears that pdfplumber's default table-extraction settings don't identify a table on the first page of that PDF, so the list of tables is empty, which triggers that IndexError. It's worth taking a look at the library's table-extraction options. In this case, it looks like you'd want to change vertical_strategy and horizontal_strategy.
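
For illustration, here is a minimal sketch of what that could look like. It passes non-default strategies through table_settings and checks for an empty result instead of indexing into it. The helper names (extract_largest_table, demo), the TEXT_SETTINGS constant, and the choice of the "text" strategy are assumptions made for this example, not something confirmed to work on the PDFs above; the right strategy depends on how each table is drawn.

```python
# Sketch only: TEXT_SETTINGS and both helpers are hypothetical names.
# "text" is one of pdfplumber's documented strategy values; whether it
# suits these particular PDFs is an assumption.
TEXT_SETTINGS = {
    "vertical_strategy": "text",
    "horizontal_strategy": "text",
}


def extract_largest_table(page, table_settings=None):
    """Return the extracted table with the most cells, or None if no table is found.

    `page` is anything exposing extract_tables(table_settings), such as a
    pdfplumber page. Checking for an empty list avoids the IndexError
    raised when extract_table() indexes into an empty result.
    """
    tables = page.extract_tables(table_settings or {})
    if not tables:
        return None
    # Each table is a list of rows; count cells to pick the largest one.
    return max(tables, key=lambda rows: sum(len(row) for row in rows))


def demo(pdf_path):
    """Open a PDF and try the first page with the text-based strategies."""
    import pdfplumber  # imported lazily so the helper above has no hard dependency

    pdf = pdfplumber.open(pdf_path)
    try:
        table = extract_largest_table(pdf.pages[0], TEXT_SETTINGS)
    finally:
        pdf.close()
    if table is None:
        print("No table found with these settings; try other strategies.")
    return table
```

Calling demo("example.pdf") (a placeholder path) returns the table's rows, or None when nothing is detected, so the caller can fall back to other settings rather than crash.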

jsvine closed this as completed Apr 15, 2019
