ValueError: cannot reindex from a duplicate axis #49

Open
dinosg opened this issue Aug 25, 2021 · 9 comments

Comments

@dinosg

dinosg commented Aug 25, 2021

maup.assign crashes after spending a while working through the assignments. Example:

In [10]: assign1 = maup.assign(blocks20, vtds10)
100%|██████████| 8941/8941 [11:36<00:00, 12.85it/s]
Traceback (most recent call last):

File "", line 1, in
assign1 = maup.assign(blocks20, vtds10)

File "/Users/dpg/opt/anaconda3/lib/python3.7/site-packages/maup/crs.py", line 14, in wrapped
return f(*args, **kwargs)

File "/Users/dpg/opt/anaconda3/lib/python3.7/site-packages/maup/assign.py", line 12, in assign
assignment = assign_by_covering(sources, targets)

File "/Users/dpg/opt/anaconda3/lib/python3.7/site-packages/maup/assign.py", line 22, in assign_by_covering
return indexed_sources.assign(targets)

File "/Users/dpg/opt/anaconda3/lib/python3.7/site-packages/maup/indexed_geometries.py", line 42, in assign
assignment = pandas.concat(groups).reindex(self.index)

File "/Users/dpg/opt/anaconda3/lib/python3.7/site-packages/pandas/core/series.py", line 4579, in reindex
return super().reindex(index=index, **kwargs)

File "/Users/dpg/opt/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py", line 4810, in reindex
axes, level, limit, tolerance, method, fill_value, copy

File "/Users/dpg/opt/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py", line 4834, in _reindex_axes
allow_dups=False,

File "/Users/dpg/opt/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py", line 4880, in _reindex_with_indexers
copy=copy,

File "/Users/dpg/opt/anaconda3/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 663, in reindex_indexer
self.axes[axis]._validate_can_reindex(indexer)

File "/Users/dpg/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3785, in _validate_can_reindex
raise ValueError("cannot reindex from a duplicate axis")

ValueError: cannot reindex from a duplicate axis

@InnovativeInventor
Member

InnovativeInventor commented Aug 25, 2021

See #41. I think we should make the error message more useful, but this is likely a problem with your source or target geometries containing overlaps. If these are indeed Census blocks/vtds, then you likely have duplicates. Let me know if this helps!

@dinosg
Author

dinosg commented Aug 26, 2021

These ARE census blocks being mapped to VTDs. However, the VTDs (for Texas, from the MGGG state archive for 2010) were 'buffered' to avoid a point defect that prevented a Graph from being built. Using the unmodified MGGG VTD archive for TX might be a workaround; we'll see.

The idea is then to map the 2010 census blocks to the 2020 census VTDs, so that I have one database with both the 2010 and 2020 population stats and can do interesting population-change comparisons.
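
A note on the buffering mentioned in this comment: a positive buffer grows each polygon outward, so neighbours that previously only shared a boundary end up overlapping. The sketch below illustrates that with shapely on two made-up squares; the geometries, the buffer distance of 0.01, and the variable names are all assumptions for illustration, not the actual Texas data.

```python
from shapely.geometry import Point, box

# Two hypothetical precinct-like squares that share the edge x = 1.
left = box(0, 0, 1, 1)
right = box(1, 0, 2, 1)

# Before buffering, the intersection is just the shared edge (zero area).
area_before = left.intersection(right).area

# A small positive buffer expands each polygon in every direction.
left_buffered = left.buffer(0.01)
right_buffered = right.buffer(0.01)

# After buffering, the two polygons overlap in a thin strip around x = 1.
area_after = left_buffered.intersection(right_buffered).area

# A point on the old shared edge is now inside both buffered polygons.
probe = Point(1.0, 0.5)
covered_by_both = left_buffered.contains(probe) and right_buffered.contains(probe)

print(area_before, area_after > 0, covered_by_both)
```

If the buffered VTDs overlap like this, small source geometries near a boundary could plausibly be covered by more than one target, which is consistent with the overlap explanation given earlier in the thread.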

@InnovativeInventor
Member

VEST already did this, I believe.

@dinosg
Author

dinosg commented Aug 26, 2021

Do you have a link for that repo? What I see at the general link https://dataverse.harvard.edu/file.xhtml?fileId=5007853&version=17.0 is 2020 election results, but nothing that obviously combines 2010 and 2020 demographics. PA is missing, incidentally. I just got the comprehensive precinct results from the PA Secretary of State. Is there anyone I should send those to so they can be integrated with the other datasets?

@dinosg
Author

dinosg commented Aug 26, 2021

There is also their repo with "crosswalks":
https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/T9VMJO
but it has not been updated with 2020 census population data; it has just the block shapes and 2019 ACS data.

@InnovativeInventor
Member

InnovativeInventor commented Sep 15, 2021

Sorry for the delay -- I thought that VEST had this data prepared, but maybe not.

@InnovativeInventor
Member

InnovativeInventor commented Sep 15, 2021

#48 should silence the issue for you, I believe. Could you share the shapefiles that you're using? Also, did you make sure that your source and target shapefiles have the same projection?
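
The projection question above can be checked directly in geopandas. The following is a minimal sketch, assuming two small in-memory GeoDataFrames as stand-ins for the real shapefiles; the names `blocks` and `precincts`, the coordinates, and the EPSG codes are illustrative assumptions only.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical stand-ins for the source and target layers.
blocks = gpd.GeoDataFrame(
    {"POP100": [10]},
    geometry=[box(-97.0, 30.0, -96.9, 30.1)],
    crs="EPSG:4326",
)
precincts = gpd.GeoDataFrame(
    {"name": ["p0"]},
    geometry=[box(-97.1, 29.9, -96.8, 30.2)],
    crs="EPSG:4326",
).to_crs("EPSG:3857")  # deliberately put the targets in a different CRS

# Compare the coordinate reference systems of the two layers.
same_before = blocks.crs == precincts.crs

# Reproject the sources to match the targets if they differ.
if not same_before:
    blocks = blocks.to_crs(precincts.crs)

same_after = blocks.crs == precincts.crs
print(same_before, same_after)
```

With real data, the two frames would come from `gpd.read_file(...)` on the respective shapefiles, and the same comparison applies.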

@brodiak9000

I had the same issue. I checked the axes of my dataframes and found no duplicates.

ValueError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_3132/626298941.py in
1 variables = ["POP100"]
2
----> 3 assignment = maup.assign(blocks, precincts)
4 precincts[variables] = blocks[variables].groupby(assignment).sum()
5 precincts[variables].head()

~\anaconda3\lib\site-packages\maup\crs.py in wrapped(*args, **kwargs)
12 )
13 )
---> 14 return f(*args, **kwargs)
15
16 return wrapped

~\anaconda3\lib\site-packages\maup\assign.py in assign(sources, targets)
10 target that covers the most of its area.
11 """
---> 12 assignment = assign_by_covering(sources, targets)
13 unassigned = sources[assignment.isna()]
14 assignments_by_area = assign_by_area(unassigned, targets)

~\anaconda3\lib\site-packages\maup\assign.py in assign_by_covering(sources, targets)
20 def assign_by_covering(sources, targets):
21 indexed_sources = IndexedGeometries(sources)
---> 22 return indexed_sources.assign(targets)
23
24

~\anaconda3\lib\site-packages\maup\indexed_geometries.py in assign(self, targets)
46 )
47 ]
---> 48 assignment = pandas.concat(groups).reindex(self.index)
49 return assignment
50

~\anaconda3\lib\site-packages\pandas\core\series.py in reindex(self, index, **kwargs)
4578 )
4579 def reindex(self, index=None, **kwargs):
-> 4580 return super().reindex(index=index, **kwargs)
4581
4582 @deprecate_nonkeyword_arguments(version=None, allowed_args=["self", "labels"])

~\anaconda3\lib\site-packages\pandas\core\generic.py in reindex(self, *args, **kwargs)
4816
4817 # perform the reindex on the axes
-> 4818 return self._reindex_axes(
4819 axes, level, limit, tolerance, method, fill_value, copy
4820 ).finalize(self, method="reindex")

~\anaconda3\lib\site-packages\pandas\core\generic.py in _reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy)
4837
4838 axis = self._get_axis_number(a)
-> 4839 obj = obj._reindex_with_indexers(
4840 {axis: [new_index, indexer]},
4841 fill_value=fill_value,

~\anaconda3\lib\site-packages\pandas\core\generic.py in _reindex_with_indexers(self, reindexers, fill_value, copy, allow_dups)
4881
4882 # TODO: speed up on homogeneous DataFrame objects
-> 4883 new_data = new_data.reindex_indexer(
4884 index,
4885 indexer,

~\anaconda3\lib\site-packages\pandas\core\internals\managers.py in reindex_indexer(self, new_axis, indexer, axis, fill_value, allow_dups, copy, consolidate, only_slice)
668 # some axes don't allow reindexing with dups
669 if not allow_dups:
--> 670 self.axes[axis]._validate_can_reindex(indexer)
671
672 if axis >= self.ndim:

~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in _validate_can_reindex(self, indexer)
3783 # trying to reindex on an axis with duplicates
3784 if not self._index_as_unique and len(indexer):
-> 3785 raise ValueError("cannot reindex from a duplicate axis")
3786
3787 def reindex(

ValueError: cannot reindex from a duplicate axis
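
On the observation that neither dataframe has duplicate labels: the traceback shows the failure at `pandas.concat(groups).reindex(self.index)`, so the duplicates may be in the concatenated result rather than in either input. The sketch below reproduces that situation with plain pandas. It assumes (this is not confirmed by the maup source in this thread) that `groups` holds one Series per target, indexed by the source labels that target covers; the labels `b0`..`b2` and `t0`, `t1` are made up.

```python
import pandas as pd

# Hypothetical source labels; the index itself has no duplicates.
source_index = pd.Index(["b0", "b1", "b2"])
assert source_index.is_unique

# Assumed shape of `groups`: one Series per target, mapping each covered
# source label to that target's label. "b1" sits in an overlap between
# the two targets, so both of them claim it.
groups = [
    pd.Series("t0", index=["b0", "b1"]),
    pd.Series("t1", index=["b1", "b2"]),
]

combined = pd.concat(groups)

# Labels that appear more than once in the concatenated result.
duplicated_labels = combined.index[combined.index.duplicated()].unique().tolist()
print(duplicated_labels)

# Reindexing a Series whose index has duplicate labels raises ValueError
# (the exact message text differs between pandas versions).
try:
    combined.reindex(source_index)
    reindex_failed = False
except ValueError as exc:
    reindex_failed = True
    print(exc)
```

Under that assumption, checking `index.is_unique` on the inputs would not reveal the problem; looking for overlapping target geometries would be the more direct check.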

@dinosg
Author

dinosg commented Sep 29, 2021

The shapefiles I used are at:
https://github.com/mggg-states/TX-shapefiles
