Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

python -m eagle.ge_data.allocation --outdir [path of data] #104

Closed
zpz915 opened this issue Jul 24, 2024 · 2 comments
Closed

python -m eagle.ge_data.allocation --outdir [path of data] #104

zpz915 opened this issue Jul 24, 2024 · 2 comments

Comments

@zpz915
Copy link

zpz915 commented Jul 24, 2024

when i generate traindata python -m eagle.ge_data.allocation --outdir [path of data],but i meet the error
Uploading 1721797678355.png…

@zpz915
Copy link
Author

zpz915 commented Jul 24, 2024

Traceback (most recent call last):
File "/root/autodl-tmp/EAGLE/eagle/ge_data/ge_data_all_vicuna.py", line 148, in
ds = build_dataset_rank(bigtokenizer)
File "/root/autodl-tmp/EAGLE/eagle/ge_data/ge_data_all_vicuna.py", line 130, in build_dataset_rank
ds1 = ds1.map(
File "/root/miniconda3/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 602, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "/root/miniconda3/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 567, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "/root/miniconda3/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 3253, in map
for rank, done, content in iflatmap_unordered(
File "/root/miniconda3/lib/python3.8/site-packages/datasets/utils/py_utils.py", line 718, in iflatmap_unordered
[async_result.get(timeout=0.05) for async_result in async_results]
File "/root/miniconda3/lib/python3.8/site-packages/datasets/utils/py_utils.py", line 718, in
[async_result.get(timeout=0.05) for async_result in async_results]
File "/root/miniconda3/lib/python3.8/site-packages/multiprocess/pool.py", line 771, in get
raise self._value
IndexError: list index out of range

@Liyuhui-12
Copy link
Collaborator

There might be an issue with parallel processing. You can try using num_proc=1 to see if it helps.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants