
EAGLE-2 is slower than EAGLE-1 #88

Closed
yjdy opened this issue Jul 2, 2024 · 8 comments

Comments

@yjdy

yjdy commented Jul 2, 2024

Thanks for this great repo. I have tested EAGLE-1 and EAGLE-2 on Vicuna-7B,
but I found that EAGLE-2 is slower than EAGLE-1: 69 tokens/s and 66 tokens/s respectively.
The inference test is on MT-Bench, using a V100 with 32 GB of memory and batch size 1.

Is it normal?

Best regards

@yjdy
Author

yjdy commented Jul 2, 2024

I made a mistake above: the inference speed of EAGLE-2 is 66 tokens/s and EAGLE-1 is 69 tokens/s. Also, the temperature is 0.

@hongyanz
Contributor

hongyanz commented Jul 2, 2024

It is not normal. Can you provide more details (e.g., whether you are running anything else on your machine, what your environment is, and which code you are running)? Without them, it is hard to debug your setup.

@yjdy
Author

yjdy commented Jul 3, 2024

Some details of my environment are listed as follows:
1 GPU V100 32G memory
python 3.10.14
CUDA 11.7
Driver Version: 515.65.01
torch 2.1.0
triton 2.1.0
transformers 4.36.2

I just ran the evaluation script gen_ea_answer_vicuna.py as suggested in the README, with
batch size = 1
temperature = 0

@Liyuhui-12
Collaborator

The possible reason is that total_token was not set correctly.

@yjdy
Author

yjdy commented Jul 5, 2024

Thanks for the response. Can you give me some advice on how to set total_token? Should I set it larger or smaller?

@Lucas-TY

Lucas-TY commented Jul 5, 2024

Hi, the benchmark can't record new tokens correctly; I don't know if that's normal.

python -m eagle.evaluation.gen_ea_answer_vicuna \
        --ea-model-path yuhuili/EAGLE-Vicuna-7B-v1.3 \
        --base-model-path lmsys/vicuna-7b-v1.3

python -m eagle.evaluation.gen_baseline_answer_vicuna \
        --ea-model-path yuhuili/EAGLE-Vicuna-7B-v1.3 \
        --base-model-path lmsys/vicuna-7b-v1.3
{"question_id": 81, "answer_id": "TP4CRrbLYBqFHdQqoeb7ug", "model_id": "ess-vicuna-70b-fp16-baseline-temperature-1.0", "choices": [{"index": 0, "turns": ["....... "idxs": [603, 603], "new_tokens": [0, 0], "wall_time": [8.09636378288269, 7.946403741836548]}], "tstamp": 1720166094.253764}

@Liyuhui-12
Collaborator

> Thanks for the response. Can you give me some advice on how to set total_token? Should I set it larger or smaller?

Overall, the smaller the model and the more powerful the computational capacity, the larger this value should be.
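
For reference, a minimal sketch of where total_token could be passed, assuming the EaModel.from_pretrained interface in this repo exposes it (check eagle/model/ea_model.py for the exact argument name in your version); the value 48 is only an illustration of choosing a smaller setting for a less powerful GPU such as a V100, not a recommendation.

import torch
from eagle.model.ea_model import EaModel

# total_token controls how many candidate tokens EAGLE-2 drafts per step.
# 48 is an illustrative value for a weaker GPU; a stronger GPU may benefit from a larger value.
model = EaModel.from_pretrained(
    base_model_path="lmsys/vicuna-7b-v1.3",
    ea_model_path="yuhuili/EAGLE-Vicuna-7B-v1.3",
    total_token=48,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    device_map="auto",
)
model.eval()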

@Liyuhui-12
Collaborator

> Hi, the benchmark can't record new tokens correctly; I don't know if that's normal.

It is normal for the baseline not to return new tokens.

@hongyanz hongyanz closed this as completed Aug 6, 2024