
EAGLE-2 is slower than EAGLE-1 #88

Closed
yjdy opened this issue Jul 2, 2024 · 8 comments

Comments

@yjdy

yjdy commented Jul 2, 2024

Thanks for this great repo. I have tested EAGLE-1 and EAGLE-2 on Vicuna-7B,
but I found that EAGLE-2 is slower than EAGLE-1: 69 tokens/s and 66 tokens/s respectively.
The inference test is on MT-Bench, using a V100 with 32 GB of memory and batch size 1.

Is it normal?

Best regards

@yjdy
Author

yjdy commented Jul 2, 2024

I made a mistake above: the inference speed of EAGLE-2 is 66 tokens/s and EAGLE-1 is 69 tokens/s. Also, the temperature is 0.

@hongyanz
Contributor

hongyanz commented Jul 2, 2024

It is not normal. Can you provide more details (e.g., whether you are running anything else on your machine, what your environment is, and which code you are running)? Without them, it is hard to debug your setup.

@yjdy
Author

yjdy commented Jul 3, 2024

Some details of my environment are listed as follows:
1 GPU V100 32G memory
python 3.10.14
CUDA 11.7
Driver Version: 515.65.01
torch 2.1.0
triton 2.1.0
transformers 4.36.2

I just ran the evaluation script gen_ea_answer_vicuna.py as suggested in the README, with
batch size = 1
temperature = 0

@Liyuhui-12
Collaborator

The possible reason is that total_token was not set correctly.

@yjdy
Author

yjdy commented Jul 5, 2024

Thanks for the response. Can you give me some advice on how to set total_token? Should I set it larger or smaller?

@Lucas-TY

Lucas-TY commented Jul 5, 2024

Hi, the benchmark can't record new tokens correctly; I don't know if that's normal.

python -m eagle.evaluation.gen_ea_answer_vicuna \
        --ea-model-path yuhuili/EAGLE-Vicuna-7B-v1.3 \
        --base-model-path lmsys/vicuna-7b-v1.3

python -m eagle.evaluation.gen_baseline_answer_vicuna \
        --ea-model-path yuhuili/EAGLE-Vicuna-7B-v1.3 \
        --base-model-path lmsys/vicuna-7b-v1.3
{"question_id": 81, "answer_id": "TP4CRrbLYBqFHdQqoeb7ug", "model_id": "ess-vicuna-70b-fp16-baseline-temperature-1.0", "choices": [{"index": 0, "turns": ["....... "idxs": [603, 603], "new_tokens": [0, 0], "wall_time": [8.09636378288269, 7.946403741836548]}], "tstamp": 1720166094.253764}

@Liyuhui-12
Collaborator

> Thanks for the response. Can you give me some advice on how to set total_token? Should I set it larger or smaller?

Overall, the smaller the model and the more powerful the computational capacity, the larger this value should be.
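
For reference, a minimal sketch of where total_token could be passed, assuming the EaModel.from_pretrained interface in this repo exposes it (check eagle/model/ea_model.py for the exact argument name in your version); the value 48 is only an illustration of choosing a smaller setting for a less powerful GPU such as a V100, not a recommendation.

import torch
from eagle.model.ea_model import EaModel

# total_token controls how many candidate tokens EAGLE-2 drafts per step.
# 48 is an illustrative value for a weaker GPU; a stronger GPU may benefit from a larger value.
model = EaModel.from_pretrained(
    base_model_path="lmsys/vicuna-7b-v1.3",
    ea_model_path="yuhuili/EAGLE-Vicuna-7B-v1.3",
    total_token=48,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    device_map="auto",
)
model.eval()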

@Liyuhui-12
Collaborator

> Hi, the benchmark can't record new tokens correctly; I don't know if that's normal.

It is normal for the baseline not to return new tokens.

@hongyanz hongyanz closed this as completed Aug 6, 2024