
error 'accept_length' in Eagle1 or 2? #95

Closed
haiduo opened this issue Jul 15, 2024 · 13 comments

Comments


haiduo commented Jul 15, 2024

best_candidate, accept_length,sample_p = evaluate_posterior(

This looks like a bug: the calculation of accept_length only considers the last step of the last conversation, not the average over all steps. The results I reproduce are basically accept_length = 2.
Experimental settings: LLM = vicuna-7b, test set: MT-bench
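For reference, a minimal sketch of the metric as it seems it should be computed (toy numbers, not EAGLE's actual API): average the per-step accept lengths over every step of every conversation, instead of keeping only the final value.

```python
# Illustrative sketch: the mean accept length should be taken over every
# decoding step of every conversation, not just the last step of the last
# conversation. Numbers below are toy values, not EAGLE's actual API.
# per_step[i][j] = accept_length at decoding step j of conversation i.
per_step = [
    [3, 2, 4, 2],  # conversation 0
    [2, 2, 3],     # conversation 1
]

all_steps = [length for conv in per_step for length in conv]
mean_accept_length = sum(all_steps) / len(all_steps)
print(f"mean accept length: {mean_accept_length:.2f}")  # 2.57 here

# Keeping only the last step's value (the suspected bug) would instead
# report per_step[-1][-1] == 3.
```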

@Lucas-TY

They didn't release the average accept length script; you can simply dump the per-step accept lengths into a jsonl file and compute the average yourself.
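For example, a minimal sketch of that approach (the file name and record schema here are hypothetical):

```python
import json

# Dump: inside the generation loop, append one record per decoding step.
with open("accept_lengths.jsonl", "a") as f:
    f.write(json.dumps({"accept_length": 3}) + "\n")

# Average: read all records back and compute the mean over every step.
with open("accept_lengths.jsonl") as f:
    lengths = [json.loads(line)["accept_length"] for line in f]
print(f"average accept length: {sum(lengths) / len(lengths):.2f}")
```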


haiduo commented Jul 16, 2024

> They didn't release the average accept length script; you can simply dump the per-step accept lengths into a jsonl file and compute the average yourself.

Yes, I used a modified script to calculate the average accept length, but the result was still 2.0, whereas the paper reports different numbers, as follows:
[image]
I also suspect that the calculation of n-α has similar problems. These are my reproduced results:
[image]

@yanjunplay
Contributor

Hi @haiduo, curious, did you have any update on this? :-)


haiduo commented Jul 24, 2024

> Hi @haiduo, curious, did you have any update on this? :-)

Hello. I tried other devices (different GPUs) and the conclusions are basically the same as mine. I also looked at the implementations of other open-source frameworks, and they match mine. So I have reason to believe that the author either didn't open-source the complete test code or the paper's results are problematic.
In addition, the baseline and EAGLE results in the table above are reversed, and the unit is tokens/s.


yanjunplay commented Jul 24, 2024

Thanks @haiduo for replying to me. I just checked the Spec-Bench leaderboard https://github.com/hemingkx/Spec-Bench/blob/main/Leaderboard.md from the link you shared. Their "Accepted Tokens" numbers for EAGLE are all 3+. Do you mean the logic there is correct? Then I am a bit confused how they get 3+ while we can only get ~2 here for EAGLE-2. Have you tried the Spec-Bench scripts? Spec-Bench benchmarked EAGLE-1, but I would be surprised if EAGLE-2 were so much worse than EAGLE-1. I would also like to debug this together.


haiduo commented Jul 24, 2024

Thank you for your comment @yanjunplay. I haven't reproduced the Spec-Bench results yet (but I am going to); I have only looked at that code's experiments. In addition, the results above are problematic when using the EAGLE-2 code to test the accept length directly, so I used the EAGLE-1 code (following the author's earlier reply in an issue). If there is a problem with my reproduction, it may be that I didn't use chain speculation (which the paper uses) but only EAGLE-1's "gen_ea_alpha_vicuna.py". In fact, that test uses a tree with 26 nodes, i.e. tree speculation. But even so, why would the accept-rate values (0-α, 1-α, 2-α) still be correct? I am confused. BTW, the author's open-source EAGLE-2 can currently only train and test the speedup results, so my reproduced results are based on EAGLE-1 without any changes.
Finally, I need to ask you about the question I raised before. The author implements this differently in EAGLE-2 than in EAGLE-1, and I don't know whether q(x)=1 still satisfies the distribution-preserving assumption of speculative sampling.


yanjunplay commented Jul 24, 2024

@haiduo do you use wechat? Maybe we can quickly discuss a bit.


haiduo commented Jul 24, 2024

> @haiduo do you use wechat? Maybe we can quickly discuss a bit. My wechat account: macazi

That's good!


qwedaq commented Aug 8, 2024

> Thank you for your comment @yanjunplay. I haven't reproduced the Spec-Bench results yet (but I am going to); I have only looked at that code's experiments. In addition, the results above are problematic when using the EAGLE-2 code to test the accept length directly, so I used the EAGLE-1 code (following the author's earlier reply in an issue). If there is a problem with my reproduction, it may be that I didn't use chain speculation (which the paper uses) but only EAGLE-1's "gen_ea_alpha_vicuna.py". In fact, that test uses a tree with 26 nodes, i.e. tree speculation. But even so, why would the accept-rate values (0-α, 1-α, 2-α) still be correct? I am confused. BTW, the author's open-source EAGLE-2 can currently only train and test the speedup results, so my reproduced results are based on EAGLE-1 without any changes.
> Finally, I need to ask you about the question I raised before. The author implements this differently in EAGLE-2 than in EAGLE-1, and I don't know whether q(x)=1 still satisfies the distribution-preserving assumption of speculative sampling.

Hi @haiduo, were you able to get any answer as to why q(x)=1.0 in EAGLE2?


haiduo commented Aug 8, 2024

Hi @qwedaq, although we didn't receive a reply from the author, we later deduced that in the non-repeat (without-replacement) sampling mode of EAGLE-2, q(x)=1.0 is a special case of speculative decoding that only applies to EAGLE-2; it would not be sound for EAGLE-1. So in theory EAGLE-2 should have no problem doing this, but I haven't had time to check the actual generation quality. You could try other benchmarks to measure its score.
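For context, the standard speculative-sampling acceptance rule (from the papers cited below) accepts a draft token x with probability min(1, p(x)/q(x)), where p is the target model and q the draft model; with q(x) = 1.0 this reduces to accepting with probability p(x). A toy sketch with illustrative values, not EAGLE's actual code:

```python
import random

def accept_draft(p_x: float, q_x: float) -> bool:
    """Accept draft token x with probability min(1, p(x) / q(x)),
    where p is the target model and q is the draft model."""
    return random.random() < min(1.0, p_x / q_x)

# Ordinary speculative sampling: q(x) is the draft model's probability of x.
accept_draft(p_x=0.30, q_x=0.60)  # accepted with probability 0.5

# The q(x) = 1.0 special case discussed above: the token is then
# accepted with probability p(x) itself.
accept_draft(p_x=0.30, q_x=1.0)   # accepted with probability 0.3
```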


qwedaq commented Aug 8, 2024

> Hi @qwedaq, although we didn't receive a reply from the author, we later deduced that in the non-repeat (without-replacement) sampling mode of EAGLE-2, q(x)=1.0 is a special case of speculative decoding that only applies to EAGLE-2; it would not be sound for EAGLE-1. So in theory EAGLE-2 should have no problem doing this, but I haven't had time to check the actual generation quality. You could try other benchmarks to measure its score.

Thank you for your quick response. I am a bit new to speculative decoding; can you please elaborate on what you mean by the non-repeat sampling mode of EAGLE-2?


haiduo commented Aug 8, 2024

> Hi @qwedaq, although we didn't receive a reply from the author, we later deduced that in the non-repeat (without-replacement) sampling mode of EAGLE-2, q(x)=1.0 is a special case of speculative decoding that only applies to EAGLE-2; it would not be sound for EAGLE-1. So in theory EAGLE-2 should have no problem doing this, but I haven't had time to check the actual generation quality. You could try other benchmarks to measure its score.

> Thank you for your quick response. I am a bit new to speculative decoding; can you please elaborate on what you mean by the non-repeat sampling mode of EAGLE-2?

Firstly, you may need to read two papers: "Fast Inference from Transformers via Speculative Decoding" and "Accelerating Large Language Model Decoding with Speculative Sampling". Secondly, my understanding of "non-repeat" sampling is the same as sampling without replacement in probability and statistics: each time a sample is drawn, whether it is accepted or not, it is excluded from the pool before the next round of sampling. Hope this helps.
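A toy illustration of that idea (purely illustrative, not EAGLE's implementation): each drawn token is removed from the pool, and the remaining probabilities are renormalized before the next draw.

```python
import random

# Toy token distribution; values are illustrative.
probs = {"the": 0.5, "a": 0.3, "an": 0.2}

drawn = []
while probs:
    tokens, weights = zip(*probs.items())
    token = random.choices(tokens, weights=weights)[0]
    drawn.append(token)
    del probs[token]  # exclude the drawn token from the next round
    # random.choices renormalizes the remaining weights automatically

print(drawn)  # e.g. ['the', 'an', 'a']
```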


qwedaq commented Aug 8, 2024

Got it. Will read the papers you mentioned. Thank you again :)

haiduo closed this as completed Aug 8, 2024