
Question about EAGLE-2 #110

Closed
Yipeng1994 opened this issue Jul 26, 2024 · 5 comments

@Yipeng1994

How can one prove that EAGLE-2 ensures the distribution of the generated text remains unchanged?

Suppose you use a draft model that is identical to the target model. We have a distribution p(.|.) for the target model and q(.|.) for the draft model.

p=q.

Suppose the distribution over the next token after "I" is "am" (0.51) and "have" (0.49). After top-k and reranking, we attempt to accept the first token "am" with probability:

min(1, q("am")/p("am")) = min(1, 0.51/0.51) = 1

In this scenario, the target model would sample the token "have" with probability 0.49. However, when EAGLE-2 is applied, the probability of emitting "have" becomes 0, which would mean that EAGLE-2 modifies the sampling behavior of the target model.

It seems that EAGLE-2 turns the sampling strategy of the target model into the greedy sampling strategy of the draft model.

I am uncertain whether I have overlooked a detail in the paper that would clarify this. Could you help me understand precisely how EAGLE-2 preserves the distribution while applying top-k filtering and reranking? Any insight would be greatly appreciated.
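For concreteness, here is a minimal Python sketch of the standard speculative sampling step this question is applying (per https://arxiv.org/pdf/2211.17192), with the token names and probabilities taken from the example above; `accept_or_resample` is an illustrative helper, not code from this repository:

```python
import random

def accept_or_resample(p, q, x):
    """One speculative sampling step: x was drawn from the draft
    distribution q; accept it with probability min(1, p(x)/q(x)),
    otherwise resample from the normalized residual max(p - q, 0)."""
    if random.random() < min(1.0, p[x] / q[x]):
        return x
    residual = {t: max(p[t] - q[t], 0.0) for t in p}
    z = sum(residual.values())  # z > 0 whenever a rejection is possible
    r = random.random() * z
    for t, w in residual.items():
        r -= w
        if r < 0:
            return t
    return x  # floating-point fallback

# With the question's premise p == q, the ratio is always 1 and every
# draft token is accepted:
p = q = {"am": 0.51, "have": 0.49}
print(accept_or_resample(p, q, "am"))  # always prints "am"
```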

@Liyuhui-12
Collaborator

When using top-k, the actual sampling distribution is not q, and the probability of the first draft token being "am" is 1.0.
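To make this concrete (a hedged numeric check using the example numbers above, not repository code): once top-k reranking fixes the draft token, the effective draft distribution places probability 1.0 on "am", so the acceptance test becomes min(1, 0.51/1.0) = 0.51, and on rejection the normalized residual max(p − q, 0) places all remaining mass on "have". The emitted token therefore still follows p exactly:

```python
import random
from collections import Counter

p = {"am": 0.51, "have": 0.49}   # target distribution from the example
q = {"am": 1.0,  "have": 0.0}    # effective draft after top-k reranking

def step():
    # Accept the drafted "am" with probability min(1, p("am")/q("am")) = 0.51.
    if random.random() < min(1.0, p["am"] / q["am"]):
        return "am"
    # On rejection, the residual norm(max(p - q, 0)) puts all mass on "have".
    return "have"

counts = Counter(step() for _ in range(100_000))
print({t: round(counts[t] / 100_000, 3) for t in p})  # ≈ {'am': 0.51, 'have': 0.49}
```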

@yanjunplay
Contributor

yanjunplay commented Jul 28, 2024

@Yipeng1994 I think your question is not really about EAGLE-2; it is about speculative sampling, which was proposed in https://arxiv.org/pdf/2211.17192
EAGLE-2 and EAGLE-1 adopt the same logic as described in the papers. I strongly suggest reading the original speculative decoding paper https://arxiv.org/pdf/2211.17192, especially Section 2.3 and Appendix A.1

@Yipeng1994
Author

> When using top-k, the actual sampling distribution is not q, and the probability of the first draft token being "am" is 1.0.

I see.
It appears that EAGLE-2 does not directly increase the acceptance rate of individual tokens. Instead, it drafts more tokens with a higher total confidence score to improve the average acceptance length per forward pass.

One more question: for the remaining tokens, which in this case is "have", does EAGLE-2 fall back to naive sampling instead of continuing to use the formula

min(1, p(x)/q(x))

for them? @Liyuhui-12

> EAGLE-2 and EAGLE-1 adopt the same logic as described in the papers. I strongly suggest reading the original speculative decoding paper https://arxiv.org/pdf/2211.17192, especially Section 2.3 and Appendix A.1

Thanks for your suggestion. I will read through it. @yanjunplay

@Liyuhui-12
Collaborator

Liyuhui-12 commented Aug 2, 2024

You can refer to Appendix B of the EAGLE paper https://arxiv.org/abs/2401.15077 for the specific sampling algorithm. It is a recursive version of the algorithm in the original speculative sampling paper https://arxiv.org/pdf/2211.17192.
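For illustration, here is a minimal sketch of that recursive (multi-round) verification, specialized to the deterministic top-k drafts discussed above; `multi_round_verify` is a hypothetical helper under those assumptions, not the repository's implementation:

```python
import random

def multi_round_verify(p, candidates):
    """Multi-round speculative sampling at one tree position, in the spirit
    of EAGLE Appendix B (illustrative sketch only). `p` maps tokens to
    target probabilities; `candidates` are the top-k reranked draft tokens,
    each proposed deterministically (effective q(x) = 1)."""
    p = dict(p)
    for x in candidates:
        # Deterministic draft: accept with probability min(1, p(x)/1) = p(x).
        if random.random() < min(1.0, p[x]):
            return x
        # Rejected: zero out x in the residual and renormalize,
        # i.e. p <- norm(max(p - q, 0)) with q concentrated on x.
        # (Assumes the candidates do not exhaust all of the target mass.)
        p[x] = 0.0
        z = sum(p.values())
        p = {t: w / z for t, w in p.items()}
    # Every candidate rejected: sample from the remaining residual.
    r, acc = random.random(), 0.0
    for t, w in p.items():
        acc += w
        if r < acc:
            return t
    return max(p, key=p.get)  # floating-point fallback
```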

@scott306lr
Copy link

EAGLE employs top-k sampling for verification, processing draft tokens in order from left to right.

Therefore, in the multi-round speculative sampling loop of the EAGLE Appendix B method (for i ≤ k),
the probability of the draft model sampling the current token x[i] should always be 1 (with the probability of every other token being 0).
As a result, each token is accepted exactly when r < target_prob(x[i])/1, where r ~ U(0, 1) (updating target_prob at every step by setting target_prob of the rejected tokens to 0 and renormalizing).

Is this correct?
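Under those assumptions the algebra does reduce to r < target_prob(x[i]) with the residual update described above, since min(1, p(x[i])/q(x[i])) = p(x[i]) when q(x[i]) = 1. As a sanity check, driving the `multi_round_verify` sketch from the previous comment with the thread's example distribution reproduces the target distribution empirically:

```python
import random
from collections import Counter

# Sanity check with the thread's example distribution: the emitted token
# should follow p = {"am": 0.51, "have": 0.49} exactly.
random.seed(0)
p = {"am": 0.51, "have": 0.49}
counts = Counter(multi_round_verify(p, ["am", "have"]) for _ in range(100_000))
print({t: round(counts[t] / 100_000, 3) for t in p})  # ≈ {'am': 0.51, 'have': 0.49}
```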
