Any hint to improve average accept length for fine tuned LLAMA 70b? #106

Closed
yanjunplay opened this issue Jul 24, 2024 · 4 comments

Comments

@yanjunplay (Contributor) commented Jul 24, 2024

Hello, me again. :-D

As we discussed, I trained an EAGLE-2 model with ShareGPT data on my fine-tuned Llama 70B model.
I got a reasonable speedup and accept length, but the numbers are still lower than the baseline (setup 1 below). Any hints on how to further tune the models to maximize the impact?

Comparison setup:

1 (baseline). Open-source Llama 3 70B (https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) + https://huggingface.co/yuhuili/EAGLE-LLaMA3-Instruct-70B
2. My fine-tuned Llama 3 70B + my trained EAGLE-2 draft model (trained with the fine-tuned Llama weights)

The following are the results on MT-bench (I just used the code and data from the repo, 80 questions):

  • Accept length for setup 1: 2.996 (12234 total decoding steps, 36611 total accepted tokens)
  • Accept length for setup 2: ~2.34

This is the code I used to calculate the average accept length: yanjunplay@6f6201e

BTW, my formula for accept length might be different from other benchmarks', so I care more about the relative numbers here. :-D
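
Roughly, the calculation is just the following (a minimal sketch with my own variable names, not the exact code from the commit):

```python
# Average accept length = total accepted draft tokens / total target-model decoding steps.
def average_accept_length(total_accepted_tokens: int, total_decoding_steps: int) -> float:
    return total_accepted_tokens / total_decoding_steps

# Setup 1 numbers from above: 36611 accepted tokens over 12234 decoding steps.
print(average_accept_length(36611, 12234))  # roughly 3.0
```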

Thanks again!

Update:

BTW, my numbers are different from SpecBench's, since SpecBench also counts the token generated by the base (target) model at each step.
So if we use the SpecBench method, the accept lengths should each be +1, i.e. 3.996 and 3.34 respectively.
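
In other words (a tiny illustrative snippet, not taken from any benchmark code):

```python
# SpecBench counts the target model's own token at each decoding step,
# so its accept length is my metric plus one.
my_accept_setup1 = 2.996
my_accept_setup2 = 2.34

specbench_setup1 = my_accept_setup1 + 1  # 3.996
specbench_setup2 = my_accept_setup2 + 1  # 3.34
```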

@Liyuhui-12 (Collaborator)

I suspect one possible reason is that your fine-tuned LLAMA 70B's distribution differs from ShareGPT more than the original LLAMA 70B's does, possibly due to fine-tuning on a specific domain. One potential solution is to generate the data using your fine-tuned LLAMA 70B instead of using the fixed text from ShareGPT.
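
For example, something along these lines (just a sketch, assuming the standard Hugging Face transformers API and the usual ShareGPT JSON format; paths and file names are placeholders):

```python
# Sketch: regenerate the assistant turns with the fine-tuned model so the draft
# model is trained on text drawn from the target model's own distribution.
import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/path/to/your-finetuned-llama3-70b"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.bfloat16, device_map="auto"
)

with open("sharegpt.json") as f:  # the original ShareGPT conversations
    conversations = json.load(f)

regenerated = []
for conv in conversations:
    # Keep the human prompts, but let the fine-tuned model write every response.
    messages = []
    for turn in conv["conversations"]:
        if turn["from"] != "human":
            continue
        messages.append({"role": "user", "content": turn["value"]})
        inputs = tokenizer.apply_chat_template(
            messages, add_generation_prompt=True, return_tensors="pt"
        ).to(model.device)
        output = model.generate(inputs, max_new_tokens=512, do_sample=False)
        reply = tokenizer.decode(output[0, inputs.shape[1]:], skip_special_tokens=True)
        messages.append({"role": "assistant", "content": reply})
    regenerated.append({"conversations": [
        {"from": "human" if m["role"] == "user" else "gpt", "value": m["content"]}
        for m in messages
    ]})

with open("sharegpt_regenerated.json", "w") as f:
    json.dump(regenerated, f)
# Then run the EAGLE feature-extraction step on this regenerated file.
```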

@yanjunplay (Contributor, Author)

Thanks @Liyuhui-12 !

QQ: is running python -m eagle.ge_data.allocation --outdir [path of data] with my fine-tuned LLAMA 70B base model going to do the trick? That is actually what I did to generate the training data.

@Liyuhui-12 (Collaborator)

This script extracts features from fixed text; you need to generate the text first.

@yanjunplay (Contributor, Author)

@Liyuhui-12 Thanks a lot! Let me try.
