
update: support Qwen2-57B-A14B #7835

Merged: 4 commits merged into ggerganov:master on Jun 17, 2024

Conversation

legraphista (Contributor)

Added support for keys moe_intermediate_size and shared_expert_intermediate_size for Qwen2-57B-A14B

Caveat: since self.gguf_writer.add_feed_forward_length was already being called by super().set_gguf_parameters(), I had to copy-paste the existing code to set it properly.

CISC (Contributor) commented Jun 8, 2024

You should fix up llama.cpp too.

Specifically, here:

auto n_ff_exp = n_ff / hparams.n_expert_used;

do the same as DeepSeekV2 here:
ml.get_key(LLM_KV_EXPERT_FEED_FORWARD_LENGTH, hparams.n_ff_exp);

and here:
const uint32_t n_ff_exp = hparams.n_ff_exp;

and here:
LLAMA_LOG_INFO("%s: n_ff_exp = %d\n", __func__, hparams.n_ff_exp);

legraphista (Contributor, Author) commented Jun 9, 2024

> You should fix up llama.cpp too.
>
> Specifically, here:
> auto n_ff_exp = n_ff / hparams.n_expert_used;
>
> do the same as DeepSeekV2 here:
> ml.get_key(LLM_KV_EXPERT_FEED_FORWARD_LENGTH, hparams.n_ff_exp);
>
> and here:
> const uint32_t n_ff_exp = hparams.n_ff_exp;
>
> and here:
> LLAMA_LOG_INFO("%s: n_ff_exp = %d\n", __func__, hparams.n_ff_exp);

I'm sorry, as I'm not that well versed in this topic, can you please explain why those changes need to happen? I'm already setting the expert feed-forward size from the moe_intermediate_size param and the feed-forward size from shared_expert_intermediate_size.

CISC (Contributor) commented Jun 9, 2024

> I'm sorry, as I'm not that well versed in this topic, can you please explain why those changes need to happen? I'm already setting the expert feed-forward size from the moe_intermediate_size param and the feed-forward size from shared_expert_intermediate_size.

In the first line I referenced you can see that it doesn't actually read the expert feed-forward size you are setting; it's calculated from the feed-forward length and n_expert_used. Right now that's not an issue, as the result will be the same, but it leaves setting the expert feed-forward length moot.

For backwards and forwards compatibility you need to read that value from metadata if it exists, and do the calculation if it doesn't.
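
For illustration, a minimal sketch of that fallback (the helper and variable names are made up for this example, not the actual llama.cpp code; in llama.cpp the key would presumably be read with ml.get_key marked as optional):

    #include <cstdint>

    // Prefer the expert FF length written by newer converters, and fall back to
    // the old derived value when the metadata key is absent, so both old and new
    // GGUF files load with the same result.
    static uint32_t resolve_n_ff_exp(uint32_t n_ff_exp_from_metadata, // 0 when the key was missing
                                     uint32_t n_ff,
                                     uint32_t n_expert_used) {
        if (n_ff_exp_from_metadata != 0) {
            return n_ff_exp_from_metadata; // new files: LLM_KV_EXPERT_FEED_FORWARD_LENGTH was stored
        }
        return n_ff / n_expert_used;       // old files: reproduce the previous calculation
    }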

legraphista (Contributor, Author)

> In the first line I referenced you can see that it doesn't actually read the expert feed-forward size you are setting; it's calculated from the feed-forward length and n_expert_used. Right now that's not an issue, as the result will be the same, but it leaves setting the expert feed-forward length moot.
>
> For backwards and forwards compatibility you need to read that value from metadata if it exists, and do the calculation if it doesn't.

Ah, I see. Thank you for the explanation. I will add the changes as soon as I get an opportunity.

previously, expert ff was taken from n_ff (intermediate size) but it is now properly taken from LLM_KV_EXPERT_FEED_FORWARD_LENGTH

n_ff_exp and n_ff_shared_exp are now properly calculated
legraphista (Contributor, Author)

I've pushed the llama.cpp changes as requested and cleaned up the convert-hf-to-gguf.py file, as the workaround was no longer needed.

legraphista changed the title from "update: convert-hf-to-gguf.py to support Qwen2-57B-A14B" to "update: support Qwen2-57B-A14B" on Jun 10, 2024
mofosyne added the "Review Complexity : Medium" label (generally require more time to grok but manageable by beginner to medium expertise level) on Jun 12, 2024
ggerganov (Owner)

Fix the lint checks (EditorConfig and flake8)

@CISC could you review/test the changes?

Comment on lines 1667 to 1669
if (shared_expert_intermediate_size := self.find_hparam(["shared_expert_intermediate_size", "intermediate_size", "n_inner"])) is not None:
    self.gguf_writer.add_feed_forward_length(shared_expert_intermediate_size)
    logger.info(f"gguf: feed forward length = {shared_expert_intermediate_size}")
CISC (Contributor)

Why was this removed?

legraphista (Contributor, Author) commented Jun 12, 2024

Not removed, but cleaned up.
add_feed_forward_length is called by super().set_gguf_parameters(), as it was before my PR.

add_expert_feed_forward_length is the only new thing that is added, and is the one taken into account here https://github.com/ggerganov/llama.cpp/pull/7835/files#diff-150dc86746a90bad4fc2c3334aeb9b5887b3adad3cc1459446717638605348efR5814 and here https://github.com/ggerganov/llama.cpp/pull/7835/files#diff-150dc86746a90bad4fc2c3334aeb9b5887b3adad3cc1459446717638605348efR5820

Since LLM_KV_EXPERT_FEED_FORWARD_LENGTH is the only one taken into account, it felt irrelevant to set LLM_KV_FEED_FORWARD_LENGTH from shared_expert_intermediate_size.

CISC (Contributor)

I see.

However, looking at Qwen1.5-MoE-A2.7B's config.json:

"intermediate_size": 5632,
"moe_intermediate_size": 1408,
"shared_expert_intermediate_size": 5632,
"num_experts_per_tok": 4,

and Qwen2-57B-A14B has the following values:

"intermediate_size": 18944,
"moe_intermediate_size": 2560,
"shared_expert_intermediate_size": 20480,
"num_experts_per_tok": 8,

I'm still not sure what Qwen2's intermediate_size refers to, but since we are now keeping that as feed_forward_length, would it not make sense to store shared_expert_intermediate_size as well?

CISC (Contributor)

Not sure as what, though; @ggerganov, any thoughts?

Trying to see if there are any good docs on these values...

legraphista (Contributor, Author)

I'm unsure what intermediate_size is either. For Qwen1.5-MoE-A2.7B the values look correct, which is why I'm inclined to say that intermediate_size = 18944 could have been a mistake.

As for storing shared_expert_intermediate_size, I see no specific slot for it, apart from overriding LLM_KV_FEED_FORWARD_LENGTH, which might not be the best option (even though it's unused).

legraphista (Contributor, Author) commented Jun 12, 2024

@CISC What do you think the next steps should be?

We could leave this PR as is, since it works, and then start a new one with a proper implementation?

Otherwise, adding mlp_only_layers: int[], shared_expert_intermediate_size: int and decoder_sparse_step: int to the config (and converter) is probably out of scope.

CISC (Contributor)

I think adding a shared_expert_feed_forward_length is within the scope of this PR.

The sparse layer stuff should be another PR.

legraphista (Contributor, Author)

OK, would you like to simply override LLM_KV_FEED_FORWARD_LENGTH with shared_expert_intermediate_size, or create a new one, say LLM_KV_SHARED_EXPERT_FEED_FORWARD_LENGTH?

CISC (Contributor) commented Jun 12, 2024

The latter, since you need to keep the original value for the other PR. Or, well, at least it will make things a little less confusing.
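
For illustration, a tiny hypothetical sketch of why a dedicated key is the cleaner option (the enum values and key strings below are made up and need not match the final llama.cpp/GGUF names): all three lengths can then sit side by side in the metadata instead of one overriding another.

    // Illustrative only: each hparam keeps its own metadata slot.
    enum example_kv {
        EX_KV_FEED_FORWARD_LENGTH,               // from intermediate_size
        EX_KV_EXPERT_FEED_FORWARD_LENGTH,        // from moe_intermediate_size
        EX_KV_EXPERT_SHARED_FEED_FORWARD_LENGTH, // from shared_expert_intermediate_size
    };

    static const char * example_kv_name(example_kv kv) {
        switch (kv) {
            case EX_KV_FEED_FORWARD_LENGTH:               return "%s.feed_forward_length";
            case EX_KV_EXPERT_FEED_FORWARD_LENGTH:        return "%s.expert_feed_forward_length";
            case EX_KV_EXPERT_SHARED_FEED_FORWARD_LENGTH: return "%s.expert_shared_feed_forward_length";
        }
        return "";
    }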

compilade (Collaborator) commented Jun 12, 2024

> mlp_only_layers: int[], and decoder_sparse_step: int

@legraphista I've had to deal with models which have both MoE layers and MLP-only layers in #7531 (Jamba). A new metadata key-value pair is not needed to identify these layers. The easy way is to check for the presence of layers.ffn_gate_inp, which is only there on MoE layers; when building the compute graph, build_llama can be a good inspiration for how to do this check per-layer.
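
As a rough illustration of that per-layer check (the struct and function names below are made up, not the actual llama.cpp symbols):

    // The router tensor (ffn_gate_inp) is only created for MoE layers, so its
    // absence marks an MLP-only layer and no extra metadata key is needed.
    struct example_layer {
        const void * ffn_gate_inp = nullptr; // null on MLP-only layers
    };

    static bool layer_is_moe(const example_layer & layer) {
        return layer.ffn_gate_inp != nullptr;
    }

    // When building the compute graph, branch per layer:
    //     if (layer_is_moe(layer)) { /* expert (MoE) FFN path */ }
    //     else                     { /* dense FFN path        */ }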

legraphista (Contributor, Author)

I've updated the GGUFWriter, convert-hf-to-gguf.py, and llama.cpp to add support for LLM_KV_SHARED_EXPERT_FEED_FORWARD_LENGTH.

legraphista requested a review from CISC on June 12, 2024 at 14:02
CISC (Contributor) left a comment

LGTM

{ LLM_KV_EXPERT_SHARED_COUNT, "%s.expert_shared_count" },
{ LLM_KV_EXPERT_WEIGHTS_SCALE, "%s.expert_weights_scale" },
{ LLM_KV_POOLING_TYPE , "%s.pooling_type" },
{ LLM_KV_LOGIT_SCALE, "%s.logit_scale" },

{ LLM_KV_ATTENTION_HEAD_COUNT, "%s.attention.head_count" },
CISC (Contributor)

You might have to align all of them, before and after; up to @ggerganov.

previously, expert ff was taken from n_ff (intermediate size) but it is now properly taken from LLM_KV_EXPERT_FEED_FORWARD_LENGTH

n_ff_exp and n_ff_shexp are now properly calculated
CISC (Contributor) left a comment

Good to go! :)

ggerganov requested a review from compilade on June 14, 2024 at 14:18
ggerganov (Owner)

We can merge after @compilade approves

compilade added the "merge ready" label (indicates that this may be ready to merge soon and is just holding out in case of objections) on Jun 14, 2024
slaren merged commit a94e6ff into ggerganov:master on Jun 17, 2024
62 of 67 checks passed
Labels: merge ready (indicates that this may be ready to merge soon and is just holding out in case of objections) · python (python script changes) · Review Complexity : Medium (generally require more time to grok but manageable by beginner to medium expertise level)

6 participants