
update: support Qwen2-57B-A14B #7835

Merged: 4 commits merged into ggerganov:master on Jun 17, 2024

Conversation

legraphista (Contributor)

Added support for keys moe_intermediate_size and shared_expert_intermediate_size for Qwen2-57B-A14B

Caveat: since self.gguf_writer.add_feed_forward_length was already being called by super().set_gguf_parameters(), I had to copy-paste the existing code to set it properly.

CISC (Contributor) commented Jun 8, 2024

You should fix up llama.cpp too.

Specifically, here:

auto n_ff_exp = n_ff / hparams.n_expert_used;

do the same as DeepSeekV2 here:
ml.get_key(LLM_KV_EXPERT_FEED_FORWARD_LENGTH, hparams.n_ff_exp);

and here:
const uint32_t n_ff_exp = hparams.n_ff_exp;

and here:
LLAMA_LOG_INFO("%s: n_ff_exp = %d\n", __func__, hparams.n_ff_exp);

legraphista (Contributor, Author) commented Jun 9, 2024

> You should fix up llama.cpp too.
>
> Specifically, here:
> auto n_ff_exp = n_ff / hparams.n_expert_used;
>
> do the same as DeepSeekV2 here:
> ml.get_key(LLM_KV_EXPERT_FEED_FORWARD_LENGTH, hparams.n_ff_exp);
>
> and here:
> const uint32_t n_ff_exp = hparams.n_ff_exp;
>
> and here:
> LLAMA_LOG_INFO("%s: n_ff_exp = %d\n", __func__, hparams.n_ff_exp);

I'm sorry, as I'm not that well versed in this topic, can you please explain why those changes need to happen? I'm already setting the expert feed-forward size from the moe_intermediate_size param and the feed-forward size from shared_expert_intermediate_size.

CISC (Contributor) commented Jun 9, 2024

> I'm sorry, as I'm not that well versed in this topic, can you please explain why those changes need to happen? I'm already setting the expert feed-forward size from the moe_intermediate_size param and the feed-forward size from shared_expert_intermediate_size.

In the first line I referenced you can see that it doesn't actually read the expert feed-forward size you are setting; it's calculated from the feed-forward length and n_expert_used. Right now that's not an issue, as the result will be the same, but it leaves setting the expert feed-forward length moot.

For backwards and forwards compatibility you need to read that value from metadata if it exists, and do the calculation if it doesn't.
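
For illustration, a minimal sketch of that fallback (the helper and variable names are made up for this example, not the actual llama.cpp code; in llama.cpp the key would presumably be read with ml.get_key marked as optional):

    #include <cstdint>

    // Prefer the expert FF length written by newer converters, and fall back to
    // the old derived value when the metadata key is absent, so both old and new
    // GGUF files load with the same result.
    static uint32_t resolve_n_ff_exp(uint32_t n_ff_exp_from_metadata, // 0 when the key was missing
                                     uint32_t n_ff,
                                     uint32_t n_expert_used) {
        if (n_ff_exp_from_metadata != 0) {
            return n_ff_exp_from_metadata; // new files: LLM_KV_EXPERT_FEED_FORWARD_LENGTH was stored
        }
        return n_ff / n_expert_used;       // old files: reproduce the previous calculation
    }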

legraphista (Contributor, Author)

> In the first line I referenced you can see that it doesn't actually read the expert feed-forward size you are setting; it's calculated from the feed-forward length and n_expert_used. Right now that's not an issue, as the result will be the same, but it leaves setting the expert feed-forward length moot.
>
> For backwards and forwards compatibility you need to read that value from metadata if it exists, and do the calculation if it doesn't.

Ah, I see. Thank you for the explanation. I will add the changes as soon as I get an opportunity.

previously, expert ff was taken from n_ff (intermediate size) but it is now properly taken from LLM_KV_EXPERT_FEED_FORWARD_LENGTH

n_ff_exp and n_ff_shared_exp are now properly calculated
legraphista (Contributor, Author)

I've pushed the llama.cpp changes as requested and cleaned up the convert-hf-to-gguf.py file, as the workaround was no longer needed.

legraphista changed the title from "update: convert-hf-to-gguf.py to support Qwen2-57B-A14B" to "update: support Qwen2-57B-A14B" on Jun 10, 2024
mofosyne added the "Review Complexity : Medium" label (generally require more time to grok but manageable by beginner to medium expertise level) on Jun 12, 2024
ggerganov (Owner)

Fix the lint checks (EditorConfig and flake8)

@CISC could you review/test the changes?

Comment on lines 1667 to 1669
if (shared_expert_intermediate_size := self.find_hparam(["shared_expert_intermediate_size", "intermediate_size", "n_inner"])) is not None:
    self.gguf_writer.add_feed_forward_length(shared_expert_intermediate_size)
    logger.info(f"gguf: feed forward length = {shared_expert_intermediate_size}")
CISC (Contributor)

Why was this removed?

legraphista (Contributor, Author) commented Jun 12, 2024

Not removed, but cleaned up.
add_feed_forward_length is called by super().set_gguf_parameters(), as it was before my PR.

add_expert_feed_forward_length is the only new thing that is added, and is the one taken into account here https://github.com/ggerganov/llama.cpp/pull/7835/files#diff-150dc86746a90bad4fc2c3334aeb9b5887b3adad3cc1459446717638605348efR5814 and here https://github.com/ggerganov/llama.cpp/pull/7835/files#diff-150dc86746a90bad4fc2c3334aeb9b5887b3adad3cc1459446717638605348efR5820

Since LLM_KV_EXPERT_FEED_FORWARD_LENGTH is the only one taken into account, it felt irrelevant to set LLM_KV_FEED_FORWARD_LENGTH from shared_expert_intermediate_size.

CISC (Contributor)

I see.

However, looking at Qwen1.5-MoE-A2.7B's config.json:

"intermediate_size": 5632,
"moe_intermediate_size": 1408,
"shared_expert_intermediate_size": 5632,
"num_experts_per_tok": 4,

and Qwen2-57B-A14B has the following values:

"intermediate_size": 18944,
"moe_intermediate_size": 2560,
"shared_expert_intermediate_size": 20480,
"num_experts_per_tok": 8,

I'm still not sure what Qwen2's intermediate_size refers to, but since we are now keeping that as feed_forward_length, would it not make sense to store shared_expert_intermediate_size as well?

CISC (Contributor)

Not sure as what, though; @ggerganov, any thoughts?

Trying to see if there are any good docs on these values...

legraphista (Contributor, Author)

I'm unsure what intermediate_size is either. For Qwen1.5-MoE-A2.7B the values look correct, which is why I'm inclined to say that intermediate_size = 18944 could have been a mistake.

As for storing shared_expert_intermediate_size, I see no specific slot for it, apart from overriding LLM_KV_FEED_FORWARD_LENGTH, which might not be the best option (even though it's unused).

legraphista (Contributor, Author) commented Jun 12, 2024

@CISC What do you think the next steps should be?

We could leave this PR as is, since it works, and then start a new one with a proper implementation?

Otherwise, adding mlp_only_layers: int[], shared_expert_intermediate_size: int and decoder_sparse_step: int to the config (and converter) is probably out of scope.

CISC (Contributor)

I think adding a shared_expert_feed_forward_length is within the scope of this PR.

The sparse layer stuff should be another PR.

legraphista (Contributor, Author)

OK, would you like to simply override LLM_KV_FEED_FORWARD_LENGTH with shared_expert_intermediate_size, or create a new one, say LLM_KV_SHARED_EXPERT_FEED_FORWARD_LENGTH?

CISC (Contributor) commented Jun 12, 2024

The latter, since you need to keep the original value for the other PR. Or, well, at least it will make things a little less confusing.
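
For illustration, a tiny hypothetical sketch of why a dedicated key is the cleaner option (the enum values and key strings below are made up and need not match the final llama.cpp/GGUF names): all three lengths can then sit side by side in the metadata instead of one overriding another.

    // Illustrative only: each hparam keeps its own metadata slot.
    enum example_kv {
        EX_KV_FEED_FORWARD_LENGTH,               // from intermediate_size
        EX_KV_EXPERT_FEED_FORWARD_LENGTH,        // from moe_intermediate_size
        EX_KV_EXPERT_SHARED_FEED_FORWARD_LENGTH, // from shared_expert_intermediate_size
    };

    static const char * example_kv_name(example_kv kv) {
        switch (kv) {
            case EX_KV_FEED_FORWARD_LENGTH:               return "%s.feed_forward_length";
            case EX_KV_EXPERT_FEED_FORWARD_LENGTH:        return "%s.expert_feed_forward_length";
            case EX_KV_EXPERT_SHARED_FEED_FORWARD_LENGTH: return "%s.expert_shared_feed_forward_length";
        }
        return "";
    }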

compilade (Collaborator) commented Jun 12, 2024

> mlp_only_layers: int[], and decoder_sparse_step: int

@legraphista I've had to deal with models which have both MoE layers and MLP-only layers in #7531 (Jamba). A new metadata key-value pair is not needed to identify these layers. The easy way is to check for the presence of layers.ffn_gate_inp, which is only there on MoE layers; when building the compute graph, build_llama can be a good inspiration for how to do this check per-layer.
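
As a rough illustration of that per-layer check (the struct and function names below are made up, not the actual llama.cpp symbols):

    // The router tensor (ffn_gate_inp) is only created for MoE layers, so its
    // absence marks an MLP-only layer and no extra metadata key is needed.
    struct example_layer {
        const void * ffn_gate_inp = nullptr; // null on MLP-only layers
    };

    static bool layer_is_moe(const example_layer & layer) {
        return layer.ffn_gate_inp != nullptr;
    }

    // When building the compute graph, branch per layer:
    //     if (layer_is_moe(layer)) { /* expert (MoE) FFN path */ }
    //     else                     { /* dense FFN path        */ }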

legraphista (Contributor, Author)

I've updated the GGUFWriter, convert-hf-to-gguf.py, and llama.cpp to add support for LLM_KV_SHARED_EXPERT_FEED_FORWARD_LENGTH.

legraphista requested a review from CISC on June 12, 2024 at 14:02
CISC (Contributor) left a comment

LGTM

{ LLM_KV_EXPERT_SHARED_COUNT, "%s.expert_shared_count" },
{ LLM_KV_EXPERT_WEIGHTS_SCALE, "%s.expert_weights_scale" },
{ LLM_KV_POOLING_TYPE , "%s.pooling_type" },
{ LLM_KV_LOGIT_SCALE, "%s.logit_scale" },

{ LLM_KV_ATTENTION_HEAD_COUNT, "%s.attention.head_count" },
CISC (Contributor)

You might have to align all of them, before and after; up to @ggerganov.

previously, expert ff was taken from n_ff (intermediate size) but it is now properly taken from LLM_KV_EXPERT_FEED_FORWARD_LENGTH

n_ff_exp and n_ff_shexp are now properly calculated
CISC (Contributor) left a comment

Good to go! :)

ggerganov requested a review from compilade on June 14, 2024 at 14:18
ggerganov (Owner)

We can merge after @compilade approves

compilade added the "merge ready" label (indicates that this may be ready to merge soon and is just holding out in case of objections) on Jun 14, 2024
slaren merged commit a94e6ff into ggerganov:master on Jun 17, 2024
62 of 67 checks passed
Labels: merge ready (indicates that this may be ready to merge soon and is just holding out in case of objections) · python (python script changes) · Review Complexity : Medium (generally require more time to grok but manageable by beginner to medium expertise level)

6 participants