
gguf-py : fix some metadata name extraction edge cases #8591

Merged
merged 5 commits into master on Jul 21, 2024

Conversation

@compilade (Collaborator) commented Jul 19, 2024

Should fix the problem mentioned in #8579 (comment) by @maziyarpanahi, which was caused by #7499 not pruning the empty parts of a name.
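
For illustration, this is roughly the kind of pruning the fix adds (the helper name here is made up, not the actual gguf-py code):

def prune_name_parts(name: str) -> list[str]:
    # "Mistral-7B--Instruct-v0.2" splits into ['Mistral', '7B', '', 'Instruct', 'v0.2'];
    # dropping the empty part avoids it leaking into the extracted metadata.
    return [part for part in name.split("-") if part]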

I've also made convert_lora_to_gguf.py use the LoRA adapter directory for the model card path.

TODO

  • Test conversion with a LoRA adapter with convert_lora_to_gguf.py and check if the metadata is correct.

* convert_lora : use the lora dir for the model card path
@compilade compilade requested a review from mofosyne July 19, 2024 16:47
@github-actions github-actions bot added the python label Jul 19, 2024
@compilade compilade added the bugfix and Review Complexity : Low labels Jul 19, 2024
* gguf-py : more metadata edge cases fixes

Multiple finetune versions are now joined together,
and the removal of the basename annotation on trailing versions
is more robust.
@compilade (Collaborator, Author)

@mofosyne Regarding the use of title case for the finetune part when generating the "standard" name,

finetune = f"-{finetune_string.strip().title().replace(' ', '-')}" if finetune_string is not None else ""

I think it looks a bit weird for a few models I've recently tried converting:

  • Gemma-2-9B-It-F16.gguf (reads as if it were from Italy instead of instruct-tuned)
    • with original case for finetune: Gemma-2-9B-it-F16.gguf
    • with original case for the basename too: gemma-2-9B-it-F16.gguf
  • Gemma-2B-Zephyr-Dpo-F16.gguf (dpo should not be capitalized)
    • with original case for finetune: Gemma-2B-zephyr-dpo-F16.gguf
    • with original case for the basename too: gemma-2B-zephyr-dpo-F16.gguf

Should the original case be preserved for finetune?
What about the basename part?

Do you think I should change this in this PR or another one?
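
For comparison, here's roughly what it could look like without the title-casing (just a sketch, not necessarily the final change):

# Keep the original case of the finetune part, so acronyms like "it" or "dpo" are left as-is.
finetune = f"-{finetune_string.strip().replace(' ', '-')}" if finetune_string is not None else ""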


(out of scope, but still want to comment about this)

Also note that the naming convention at https://github.com/ggerganov/ggml/blob/master/docs/gguf.md#gguf-naming-convention should be updated to reflect the format used in #7499 (especially the finetune part which is missing in the spec). If the word-only size labels are too complicated for the spec, they could still be detected when converting, but replaced with the calculated size label instead.

Also I don't think we're close to having any model with quadrillions of parameters, so I suggest removing this, especially since there is no example of what suffix is normally used for that case (it might be P for peta-, as in the SI prefixes).
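
For reference, a rough sketch of how a calculated size label could be derived from a parameter count (the suffixes and rounding here are assumptions, not necessarily what the converter does):

def size_label(n_params: int) -> str:
    # Pick the largest suffix that keeps the value >= 1; quadrillions would simply show up as 1000T.
    for suffix, scale in (("T", 10**12), ("B", 10**9), ("M", 10**6), ("K", 10**3)):
        if n_params >= scale:
            return f"{n_params / scale:.1f}{suffix}".replace(".0", "")
    return str(n_params)

For example, size_label(8_030_000_000) gives '8B' and size_label(1_500_000_000_000) gives '1.5T'.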

Let me know what you think about this.

@mofosyne (Collaborator)

@compilade well if it's an edge case then it's part of the PR, depending on how urgent this fix needs to get in. Is it breaking anyone's workflow currently? If not, then we can take our time making it better.

I don't think there is any one right answer here... but we can assume anything read into the kv store is of the correct capitalisation... as for anything heuristically obtained... it's the wild west.

@mofosyne (Collaborator) commented Jul 20, 2024

@compilade and yes, the gguf doc is outdated, and from what I understand the consensus is that the code is the main spec, so our implementation supersedes the document. We should update it.

Also I don't think the heuristics approach we have is a spec; it's a nice-to-have feature in the Python converter. But the gguf spec for naming should contain only what we are aiming for people to stick to.

As for the quadrillion, I'd suggest keeping it in... I would generally like to be +1 ahead of whatever we can immediately conceptualise in the near future. And the near future is likely trillion, so +1 would be quadrillion. As per names of large numbers (Wikipedia).

@compilade (Collaborator, Author) commented Jul 20, 2024

And the near future is likely trillion, so +1 would be quadrillion.

@mofosyne

I think a quadrillion can safely be 1000T instead.

Personally I think we're quite far from making models with 10 times more parameters than the number of neurons in a human brain.

Also I don't think the heuristics approach we have is a spec; it's a nice-to-have feature in the Python converter. But the gguf spec for naming should contain only what we are aiming for people to stick to.

Totally agree. The heuristics are there to turn non-compliant but commonly used names into compliant names. The spec only needs to deal with the resulting names, not the mess they come from.
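
As a toy example of that distinction, the heuristics could map a commonly seen name onto spec-compliant components with something like this (the patterns are illustrative only, not the converter's actual rules):

import re

def guess_components(name: str) -> dict:
    # Very rough: pull out a size label like "7B" or "8x7B" and a trailing
    # version like "v0.2"; whatever is left over would feed basename/finetune.
    size = re.search(r"(\d+x)?\d+(\.\d+)?[KMBT]", name, re.IGNORECASE)
    version = re.search(r"[vV]\d+(\.\d+)*$", name)
    return {
        "size_label": size.group(0) if size else None,
        "version": version.group(0) if version else None,
    }

# guess_components("mistral-7B-instruct-v0.2") -> {'size_label': '7B', 'version': 'v0.2'}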

@mofosyne (Collaborator) commented Jul 20, 2024

@compilade

@mofosyne

I think a quadrillion can safely be 1000T instead.

Personally I think we're quite far from making models with 10 times more parameters than the number of neurons in a human brain.

Well, people in the past thought that 640K ought to be enough for anyone, and I like to be optimistic about the development of humanity, that we can create something like that in the future. Not going to fight too hard to keep it, but I'd rather it stay.

Also I don't think the heuristics approach we have is a spec; it's a nice-to-have feature in the Python converter. But the gguf spec for naming should contain only what we are aiming for people to stick to.

Totally agree. The heuristics are there to turn non-compliant but commonly used names into compliant names. The spec only needs to deal with the resulting names, not the mess they come from.

Glad we are on the same page... we just gotta remember to do it...


So... will merge soon if no objection to this PR pops up

@mofosyne mofosyne added the merge ready label Jul 20, 2024
@mofosyne (Collaborator)

Oh, noticed there is this TODO:

Test conversion with a LoRA adapter with convert_lora_to_gguf.py and check if the metadata is correct.

Not familiar with this so will hold off and let @compilade handle it

* convert_lora : fix default filename

The default filename was previously hardcoded.

* convert_hf : Model.fname_out can no longer be None
* gguf-py : do not use title case for naming convention

Some models use acronyms in lowercase,
which can't be title-cased like other words,
so it's best to simply use the same case
as in the original model name.

Note that the size label still has an uppercased suffix
to make it distinguishable from the context size of a finetune.
@compilade (Collaborator, Author)

I've tested LoRA generation, and it seems fine.

Here's some gguf-dump.py output for https://huggingface.co/grimjim/Llama-3-Instruct-abliteration-LoRA-8B:

INFO:gguf-dump:* Loading: /srv/LLMstash/tmp/metadata-names/Llama-3-8B-Instruct-abliteration-F16-LoRA.gguf
* File is LITTLE endian, script is running on a LITTLE endian host.
* Dumping 42 key/value pair(s)
      1: UINT32     |        1 | GGUF.version = 3
      2: UINT64     |        1 | GGUF.tensor_count = 450
      3: UINT64     |        1 | GGUF.kv_count = 39
      4: STRING     |        1 | general.architecture = 'llama'
      5: STRING     |        1 | general.type = 'adapter'
      6: STRING     |        1 | adapter.type = 'lora'
      7: STRING     |        1 | general.name = 'Llama 3 Instruct Abliteration LoRA 8B'
      8: STRING     |        1 | general.finetune = 'Instruct-abliteration'
      9: STRING     |        1 | general.basename = 'Llama-3'
     10: STRING     |        1 | general.size_label = '8B'
     11: STRING     |        1 | general.license = 'llama3'
     12: UINT32     |        1 | general.base_model.count = 2
     13: STRING     |        1 | general.base_model.0.name = 'Meta Llama 3 8B Instruct'
     14: STRING     |        1 | general.base_model.0.organization = 'Meta Llama'
     15: STRING     |        1 | general.base_model.0.repo_url = 'https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct'
     16: STRING     |        1 | general.base_model.1.name = 'Meta Llama 3 8B Instruct Abliterated v3'
     17: STRING     |        1 | general.base_model.1.version = 'v3'
     18: STRING     |        1 | general.base_model.1.organization = 'Failspy'
     19: STRING     |        1 | general.base_model.1.repo_url = 'https://huggingface.co/failspy/Meta-Llama-3-8B-Instruct-abli'
     20: [STRING]   |        2 | general.tags
     21: FLOAT32    |        1 | adapter.lora.alpha = 32.0
...(then follows hparams and the tokenizer)...

There's still some title-casing of the metadata which I didn't remove, but at least it's easy to see that the metadata from the LoRA adapter's model card was correctly used to link to the base models (even if the complete links are truncated in the dump output).

I don't think the hparams and the tokenizer are really used in GGUF LoRA adapters though, but that's out of the scope of this PR.

I consider this merge ready, and will merge after #8597.

@compilade compilade merged commit 328884f into master Jul 21, 2024
12 checks passed
@compilade (Collaborator, Author)

@mofosyne, I recently tried converting https://huggingface.co/HuggingFaceTB/SmolLM-135M-Instruct, and was surprised that the default name chosen for --outtype q8_0 is cosmo2-135M-webinst-sc2-Q8_0.gguf.

I think maybe _name_or_path from config.json should not be used for the name extraction. The whole SmolLM suite seems to use weird _name_or_path values, especially the non-Instruct variants.

https://huggingface.co/HuggingFaceTB/SmolLM-135M-Instruct:

  "_name_or_path": "HuggingFaceTB/cosmo2-135M-webinst-sc2",

https://huggingface.co/HuggingFaceTB/SmolLM-135M:

  "_name_or_path": "/fsx/elie_bakouch/checkpoints/final-149M/600000",

@mofosyne (Collaborator) commented Jul 27, 2024

@compilade urgh, perhaps we should remove _name_or_path from the heuristics.

edit: Also I see people in https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard/discussions/761#669e1119f073a1a9276f7f90 discussing pulling metadata.json... not sure if I'm happy with that, as I thought that was supposed to be the model card's job, or the gguf kv store's. But this may be an argument for us 'defining' the model card standard in terms of parsing from it... as Hugging Face's current approach is pretty barebones. This would then encourage model card writers to start filling in their model cards with more metadata, perhaps?

(On the other hand... there could be an argument for a metadata.json file... that I haven't thought of yet)

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Jul 27, 2024
* gguf-py : fix some metadata name extraction edge cases

* convert_lora : use the lora dir for the model card path

* gguf-py : more metadata edge cases fixes

Multiple finetune versions are now joined together,
and the removal of the basename annotation on trailing versions
is more robust.

* gguf-py : add more name metadata extraction tests

* convert_lora : fix default filename

The default filename was previously hardcoded.

* convert_hf : Model.fname_out can no longer be None

* gguf-py : do not use title case for naming convention

Some models use acronyms in lowercase,
which can't be title-cased like other words,
so it's best to simply use the same case
as in the original model name.

Note that the size label still has an uppercased suffix
to make it distinguishable from the context size of a finetune.
@compilade (Collaborator, Author)

But this may be an argument for us 'defining' the model card standard in terms of parsing from it... as Hugging Face's current approach is pretty barebones. This would then encourage model card writers to start filling in their model cards with more metadata, perhaps?

@mofosyne

Yes, using the model card metadata for overrides seems better than a metadata.json tied to the internal GGUF metadata key names, especially if the metadata is useful for more than only llama.cpp-related projects.
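
For what it's worth, pulling overrides from the model card's YAML front matter could be as simple as something like this (a rough sketch; huggingface_hub's ModelCard class could likely do the same more robustly):

import yaml
from pathlib import Path

def read_model_card_metadata(model_dir: Path) -> dict:
    # The model card is a README.md whose optional "---"-delimited YAML
    # front matter holds license, base_model, tags, and similar fields.
    readme = model_dir / "README.md"
    if not readme.is_file():
        return {}
    text = readme.read_text(encoding="utf-8")
    if not text.startswith("---"):
        return {}
    return yaml.safe_load(text.split("---", 2)[1]) or {}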

But there are two model cards to consider: the original model card, and the model card of the converted model. The latter could, I guess, be taken from the directory where the GGUF model is exported, but some people use a single directory for all their conversions, which might be problematic for fields like the license.

Something else which could be useful is an option to generate an initial model card for GGUF conversions, based on the original model card with some extra metadata deduced from the model, and/or to update the metadata of an existing model card.
There are downsides to generating a model card, however, like users not always modifying it afterwards.
