When '--gather_all_token_logits' is enabled, the first token appears to be abnormal #639

Closed
StarrickLiu opened this issue Dec 12, 2023 · 3 comments


StarrickLiu commented Dec 12, 2023

Problem Description:

There appears to be an issue when building the engine with the '--gather_all_token_logits' option.

If the engine is built with '--gather_all_token_logits', there is a high probability of garbled characters appearing in the first generated token.

However, if it is built without '--gather_all_token_logits' while keeping all other parameters the same, the first token is normal.

LLaMA-1 7B:

With '--gather_all_token_logits'

Build Command:

python3 build.py --model_dir=/path/to/llama-7b-hf/ \
                  --dtype bfloat16 \
                  --use_gpt_attention_plugin bfloat16 \
                  --use_gemm_plugin bfloat16 \
                  --output_dir /path/to/llama-7b-trt/0.6.1-cf-pe1-gatl-mb-bf16-8_gpu-8k-2k-bs4 \
                  --world_size 8 \
                  --tp_size 8 \
                  --max_input_len 8192 \
                  --max_output_len 2048 \
                  --max_batch_size 4 \
                  --remove_input_padding \
                  --enable_context_fmha \
                  --parallel_build \
                  --multi_block_mode \
                  --gather_all_token_logits \
                  --use_parallel_embedding \
                  --embedding_sharding_dim 1

[screenshot: output of the engine built with '--gather_all_token_logits', showing a garbled first token]

Without '--gather_all_token_logits'

Build Command:

python3 build.py --model_dir=/path/to/llama-7b-hf/ \
                  --dtype bfloat16 \
                  --use_gpt_attention_plugin bfloat16 \
                  --use_gemm_plugin bfloat16 \
                  --output_dir /path/to/llama-7b-trt/0.6.1-cf-pe1-mb-bf16-8_gpu-8k-2k-bs4 \
                  --world_size 8 \
                  --tp_size 8 \
                  --max_input_len 8192 \
                  --max_output_len 2048 \
                  --max_batch_size 4 \
                  --remove_input_padding \
                  --enable_context_fmha \
                  --parallel_build \
                  --multi_block_mode \
                  --use_parallel_embedding \
                  --embedding_sharding_dim 1

[screenshot: output of the engine built without '--gather_all_token_logits', showing a normal first token]

It is worth noting that this issue reproduces in previous versions as well as in version 0.6.1.
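For anyone trying to reproduce this, here is a minimal sketch of the kind of sanity check involved. The file names, tensor shapes, and greedy decoding are assumptions for illustration; the logits and output ids would need to be dumped from the runtime first:

```python
import numpy as np

# Assumed dumps from the runtime; file names and shapes are hypothetical.
generation_logits = np.load("generation_logits.npy")  # [batch, steps, vocab]
output_ids = np.load("output_ids.npy")                # [batch, steps]

# Under greedy decoding, the argmax of the first-step logits should match
# the first generated token id; a mismatch would surface the first-token
# corruption described above.
first_step_argmax = generation_logits[:, 0, :].argmax(axis=-1)
print("argmax of first-step logits:", first_step_argmax)
print("first generated token ids:  ", output_ids[:, 0])
```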

byshiue self-assigned this Dec 15, 2023
byshiue added the triaged (Issue has been triaged by maintainers) label Dec 15, 2023
byshiue (Collaborator) commented Dec 15, 2023

This appears to be a bug in 0.6.1, and it should be fixed in the latest main branch. Please give it a try.

StarrickLiu (Author) commented

[screenshot: output after rebuilding with the latest main branch, showing a normal first token]

In testing with the new version, everything works fine. Thank you.

salaki commented Feb 20, 2024

@StarrickLiu, I am wondering whether you successfully got the logits. Are the logits only for the output tokens, or for all tokens across the vocabulary?
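For context, here is a minimal sketch of how gathered logits are commonly laid out. The tensor names, shapes, and vocabulary size below are assumptions for illustration and should be verified against the TensorRT-LLM version in use:

```python
import numpy as np

batch, prompt_len, new_tokens = 1, 8, 4
vocab_size = 32000  # LLaMA-7B vocabulary size (assumption for illustration)

# Logits gathering typically yields a full-vocabulary distribution per token
# position, not just the score of the sampled token:
context_logits = np.zeros((batch, prompt_len, vocab_size))     # one row per prompt token
generation_logits = np.zeros((batch, new_tokens, vocab_size))  # one row per generated step

def softmax(x, axis=-1):
    # Numerically stable softmax over the vocabulary dimension.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Per-step probabilities over the whole vocabulary:
step0_probs = softmax(generation_logits[:, 0, :])
print(step0_probs.shape)  # (1, 32000)
```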
