return-generation-logits bug when fp8 enabled #2088

Open
binhtranmcs opened this issue Aug 6, 2024 · 1 comment
binhtranmcs commented Aug 6, 2024

I am running a llama3 model on an RTX 4090 with fp8 quantization. In the results, outputTokenIds looks correct, but the generationLogits are all wrong. I also tested the same model without quantization and the returned logits are correct, so I suspect something goes wrong when returning the logits with fp8 enabled.

How I tested: I deployed the model using tritonserver with tensorrtllm_backend. I changed the bls backend slightly to return the softmax of the generationLogits along with the generated tokens (a rough sketch of that change is shown below). I made a call using client.txt and got the result in log.txt.
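For reference, this is roughly the kind of post-processing I added in the BLS model; the tensor name and shape are illustrative, not the exact names in tensorrtllm_backend:

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over the vocabulary dimension.
    shifted = logits - np.max(logits, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)

# Applied to the generation logits pulled out of the response, e.g. a
# tensor of shape [beam_width, num_generated_tokens, vocab_size]:
# probs = softmax(generation_logits, axis=-1)
```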

Command to run the client: python3 client.py -p "hello how are you" --model-name tensorrt_llm_bls --request-id testid --verbose -o 10 --return-generation-logits
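To illustrate the mismatch, here is a rough consistency check one can run on the response, assuming the generation logits and output token IDs have been extracted as numpy arrays (the function and variable names are just for illustration). With greedy sampling the argmax of each step's logits is expected to agree with the emitted token:

```python
import numpy as np

def check_logits_match_tokens(generation_logits, output_token_ids):
    # generation_logits: [num_generated_tokens, vocab_size]
    # output_token_ids:  [num_generated_tokens]
    predicted = np.argmax(generation_logits, axis=-1)
    matches = predicted == np.asarray(output_token_ids)
    print(f"{matches.sum()}/{matches.size} generated tokens match argmax(logits)")
    return matches
```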

Please have a look. Thanks in advance!


github-actions bot commented Sep 6, 2024

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.

github-actions bot added the stale label on Sep 6, 2024