[Error] Static dimension mismatch while setting input shape while running Llama 3 8B #2071
Closed
Labels: bug
System Info
- CPU architecture: x86_64
- CPU/Host memory size: 187 GB
- GPU properties:
  - GPU name: A10
  - GPU memory size: 24 GB
- Libraries:
  - TensorRT-LLM tag: v0.10.0
  - Model: Llama 3 8B
  - Container used: nvcr.io/nvidia/tritonserver:24.07-trtllm-python-py3
The engine was built with the command below:
Who can help?
@byshiue @kaiyux
Reproduction
Docker image: nvcr.io/nvidia/tritonserver:24.07-trtllm-python-py3
Used the Llama 3 8B model and the quantization scripts from the TensorRT-LLM examples.
Expected behavior
Model inference runs successfully.
Actual behavior
Fails with the error: "Static dimension mismatch while setting input shape."
Additional notes
I experimented with these configs, but they didn't make any difference:
--max_input_len=4096
--max_batch_size=1
--max_num_tokens=4096
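For context, flags like the ones above are passed to `trtllm-build` when creating the engine. A representative invocation is sketched below; the checkpoint and output directory paths are placeholders and not from this report, and only the three `--max_*` flags come from the issue itself:

```shell
# Hypothetical trtllm-build invocation for a Llama 3 8B engine on TensorRT-LLM v0.10.0.
# ./llama3-8b-checkpoint and ./llama3-8b-engine are assumed placeholder paths.
trtllm-build \
    --checkpoint_dir ./llama3-8b-checkpoint \
    --output_dir ./llama3-8b-engine \
    --max_input_len=4096 \
    --max_batch_size=1 \
    --max_num_tokens=4096
```

These flags fix the shape ranges the engine accepts at runtime; one common cause of a "static dimension mismatch" is a runtime input whose batch size or sequence length falls outside the ranges the engine was built with.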