Random generation when inferring llama 2 with and without LoRA in the same batch #2096
Closed
Labels: bug
System Info
Reproduction
Run everything with TensorRT-LLM v0.11.0 in its container
1. Convert the checkpoint with examples/llama/convert_checkpoint.py.
2. Build the engine.
3. Run inference with the examples/run.py script.

The first inference run produces a correct result. However, the next one, which mixes requests with and without LoRA in the same batch, produces random generation.
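The reporter's exact commands were not captured. As a point of reference only, a conversion and build sequence along these lines matches the v0.11 examples/llama conventions; the model path, output directories, and LoRA adapter directory are placeholders, not the reporter's actual values, so treat this as an unverified sketch:

```shell
# Convert the Hugging Face Llama 2 checkpoint to TensorRT-LLM format
# (all paths below are placeholders).
python examples/llama/convert_checkpoint.py \
    --model_dir ./llama-2-7b-hf \
    --output_dir ./tllm_ckpt \
    --dtype float16

# Build the engine with the LoRA plugin enabled; --lora_dir points at
# a Hugging Face LoRA adapter directory (placeholder path).
trtllm-build \
    --checkpoint_dir ./tllm_ckpt \
    --output_dir ./engine \
    --gemm_plugin auto \
    --lora_plugin auto \
    --lora_dir ./my-lora-adapter
```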
Full output:
run.log
Expected behavior
The model produces a meaningful result when combining requests with and without LoRA in the same batch.
Actual behavior
When combining requests with and without LoRA in the same batch, the model produces a random result for the request without LoRA.
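A mixed batch of this shape illustrates the failure mode. The prompts are hypothetical; --lora_task_uids follows the examples/llama LoRA README, where -1 disables LoRA for that request. Here the first request applies LoRA task 0, the second runs the base model, and it is the second request whose output degenerates:

```shell
# Two requests in one batch: task uid 0 applies the LoRA adapter,
# -1 runs the base model without LoRA (placeholder paths and prompts).
python examples/run.py \
    --engine_dir ./engine \
    --tokenizer_dir ./llama-2-7b-hf \
    --max_output_len 64 \
    --input_text "Prompt that should use LoRA" \
                 "Prompt that should use the base model" \
    --lora_task_uids 0 -1
```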
Additional notes