Inflight batching for fp8 Llama and Mixtral is broken #1738

Closed
2 of 4 tasks
bprus opened this issue Jun 5, 2024 · 16 comments
Labels: bug, Investigating, quantization, triaged, waiting for feedback

Comments

@bprus
Contributor

bprus commented Jun 5, 2024

System Info

  • CPU architecture: x86_64
  • GPU: NVIDIA H100 80GB
  • TensorRT-LLM: 0.11.0.dev2024060400 (docker build via make -C docker release_build CUDA_ARCHS="90-real")
  • Triton Inference Server: r24.04 (docker build via DOCKER_BUILDKIT=1 docker build -t trt-llm -f dockerfile/Dockerfile.trt_llm_backend . in tensorrtllm_backend)
  • OS: Ubuntu 22.04

Who can help?

@Tracin @byshiue

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I've followed the official documentation to create Llama models and run them with Triton. I'm testing fp8 and int8 quantization. The issue is also present for the Mixtral model, but I'm giving examples only for Llama for simplicity.

For the fp8 model, I used the following commands:

python3 ../quantization/quantize.py --model_dir meta-llama/Llama-2-13b-chat-hf --output_dir /models/rt/llama-fp8-no-gemm-profiles --dtype float16 --tp_size 1 --qformat fp8 --kv_cache_dtype fp8
trtllm-build --checkpoint_dir /models/rt/llama-fp8-no-gemm-profiles --output_dir /models/triton/llama-fp8-no-gemm-profiles/tensorrt_llm/1 --workers 1 --remove_input_padding enable --use_paged_context_fmha enable --max_input_len 2048 --max_batch_size 256 --multiple_profiles enable --use_custom_all_reduce disable --use_fp8_context_fmha enable --max_num_tokens 16384

For the int8 model:

python3 convert_checkpoint.py --model_dir meta-llama/Llama-2-13b-chat-hf --output_dir /models/rt/llama-int8-profiles --dtype float16 --tp_size 1 --workers 1 --use_weight_only --weight_only_precision int8
trtllm-build --checkpoint_dir /models/rt/llama-int8-profiles --output_dir /models/triton/llama-int8-profiles/tensorrt_llm/1 --gemm_plugin float16 --workers 1 --remove_input_padding enable --use_paged_context_fmha enable --max_input_len 2048 --max_batch_size 256 --multiple_profiles enable --use_custom_all_reduce disable --max_num_tokens 16384

I serve the models with Triton in Docker.

I'm testing performance for different setups using Locust, and I ran into the following issue.

When I make a single request at a time to the model, everything works as expected for both setups.
But when I make simultaneous requests, the generated output for fp8 is broken: it nearly always keeps generating tokens until max_tokens is reached. The issue doesn't occur in the int8 setup.

Here is an example (max_tokens is set to 1000):

fp8 single request:

    {
        "input": "How does Giardia lamblia spread?",
        "output": "\n\nGiardia lamblia, also known as Giardia intestinalis, is a parasitic infection that can cause diarrhea, abdominal cramps, and weight loss. It is spread through the fecal-oral route, which means that the parasite is passed from one person to another through contaminated food, water, or direct contact with an infected person's feces.\n\nHere are some ways that Giardia lamblia can spread:\n\n1. Contaminated food and water: Giardia lamblia can survive for weeks in contaminated food and water. If you consume contaminated food or water, you can become infected.\n2. Direct contact with an infected person: If you come into direct contact with an infected person's feces, you can become infected. This can happen through diaper changes, sexual contact, or other forms of direct contact.\n3. Contaminated surfaces: Giardia lamblia can survive on surfaces for up to 24 hours. If you touch a contaminated surface and then touch your mouth or eat without washing your hands, you can become infected.\n4. Infected animals: Giardia lamblia can also be spread through contact with infected animals, such as dogs and cats.\n5. Infected food handlers: If a food handler is infected with Giardia lamblia, they can spread the parasite to others through food they prepare.\n6. Infected water sources: Giardia lamblia can be found in contaminated water sources, such as lakes, rivers, and swimming pools.\n7. Infected soil: Giardia lamblia can also be found in contaminated soil, especially in areas with poor sanitation and hygiene.\n\nIt's important to note that Giardia lamblia is not spread through casual contact, such as hugging or shaking hands. However, if an infected person touches their mouth or nose and then touches someone else, they can potentially spread the parasite.\n\nTo prevent the spread of Giardia lamblia, it's important to practice good hygiene, such as washing your hands frequently, especially after using the bathroom or before preparing food. 
You should also avoid consuming contaminated food and water, and avoid direct contact with infected animals or people.",
        "tokens": 537
    }

fp8 multiple requests:

    {
        "input": "How does Giardia lamblia spread?",
        "output": "\n\nGiardia lamblia, also known as Giardia intestinalis, is a parasitic infection that can cause diarrhea, abdominal cramps, and weight loss. It is spread through the fecal-oral route, which means that the parasite is passed from one person to another through contaminated food, water, or direct contact with an infected person's feces.\n\nHere are some ways that Giardia lamblia can spread:\n\n1. Contaminated food and water: Giardia lamblia can survive for weeks in contaminated food and water. If you consume contaminated food or water, you can become infected.\n2. Direct contact with an infected person: If you come into direct contact with an infected person's feces, you can become infected. This can happen through diaper changing, sexual contact, or other forms of direct contact.\n3. Infected food handlers: If a food handler is infected with Giardia lamblia, they can contaminate food and spread the infection to others.\n4. Infected pets: Giardia lamblia can also be spread through contact with infected pets, such as dogs and cats.\n5. Contaminated surfaces: Giardia lamblia can survive on surfaces for a long time, and if you touch a contaminated surface and then touch your mouth or eat, you can become infected.\n6.\u7ed9\u7ed9lasamilas. ( \u2026.riel \u2026.berista \u2026clusionkwberzi \u2026clusionbykw.. 
Australista \u2026\u200f\u200f ( Felzi Felby fel Felberzi[@Agelasistacha\u200fberber\u00e9sberberberberberell [\u00e9sber\u00e9sclusionclusionclusionlerlerler\u00e9s\u00e9sberber Orami\u5316\u200f Belltextscell RandomberDF [ista Aur AurFAULTlas accelerationberutat\u044c Downbyyaryar AuryarFAULT,cha\u8fc7FAULTami actual Down ven,\u7ed9berzilas \u2026 \u2026 \u2026 (ami G Bert.ziberby....berberziziriel Andrewista Andrewll Fellasby\u7ed9 Bast (clusion.riel.zizi G \u2026 \u2026.zi.ziberber.berberber \u2026.berberberberber \u2026clusionzi \u2026ber \u2026 guaranteedziber Gziellell Fonziziber FonellberberberberberellberberDFber guaranteeclusionziberberlasterber \u2026berberberDFtextsc Fon\u7ed9 Felber \u2026 ( \u2026.ber[@ Gberber.ber Fon \u2026ber.berberber. G \u2026 G G \u2026 G \u2026. guarantee.rielber \u2026 \u2026 \u2026ber Bertellzizi. \u2026 (berber\u200f declFAULTellelingberberberberberlinglingAgeberling Gamellclusion G.elingFAULTber GquFAULT \u0413[@berellber guaranteed guaranteed \u2026.ber \u2026 G Gellberlas G Gell Gberber \u2026 \u0413ber[@clusion HellAgeber \u2026 Fonbyoberellber \u2026 Count \u0432 guaranteed,zi Count G \u2026 \u2026 \u2026 someone \u0413 \u2026clusionberclusion\u200f.elingber \u2026las \u0413 \u2026Ageuli \u2026ber age guarantee. \u2026 Glas \u0413 \u0413 \u0413Agewand \u0413elltellt[@ber \u0413AgeAge \u2026ling \u0413Ageellt[@lingler agellAgeclusionclusionllagu \u0413Age \u0413 \u0413 \u0413 \u0413 guaranteeell \u0413las \u0413 Tags \u2026laslasAge KeithrodAgeAge \u2026 \u0413eling Tags. Tags \u2026 Tags\u200fFAULT Keith.Age Ladywand Tags Tagsberberber \u0413ziAgeeling ageMQ \u2026 \u0413rod AgeAgeber \u0413berAge Tags Tags[@ \u0413 \u2026 \u2026 age \u0413 \u0413 \u2026 \u0413 \u0413 ... 
someone \u0413 \u2026 \u2026AgeMQ \u0413 Count \u2026 \u0413 \u2026rielriel \u2026rielrielriel \u0413riel \u2026 Tags \u0413 G\u00fc[@wersSDKAge Tags Tags \u2026riel \u2026 Tags Count \u2026riel \u2026 \u2026 \u2026 Tweriel \u2026ASC \u0413[@ \u0413 \u0413 \u2026textsc \u2026 \u2026 TagselingAge \u2026 \u2026Age \u2026 \u2026 \u2026 \u2026 \u0413 \u0413 Keith \u0413 Keith \u2026 \u2026ragma \u2026FAULT \u2026 \u2026 KeithAge \u0413textsc \u0413 Keith \u0413 \u0413nab \u0413riel \u0413 age Tags \u0413 \u0413 \u0413FAULT \u2026 \u2026 \u2026 Keith \u2026 Tags Erd Tags Erd \u2026 Erd Keith \u2026 \u2026rai \u2026 \u2026 \u2026 \u2026ution \u2026ams \u2026 \u2026 ...\u200f \u2026 Tags TagsMQ \u2026FAULTFAULT \u0413 \u2026 \u0413[@rod ErdZyg \u2026 \u2026ZygrodMQ Tags\u200f\u200fVertutionziMQ \u2026\u200f \u2026VertVertrielclusion \u2026 \u0413rodFAULT \u2026 \u0413 \u0413 \u2026 \u0413 \u2026ziVert \u2026VertFAULT \u2026 Kelly \u2026 \u0413clusionVert \u0413asc\u8db3\u200fVertFAULTFAULTTeX guaranteed guaranteed \u0413 \u0413FAULTageellt \u0413 Twe techniascascVertMQasc \u0413\u200f Fon Tags \u0413Vertascascasc \u0413ZygTeXyle Tagsutionell \u00a1 \u00a1 \u0441\u0442\u0440\u0430FAULTiationotrop \u0413zia ScottishziaVert\u200f\u200f",
        "tokens": 1000
    },

int8 single request:

    {
        "input": "How does Giardia lamblia spread?",
        "output": "\n\nGiardia lamblia, also known as Giardia intestinalis, is a parasitic infection that can cause diarrhea, abdominal cramps, and weight loss. It is spread through the fecal-oral route, which means that the parasite is passed from one person to another through contaminated food, water, or direct contact with an infected person.\n\nHere are some ways that Giardia lamblia can spread:\n\n1. Contaminated food and water: Giardia lamblia can survive for weeks in contaminated food and water. If you eat or drink something that has been contaminated with the parasite, you can become infected.\n2. Direct contact: If you come into direct contact with someone who has Giardia lamblia, you can become infected. This can happen through touching, hugging, or shaking hands with an infected person.\n3. Fecal contamination: Giardia lamblia can also be spread through fecal contamination. If an infected person does not wash their hands properly after using the bathroom, they can transfer the parasite to their hands and then to other people or surfaces.\n4. Contaminated surfaces: If an infected person touches a surface and then you touch that same surface without washing your hands, you can become infected.\n5. Infected pets: Giardia lamblia can also be spread through contact with infected pets, such as dogs and cats. If an infected pet comes into contact with you or your food, you can become infected.\n6. Infected soil: Giardia lamblia can survive in soil for weeks, so if you ingest contaminated soil, you can become infected.\n7. Infected fruits and vegetables: Giardia lamblia can also be spread through contaminated fruits and vegetables. If you eat raw or undercooked fruits and vegetables that have been contaminated with the parasite, you can become infected.\n\nIt's important to note that Giardia lamblia is not spread through casual contact, such as shaking hands or sharing food and drinks with an infected person. 
However, if you are in close contact with someone who has the infection, you may be at a higher risk of becoming infected.",
        "tokens": 532
    }

int8 multiple requests:

    {
        "input": "How does Giardia lamblia spread?",
        "output": "\n\nGiardia lamblia, also known as Giardia intestinalis, is a parasitic infection that can cause diarrhea, abdominal cramps, and weight loss. It is spread through the fecal-oral route, which means that the parasite is passed from one person to another through contaminated food, water, or direct contact with an infected person.\n\nHere are some ways that Giardia lamblia can spread:\n\n1. Contaminated food and water: Giardia lamblia can survive for weeks in contaminated food and water. If you eat or drink something that has been contaminated with the parasite, you can become infected.\n2. Direct contact: If you come into direct contact with someone who has Giardia lamblia, you can become infected. This can happen through touching, hugging, or shaking hands with an infected person.\n3. Fecal contamination: Giardia lamblia can also be spread through fecal contamination. If an infected person does not wash their hands properly after using the bathroom, they can transfer the parasite to their hands and then to other people or surfaces.\n4. Contaminated surfaces: If an infected person touches a surface and then you touch that same surface without washing your hands, you can become infected.\n5. Infected pets: Giardia lamblia can also be spread through contact with infected pets, such as dogs and cats. If an infected pet comes into contact with you or your food, you can become infected.\n6. Infected soil: Giardia lamblia can survive in soil for weeks, so if you ingest contaminated soil, you can become infected.\n7. Infected fruits and vegetables: Giardia lamblia can also be spread through contaminated fruits and vegetables. If you eat raw or undercooked fruits and vegetables that have been contaminated with the parasite, you can become infected.\n\nIt's important to note that Giardia lamblia is not spread through casual contact, such as shaking hands or sharing food and drinks with an infected person. 
However, if you are in close contact with someone who has the infection, you may be at a higher risk of becoming infected.",
        "tokens": 532
    }

My guess is that something with inflight batching is broken for fp8. When the server tries to batch incoming requests it breaks the output in some way.
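
For anyone who wants to reproduce this without Locust, here is a minimal sketch of the comparison. It assumes the Triton generate endpoint and the `text_input` / `max_tokens` / `text_output` fields that appear in the curl example further down this thread; the helper names (`send_to_triton`, `run_concurrently`, `find_mismatches`) are made up for illustration and are not part of any library.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumed endpoint; adjust host, port and model name for your deployment.
GENERATE_URL = "http://localhost:8000/v2/models/ensemble/generate"


def send_to_triton(prompt, max_tokens=1000, url=GENERATE_URL):
    """POST one prompt to the generate endpoint and return the generated text."""
    payload = json.dumps({"text_input": prompt, "max_tokens": max_tokens}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["text_output"]


def run_sequentially(prompts, send):
    """One request at a time, so the server never has more than one in flight."""
    return [send(prompt) for prompt in prompts]


def run_concurrently(prompts, send, workers=8):
    """Submit all prompts at once so the server gets a chance to batch them."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the prompts.
        return list(pool.map(send, prompts))


def find_mismatches(prompts, sequential, concurrent):
    """Return the prompts whose sequential and concurrent outputs differ."""
    return [p for p, s, c in zip(prompts, sequential, concurrent) if s != c]
```

Usage would be roughly `find_mismatches(prompts, run_sequentially(prompts, send_to_triton), run_concurrently(prompts, send_to_triton))`. Exact string equality is a strict check and may flag small benign differences, so treat a mismatch as a pointer to inspect the outputs, not as proof of a bug.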

It looks a little bit similar to: #1539

I can run more tests and provide more results if you need.

Expected behavior

Responses generated by the fp8 model with inflight batching should be the same as those generated without it.

actual behavior

The fp8 model returns broken output when it receives multiple simultaneous requests.

additional notes

@bprus added the bug label Jun 5, 2024
@nv-guomingz
Collaborator

nv-guomingz commented Jun 5, 2024

Hi @bprus, we'll first try to reproduce your issue locally.

@nv-guomingz added the triaged and quantization labels Jun 5, 2024
@bprus
Contributor Author

bprus commented Jun 5, 2024

Much appreciated.
If you need anything from me, I'm here to help.

@wanzhenchn

wanzhenchn commented Jun 7, 2024

> Hi @bprus, we'll first try to reproduce your issue locally.

I have also reproduced the problem described above.

I found that if the --use_fp8_context_fmha enable option is removed when building the TRT-LLM engine, the generated text appears to be normal.

@nv-guomingz cc @bprus

@PerkzZheng
Collaborator

PerkzZheng commented Jun 11, 2024

@wanzhenchn @bprus could you give run.py a try? This is what I got with llama-2-13b-hf (not the chat variant, but that should make no difference to the issue here) using the same commands you shared. I am going to try with triton-server and will update here if I find anything different.

python ../run.py --engine_dir engines/llama-13b-fp8 --tokenizer_dir //home/scratch.trt_llm_data/llm-models/llama-models-v2/llama-v2-13b-hf --max_output_len 1000 --input_text "How does Giardia lamblia spread?" "How does Giardia lamblia spread?"

Input [Text 0]: "<s> How does Giardia lamblia spread?"
Output [Text 0 Beam 0]: "
Giardia lamblia is a parasite that causes the diarrheal disease giardiasis. Giardia lamblia is found worldwide. It is the most common cause of waterborne disease in the United States.
Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food.
Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite fro
m contaminated water or food.
Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite fro
m contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food.
Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite fro
m contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food.
Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite fro
m contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people i
ngest the Giardia parasite from contaminated water or food.
Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite fro
m contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people i
ngest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food.
Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite fro
m contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people i
ngest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite from contaminated"
Input [Text 1]: "<s> How does Giardia lamblia spread?"
Output [Text 1 Beam 0]: "
Giardia lamblia is a parasite that causes the diarrheal disease giardiasis. Giardia lamblia is found worldwide. It is the most common cause of waterborne disease in the United States.
Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food.
Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite fr$
m contaminated water or food.
Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite fro
m contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food.
Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite fro
m contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food.
Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite fro
m contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people i
ngest the Giardia parasite from contaminated water or food.
Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite fro
m contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people i
ngest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food.
Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite fro
m contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people i
ngest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite from contaminated water or food. Giardia lamblia is spread when people ingest the Giardia parasite from contaminated"

@PerkzZheng
Collaborator

I could reproduce it with the Triton backend + IFB. Let me see what is going wrong with IFB.

@PerkzZheng
Collaborator

@wanzhenchn @bprus please try the fix shown below (it will be pushed to the main branch in next week's update):

modify this line to:

    // FP8 output when fp8_context_fmha is enabled.
    // The offset is computed in bytes, because the output element may be 1 byte (FP8) rather than sizeof(T).
    auto const outputElemSize = (mFP8ContextFMHA ? 1 : sizeof(T));
    T* context_buf_ = reinterpret_cast<T*>(static_cast<char*>(outputs[0])
        + outputDesc[0].dims.d[getPackedTensorHiddenDimIndex(mRemovePadding)] * tokenIdxBeg * outputElemSize);
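
To illustrate why the element size matters here, a rough sketch of the offset arithmetic in Python. The hidden dimension of 5120 and the token index of 7 are illustrative values only, and the "2 bytes per element" case is an assumption about what an offset scaled by sizeof(T) would produce for a float16 engine, not a quote of the previous code.

```python
HALF_SIZE = 2  # sizeof(T) when T is a 16-bit type such as half (assumed float16 engine)
FP8_SIZE = 1   # one byte per element when the context FMHA writes FP8 output


def output_elem_size(fp8_context_fmha, sizeof_t=HALF_SIZE):
    """Mirror of the ternary in the fix: 1 byte for FP8 output, sizeof(T) otherwise."""
    return FP8_SIZE if fp8_context_fmha else sizeof_t


def context_output_byte_offset(hidden_dim, token_idx_beg, elem_size):
    """Byte offset into the packed output buffer for the first token being processed."""
    return hidden_dim * token_idx_beg * elem_size


# Illustrative numbers (assumptions, not taken from the engine).
hidden_dim, token_idx_beg = 5120, 7

fp8_offset = context_output_byte_offset(hidden_dim, token_idx_beg, output_elem_size(True))
half_offset = context_output_byte_offset(hidden_dim, token_idx_beg, output_elem_size(False))

print(fp8_offset)   # 35840
print(half_offset)  # 71680
```

With these numbers, scaling by 2 bytes lands twice as far into the buffer as the FP8 layout requires. The two offsets only agree when the starting token index is 0.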

@bprus
Contributor Author

bprus commented Jun 11, 2024

I confirm that the fix works for the Llama model.
Tomorrow I'll try to test on Mixtral.
Thanks a lot for the quick response!

@bprus
Contributor Author

bprus commented Jun 13, 2024

@PerkzZheng
So the funny thing is that everything works for Llama, but for Mixtral there is still something strange going on.

The issue with generation not stopping is solved. But now, the generated responses with multiple simultaneous requests are much shorter than those with only a single request.

For a set of 15 questions, the average number of generated tokens for single requests is 450, and for multiple requests, it's 319.

I looked into the generated answers and found that, this time around, it is the single-request setup that tends not to stop.
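
A small sketch of how such a comparison can be computed, assuming each result is a record with the `input`, `output` and `tokens` fields used in the JSON examples at the top of this issue. The function name `summarize_runs` is hypothetical; the `max_tokens` default of 1000 matches the limit used in the tests above.

```python
from statistics import mean


def summarize_runs(records, max_tokens=1000):
    """Average token count plus the inputs whose generation ran into the token limit."""
    counts = [record["tokens"] for record in records]
    runaway = [record["input"] for record in records if record["tokens"] >= max_tokens]
    return {"mean_tokens": mean(counts), "runaway_inputs": runaway}
```

Running it once on the single-request results and once on the concurrent results gives the two averages side by side, and `runaway_inputs` lists the prompts where generation did not stop on its own.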

Prompt: "How does Giardia lamblia spread?"

Single-request answer:



Giardia lamblia is a parasite that can be spread through contaminated water or food. It can also be spread through direct contact with an infected person, such as through sexual contact or by sharing personal items like towels or toothbrushes.

Once Giardia lamblia has been ingested, it can cause a wide range of symptoms, including diarrhea, abdominal cramps, bloating, and nausea. These symptoms can last for several weeks, and in some cases, they may persist for months or even years.

If

##

G

##

G

##

G

##

G

##

##

##

##

##

##

##

##

##

##

##

##

##

##

##

##

##

##

##

##

##

##

##

##

##

##

##

##

##

##

##

##

##

##

##

##

##

##

##

##

##

##

##

##

## # \end

## #

##

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.


.

.


.


.
 #

.
 #include <iostream>

.

.

.

.

.


.

.

.

.

.


.

.


.


.


.
 #

.

.
 #include <iostream>

.

.


.


.
 #

.
 #include <iostream>

.

.

.


.
 #include <iostream>

.

.


.









 \# #include <iostream>

.





 \#





 \# #include <iostream>

.







 \#




 \#



 \#



 \# #include <iostream>

.
 #include <iostream>

.


.








 \# #include <iostream>

.
 #include <iostream>

.

.


.
 #include <iostream>

.


.








 




 \# #include <iostream>

.
 #include <iostream>

.


.




 \# #include <iostream>

.
 #include <iostream>

.

.





 \# #include <iostream>

.
 #include

Multiple-requests answer:



Giardia lamblia is a parasite that can be spread through contaminated water or food. It can also be spread through person-to-person contact, such as through the fecal-oral route.

Any ideas?

Note: I quantized Mixtral on CPU, and I'm not sure if this can impact results in such a way. I've described it here: #1777

@PerkzZheng
Collaborator

@bprus could you give it a try with run.py directly (not IFB + Triton backend)? See #1738 (comment).

Please also share the full commands you used to build the engine.

@bprus
Contributor Author

bprus commented Jun 17, 2024

@PerkzZheng sorry to keep you waiting.

First, here are the commands I use:

python3 ../quantization/quantize.py --model_dir /models/downloaded/finetuned/rag --output_dir /models/rt/rag-fp8-deb --dtype bfloat16 --tp_size 1 --qformat fp8 --kv_cache_dtype fp8 --device CPU

trtllm-build --checkpoint_dir /models/rt/rag-fp8-deb --output_dir /models/triton/rag-fp8-deb/tensorrt_llm/1 --workers 1 --remove_input_padding enable --use_paged_context_fmha enable --max_input_len 2048 --max_batch_size 64 --multiple_profiles enable --use_custom_all_reduce disable --use_fp8_context_fmha enable --max_num_tokens 16384

I used run.py directly, and the results are indeed correct:

python ../run.py --engine_dir /models/triton/rag-fp8/tensorrt_llm/1 --tokenizer_dir /models/triton/rag-fp8/tensorrt_llm/1  --max_output_len 1000 --input_text "How does Giardia lamblia spread?" "How does Giardia lamblia spread?"

[TensorRT-LLM] TensorRT-LLM version: 0.11.0.dev2024061100
[06/17/2024-13:19:04] [TRT-LLM] [I] Load engine takes: 111.26436877250671 sec
Input [Text 0]: "<s> How does Giardia lamblia spread?</s>"
Output [Text 0 Beam 0]: "

Giardia lamblia is a microscopic parasite that causes the diarrheal illness known as giardiasis.

Giardia lamblia spreads through the fecal-oral route. This means that the parasite is passed from the feces of an infected person to the mouth of another person.

The parasite can also be spread through contaminated water or food. This is because Giardia lamblia can survive outside of the human body for long periods of time.

Therefore, it is important to practice good hygiene and to avoid drinking or eating contaminated water or food."
Input [Text 1]: "<s> How does Giardia lamblia spread?</s>"
Output [Text 1 Beam 0]: "

Giardia lamblia is a microscopic parasite that causes the diarrheal illness known as giardiasis.

Giardia lamblia spreads through the fecal-oral route. This means that the parasite is passed from the feces of an infected person to the mouth of another person.

The parasite can also be spread through contaminated water or food. This is because Giardia lamblia can survive outside of the human body for long periods of time.

Therefore, it is important to practice good hygiene and to avoid drinking or eating contaminated water or food."

Whereas Triton returns:

curl -X POST localhost:8000/v2/models/ensemble/generate -d '{"text_input": "How does Giardia lamblia spread?", "max_tokens": 1000}'

{"context_logits":0.0,"cum_log_probs":0.0,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0],"sequence_end":false,"sequence_id":0,"sequence_length":1000,"sequence_start":false,"text_output":"\n\nGiardia lamblia is a parasite that can be spread through contaminated water or food. It can also be spread through direct contact with an infected person, such as through sexual contact or by sharing personal items like towels or toothbrushes.\n\nOnce Giardia lamblia has been ingested, it can cause a wide range of symptoms, including diarrhea, abdominal cramps, bloating, and nausea. These symptoms can last for several weeks, and in some cases, they may persist for months or even years.\n\nIf\n\n##\n\nG\n\n##\n\nG\n\n##\n\nG\n\n##\n\nG\n\n##\n\n##\n\n##\n\n##\n\n##\n\n##\n\n##\n\n##\n\n##\n\n##\n\n##\n\n##\n\n##\n\n##\n\n##\n\n##\n\n##\n\n##\n\n##\n\n##\n\n##\n\n##\n\n##\n\n##\n\n##\n\n##\n\n##\n\n##\n\n##\n\n##\n\n##\n\n##\n\n##\n\n##\n\n##\n\n##\n\n##\n\n##\n\n##\n\n##\n\n##\n\n##\n\n##\n\n##\n\n## # \\end\n\n## #\n\n##\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n\n.\n\n.\n\n\n.\n\n\n.\n #\n\n.\n #include <iostream>\n\n.\n\n.\n\n.\n\n.\n\n.\n\n\n.\n\n.\n\n.\n\n.\n\n.\n\n\n.\n\n.\n\n\n.\n\n\n.\n\n\n.\n #\n\n.\n\n.\n #include <iostream>\n\n.\n\n.\n\n\n.\n\n\n.\n #\n\n.\n #include <iostream>\n\n.\n\n.\n\n.\n\n\n.\n #include <iostream>\n\n.\n\n.\n\n\n.\n\n\n\n\n\n\n\n\n\n \\# #include <iostream>\n\n.\n\n\n\n\n\n 
\\#\n\n\n\n\n\n \\# #include <iostream>\n\n.\n\n\n\n\n\n\n\n \\#\n\n\n\n\n \\#\n\n\n\n \\#\n\n\n\n \\# #include <iostream>\n\n.\n #include <iostream>\n\n.\n\n\n.\n\n\n\n\n\n\n\n\n \\# #include <iostream>\n\n.\n #include <iostream>\n\n.\n\n.\n\n\n.\n #include <iostream>\n\n.\n\n\n.\n\n\n\n\n\n\n\n\n \n\n\n\n\n \\# #include <iostream>\n\n.\n #include <iostream>\n\n.\n\n\n.\n\n\n\n\n \\# #include <iostream>\n\n.\n #include <iostream>\n\n.\n\n.\n\n\n\n\n\n \\# #include <iostream>\n\n.\n #include"}

So I guess the issue is somewhere in Triton or IFB.

Is there anything else I can help you with?

@PerkzZheng
Collaborator

@bprus Have you enabled chunked_context or kv_cache_reuse? What happens if you disable them all (and probably paged_context_fmha as well)? I am still confused about why the same commands work for Llama (or is Llama using different commands?).

@bprus
Contributor Author

bprus commented Jun 18, 2024

@PerkzZheng
As a sanity check, I ran the same commands for the Llama model, and here's what I got:

python ../run.py --engine_dir /models/triton/llama-fp8/tensorrt_llm/1 --tokenizer_dir /models/triton/llama-fp8/tensorrt_llm/1  --max_output_len 1000 --input_text "How does Giardia lamblia spread?"

[TensorRT-LLM] TensorRT-LLM version: 0.11.0.dev2024061100
[06/17/2024-13:42:57] [TRT-LLM] [I] Load engine takes: 16.550021648406982 sec
Input [Text 0]: "<s> How does Giardia lamblia spread?"
Output [Text 0 Beam 0]: "

Giardia lamblia, also known as Giardia intestinalis, is a parasitic infection that can cause diarrhea, abdominal cramps, and other gastrointestinal symptoms. It is spread through the feces of infected individuals, and can be found in contaminated food, water, and surfaces.

Here are some ways that Giardia lamblia can spread:

1. Fecal-oral transmission: Giardia lamblia is spread through the feces of infected individuals. When an infected person has diarrhea, the parasites can be found in their stool. If they do not properly wash their hands after using the bathroom, they can transfer the parasites to their mouth or other people's hands, leading to infection.
2. Contaminated food and water: Giardia lamblia can be found in contaminated food and water. For example, if an infected person prepares food or drinks water without proper hand washing, they can transfer the parasites to the food or water. Similarly, if food or water is contaminated with fecal matter, it can also be a source of infection.
3. Surface transmission: Giardia lamblia can survive on surfaces for a short period of time. If an infected person touches a surface and then another person touches the same surface without washing their hands, they can transfer the parasites to their mouth and become infected.
4. Direct contact: Giardia lamblia can also be spread through direct contact with an infected person's feces. For example, if an infected person changes a diaper and does not properly wash their hands, they can transfer the parasites to their own mouth or to someone else's mouth.
5. Infected pets: Giardia lamblia can also be spread through contact with infected pets, such as dogs and cats. If an infected pet has diarrhea and is not properly cleaned up, the parasites can be transferred to other animals or people.

Overall, Giardia lamblia can spread through a variety of routes, including fecal-oral transmission, contaminated food and water, surface transmission, direct contact, and infected pets. It is important to practice good hygiene, such as washing hands frequently, to reduce the risk of infection."
curl -X POST localhost:8000/v2/models/ensemble/generate -d '{"text_input": "How does Giardia lamblia spread?", "max_tokens": 1000}'

{"context_logits":0.0,"cum_log_probs":0.0,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0],"sequence_end":false,"sequence_id":0,"sequence_length":537,"sequence_start":false,"text_output":"\n\nGiardia lamblia, also known as Giardia intestinalis, is a parasitic infection that can cause diarrhea, abdominal cramps, and weight loss. It is spread through the fecal-oral route, which means that the parasite is passed from one person to another through contaminated food, water, or direct contact with an infected person's feces.\n\nHere are some ways that Giardia lamblia can spread:\n\n1. Contaminated food and water: Giardia lamblia can survive for weeks in contaminated food and water. If you consume contaminated food or water, you can become infected.\n2. Direct contact with an infected person: If you come into direct contact with an infected person's feces, you can become infected. This can happen through diaper changes, sexual contact, or other forms of direct contact.\n3. Contaminated surfaces: Giardia lamblia can survive on surfaces for up to 24 hours. If you touch a contaminated surface and then touch your mouth or eat without washing your hands, you can become infected.\n4. Infected animals: Giardia lamblia can also be spread through contact with infected animals, such as dogs and cats.\n5. Infected food handlers: If a food handler is infected with Giardia lamblia, they can spread the parasite to others through food they prepare.\n6. Infected water sources: Giardia lamblia can be found in contaminated water sources, such as lakes, rivers, and swimming pools.\n7. Infected soil: Giardia lamblia can also be found in contaminated soil, especially in areas with poor sanitation and hygiene.\n\nIt's important to note that Giardia lamblia is not spread through casual contact, such as hugging or shaking hands. 
However, if an infected person touches their mouth or nose and then touches someone else, they can potentially spread the parasite.\n\nTo prevent the spread of Giardia lamblia, it's important to practice good hygiene, such as washing your hands frequently, especially after using the bathroom or before preparing food. You should also avoid consuming contaminated food and water, and avoid direct contact with infected animals or people."}

So the outputs are different, but neither of them seems broken. Is it expected that the generated answers differ this much when using Triton + IFB vs run.py?

As to disabling other options, I used defaults for them.
If you want to try something specific, maybe give me a command to use?
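To make "the outputs are different" concrete, a small helper can report where two generations first diverge. This is only a sketch: `first_divergence` and `common_prefix` are hypothetical helper names, and the two strings are short excerpts copied from the run.py and Triton answers above, not the full outputs.

```python
from itertools import zip_longest


def first_divergence(a: str, b: str) -> int:
    """Return the index of the first differing character, or -1 if equal."""
    # zip_longest pads the shorter string with None, so a string that is a
    # strict prefix of the other diverges at the end of the shorter one.
    for i, (ca, cb) in enumerate(zip_longest(a, b)):
        if ca != cb:
            return i
    return -1


def common_prefix(a: str, b: str) -> str:
    """Return the text that both strings share from the start."""
    i = first_divergence(a, b)
    return a if i == -1 else a[:i]


# Excerpts from the two answers in this thread (run.py vs. Triton).
run_py_excerpt = "can cause diarrhea, abdominal cramps, and other gastrointestinal symptoms."
triton_excerpt = "can cause diarrhea, abdominal cramps, and weight loss."

idx = first_divergence(run_py_excerpt, triton_excerpt)
print("diverges at character", idx)
print("shared prefix:", repr(common_prefix(run_py_excerpt, triton_excerpt)))
```

With greedy decoding and identical inputs, the two paths would be expected to agree token for token, so an early divergence like this one points at a difference in the input or the sampling settings rather than at random variation.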

@PerkzZheng
Collaborator

@bprus No, that is not expected. Have you confirmed that both are using the same sampling configuration (greedy search, I assume)? I got consistent outputs when using IFB + the Triton backend vs run.py (even though I used llama-2-13b-hf).

> As to disabling other options, I used defaults for them.

Can you share the full config.pbtxt so we know which configuration you are using?
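One way to rule out sampling differences is to pin the sampling settings in the request itself instead of relying on defaults. The sketch below only builds the JSON body for the `generate` endpoint used earlier in this thread. `text_input` and `max_tokens` come from the curl command above; `top_k` and `beam_width` are assumed field names that should be checked against the inputs declared in the ensemble's config.pbtxt, and `build_greedy_payload` is a hypothetical helper.

```python
import json


def build_greedy_payload(prompt: str, max_tokens: int) -> str:
    """Build a request body that asks for greedy decoding explicitly."""
    payload = {
        "text_input": prompt,      # field used in the curl command above
        "max_tokens": max_tokens,  # field used in the curl command above
        # Assumed field names: verify them against the ensemble config.pbtxt.
        "top_k": 1,        # keep only the most likely token at each step
        "beam_width": 1,   # no beam search
    }
    return json.dumps(payload)


body = build_greedy_payload("How does Giardia lamblia spread?", 1000)
print(body)
```

The printed string can be passed to `curl -d` in place of the hand-written body. The run.py invocation would need matching settings for the comparison to be meaningful.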

@bprus
Contributor Author

bprus commented Jun 21, 2024

@PerkzZheng
So, it turns out that I introduced unnecessary confusion 😞
After you wrote that you got consistent outputs, I went back to re-check my configuration, mainly all the config.pbtxt files.
I had missed that, at some point, the default configurations changed and add_special_tokens=True was added.
I hadn't set it at all in my config.
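This kind of drift between an older local config and newer defaults can be caught with a simple scan for expected parameter keys. The sketch below assumes that parameters appear in config.pbtxt as `key: "<name>"` entries; the sample text is a hypothetical excerpt, not a real config from this issue, and `missing_parameter_keys` is a hypothetical helper.

```python
import re
from typing import Iterable, List


def missing_parameter_keys(config_text: str, expected: Iterable[str]) -> List[str]:
    """Return the expected parameter keys that never appear in the config text."""
    # Assumption: each parameter is declared with a line like  key: "name"
    found = set(re.findall(r'key:\s*"([^"]+)"', config_text))
    return sorted(k for k in expected if k not in found)


# Hypothetical excerpt of an older preprocessing config.pbtxt.
SAMPLE_CONFIG = '''
parameters {
  key: "tokenizer_dir"
  value: { string_value: "/path/to/tokenizer" }
}
'''

print(missing_parameter_keys(SAMPLE_CONFIG, ["tokenizer_dir", "add_special_tokens"]))
```

Running the same check against each config.pbtxt in the model repository, with the key list taken from the current default templates, would flag a missing `add_special_tokens` entry before any outputs are compared.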

After changing that, I get consistent outputs between Triton and run.py.

Thanks for all the help, I'm closing the issue now.

@bprus closed this as completed Jun 21, 2024
@ccchow

ccchow commented Jul 15, 2024

Hi Team! Seems #1793 fixed this issue. Could you help confirm that this issue exists in 0.10.0 and we have to avoid FP8 quantization with IFB in this version?

@PerkzZheng
Collaborator

> Hi Team! Seems #1793 fixed this issue. Could you help confirm that this issue exists in 0.10.0 and we have to avoid FP8 quantization with IFB in this version?

Yes, the same issue exists in 0.10.0. You can pick up the latest main branch.
