Finding protobuf files while benchmarking TensorRT-LLM #2101

KuntaiDu · 2024-08-08T22:45:09Z

System Info

I am working on the benchmarking suite in vLLM team, and now trying to run TensorRT-LLM for comparison. I am relying on this github repo (https://github.com/neuralmagic/tensorrt-demo) to serve the LLM, which contains several config.pbtxt files that specifies the batch size, max token length, etc and will be used for triton inference server. However, this repo is based on version r24.04 and I am not sure how to find the corresponding config.pbtxt files in version r24.07. Is there any references for me to locate these config.pbtxt files so that I can compare with TensorRT-LLM version r24.07?

Who can help?

@juney-nvidia @byshiue

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

Not a code bug issue

Expected behavior

Not a code bug issue

actual behavior

Not a code bug issue

additional notes

Not a code bug issue

The text was updated successfully, but these errors were encountered:

kaiyux · 2024-08-09T01:23:02Z

Hi @KuntaiDu , please see the official location of the config.pbtxt file for v0.11 at here: https://github.com/triton-inference-server/tensorrtllm_backend/tree/v0.11.0/all_models/inflight_batcher_llm.

Before you launch the tritonserver, you'll need to set several parameters, please follow the documents of the TensorRT-LLM backend repo and feel free to let us know if there are any questions. Thanks.

gracehonv · 2024-08-13T17:57:43Z

Hello @KuntaiDu, for the ci benchmark, would it be possible to provide the ci script that you're using? I could find the regression page with results https://buildkite.com/vllm/performance-benchmark/builds/4068#_ but couldn't find the ci script. I can help fix that script.
But basically, to get a good run with good settings in the ci script, could you:

build the engine like so:
trtllm-build --model_config <> --use_fused_mlp --gpt_attention_plugin bfloat16 --output_dir OUTPUT --max_batch_size 2048 --max_input_len 4096 --max_seq_len 6144 --reduce_fusion disable --workers 8 --max_num_tokens 16384
set the triton runtime pbtxt files like so, with the base config.pbtx based on the v0.11 files that @kaiyux had mentioned

python3 /workspace/trt-llm_backend/tools/fill_template.py -i tensorrt_llm/config.pbtxt triton_backend:tensorrtllm,engine_dir:OUTPUT,decoupled_mode:true,batching_strategy:inflight_fused_batching,batch_scheduler_policy:guaranteed_no_evict,exclude_input_in_output:true,triton_max_batch_size:2048,max_queue_delay_microseconds:0,max_beam_width:1,max_queue_size:2048,enable_kv_cache_reuse:false
python3 /workspace/trt-llm_backend/tools/fill_template.py -i preprocessing/config.pbtxt triton_max_batch_size:2048,tokenizer_dir:model_name> ,preprocessing_instance_count:5
python3 /workspace/trt-llm_backend/tools/fill_template.py -i postprocessing/config.pbtxt triton_max_batch_size:2048,tokenizer_dir:<model_name> ,postprocessing_instance_count:5,skip_special_tokens:false
python3 /workspace/trt-llm_backend/tools/fill_template.py -i ensemble/config.pbtxt triton_max_batch_size:2048
python3 /workspace/trt-llm_backend/tools/fill_template.py -i tensorrt_llm_bls/config.pbtxt triton_max_batch_size:2048,decoupled_mode:true,accumulate_tokens:"False",bls_instance_count:1

gracehonv · 2024-09-04T03:21:04Z

Incorporated into vLLM PR. Please close as "done'

KuntaiDu added the bug Something isn't working label Aug 8, 2024

lfr-0531 assigned gracehonv Sep 4, 2024

lfr-0531 added triaged Issue has been triaged by maintainers benchmark not a bug Some known limitation, but not a bug. and removed bug Something isn't working labels Sep 4, 2024

lfr-0531 assigned kaiyux Sep 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Finding protobuf files while benchmarking TensorRT-LLM #2101

Finding protobuf files while benchmarking TensorRT-LLM #2101

KuntaiDu commented Aug 8, 2024 •

edited

Loading

kaiyux commented Aug 9, 2024

gracehonv commented Aug 13, 2024

gracehonv commented Sep 4, 2024 •

edited

Loading

Finding protobuf files while benchmarking TensorRT-LLM #2101

Finding protobuf files while benchmarking TensorRT-LLM #2101

Comments

KuntaiDu commented Aug 8, 2024 • edited Loading

System Info

Who can help?

Information

Tasks

Reproduction

Expected behavior

actual behavior

additional notes

kaiyux commented Aug 9, 2024

gracehonv commented Aug 13, 2024

gracehonv commented Sep 4, 2024 • edited Loading

KuntaiDu commented Aug 8, 2024 •

edited

Loading

gracehonv commented Sep 4, 2024 •

edited

Loading