Updates to Index and FAQs pages (#2524)
Adding LMI page to Serve landing page.
Adding CPU performance to FAQs
Minor update to CPU performance section in performance doc

Co-authored-by: Geeta Chauhan <[email protected]>
Co-authored-by: Ankith Gunapal <[email protected]>
3 people committed Aug 24, 2023
1 parent 2a386ec commit bb4eb8b
Showing 3 changed files with 34 additions and 6 deletions.
23 changes: 19 additions & 4 deletions docs/FAQs.md
@@ -1,6 +1,7 @@
# FAQs
Contents of this document.
* [General](#general)
* [Performance](#performance)
* [Deployment and config](#deployment-and-config)
* [API](#api)
* [Handler](#handler)
@@ -34,9 +35,23 @@ No, as of now only Python-based models are supported.
TorchServe is derived from Multi-Model-Server. However, TorchServe is specifically tuned for PyTorch models. It also has new features like snapshots and model versioning.

### How to decode international language in inference response on client side?
By default, TorchServe uses utf-8 to encode the inference response if it is a string, so the client can use utf-8 to decode it.

If a model converts an international language string to bytes, the client needs to use the codec mechanism specified by the model, such as in https://github.com/pytorch/serve/blob/master/examples/nmt_transformer/model_handler_generalized.py
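
As a quick client-side sanity check, here is a minimal sketch assuming a locally running TorchServe instance; the `nmt_transformer` model name and input file are hypothetical placeholders:

```bash
# Hypothetical model name and input file; adjust to your own deployment.
curl -X POST http://127.0.0.1:8080/predictions/nmt_transformer -T sample_text.txt -o response.txt

# If the handler returns a plain string, the body is utf-8 and a utf-8-aware
# tool can read it directly; iconv here only verifies the encoding.
iconv -f utf-8 -t utf-8 response.txt > /dev/null && echo "response is valid utf-8"
cat response.txt
```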

## Performance

Relevant documents.
- [Performance Guide](performance_guide.md)

### How do I improve TorchServe performance on CPU?
CPU performance is heavily influenced by launcher core pinning. We recommend setting the following properties in your `config.properties`:

```bash
cpu_launcher_enable=true
cpu_launcher_args=--use_logical_core
```
More background on improving CPU performance can be found in this [blog post](https://pytorch.org/tutorials/intermediate/torchserve_with_ipex#grokking-pytorch-intel-cpu-performance-from-first-principles).
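
As a quick sketch of how this fits together, you can point TorchServe at a `config.properties` containing the two lines above when starting the server; the model store path and model archive below are hypothetical placeholders:

```bash
# Start TorchServe with the core-pinning settings from config.properties.
# The model store path and resnet-18 archive are illustrative examples.
torchserve --start --ncs \
  --ts-config config.properties \
  --model-store model_store \
  --models resnet-18=resnet-18.mar
```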

## Deployment and config
Relevant documents.
@@ -97,7 +112,7 @@ TorchServe looks for the config.property file according to the order listed in t

- [models](configuration.md): Defines a list of models' configuration in config.properties. A model's configuration can be overridden by the [management API](management_api.md). It does not decide which models will be loaded during TorchServe start. There is no relationship between "models" and "load_models" (i.e. the TorchServe command line option [--models](configuration.md)).

###

## API
Relevant documents
@@ -133,7 +148,7 @@ Refer to [default handlers](default_handlers.md) for more details.

### Is it possible to deploy Hugging Face models?
Yes, you can deploy Hugging Face models using a custom handler.
Refer to [HuggingFace_Transformers](https://github.com/pytorch/serve/blob/master/examples/Huggingface_Transformers/README.md#huggingface-transformers) for example.

## Model-archiver
Relevant documents
7 changes: 7 additions & 0 deletions docs/index.rst
@@ -56,6 +56,13 @@ What's going on in TorchServe?
:link: performance_guide.html
:tags: Performance,Troubleshooting

.. customcarditem::
:header: Large Model Inference
:card_description: Serving Large Models with TorchServe
:image: https://raw.githubusercontent.com/pytorch/serve/master/docs/images/ts-lmi-internal.png
:link: large_model_inference.html
:tags: Large-Models,Performance

.. customcarditem::
:header: Troubleshooting
    :card_description: Various updates on TorchServe and use cases.
10 changes: 8 additions & 2 deletions docs/performance_guide.md
@@ -44,11 +44,17 @@ TorchServe exposes configurations that allow the user to configure the number of

<h4>TorchServe on CPU</h4>

If working with TorchServe on a CPU, you can improve performance by setting the following in your `config.properties`:

```bash
cpu_launcher_enable=true
cpu_launcher_args=--use_logical_core
```
These settings improve performance significantly through launcher core pinning.
The theory behind this improvement is discussed in [this blog](https://pytorch.org/tutorials/intermediate/torchserve_with_ipex#grokking-pytorch-intel-cpu-performance-from-first-principles), which can be quickly summarized as:
* In a hyperthreading enabled system, avoid logical cores by setting thread affinity to physical cores only via core pinning.
* In a multi-socket system with NUMA, avoid cross-socket remote memory access by setting thread affinity to a specific socket via core pinning.

These principles can be automatically configured via an easy-to-use launch script which has already been integrated into TorchServe. For more information, take a look at this [case study](https://pytorch.org/tutorials/intermediate/torchserve_with_ipex#grokking-pytorch-intel-cpu-performance-from-first-principles), which dives into these points further with examples and explanations from first principles.
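
For intuition only, here is a rough manual equivalent of the two bullet points above, assuming a hypothetical two-socket machine with 28 physical cores per socket (verify your own layout with `lscpu`); in practice the integrated launch script handles this for you:

```bash
# Inspect the topology: physical vs. logical (hyperthread) cores and NUMA nodes.
lscpu | grep -E "Socket|Core|NUMA"

# Pin the server to the physical cores of socket 0 only (0-27 on this
# hypothetical machine) and keep memory allocations on the same NUMA node,
# avoiding hyperthread siblings and cross-socket memory access.
numactl --physcpubind=0-27 --membind=0 \
  torchserve --start --ts-config config.properties --model-store model_store
```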

<h4>TorchServe on GPU</h4>

