speeding up yolov5 megadetector inference #105

Merged: 10 commits merged on Apr 7, 2023

Conversation

@rbavery (Contributor) commented on Mar 25, 2023

Inference for the fully reproduced MegaDetector v5a model is currently about 9 seconds per image. This PR speeds it up by:

  • compiling the model to ONNX, independent of image size changes (see the export sketch after this list)
  • reducing image size while preserving detection performance as much as possible
  • we did not apply any other optimizations (NeuralMagic or direct custom ONNX sparsification)
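
For reference, a size-independent ONNX export looks roughly like the sketch below. In practice YOLOv5's own export tooling does the heavy lifting; the weights path, loading call, input/output names, and opset here are assumptions, not the exact code in this PR.

```python
import torch

# Load the MegaDetector v5a weights through the YOLOv5 hub entry point.
# The weights filename and autoshape flag are illustrative assumptions.
model = torch.hub.load("ultralytics/yolov5", "custom", path="md_v5a.0.0.pt", autoshape=False)
model.eval()

# The dummy input only sets a reference shape; dynamic_axes below keep batch,
# height, and width flexible so one ONNX graph can serve any image size.
dummy = torch.zeros(1, 3, 640, 640)
torch.onnx.export(
    model,
    dummy,
    "megadetector_v5a.onnx",
    input_names=["images"],
    output_names=["output"],
    dynamic_axes={
        "images": {0: "batch", 2: "height", 3: "width"},
        "output": {0: "batch"},
    },
    opset_version=12,
)
```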

See the README.md for instructions on getting started with downloading model weights, packaging the model, running the torchserve container, and sending image POST requests (a request sketch follows the notebook list below). This PR also adds two notebooks that can be used to

  1. compare models on folders of images, or
  2. run single-image inference locally, debugging each step and comparing results with the torchserve container.
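
As a rough illustration of the request step, a TorchServe prediction call typically looks like the sketch below; the host, port, model name, and image path are assumptions, so follow the README for the actual endpoint.

```python
import requests

# TorchServe's default inference API listens on port 8080; the model name
# "mdv5" and the test image path are assumptions -- see the README for real values.
with open("test_image.jpg", "rb") as f:
    resp = requests.post(
        "http://localhost:8080/predictions/mdv5",
        data=f.read(),
        headers={"Content-Type": "application/octet-stream"},
    )
print(resp.json())  # detections returned by the handler
```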

@rbavery (Contributor, Author) commented on Mar 25, 2023

Inference is currently about 8 seconds per image. This branch is for investigating how to speed this up by:

  • compiling to TorchScript, independent of image size changes
  • reducing image size while preserving performance as much as possible, potentially with multiple compiled TorchScript models for different image sizes
  • other optimizations (NeuralMagic, ONNX, TensorRT)

See the README.md for instructions on getting started with downloading model weights, packaging the model, running the torchserve container, and sending image POST requests.

Goal: an average inference time of 2-3 seconds per image. We were able to achieve this by resizing all images to 640x640 px and using a TorchScript model compiled for that size, but this degraded detection performance.
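
For context, a fixed-size TorchScript trace of the kind described above might look like this sketch (the loading call, filenames, and 640x640 size are assumptions); because tracing bakes in the input shape, every image has to be resized to that size, which is the trade-off noted above.

```python
import torch

# Weights path and loading call are illustrative assumptions.
model = torch.hub.load("ultralytics/yolov5", "custom", path="md_v5a.0.0.pt", autoshape=False)
model.eval()

# Tracing specializes the graph to this 640x640 example input,
# so images must be resized/letterboxed to 640x640 at inference time.
example = torch.zeros(1, 3, 640, 640)
traced = torch.jit.trace(model, example)
traced.save("megadetector_v5a_640.torchscript.pt")
```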

@rbavery (Contributor, Author) commented on Mar 31, 2023

After compiling to ONNX we get inference speeds of 1.7 seconds per image vs. ~5 seconds without compilation! This is on my local desktop. We'll test this on an endpoint early next week.
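
A back-of-the-envelope way to reproduce this kind of comparison locally is to time an ONNX Runtime session directly; the filename, input name, input size, and execution provider below are assumptions, not the notebook's exact code.

```python
import time
import numpy as np
import onnxruntime as ort

# Model filename, input name, and the 1280x1280 input size are illustrative assumptions.
session = ort.InferenceSession("megadetector_v5a.onnx", providers=["CPUExecutionProvider"])
img = np.random.rand(1, 3, 1280, 1280).astype(np.float32)  # stand-in for a letterboxed, normalized image

start = time.time()
outputs = session.run(None, {"images": img})
print(f"inference took {time.time() - start:.2f} s")
```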

@nathanielrindlaub (Member) left a comment

Awesome - all looks great!
