Run serverless endpoint batch test and record cost and time results #99

Closed
rbavery opened this issue Dec 9, 2022 · 3 comments

rbavery commented Dec 9, 2022

User story

We need to understand the cost of running inference with the current architecture on large archives (25 GB) of imagery, both in terms of time (does it take a week with retries? two days?) and in terms of cost for the serverless MDv5 endpoint that auto-scales with requests. For this first run, we won't include the Mira endpoints in the test.

We will run this test on duplicated images that match the real-world ratio of animal to no-animal images (~60% are empty). All are JPEGs.

Secondarily, we'd like to understand:

  • the types of exceptions that are thrown
  • the bottlenecks of the pipeline (is it the free-tier DB? should we test with Atlas again? potentially)
  • long-term cost implications of the current serverless architecture, in terms of monthly cost and/or cost per 100,000 images
  • a back-of-napkin calculation for batch inference in monthly and per-100,000-image terms, plus a dev cost estimate, if Natty and team determine that a lower-cost option is needed (see the sketch after this list)
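
As a starting point for that back-of-napkin calculation, here is a minimal Python sketch. The per-image latency, per-second price, and monthly volume are placeholder assumptions to be swapped for measured values and current SageMaker Serverless pricing; they are not figures from this test.

```python
# Back-of-napkin batch inference cost/time estimate.
# All rate constants below are placeholder assumptions, not measured values.

SECONDS_PER_IMAGE = 4.0           # assumed average serverless inference latency
USD_PER_COMPUTE_SECOND = 0.00008  # assumed SageMaker Serverless $/sec for the chosen memory size
IMAGES_PER_MONTH = 100_000        # assumed monthly volume

def estimate(images: int) -> tuple[float, float]:
    """Return (total_hours, total_usd) for a batch of `images`, ignoring retries."""
    total_seconds = images * SECONDS_PER_IMAGE
    total_usd = total_seconds * USD_PER_COMPUTE_SECOND
    return total_seconds / 3600, total_usd

hours, cost = estimate(IMAGES_PER_MONTH)
print(f"{IMAGES_PER_MONTH:,} images: ~{hours:.1f} h of compute, ~${cost:,.2f}")
```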

Things we need to run the test:

  • Instructions on where to put the archive of imagery to kick off the test (the ingestion bucket). It is not designed to handle zip files.
    • a Fargate batch job could extract zip files when they are put in the ingestion bucket
    • or the AWS CLI? It doesn't match the end-user way of uploading, but it's quicker to use, so we will do this for the test (see the upload sketch after this list)
  • example images to duplicate
  • how to clean the database ourselves
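
For the kickoff upload itself, here is a minimal boto3 sketch of copying already-extracted JPEGs into the ingestion bucket; the bucket name and local directory are placeholders, since the real bucket isn't named here.

```python
# Minimal sketch: upload extracted JPEGs to the ingestion bucket to kick off the test.
# Bucket name and local path are placeholders, not the project's real values.
from pathlib import Path
import boto3

INGESTION_BUCKET = "example-ingestion-bucket"  # placeholder
LOCAL_DIR = Path("./test-archive")             # already-extracted images (no zips)

s3 = boto3.client("s3")
for img in LOCAL_DIR.rglob("*.jpg"):
    s3.upload_file(str(img), INGESTION_BUCKET, img.name)
    print(f"uploaded {img.name}")
```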

Resolution Criteria

For 25 GB of random imagery, where sample images will be close to 1280x1280 (Natty will pick a representative range), how long does autoscaled inference take for MDv5?

  • [ ]

What was the cost per image? Did this vary throughout the job due to retries?

  • look at how many requests were made
  • try to see if we can get cost per request (see the Cost Explorer sketch below)
    • @ingalls will post a billing-viewing strategy
  • [ ]
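
One possible way to approximate cost per request while we wait for that strategy: pull SageMaker spend for the job window from Cost Explorer and divide by the request count from the endpoint's CloudWatch metrics. The dates and request count below are placeholders, and the assumption that serverless endpoint charges show up under the "Amazon SageMaker" service dimension is just that, an assumption.

```python
# Sketch: approximate cost-per-request = daily SageMaker spend / request count.
# Dates and request count are placeholders; the service filter is an assumption.
import boto3

ce = boto3.client("ce")
resp = ce.get_cost_and_usage(
    TimePeriod={"Start": "2022-12-12", "End": "2022-12-14"},  # placeholder job window
    Granularity="DAILY",
    Metrics=["UnblendedCost"],
    Filter={"Dimensions": {"Key": "SERVICE", "Values": ["Amazon SageMaker"]}},
)
total = sum(float(d["Total"]["UnblendedCost"]["Amount"]) for d in resp["ResultsByTime"])
requests_made = 10_168  # placeholder: take this from the endpoint's CloudWatch metrics
print(f"~${total / requests_made:.5f} per request (${total:.2f} total)")
```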

Were there any failures not resolved by retries?

  • [ ]

rbavery commented Dec 9, 2022

Additional meeting notes:

Testing the impact of automation rules and the addition of the Mira models during the test would show how multiple trips to and writes to the DB impact cost.

We got the green light from MongoDB to get a dedicated instance for MongoDB Atlas, which should improve DB performance.

The preference is to reduce cost rather than inference time. Kinesis Firehose or DynamoDB would be more time-performant.

We agreed to run without Atlas next week and assess whether we need to run the test again with Atlas later.


rbavery commented Mar 27, 2023

With letterboxing and the fully reproduced YOLOv5, we get average inference times of 9 seconds per image on SageMaker Serverless, which only supports CPU.

  • Initialization time (model loading): 8.55 s
  • Preprocess time (letterbox): 0.033 s
  • Inference time (model running on an image of a given size): 8.05 s
  • Postprocessing time (NMS): 0.024 s

Back when we ran the above test last year, we were testing with fixed resizing to 640x640 and a TorchScript model compiled for the CPU; inference time was closer to 2.5 seconds per image: https://docs.google.com/spreadsheets/d/17t-zgKwWdVSArf7mgu4QJXOvtGVIlcUYTnwYEpNZQsU/edit#gid=0
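
For context on what the letterbox step is doing, here is a generic sketch of YOLOv5-style letterbox preprocessing (resize preserving aspect ratio, then pad to the square target size); it is not necessarily the exact implementation timed above.

```python
# Generic YOLOv5-style letterbox sketch: resize preserving aspect ratio, then pad with gray.
# Not necessarily the exact implementation timed above.
import cv2
import numpy as np

def letterbox(img: np.ndarray, new_size: int = 1280, pad_value: int = 114) -> np.ndarray:
    h, w = img.shape[:2]
    scale = min(new_size / h, new_size / w)                     # fit the long side
    resized = cv2.resize(img, (round(w * scale), round(h * scale)))
    top = (new_size - resized.shape[0]) // 2
    left = (new_size - resized.shape[1]) // 2
    out = np.full((new_size, new_size, 3), pad_value, dtype=img.dtype)
    out[top:top + resized.shape[0], left:left + resized.shape[1]] = resized
    return out
```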

We'll be exploring how to reduce inference time while preserving reproduced accuracy: #106

@nathanielrindlaub commented

@rbavery deployed the ONNX MDv5 (PR here) to a Sagemaker Serverless endpoint and it looks like per-image inference is around 3.5-4 seconds.

The entire processing time for a test batch of 10,168 images was 11hrs, 8 mins (3.9 seconds per image).

So 1,000 images take roughly an hour to process, and 100k would take about 4.5 days.
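
For anyone re-running the arithmetic behind those estimates with new timings, a quick sketch using the batch numbers above:

```python
# Throughput math from the test batch above: 10,168 images in 11 hrs 8 mins.
batch_images = 10_168
batch_seconds = 11 * 3600 + 8 * 60
sec_per_image = batch_seconds / batch_images  # ~3.9 s/image

print(f"{sec_per_image:.1f} s/image")
print(f"1,000 images: ~{1_000 * sec_per_image / 3600:.1f} hours")
print(f"100,000 images: ~{100_000 * sec_per_image / 86400:.1f} days")
```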

Not bad for now! Down the road we may explore speeding this up by taking advantage of concurrent processing (having two separate Serverless endpoints for MegaDetector - one for real-time inference needs and one for batch - and ditching the FIFO queues for standard SQS queues).

There are also endpoint and model level optimizations we could explore as well (#112).
