Ensure proper behavior if quota limits are hit #1

Open
mark-meyer opened this issue Jun 28, 2021 · 0 comments
Textract has several quotas.

Of particular concern are:

  • 10 requests/second quota on GetDocumentTextDetection
    This is the function used to get paginated results for a scanned document. For long PDFs it may be called many times for a single document. OCR results are processed asynchronously, whenever Textract finishes a PDF, so we can't control exactly when this function will be called.

  • Maximum number of asynchronous jobs per account that can simultaneously exist: 600
    We have about 780,000 documents to process, which means we will need to limit the rate at which we start async jobs.
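
To make the first concern concrete, here is a minimal sketch of paging through results while backing off when a call is throttled. It is not the project's code: `fetch_all_pages`, `get_page`, and the `Throttled` exception are hypothetical names. The assumption is that `get_page` wraps the Textract `GetDocumentTextDetection` call (for example boto3's `get_document_text_detection(JobId=..., NextToken=...)`) and raises `Throttled` when the service reports a throttling error.

```python
import random
import time


class Throttled(Exception):
    """Hypothetical marker raised by get_page when the call was throttled."""


def fetch_all_pages(get_page, job_id, max_attempts=8, base_delay=0.2,
                    sleep=time.sleep):
    """Yield every result page for job_id, retrying throttled calls.

    get_page(job_id, token) is an assumed wrapper around
    GetDocumentTextDetection; token is None for the first page.
    """
    token = None
    while True:
        for attempt in range(max_attempts):
            try:
                page = get_page(job_id, token)
                break
            except Throttled:
                if attempt == max_attempts - 1:
                    raise  # give up; let the queue reschedule the message
                # Exponential backoff with full jitter.
                sleep(random.uniform(0, base_delay * 2 ** attempt))
        yield page
        # Textract includes NextToken only when more pages remain.
        token = page.get("NextToken")
        if token is None:
            return
```

Re-raising after the last attempt is deliberate: the failure then surfaces to whatever invoked the function, so a queue-level retry can pick it up later.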

A possible solution:

  1. Limit the number of concurrent lambdas processing documents so we never exceed the 600 simultaneous async jobs.
  2. Set a high number of retries on the SQS queue so failures simply get rescheduled.
  3. Use a dead letter queue to catch anything that fails after max tries so we can resend.
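
A rough sketch of how the three steps above might be wired up with boto3. The function name `configure_pipeline` and the default values are hypothetical and untested against the quotas. The `sqs` and `lambda_client` arguments are assumed to be boto3 SQS and Lambda clients; `set_queue_attributes` and `put_function_concurrency` are the calls this sketch relies on.

```python
import json


def configure_pipeline(sqs, lambda_client, queue_url, dlq_arn, function_name,
                       max_receives=20, max_concurrency=50):
    """Apply a retry/DLQ policy to the queue and cap Lambda concurrency.

    max_receives and max_concurrency are placeholder values, not tuned.
    """
    # Steps 2 and 3: a message is redelivered up to max_receives times,
    # then moved to the dead letter queue.
    sqs.set_queue_attributes(
        QueueUrl=queue_url,
        Attributes={
            "RedrivePolicy": json.dumps({
                "deadLetterTargetArn": dlq_arn,
                "maxReceiveCount": max_receives,
            })
        },
    )
    # Step 1: cap how many copies of the function can run at once.
    lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=max_concurrency,
    )
```

One caveat worth testing: reserved concurrency bounds concurrent invocations, not necessarily in-flight Textract jobs. If the function returns right after starting an async job, the number of open jobs could still climb past 600, so the start rate may need its own limit.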

This may require some testing to get right.
