community: Cap AzureOpenAIEmbeddings chunk_size at 2048 instead of 16. #25852

Merged

Conversation

kyle-winkelman
Contributor

Description: Within AzureOpenAIEmbeddings there is a validation that caps chunk_size at 16. The value of 16 is either an old limitation or was chosen erroneously. I have checked all of the preview and stable releases in Azure/azure-rest-api-specs to confirm that the embeddings endpoint can handle 2048 entries, and I have found many other sources confirming that the limit should be 2048.

Issue: fixes #25462
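
For context, here is a minimal sketch of the kind of cap this PR raises: a Pydantic validator clamping chunk_size to the service limit. The class and field names mirror the community embeddings class, but everything else here is an illustrative assumption, not the actual diff.

```python
from pydantic import BaseModel, field_validator


class AzureOpenAIEmbeddings(BaseModel):
    # Batch size per embeddings request; Azure OpenAI accepts up to 2048 inputs per call.
    chunk_size: int = 2048

    @field_validator("chunk_size")
    @classmethod
    def _cap_chunk_size(cls, v: int) -> int:
        # Clamp to the documented service limit of 2048 instead of the old cap of 16.
        return min(v, 2048)
```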

@dosubot dosubot bot added the size:XS This PR changes 0-9 lines, ignoring generated files. label Aug 29, 2024

vercel bot commented Aug 29, 2024

The latest updates on your projects.

1 Skipped Deployment
Name: langchain | Status: ⬜️ Ignored | Updated (UTC): Aug 29, 2024 4:39pm

@dosubot dosubot bot added community Related to langchain-community Ɑ: embeddings Related to text embedding models module labels Aug 29, 2024
@kyle-winkelman kyle-winkelman force-pushed the AzureOpenAIEmbeddings-chunk_size branch from 166ebe7 to 94479f4 Compare August 29, 2024 16:34
@kyle-winkelman kyle-winkelman force-pushed the AzureOpenAIEmbeddings-chunk_size branch from 94479f4 to c7eba14 Compare August 29, 2024 16:39
@dosubot dosubot bot added size:S This PR changes 10-29 lines, ignoring generated files. and removed size:XS This PR changes 0-9 lines, ignoring generated files. labels Aug 29, 2024
Collaborator

@ccurme ccurme left a comment

Thanks @kyle-winkelman. Both of these classes are deprecated in favor of implementations in the langchain-openai package (you should be seeing deprecation warnings directing you to those packages). The default chunk_size in AzureOpenAIEmbeddings is currently 2048.
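
For readers arriving from the linked issue, a hedged example of the maintained class in langchain-openai, whose chunk_size already defaults to 2048. The endpoint, deployment, and API version below are placeholders, and the API key is assumed to be supplied via the AZURE_OPENAI_API_KEY environment variable.

```python
from langchain_openai import AzureOpenAIEmbeddings

# Placeholder endpoint/deployment values; AZURE_OPENAI_API_KEY is read from the environment.
embeddings = AzureOpenAIEmbeddings(
    azure_endpoint="https://<your-resource>.openai.azure.com/",
    azure_deployment="<your-embedding-deployment>",
    openai_api_version="2024-02-01",
    # chunk_size defaults to 2048; pass a smaller value only if your deployment requires it.
)

vectors = embeddings.embed_documents(["hello", "world"])
```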

@dosubot dosubot bot added the lgtm PR looks good. Use to confirm that a PR is ready for merging. label Aug 29, 2024
@ccurme ccurme enabled auto-merge (squash) August 29, 2024 16:46
@ccurme ccurme merged commit 201bdf7 into langchain-ai:master Aug 29, 2024
27 checks passed
@kyle-winkelman
Contributor Author

Labels
community - Related to langchain-community
Ɑ: embeddings - Related to text embedding models module
lgtm - PR looks good. Use to confirm that a PR is ready for merging.
size:S - This PR changes 10-29 lines, ignoring generated files.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

AzureOpenAIEmbeddings appears to be unnecessarily capped at chunk_size of 16 (should be 2048).
2 participants