OpenAIEmbeddings: Add an optional parameter to skip empty embeddings #10196

ElReyZero · 2023-09-04T20:11:23Z

Description

Issue

This pull request addresses a lingering issue identified in PR #7070. That pull request tried to fix empty embeddings in the OpenAIEmbeddings class by introducing a mechanism to retry embedding requests, but it did not fully resolve the problem: empty embeddings still occasionally persisted.

Problem

In certain use cases, the OpenAI API returns empty embeddings. Retries do not always resolve them, and their presence can break applications that rely on the OpenAIEmbeddings class. In some of these cases, however, the empty embeddings can simply be skipped or removed without affecting the application's functionality.
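The limitation of retries can be illustrated with a minimal sketch. It assumes a hypothetical `fetch` callable that stands in for one embeddings request and treats a vector with no components as "empty"; both are assumptions for illustration, and this is not the retry logic from PR #7070. A bounded retry recovers from a transient empty response, but it cannot help when the API keeps returning one.

```python
from typing import Callable, List, Sequence


def embed_with_bounded_retry(
    fetch: Callable[[], Sequence[float]], max_attempts: int = 3
) -> List[float]:
    """Call `fetch` until it returns a non-empty embedding or attempts run out.

    `fetch` is a hypothetical zero-argument callable standing in for a
    single embeddings API request.
    """
    for _ in range(max_attempts):
        embedding = list(fetch())
        if embedding:  # assumption: "empty" means a vector with no components
            return embedding
    raise RuntimeError(f"embedding still empty after {max_attempts} attempts")


# Transient failure: the first response is empty, the retry succeeds.
responses = iter([[], [0.1, 0.2, 0.3]])
print(embed_with_bounded_retry(lambda: next(responses)))  # [0.1, 0.2, 0.3]

# Persistent failure: every response is empty, so retrying cannot help.
try:
    embed_with_bounded_retry(lambda: [])
except RuntimeError as err:
    print(err)  # embedding still empty after 3 attempts
```

In the persistent case the caller is left with an error, which is the gap an opt-in "skip" behavior is meant to cover.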

Solution

To handle empty embeddings more robustly, we propose an optional parameter, skip_empty, on the OpenAIEmbeddings class. When it is set to True, empty embeddings are skipped automatically instead of disrupting the processing flow. Because the parameter is optional, developers can toggle this behavior only where they need it.

Changes Made

Added an optional parameter, skip_empty, to the OpenAIEmbeddings class.
When skip_empty is set to True, empty embeddings are automatically skipped without causing errors or disruptions.
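The behavior described in the list above can be sketched with a hypothetical `check_embeddings` helper and `EmptyEmbeddingError` exception (neither is part of LangChain's API). The sketch assumes that "empty" means a vector with no components and that "skipping" means dropping the vector from the batch; the merged implementation may detect and handle empty vectors differently, for example to keep one embedding per input text.

```python
from typing import List, Sequence


class EmptyEmbeddingError(ValueError):
    """Hypothetical error raised when the API returns an empty vector."""


def check_embeddings(
    embeddings: Sequence[Sequence[float]], skip_empty: bool = False
) -> List[List[float]]:
    """Validate a batch of embeddings, illustrating the skip_empty idea.

    With skip_empty=False (the default), any empty vector raises an error.
    With skip_empty=True, empty vectors are dropped and the rest returned.
    """
    result: List[List[float]] = []
    for index, vector in enumerate(embeddings):
        if len(vector) == 0:  # assumption: "empty" means zero components
            if not skip_empty:
                raise EmptyEmbeddingError(f"empty embedding at index {index}")
            continue  # skip it instead of failing the whole batch
        result.append(list(vector))
    return result


batch = [[0.1, 0.2], [], [0.3, 0.4]]
print(check_embeddings(batch, skip_empty=True))  # [[0.1, 0.2], [0.3, 0.4]]
```

Keeping the flag off by default preserves the strict behavior, so only callers who opt in see fewer vectors than they sent.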

Example Usage

from langchain.embeddings import OpenAIEmbeddings

# Initialize the OpenAIEmbeddings class with skip_empty=True
embeddings = OpenAIEmbeddings(openai_api_key="your_api_key", skip_empty=True)

# docs is a list of strings containing the already-split text (placeholder values)
docs = ["First chunk of text.", "Second chunk of text."]

# Request embeddings; empty embeddings are skipped automatically.
results = embeddings.embed_documents(docs)

# Process results without interruption from empty embeddings

vercel · 2023-09-04T20:11:27Z

The latest updates on your projects.

1 Ignored Deployment

Name	Status	Updated (UTC)
langchain	⬜️ Ignored	Sep 4, 2023 8:11pm

benprofitt · 2023-09-12T04:31:48Z

This change in 0.0.283 breaks backwards compatibility for me. When I unpickle objects that use OpenAIEmbeddings and try to use them, I get errors about the missing skip_empty attribute :(

I don't know if this is the right place for this, but thought I should let someone know!

Error:

[ERROR] AttributeError: 'OpenAIEmbeddings' object has no attribute 'skip_empty'
Traceback (most recent call last):
  File "/var/task/docs.py", line 38, in get_related_chunks_and_metadata_faiss
    results = faiss_index.similarity_search_with_relevance_scores(query, k=k)
  File "/var/task/langchain/vectorstores/base.py", line 247, in similarity_search_with_relevance_scores
    docs_and_similarities = self._similarity_search_with_relevance_scores(
  File "/var/task/langchain/vectorstores/faiss.py", line 764, in _similarity_search_with_relevance_scores
    docs_and_scores = self.similarity_search_with_score(
  File "/var/task/langchain/vectorstores/faiss.py", line 275, in similarity_search_with_score
    embedding = self.embedding_function(query)
  File "/var/task/langchain/embeddings/openai.py", line 511, in embed_query
    return self.embed_documents([text])[0]
  File "/var/task/langchain/embeddings/openai.py", line 483, in embed_documents
    return self._get_len_safe_embeddings(texts, engine=self.deployment)
  File "/var/task/langchain/embeddings/openai.py", line 367, in _get_len_safe_embeddings
    response = embed_with_retry(
  File "/var/task/langchain/embeddings/openai.py", line 107, in embed_with_retry
    return _embed_with_retry(**kwargs)
  File "/var/task/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "/var/task/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/var/task/tenacity/__init__.py", line 314, in iter
    return fut.result()
  File "/var/lang/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/var/lang/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/var/task/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "/var/task/langchain/embeddings/openai.py", line 105, in _embed_with_retry
    return _check_response(response, skip_empty=embeddings.skip_empty)

ElReyZero · 2023-09-21T23:45:02Z

@benprofitt That is, sadly, a consequence of using pickled objects: they depend entirely on the exact versions of the libraries and code you were using when you dumped the object.
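The failure in the traceback can be reproduced with a simplified, hypothetical stand-in class (OpenAIEmbeddings itself is a pydantic model, so the details differ). pickle stores an instance's attribute dictionary and restores it without calling `__init__`, so an attribute that a newer version of the class sets in its constructor is simply absent from objects pickled with the old version. The sketch also shows one possible stopgap, backfilling the attribute with a default; rebuilding the object with the installed library version is likely the sturdier option.

```python
import pickle


class Embeddings:
    """Hypothetical stand-in for an embeddings class before an upgrade."""

    def __init__(self, model: str = "text-embedding-ada-002") -> None:
        self.model = model


# Pickle an instance created by the "old" version of the class.
blob = pickle.dumps(Embeddings())


def _upgraded_init(
    self: Embeddings, model: str = "text-embedding-ada-002", skip_empty: bool = False
) -> None:
    self.model = model
    self.skip_empty = skip_empty


# Simulate upgrading the library: the constructor now sets a new attribute.
Embeddings.__init__ = _upgraded_init  # type: ignore[method-assign]

# Unpickling restores the saved __dict__ without calling __init__,
# so the new attribute is missing, as in the AttributeError above.
old = pickle.loads(blob)
print(hasattr(old, "skip_empty"))  # False

# One possible stopgap: backfill the missing attribute with its default.
if not hasattr(old, "skip_empty"):
    old.skip_empty = False
print(old.skip_empty)  # False
```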

ElReyZero added 2 commits September 4, 2023 14:47

Fix: Add optional skip_empty parameter to OpenAIEmbeddings

0a8a4c2

Style: Fix linting on skip_empty description

f9557cd

dosubot bot added Ɑ: embeddings Related to text embedding models module 🤖:improvement Medium size change to existing code to handle new use-cases labels Sep 4, 2023

hwchase17 approved these changes Sep 4, 2023


hwchase17 merged commit 5dbae94 into langchain-ai:master Sep 4, 2023
30 checks passed
