Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

.Net: Fix hugging face embedding #6673

Open
wants to merge 30 commits into
base: main
Choose a base branch
from

Conversation

N-E-W-T-O-N
Copy link
Contributor

@N-E-W-T-O-N N-E-W-T-O-N commented Jun 11, 2024

Motivation and Context

Fix the Bug #6635 HuggingFace Embedding: Unable to Deserialization for certain models

Description

As mentioned in the issue, the HuggingFace Embedding API interface returns responses typically in the form of List<ReadOnlyMemory<float>> and occasionally as List<List<List<ReadOnlyMemory<float>>>>. Currently, only the latter format is handled correctly, leading to deserialization issues.

To address this, I propose the following solution:

try {
    // Attempt to parse data as List<ReadOnlyMemory<float>> and return the parsed data
}
catch (KernelException ex1) {
    try {
        // If the first attempt fails, attempt to parse data as List<List<List<ReadOnlyMemory<float>>>>` and return the parsed data
    }
    catch (KernelException ex2) {
        // If both attempts fail, handle the exception (e.g., the model doesn't exist ,the model has still been loading, or an HTTP exception occurred) and rethrow the error
    }
}

Contribution Checklist

Added class `TextEmbeddingResponseType1` & `TextEmbeddingResponseType2` to parse two distinct possible JsonRequest
…Async`

- Added try-catch block to better handle parsing in `GenerateEmbeddingsAsync` method.
@N-E-W-T-O-N N-E-W-T-O-N requested a review from a team as a code owner June 11, 2024 18:41
@markwallace-microsoft markwallace-microsoft added .NET Issue or Pull requests regarding .NET code kernel Issues or pull requests impacting the core kernel labels Jun 11, 2024
@github-actions github-actions bot changed the title Fix hugging face embedding .Net: Fix hugging face embedding Jun 11, 2024
@N-E-W-T-O-N
Copy link
Contributor Author

Hello @dmytrostruk
Can you please suggest to me where I can add the above custom HttpClient Example in main code .

@dmytrostruk
Copy link
Member

Hello @dmytrostruk Can you please suggest to me where I can add the above custom HttpClient Example in main code .

@N-E-W-T-O-N That's awesome, thanks for validating this idea and creating an example for this! I think you can put it here:
https://github.com/microsoft/semantic-kernel/blob/main/dotnet/samples/Concepts/Memory/HuggingFace_EmbeddingGeneration.cs

@N-E-W-T-O-N
Copy link
Contributor Author

I want to add two more parameters in the api's payload body namely 'use_cache' & 'wait_for_model'

I want to send this values as addition parameters in the HuggingFaceClient

https://huggingface.co/docs/api-inference/en/detailed_parameters

@N-E-W-T-O-N
Copy link
Contributor Author

@dmytrostruk
Added my example in 4f9eb97 .
Please merge my PR..
Let me know for anymore Improvement.

Copy link
Member

@dmytrostruk dmytrostruk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@N-E-W-T-O-N Thanks for your contribution! There are some warnings about possible null references and formatting issues. In order to proceed with merging, we need to make sure that all warnings are resolved. Thanks again!

@N-E-W-T-O-N
Copy link
Contributor Author

@RogerBarreto It might be possible that changes in TextEmbeddingResponse class will lead to test case failure

@N-E-W-T-O-N
Copy link
Contributor Author

N-E-W-T-O-N commented Jun 22, 2024

@dmytrostruk

Remove the whitespace error.
Please re run the workflow

@markwallace-microsoft
Copy link
Member

@N-E-W-T-O-N there are still formatting errors to be addressed. Suggest you run dotnet format on the solution.

@N-E-W-T-O-N
Copy link
Contributor Author

@dmytrostruk please check it now.
Resolve check-format issues
.Sorry for inconvenience

Copy link
Member

@dmytrostruk dmytrostruk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@N-E-W-T-O-N PR looks good to me, but there are several unit tests that started to fail because of new change in deserialization logic.

@N-E-W-T-O-N
Copy link
Contributor Author

N-E-W-T-O-N commented Jun 24, 2024

@dmytrostruk may I ask how to test unit test cases locally

@dmytrostruk
Copy link
Member

may I ask how to test unit test cases locally

@N-E-W-T-O-N

cd dotnet/src/Connectors/Connectors.HuggingFace.UnitTests
dotnet test

@N-E-W-T-O-N
Copy link
Contributor Author

@dmytrostruk The issue is unit tests are still based on the previous version of embedding dimensions as noted by@RogerBarreto .Would you like me to update the unit test accordingly?

@dmytrostruk
Copy link
Member

The issue is unit tests are still based on the previous version of embedding dimensions as noted by@RogerBarreto .Would you like me to update the unit test accordingly?

@N-E-W-T-O-N Yes, if we are changing the behavior, then unit tests should be updated as well. Thanks!

@N-E-W-T-O-N
Copy link
Contributor Author

@dmytrostruk moved the class inside with access modifier 'private sealed'

Resolve the unit test issue

@N-E-W-T-O-N
Copy link
Contributor Author

@dmytrostruk anything to add from my side

@dmytrostruk
Copy link
Member

@dmytrostruk anything to add from my side

@N-E-W-T-O-N I think we are ready to merge this to main. I added a couple of small fixes to the example. Thanks for your contribution!

Comment on lines +40 to +45
if (!await sqliteMemory.DoesCollectionExistAsync("Sqlite"))
{
await sqliteMemory.CreateCollectionAsync("Sqlite");
}

await skMemory.SaveInformationAsync("Test", "THIS IS A SAMPLE", "sample", "TEXT");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SaveInformationAsync already creates a collection if needed using the passed in collection name, in this case "Test". Why also check if a "Sqlite" collection exists on lines 40-43?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes Semantic test memory will create the collection if no collection by that name exist

if (!(await this._storage.DoesCollectionExistAsync(collection, cancellationToken).ConfigureAwait(false)))
        {
            await this._storage.CreateCollectionAsync(collection, cancellationToken).ConfigureAwait(false);
        }

I added this code just for testing purposes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I remove/comment out this part or leave this part as it is

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove lines 40-43, since it's confusing to have it in the sample. The sample is about demonstrating how to handle the non-standard response from the "cointegrated/LaBSE-en-ru" model, and we don't need to create the collection with name Sqlite to demonstrate that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kernel Issues or pull requests impacting the core kernel .NET Issue or Pull requests regarding .NET code
Projects
Status: Community PRs
Development

Successfully merging this pull request may close these issues.

None yet

5 participants