feat (core): add cosineSimilarity helper function (#1939)
lgrammel committed Jun 13, 2024
1 parent f9db8fd commit d25566a
Showing 10 changed files with 183 additions and 23 deletions.
5 changes: 5 additions & 0 deletions .changeset/curly-taxis-warn.md
@@ -0,0 +1,5 @@
---
'ai': patch
---

feat (core): add cosineSimilarity helper function
20 changes: 20 additions & 0 deletions content/docs/03-ai-sdk-core/30-embeddings.mdx
@@ -49,3 +49,23 @@ const { embeddings } = await embedMany({
],
});
```

## Embedding Similarity

After embedding values, you can calculate the similarity between them using the [`cosineSimilarity`](/docs/reference/ai-sdk-core/cosine-similarity) function.
This is useful, for example, for finding similar words or phrases in a dataset.
You can also rank and filter related items based on their similarity.

```ts highlight={"2,10"}
import { openai } from '@ai-sdk/openai';
import { cosineSimilarity, embedMany } from 'ai';

const { embeddings } = await embedMany({
  model: openai.embedding('text-embedding-3-small'),
  values: ['sunny day at the beach', 'rainy afternoon in the city'],
});

console.log(
  `cosine similarity: ${cosineSimilarity(embeddings[0], embeddings[1])}`,
);
```
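
To illustrate the ranking and filtering use mentioned above, here is a minimal sketch. It assumes the embeddings have already been computed. The `EmbeddedItem` and `RankedItem` types, the `rankBySimilarity` helper, the sample vectors, and the `minSimilarity` threshold are hypothetical names and values chosen for illustration, and `cosineSimilarity` is inlined as a local stand-in so that the snippet runs without the `ai` package.

```ts
// Local stand-in for the `cosineSimilarity` helper from 'ai', inlined so the
// sketch has no dependencies. In an application, import it from 'ai' instead.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let squaredA = 0;
  let squaredB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i]!;
    const y = b[i]!;
    dot += x * y;
    squaredA += x * x;
    squaredB += y * y;
  }
  return dot / (Math.sqrt(squaredA) * Math.sqrt(squaredB));
}

// Hypothetical shape: a value paired with its precomputed embedding.
interface EmbeddedItem {
  value: string;
  embedding: number[];
}

interface RankedItem {
  value: string;
  similarity: number;
}

// Scores every item against the query embedding, drops items below the
// threshold, and sorts the rest from most to least similar.
function rankBySimilarity(
  queryEmbedding: number[],
  items: EmbeddedItem[],
  minSimilarity: number,
): RankedItem[] {
  return items
    .map(item => ({
      value: item.value,
      similarity: cosineSimilarity(queryEmbedding, item.embedding),
    }))
    .filter(item => item.similarity >= minSimilarity)
    .sort((left, right) => right.similarity - left.similarity);
}

// Made-up 3-dimensional vectors standing in for real embeddings.
const sampleItems: EmbeddedItem[] = [
  { value: 'rainy afternoon in the city', embedding: [0.1, 0.9, 0.2] },
  { value: 'sunny day at the beach', embedding: [0.9, 0.1, 0.3] },
  { value: 'quarterly tax report', embedding: [-0.5, -0.2, 0.8] },
];

// The third item has a negative similarity to this query and is filtered out.
const ranked = rankBySimilarity([1, 0, 0.2], sampleItems, 0.1);
console.log(ranked.map(item => item.value));
```

Sorting in descending order puts the closest match first. A suitable threshold depends on the embedding model and the data, so the `0.1` used here is only a placeholder.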
53 changes: 53 additions & 0 deletions content/docs/07-reference/ai-sdk-core/50-cosine-similarity.mdx
@@ -0,0 +1,53 @@
---
title: cosineSimilarity
description: Calculate the cosine similarity between two vectors (API Reference)
---

# `cosineSimilarity()`

When you want to compare the similarity of embeddings, standard vector similarity metrics
like cosine similarity are often used.

`cosineSimilarity` calculates the cosine similarity between two vectors.
A value close to 1 indicates that the vectors are very similar, a value close to 0 indicates that they are largely unrelated, and a value close to -1 indicates that they point in opposite directions.

```ts
import { openai } from '@ai-sdk/openai';
import { cosineSimilarity, embedMany } from 'ai';

const { embeddings } = await embedMany({
  model: openai.embedding('text-embedding-3-small'),
  values: ['sunny day at the beach', 'rainy afternoon in the city'],
});

console.log(
  `cosine similarity: ${cosineSimilarity(embeddings[0], embeddings[1])}`,
);
```

## Import

<Snippet text={`import { cosineSimilarity } from "ai"`} prompt={false} />

## API Signature

### Parameters

<PropertiesTable
  content={[
    {
      name: 'vector1',
      type: 'number[]',
      description: `The first vector to compare`,
    },
    {
      name: 'vector2',
      type: 'number[]',
      description: `The second vector to compare`,
    },
  ]}
/>

### Returns

A number between -1 and 1 representing the cosine similarity between the two vectors.
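
As a rough illustration of that range, the sketch below evaluates three hand-picked vector pairs. The `cosine` function is a local stand-in that mirrors the formula (dot product divided by the product of the magnitudes) so that the snippet runs without the `ai` package. The function name and the vectors are made up for illustration.

```ts
// Local stand-in mirroring the cosine similarity formula:
// dot(a, b) / (|a| * |b|). In an application, use `cosineSimilarity` from 'ai'.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  const dot = (x: number[], y: number[]): number =>
    x.reduce((sum, value, i) => sum + value * y[i]!, 0);
  return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

// Same direction, different length: approximately 1.
const sameDirection = cosine([1, 2], [2, 4]);

// Perpendicular vectors: 0.
const orthogonal = cosine([1, 0], [0, 1]);

// Opposite directions: -1.
const opposite = cosine([1, 0], [-1, 0]);

console.log({ sameDirection, orthogonal, opposite });
```

Only the direction of the vectors matters, not their length, which is why `[1, 2]` and `[2, 4]` score as (approximately) identical.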
6 changes: 6 additions & 0 deletions content/docs/07-reference/ai-sdk-core/index.mdx
@@ -63,5 +63,11 @@ It also contains the following helper functions:
'Creates a registry for using models from multiple providers.',
href: '/docs/reference/ai-sdk-core/model-registry',
},
{
title: 'cosineSimilarity()',
description:
'Calculates the cosine similarity between two vectors, e.g. embeddings.',
href: '/docs/reference/ai-sdk-core/cosine-similarity',
},
]}
/>
20 changes: 0 additions & 20 deletions examples/ai-core/src/complex/semantic-router/cosine-similarity.ts

This file was deleted.

@@ -1,5 +1,10 @@
-import { Embedding, EmbeddingModel, embed, embedMany } from 'ai';
-import { cosineSimilarity } from './cosine-similarity';
+import {
+  Embedding,
+  EmbeddingModel,
+  embed,
+  embedMany,
+  cosineSimilarity,
+} from 'ai';

export interface Route<NAME extends string> {
name: NAME;
18 changes: 18 additions & 0 deletions examples/ai-core/src/embed-many/openai-cosine-similarity.ts
@@ -0,0 +1,18 @@
import { openai } from '@ai-sdk/openai';
import { cosineSimilarity, embedMany } from 'ai';
import dotenv from 'dotenv';

dotenv.config();

async function main() {
  const { embeddings } = await embedMany({
    model: openai.embedding('text-embedding-3-small'),
    values: ['sunny day at the beach', 'rainy afternoon in the city'],
  });

  console.log(
    `cosine similarity: ${cosineSimilarity(embeddings[0], embeddings[1])}`,
  );
}

main().catch(console.error);
3 changes: 2 additions & 1 deletion packages/core/core/index.ts
@@ -5,4 +5,5 @@ export * from './prompt';
export * from './registry';
export * from './tool';
export * from './types';
-export * from './util/deep-partial';
+export type { DeepPartial } from './util/deep-partial';
+export { cosineSimilarity } from './util/cosine-similarity';
28 changes: 28 additions & 0 deletions packages/core/core/util/cosine-similarity.test.ts
@@ -0,0 +1,28 @@
import { cosineSimilarity } from './cosine-similarity';

it('should calculate cosine similarity correctly', () => {
  const vector1 = [1, 2, 3];
  const vector2 = [4, 5, 6];

  const result = cosineSimilarity(vector1, vector2);

  // test against pre-calculated value:
  expect(result).toBeCloseTo(0.9746318461970762, 5);
});

it('should calculate negative cosine similarity correctly', () => {
  const vector1 = [1, 0];
  const vector2 = [-1, 0];

  const result = cosineSimilarity(vector1, vector2);

  // test against pre-calculated value:
  expect(result).toBeCloseTo(-1, 5);
});

it('should throw an error when vectors have different lengths', () => {
  const vector1 = [1, 2, 3];
  const vector2 = [4, 5];

  expect(() => cosineSimilarity(vector1, vector2)).toThrowError();
});
44 changes: 44 additions & 0 deletions packages/core/core/util/cosine-similarity.ts
@@ -0,0 +1,44 @@
/**
 * Calculates the cosine similarity between two vectors. This is a useful metric for
 * comparing the similarity of two vectors such as embeddings.
 *
 * @param vector1 - The first vector.
 * @param vector2 - The second vector.
 *
 * @returns The cosine similarity between vector1 and vector2.
 * @throws {Error} If the vectors do not have the same length.
 */
export function cosineSimilarity(vector1: number[], vector2: number[]) {
  if (vector1.length !== vector2.length) {
    throw new Error(
      `Vectors must have the same length (vector1: ${vector1.length} elements, vector2: ${vector2.length} elements)`,
    );
  }

  return (
    dotProduct(vector1, vector2) / (magnitude(vector1) * magnitude(vector2))
  );
}

/**
 * Calculates the dot product of two vectors.
 * @param vector1 - The first vector.
 * @param vector2 - The second vector.
 * @returns The dot product of vector1 and vector2.
 */
function dotProduct(vector1: number[], vector2: number[]) {
  return vector1.reduce(
    (accumulator: number, value: number, index: number) =>
      accumulator + value * vector2[index]!,
    0,
  );
}

/**
 * Calculates the magnitude of a vector.
 * @param vector - The vector.
 * @returns The magnitude of the vector.
 */
function magnitude(vector: number[]) {
  return Math.sqrt(dotProduct(vector, vector));
}
}
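
One edge case in the implementation above: if either vector has zero magnitude (for example, every element is 0), the denominator is 0 and the division evaluates to `NaN` rather than throwing. The following is a minimal sketch of a guarded variant. `safeCosineSimilarity` and `dotOf` are hypothetical names, the code is not part of this commit, and returning 0 for zero-magnitude input is an assumption made for illustration (throwing an error would be an equally reasonable choice).

```ts
// Hypothetical helper: dot product of two equal-length vectors.
function dotOf(a: number[], b: number[]): number {
  return a.reduce((sum, value, index) => sum + value * b[index]!, 0);
}

// Hypothetical guarded variant of cosine similarity.
function safeCosineSimilarity(vector1: number[], vector2: number[]): number {
  if (vector1.length !== vector2.length) {
    throw new Error(
      `Vectors must have the same length (vector1: ${vector1.length} elements, vector2: ${vector2.length} elements)`,
    );
  }

  const denominator =
    Math.sqrt(dotOf(vector1, vector1)) * Math.sqrt(dotOf(vector2, vector2));

  // Assumption: a zero-magnitude vector has no direction, so report 0
  // instead of letting 0 / 0 produce NaN.
  if (denominator === 0) {
    return 0;
  }

  return dotOf(vector1, vector2) / denominator;
}

// A zero vector no longer yields NaN.
const zeroCase = safeCosineSimilarity([0, 0, 0], [1, 2, 3]);

// Regular inputs follow the usual formula.
const regularCase = safeCosineSimilarity([1, 2, 3], [4, 5, 6]);

console.log({ zeroCase, regularCase });
```

Whether to return 0, return `NaN`, or throw depends on how callers consume the result, for instance whether scores are sorted or compared against a threshold.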
