feat: new foundations section (#1948)
iteratetograceness committed Jun 14, 2024
1 parent a5c2845 commit 1af45ce
Showing 9 changed files with 242 additions and 102 deletions.
44 changes: 44 additions & 0 deletions content/docs/01.5-fundamentals/01-overview.mdx
@@ -0,0 +1,44 @@
---
title: Overview
description: An overview of foundational concepts critical to understanding the Vercel AI SDK
---

# Overview

<Note>
This page is a beginner-friendly introduction to high-level artificial
intelligence (AI) concepts. To dive right into implementing the Vercel AI SDK,
feel free to skip ahead to our [quickstarts](/docs/getting-started) or learn
about our [supported models and
providers](/docs/foundations/providers-and-models).
</Note>

The Vercel AI SDK standardizes integrating artificial intelligence (AI) models across [supported providers](/docs/foundations/providers-and-models). This lets developers focus on building great AI applications instead of wasting time on technical details.

For example, here’s how you can generate text with various models using the Vercel AI SDK:

<PreviewSwitchProviders />

To effectively leverage the AI SDK, it helps to familiarize yourself with the following concepts:

## Generative Artificial Intelligence

**Generative artificial intelligence** refers to models that predict and generate various types of outputs (such as text, images, or audio) based on what’s statistically likely, pulling from patterns they’ve learned from their training data. For example:

- Given a photo, a generative model can generate a caption.
- Given an audio file, a generative model can generate a transcription.
- Given a text description, a generative model can generate an image.

## Large Language Models

A **large language model (LLM)** is a subset of generative models focused primarily on **text**. An LLM takes a sequence of words as input and aims to predict the most likely sequence to follow. It assigns probabilities to potential next sequences and then selects one. The model continues to generate sequences until it meets a specified stopping criterion.
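To make that loop concrete, here is a toy sketch (not the AI SDK, and not how real models are implemented) of the predict-select-append cycle described above:

```ts
// Toy illustration only: the "model" here is just a function that scores possible next tokens.
type NextTokenScorer = (context: string[]) => Map<string, number>;

function generate(model: NextTokenScorer, prompt: string[], maxTokens = 32): string[] {
  const tokens = [...prompt];
  for (let i = 0; i < maxTokens; i++) {
    const scores = model(tokens);
    // Greedy selection: pick the highest-probability token (real models often sample instead).
    const [next] = [...scores.entries()].sort((a, b) => b[1] - a[1])[0];
    if (next === '<stop>') break; // stopping criterion
    tokens.push(next);
  }
  return tokens;
}
```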

LLMs learn by training on massive collections of written text, which means they will be better suited to some use cases than others. For example, a model trained on GitHub data would understand the probabilities of sequences in source code particularly well.

However, it's crucial to understand LLMs' limitations. When asked about lesser-known or absent information, such as the birthday of a personal relative, LLMs might "hallucinate" or make up information. It's essential to consider how well represented the information you need is in the model's training data.

## Embedding Models

An **embedding model** is used to convert complex data (like words or images) into a dense vector representation (a list of numbers), known as an embedding. Unlike generative models, embedding models do not generate new text or data. Instead, they provide representations of the semantic and syntactic relationships between entities that can be used as input for other models or other natural language processing tasks.
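For example, here is a minimal sketch of generating an embedding with the AI SDK's `embed` function (the OpenAI `text-embedding-3-small` model is just an illustrative choice):

```ts
import { openai } from '@ai-sdk/openai';
import { embed } from 'ai';

// Turn a piece of text into a dense vector (an array of numbers).
const { embedding } = await embed({
  model: openai.embedding('text-embedding-3-small'),
  value: 'sunny day at the beach',
});

console.log(embedding.length); // the dimensionality of the vector
```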

In the next section, you will learn about the difference between model providers and models, and which ones are available in the Vercel AI SDK.
@@ -5,22 +5,21 @@ description: Learn about the Prompt structure used in the Vercel AI SDK.

# Prompts

Prompts are instructions that you give a large language model (LLM) to tell it what to do.
Prompts are instructions that you give a [large language model (LLM)](/docs/foundations/overview#large-language-models) to tell it what to do.
It's like when you ask someone for directions; the clearer your question, the better the directions you'll get.

Many LLM providers offer complex interfaces for specifying prompts. They involve different roles and message types.
While these interfaces are powerful, they can be hard to use and understand.

The Vercel AI SDK simplifies prompting across compatible providers to text prompts or more elaborate message prompts.
Both prompt types support system messages, and message prompts can be multi-modal.
In order to simplify prompting across compatible providers, the Vercel AI SDK offers two categories of prompts: text prompts and message prompts.

## Text Prompts

Text prompts are strings.
They are ideal for simple generation use cases,
e.g. repeatedly generating content for variants of the same prompt text.

You can set text prompts using the `prompt` property.
You can set text prompts using the `prompt` property made available by AI SDK functions like [`generateText`](/docs/reference/ai-sdk-core/generate-text) or [`streamUI`](/docs/reference/ai-sdk-rsc/stream-ui).
You can structure the text in any way and inject variables, e.g. using a template literal.

```ts highlight="3"
@@ -49,7 +48,7 @@ They are great for chat interfaces and more complex, multi-modal prompts.
Each message has a `role` and a `content` property. The content can either be text (for user and assistant messages), or an array of relevant parts (data) for that message type.

```ts highlight="3-7"
const result = await generateText({
const result = await streamUI({
model: yourModel,
messages: [
{ role: 'user', content: 'Hi!' },
@@ -59,7 +58,7 @@ const result = await generateText({
});
```

<Note>
<Note type="warning">
Not all language models support all message and content types. For example,
some models might not be capable of handling multi-modal inputs or tool
messages. [Learn more about the capabilities of select
@@ -78,6 +77,20 @@ Currently image and text parts are supported.

For models that support multi-modal inputs, user messages can include images. An `image` can be one of the following:

- base64-encoded image:
- `string` with base-64 encoded content
- data URL `string`, e.g. `data:image/png;base64,...`
- binary image:
- `ArrayBuffer`
- `Uint8Array`
- `Buffer`
- URL:
- http(s) URL `string`, e.g. `https://example.com/image.png`
- `URL` object, e.g. `new URL('https://example.com/image.png')`

It is possible to mix text and multiple images.
For models that support multi-modal inputs, user messages can include images. An `image` can be one of the following:

- base64-encoded image:
- `string` with base-64 encoded content
- data URL `string`, e.g. `data:image/png;base64,...`
@@ -160,15 +173,16 @@ const result = await generateText({
### Tool messages

<Note>
Tools (also known as function calling) are programs that you can provide an
LLM to extend its built-in functionality. This can be anything from calling
an external API to calling functions within your UI.
[Tools](/docs/foundations/tools) (also known as function calling) are programs
that you can provide an LLM to extend its built-in functionality. This can be
anything from calling an external API to calling functions within your UI.
Learn more about Tools in [the next section](/docs/foundations/tools).
</Note>

For models that support [tool](/docs/ai-sdk-core/tools-and-tool-calling) calls, assistant messages can contain tool call parts, and tool messages can contain tool result parts.
For models that support [tool](/docs/foundations/tools-and-tool-calling) calls, assistant messages can contain tool call parts, and tool messages can contain tool result parts.
A single assistant message can call multiple tools, and a single tool message can contain multiple tool results.

```ts highlight="3-43"
```ts highlight="14-42"
const result = await generateText({
model: yourModel,
messages: [
@@ -217,7 +231,7 @@ const result = await generateText({

## System Messages

System messages are like character instructions that you would give an actor.
System messages are the initial set of instructions given to models that help guide and constrain the models' behaviors and responses.
You can set system prompts using the `system` property.
System messages work with both the `prompt` and the `messages` properties.
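As a brief sketch (using the same `yourModel` placeholder as the other examples on this page), a system message might be combined with a text prompt like this:

```ts
import { generateText } from 'ai';

const result = await generateText({
  model: yourModel, // placeholder model, as in the examples above
  system:
    'You help planning travel itineraries. ' +
    "Respond to the user's request with a concise list of suggestions.",
  prompt: 'Plan a weekend trip to Kyoto.',
});
```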

103 changes: 103 additions & 0 deletions content/docs/01.5-fundamentals/04-tools.mdx
@@ -0,0 +1,103 @@
---
title: Tools
description: Learn about tools with the Vercel AI SDK.
---

# Tools

While [large language models (LLMs)](/docs/foundations/overview#large-language-models) have incredible generation capabilities,
they struggle with discrete tasks (e.g. mathematics) and interacting with the outside world (e.g. getting the weather).

Tools can be thought of as programs you give to a model that it can run whenever it deems them applicable.

## What is a tool?

A tool is an object that can be called by the model to perform a specific task.
You can use tools with functions across the AI SDK (like [`generateText`](/docs/reference/ai-sdk-core/generate-text) or [`streamUI`](/docs/reference/ai-sdk-rsc/stream-ui)) by passing one or more tools to the `tools` parameter.

A tool has three elements: a description, parameters, and an optional `execute` or `generate` function (depending on the SDK function); a minimal sketch follows the list below.

- **`description`**: An optional description of the tool that can influence when the tool is picked.
- **`parameters`**: A [Zod schema](/docs/foundations/tools#schema-specification-and-validation-with-zod) that defines the parameters. It is converted to a JSON schema that is consumed by the LLM, and also used to validate the LLM tool calls.
- **`execute`** or **`generate`**: An optional async or generator function that is called with the arguments from the tool call.
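
For illustration, a minimal standalone tool definition using the `tool` helper might look like this sketch (the weather value returned here is a made-up placeholder):

```ts
import { z } from 'zod';
import { tool } from 'ai';

// A standalone tool: a description, Zod parameters, and an execute function.
const weatherTool = tool({
  description: 'Get the weather in a location',
  parameters: z.object({
    location: z.string().describe('The location to get the weather for'),
  }),
  // The returned value is a made-up placeholder for illustration.
  execute: async ({ location }) => ({ location, temperature: 72 }),
});
```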

## Tool Calls

If the LLM decides to use a tool, it will generate a tool call.
Tools with an `execute` or `generate` function are run automatically when these calls are generated.
The results of the tool calls are returned using tool result objects.
Each tool result object has a `toolCallId`, a `toolName`, a typed `args` object, and a typed `result`.
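
Continuing the `weatherTool` sketch from the previous section, a rough example of running it with [`generateText`](/docs/reference/ai-sdk-core/generate-text) and inspecting the typed tool results could look like this:

```ts
import { generateText } from 'ai';

const { toolResults } = await generateText({
  model: yourModel, // placeholder model, as in the other examples
  tools: { weather: weatherTool }, // the tool sketched above
  prompt: 'What is the weather in San Francisco?',
});

for (const toolResult of toolResults) {
  console.log(toolResult.toolCallId); // links the result back to its tool call
  console.log(toolResult.toolName); // 'weather'
  console.log(toolResult.args); // typed arguments the model generated
  console.log(toolResult.result); // typed value returned by execute
}
```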

## Tool Choice

You can use the `toolChoice` setting to influence when a tool is selected.
It supports the following settings:

- `auto` (default): the model can choose whether and which tools to call.
- `required`: the model must call a tool. It can choose which tool to call.
- `none`: the model must not call tools.
- `{ type: 'tool', toolName: string (typed) }`: the model must call the specified tool.

```ts highlight="18"
import { z } from 'zod';
import { generateText, tool } from 'ai';

const result = await generateText({
model: yourModel,
tools: {
weather: tool({
description: 'Get the weather in a location',
parameters: z.object({
location: z.string().describe('The location to get the weather for'),
}),
execute: async ({ location }) => ({
location,
temperature: 72 + Math.floor(Math.random() * 21) - 10,
}),
}),
},
toolChoice: 'required', // force the model to call a tool
prompt:
'What is the weather in San Francisco and what attractions should I visit?',
});
```

## Schema Specification and Validation with Zod

Tool usage and structured object generation require the specification of schemas.
The AI SDK uses [Zod](https://zod.dev/), the most popular JavaScript schema validation library, for schema specification and validation.

You can install Zod with:

<Tabs items={['pnpm', 'npm', 'yarn']}>
<Tab>
<Snippet text="pnpm install zod" dark />
</Tab>
<Tab>
<Snippet text="npm install zod" dark />
</Tab>
<Tab>
<Snippet text="yarn add zod" dark />
</Tab>
</Tabs>

You can then specify schemas, for example:

```ts
import z from 'zod';

const recipeSchema = z.object({
recipe: z.object({
name: z.string(),
ingredients: z.array(
z.object({
name: z.string(),
amount: z.string(),
}),
),
steps: z.array(z.string()),
}),
});
```

These schemas can be used to define parameters for tool calls and to generate structured objects with [`generateObject`](/docs/reference/ai-sdk-core/generate-object) and [`streamObject`](/docs/reference/ai-sdk-core/stream-object).
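
For instance, a rough sketch of reusing `recipeSchema` from above with `generateObject` (with `yourModel` as a placeholder) could look like this:

```ts
import { generateObject } from 'ai';

const { object } = await generateObject({
  model: yourModel, // placeholder model
  schema: recipeSchema,
  prompt: 'Generate a recipe for a vegetarian lasagna.',
});

// The result is typed against the schema.
console.log(object.recipe.name);
console.log(object.recipe.ingredients.length);
```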
@@ -1,13 +1,13 @@
---
title: Why Streaming?
title: Streaming
description: Why use streaming for AI applications?
---

# Streaming

Streaming conversational text UIs (like ChatGPT) have gained massive popularity over the past few months. This section explores the benefits and drawbacks of streaming and blocking interfaces.

Large Language Models (LLMs) are extremely powerful. However, when generating long outputs, they can be very slow compared to the latency you're likely used to. If you try to build a traditional blocking UI, your users might easily find themselves staring at loading spinners for 5, 10, even up to 40s waiting for the entire LLM response to be generated. This can lead to a poor user experience, especially in conversational applications like chatbots. Streaming UIs can help mitigate this issue by **displaying parts of the response as they become available**.
[Large language models (LLMs)](/docs/foundations/overview#large-language-models) are extremely powerful. However, when generating long outputs, they can be very slow compared to the latency you're likely used to. If you try to build a traditional blocking UI, your users might easily find themselves staring at loading spinners for 5, 10, even up to 40s waiting for the entire LLM response to be generated. This can lead to a poor user experience, especially in conversational applications like chatbots. Streaming UIs can help mitigate this issue by **displaying parts of the response as they become available**.

<div className="grid lg:grid-cols-2 grid-cols-1 gap-4 mt-8">
<Card
@@ -43,6 +43,20 @@ As you can see, the streaming UI is able to start displaying the response much f

While streaming interfaces can greatly enhance user experiences, especially with larger language models, they aren't always necessary or beneficial. If you can achieve your desired functionality using a smaller, faster model without resorting to streaming, this route can often lead to simpler and more manageable development processes.

However, regardless of the speed of your model, the Vercel AI SDK is designed to make implementing streaming UIs as simple as possible.
However, regardless of the speed of your model, the Vercel AI SDK is designed to make implementing streaming UIs as simple as possible. In the example below, we stream text generation from OpenAI's `gpt-4-turbo` in under 10 lines of code using the SDK's [`streamText`](/docs/reference/ai-sdk-core/stream-text) function:

```ts
import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';

const { textStream } = await streamText({
model: openai('gpt-4-turbo'),
prompt: 'Write a poem about embedding models.',
});

for await (const textPart of textStream) {
console.log(textPart);
}
```

For an introduction to streaming UIs and the AI SDK, check out our [Getting Started guides](/docs/getting-started).
38 changes: 38 additions & 0 deletions content/docs/01.5-fundamentals/index.mdx
@@ -0,0 +1,38 @@
---
title: Foundations
description: A section that covers foundational knowledge around LLMs and concepts crucial to the Vercel AI SDK
---

# Foundations

<IndexCards
cards={[
{
title: 'Overview',
description: 'Learn about foundational concepts around AI and LLMs.',
href: '/docs/foundations/overview',
},
{
title: 'Providers and Models',
description:
'Learn about the providers and models that you can use with the Vercel AI SDK.',
href: '/docs/foundations/providers-and-models',
},
{
title: 'Prompts',
description:
'Learn about how Prompts are used and defined in the Vercel AI SDK.',
href: '/docs/foundations/prompts',
},
{
title: 'Tools',
description: 'Learn about tools in the Vercel AI SDK.',
href: '/docs/foundations/tools',
},
{
title: 'Streaming',
description: 'Learn why streaming is used for AI applications.',
      href: '/docs/foundations/streaming',
},
]}
/>
34 changes: 11 additions & 23 deletions content/docs/03-ai-sdk-core/15-tools-and-tool-calling.mdx
@@ -1,24 +1,17 @@
---
title: Tools and Tool Calling
description: Learn how to use tools and tool calling with the Vercel AI SDK.
title: Tool Calling
description: Learn about tool calling with Vercel AI SDK Core.
---

# Tools and Tool Calling
# Tool Calling

While large language models have incredible generation capabilities,
they struggle with discrete tasks (e.g. mathematics) and interacting with the outside world (e.g. getting the weather).
Tools can be thought of as programs you give to a model which can be run as and when the model deems applicable.
As covered under Foundations, [tools](/docs/foundations/tools) are objects that can be called by the model to perform a specific task.

## Tools

A tool is an object that can be called by the model to perform a specific task.
You can use tools with the `generateText` or `streamText` functions by passing one or more tools to the `tools` parameter.

There are three elements of a tool: a description, parameters, and an optional execute function.
When used with AI SDK Core, tools contain three elements:

- **`description`**: An optional description of the tool that can influence when the tool is picked.
- **`parameters`**: A [Zod](https://zod.dev/) schema that defines the parameters. It is converted to a JSON schema that is consumed by the LLM, and also used to validate the LLM tool calls.
- **`execute`**: An optional async function that is called with the arguments from the tool call and produces a value of type `RESULT` (generic type). It is optional because you might want to forward tool calls to the client or to a queue instead of executing them in the same process.
- **`parameters`**: A [Zod schema](/docs/foundations/tools#schema-specification-and-validation-with-zod) that defines the parameters. It is converted to a JSON schema that is consumed by the LLM, and also used to validate the LLM tool calls.
- **`execute`**: An optional async function that is called with the arguments from the tool call. It produces a value of type `RESULT` (generic type). It is optional because you might want to forward tool calls to the client or to a queue instead of executing them in the same process.

The `tools` parameter of `generateText` and `streamText` is an object that has the tool names as keys and the tools as values:

@@ -45,23 +38,18 @@ const result = await generateText({
});
```

<Note>
You can use the `tool` helper function to infer the types of the `execute`
parameters.
<Note className="mb-2">
You can use the [`tool`](/docs/reference/ai-sdk-core/tool) helper function to
infer the types of the `execute` parameters.
</Note>

If the LLM decides to use a tool, it will generate a tool call.
Tools with an `execute` function are run automatically when these calls are generated.
The results of the tool executions are returned using tool result objects.
Each tool result object has a `toolCallId`, a `toolName`, a typed `args` object, and a typed `result`.

<Note>
When a model uses a tool, it is called a "tool call" and the output of the
tool is called a "tool result".
</Note>

Tool calling is not restricted to only text generation.
You can also use it to [render user interfaces with Generative AI](/docs/ai-sdk-rsc/overview#rendering-user-interfaces-with-language-models).
You can also use it to [render user interfaces with Generative AI](TODO UPDATE AFTER NICO PR MERGES).

## Tool Choice
