Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add "strict" interface to replicate.run() #288

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from
Draft

Conversation

aron
Copy link
Contributor

@aron aron commented Jul 8, 2024

This PR implements a new strict option that can be passed to run() to coerce the output into the same data types as defined by the model schema.

const stream = replicate.run("my-org/my-llm", {strict: true, input: { ... }});
for await (const chunk in stream) {
  ...
}

const files = replicate.run("my-org/my-gen-img", {strict: true, input: { ... }});
for await (const image in files) {
  const img = Object.createURLObject(await file.blob());
}

const file = replicate.run("my-org/my-model", {strict: true, input: { ... }});
const data = await file.arrayBuffer();

The main two benefits here are that:

  1. Iterators get converted into actual async iterators instead of just arrays which means the stream() method can be deprecated in favor of a single interface. This opens up opportunities for streaming other types of output too.
  2. Path & File outputs are backed by a Blob like representation that can be used to download the data opening up opportunities to stream files directly in the client. Note, that we don't use Blob, File or ArrayBuffer directly because it's likely the client will want the opportunity to decide whether to actually download the file or not. Similar to how body on Response is not consumed until an async read method is called. Feedback on this appreciated.

Note

The intention here is that once this code is merged and validated we migrate the SDKs to use this as the default interface to run() (a breaking change). The strict flag is a way of soft launching the feature. The name strict was chosen as in JavaScript land "use strict" and TypeScript --strict mode are existing functionality for tightening up interfaces.

The need to download the schema for the version is currently unfortunate, I'm also looking at extending the Prediction interface to include input_schema and output_schema to reduce this additional request.

Implementation

The bulk of the code exists in the coerceOutput() function, I think it implements the bulk of the Cog Output interface, but I may have missed some edge cases.

TODO

  • Currently there is no support for enum types.
  • Currently there is no support for date strings.

Comment on lines +442 to +458
#blob = null;
#source = null;
#name = null;

constructor(source, options) {
if (source instanceof URL) {
this.#source = source;
} else {
this.#blob = new Blob(source, options);
}

this.#name = options?.name;
}

toString() {
return `[LazyFile ${this.#source ?? this.#blob.toString()}]`;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far, we've been conservative about JS language features like class fields, optional chaining, and nullish coalescing. These aren't new by any measure, but I'm still surprised to see them causing problems in the wild. To the extent that we can do the same without syntactic sugar (e.g. || instead of ??, && instead of ?.), I think that'd be preferable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to tighten up how we talk about features by specifying platforms that we support and versions. We've made great progress here with the integration suite.

Currently all newer JS runtimes support these language features, Deno, Bun, CloudFlare etc, and Node has supported it since v14 released in 2020, we currently only work in 18+. All browsers since 2020 support these features (and our library doesn't work in the browser).

I'm all for being conservative, but I do also think we need to be pragmatic.

progress(updatedPrediction);
}
if (strict && !identifier.version) {
// Language models only support streaming at the moment.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarify potential ambiguity in use of "only" here

Suggested change
// Language models only support streaming at the moment.
// Currently, only language models support streaming

@@ -154,6 +153,7 @@ class Replicate {
prediction = await this.predictions.create({
...data,
model: `${identifier.owner}/${identifier.name}`,
stream: true,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, this will cause all models that don't support streaming to respond with {"detail":"Streaming not supported for the output type of the requested version.","status":422}.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've just fixed that, consumers no longer need to specify the stream: true flag. All streamable models will get a stream URL 🙌

Comment on lines +209 to +210
const response = await this.models.versions.get(identifier.owner, identifier.name, identifier.version);
const { openapi_schema: schema } = response;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think fetching the schema in a separate request is a deal-breaker here. That's something the Python client has been doing from the start, and it's added a lot of unnecessary overhead.

It's not as magical, but could we instead implement coerced run on a model / version? That way, you'd avoid duplicate requests and have a clearer expectations around how/when schemas are validated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not as magical, but could we instead implement coerced run on a model / version? That way, you'd avoid duplicate requests and have a clearer expectations around how/when schemas are validated.

I'm not sure what you mean here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants