
questions on return-generation-logits when streaming #2090

Open
binhtranmcs opened this issue Aug 6, 2024 · 2 comments
Labels: feature request

binhtranmcs commented Aug 6, 2024

I see that when running streaming inference, the result contains generationLogits for the full sequence. That means a tensor of shape batch_size × beam_size × max_output_length × vocab_size is returned every time the executor produces a token, which is very inefficient. Is this the expected behavior? I believe returning only the logits for the current token would be more efficient.
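
To make the overhead concrete, here is a minimal numpy sketch of the payload sizes involved (the batch/beam sizes, output length, vocab size, and dtype are illustrative assumptions, not values taken from TensorRT-LLM):

```python
import numpy as np

# Illustrative shapes; real values depend on the model and the request.
batch_size, beam_size, max_output_len, vocab_size = 1, 1, 512, 32000

# What the executor streams back on every token today (per this issue):
# logits for the entire output sequence.
full_logits = np.zeros(
    (batch_size, beam_size, max_output_len, vocab_size), dtype=np.float32
)

# What would suffice for a single streamed token: one slice along the
# output-length axis.
step_logits = np.zeros((batch_size, beam_size, 1, vocab_size), dtype=np.float32)

print(f"per-response payload today:    {full_logits.nbytes / 2**20:.1f} MiB")  # 62.5 MiB
print(f"per-token payload (proposed):  {step_logits.nbytes / 2**20:.2f} MiB")  # 0.12 MiB
```

Over a 512-token streamed generation, that is roughly 512 × 62.5 MiB ≈ 31 GiB transferred cumulatively, versus about 62.5 MiB in total if each response carried only the current token's logits.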

Please have a look. Thanks in advance!


github-actions bot commented Sep 6, 2024

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.

github-actions bot added the stale label Sep 6, 2024
lfr-0531 added the feature request label Sep 7, 2024
github-actions bot removed the stale label Sep 8, 2024
AdamzNV (Collaborator) commented Sep 9, 2024

It's under consideration now.
