[BUG]: vdb_upload example pipeline error on inserting large strings #1650

Closed
2 tasks done
dagardner-nv opened this issue Apr 19, 2024 · 2 comments · Fixed by #1665

@dagardner-nv
Contributor

Version

24.03

Which installation method(s) does this occur on?

Source

Describe the bug.

The pipeline fails intermittently when inserting into Milvus, presumably depending on the content fetched via the RSS feeds.

Minimum reproducible example

python examples/llm/main.py vdb_upload pipeline --stop_after=1024

Relevant log output

Unable to insert into collection: VDBUploadExample due to <ParamError: (code=1, message=invalid input, length of string exceeds max length. length: 132886, max length: 65535)>
RPC error: [batch_insert], <ParamError: (code=1, message=invalid input, length of string exceeds max length. length: 132886, max length: 65535)>, <Time:{'RPC start': '2024-04-19 10:35:27.637451', 'RPC error': '2024-04-19 10:35:27.670419'}>

Other/Misc.

No response

Code of Conduct

  • I agree to follow Morpheus' Code of Conduct
  • I have searched the open bugs and have found no duplicates for this bug report
@dagardner-nv dagardner-nv added the bug Something isn't working label Apr 19, 2024
@dagardner-nv
Contributor Author

The issue is that Milvus has a max string length of 65535 bytes, and multi-byte characters count more than once toward that limit: in UTF-8, characters such as ñ consume two bytes.
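A minimal sketch of the character-count versus byte-count distinction described above. The helper name `utf8_len` and the sample strings are illustrative only and are not part of Morpheus or pymilvus; the sketch assumes the strings are encoded as UTF-8.

```python
def utf8_len(text: str) -> int:
    """Return the number of bytes the string occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


# Hypothetical sample strings: same character count, different byte counts.
samples = {
    "ascii": "n" * 4,       # 1 byte per character
    "two-byte": "ñ" * 4,    # 2 bytes per character
    "three-byte": "€" * 4,  # 3 bytes per character
}

for name, value in samples.items():
    print(f"{name}: {len(value)} chars, {utf8_len(value)} bytes")
```

A length check based on `len()` alone would treat all three samples as the same size, even though their encoded sizes differ by up to a factor of three.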

@dagardner-nv dagardner-nv changed the title [BUG]: vdb_upload error on inserting large strings [BUG]: vdb_upload pipeline error on inserting large strings Apr 22, 2024
@dagardner-nv dagardner-nv changed the title [BUG]: vdb_upload pipeline error on inserting large strings [BUG]: vdb_upload example pipeline error on inserting large strings Apr 22, 2024
@dagardner-nv
Contributor Author

This appears to be partly a bug on the Milvus side as well. If I create a string of multi-byte characters that is two characters longer than the max of 65535, I receive this exception, which reports the character length of 65537:

RPC error: [batch_insert], <ParamError: (code=1, message=invalid input, length of string exceeds max length. length: 65537, max length: 65535)>, <Time:{'RPC start': '2024-04-23 08:46:27.520470', 'RPC error': '2024-04-23 08:46:27.520651'}>

If I then truncate the data by character count and retry, I receive a new exception, this time reporting the byte length of 196605 (3 × 65535, which is consistent with three bytes per character):

df['content'] = df['content'].str.slice(0, MAX_STRING_LENGTH)
RPC error: [batch_insert], <MilvusException: (code=1100, message=the length (196605) of 0th string exceeds max length (65535): invalid parameter[expected=valid length string][actual=string length exceeds max length])>, <Time:{'RPC start': '2024-04-23 08:48:08.730043', 'RPC error': '2024-04-23 08:48:08.733671'}>
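The two errors above can be reproduced arithmetically without Milvus. The sketch below assumes a three-byte character (`€`); the issue does not say which character was actually used. The helper `truncate_utf8` is hypothetical and is not the fix that was merged. It contrasts slicing by characters with truncating by encoded bytes.

```python
MAX_STRING_LENGTH = 65535  # limit reported in the error messages above


def truncate_utf8(text: str, max_bytes: int = MAX_STRING_LENGTH) -> str:
    """Truncate text so that its UTF-8 encoding fits within max_bytes."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # errors="ignore" drops a multi-byte character that was cut in half
    # at the truncation point, so the result is always valid text.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


# Assumed test data: two characters longer than the limit, 3 bytes each.
content = "€" * (MAX_STRING_LENGTH + 2)

by_chars = content[:MAX_STRING_LENGTH]  # same idea as .str.slice(0, MAX_STRING_LENGTH)
by_bytes = truncate_utf8(content)

print(len(content))                   # 65537 characters
print(len(by_chars.encode("utf-8")))  # 196605 bytes, still over the limit
print(len(by_bytes.encode("utf-8")))  # 65535 bytes, within the limit
```

Slicing by characters leaves the string at 65535 characters but 196605 bytes, matching the second exception. Truncating on the encoded bytes keeps the value within the limit at the cost of keeping fewer characters.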

@dagardner-nv dagardner-nv self-assigned this Apr 25, 2024
@rapids-bot rapids-bot bot closed this as completed in #1665 May 1, 2024
rapids-bot bot pushed a commit that referenced this issue May 1, 2024
* Adds new helper methods to `morpheus.io.utils`, `cudf_string_cols_exceed_max_bytes` and `truncate_string_cols_by_bytes`
* When `truncate_long_strings=True` `MilvusVectorDBResourceService` will truncate all `VARCHAR` fields according to the schema's `max_length`
* Add `truncate_long_strings=True` in config for `vdb_upload` pipeline
* Set C++ mode to default for example LLM pipelines
* Remove issues 1650 & 1651 from `known_issues.md`
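
The schema-driven truncation described in the list above could look roughly like the following. This is a plain-Python sketch under stated assumptions, not the Morpheus implementation: the function names, the dict-of-lists column layout, and the example limits are all hypothetical stand-ins for the real helpers in `morpheus.io.utils`, whose signatures are not shown here.

```python
from typing import Dict, List


def truncate_utf8(text: str, max_bytes: int) -> str:
    """Truncate text so that its UTF-8 encoding fits within max_bytes."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Drop any character split by the byte cut instead of raising an error.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


def cols_exceeding_max_bytes(columns: Dict[str, List[str]],
                             max_bytes: Dict[str, int]) -> Dict[str, bool]:
    """Report, per column, whether any value is longer than its byte limit."""
    return {
        name: any(len(value.encode("utf-8")) > limit for value in columns[name])
        for name, limit in max_bytes.items()
    }


def truncate_cols_by_bytes(columns: Dict[str, List[str]],
                           max_bytes: Dict[str, int]) -> Dict[str, List[str]]:
    """Return a copy of columns with each limited column truncated by bytes."""
    result = dict(columns)
    for name, limit in max_bytes.items():
        result[name] = [truncate_utf8(value, limit) for value in columns[name]]
    return result


# Hypothetical data and per-field limits (stand-ins for a schema's max_length).
columns = {"title": ["short", "también corto"], "content": ["ñ" * 10, "ok"]}
limits = {"title": 32, "content": 15}

exceeds = cols_exceeding_max_bytes(columns, limits)
truncated = truncate_cols_by_bytes(columns, limits)
print(exceeds)
print(truncated["content"])
```

Checking first and truncating only the columns that exceed their limit avoids rewriting data that already fits.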

Closes #1650 
Closes #1651

## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/nv-morpheus/Morpheus/blob/main/docs/source/developer_guide/contributing.md).
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

Authors:
  - David Gardner (https://github.com/dagardner-nv)

Approvers:
  - Michael Demoret (https://github.com/mdemoret-nv)

URL: #1665