Strip HTML & XML tags from RSS feed input #1670

@dagardner-nv (Contributor) commented Apr 29, 2024

Description

  • Optionally strip HTML & XML tags embedded in RSS feeds

Requires PR #1665 to be merged first
Closes #1666
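
For readers unfamiliar with the problem, here is a minimal sketch of what optional tag stripping could look like. It is not the code from this PR: the names `strip_markup` and `enabled` are hypothetical, and it uses only the standard library's `html.parser` rather than whatever parser the controller actually uses.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and discards the tags around them."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; into text.
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        self._chunks.append(data)

    def get_text(self):
        return "".join(self._chunks)


def strip_markup(text, enabled=True):
    """Return `text` without HTML/XML tags (hypothetical helper).

    When `enabled` is False the input is returned unchanged, which
    mirrors the "optionally" in the description above.
    """
    if not enabled or not text:
        return text
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()  # flush any text still buffered by the parser
    return parser.get_text()


if __name__ == "__main__":
    print(strip_markup("<p>Hello <b>world</b></p>"))
```

Note that this sketch keeps the text inside `<script>` and `<style>` elements, so a real pipeline would likely want a more complete parser.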

By Submitting this PR I confirm:

  • I am familiar with the Contributing Guidelines.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

dagardner-nv and others added 30 commits April 23, 2024 14:23
In insert_dataframe, perform pandas conversion as late as possible to perform
as many operations in cudf as possible, also only pay the cost of converting
the columns we need to pandas.
…g to pandas, and call truncate_string_cols_by_bytes after converting to pandas [no ci]
@dagardner-nv added the bug (Something isn't working), dependencies (Pull requests that update a dependency file), and non-breaking (Non-breaking change) labels Apr 29, 2024
@dagardner-nv self-assigned this Apr 29, 2024
@dagardner-nv requested review from a team as code owners April 29, 2024 17:50
@dagardner-nv marked this pull request as draft April 29, 2024 17:50
@dagardner-nv added the Merge After Dependencies (PR is completed and reviewed but depends on another PR; do not merge out of order) label Apr 29, 2024
@dagardner-nv marked this pull request as ready for review April 29, 2024 18:55
morpheus/controllers/rss_controller.py (review thread: outdated, resolved)
@dagardner-nv (Contributor, Author) commented

/merge

@rapids-bot (bot) merged commit 9d3de8a into nv-morpheus:branch-24.06 May 1, 2024
12 checks passed
@dagardner-nv deleted the david-vdb_upload-strip-tags-1666 branch May 1, 2024 22:09
Labels

  • bug: Something isn't working
  • dependencies: Pull requests that update a dependency file
  • Merge After Dependencies: PR is completed and reviewed but depends on another PR; do not merge out of order
  • non-breaking: Non-breaking change

Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

[BUG]: vdb_upload pipeline should be stripping html tags from content
2 participants