Eliminate Redundant Fetches in RSS Controller #1442

@bsuryadevara bsuryadevara commented Dec 18, 2023

Description

Eliminates redundant feed fetches in the RSS Controller that occurred when a feed was parsed manually with BeautifulSoup while the cache was enabled.

Closes #1419

By Submitting this PR I confirm:

  • I am familiar with the Contributing Guidelines.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

@bsuryadevara bsuryadevara added the bug (Something isn't working), non-breaking (Non-breaking change), and sherlock (Issues/PRs related to Sherlock workflows and components) labels Dec 18, 2023
@bsuryadevara bsuryadevara self-assigned this Dec 18, 2023

copy-pr-bot bot commented Dec 18, 2023

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@bsuryadevara bsuryadevara marked this pull request as ready for review December 18, 2023 05:36
@bsuryadevara bsuryadevara requested a review from a team as a code owner December 18, 2023 05:36
@bsuryadevara bsuryadevara changed the title from "Removed RSS Controller Redundant Fetches" to "Eliminate Redundant Fetches in RSS Controller" Dec 18, 2023
@mdemoret-nv

/ok to test

@mdemoret-nv mdemoret-nv left a comment

Looking over the changes, I'm getting the feeling that the RSS Controller is getting a bit confusing. Let's refactor the class so the steps are clear and we don't need to check so often whether we are using a cache or whether the input is a URL.

For each item in the input list:

  1. Call a function to turn the input item into text
    1. If this is a file, read the file from disk into a string
    2. If this is a URL, use the session to download the text from the URL
      1. It should not be required at this stage to check whether or not the cache is enabled
  2. With the returned text, try to parse it with feedparser
  3. If feedparser fails, try to parse with BeautifulSoup

Those should be the only steps necessary to process the feed, and they should clean up the code a bit.
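The flow outlined above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code merged in this PR: `is_url`, `load_text`, `parse_feed`, and `process_feeds` are hypothetical names, and the downloader and both parsers are passed in as plain callables so the sketch stays independent of any particular session or parsing library. The sketch also assumes the primary parser raises an exception on failure; feedparser normally reports problems through its `bozo` flag instead, so a real primary callable would likely need a thin wrapper that checks that flag.

```python
from pathlib import Path
from typing import Any, Callable, Iterable, Iterator
from urllib.parse import urlparse


def is_url(item: str) -> bool:
    """Treat only http(s) strings with a host as URLs; everything else is a file path."""
    parsed = urlparse(item)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def load_text(item: str, download: Callable[[str], str]) -> str:
    """Step 1: turn an input item into text.

    No cache check happens here. If caching is wanted, it is assumed to live
    inside the `download` callable (for example, a cached HTTP session).
    """
    if is_url(item):
        return download(item)
    return Path(item).read_text(encoding="utf-8")


def parse_feed(text: str,
               primary: Callable[[str], Any],
               fallback: Callable[[str], Any]) -> Any:
    """Steps 2 and 3: try the primary parser, fall back if it raises."""
    try:
        return primary(text)
    except Exception:
        # Both parsers receive the same already-fetched text,
        # so falling back never triggers a second download.
        return fallback(text)


def process_feeds(items: Iterable[str],
                  download: Callable[[str], str],
                  primary: Callable[[str], Any],
                  fallback: Callable[[str], Any]) -> Iterator[Any]:
    """Fetch each item exactly once, then parse the resulting text."""
    for item in items:
        text = load_text(item, download)
        yield parse_feed(text, primary, fallback)
```

Because the text is fetched once and handed to both parsers, the fallback path cannot cause a redundant fetch, and neither parser needs to know whether the input was a file, a URL, or a cached response.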

Review thread on morpheus/controllers/rss_controller.py (outdated, resolved)
@bsuryadevara

/ok to test

@bsuryadevara

/ok to test

@bsuryadevara

/ok to test

@mdemoret-nv

/ok to test

@mdemoret-nv mdemoret-nv left a comment

Thanks for adding the test

@mdemoret-nv

/merge

@rapids-bot rapids-bot bot merged commit 659e735 into nv-morpheus:branch-24.03 Jan 22, 2024
9 checks passed

Successfully merging this pull request may close these issues.

[BUG]: RSSController performs redundant fetches
2 participants