Web Sources's "website link mode" does not scrape recursively the entire web site #655

Open
marcofiocco opened this issue Aug 5, 2024 · 1 comment
Labels
wontfix This will not be worked on

Comments

@marcofiocco

It would be a very useful feature.

If you cannot implement it, I can implement my own web scraper, but what would be the best way to load all the scraped webpages?
Even Browse mode does not allow selecting whole folders, only multiple individual files.
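
For reference, a minimal sketch of what a recursive "whole site" scrape could look like outside the app, using only the Python standard library. It is not how the project's website link mode works; the names `crawl_site`, `fetch_html`, and `LinkParser`, and the `max_pages` cap, are assumptions made up for this illustration. The crawl stays on the starting host and ignores `#fragment` differences between URLs.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url: str) -> str:
    """Default fetcher (hypothetical helper): download a page as text."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl_site(
    start_url: str,
    fetch: Callable[[str], str] = fetch_html,
    max_pages: int = 50,
) -> Dict[str, str]:
    """Breadth-first crawl restricted to the host of start_url.

    Returns a dict mapping each visited URL to its raw HTML.
    """
    host = urlparse(start_url).netloc
    start = urldefrag(start_url).url
    queue = deque([start])
    seen = {start}
    pages: Dict[str, str] = {}

    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to download

        pages[url] = html
        parser = LinkParser()
        parser.feed(html)

        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)

    return pages
```

Passing a custom `fetch` makes it possible to try the crawl logic against a dict of fake pages before pointing it at a real site. A real crawler would also want to respect `robots.txt`, rate-limit requests, and skip non-HTML responses, none of which this sketch does.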

@kartikpersistent kartikpersistent added the wontfix This will not be worked on label Aug 21, 2024
@jexp
Contributor

jexp commented Aug 21, 2024

Since this is a mass-processing job, I think it would make sense to use the underlying Python code with LLMGraphTransformer in LangChain.

https://python.langchain.com/v0.1/docs/use_cases/graph/constructing/#llm-graph-transformer
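
A rough sketch of how a folder of already-scraped pages might be fed into that approach. The folder layout, the `.html` filter, and the helper names (`html_to_text`, `load_scraped_pages`, `build_graph_documents`) are assumptions for illustration only. The loading part is plain standard-library Python; the last function expects an already configured `LLMGraphTransformer` (which needs LangChain and an LLM) and has not been run here.

```python
from html.parser import HTMLParser
from pathlib import Path
from typing import Dict, List


class TextExtractor(HTMLParser):
    """Keeps visible text and skips <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Strip markup and join the remaining text fragments with spaces."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return " ".join(extractor.parts)


def load_scraped_pages(folder: str) -> List[Dict]:
    """Walk a folder recursively and return one record per non-empty .html file."""
    records = []
    for path in sorted(Path(folder).rglob("*.html")):
        text = html_to_text(path.read_text(encoding="utf-8", errors="replace"))
        if text:
            records.append(
                {"page_content": text, "metadata": {"source": str(path)}}
            )
    return records


def build_graph_documents(records: List[Dict], transformer):
    """Convert records to LangChain Documents and run the transformer.

    Assumption: `transformer` is an LLMGraphTransformer instance created
    elsewhere; this step requires langchain to be installed.
    """
    from langchain_core.documents import Document

    docs = [
        Document(page_content=r["page_content"], metadata=r["metadata"])
        for r in records
    ]
    return transformer.convert_to_graph_documents(docs)
```

Keeping the file path in `metadata["source"]` makes it easier to trace a graph node back to the page it came from. For large sites it would probably be worth splitting long pages into chunks before the LLM step, which this sketch leaves out.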

3 participants