Confluence loader "keep_newlines" not always passed to "process_pages" #20086

KevinHubert-Dev · 2024-04-05T20:31:46Z

Checked other resources

I added a very descriptive title to this issue.
I searched the LangChain documentation with the integrated search.
I used the GitHub search to find a similar question and didn't find it.
I am sure that this is a bug in LangChain rather than my code.
The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

libs/community/langchain_community/document_loaders/confluence.py
@@ -359,6 +359,7 @@ def _lazy_load(self, **kwargs: Any) -> Iterator[Document]:
                 content_format,
                 ocr_languages,
                 keep_markdown_format,
+                keep_newlines=keep_newlines
             )

Error Message and Stack Trace (if applicable)

No response

Description

I use LangChain's Confluence loader to download the content of a specific page from my Confluence instance. While text-splitting/chunking the pages, I noticed that the newlines were missing in non-markdown format. While debugging, I saw that the keep_newlines parameter is not forwarded to every call of the process_pages function in
libs/community/langchain_community/document_loaders/confluence.py
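
To illustrate the failure mode, here is a minimal, self-contained sketch. It is not the actual loader code: `process_pages_sketch`, `lazy_load_buggy` and `lazy_load_fixed` are hypothetical stand-ins, and the whitespace handling is only an assumption about what "dropping newlines" looks like. It shows how a flag that is accepted by the caller but not forwarded silently falls back to its default.

```python
from typing import Iterator, List


def process_pages_sketch(
    pages: List[str],
    keep_markdown_format: bool = False,
    keep_newlines: bool = False,
) -> Iterator[str]:
    """Hypothetical stand-in for the loader's page processing step."""
    for text in pages:
        if keep_newlines:
            # Preserve the original line structure.
            yield text
        else:
            # Collapse all whitespace, including newlines, into single spaces.
            yield " ".join(text.split())


def lazy_load_buggy(pages: List[str], keep_newlines: bool = False) -> Iterator[str]:
    # Bug: keep_newlines is accepted here but never forwarded,
    # so process_pages_sketch always uses its default (False).
    yield from process_pages_sketch(pages)


def lazy_load_fixed(pages: List[str], keep_newlines: bool = False) -> Iterator[str]:
    # Fix: forward the flag explicitly, as in the one-line diff above.
    yield from process_pages_sketch(pages, keep_newlines=keep_newlines)


pages = ["line one\nline two"]
buggy = list(lazy_load_buggy(pages, keep_newlines=True))
fixed = list(lazy_load_fixed(pages, keep_newlines=True))
```

With `keep_newlines=True`, the buggy variant still returns the text with the newline collapsed, while the fixed variant keeps it. Downstream text splitters that split on newlines would therefore see one long line in the buggy case.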

System Info

langchain=0.1.14
windows 11
python 3.10


KevinHubert-Dev · 2024-04-05T20:32:19Z

I've forked the repo and will open a pull request in a few minutes.

…to 'process_pages' function in confluence loader (#20086) (#20087)
- **Description:** Fixed missing `keep_newlines` parameter forward-pass in confluence-loader
- **Issue:** #20086
- **Dependencies:** None

Co-authored-by: ccurme

dosubot bot added Ɑ: doc loader Related to document loader module (not documentation) 🔌: chroma Primarily related to ChromaDB integrations 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature labels Apr 5, 2024

KevinHubert-Dev mentioned this issue Apr 5, 2024

community[minor]: Fix missing 'keep_newlines' parameter forward-pass to 'process_pages' function in confluence loader (#20086) #20087

Merged

dosubot bot added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Jul 5, 2024

dosubot bot closed this as not planned Won't fix, can't repro, duplicate, stale Jul 12, 2024

dosubot bot removed the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Jul 12, 2024
