Want to keep my PDF data Private #600

Open
Naveen-Chaurasia opened this issue Jul 22, 2024 · 1 comment

Comments

@Naveen-Chaurasia

I am using the LLM Graph Builder application and have concerns regarding the privacy and security of the data I upload. Specifically, I need to ensure that the uploaded data (e.g., PDFs) remains private and is not accessible by unauthorized parties, including OpenAI. I would like to know the following:

Privacy of Uploaded Data:

  1. How is the uploaded data stored within the application?
  2. Are there any measures in place to ensure that the data remains private and secure?
  3. Will OpenAI have access to the data that I upload to the LLM Graph Builder application?
@Kain-90
Contributor

Kain-90 commented Jul 29, 2024

As far as I can tell, the data is currently handled like this:

  1. Upload: the complete uploaded files are temporarily stored in the backend/merged_files directory and then stored in the Neo4j database.
  2. Embedding: the documents are split into chunks, which the embedding model converts into high-dimensional vectors that are also stored in the Neo4j database.
  3. Entity extraction: each chunk is then passed to the LLM, which extracts entities and relationships from it; the results are also stored in the Neo4j database.

Therefore, if you want to ensure complete data privacy, you need to make sure that every component that processes the data either runs locally or is operated by a third party you trust. At least, that is what I am doing at the moment:

  1. Local deployment of the Neo4j database
  2. Local deployment of the embedding model
  3. Local Ollama deployment of the LLM used for entity and relationship extraction
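
One way to sanity-check a setup like this is to confirm that every configured endpoint points at a loopback or private address before uploading anything. The following is only a minimal sketch: the function names (`is_local_endpoint`, `audit_endpoints`) and the endpoint labels are hypothetical and are not part of LLM Graph Builder, and the URLs are illustrative values. It inspects the URL only, so it says nothing about what a local service forwards elsewhere.

```python
import ipaddress
from urllib.parse import urlparse


def is_local_endpoint(url: str) -> bool:
    """Return True if the URL's host is localhost, loopback, or a private IP.

    Hostnames other than "localhost" are treated as non-local, because this
    sketch does not resolve DNS names.
    """
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal (e.g. a public DNS name): assume remote.
        return False
    return addr.is_loopback or addr.is_private


def audit_endpoints(endpoints: dict[str, str]) -> dict[str, bool]:
    """Map each (hypothetical) component label to whether its endpoint is local."""
    return {name: is_local_endpoint(url) for name, url in endpoints.items()}


if __name__ == "__main__":
    # Illustrative values only; substitute the endpoints from your own config.
    config = {
        "neo4j": "bolt://localhost:7687",
        "ollama": "http://127.0.0.1:11434",
        "hosted_llm": "https://api.openai.com/v1",
    }
    for name, local in audit_endpoints(config).items():
        print(f"{name}: {'local' if local else 'REMOTE'}")
```

Any component reported as remote would receive document chunks during the embedding or entity extraction steps, so that is where the data would leave your machine.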

If I have missed anything, please feel free to add it.

@kartikpersistent kartikpersistent added the question Further information is requested label Aug 8, 2024