Our objective is to monitor and improve the RAG pipeline for **AI-OPS**.
The evaluation workflow is split into two steps:

1. **Dataset Generation** ([dataset_generation.ipynb](./test/benchmarks/rag/dataset_generation.ipynb)):
   uses the Gemini free API and the data ingested into Qdrant (the RAG Vector Database) to generate *question* and
   *ground truth* pairs (the Q&A dataset); a minimal sketch of this step is shown after the list.

2. **Evaluation** ([evaluation.py](./test/benchmarks/rag/evaluation.py)):
   builds the RAG pipeline with the same data used to generate the synthetic Q&A dataset, leverages the pipeline to
   provide an *answer* to each question (given its *context*), then evaluates the full evaluation dataset using an
   LLM as a judge. Here everything related to generation is done via Ollama with the same models integrated in
   **AI-OPS**; a minimal sketch of the judging step is shown after the list.
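
A minimal sketch of how the dataset-generation step could look, assuming a local Qdrant instance with a collection named `knowledge` whose payload stores chunk text under a `text` field, plus the `google-generativeai` and `qdrant-client` packages; the collection name, payload field, model name, prompt, and output file are illustrative assumptions, not the notebook's actual code:

```python
import json

import google.generativeai as genai
from qdrant_client import QdrantClient

# The Gemini free API is used for synthetic Q&A generation (the API key is
# assumed to be available; the model name is an illustrative choice).
genai.configure(api_key="<GEMINI_API_KEY>")
model = genai.GenerativeModel("gemini-1.5-flash")

# Read a batch of ingested chunks back from the RAG vector database.
client = QdrantClient(host="localhost", port=6333)
points, _ = client.scroll(collection_name="knowledge", limit=10, with_payload=True)

dataset = []
for point in points:
    context = point.payload["text"]  # payload field name is an assumption
    prompt = (
        "Given the following context, write one question answerable from it and "
        "its ground-truth answer. Respond with raw JSON containing the keys "
        f"'question' and 'ground_truth'.\n\nContext:\n{context}"
    )
    response = model.generate_content(prompt)
    item = json.loads(response.text)  # assumes the model returns raw JSON
    item["context"] = context
    dataset.append(item)

with open("qa_dataset.json", "w") as f:
    json.dump(dataset, f, indent=2)
```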

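And a minimal sketch of the LLM-as-a-judge step via Ollama, assuming each dataset item has already been given an `answer` field by the RAG pipeline; the judge model, prompt wording, and 0–1 scoring scale are illustrative assumptions, not the script's actual code:

```python
import json

import ollama

# Judge prompt (an assumption): ask the local model to score the pipeline's
# answer against the ground truth on a 0-1 scale.
JUDGE_PROMPT = """You are an impartial judge. Given a question, a ground-truth
answer, and a candidate answer, reply with only a number between 0 and 1
scoring how faithful the candidate answer is to the ground truth.

Question: {question}
Ground truth: {ground_truth}
Candidate answer: {answer}
Score:"""

with open("qa_dataset.json") as f:
    dataset = json.load(f)

scores = []
for item in dataset:
    # `item["answer"]` is assumed to have been produced by the RAG pipeline.
    response = ollama.chat(
        model="mistral",  # using the same model integrated in AI-OPS is an assumption
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(**item)}],
    )
    # Assumes the judge replies with a bare number, as the prompt requests.
    scores.append(float(response["message"]["content"].strip()))

print(f"Mean judge score: {sum(scores) / len(scores):.3f}")
```
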
## Results

![RAG Evaluation Results Plot](data/rag_eval/results/plots/plot.png)
