diff --git a/EVALUATION.md b/EVALUATION.md
index f258452..af89165 100644
--- a/EVALUATION.md
+++ b/EVALUATION.md
@@ -21,25 +21,14 @@ Our objective is to monitor and improve the RAG pipeline for **AI-OPS**, that re
 
 The evaluation workflow is split in two steps:
 1. **Dataset Generation** ([dataset_generation.ipynb](./test/benchmarks/rag/dataset_generation.ipynb)):
-uses Ollama and the data that is ingested into Qdrant (RAG Vector Database) to generate *question* and *ground truth*
+uses the Gemini free API and the data that is ingested into Qdrant (RAG Vector Database) to generate *question* and *ground truth*
 (Q&A dataset).
 
 2. **Evaluation** ([evaluation.py](./test/benchmarks/rag/evaluation.py)): builds the RAG pipeline with the same
 used to generate the synthetic Q&A dataset, leverages the pipeline to provide an *answer* to the questions (given
 *contex*), then performs evaluation of the full evaluation dataset using LLM as a
-judge; for performance reasons the evaluation is performed using HuggingFace Inference API.
+judge. Here all generation is done via Ollama, using the same models integrated in **AI-OPS**.
 
 ## Results
 
-### Context Precision
-
-**TODO:** *describe the metric and the prompts used*
-
-![Context Precision Plot](data/rag_eval/results/plots/context_precision.png)
-
-### Context Recall
-
-**TODO:** *describe the metric and the prompts used*
-
-
-![Context Precision Plot](data/rag_eval/results/plots/context_recall.png)
+![RAG Evaluation Results](data/rag_eval/results/plots/plot.png)
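
For readers of this change, a minimal sketch of what the revised two-step workflow could look like in code is shown below. It is illustrative only: the Qdrant collection name (`ai_ops_docs`), the model names (`gemini-1.5-flash`, `mistral`), the prompts, and the helper functions `generate_qa`/`judge` are assumptions made for the example, not the actual contents of `dataset_generation.ipynb` or `evaluation.py`.

```python
"""Illustrative sketch of the two-step RAG evaluation workflow described above.

All names below (collection, models, prompts) are placeholders, not the
actual code in dataset_generation.ipynb or evaluation.py.
"""
import os

import google.generativeai as genai   # Gemini free API, used for dataset generation
import ollama                         # local models, same ones integrated in AI-OPS
from qdrant_client import QdrantClient

# Step 1: Q&A dataset generation (Gemini + data already ingested into Qdrant)
genai.configure(api_key=os.environ["GEMINI_API_KEY"])
gemini = genai.GenerativeModel("gemini-1.5-flash")    # assumed model name
qdrant = QdrantClient(url="http://localhost:6333")    # assumed local instance


def generate_qa(collection: str = "ai_ops_docs", n_chunks: int = 10) -> list[dict]:
    """Sample ingested chunks from Qdrant and ask Gemini for question/ground-truth pairs."""
    points, _ = qdrant.scroll(collection_name=collection, limit=n_chunks, with_payload=True)
    dataset = []
    for point in points:
        chunk = point.payload.get("text", "")
        prompt = (
            "Given the following document chunk, write one question it answers "
            f"and the corresponding ground-truth answer.\n\nCHUNK:\n{chunk}"
        )
        dataset.append({"context": chunk, "qa": gemini.generate_content(prompt).text})
    return dataset


# Step 2: evaluation with LLM as a judge, generation done via Ollama
def judge(question: str, answer: str, ground_truth: str, model: str = "mistral") -> str:
    """Ask a local Ollama model to score the RAG answer against the ground truth."""
    prompt = (
        "You are an impartial judge. Score the ANSWER against the GROUND TRUTH "
        "on a 1-5 scale and briefly justify the score.\n\n"
        f"QUESTION: {question}\nANSWER: {answer}\nGROUND TRUTH: {ground_truth}"
    )
    response = ollama.chat(model=model, messages=[{"role": "user", "content": prompt}])
    return response["message"]["content"]
```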