Tweak parameters in regression yaml (#2581)
+ simplified parameters in cases where there are default (for BEIR)
+ moved "threads" parameter up closer to beginning of command (for indexing, all regressions)
lintool committed Aug 31, 2024
1 parent e0a9578 commit 9744463
Showing 978 changed files with 1,880 additions and 1,211 deletions.
3 changes: 2 additions & 1 deletion docs/regressions/regressions-backgroundlinking18.md
@@ -18,11 +18,12 @@ Typical indexing command:

 ```
 bin/run.sh io.anserini.index.IndexCollection \
+  -threads 1 \
   -collection WashingtonPostCollection \
   -input /path/to/wapo.v2 \
   -generator WashingtonPostGenerator \
   -index indexes/lucene-index.wapo.v2/ \
-  -threads 1 -storePositions -storeDocvectors -storeRaw \
+  -storePositions -storeDocvectors -storeRaw \
   >& logs/log.wapo.v2 &
 ```

3 changes: 2 additions & 1 deletion docs/regressions/regressions-backgroundlinking19.md
@@ -18,11 +18,12 @@ Typical indexing command:

 ```
 bin/run.sh io.anserini.index.IndexCollection \
+  -threads 1 \
   -collection WashingtonPostCollection \
   -input /path/to/wapo.v2 \
   -generator WashingtonPostGenerator \
   -index indexes/lucene-index.wapo.v2/ \
-  -threads 1 -storePositions -storeDocvectors -storeRaw \
+  -storePositions -storeDocvectors -storeRaw \
   >& logs/log.wapo.v2 &
 ```

3 changes: 2 additions & 1 deletion docs/regressions/regressions-backgroundlinking20.md
@@ -18,11 +18,12 @@ Typical indexing command:

 ```
 bin/run.sh io.anserini.index.IndexCollection \
+  -threads 1 \
   -collection WashingtonPostCollection \
   -input /path/to/wapo.v3 \
   -generator WashingtonPostGenerator \
   -index indexes/lucene-index.wapo.v3/ \
-  -threads 1 -storePositions -storeDocvectors -storeRaw \
+  -storePositions -storeDocvectors -storeRaw \
   >& logs/log.wapo.v3 &
 ```

@@ -33,11 +33,12 @@ Sample indexing command, building quantized flat indexes:

 ```
 bin/run.sh io.anserini.index.IndexFlatDenseVectors \
+  -threads 16 \
   -collection JsonDenseVectorCollection \
   -input /path/to/beir-v1.0.0-arguana.bge-base-en-v1.5 \
   -generator DenseVectorDocumentGenerator \
   -index indexes/lucene-flat-int8.beir-v1.0.0-arguana.bge-base-en-v1.5/ \
-  -threads 16 -quantize.int8 \
+  -quantize.int8 \
   >& logs/log.beir-v1.0.0-arguana.bge-base-en-v1.5 &
 ```

@@ -55,7 +56,7 @@ bin/run.sh io.anserini.search.SearchFlatDenseVectors \
   -topics tools/topics-and-qrels/topics.beir-v1.0.0-arguana.test.bge-base-en-v1.5.jsonl.gz \
   -topicReader JsonStringVector \
   -output runs/run.beir-v1.0.0-arguana.bge-base-en-v1.5.bge-flat-int8-cached.topics.beir-v1.0.0-arguana.test.bge-base-en-v1.5.jsonl.txt \
-  -generator VectorQueryGenerator -topicField vector -removeQuery -threads 16 -hits 1000 &
+  -hits 1000 -removeQuery -threads 16 &
 ```

 Evaluation can be performed using `trec_eval`:
@@ -33,11 +33,12 @@ Sample indexing command, building quantized flat indexes:

 ```
 bin/run.sh io.anserini.index.IndexFlatDenseVectors \
+  -threads 16 \
   -collection JsonDenseVectorCollection \
   -input /path/to/beir-v1.0.0-arguana.bge-base-en-v1.5 \
   -generator DenseVectorDocumentGenerator \
   -index indexes/lucene-flat-int8.beir-v1.0.0-arguana.bge-base-en-v1.5/ \
-  -threads 16 -quantize.int8 \
+  -quantize.int8 \
   >& logs/log.beir-v1.0.0-arguana.bge-base-en-v1.5 &
 ```

@@ -55,7 +56,7 @@ bin/run.sh io.anserini.search.SearchFlatDenseVectors \
   -topics tools/topics-and-qrels/topics.beir-v1.0.0-arguana.test.tsv.gz \
   -topicReader TsvString \
   -output runs/run.beir-v1.0.0-arguana.bge-base-en-v1.5.bge-flat-int8-onnx.topics.beir-v1.0.0-arguana.test.txt \
-  -generator VectorQueryGenerator -topicField vector -removeQuery -threads 16 -hits 1000 -encoder BgeBaseEn15 &
+  -encoder BgeBaseEn15 -hits 1000 -removeQuery -threads 16 &
 ```

 Evaluation can be performed using `trec_eval`:
@@ -33,11 +33,11 @@ Sample indexing command, building flat indexes:

 ```
 bin/run.sh io.anserini.index.IndexFlatDenseVectors \
+  -threads 16 \
   -collection JsonDenseVectorCollection \
   -input /path/to/beir-v1.0.0-arguana.bge-base-en-v1.5 \
   -generator DenseVectorDocumentGenerator \
   -index indexes/lucene-flat.beir-v1.0.0-arguana.bge-base-en-v1.5/ \
-  -threads 16 \
   >& logs/log.beir-v1.0.0-arguana.bge-base-en-v1.5 &
 ```

@@ -55,7 +55,7 @@ bin/run.sh io.anserini.search.SearchFlatDenseVectors \
   -topics tools/topics-and-qrels/topics.beir-v1.0.0-arguana.test.bge-base-en-v1.5.jsonl.gz \
   -topicReader JsonStringVector \
   -output runs/run.beir-v1.0.0-arguana.bge-base-en-v1.5.bge-flat-cached.topics.beir-v1.0.0-arguana.test.bge-base-en-v1.5.jsonl.txt \
-  -generator VectorQueryGenerator -topicField vector -removeQuery -threads 16 -hits 1000 &
+  -hits 1000 -removeQuery -threads 16 &
 ```

 Evaluation can be performed using `trec_eval`:
@@ -33,11 +33,11 @@ Sample indexing command, building flat indexes:

 ```
 bin/run.sh io.anserini.index.IndexFlatDenseVectors \
+  -threads 16 \
   -collection JsonDenseVectorCollection \
   -input /path/to/beir-v1.0.0-arguana.bge-base-en-v1.5 \
   -generator DenseVectorDocumentGenerator \
   -index indexes/lucene-flat.beir-v1.0.0-arguana.bge-base-en-v1.5/ \
-  -threads 16 \
   >& logs/log.beir-v1.0.0-arguana.bge-base-en-v1.5 &
 ```

@@ -55,7 +55,7 @@ bin/run.sh io.anserini.search.SearchFlatDenseVectors \
   -topics tools/topics-and-qrels/topics.beir-v1.0.0-arguana.test.tsv.gz \
   -topicReader TsvString \
   -output runs/run.beir-v1.0.0-arguana.bge-base-en-v1.5.bge-flat-onnx.topics.beir-v1.0.0-arguana.test.txt \
-  -generator VectorQueryGenerator -topicField vector -removeQuery -threads 16 -hits 1000 -encoder BgeBaseEn15 &
+  -encoder BgeBaseEn15 -hits 1000 -removeQuery -threads 16 &
 ```

 Evaluation can be performed using `trec_eval`:
@@ -33,11 +33,12 @@ Sample indexing command, building quantized HNSW indexes:

 ```
 bin/run.sh io.anserini.index.IndexHnswDenseVectors \
+  -threads 16 \
   -collection JsonDenseVectorCollection \
   -input /path/to/beir-v1.0.0-arguana.bge-base-en-v1.5 \
   -generator DenseVectorDocumentGenerator \
   -index indexes/lucene-hnsw-int8.beir-v1.0.0-arguana.bge-base-en-v1.5/ \
-  -threads 16 -M 16 -efC 100 -quantize.int8 \
+  -M 16 -efC 100 -quantize.int8 \
   >& logs/log.beir-v1.0.0-arguana.bge-base-en-v1.5 &
 ```

@@ -57,7 +58,7 @@ bin/run.sh io.anserini.search.SearchHnswDenseVectors \
   -topics tools/topics-and-qrels/topics.beir-v1.0.0-arguana.test.bge-base-en-v1.5.jsonl.gz \
   -topicReader JsonStringVector \
   -output runs/run.beir-v1.0.0-arguana.bge-base-en-v1.5.bge-hnsw-int8-cached.topics.beir-v1.0.0-arguana.test.bge-base-en-v1.5.jsonl.txt \
-  -generator VectorQueryGenerator -topicField vector -removeQuery -threads 16 -hits 1000 -efSearch 1000 &
+  -hits 1000 -efSearch 1000 -removeQuery -threads 16 &
 ```

 Evaluation can be performed using `trec_eval`:
@@ -33,11 +33,12 @@ Sample indexing command, building quantized HNSW indexes:

 ```
 bin/run.sh io.anserini.index.IndexHnswDenseVectors \
+  -threads 16 \
   -collection JsonDenseVectorCollection \
   -input /path/to/beir-v1.0.0-arguana.bge-base-en-v1.5 \
   -generator DenseVectorDocumentGenerator \
   -index indexes/lucene-hnsw-int8.beir-v1.0.0-arguana.bge-base-en-v1.5/ \
-  -threads 16 -M 16 -efC 100 -memoryBuffer 65536 -noMerge -quantize.int8 \
+  -M 16 -efC 100 -memoryBuffer 65536 -noMerge -quantize.int8 \
   >& logs/log.beir-v1.0.0-arguana.bge-base-en-v1.5 &
 ```

@@ -57,7 +58,7 @@ bin/run.sh io.anserini.search.SearchHnswDenseVectors \
   -topics tools/topics-and-qrels/topics.beir-v1.0.0-arguana.test.tsv.gz \
   -topicReader TsvString \
   -output runs/run.beir-v1.0.0-arguana.bge-base-en-v1.5.bge-hnsw-int8-onnx.topics.beir-v1.0.0-arguana.test.txt \
-  -generator VectorQueryGenerator -topicField title -removeQuery -threads 16 -hits 1000 -efSearch 1000 -encoder BgeBaseEn15 &
+  -encoder BgeBaseEn15 -hits 1000 -efSearch 1000 -removeQuery -threads 16 &
 ```

 Evaluation can be performed using `trec_eval`:
@@ -33,11 +33,12 @@ Sample indexing command, building HNSW indexes:

 ```
 bin/run.sh io.anserini.index.IndexHnswDenseVectors \
+  -threads 16 \
   -collection JsonDenseVectorCollection \
   -input /path/to/beir-v1.0.0-arguana.bge-base-en-v1.5 \
   -generator DenseVectorDocumentGenerator \
   -index indexes/lucene-hnsw.beir-v1.0.0-arguana.bge-base-en-v1.5/ \
-  -threads 16 -M 16 -efC 100 \
+  -M 16 -efC 100 \
   >& logs/log.beir-v1.0.0-arguana.bge-base-en-v1.5 &
 ```

@@ -57,7 +58,7 @@ bin/run.sh io.anserini.search.SearchHnswDenseVectors \
   -topics tools/topics-and-qrels/topics.beir-v1.0.0-arguana.test.bge-base-en-v1.5.jsonl.gz \
   -topicReader JsonStringVector \
   -output runs/run.beir-v1.0.0-arguana.bge-base-en-v1.5.bge-hnsw-cached.topics.beir-v1.0.0-arguana.test.bge-base-en-v1.5.jsonl.txt \
-  -generator VectorQueryGenerator -topicField vector -removeQuery -threads 16 -hits 1000 -efSearch 1000 &
+  -hits 1000 -efSearch 1000 -removeQuery -threads 16 &
 ```

 Evaluation can be performed using `trec_eval`:
@@ -33,11 +33,12 @@ Sample indexing command, building HNSW indexes:

 ```
 bin/run.sh io.anserini.index.IndexHnswDenseVectors \
+  -threads 16 \
   -collection JsonDenseVectorCollection \
   -input /path/to/beir-v1.0.0-arguana.bge-base-en-v1.5 \
   -generator DenseVectorDocumentGenerator \
   -index indexes/lucene-hnsw.beir-v1.0.0-arguana.bge-base-en-v1.5/ \
-  -threads 16 -M 16 -efC 100 -memoryBuffer 65536 -noMerge \
+  -M 16 -efC 100 -memoryBuffer 65536 -noMerge \
   >& logs/log.beir-v1.0.0-arguana.bge-base-en-v1.5 &
 ```

@@ -57,7 +58,7 @@ bin/run.sh io.anserini.search.SearchHnswDenseVectors \
   -topics tools/topics-and-qrels/topics.beir-v1.0.0-arguana.test.tsv.gz \
   -topicReader TsvString \
   -output runs/run.beir-v1.0.0-arguana.bge-base-en-v1.5.bge-hnsw-onnx.topics.beir-v1.0.0-arguana.test.txt \
-  -generator VectorQueryGenerator -topicField title -removeQuery -threads 16 -hits 1000 -efSearch 1000 -encoder BgeBaseEn15 &
+  -encoder BgeBaseEn15 -hits 1000 -efSearch 1000 -removeQuery -threads 16 &
 ```

 Evaluation can be performed using `trec_eval`:
3 changes: 2 additions & 1 deletion docs/regressions/regressions-beir-v1.0.0-arguana.flat-wp.md
@@ -29,11 +29,12 @@ Typical indexing command:

 ```
 bin/run.sh io.anserini.index.IndexCollection \
+  -threads 1 \
   -collection BeirFlatCollection \
   -input /path/to/beir-v1.0.0-arguana.flat-wp \
   -generator DefaultLuceneDocumentGenerator \
   -index indexes/lucene-inverted.beir-v1.0.0-arguana.flat-wp/ \
-  -threads 1 -storePositions -storeDocvectors -storeRaw -pretokenized \
+  -storePositions -storeDocvectors -storeRaw -pretokenized \
   >& logs/log.beir-v1.0.0-arguana.flat-wp &
 ```

3 changes: 2 additions & 1 deletion docs/regressions/regressions-beir-v1.0.0-arguana.flat.md
@@ -28,11 +28,12 @@ Typical indexing command:

 ```
 bin/run.sh io.anserini.index.IndexCollection \
+  -threads 1 \
   -collection BeirFlatCollection \
   -input /path/to/beir-v1.0.0-arguana.flat \
   -generator DefaultLuceneDocumentGenerator \
   -index indexes/lucene-inverted.beir-v1.0.0-arguana.flat/ \
-  -threads 1 -storePositions -storeDocvectors -storeRaw \
+  -storePositions -storeDocvectors -storeRaw \
   >& logs/log.beir-v1.0.0-arguana.flat &
 ```

@@ -29,11 +29,12 @@ Typical indexing command:

 ```
 bin/run.sh io.anserini.index.IndexCollection \
+  -threads 1 \
   -collection BeirMultifieldCollection \
   -input /path/to/beir-v1.0.0-arguana.multifield \
   -generator DefaultLuceneDocumentGenerator \
   -index indexes/lucene-inverted.beir-v1.0.0-arguana.multifield/ \
-  -threads 1 -storePositions -storeDocvectors -storeRaw -fields title \
+  -storePositions -storeDocvectors -storeRaw -fields title \
   >& logs/log.beir-v1.0.0-arguana.multifield &
 ```

@@ -35,11 +35,12 @@ Sample indexing command:

 ```
 bin/run.sh io.anserini.index.IndexCollection \
+  -threads 16 \
   -collection JsonVectorCollection \
   -input /path/to/beir-v1.0.0-arguana.splade-pp-ed \
   -generator DefaultLuceneDocumentGenerator \
   -index indexes/lucene-inverted.beir-v1.0.0-arguana.splade-pp-ed/ \
-  -threads 16 -impact -pretokenized \
+  -impact -pretokenized \
   >& logs/log.beir-v1.0.0-arguana.splade-pp-ed &
 ```

@@ -35,11 +35,12 @@ Sample indexing command:

 ```
 bin/run.sh io.anserini.index.IndexCollection \
+  -threads 16 \
   -collection JsonVectorCollection \
   -input /path/to/beir-v1.0.0-arguana.splade-pp-ed \
   -generator DefaultLuceneDocumentGenerator \
   -index indexes/lucene-inverted.beir-v1.0.0-arguana.splade-pp-ed/ \
-  -threads 16 -impact -pretokenized \
+  -impact -pretokenized \
   >& logs/log.beir-v1.0.0-arguana.splade-pp-ed &
 ```

@@ -34,11 +34,12 @@ Typical indexing command:

 ```
 bin/run.sh io.anserini.index.IndexCollection \
+  -threads 16 \
   -collection JsonVectorCollection \
   -input /path/to/beir-v1.0.0-arguana.unicoil-noexp \
   -generator DefaultLuceneDocumentGenerator \
   -index indexes/lucene-inverted.beir-v1.0.0-arguana.unicoil-noexp/ \
-  -threads 16 -impact -pretokenized \
+  -impact -pretokenized \
   >& logs/log.beir-v1.0.0-arguana.unicoil-noexp &
 ```

@@ -33,11 +33,12 @@ Sample indexing command, building quantized flat indexes:

 ```
 bin/run.sh io.anserini.index.IndexFlatDenseVectors \
+  -threads 16 \
   -collection JsonDenseVectorCollection \
   -input /path/to/beir-v1.0.0-bioasq.bge-base-en-v1.5 \
   -generator DenseVectorDocumentGenerator \
   -index indexes/lucene-flat-int8.beir-v1.0.0-bioasq.bge-base-en-v1.5/ \
-  -threads 16 -quantize.int8 \
+  -quantize.int8 \
   >& logs/log.beir-v1.0.0-bioasq.bge-base-en-v1.5 &
 ```

@@ -55,7 +56,7 @@ bin/run.sh io.anserini.search.SearchFlatDenseVectors \
   -topics tools/topics-and-qrels/topics.beir-v1.0.0-bioasq.test.bge-base-en-v1.5.jsonl.gz \
   -topicReader JsonStringVector \
   -output runs/run.beir-v1.0.0-bioasq.bge-base-en-v1.5.bge-flat-int8-cached.topics.beir-v1.0.0-bioasq.test.bge-base-en-v1.5.jsonl.txt \
-  -generator VectorQueryGenerator -topicField vector -removeQuery -threads 16 -hits 1000 &
+  -hits 1000 -removeQuery -threads 16 &
 ```

 Evaluation can be performed using `trec_eval`:
@@ -33,11 +33,12 @@ Sample indexing command, building quantized flat indexes:

 ```
 bin/run.sh io.anserini.index.IndexFlatDenseVectors \
+  -threads 16 \
   -collection JsonDenseVectorCollection \
   -input /path/to/beir-v1.0.0-bioasq.bge-base-en-v1.5 \
   -generator DenseVectorDocumentGenerator \
   -index indexes/lucene-flat-int8.beir-v1.0.0-bioasq.bge-base-en-v1.5/ \
-  -threads 16 -quantize.int8 \
+  -quantize.int8 \
   >& logs/log.beir-v1.0.0-bioasq.bge-base-en-v1.5 &
 ```

@@ -55,7 +56,7 @@ bin/run.sh io.anserini.search.SearchFlatDenseVectors \
   -topics tools/topics-and-qrels/topics.beir-v1.0.0-bioasq.test.tsv.gz \
   -topicReader TsvString \
   -output runs/run.beir-v1.0.0-bioasq.bge-base-en-v1.5.bge-flat-int8-onnx.topics.beir-v1.0.0-bioasq.test.txt \
-  -generator VectorQueryGenerator -topicField vector -removeQuery -threads 16 -hits 1000 -encoder BgeBaseEn15 &
+  -encoder BgeBaseEn15 -hits 1000 -removeQuery -threads 16 &
 ```

 Evaluation can be performed using `trec_eval`:
@@ -33,11 +33,11 @@ Sample indexing command, building flat indexes:

 ```
 bin/run.sh io.anserini.index.IndexFlatDenseVectors \
+  -threads 16 \
   -collection JsonDenseVectorCollection \
   -input /path/to/beir-v1.0.0-bioasq.bge-base-en-v1.5 \
   -generator DenseVectorDocumentGenerator \
   -index indexes/lucene-flat.beir-v1.0.0-bioasq.bge-base-en-v1.5/ \
-  -threads 16 \
   >& logs/log.beir-v1.0.0-bioasq.bge-base-en-v1.5 &
 ```

@@ -55,7 +55,7 @@ bin/run.sh io.anserini.search.SearchFlatDenseVectors \
   -topics tools/topics-and-qrels/topics.beir-v1.0.0-bioasq.test.bge-base-en-v1.5.jsonl.gz \
   -topicReader JsonStringVector \
   -output runs/run.beir-v1.0.0-bioasq.bge-base-en-v1.5.bge-flat-cached.topics.beir-v1.0.0-bioasq.test.bge-base-en-v1.5.jsonl.txt \
-  -generator VectorQueryGenerator -topicField vector -removeQuery -threads 16 -hits 1000 &
+  -hits 1000 -removeQuery -threads 16 &
 ```

 Evaluation can be performed using `trec_eval`:
@@ -33,11 +33,11 @@ Sample indexing command, building flat indexes:

 ```
 bin/run.sh io.anserini.index.IndexFlatDenseVectors \
+  -threads 16 \
   -collection JsonDenseVectorCollection \
   -input /path/to/beir-v1.0.0-bioasq.bge-base-en-v1.5 \
   -generator DenseVectorDocumentGenerator \
   -index indexes/lucene-flat.beir-v1.0.0-bioasq.bge-base-en-v1.5/ \
-  -threads 16 \
   >& logs/log.beir-v1.0.0-bioasq.bge-base-en-v1.5 &
 ```

@@ -55,7 +55,7 @@ bin/run.sh io.anserini.search.SearchFlatDenseVectors \
   -topics tools/topics-and-qrels/topics.beir-v1.0.0-bioasq.test.tsv.gz \
   -topicReader TsvString \
   -output runs/run.beir-v1.0.0-bioasq.bge-base-en-v1.5.bge-flat-onnx.topics.beir-v1.0.0-bioasq.test.txt \
-  -generator VectorQueryGenerator -topicField vector -removeQuery -threads 16 -hits 1000 -encoder BgeBaseEn15 &
+  -encoder BgeBaseEn15 -hits 1000 -removeQuery -threads 16 &
 ```

 Evaluation can be performed using `trec_eval`: