Multi-container application based on PyTerrier
This guide provides setup instructions for a simple instance of the stella-app with micro-services based on PyTerrier. The resulting multi-container application contains several PyTerrier-based micro-services with different ranking methods and, additionally, the dashboard service of the stella-server.
Prerequisites
Setup
1. Download the data and prepare the index
PyTerrier has good support for the data catalog ir_datasets. In this guide, we use ir_datasets to prepare the index files before starting the stella-app. First, install the Python package (the code below also requires PyTerrier itself, available on PyPI as python-terrier):
pip install --upgrade ir_datasets
We use the CORD-19 dataset (more specifically, its metadata), which was also used as part of TREC-COVID. Run the following code to write the index files into the specified subfolder:
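import pyterrier as pt
# Start the JVM that backs Terrier before using PyTerrier
pt.init()
# Load CORD-19 through the ir_datasets integration
dataset = pt.get_dataset('irds:cord19')
# Index CORD-19 and write the index files to ./indices/cord19
indexer = pt.IterDictIndexer('./indices/cord19')
index_ref = indexer.index(dataset.get_corpus_iter(), fields=['title', 'doi', 'date', 'abstract'])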
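Before moving on, it can help to confirm that the indexing step actually wrote something to disk. The following is a minimal sketch, not part of the stella-app: it assumes that the indexer produced a standard on-disk Terrier index whose entry point is a file named data.properties, and the helper name index_looks_complete is made up for this example.

```python
from pathlib import Path


def index_looks_complete(index_dir: str) -> bool:
    """Return True if index_dir contains a non-empty data.properties file.

    Assumption: a standard on-disk Terrier index was written, and its
    entry point is a file called data.properties.
    """
    properties = Path(index_dir) / "data.properties"
    return properties.is_file() and properties.stat().st_size > 0


# Example usage after the indexing step (path taken from the code above):
# index_looks_complete("./indices/cord19")
```

This only checks that the file exists and is not empty; it does not validate the contents of the index.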
2. Clone stella-app and copy the index files
Next, clone the stella-app repository from GitHub:
git clone https://github.com/stella-project/stella-app.git
Afterward, move the index files into the index directory of the stella-app. All micro-services will share this index.
mv ./indices/cord19 stella-app/index/
3. Start stella-app
To run the multi-container application, make sure Docker and docker-compose are installed, then start the services:
docker-compose -f stella-app/yml/pyterrier.yml up -d
Building the application will take some time. You can verify that every container is running via the Docker CLI (for example, docker ps) or with an administration interface like Portainer.
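Since the build can take a while, a small polling loop is one way to wait until the application answers requests. The sketch below makes some assumptions: the helper names wait_until_ready and http_probe are invented for this example, the timeout and interval defaults are arbitrary, and the URL in the usage comment is the ranking endpoint from the request example in this guide.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def wait_until_ready(probe: Callable[[], bool],
                     timeout: float = 300.0,
                     interval: float = 5.0) -> bool:
    """Call probe() repeatedly until it returns True or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def http_probe(url: str) -> Callable[[], bool]:
    """Build a probe that reports True once url answers with HTTP 200."""
    def probe() -> bool:
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                return response.status == 200
        except (urllib.error.URLError, OSError):
            # Connection refused, timeout, or an HTTP error: not ready yet.
            return False
    return probe


# Example usage (assumed endpoint, see the request example in this guide):
# url = "http://localhost:8080/stella/api/v1/ranking?query=test&rpp=1"
# wait_until_ready(http_probe(url))
```

Passing the probe as a callable keeps the waiting logic independent of how readiness is checked, so the HTTP probe could be swapped for another check.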
4. Send a request to the stella-app
Below is an example request that retrieves a ranking for the query coronavirus origin:
curl "localhost:8080/stella/api/v1/ranking?query=coronavirus%20origin&rpp=10"
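The same request can be sent from Python using only the standard library. This is a rough sketch: the host, port, path, and the query and rpp parameters are taken from the curl command above, while the function names build_ranking_url and fetch_ranking and the 30-second timeout are choices made for this example.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example; adjust host and port if needed.
BASE_URL = "http://localhost:8080/stella/api/v1/ranking"


def build_ranking_url(query: str, rpp: int = 10, base_url: str = BASE_URL) -> str:
    """Return the request URL with the query percent-encoded (spaces as %20)."""
    params = urllib.parse.urlencode(
        {"query": query, "rpp": rpp},
        quote_via=urllib.parse.quote,
    )
    return f"{base_url}?{params}"


def fetch_ranking(query: str, rpp: int = 10) -> dict:
    """Send the request and decode the JSON response into a dict."""
    with urllib.request.urlopen(build_ranking_url(query, rpp), timeout=30) as response:
        return json.load(response)


# Example usage (requires the stella-app to be running):
# ranking = fetch_ranking("coronavirus origin", rpp=10)
```

Building the URL with urlencode avoids hand-encoding the query string and sidesteps shell quoting issues around the & character.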
In response, you receive a JSON-formatted output like the following:
{
  "body": {
    "1": {
      "docid": "pl48ev5o",
      "type": "EXP"
    },
    "2": {
      "docid": "75773gwg",
      "type": "BASE"
    },
    "3": {
      "docid": "kn2z7lho",
      "type": "BASE"
    },
    "4": {
      "docid": "xwi9pdd2",
      "type": "EXP"
    },
    "5": {
      "docid": "irkjiqll",
      "type": "EXP"
    },
    "6": {
      "docid": "4fb291hq",
      "type": "BASE"
    },
    "7": {
      "docid": "kqqantwg",
      "type": "BASE"
    },
    "8": {
      "docid": "jpnbppry",
      "type": "EXP"
    },
    "9": {
      "docid": "es7q6c90",
      "type": "EXP"
    },
    "10": {
      "docid": "ne5r4d4b",
      "type": "BASE"
    }
  },
  "header": {
    "container": {
      "base": "pyterrier_bm25",
      "exp": "pyterrier_pl2"
    },
    "hits": 10,
    "page": 0,
    "q": "coronavirus origin",
    "rid": 3,
    "rpp": 10,
    "sid": "30fae19ec7574ad68af0bff23409adb5"
  }
}
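Each entry in the body carries a type of either BASE or EXP, which appears to correspond to the two containers named under header.container. The sketch below groups the document ids by that type while keeping rank order. It assumes the response has the shape shown above; the function name split_by_type is made up for this example, and SAMPLE is a shortened copy of the output above (first three ranks only).

```python
# Shortened copy of the example response above (first three ranks only).
SAMPLE = {
    "body": {
        "1": {"docid": "pl48ev5o", "type": "EXP"},
        "2": {"docid": "75773gwg", "type": "BASE"},
        "3": {"docid": "kn2z7lho", "type": "BASE"},
    },
    "header": {
        "container": {"base": "pyterrier_bm25", "exp": "pyterrier_pl2"},
        "q": "coronavirus origin",
    },
}


def split_by_type(response: dict) -> dict:
    """Group docids by their "type" field, preserving rank order."""
    groups: dict = {}
    body = response["body"]
    # The rank keys are strings, so sort them numerically ("10" after "9").
    for rank in sorted(body, key=int):
        entry = body[rank]
        groups.setdefault(entry["type"], []).append(entry["docid"])
    return groups


groups = split_by_type(SAMPLE)
print(groups)
```

Sorting the keys with key=int matters for longer result lists, because a plain string sort would place rank "10" before rank "2".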