Commit dde2b5ca authored by Müller, Hanna

update readme

parent 7933f4f2
@@ -15,20 +15,23 @@ We used the publicly available [CISI collection](https://ir.dcs.gla.ac.uk/resour
## How to run
To run the first make sure all the requirements are met. Simply use the command "pip install -r requirements.txt" in your terminal. This will install all the packages required to run our code. Check if all libraries are installed.
First, make sure all requirements are met. Run the command "pip install -r requirements.txt" in your terminal to install all the packages required to run our code, then check that all libraries were installed.
Secondly for running the "reranker-cosine.ipynb" notebook it is necessary to add the "GoogleNews-vectors-negative300.bin" word-embeddings file into the "models" -folder, they were not uploaded to the repository initially, because of the large size. However they can be downloaded [here](https://www.kaggle.com/datasets/leadbest/googlenewsvectorsnegative300).
Secondly, to run the "reranker-cosine.ipynb" notebook, add the "GoogleNews-vectors-negative300.bin" word-embeddings file to the "models" folder. The file was not uploaded to the repository because of its large size, but it can be downloaded [here](https://www.kaggle.com/datasets/leadbest/googlenewsvectorsnegative300).
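The two setup steps above can be checked with a small script before opening the notebooks. This is only a sketch: the helper names `missing_modules` and `model_file_present` are hypothetical and not part of the repository, and the list of module names to check has to be taken from your own `requirements.txt`.

```python
import importlib.util
from pathlib import Path


def missing_modules(module_names):
    """Return the top-level module names that cannot be imported."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


def model_file_present(models_dir="models",
                       filename="GoogleNews-vectors-negative300.bin"):
    """Check that the word-embeddings file sits in the models folder."""
    return (Path(models_dir) / filename).is_file()


if __name__ == "__main__":
    # Assumption: replace this list with the modules named in requirements.txt.
    to_check = ["json"]
    print("Missing modules:", missing_modules(to_check))
    print("Embeddings file found:", model_file_present())
```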
## Files
<u>**initial-retrieval.ipynb**</u>
Creating the intial-retrieval of the 76 queries using bm-25. Will retrieve 100 documents out of 1460 documents per query and save the results in _initial_retrieval_with_bm25_scores.pkl_.
Creates the initial retrieval for the queries using BM25. Retrieves 100 of the 1460 documents per query and saves the results in _initial_retrieval_with_bm25_scores.pkl_.
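To illustrate what a BM25 top-k retrieval step looks like, here is a from-scratch sketch of Okapi BM25 scoring. It is not the notebook's code, which may well rely on a library; the function names, the pre-tokenized input format, and the parameter values `k1=1.5` and `b=0.75` are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score every tokenized document against the query with Okapi BM25."""
    n_docs = len(docs_tokens)
    avg_len = sum(len(doc) for doc in docs_tokens) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for doc in docs_tokens:
        df.update(set(doc))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def retrieve_top_k(query_tokens, docs_tokens, k=100):
    """Return (document index, score) pairs for the k best-scoring documents."""
    scores = bm25_scores(query_tokens, docs_tokens)
    ranked = sorted(range(len(docs_tokens)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in ranked[:k]]
```

With `k=100`, `retrieve_top_k` returns one candidate list per query of the size described above; the resulting lists could then be pickled for the re-ranking notebooks.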
<u>**reranker-cosine.ipynb**</u>
Re-ranks the intial documents with the help of cosine similartiy and the pre-trained embeddings of _GoogleNews-vectors-negative300_. Will retrieve 50 documents out of the initial 100 per query and save the results in _reranker_embeddings_cosine_results.pkl_.
Re-ranks the retrieval results with the help of cosine similarity and the pre-trained embeddings of _GoogleNews-vectors-negative300_. Retrieves 50 documents out of the initial 100 per query and saves the results in _reranker_embeddings_cosine_results.pkl_.
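The re-ranking idea can be sketched as follows: represent the query and each candidate document by the mean of their word vectors, then sort candidates by cosine similarity to the query. This is an illustrative sketch, not the notebook's implementation. A plain dict stands in for the pre-trained _GoogleNews-vectors-negative300_ embeddings, and averaging word vectors is an assumed way of building a document vector.

```python
import math


def mean_vector(tokens, embeddings):
    """Average the vectors of all tokens that have an embedding."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None  # no known token, so no vector can be built
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rerank(query_tokens, candidates, embeddings, k=50):
    """Re-rank (doc_id, tokens) candidates by similarity to the query; keep k."""
    query_vec = mean_vector(query_tokens, embeddings)
    scored = []
    for doc_id, tokens in candidates:
        doc_vec = mean_vector(tokens, embeddings)
        if query_vec is None or doc_vec is None:
            score = 0.0
        else:
            score = cosine(query_vec, doc_vec)
        scored.append((doc_id, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Documents without any known token receive a score of 0.0 here; that fallback is a design choice of this sketch.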
<u>**reranker-bertopic.ipynb**</u>
Will create topics for all documents and queries and re-rank the intial retrieval. Will retrieve 50 documents per query and save the results in _reranker_bertopic_results_topic_model.pkl_.
Creates topics for all documents and queries and re-ranks the initial retrieval results. Retrieves 50 documents per query and saves the results in _reranker_bertopic_results_topic_model.pkl_.
<u>**evaluation.ipynb**</u>
Takes the results of all three methods and calculates Recall@k, Precicsion@k, F1@k and nDCG@k. Will create plots for visualizing the results.
\ No newline at end of file
Takes the results of all three methods and calculates Recall@k, Precision@k, F1@k and nDCG@k. Creates plots for visualizing the results.
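For reference, the four metrics can be computed for a single query as in the sketch below. It assumes binary relevance judgments (a document is either relevant or not) and a ranked list of document ids; the function names are hypothetical, and the notebook may define or average the metrics differently.

```python
import math


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant."""
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant documents found in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / len(relevant)


def f1_at_k(ranked, relevant, k):
    """Harmonic mean of Precision@k and Recall@k."""
    p = precision_at_k(ranked, relevant, k)
    r = recall_at_k(ranked, relevant, k)
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


def ndcg_at_k(ranked, relevant, k):
    """nDCG@k with binary gains and a log2 position discount."""
    dcg = sum(1 / math.log2(i + 2)
              for i, doc in enumerate(ranked[:k]) if doc in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```

Averaging each metric over all queries then gives one value per method and cutoff k, which is the kind of number a comparison plot would show.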
<u>**dataset_and_bertopic_analysis.ipynb**</u>
Contains analysis of the dataset and BERTopic experiments.