Update file README.md

Commit 9f751371 authored 1 year ago by Linke, Julian
--- a/README.md
+++ b/README.md
@@ -15,28 +15,32 @@ Stay tuned for updates and we appreciate your interest in our work. Please conti
 - matplotlib
 ## Repository structure
-The repository includes data folders which you need to prepare. The repository also includes example files from the BEA corpus (Hungarian) and the GRASS corpus (Austrian German) which makes it possible to run an example from scratch. The speech data should ne stored in the folder ```BEAGR``` and should look like this:
+The repository includes a **main script** (```run.sh```), a folder named **local** containing Python scripts (```local/*.py```), and an example data folder. To work with your own data, you need to prepare a folder with the same structure.
- BEAGR/data_BEA_CS
+The example data folder includes example files from the BEA corpus (Hungarian) and the GRASS corpus (Austrian German), which makes it possible to run an experiment from scratch. In general, speech data should be stored in the folder ```DATA```; the example experiment folder ```BEAGR``` shows how a speech data folder should be structured:
+- DATA/BEAGR/data_BEA_CS
  - Various speaker (spkID1, spkID2, ...) folders
    - Various .wav or .flac files (fs=16kHz)
- BEAGR/data_BEA_RS
+- DATA/BEAGR/data_BEA_RS
  - Various speaker (spkID1, spkID2, ...) folders
    - Various .wav or .flac files (fs=16kHz)
- BEAGR/data_GR_CS
+- DATA/BEAGR/data_GR_CS
  - Various speaker (spkID1, spkID2, ...) folders
    - Various .wav or .flac files (fs=16kHz)
- BEAGR/data_GR_RS
+- DATA/BEAGR/data_GR_RS
  - Various speaker (spkID1, spkID2, ...) folders
    - Various .wav or .flac files (fs=16kHz)
-As you can see ```BEAGR``` includes the subfolders ```data_BEA_CS``` (BEA Spontaneous Speech), ```data_BEA_RS``` (BEA Read Speech), ```data_GR_CS``` (GRASS Conversational Speech) and ```data_GR_RS``` (GRASS Read Speech). **Please make sure that folders are named like this: ```data_{corpus}_{speakingstyle}```**. The audio files should have a sampling rate of 16kHz and can be .wav or .flac files. Given this structure and after installing/preparing all dependencies (see below) you should be able to run the experiment. To run a specific stage of the script for a specific dataset, provide the directory where all you data is stored (here ```BEAGR```) and an integer as an argument to the `./run.sh` command. For instance, to run stage ```3``` for the example dataset, you would use the following command:
+The example folder ```BEAGR``` (which must be placed in ```DATA/```) defines one experiment and includes the subfolders ```data_BEA_CS``` (BEA Spontaneous Speech), ```data_BEA_RS``` (BEA Read Speech), ```data_GR_CS``` (GRASS Conversational Speech) and ```data_GR_RS``` (GRASS Read Speech). **Please make sure that these folders are named like this: ```data_{corpus}_{speakingstyle}```**. The audio files should have a sampling rate of 16 kHz and can be .wav or .flac files. Given this structure, and after installing and preparing all dependencies (see below), you should be able to run the experiment.
+To run a specific stage of the script for a specific dataset, pass three arguments to the `./run.sh` command: the directory where all your data is stored (here ```DATA/BEAGR```), an experiment name (here ```BEAGR```) and the stage number as an integer. For instance, to run stage ```3``` for the example dataset ```DATA/BEAGR``` with the experiment name ```BEAGR```, you would use the following command:
 ```
-./run.sh BEAGR 3
+./run.sh DATA/BEAGR/ BEAGR 3
 ```
-The command automatically generates the experiment folder ```exp_BEAGR```. Note that stage ```0``` deletes the entire experiment folder and restarts running everything in a row.
+The command automatically generates the experiment folder ```exp_BEAGR```. Note that stage ```0``` deletes this entire experiment folder if it already exists and restarts the experiment by running all stages in a row (see the overview of the stages below).
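
Before starting a run, it can help to check that a data folder follows the layout described above. The following is only a minimal sketch and not part of the repository: the function name `check_layout` is hypothetical, and it only checks folder names and file extensions, not the 16 kHz sampling rate.

```python
import re
from pathlib import Path

# Folder names must follow data_{corpus}_{speakingstyle}, e.g. data_BEA_CS.
FOLDER_PATTERN = re.compile(r"^data_([^_]+)_([^_]+)$")
AUDIO_SUFFIXES = {".wav", ".flac"}


def check_layout(data_dir):
    """Return a list of human-readable problems found under data_dir.

    Hypothetical helper: an empty list means the layout looks as expected.
    """
    root = Path(data_dir)
    if not root.is_dir():
        return [f"{root} is not a directory"]

    problems = []
    subsets = sorted(p for p in root.iterdir() if p.is_dir())
    if not subsets:
        problems.append(f"{root} contains no subfolders")

    for subset in subsets:
        if not FOLDER_PATTERN.match(subset.name):
            problems.append(
                f"{subset.name}: expected data_{{corpus}}_{{speakingstyle}}"
            )
            continue
        speakers = sorted(p for p in subset.iterdir() if p.is_dir())
        if not speakers:
            problems.append(f"{subset.name}: no speaker folders")
        for speaker in speakers:
            # Each speaker folder should hold at least one .wav or .flac file.
            has_audio = any(
                f.is_file() and f.suffix.lower() in AUDIO_SUFFIXES
                for f in speaker.iterdir()
            )
            if not has_audio:
                problems.append(
                    f"{subset.name}/{speaker.name}: no .wav or .flac files"
                )
    return problems
```

Calling `check_layout("DATA/BEAGR")` from the repository root would then list any folders that deviate from the naming scheme or lack audio files.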
 ## Reproduction
 The following steps are necessary to reproduce the experiment. First, you need to create a conda environment and install the necessary packages. Second, you have to clone the fairseq repository and modify the file ```path.sh``` to export the necessary environment variables. 
@@ -60,6 +64,8 @@ source */anaconda3/etc/profile.d/conda.sh
 conda activate speechcodebookanalysis
 ```
+The file ```conda.sh``` is sourced at the beginning of ```run.sh```.
 ### Fairseq Repository
 You need to clone the fairseq repository to another directory (e.g., ```../fairseq```).
@@ -67,7 +73,16 @@ You need to clone the fairseq repository to another directory (e.g., ```../fairs
 git clone https://github.com/facebookresearch/fairseq.git
 ``` 
-Make sure to modify the file ```path.sh``` in order to export the necessary environment variables.
+Make sure to modify the file ```path.sh``` in order to export the necessary environment variables. The file ```path.sh``` is also sourced at the beginning of ```run.sh```.
 #### Stages
+Here is a short overview of the stages:
+- stage 0: deletes experiment folder if it exists and runs all subsequent stages in a row
+- stage 1: prepares the data given an experiment folder (e.g., ```DATA/BEAGR```); resulting files are stored in ```exp_*/data/```
+- stage 2: counts frequencies of used codebook entries per speaker; if VERBOSE is true this stage also generates log files; **ATTENTION:** if you need to extract features with a CPU, set ```device = torch.device('cpu')``` in the script ```local/codebook_freqs.py``` (default is ```device = torch.device('cuda')```); resulting files are stored in ```exp_*/logs/```, ```exp_*/numpy/``` and ```exp_*/txt/```
+- stage 3: prepares and stores a similarity matrix in the folder ```exp_*/numpy/```
+- stage 4: performs a PCA on the similarity matrix and plots the PCA space; resulting ```*.png```-files are stored in ```exp_*/plots/analysis/```
+- stage 5: performs k-means on the resulting PCA space and assigns classes; **ATTENTION:** the parameter ```nclust``` in the script ```run.sh``` specifies the number of allowed clusters which should be modified depending on the task; resulting ```*.png```-files (confusion matrices) are stored in ```exp_*/plots/kmeans/```
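
To illustrate the idea behind stages 2 and 3, here is a small sketch that counts how often each codebook entry is used per speaker and compares speakers through a similarity matrix. It is not the code in ```local/```: all function names are hypothetical, the codebook indices are toy data, and cosine similarity is an assumption, since the overview above does not state which similarity measure the scripts use.

```python
import math
from collections import Counter


def relative_frequencies(indices, codebook_size):
    """Turn a sequence of codebook indices into a relative-frequency vector."""
    counts = Counter(indices)
    total = len(indices)
    if total == 0:
        return [0.0] * codebook_size
    return [counts.get(i, 0) / total for i in range(codebook_size)]


def cosine_similarity(u, v):
    """Cosine similarity of two vectors (assumed measure, see lead-in)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(freqs_by_speaker):
    """Return the sorted speaker IDs and their pairwise similarity matrix."""
    speakers = sorted(freqs_by_speaker)
    matrix = [
        [cosine_similarity(freqs_by_speaker[a], freqs_by_speaker[b])
         for b in speakers]
        for a in speakers
    ]
    return speakers, matrix


# Toy codebook indices per speaker (made up for illustration only).
toy_indices = {
    "spkID1": [0, 0, 1, 2],
    "spkID2": [0, 0, 1, 2],
    "spkID3": [3, 3, 3, 3],
}
freqs = {
    spk: relative_frequencies(idx, codebook_size=4)
    for spk, idx in toy_indices.items()
}
speakers, sim = similarity_matrix(freqs)
```

In this toy example, speakers with identical codebook usage get a similarity close to 1, while speakers that share no codebook entries get 0. A matrix of this kind is what a subsequent PCA and k-means step would operate on.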