
SpeechCodebookAnalysis

Hello and welcome to our project! Here's a brief introduction about what you can expect:

This project contains the code related to the analytical section of our research paper, "What do self-supervised speech representations encode? An analysis of languages, varieties, speaking styles and speakers", which has been accepted for Interspeech 2023 in Dublin.

As of now, this project is a placeholder. We're still in the process of polishing the final details. However, we assure you that the complete project will be up and running before the conference commences.

Stay tuned for updates and we appreciate your interest in our work. Please continue exploring this README for more details on the project setup, codebase, and how to navigate through it.

Dependencies

  • Python 3.8
  • fairseq
  • matplotlib
  • scikit-learn
  • faiss-cpu

Repository structure

The repository includes data folders which you need to prepare. It also includes example files from the BEA corpus (Hungarian) and the GRASS corpus (Austrian German), which make it possible to run an example from scratch. The speech data should be stored in the folder BEAGR and should look like this:

  • BEAGR/data_BEA_CS
    • Various speaker (spkID1, spkID2, ...) folders
      • Various .wav or .flac files (fs=16kHz)
  • BEAGR/data_BEA_RS
    • Various speaker (spkID1, spkID2, ...) folders
      • Various .wav or .flac files (fs=16kHz)
  • BEAGR/data_GR_CS
    • Various speaker (spkID1, spkID2, ...) folders
      • Various .wav or .flac files (fs=16kHz)
  • BEAGR/data_GR_RS
    • Various speaker (spkID1, spkID2, ...) folders
      • Various .wav or .flac files (fs=16kHz)

As you can see, BEAGR includes the subfolders data_BEA_CS (BEA Spontaneous Speech), data_BEA_RS (BEA Read Speech), data_GR_CS (GRASS Conversational Speech) and data_GR_RS (GRASS Read Speech). Please make sure that the folders are named according to the pattern data_{corpus}_{speakingstyle}. The audio files should have a sampling rate of 16 kHz and can be .wav or .flac files.

Given this structure, and after installing and preparing all dependencies (see below), you should be able to run the experiment. To run a specific stage of the script for a specific dataset, pass the directory where all your data is stored (here BEAGR) and the stage number as an integer to the ./run.sh command. For instance, to run stage 3 for the example dataset, you would use the following command:

./run.sh BEAGR 3

The command automatically generates the experiment folder exp_BEAGR. Note that stage 0 runs all stages in sequence.
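
Before running a stage, it can help to confirm that the data folder follows the layout described above. The following is a minimal sketch, not part of the repository: the function name check_layout is hypothetical, and it assumes that the top-level folder contains only data_{corpus}_{speakingstyle} subfolders and that corpus and speaking style names contain no underscores. It reads the sampling rate of .wav files with the Python standard library only; .flac files are counted but their sampling rate is not inspected, since that would require an additional library.

```python
import re
import wave
from pathlib import Path
from typing import List

# Folder names must follow data_{corpus}_{speakingstyle}, e.g. data_BEA_CS.
# Assumption: corpus and speaking style names contain no underscores.
FOLDER_PATTERN = re.compile(r"^data_[^_]+_[^_]+$")
AUDIO_SUFFIXES = {".wav", ".flac"}
EXPECTED_RATE = 16000  # 16 kHz, as required above


def check_layout(root: str) -> List[str]:
    """Return a list of human-readable problems found under `root`."""
    root_path = Path(root)
    if not root_path.is_dir():
        return [f"{root} is not a directory"]

    problems = []
    for subset in sorted(p for p in root_path.iterdir() if p.is_dir()):
        if not FOLDER_PATTERN.match(subset.name):
            problems.append(f"unexpected folder name: {subset.name}")
            continue
        speakers = sorted(p for p in subset.iterdir() if p.is_dir())
        if not speakers:
            problems.append(f"no speaker folders in {subset.name}")
        for speaker in speakers:
            audio = sorted(
                f for f in speaker.iterdir()
                if f.is_file() and f.suffix.lower() in AUDIO_SUFFIXES
            )
            if not audio:
                problems.append(f"no audio files in {subset.name}/{speaker.name}")
            for f in audio:
                if f.suffix.lower() != ".wav":
                    continue  # .flac headers are not read in this sketch
                try:
                    with wave.open(str(f), "rb") as w:
                        rate = w.getframerate()
                except wave.Error as err:
                    problems.append(f"{f}: cannot read WAV header ({err})")
                    continue
                if rate != EXPECTED_RATE:
                    problems.append(f"{f}: sampling rate is {rate} Hz")
    return problems
```

Calling check_layout("BEAGR") returns an empty list when no problems are detected; otherwise each entry describes one folder or file to fix.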

Reproduction

The following steps are necessary to reproduce the experiment. First, create a conda environment and install the necessary packages. Second, clone the fairseq repository and modify the file path.sh so that it exports the necessary environment variables.

Conda environment

Create the environment and install the required packages with the following commands:

conda create -n speechcodebookanalysis python=3.8
conda activate speechcodebookanalysis
pip install fairseq
pip install matplotlib
pip install scikit-learn
pip install faiss-cpu
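
After installation, a quick check that the packages can be found from within the activated environment may save time later. The snippet below is a small sketch rather than part of the repository; the helper name missing_packages is hypothetical. It relies on the fact that scikit-learn is imported as sklearn and faiss-cpu as faiss.

```python
import importlib.util
from typing import Dict, List

# Map pip package names (from the install commands above) to import names.
PACKAGES: Dict[str, str] = {
    "fairseq": "fairseq",
    "matplotlib": "matplotlib",
    "scikit-learn": "sklearn",
    "faiss-cpu": "faiss",
}


def missing_packages(packages: Dict[str, str] = PACKAGES) -> List[str]:
    """Return the pip names of packages whose modules cannot be found."""
    return [
        pip_name
        for pip_name, module in packages.items()
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(module) is None
    ]
```

Running missing_packages() inside the speechcodebookanalysis environment should return an empty list if all installs succeeded; any names it returns can be installed again with pip.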

Fairseq Repository

You need to clone the fairseq repository to another directory (e.g., ../fairseq). The file path.sh needs to be modified in order to export the necessary environment variables.

git clone https://github.com/facebookresearch/fairseq.git

Stages