Getting Started¶
This page walks you through installing VTC and preparing your audio files. By the end, your machine will be ready to run VTC and your recordings will be in the format VTC expects.
If anything goes wrong, see the Common errors table at the bottom of this page, or the more detailed Troubleshooting page.
Requirements¶
What you need before installing VTC: a supported operating system and three small command-line tools.
- Operating system: Linux or macOS. Windows is not supported. If you only have a Windows machine, the easiest workaround is to use Windows Subsystem for Linux (WSL).
- Three small tools that VTC depends on:
You don't need to install Python yourself
uv handles the correct Python version and all of its packages automatically. Just install uv (instructions below) and you're set.
Installation¶
Copy and paste the block matching your operating system into a terminal. The whole process is about five commands.
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
# Install ffmpeg and git-lfs (Ubuntu/Debian)
sudo apt install ffmpeg git-lfs
# Clone the repo (--recurse-submodules is required for model weights)
git lfs install
git clone --recurse-submodules https://github.com/LAAC-LSCP/VTC.git
cd VTC
# Install Python dependencies
uv sync
# Verify everything is set up
./check_sys_dependencies.sh
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
# Install brew (a package manager for macOS)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# Install ffmpeg and git-lfs (macOS)
brew install ffmpeg git-lfs
# Clone the repo (--recurse-submodules is required for model weights)
git lfs install
git clone --recurse-submodules https://github.com/LAAC-LSCP/VTC.git
cd VTC
# Install Python dependencies
uv sync
# Verify everything is set up
./check_sys_dependencies.sh
Don't skip --recurse-submodules
The trained VTC model is stored as a separate piece (a "submodule") inside the repository. Without --recurse-submodules, that piece stays empty, the model weights never get downloaded, and VTC will fail when you try to run it.
If you've already cloned without that flag, you can fix it after the fact:
Prepare your audio¶
VTC only reads WAV files at 16 kHz, mono. This section shows you how to organise your files and how to convert them if they're in a different format.
Recommended folder structure¶
Before running VTC, organise your files so that all the recordings you want to process live together in one folder. A simple layout:
my_project/
├── audio/ # Your WAV files go here
│ ├── child01_day1.wav
│ ├── child01_day2.wav
│ └── child02_day1.wav
└── output/ # VTC will write results here
├── rttm/
│ ├── child01_day1.rttm
│ └── ...
└── rttm.csv
Place all of your .wav files inside a single folder (e.g., audio/). When you run VTC, you point it at this folder, and VTC processes every .wav file it finds inside. Results are written to a separate folder that VTC creates for you.
If your audio is organised in subfolders
If your recordings are sorted into subfolders (e.g., one subfolder per child), use the --recursive_search flag to tell VTC to look inside subfolders too. See the Command Line Interface Arguments for details.
Audio format requirements¶
VTC expects each recording to be:
- a WAV file (uncompressed audio),
- sampled at 16 kHz (16 000 samples per second), and
- with a single audio channel (mono — not stereo).
You can check a file's format with the ffprobe tool (it comes with ffmpeg):
ffprobe prints a lot of information; the line you want starts with Stream #0:0: Audio: and looks something like this:
What to look for on that line:
pcm_s16le— confirms the file is an uncompressed WAV.16000 Hz— the sample rate. It must be exactly 16 kHz (16000 Hz).monoor1 channels— the file has a single channel. (Depending on yourffprobeversion, this appears either as the wordmonoor as1 channels. If you seestereo,2 channels, or anything higher than 1, the file has multiple channels and needs converting.)
If any of these don't match, convert your file before running VTC. Either of the two methods below will resample to 16 kHz and combine multiple channels into a single mono channel:
Common errors¶
Quick fixes for the problems most people run into. If your issue isn't listed here, see the full Troubleshooting page.
| Problem | Likely cause | Fix |
|---|---|---|
uv: command not found |
uv is not installed | See the uv installation guide |
ffmpeg: command not found |
ffmpeg is not installed | sudo apt install ffmpeg (Linux) or brew install ffmpeg (macOS) |
| Model weights missing | Cloned without --recurse-submodules |
Run git lfs install && git submodule update --init --recursive |
CUDA out of memory |
Batch size too large for your GPU | Add --batch_size 64 (or lower) to your command, or use --device cpu |
No .wav files found |
Wrong folder, or files are in a different format | Check that the path passed to --wavs contains .wav files |