Version History
This page gives a short history of VTC versions, the differences between them, and what changes when you migrate from an older version. It also compares VTC with the LENA system, which is often used in the same research field.
VTC versions
Each version is a re-trained model. VTC 2.0 and 2.1 use a stronger underlying speech model (BabyHuBERT) and reach an average F1 roughly 11 to 16 percentage points higher than VTC 1.5 and VTC 1.0.
| | VTC 1.0 (2020) | VTC 1.5 (2025) | VTC 2.0 (2025) | VTC 2.1 (2025) |
|---|---|---|---|---|
| Underlying speech model | PyanNet | Whisper-based | BabyHuBERT | BabyHuBERT |
| Average F1 (BabyTrain-2025 validation set) | 50.9% | 53.6% | 64.6% | 66.9% |
| Speaker labels | CHI(KCHI), OCH, MAL, FEM, SPEECH | KCHI, OCH, MAL, FEM | KCHI, OCH, MAL, FEM | KCHI, OCH, MAL, FEM |
| Python version | 3.7+ (conda) | 3.13+ (uv) | 3.13+ (uv) | 3.13+ (uv) |
| Repository | MarvinLvn/voice-type-classifier | LAAC-LSCP/VTC-IS-25 | LAAC-LSCP/VTC | LAAC-LSCP/VTC |
VTC 2.x is built on BabyHuBERT, a self-supervised speech model trained specifically on child-centred audio, which is the main reason for the accuracy jump from earlier versions. The largest gains over VTC 1.0 are in the other-children (OCH) and adult-male (MAL) categories: on BabyTrain-2025, F1 for each is more than 20 percentage points higher in VTC 2.1 than in VTC 1.0. (See the accuracy table on the home page for the per-class numbers.)
What "F1 = 66.9%" means in practice
F1 is a single summary number per category, averaged across the four. A higher number means the model agrees more often with human annotators; it does not translate directly into "right 66.9% of the time" or "wrong 33.1% of the time." For a richer breakdown, see the per-category numbers and discussion on the home page.
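To make the averaging concrete, here is a minimal sketch of how a per-category F1 and its average across the four categories can be computed. It uses the generic definition of F1 (harmonic mean of precision and recall). The counts are hypothetical and chosen only for illustration, and the exact unit VTC's evaluation counts (frames, seconds, or segments) is not specified here, so treat this as an assumption rather than the project's evaluation code.

```python
def f1(tp: float, fp: float, fn: float) -> float:
    """F1 for one category: harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0  # no correct detections, so F1 is zero by convention
    precision = tp / (tp + fp)  # share of predicted units that were correct
    recall = tp / (tp + fn)     # share of annotated units that were found
    return 2 * precision * recall / (precision + recall)


def macro_f1(per_class_counts: dict) -> float:
    """Unweighted average of the per-category F1 scores."""
    scores = [f1(*counts) for counts in per_class_counts.values()]
    return sum(scores) / len(scores)


# Hypothetical (true positive, false positive, false negative) counts.
counts = {
    "KCHI": (80, 20, 20),
    "OCH": (30, 30, 30),
    "MAL": (50, 25, 25),
    "FEM": (90, 10, 10),
}

for label, c in counts.items():
    print(f"{label}: F1 = {f1(*c):.3f}")
print(f"Average F1 = {macro_f1(counts):.3f}")
```

Because the average is unweighted, a weak category such as OCH pulls the headline number down even when it accounts for little of the audio, which is one reason the figure does not read as "right X% of the time."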
Migrating from VTC 1.0
What you'll need to change in your scripts and pipelines if you're moving from the original VTC.
- Label set has changed. The combined `CHI` label (which lumped KCHI + OCH together) and the `SPEECH` label no longer exist. If you depend on either, combine `KCHI` and `OCH` in your downstream scripts.
- Output format is unchanged: VTC 2 still writes RTTM, and now additionally produces a CSV.
- Fresh install required. VTC 2 cannot run inside a VTC 1.0 conda environment. Follow the Getting Started page to set up a clean install with `uv`.
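As an illustration of the label change, the sketch below rewrites `KCHI` and `OCH` to a single merged label in an RTTM file. It assumes the conventional RTTM layout, in which each `SPEAKER` line has the label in the eighth whitespace-separated field; check this against your own VTC 2 output before relying on it. The function names and the merged label name `CHI` are hypothetical choices for this example, not part of VTC.

```python
CHILD_LABELS = {"KCHI", "OCH"}


def relabel_rttm_line(line: str, merged_label: str = "CHI") -> str:
    """Replace KCHI/OCH with `merged_label` on one RTTM line.

    Assumes the standard RTTM layout, where field 8 (index 7) holds the
    speaker label. Lines that do not match are returned unchanged.
    """
    fields = line.split()
    if len(fields) >= 8 and fields[0] == "SPEAKER" and fields[7] in CHILD_LABELS:
        fields[7] = merged_label
        return " ".join(fields)
    return line.rstrip("\n")


def relabel_rttm_file(in_path: str, out_path: str, merged_label: str = "CHI") -> None:
    """Write a copy of the RTTM at `in_path` with child labels merged."""
    with open(in_path, encoding="utf-8") as src, open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(relabel_rttm_line(line, merged_label) + "\n")


example = "SPEAKER rec1 1 0.50 1.20 <NA> <NA> KCHI <NA> <NA>"
print(relabel_rttm_line(example))
```

This only renames labels. Overlapping `KCHI` and `OCH` segments stay as separate lines, so durations summed over the merged label can count the overlap twice unless you also merge the intervals.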
VTC vs. LENA
VTC and LENA both label speakers in child-centred recordings, but they are different products with different licensing, hardware, and category schemes. They are not interchangeable.
| | VTC 2 | LENA |
|---|---|---|
| Cost | Free, open-source | Commercial |
| Hardware | Any Unix machine; runs on any WAV file | Requires a LENA-branded recorder |
| Speaker classes | KCHI, OCH, MAL, FEM | CHN, CXN, MAN, FAN (plus others) |
| Transparency | Code and model weights publicly available | Proprietary |
| Input format | Any WAV audio (16 kHz mono) | LENA .its files |
| Customisable to your data | Yes — via fine-tuning or threshold re-tuning | No |
VTC is not a drop-in replacement for LENA. The categories are similar in spirit but not equivalent, and counts produced by the two systems on the same audio will differ. If your study is compared against LENA results or reuses them, treat any cross-system comparison with care. For detailed accuracy comparisons between the two systems, see the ExELang book.
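If you are preparing audio from another source, the "Input format" row in the table above suggests checking that files are 16 kHz mono WAV before running VTC. The sketch below does this with Python's standard `wave` module. The helper name `is_16k_mono` is hypothetical, and whether VTC resamples other inputs itself is not stated here, so treat the check as a precaution rather than a documented requirement.

```python
import wave


def is_16k_mono(path: str) -> bool:
    """Return True if `path` is a PCM WAV file with one channel at 16 kHz.

    The standard-library `wave` module only reads uncompressed PCM WAV,
    and raises `wave.Error` for other encodings.
    """
    with wave.open(path, "rb") as wav:
        return wav.getnchannels() == 1 and wav.getframerate() == 16000
```

A file that fails the check can be converted with any audio tool you already use before it is passed to VTC.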