Version History

This page gives a short history of VTC versions, the differences between them, and what changes if you're migrating from an older version. It also compares VTC with the LENA system, which is often used in the same research field.

VTC versions

Each version is a re-trained model. The most recent ones use a stronger underlying speech model, and average F1 on the BabyTrain-2025 validation set rises from 50.9% for VTC 1.0 to 66.9% for VTC 2.1.

	VTC 1.0 (2020)	VTC 1.5 (2025)	VTC 2.0 (2025)	VTC 2.1 (2025)
Underlying speech model	PyanNet	Whisper-based	BabyHuBERT	BabyHuBERT
Average F1 (BabyTrain-2025 validation set)	50.9%	53.6%	64.6%	66.9%
Speaker labels	CHI(KCHI), OCH, MAL, FEM, SPEECH	KCHI, OCH, MAL, FEM	KCHI, OCH, MAL, FEM	KCHI, OCH, MAL, FEM
Python version	3.7+ (conda)	3.13+ (uv)	3.13+ (uv)	3.13+ (uv)
Repository	MarvinLvn/voice-type-classifier	LAAC-LSCP/VTC-IS-25	LAAC-LSCP/VTC	LAAC-LSCP/VTC

VTC 2.x is built on BabyHuBERT, a self-supervised speech model trained specifically on child-centred audio, which is the main reason for the accuracy jump over earlier versions. The largest gains over VTC 1.0 are in the other-children (OCH) and adult-male (MAL) categories: on BabyTrain-2025, F1 for each is more than 20 percentage points higher in VTC 2.1 than in VTC 1.0. (See the accuracy table on the home page for the per-class numbers.)

What "F1 = 66.9%" means in practice

For each category, F1 combines precision and recall into one number (their harmonic mean); the headline figure is the average across the four categories. A higher number means the model agrees more often with human annotators, but it does not translate directly into "right 66.9% of the time" or "wrong 33.1% of the time." For a richer breakdown, see the per-category numbers and discussion on the home page.
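As a rough illustration of how a per-category F1 and its average can be computed, the sketch below uses the standard definition of F1 as the harmonic mean of precision and recall. The precision and recall values are made-up placeholders, not VTC results, and the unweighted mean across categories is an assumption based on the "averaged across the four" wording above.

```python
from statistics import mean


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical (precision, recall) pairs per label; NOT real VTC numbers.
example_scores = {
    "KCHI": (0.80, 0.75),
    "OCH": (0.50, 0.40),
    "MAL": (0.60, 0.55),
    "FEM": (0.85, 0.80),
}

# One F1 value per category, then an unweighted average across categories.
per_class_f1 = {
    label: f1_score(p, r) for label, (p, r) in example_scores.items()
}
average_f1 = mean(per_class_f1.values())

for label, score in per_class_f1.items():
    print(f"{label}: F1 = {score:.3f}")
print(f"Average F1 = {average_f1:.3f}")
```

Because F1 blends two different kinds of error (missed speech and false alarms), very different precision and recall combinations can produce the same score, which is why it cannot be read as a simple percentage of correct decisions.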

Migrating from VTC 1.0

If you're moving from the original VTC, you'll need to change the following in your scripts and pipelines:

The label set has changed. The combined CHI label (which lumped KCHI and OCH together) and the SPEECH label no longer exist. If you depend on the combined CHI label, merge KCHI and OCH in your downstream scripts.
RTTM output is retained. VTC 2 still writes RTTM files and additionally produces a CSV.
Fresh install required. VTC 2 cannot run inside a VTC 1.0 conda environment. Follow the Getting Started page to set up a clean install with uv.
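For pipelines that still expect a combined child label, the relabelling step might look like the sketch below. It assumes the standard ten-field RTTM layout with the label in the speaker-name field (the eighth field); check this against your own VTC 2 output before relying on it. The file ID `rec01`, the timestamps, and the helper names are hypothetical. The sketch only renames labels and does not merge KCHI and OCH segments that overlap in time.

```python
MERGED_LABEL = "CHI"            # name for the recombined child label
CHILD_LABELS = {"KCHI", "OCH"}  # VTC 2 labels to fold together
SPEAKER_FIELD = 7               # 0-based index of the speaker name in a standard RTTM line


def relabel_rttm_line(line: str) -> str:
    """Rename KCHI/OCH to the merged label on a single RTTM line."""
    fields = line.split()
    is_speaker_line = len(fields) > SPEAKER_FIELD and fields[0] == "SPEAKER"
    if is_speaker_line and fields[SPEAKER_FIELD] in CHILD_LABELS:
        fields[SPEAKER_FIELD] = MERGED_LABEL
    return " ".join(fields)


def relabel_rttm(text: str) -> str:
    """Apply the relabelling to every non-empty line of an RTTM file's text."""
    lines = [line for line in text.splitlines() if line.strip()]
    return "\n".join(relabel_rttm_line(line) for line in lines)


# Made-up example content, for illustration only.
example = """\
SPEAKER rec01 1 0.500 1.200 <NA> <NA> KCHI <NA> <NA>
SPEAKER rec01 1 2.000 0.800 <NA> <NA> OCH <NA> <NA>
SPEAKER rec01 1 3.100 1.500 <NA> <NA> FEM <NA> <NA>
"""

print(relabel_rttm(example))
```

If your downstream code counts child segments or sums child speech duration, overlapping KCHI and OCH segments would need to be merged after relabelling to avoid double counting.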

VTC vs. LENA

VTC and LENA both label speakers in child-centred recordings, but they are different products with different licensing, hardware, and category schemes. They are not interchangeable.

	VTC 2	LENA
Cost	Free, open-source	Commercial
Hardware	Any Unix machine; runs on any WAV file	Requires a LENA-branded recorder
Speaker classes	KCHI, OCH, MAL, FEM	CHN, CXN, MAN, FAN (plus others)
Transparency	Code and model weights publicly available	Proprietary
Input format	Any WAV audio (16 kHz mono)	LENA `.its` files
Customisable to your data	Yes — via fine-tuning or threshold re-tuning	No
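The table lists 16 kHz mono WAV as the VTC input format. Whether VTC resamples other inputs itself is not stated here, so a quick header check before processing can be useful. The sketch below uses only Python's standard `wave` module, which reads uncompressed PCM WAV files; the helper names and the demo files are hypothetical.

```python
import tempfile
import wave
from pathlib import Path

EXPECTED_RATE = 16_000   # Hz, from the "16 kHz mono" note in the table
EXPECTED_CHANNELS = 1    # mono


def is_16k_mono(path) -> bool:
    """Return True if the WAV header reports 16 kHz, single-channel audio."""
    with wave.open(str(path), "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE
            and wav.getnchannels() == EXPECTED_CHANNELS
        )


def write_silence(path, rate: int, channels: int, seconds: float = 0.1) -> None:
    """Write a short silent 16-bit PCM WAV file (used only for the demo)."""
    n_frames = int(rate * seconds)
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * n_frames * channels)


# Demo on two throwaway files: one matching the expected format, one not.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.wav"
    bad = Path(tmp) / "bad.wav"
    write_silence(good, rate=16_000, channels=1)
    write_silence(bad, rate=44_100, channels=2)
    print(good.name, is_16k_mono(good))
    print(bad.name, is_16k_mono(bad))
```

Files that fail the check can be converted with an audio tool of your choice before being passed to VTC.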

VTC is not a drop-in replacement for LENA. The categories are similar in spirit but not equivalent, and counts produced by the two systems on the same audio will differ. If your study compares against or reuses LENA results, treat any cross-system comparison with care. For detailed accuracy comparisons between the two systems, see the ExELang book.