LAAC-LSCP Models

Representation Extraction

Fine-tuning VTC on your own annotated data can improve results for specific recording environments or populations. This requires annotated audio (human-labeled RTTM files), a GPU with 16+ GB memory, and familiarity with deep learning training pipelines.

The fine-tuning code is at arxaqapi/vtc-finetune. The general workflow is:

1. Prepare annotated data (audio + RTTM) split into train/validation/test
2. Configure training starting from the pre-trained VTC 2.0 checkpoint
3. Train and evaluate against the base model to confirm improvement
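
The data preparation step can be sketched as follows. This is a minimal illustration, not the layout that arxaqapi/vtc-finetune expects: the directory names, the `.wav` extension, the split fractions, and the helper names `pair_recordings` and `split_recordings` are all assumptions made for the example. It pairs each audio file with the RTTM file of the same name and splits at the recording level, so that no recording appears in more than one set.

```python
import random
from pathlib import Path


def pair_recordings(audio_dir, rttm_dir, audio_ext=".wav"):
    """Return sorted recording IDs that have both an audio file and an RTTM file.

    Hypothetical helper: assumes audio and RTTM files share the same stem,
    e.g. rec01.wav and rec01.rttm.
    """
    audio_ids = {p.stem for p in Path(audio_dir).glob(f"*{audio_ext}")}
    rttm_ids = {p.stem for p in Path(rttm_dir).glob("*.rttm")}
    return sorted(audio_ids & rttm_ids)


def split_recordings(ids, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle recording IDs reproducibly and cut them into three disjoint sets.

    The fractions are illustrative defaults, not values from the VTC project.
    """
    ids = sorted(ids)                  # fixed starting order, so the seed fully determines the split
    random.Random(seed).shuffle(ids)   # local RNG; does not touch global random state
    n_val = int(len(ids) * val_frac)
    n_test = int(len(ids) * test_frac)
    return {
        "validation": ids[:n_val],
        "test": ids[n_val:n_val + n_test],
        "train": ids[n_val + n_test:],
    }


if __name__ == "__main__":
    # Assumed directory names; adjust to your own data layout.
    ids = pair_recordings("data/audio", "data/rttm")
    for name, subset in split_recordings(ids).items():
        print(name, len(subset))
```

If several recordings come from the same child or household, it may be safer to group by that identifier before splitting, so the test set measures generalization to unseen speakers rather than unseen sessions.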