Forensic Automatic Speaker Recognition: Analyzing codecs for calibration and their impact on system performance
Njegovec, Vendela (2025)
This study analyzes the impact of audio codecs on the calibration performance of forensic automatic speaker recognition, addressing the challenges posed by mismatched conditions. Using the NFI-FRIDA (Netherlands Forensic Institute - Forensically Realistic Inter-Device Audio) database, a collection of speech recordings captured simultaneously with multiple recording devices relevant to forensic analysis, high-quality audio samples are processed through various codecs to simulate real telephone speech and are compared to actual telephone intercepts. All experiments use VOCALISE (Voice Comparison and Analysis of the Likelihood of Speech Evidence), an x-vector-based automatic speaker recognition system, and system performance is measured in terms of calibration loss and the log-likelihood-ratio cost (Cllr). The study reveals a significant performance loss due to codec mismatches and emphasizes the complexity of simulating telephone speech and replicating real-world telephony conditions. Additionally, the study highlights the potential of cross-processing datasets with mismatched codecs to lower the calibration loss.
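For readers unfamiliar with the two performance measures named in the abstract, the following is a minimal sketch of how they are commonly computed, not the implementation used in the thesis or in VOCALISE. It assumes the standard definition of the log-likelihood-ratio cost (Cllr) over same-speaker and different-speaker likelihood ratios, and takes calibration loss to be the difference between Cllr and its minimum attainable value (Cllr_min), which is assumed here to be supplied from elsewhere. The function names and the example likelihood ratios are hypothetical.

```python
import math


def cllr(same_speaker_lrs, different_speaker_lrs):
    """Log-likelihood-ratio cost for likelihood ratios (not log LRs).

    Same-speaker trials are penalized for low LRs and
    different-speaker trials are penalized for high LRs.
    """
    if not same_speaker_lrs or not different_speaker_lrs:
        raise ValueError("both LR lists must be non-empty")
    ss_penalty = sum(math.log2(1 + 1 / lr) for lr in same_speaker_lrs)
    ss_penalty /= len(same_speaker_lrs)
    ds_penalty = sum(math.log2(1 + lr) for lr in different_speaker_lrs)
    ds_penalty /= len(different_speaker_lrs)
    return 0.5 * (ss_penalty + ds_penalty)


def calibration_loss(cllr_value, cllr_min_value):
    """Calibration loss as Cllr minus Cllr_min (assumed definition)."""
    return cllr_value - cllr_min_value


# Hypothetical likelihood ratios, for illustration only.
uninformative = cllr([1.0, 1.0], [1.0])
informative = cllr([100.0, 50.0], [0.01, 0.02])
```

A system that always outputs a likelihood ratio of 1 carries no information and obtains a Cllr of exactly 1; lower values indicate likelihood ratios that are both more discriminating and better calibrated.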
njegovec_MA_EEMCS.pdf