University of Twente Student Theses


From scan to speech : articulation analysis from real-time vocal tract MRI

Leeuwen, K.G. van (2019) From scan to speech : articulation analysis from real-time vocal tract MRI.

Abstract: Speech production is a complex process that draws much attention from researchers. When speech is altered due to, for example, cancer of the tongue, lips or palate, it is important to understand how articulation has changed in order to provide optimal rehabilitation therapy. In this thesis, we aimed to develop a methodology that enables objective and replicable assessment of speech articulation. Real-time magnetic resonance imaging (rtMRI) was chosen as a means to acquire articulatory information during speech because of its good soft-tissue contrast and non-invasiveness. The USC Speech and Vocal Tract Morphology MRI Database, which contains MR data from seventeen healthy American-English-speaking subjects, was used throughout this thesis. A preliminary study was performed to demonstrate that relevant articulatory information is present in the MRI data: we trained a deep learning network to predict, from a single MR image, the phoneme being articulated. Analysis of the network revealed that it had learned relations between vowels similar to those known to phoneticians. During articulation, the vocal tract changes shape, and through these changes sound is transformed into speech. To extract quantitative information on articulation, we segmented the vocal tract in every rtMRI frame of the dataset with the Chan-Vese level set method. We used Bayesian hyperparameter optimization to learn optimal parameters for the level set and the image preprocessing. With this method, we showed that all frames could be segmented while requiring only a single manual segmentation per subject, achieving a Dice score of 95.6% and a mean surface distance of 1.8 mm. From these vocal tract segmentations, we subsequently derived the vocal tract distance function: the centerline of the vocal tract was found, and a grid was projected from which the width of the vocal tract was deduced for each frame.
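The two evaluation metrics quoted above (Dice score and mean surface distance) can be made concrete with a short NumPy sketch. This is not the thesis code: it assumes binary 2-D masks of equal shape and an isotropic pixel size (the hypothetical `spacing_mm` parameter), and it uses one common symmetric definition of the mean surface distance, averaging nearest-boundary distances over the boundary pixels of both masks. Other definitions exist, and the exact variant used in the thesis is not stated here.

```python
import numpy as np


def dice_score(a: np.ndarray, b: np.ndarray) -> float:
    """Dice similarity coefficient 2|A & B| / (|A| + |B|) for two binary masks."""
    a = a.astype(bool)
    b = b.astype(bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(2.0 * np.logical_and(a, b).sum() / total)


def boundary_points(mask: np.ndarray) -> np.ndarray:
    """(row, col) coordinates of foreground pixels that have at least one
    4-connected background neighbour (pixels outside the image count as background)."""
    m = mask.astype(bool)
    p = np.pad(m, 1, constant_values=False)
    # A pixel is interior if its up, down, left and right neighbours are all foreground.
    interior = p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    return np.argwhere(m & ~interior)


def mean_surface_distance(a: np.ndarray, b: np.ndarray, spacing_mm: float = 1.0) -> float:
    """Symmetric mean surface distance in mm (assumes both masks are non-empty).
    Brute-force pairwise distances: fine for small masks, not for large volumes."""
    pa = boundary_points(a) * spacing_mm
    pb = boundary_points(b) * spacing_mm
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)
    # Nearest distance from every boundary pixel of A to B, and of B to A.
    return float((d.min(axis=1).sum() + d.min(axis=0).sum()) / (len(pa) + len(pb)))


# Toy example: a reference square and a segmentation shifted by one pixel.
ref = np.zeros((10, 10), dtype=bool)
ref[2:8, 2:8] = True
seg = np.zeros((10, 10), dtype=bool)
seg[2:8, 3:9] = True
print(round(dice_score(ref, seg), 3))  # → 0.833
print(mean_surface_distance(ref, seg, spacing_mm=1.0))
```

For real rtMRI frames, a distance-transform based implementation would scale better than the pairwise distance matrix used here; the brute-force form is chosen only because it makes the definition easy to read.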
We have proposed several ways of visualizing the vocal tract dynamics, such that the location of articulation for different phonemes can be studied and differences in articulation space can be observed. With this thesis, we developed a methodology to extract the vocal tract distance function from a large rtMRI dataset. Whereas most studies focus on either a single time point or a single location within the vocal tract over time, the method proposed here captures both the temporal and the spatial dimension. The enriched dataset is made publicly available to support articulatory studies in healthy subjects and to broaden our understanding of articulation in speech. The tool itself has the potential to be used in clinical practice by aiding speech therapists in assessing a patient's articulation abilities. By performing pre- and post-intervention measurements, the effect of treatment can be studied and a personalized rehabilitation plan proposed.
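To illustrate what a per-frame vocal tract distance function is, the sketch below approximates the width of a segmented airway along a set of gridlines and compares two measurements. It is a simplified stand-in, not the author's method: the gridline endpoints are assumed to be given (in the thesis they come from the vocal tract centerline and projected grid), the names `gridline_width` and `distance_function` are hypothetical, and the width is estimated by sampling the mask at evenly spaced points along each line, so a gridline that crosses two separate airway regions would count both.

```python
import numpy as np


def gridline_width(mask, p0, p1, pixel_mm=1.0, n_samples=200):
    """Approximate airway width along one gridline from p0 to p1 ((row, col) pixel
    coordinates): fraction of samples inside the mask times the gridline length."""
    p0 = np.asarray(p0, dtype=float)
    p1 = np.asarray(p1, dtype=float)
    t = np.linspace(0.0, 1.0, n_samples)
    pts = p0[None, :] + t[:, None] * (p1 - p0)[None, :]
    # Nearest-pixel lookup, clipped to the image bounds.
    rows = np.clip(np.rint(pts[:, 0]).astype(int), 0, mask.shape[0] - 1)
    cols = np.clip(np.rint(pts[:, 1]).astype(int), 0, mask.shape[1] - 1)
    inside = mask[rows, cols].astype(bool)
    length_mm = np.linalg.norm(p1 - p0) * pixel_mm
    return float(inside.mean() * length_mm)


def distance_function(mask, gridlines, pixel_mm=1.0):
    """Widths at each (p0, p1) gridline, in the order given (e.g. glottis to lips)."""
    return np.array([gridline_width(mask, p0, p1, pixel_mm) for p0, p1 in gridlines])


# Toy "pre" frame: a 3-pixel-tall horizontal airway crossed by two vertical gridlines.
pre_mask = np.zeros((20, 20), dtype=bool)
pre_mask[4:7, :] = True
grid = [((0, 5), (19, 5)), ((0, 15), (19, 15))]

# Toy "post" frame: the airway is narrowed by one pixel on the right-hand side only.
post_mask = pre_mask.copy()
post_mask[6, 10:] = False

pre = distance_function(pre_mask, grid)
post = distance_function(post_mask, grid)
print(pre)         # roughly 3 mm at both gridlines
print(post - pre)  # about 0 at the first gridline, about -1 mm at the second
```

Stacking such arrays over all frames gives a time-by-location map of widths, which is one simple way to look at the temporal and spatial dimensions together.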
Item Type: Essay (Master)
Netherlands Cancer Institute, Amsterdam, Netherlands
Faculty: TNW: Science and Technology
Subject: 17 linguistics and theory of literature, 30 exact sciences in general, 44 medicine, 50 technical science in general, 54 computer science
Programme: Technical Medicine MSc (60033)

