Uwe Reichel (MTA NYTI)

Computational modeling of native language impact on word segmentation

Artificial language learning experiments have shown that the choice of perceptual cues for word segmentation depends in part on the listener's native language. The present study addresses this native language bias for English and Italian within a Bayesian classifier framework. The two languages differ in how they mark word boundaries and word stress through phoneme lengthening. For each language, the classifiers are trained on automatically transcribed collections of written word bigrams to predict word boundaries from language-related vowel and consonant lengthening features. The interpolation weights of the classification models shed light on the relative, language-dependent importance of these edge cues in word-initial and word-final position. The models are furthermore evaluated on how well they extract the vocabulary of several artificial language variants. Finally, possible applications of this Bayesian framework in Analogy-based Phonology are discussed.
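
To make the idea of weighted lengthening cues concrete, the following is a minimal sketch of a naive Bayes boundary classifier in which each feature's log-likelihood is scaled by a weight. It is not the model used in the study: the feature names (final_vowel_long, initial_cons_long), the toy training data, the add-one smoothing, and the log-linear way the weights enter the score are all assumptions made for illustration.

```python
import math
from collections import Counter

def train(samples, alpha=1.0):
    """Estimate smoothed priors and likelihoods from (features, label) pairs.

    features: dict of binary cue values; label: 1 = word boundary, 0 = none.
    alpha is an add-alpha smoothing constant (an assumption of this sketch).
    """
    label_counts = Counter(label for _, label in samples)
    feat_counts = Counter()
    for feats, label in samples:
        for name, value in feats.items():
            feat_counts[(name, value, label)] += 1
    total = len(samples)
    prior = {y: (label_counts[y] + alpha) / (total + 2 * alpha) for y in (0, 1)}

    def likelihood(name, value, y):
        # P(feature = value | label = y) for a binary feature
        return (feat_counts[(name, value, y)] + alpha) / (label_counts[y] + 2 * alpha)

    return prior, likelihood

def posterior_boundary(feats, prior, likelihood, weights):
    """P(boundary | feats), with each cue's log-likelihood scaled by its weight."""
    log_score = {}
    for y in (0, 1):
        s = math.log(prior[y])
        for name, value in feats.items():
            s += weights.get(name, 1.0) * math.log(likelihood(name, value, y))
        log_score[y] = s
    m = max(log_score.values())  # subtract the max for numerical stability
    z = sum(math.exp(v - m) for v in log_score.values())
    return math.exp(log_score[1] - m) / z

# Hypothetical toy data: final vowel lengthening co-occurs with boundaries.
samples = (
    [({"final_vowel_long": 1, "initial_cons_long": 0}, 1)] * 3
    + [({"final_vowel_long": 1, "initial_cons_long": 1}, 1)]
    + [({"final_vowel_long": 0, "initial_cons_long": 0}, 0)] * 3
    + [({"final_vowel_long": 0, "initial_cons_long": 1}, 0)]
)
prior, likelihood = train(samples)

equal = {"final_vowel_long": 1.0, "initial_cons_long": 1.0}
muted = {"final_vowel_long": 0.0, "initial_cons_long": 0.0}
cue_present = {"final_vowel_long": 1, "initial_cons_long": 0}
cue_absent = {"final_vowel_long": 0, "initial_cons_long": 0}

p_present = posterior_boundary(cue_present, prior, likelihood, equal)
p_absent = posterior_boundary(cue_absent, prior, likelihood, equal)
p_muted = posterior_boundary(cue_present, prior, likelihood, muted)
```

In this sketch, a weight of zero removes a cue from the decision, so the posterior falls back to the prior. Comparing weights fitted separately on English and Italian data would be one way to read off which edge cues each language relies on.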