Abstract
This paper describes the multi-level annotation of an Urdu speech corpus and its quality assessment using PRAAT. The corpus has been annotated at the phoneme, word, syllable and break-index levels. The phoneme, word and break-index tiers were annotated manually by trained linguists, whereas the syllable tier was annotated automatically using a template-matching algorithm. On average, the accuracy achieved at the phoneme and break-index tiers is 79% and 89%, respectively. The quality assessment of the word and syllable tiers is still under investigation.

Benazir Mumtaz, Amen Hussain, Sarmad Hussain, Afia Mahmood, Rashida Bhatti, Mahwish Farooq, Sahar Rauf. (2014) Multitier Annotation of Urdu Speech Corpus, Conference on Language and Technology 2014.
Publisher: Center for Language Engineering
Country: Pakistan
City: Karachi
From: 13-11-2014
To: 15-11-2014