Digital assistants based on artificial intelligence can be found everywhere these days. For us Pakistanis, the biggest obstacle to using these systems in our daily lives has always been the language barrier.
Most assistants are programmed to recognize speech in English, and there is still a shortage of programs that can recognize spoken Urdu. That may be about to change, thanks to a group of Pakistani scientists at the Center for Speech and Language Technologies (CSaLT) laboratory of the Information Technology University (ITU), Lahore.
For a computer to recognize any language, there must be a corpus of its words and of the language's most basic sound components. A speech corpus is a database of the basic sounds used in everyday speech in a particular language.
Called the “CSaLT Rich Urdu Speech Corpus”, it consists of 70 minutes of recorded speech comprising 708 sentences that cover all 63 phonemes. In total it contains 5,656 unique words and is available for download from the research center’s website.
Dr. Agha Ali Raza, an assistant professor at Information Technology University, Lahore, who holds a PhD in Language and Information Technologies, explained:
“Speech recognition is a two-step process. The corpus will give computer applications access to all the phonemes used to form meaningful Urdu words in everyday speech.”
He further explained that although Urdu has 63 distinct phonemes, these do not correspond to 63 clean, fixed sounds in everyday speech.
The way a phoneme sounds can vary depending on the sounds that come before and after it. Therefore, for any given sound there are 63 × 63 possible combinations of neighboring sounds, known as tri-phones. The corpus he is using is designed to cover all of these possible sounds.
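The tri-phone arithmetic can be illustrated with a short Python sketch. It assumes each utterance has already been transcribed as a list of phoneme symbols; the function names and the romanized symbols in the usage example are hypothetical and are not taken from the CSaLT corpus. With 63 phonemes, each phoneme has 63 × 63 = 3,969 possible left/right contexts, which gives 63³ = 250,047 tri-phones in total. The coverage ratio shows what fraction of these a set of transcriptions contains.

```python
from collections import Counter

NUM_PHONEMES = 63  # phoneme count reported for the corpus

# Each phoneme can be preceded by any of 63 phonemes and followed by any of 63,
# so there are 63 x 63 contexts per phoneme and 63**3 tri-phones overall.
CONTEXTS_PER_PHONEME = NUM_PHONEMES * NUM_PHONEMES
TOTAL_TRIPHONES = NUM_PHONEMES ** 3


def triphones(phones):
    """Return (left, centre, right) triples for every phoneme with both neighbours."""
    return [tuple(phones[i - 1:i + 2]) for i in range(1, len(phones) - 1)]


def count_triphones(transcriptions):
    """Count how often each tri-phone occurs across phoneme-level transcriptions."""
    counts = Counter()
    for phones in transcriptions:
        counts.update(triphones(phones))
    return counts


def triphone_coverage(transcriptions, num_phonemes=NUM_PHONEMES):
    """Fraction of all theoretically possible tri-phones seen in the transcriptions."""
    return len(count_triphones(transcriptions)) / num_phonemes ** 3


if __name__ == "__main__":
    # Hypothetical romanized phoneme transcriptions, for illustration only.
    sample = [["k", "i", "t", "aa", "b"], ["s", "a", "l", "aa", "m"]]
    print(CONTEXTS_PER_PHONEME, TOTAL_TRIPHONES)  # 3969 250047
    print(len(count_triphones(sample)))           # 6
    print(triphone_coverage(sample))
```

Only phonemes with a neighbour on both sides yield a tri-phone here; how a real system handles the first and last sound of an utterance (for example with silence markers) is a separate design choice not described in the article.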
Dr. Raza began work on this corpus under the supervision of Dr. Sarmad Hussain at the National University of Computer and Emerging Sciences (FAST), Lahore. Thanks to this corpus, developing Urdu speech recognition applications has become much easier.
“The technique used in the development of this corpus will work for any language for which written material is available.”