The Council for Scientific and Industrial Research (CSIR) speech node, situated in Pretoria, develops localised language technology, focusing on speech technologies such as automatic speech recognition (ASR), text-to-speech (TTS), and controlled natural language processing (CNLP). The CSIR’s TTS offering, known as Qfrency, is the only commercial TTS product catering for all of the South African official languages. The Qfrency TTS suite consists of 17 TTS voices, comprising male and female voices across all of the official languages, as well as a male child voice.
The Node’s TTS research focuses on improving the naturalness of its TTS voices, with a particular emphasis on tone and prosody in the African languages, and on building TTS voices using state-of-the-art techniques. Its ASR research focuses on semi-supervised harvesting of the audio data required to develop speech recognition systems on a par with international offerings, and on ASR system development for the local languages using state-of-the-art techniques. On the CNLP side, the Speech Node follows an approach known as grammar-based language modelling, via Grammatical Framework, a state-of-the-art multilingual grammar engineering framework. This approach allows for highly accurate and rich multilingual natural language processing, starting in limited domains and with use-case-specific language fragments and working towards increased coverage. It is often useful in speech technology applications, especially in domains such as education or healthcare where reliability is essential.
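The core idea of grammar-based language modelling is that the system only accepts utterances licensed by a hand-crafted domain grammar, which is what makes it highly reliable within a limited domain. The following is a minimal sketch of that idea in plain Python, using a hypothetical toy grammar for a classroom-instruction fragment; it does not use Grammatical Framework itself, whose grammars are far richer and multilingual.

```python
# Toy context-free grammar (hypothetical example, not a CSIR resource):
# an utterance is accepted only if the grammar can derive it, mirroring
# the reliability property of grammar-based language modelling.
GRAMMAR = {
    "S":   [["V", "NP"]],          # sentence: verb + noun phrase
    "V":   [["read"], ["open"]],
    "NP":  [["Det", "N"]],
    "Det": [["the"], ["your"]],
    "N":   [["book"], ["page"]],
}

def derives(symbol, tokens):
    """Return True if `tokens` can be derived from `symbol`."""
    if symbol not in GRAMMAR:
        # Terminal symbol: must match exactly one token.
        return len(tokens) == 1 and tokens[0] == symbol
    return any(matches(prod, tokens) for prod in GRAMMAR[symbol])

def matches(production, tokens):
    """Try every split of `tokens` across the production's symbols."""
    if not production:
        return not tokens
    head, rest = production[0], production[1:]
    return any(
        derives(head, tokens[:i]) and matches(rest, tokens[i:])
        for i in range(1, len(tokens) - len(rest) + 1)
    )

def accepted(utterance):
    return derives("S", utterance.lower().split())
```

Here `accepted("read the book")` succeeds while an out-of-grammar utterance such as `accepted("read book")` is rejected, rather than being guessed at as a statistical language model would.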
The Speech Node focuses on the education domain, where its technologies are well placed to support early literacy development and accessibility. Capabilities and resources required in this domain include expressive speech synthesis that automatically adapts to the text of the domain, voice adaptation to create more voices with fewer resources, ASR data sets in the literacy domain, adaptation of speech-to-text systems for younger users, speech assessment, and high-accuracy multilingual natural language generation and understanding. In developing and deploying these technologies, the node’s human-computer interaction (HCI) capability involves clients and end-users from the early design stages to ensure maximum impact.
The end result is technologies such as Isinkwe, an app designed to support literacy development by helping learners overcome barriers to reading and learning. The app helps learners read and learn more effectively through an integrated audio-visual experience, pulling together various capabilities of the Speech Node into a single real-world solution.