The South African Centre for Digital Language Resources (SADiLaR) is a national centre supported by the Department of Science and Technology (DST) as part of the new South African Research Infrastructure Roadmap (SARIR).
“SARIR is a high-level strategic and systemic intervention to provide research infrastructure across the entire public research system, building on existing capabilities and strengths, and drawing on future needs.” (DST SARIR brochure).
SADiLaR has an enabling function, with a focus on all official languages of South Africa, supporting research and development in the domains of language technologies and language-related studies in the humanities and social sciences. The Centre supports the creation, management and distribution of digital language resources, as well as applicable software, which are freely available for research purposes through the Language Resource Catalogue.
SADiLaR clients include academic scholars and professionals in all domains of Humanities and Social Sciences, Language Technologies, Natural Language Processing and Computer Science, as well as potential end-users in education, business and industry.
SADiLaR is a multi-partner entity with the North-West University (NWU) functioning as host as well as hub of a network of linked nodes, comprising:
University of Pretoria (Department of African Languages);
University of South Africa (Department of African Languages);
Meraka Institute at the CSIR (HLT Research Group);
North-West University (Centre for Text Technology); and
Inter-Institutional Centre for Language Development and Assessment (ICELDA).
It is foreseen that the number of nodes will increase in future as new functional nodes develop.
SADiLaR runs two programmes:
A digitisation programme, which entails the systematic creation of relevant digital text, speech and multi-modal resources related to all official languages of South Africa. The development of appropriate natural language processing software tools for research and development purposes is included as part of the digitisation programme.
A Digital Humanities programme, which facilitates the building of research capacity by promoting and supporting the use of digital data and innovative methodological approaches within the Humanities and Social Sciences. (See http://www.digitalhumanities.org.za)
SADiLaR’s enabling function impacts at least three domains:
Language technology domain: High-level resources and natural language processing tools are developed for use in applications such as machine translation engines for local languages; automatic speech recognition systems; text-to-speech systems; speech-to-speech translation systems; interactive communication systems; as well as a variety of text-related applications such as grammar and spelling checkers, online electronic dictionaries, and so forth.
Humanities and social sciences domain: The building of research capacity is facilitated among scholars by promoting the use of digital data, innovative methods, and software tools that enhance research activities and enable scholars to ask and pursue previously unanswerable questions within their respective disciplines.
Socio-economic domain: Reusable digital language resources are licensed for use in interactive commercial applications in local languages. The nature and use of local African languages are documented as part of a living archive, including documentation of cultural heritage practices within different language communities.