The South African Centre for Digital Language Resources (SADiLaR) is organising the 3rd RAIL workshop in the field of Resources for African Indigenous Languages. The workshop aims to bring together researchers who are interested in showcasing their research and thereby boosting the field of African indigenous languages. It provides an overview of the current state of the art and emphasises the availability of African indigenous language resources, including both data and tools. It will also allow researchers interested in African indigenous languages to share information and to start discussions on improving the quality and availability of these resources. Many African indigenous languages currently have no or only very limited resources available, and they are often structurally quite different from better-resourced languages, which requires the development and use of specialised techniques. By bringing together researchers from different fields (e.g., (computational) linguistics, sociolinguistics, language technology) to discuss the development of language resources for African indigenous languages, we hope to boost research in this field.
The RAIL workshop is an interdisciplinary platform for researchers working on resources (data collections, tools, etc.) specifically targeted towards African indigenous languages. It aims to create the conditions for the emergence of a scientific community of practice that focuses on data, as well as tools, specifically designed for or applied to indigenous languages found in Africa.
Suggested topics include the following:
Digital representations of linguistic structures
Descriptions of corpora or other data sets of African indigenous languages
Building resources for (under-resourced) African indigenous languages
Developing and using African indigenous languages in the digital age
Effectiveness of digital technologies for the development of African indigenous languages
Revealing unknown or unpublished existing resources for African indigenous languages
Developing desired resources for African indigenous languages
Improving quality, availability and accessibility of African indigenous language resources
This project aims to provide information on the current state of human language technology (HLT) R&D in South Africa. Specifically, it aims to replicate the HLT audit completed in 2009 and to update the information on the various HLT tools, resources and applications identified in that audit. The tools, resources and applications developed since 2009 will be identified and categorised using an updated version of the technology matrix previously employed.
Project status: Completed [Finalising]
The African Wordnet (AWN) and Linguistic Terminology project is two-pronged: it concerns the development of language resources in the form of wordnets for a variety of African languages, as well as the development of linguistic terminology for all official African languages. In the initial project, development of the AWN was limited to 7 of the indigenous South African languages. With this expansion, we plan to add the remaining 2 languages so that all 9 indigenous South African languages are represented in the African Wordnet. In doing so, we will stimulate further development for all languages and create a platform for further wordnet development in any of them.
The extended project comprises two work packages: Work Package 1 deals with expanding the scope of the existing African Wordnet, including the use of the AWN data for language learning, while Work Package 2 involves a wrap-up workshop for the Open Educational Resource Term Bank (OERTB) with newly extracted linguistic terminology.
Training workshop (technical, linguistic, lexicographic and corpus extraction) and meetings with partners. In the workshop, all experts (linguists, lexicographers, computer scientists) will develop a joint, general approach to using AWN and other dictionary data for language learning. Open technical problems such as data conversion and regular updates will be discussed;
Development of 2 000 new synsets across 2 languages (including basic synsets, usage examples and definitions, plus quality assurance) by language experts;
Use of the African Wordnet (AWN) data for language learning:
3.1.1 Existing language data, especially dictionary data for the South African languages and AWN data, will be made available online (both as a website and as a mobile app). The following challenges will be addressed:
(1) Defining the requirements from a metalexicographical point of view and designing a front end suitable for different dictionary user groups.
(2) Designing the data structure for the dictionary data according to (1), using a database system. The AWN data will be included and updated regularly.
(3) Programming the website and mobile app, which present the data described in (2) according to the metalexicographical guidelines in (1).
One (1) dissemination workshop to showcase the expanded OERTB and meetings with partners;
Press release on the expansion of the OERTB for publication in Unisa's internal newsletter and for distribution to SADiLaR stakeholders
The African Wordnet (AWN) and Linguistic Terminology project is two-pronged: it concerns the development of language resources in the form of wordnets for a variety of African languages, as well as the development of linguistic terminology for all official African languages. The project comprises two work packages: Work Package 1 deals with expanding the scope of the existing African Wordnet, while Work Package 2 involves the expansion of the Open Educational Resource Term Bank (OERTB) with newly extracted linguistic terminology.
Link with a dedicated SADiLaR server to host the AWN
One article per annum in a peer-reviewed journal or peer-reviewed conference proceedings.
Training workshops and meetings with partners (including at least 1 international guest who will be funded by SADiLaR)
Digitisation of outdated study guides (in collaboration with the University of Pretoria (UP))
Term extraction of at least 500 terms for Sesotho sa Leboa and 500 terms for isiZulu (in collaboration with UP) from outdated study guides
Quality assurance on 500 extracted terms for Sesotho sa Leboa and 500 extracted terms for isiZulu
Development of 500 new term definitions in English by subject experts
Standardisation of terms in Sesotho sa Leboa and isiZulu
One (1) training workshop and meetings with partners
Development of 250 new terms each for Setswana, Sesotho, isiXhosa, isiNdebele, Siswati, Xitsonga and Tshivenda by subject experts
Quality assurance on 250 extracted terms each for Setswana, Sesotho, isiXhosa, isiNdebele, Siswati, Xitsonga and Tshivenda