The South African Centre for Digital Language Resources (SADiLaR) is a new research infrastructure set up by the Department of Science and Innovation (DSI) as part of the new South African Research Infrastructure Roadmap (SARIR). This centre focuses on the digitisation of language resources, and the high-level training of scholars in methodological aspects related to the use of digital language resources in research and development endeavours.
In order to achieve these aims, SADiLaR runs:
a digitisation programme, which entails the systematic acquisition, creation, enhancement, management, and distribution of digital text, speech, and multi-modal resources related to all the official languages of South Africa. SADiLaR also supports the development of natural language processing resources, both software and data, for research and development purposes; and
a Digital Humanities programme, which facilitates research capacity-building by promoting and supporting the use of digital data and innovative methodological approaches within the Humanities and Social Sciences (see http://www.digitalhumanities.org.za).
Aims of the call
As part of SADiLaR's enabling function, this call aims to:
support the development of digital language resources for South African languages in all modalities, including, but not limited to, the:
digitisation of text and speech data;
creation of mono- and multi-lingual corpora;
creation of annotated corpora;
creation of digital dictionaries and terminology; and
collection of digital language data.
promote research activities and outputs within the domains of Digital Humanities or language technology;
promote research in language topics within the methodological commons of Digital Humanities, which may include, but is not limited to:
humanities and social science research enabled through digital media, artificial intelligence or machine learning, software studies, or information design and modelling;
social, institutional, global, multilingual, and multicultural aspects of Digital Humanities;
computer applications in literary, linguistic, cultural, and historical studies, including public humanities and interdisciplinary aspects of modern scholarship; and
quantitative stylistics and philology.
further capacity-building initiatives, such as workshops and summer and winter schools.
All projects must adhere to the following criteria:
digital language resources developed with funding from the project must be made available for distribution by SADiLaR under the most open licence possible;
all digital language resources should have explicit copyright clearance, where applicable, that allows for further use and distribution by the research community;
all digital resources must be developed in compliance with the relevant international standards, and these standards should be stated in the proposal;
an evaluation plan to validate the quality of the project deliverables must be formulated;
proposals must state explicit project deliverables, including digital resource(s), research outputs, training opportunities, and capacity building;
proof of ethical clearance for the collection and distribution of data, along with the associated ethics clearance number from the host institution, must be provided if the project proposal is approved; and
successful projects will need to enter into a formal contract with SADiLaR outlining the specific deliverables, reporting, and evaluation of project deliverables.
Who may apply
This call is open to researchers employed at any official institution of higher learning, as listed on http://www.dhet.gov.za/, whose research interests align with the aims of the call and its ethical process.
This call is open to working researchers who reside in South Africa and are affiliated with a recognised higher education or research institution, such as a university, university of technology, or science council. International researchers are eligible to be part of the project team, but cannot be the principal applicant.
What can be applied for
Projects can apply for motivated funding of between R50,000 and R200,000 per project, and all project deliverables must be submitted before December 2021.
The funding can be used to cover any of the following activities, where the activities are specifically related to the deliverables specified in the project proposal:
research-related costs, including research equipment, consumables, accessories, research-related trips, and field work;
short-term mobility or travel expenses;
placements for young postgraduate students and researchers at Honours, Master's, Doctoral, or Postdoctoral level. These placements must provide the candidates with an opportunity to access facilities, learn new skills, or build research relationships that would not otherwise be available; and
knowledge sharing activities where national or international experts participate in colloquia, seminars, symposia, lectures, etc., that further the dissemination of knowledge in the fields of Digital Language Resource Development, Digital Humanities, and/or Natural Language Processing.
In addition to funding the development of data through this open call, SADiLaR also hosts data previously developed by external parties. These data providers are typically researchers who have access to data that they want to make available to the larger research community, private individuals working on languages, and companies that develop innovative software in the field of human language technologies.
If you have access to data that you feel will benefit the community, please consider sharing this data with us. We can make data available under a wide variety of licences, including research-only, open-source, or commercial use. All data will be securely stored in our repository and issued with a persistent identifier, making referencing and sharing easier, while increasing exposure. All resources are also listed on the SADiLaR Repository and stored to ensure long-term access.
Limited funding is available to convert existing resources to the latest standards and formats, to expand on existing datasets, and, in exceptional circumstances, to purchase resources. Please contact email@example.com to discuss these possibilities.
Duration of projects
Projects of various lengths can be supported. The preferred duration is one to two years, with the project concluding by December 2022.
Apart from ethical considerations and IP rights, there are no policies or regulatory regimes that govern the acquisition of language resources, whether text, speech, or multimodal. There are, however, a number of best practices followed by researchers and developers, which are normally open to public and academic scrutiny and will be implemented by the centre. SADiLaR commits itself to act in accordance with the directives of the National Intellectual Property Management Office (NIPMO).
How to apply
Kindly complete the following pre-proposal template, for initial consideration and feedback.
We aim to provide feedback on the pre-proposal within 25 business days.
Approval of the pre-proposal does not imply that the final application will be successful.
On acceptance of the pre-proposal, a detailed project proposal template will be provided for completion.
All project proposals will be submitted to various nationally and internationally recognised experts in the specific fields addressed by the proposal. Projects will be evaluated according to the following criteria:
adherence to the requirements set out above;
competence and expertise of the researchers involved in the project;
capacity development of young researchers; and
value for money.
Decisions are normally communicated within 10 weeks of submission.
SADiLaR reserves the right, within reason, to request changes to shortlisted proposals to better align them with its strategy, medium-term expenditure framework, etc. Such requests may be directed at, amongst others, specifications, deliverables, budgets, and evaluation criteria.