The South African Centre for Digital Language Resources (SADiLaR) is a national centre supported by the Department of Science and Innovation (DSI) as part of the new South African Research Infrastructure Roadmap. The University of Pretoria (UP) collaborates with SADiLaR as one of its nodes and hosts the UP Digitisation node, which is housed within the Department of African Languages in the Faculty of Humanities.
The main function of the UP Digitisation node is to create language resources for the African languages by digitising different kinds of language material. The digitised output will be made available on the SADiLaR platform for use by researchers and developers of Human Language Technologies. The platform is an open resource, and the data hosted on it will be freely available.
The Department of African Languages is fortunate to have access to a well-stocked resource centre, which has been built up over many years and contains valuable language data in the form of books, audio and video material for the African languages. By making these data available in digital, that is, machine-readable, format, the data are not only preserved but also become an invaluable tool in the creation of digital resources, especially for the African languages.
Our digitisation activities are centred on three types of data: textual, audio and audio-visual material.
Converting text data into digital format is the main focus of the digitisation node. Many African language books are out of print and no longer commercially available. With the necessary permission and copyright clearance from publishers, such books are converted to digital format by means of scanning and Optical Character Recognition (OCR).
Other textual material currently being digitised includes back copies of popular magazines, a collection of dictionary index cards bequeathed to the department by the late N J van Warmelo, and copies of MA dissertations and PhD theses written in the African languages.
The audio material being digitised is mostly stored on audio cassettes, many of which have been retrieved from the archives of the first language laboratory at UP. Other valuable material includes audio notes, dating back to the early 1960s, made during fieldwork by linguists specialising in African languages.
Among the valuable material in UP's collection are priceless video recordings of interviews with Northern Sotho authors. Video recordings of lectures on linguistics and literature in isiZulu, Setswana and Sesotho sa Leboa also form part of the collection.
To ensure that the African languages are able to take up their rightful place in a digital world, developing digitised language resources is essential.