Voyant bjalo ka senolofatši sa phetleko ya dingwalo
Author: Dimakatso Mathe (SADiLaR Sesotho sa Leboa researcher)
An English translation of this blog appears at the bottom.
Lenaneo la go dira diphatišišo le akaretša go tsitsinkela sengwalo ka leihlo la ntšhotšhonono ka maikemišetšo a go utulla se monyakišiši a ratago go se nepiša. Le ge go le bjalo, go na le tšeo leihlo la nama le ka šitwago go di lemoga ge sengwalo seo e le se setelele. Go fa mohlala, modiro wa go bala gore leina la moanegwa yo a itšego le tšwelela gakae sengwalong sa padi, e ka ba se sengwe seo leihlo le ka šitwago go se phethagatša. Ka mahlatse, sedirišwa sa go swana le Voyant, se ka phethagatša modiro wo ka ponyo ya leihlo le go go fa dipoelo tša go kgodiša tšeo di nepagetšego. Voyant ke sedirišwa sa inthanete seo se tsebjago kudu morerong wa dithuto tša botho tša ditšitale (Digital Humanities). Moreromogolo wa sedirišwa se ke go nolofatša modiro wa go fetleka tshedimošo, ka maikemišetšo a go fihlelela goba go tiišetša kgopolo yeo e itšego ya phatišišo. Sona se thuša go bea pepeneneng diteng tša tshedimošo ya sengwalo tšeo di ka gogago šedi ya monyakišiši ntle le go badišiša sengwalo go tloga mathomong go fihlela mafelelong. Ntlha yona ye o fetšago go e bala, re tla boela go yona mafelelong a sengwalo se gore re e otlolle.
Seswantšho se se tšwelelago ka fase, se swailwe ka ditlhaka tša go tloga go A go fihlela go E, gomme tšona di laetša tše dingwe tša dikarolo tša Voyant tše di ka dirišwago go fetleka tshedimošo. Tšona ke cirrus (A), reader (B), trends (C), summary (D) le contexts (E). Therišano ya rena e tla ithekga ka tšona dikarolo tše di boletšwego gomme ra tsopola ka boripana mešomo goba mehola ya tšona. Diteng tše di lego ka seswantšhong se sa Voyant di tšerwe kanegelong yeo e phatlaladitšwego ke Nal’ibali inthaneteng . Yona e tsentšhitšwe ka go sedirišwa sa Voyant ka mokgwa wa go ngwatha diteng tša sengwalo gomme tša pharwa ka go Voyant, ke gore “cut & paste”.
Seswantšho sa Voyant
Mohola wa karolo ya cirrus yeo e swailwego ka tlhaka ya A seswantšhong, ke go hlagiša goba gona go bonagatša mantšu ao a tšwelelago gantši go feta a mangwe sengwalong. Mantšu ao a tšweleditšwe ka mebala ya go fapana ebile re lemoga gore ke a magolo kudu ge a bapetšwa le mantšu a mangwe ao a hlagišitšwego. Ge o batametša ntlhanakhomphuthara (mouse pointer) kgauswi le mantšu ao a hwetšagalago ka go cirrus, o tla kgona go bona gore lentšu leo le tšwelela gakae swengwalong seo. Gona fao, re lemoga gore lentšu le “Temo” ke le lengwe la mantšu ao a tšwelelelago gantši sengwalong. Se ga se makatše ka ge moanegwathwadi wa kanegelo e le Temo, gomme ditiragalo tša kanegelo di dikuloga godimo ga gagwe. Seo ke sona se hlolago gore a fele a tšwelela kgafetša kanegelong ka ge e le moanegwathwadi.
Ge re gatela pele, karolo ya reader, yeo e swailwego ka tlhaka ya B seswantšhong sa rena, yona mohola wa yona ke go kgontšha monyakišiši (goba modiriši wa Voyant) go bala sengwalo go ya le ka mokgwa wo se tšwelelago ka gona. Ka mantšu a mangwe, diteng tša kanegelo di alwa go tloga mathomong go fihla mafelelong go ya le ka mokgwa wo di tšwelelago ka gona sengwalong. Go swana le karolong ya cirrus, ge o batametša ntlhanakhomphuthara godimo ga lentšu le lengwe le le lengwe, go tšwelela tshedimošo ya go laetša gore lentšu leo le tšwelela gakae sengwalong.
Karolo ya trends, ye e laeditšwego ka tlhaka ya C, ke kerafo ye e laetšago bontši bja mantšu go ya le ka moo a tšwelelago dikarolong tša sengwalo. Mebala ya methalokerafo e nyalelana le ya mantšu ao a tšwelelago ka go cirrus. Le ge a mangwe a mantšu ao a tšwelelago go feta a mangwe a ka tšweletšwa ka mebala ya go swana, bjale ka ge re bona mantšu a “a” le “ba” a tšweletšwa ka talalerata seswantšhong, ge o batametša ntlhanakhomphuthara godimo ga mothalokerafo, o tla hwetša tshedimošo ka botlalo mabapi le lentšu le mothalokerafo o le emetšego. Go feta fao, go na le mapokisana ka godimo ga methalokerafo ao a bontšhago gore mebala ya methalokerafo e emetše mantšu afe.
Karolo ya D, yeo e lego summary, yona ke kakaretšo ya tshedimošo ya sengwalo ka ge e itlhaloša. E laetša palomoka ya dingwalwa tše di fetlekwago (ke se setee mo lebakeng le), palomoka ya mantšu ao a hwetšwago sengwalong (1 092), nako yeo tshedimošo e tsentšhitšwego ka go sedirišwa sa Voyant (e hlamilwe gona bjale), gammogo le palomoka ya mantšu ao a hlamago mafoko (bontši bja mafoko a sengwalo se a bopša ke mantšu a 13). Ka go le lengwe, contexts ye e swailwego ka tlhaka ya E le yona e laetša lentšu le le tšwelelago ka bontši. Ga go felele fao, se se ikgethilego ka yona ke gore e laetša sekafoko se se tšwelelago ka go la nngele le sa ka go la go ja ga lentšu leo. Ka go realo, e laetša tšhomišo ya lentšu leo ka go utulla dikafoko tše di panago mmogo le lona lentšu leo.
Go na le dikarolo tše dingwe tša Voyant tšeo re sego ra bolela ka tšona sengwalong se, ebile o hlohleletšwa go ikgwathela tšona maitekelong a gago a go ithuta le go šomiša Voyant. Ke tše dintše tše Voyant e ka di dirago, eupša go na le tšeo e ka se go direlego tšona. Se se re bušetša ntlheng ye e kgwathilwego matsenong mabapi le bokgoni bja Voyant bja go utullela monyakišiši tshedimošo ya go tanya šedi ntle le go badišiša sengwalo go tloga mathomong go fihla mafelelong. Voyant e kgona go bala palo ya mantšu sengwallong eupša seo e ka se kego ya se dira ke go go utullela moko goba morero wa sengwalo. Se se laetša gabotse gore go tloga go le bohlokwa gore o ipalele sengwalo le go kwešiša diteng tša sona gore o kgone go hlatholla tswalano magareng ga mantšu ao a tšwelelago kudu sengwalong bjale ka ge go tšweleditšwe seswantšhong. Ka mantšu a mangwe, Voyant e ka se go direle phatišišo eupša go molaleng gore e tla go nolofaletša morero wa go dira phatišišo. Ge o na le kgahlego ya go ithuta kutšwana ka ga Voyant, o se ke wa diega go ikgokaganya le SADiLaR gore o dire kgopelo ya go fiwa tlhahlo ntle le tefo. Gore o hwetše tshedimošo ka botlalo mabapi le go dira kgopelo ya tlhahlo ya Voyant go tšwa go SADiLaR, kgotla mo.
 Gore o fihlelele sedirišwa sa Voyant, kgotla mo https://voyant-tools.org/
 O ka fihlelela sengwalo se go bolelwago ka sona go https://nalibali.org/story-library/multilingual-stories/temo-le-mahodu-dimela
 Lemoga gore mo sengwalong se, lereo le mantšu/lentšu le šomišwa ntle le go šetša gore ke karolo efe ya sehlophantšu goba mohuta ofe wa sekantšu.
Voyant as an enabling tool for text analysis
The research process involves scrutinising a text closely in order to uncover what the researcher wants to focus on. However, some things are easy to miss with the naked eye when the text is long. For example, counting how many times the name of a particular character appears in a novel can be a daunting task. Fortunately, a tool such as Voyant can perform this task in the blink of an eye and provide convincing, accurate results. Voyant is a web-based application that is well known in Digital Humanities. Its main purpose is to simplify text analysis, with the aim of arriving at or confirming a particular research idea. It helps to bring to the surface content that might be of interest to the researcher without the researcher having to read the document closely from beginning to end. We will return to this point at the end of this discussion.
The Voyant illustration is labelled A to E and shows some of Voyant’s features that can be used to analyse text, namely cirrus (A), reader (B), trends (C), summary (D) and contexts (E). Our discussion is based on these features, and we briefly explain their functions. The content in the Voyant illustration was taken from a story published online by Nal’ibali. It was loaded into Voyant by cutting the text from the source document and pasting it into the application.
The function of the cirrus feature, marked A on the illustration, is to visualise the words that appear more frequently than others in the document. These words are shown in different colours and are noticeably larger than the other words displayed. When you move the mouse pointer over a word in the cirrus, you can see how many times that word appears in the document. Looking at the illustration, it is clear that “Temo” is one of the words that appears most frequently in the document. This is not surprising, as the main character of the story is Temo and the events in the story revolve around her.
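The counting behind a word cloud such as the cirrus can be sketched in a few lines of Python. This is only a minimal illustration, not Voyant's own code: the function name `word_frequencies` and the sample sentences are made up for the example, and the sample is not the actual Nal’ibali story.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count how often each word form occurs, ignoring case."""
    # \w+ matches runs of letters and digits, including letters such as š.
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tokens)

# Hypothetical sample text, used only for illustration.
sample = "Temo saw the garden. Temo watered the garden. The thieves saw Temo."
freqs = word_frequencies(sample)

# List the most frequent words first, as a word cloud would emphasise them.
for word, count in freqs.most_common(3):
    print(word, count)
```

A word cloud would then draw each word at a size that grows with its count, which is why a name like “Temo” stands out.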
The function of the reader feature, marked B on our illustration, is to enable the researcher (or the Voyant user) to read the document in its original form. In other words, the contents of the story are laid out in the order in which they appear in the document, from beginning to end. As with the cirrus, if you move the mouse pointer over a word, additional information appears showing how many times that word appears in the document.
The trends feature, marked C, is a graph that shows the distribution of word frequency, i.e. how often words appear in the different sections of the document. The colours of the graph lines correspond to those of the words that appear in the cirrus. Some of the frequent words may be shown in the same colour, as we can see with “o” and “ba”, which both appear in light blue on the illustration. In that case, moving the mouse pointer over a graph line reveals further details about the word that the line represents. Moreover, there are boxes above the graph showing which word each line colour represents.
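The idea behind a trends graph can be sketched as follows: split the document into equal-sized segments and count a word in each one. This is a rough sketch under stated assumptions; the function name `trend`, the segment count and the sample text are all hypothetical, and Voyant's own segmentation and scaling may differ.

```python
import re

def trend(text, word, segments=5):
    """Count occurrences of `word` in each of `segments` equal slices of the text."""
    tokens = re.findall(r"\w+", text.lower())
    counts = [0] * segments
    target = word.lower()
    for i, token in enumerate(tokens):
        if token == target:
            # Map the token position to the index of the segment it falls in.
            counts[i * segments // len(tokens)] += 1
    return counts

# Hypothetical sample text, used only for illustration.
sample = "Temo ran home. Temo slept. The thieves came at night. Temo woke."
print(trend(sample, "Temo", segments=3))
```

Plotting one such list per word, with the segment number on the horizontal axis, gives one line per word, similar to the graph lines described above.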
Feature D, the summary, gives an overview of the information in the document. It shows the total number of documents being analysed (one in this case), the total number of words found in the document (1 092), when the text was loaded into Voyant (it has just been created), and the number of words per sentence (most of the sentences in the document consist of 13 words). The contexts feature, marked E, also shows the most frequent word. What sets it apart is that it shows the phrases that appear to the left and to the right of that word. In this way, it shows how the word is used by revealing the phrases that occur together with it.
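Both ideas can be approximated in a short Python sketch: a summary statistic (average words per sentence) and a keyword-in-context listing. The function names, the `window` parameter, the simple sentence splitting on `.`, `!` and `?`, and the sample text are assumptions made for illustration; they are not taken from Voyant.

```python
import re

def average_words_per_sentence(text):
    """Average sentence length in words, using a naive split on . ! and ?"""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text)
    return len(words) / len(sentences) if sentences else 0.0

def contexts(text, keyword, window=3):
    """Return (left, keyword, right) rows for every occurrence of `keyword`."""
    tokens = re.findall(r"\w+", text)
    rows = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            rows.append((left, token, right))
    return rows

# Hypothetical sample text, used only for illustration.
sample = "Temo planted seeds. The thieves watched Temo closely."
print(average_words_per_sentence(sample))
for left, word, right in contexts(sample, "temo", window=2):
    print(f"{left:>20} | {word} | {right}")
```

Reading down the left and right columns makes it easy to see which phrases tend to surround the keyword.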
There are other Voyant features that were not covered in this discussion, and you are encouraged to try them out yourself as you explore and learn how to use Voyant. There is a lot that Voyant can do, but it also has its limitations. This takes us back to the point raised in the introduction regarding Voyant’s ability to reveal content that might be of interest to the researcher without the researcher having to read the document closely from beginning to end. Voyant can count the number of words in a document, but it cannot reveal the main purpose or theme of a text. This clearly shows that it remains very important to read the text yourself and understand its content, so that you can explain the relation between the words that appear frequently in the text, as shown in the illustration. In other words, Voyant cannot do research on your behalf, but it can make the process of doing research easier. If you are interested in learning more about Voyant, do not hesitate to contact SADiLaR to request training free of charge. For more information regarding training on Voyant from SADiLaR, click here.
 To access the Voyant application, click here https://voyant-tools.org/
 You may access the aforementioned document on https://nalibali.org/story-library/multilingual-stories/temo-le-mahodu-dimela
 Note that in this text, the term word/words has been used without taking into account the parts of speech or morphemes of the forms mentioned.
SADiLaR Team: Tshivenda Researcher
SADiLaR’s Tshivenda researcher, Mr Phathutshedzo Maxwell Ramukhadi, specialises in literature and in the use of digital tools. He finds the field of literature particularly interesting and adds:
“I want to familiarise myself with the field of Human Language Technology”
During the current Covid-19 situation and national lockdown, Mr Ramukhadi is working on two articles named:
- How to develop a Tshivenda digital literary corpus
- The portrayal of children Character in Tshivenda play
- Analysis of Tshivenda lemmatization tool
He says that he plans to finish the two articles that he is currently working on within the next few months and to submit them for review.
“I also want to start working on my PhD proposal.”
Mr Ramukhadi argues that digital humanities makes a greater impact in the African context because African languages have been underdeveloped for so long. In conclusion he says:
“Digital Humanities will contribute a lot to make sure that our languages are being treated and have the same standard as the European languages”.
The CODATA-RDA Research Data summer school: First of its kind
During the month of January, the isiXhosa researcher from the South African Centre for Digital Language Resources (SADiLaR), the Siswati researcher and a programmer were fortunate enough to be selected to attend a summer school in Pretoria, organised by the University of Pretoria’s Department of Information Science together with the Data-Intensive Research Initiative of South Africa (DIRISA), SADiLaR and the Network of Data and Information Curation Communities (NeDICC). The CODATA-RDA Research Data summer school ran from 13 to 24 January 2020.
It was a privilege to be among the Africans attending the first such summer school presented in South Africa. The school equipped a group of early-career researchers with essential data science skills, including technical skills and responsible research practices, so that they can work with data as effectively and efficiently as the fast-paced 21st century requires.
SADiLaR Team: Xitsonga Researcher
Author: Mieke Hofmeyr
At SADiLaR, Mr. Respect Mlambo is the Xitsonga researcher. Respect specialises in lexicography, translation and terminology.
Currently he is keeping himself busy writing papers and blogs for SADiLaR in his various fields of research. While the current Covid-19 situation and national lockdown are affecting the whole of South Africa and its workforce, Mr. Mlambo plans to keep on writing and working on research projects in his field.
When asked about his field of interest in terms of research, Mr. Mlambo kept it straightforward and answered: “Lexicography”.
Mr. Respect Mlambo concluded by sharing his thoughts on the contribution that Digital Humanities can make within the African context:
“Digital Humanities will improve the functionality of African languages in various modern fields”.
SADiLaR Team: Setswana Researcher
Mrs. Valencia Wagner is SADiLaR’s Setswana language researcher. She finds sociolinguistics, phonetics, phonology and digital humanities particularly interesting within the language of Setswana.
Currently she is writing articles and organising virtual workshops. Her PhD studies are also taking up much of her time. She plans to write more articles within the next few months. Furthermore, she will be working on two projects:
- The Setswana and isiXhosa grammar portal
- The speech data collection project
When asked what she thinks about the contribution that Digital Humanities can make in the African context, she had the following to say:
“Africa still has a limited exposure to various digital tools, resources & methodologies, therefore, many African researchers and scholars still rely a lot on traditional methods of conducting research. Digital humanities can transform traditional humanities by creating and integrating digital technologies into African research. Research that would have taken years to complete manually, will be much easier to undertake with the use of digital tools.”
She concluded by saying that these digital platforms could also help us to share information and grow our indigenous communities.
SADiLaR Team: Afrikaans Researcher
Author: Mieke Hofmeyr
Benito Trollip is the SADiLaR researcher for the Afrikaans language. When it comes to research, he is especially interested in the ways in which meaning is constructed in language.
“I tend to focus on compounds and other word-forming processes and the way people choose to combine form and meaning. There are endless possibilities when it comes to constructing meaning and language is a literal house of abundance when it comes to these possibilities,” says Mr Trollip.
He is also interested in the legal aspects of research with regard to intellectual property rights, ownership and the distribution of data. As a SADiLaR researcher, Mr Trollip is always busy discovering new ways to bring language and the digital age together. At the moment he is finalising a dataset and an article on denominal adjectives in Afrikaans, of which eend-agtig 'duck-like' is an example. He has also worked with a graphic designer colleague on a short video on intensified adjectives in Afrikaans, of which hond-warm, literally 'dog hot > piping hot', is one.
SADiLaR Team: Sesotho Researcher
Author: Mieke Hofmeyr
Mmasibidi Setaka is the proud Sesotho researcher at SADiLaR. She is passionate about her research into the digital sphere of this language as well as the research area of lexicography.
As a researcher at SADiLaR, Ms Setaka is always working on a variety of projects, but she is currently focusing her research on picture dictionaries. She also has a few projects that she plans to work on in the future, which include writing articles about the different aspects of picture dictionaries, reviewing different methodologies and working out strategies to collect her research results without making physical contact with the children at whom the dictionaries are aimed.
Since one of the main focuses of SADiLaR is to do research into languages within the context of Digital Humanities, this is also a big part of Ms Setaka’s research. When it comes to Digital Humanities, she states the following: “Digital Humanities brings different fields together thereby making collaboration for African scholars possible.”
SADiLaR Team: isiZulu Researcher
Author: Mieke Hofmeyr
Ms. Rooweither Mabuya is SADiLaR’s isiZulu language researcher. She finds computational linguistics interesting and is particularly interested in the development of resources for African languages.
While the current Covid-19 situation and national lockdown are affecting the whole of South Africa and its workforce, Ms. Mabuya is staying focused by working on various projects.
SADiLaR Team: isiNdebele Researcher
Author: Mieke Visser
Ms. Nomsa Skosana is the isiNdebele researcher at SADiLaR. She finds the research fields of terminology development and lexicography very interesting, but specializes in translation (most of her research papers are based on translation).
She submitted a paper for Euralex 2021 with an abstract accepted for poster presentation (which was postponed due to COVID-19) and is finishing up a paper for ALASA 2020. During the period of national lockdown Ms. Skosana also started on a new paper, based on Autshumato Machine Translation.
SADiLaR Team: isiXhosa Researcher
Author: Mieke Hofmeyr
Ms Andiswa Bukula is the isiXhosa Researcher for SADiLaR. As a researcher, she is passionate about her field of research and works towards a better future for the isiXhosa language in a digital learning space.
She is currently working on two articles. The first, with Dr. Roald Eiselen, is on the use of Named Entity Recognizers in isiXhosa. The second, which she will be submitting for the writing retreat in June, is with a colleague from UNISA, Mlamli Diko, and looks at the representation of female characters in the isiXhosa drama book, Indlala inamanyala.