Voyant bjalo ka senolofatši sa phetleko ya dingwalo
Author: Dimakatso Mathe (SADiLaR Sesotho sa Leboa researcher)
An English translation of this blog post appears at the bottom.
Lenaneo la go dira diphatišišo le akaretša go tsitsinkela sengwalo ka leihlo la ntšhotšhonono ka maikemišetšo a go utulla se monyakišiši a ratago go se nepiša. Le ge go le bjalo, go na le tšeo leihlo la nama le ka šitwago go di lemoga ge sengwalo seo e le se setelele. Go fa mohlala, modiro wa go bala gore leina la moanegwa yo a itšego le tšwelela gakae sengwalong sa padi, e ka ba se sengwe seo leihlo le ka šitwago go se phethagatša. Ka mahlatse, sedirišwa sa go swana le Voyant[1], se ka phethagatša modiro wo ka ponyo ya leihlo le go go fa dipoelo tša go kgodiša tšeo di nepagetšego. Voyant ke sedirišwa sa inthanete seo se tsebjago kudu morerong wa dithuto tša botho tša ditšitale (Digital Humanities). Moreromogolo wa sedirišwa se ke go nolofatša modiro wa go fetleka tshedimošo, ka maikemišetšo a go fihlelela goba go tiišetša kgopolo yeo e itšego ya phatišišo. Sona se thuša go bea pepeneneng diteng tša tshedimošo ya sengwalo tšeo di ka gogago šedi ya monyakišiši ntle le go badišiša sengwalo go tloga mathomong go fihlela mafelelong. Ntlha yona ye o fetšago go e bala, re tla boela go yona mafelelong a sengwalo se gore re e otlolle.
Seswantšho se se tšwelelago ka fase, se swailwe ka ditlhaka tša go tloga go A go fihlela go E, gomme tšona di laetša tše dingwe tša dikarolo tša Voyant tše di ka dirišwago go fetleka tshedimošo. Tšona ke cirrus (A), reader (B), trends (C), summary (D) le contexts (E). Therišano ya rena e tla ithekga ka tšona dikarolo tše di boletšwego gomme ra tsopola ka boripana mešomo goba mehola ya tšona. Diteng tše di lego ka seswantšhong se sa Voyant di tšerwe kanegelong yeo e phatlaladitšwego ke Nal’ibali inthaneteng [2]. Yona e tsentšhitšwe ka go sedirišwa sa Voyant ka mokgwa wa go ngwatha diteng tša sengwalo gomme tša pharwa ka go Voyant, ke gore “cut & paste”.
Seswantšho sa Voyant
Mohola wa karolo ya cirrus yeo e swailwego ka tlhaka ya A seswantšhong, ke go hlagiša goba gona go bonagatša mantšu ao a tšwelelago gantši go feta a mangwe sengwalong. Mantšu ao a tšweleditšwe ka mebala ya go fapana ebile re lemoga gore ke a magolo kudu ge a bapetšwa le mantšu a mangwe ao a hlagišitšwego. Ge o batametša ntlhanakhomphuthara (mouse pointer) kgauswi le mantšu ao a hwetšagalago ka go cirrus, o tla kgona go bona gore lentšu leo le tšwelela gakae sengwalong seo. Gona fao, re lemoga gore lentšu le “Temo” ke le lengwe la mantšu ao a tšwelelago gantši sengwalong. Se ga se makatše ka ge moanegwathwadi wa kanegelo e le Temo, gomme ditiragalo tša kanegelo di dikuloga godimo ga gagwe. Seo ke sona se hlolago gore a fele a tšwelela kgafetša kanegelong ka ge e le moanegwathwadi.
Ge re gatela pele, karolo ya reader, yeo e swailwego ka tlhaka ya B seswantšhong sa rena, yona mohola wa yona ke go kgontšha monyakišiši (goba modiriši wa Voyant) go bala sengwalo go ya le ka mokgwa wo se tšwelelago ka gona. Ka mantšu a mangwe, diteng tša kanegelo di alwa go tloga mathomong go fihla mafelelong go ya le ka mokgwa wo di tšwelelago ka gona sengwalong. Go swana le karolong ya cirrus, ge o batametša ntlhanakhomphuthara godimo ga lentšu le lengwe le le lengwe, go tšwelela tshedimošo ya go laetša gore lentšu leo le tšwelela gakae sengwalong.
Karolo ya trends, ye e laeditšwego ka tlhaka ya C, ke kerafo ye e laetšago bontši bja mantšu go ya le ka moo a tšwelelago dikarolong tša sengwalo. Mebala ya methalokerafo e nyalelana le ya mantšu ao a tšwelelago ka go cirrus. Le ge a mangwe a mantšu ao a tšwelelago go feta a mangwe a ka tšweletšwa ka mebala ya go swana, bjale ka ge re bona mantšu[3] a “a” le “ba” a tšweletšwa ka talalerata seswantšhong, ge o batametša ntlhanakhomphuthara godimo ga mothalokerafo, o tla hwetša tshedimošo ka botlalo mabapi le lentšu le mothalokerafo o le emetšego. Go feta fao, go na le mapokisana ka godimo ga methalokerafo ao a bontšhago gore mebala ya methalokerafo e emetše mantšu afe.
Karolo ya D, yeo e lego summary, yona ke kakaretšo ya tshedimošo ya sengwalo ka ge e itlhaloša. E laetša palomoka ya dingwalwa tše di fetlekwago (ke se setee mo lebakeng le), palomoka ya mantšu ao a hwetšwago sengwalong (1 092), nako yeo tshedimošo e tsentšhitšwego ka go sedirišwa sa Voyant (e hlamilwe gona bjale), gammogo le palomoka ya mantšu ao a hlamago mafoko (bontši bja mafoko a sengwalo se a bopša ke mantšu a 13). Ka go le lengwe, contexts ye e swailwego ka tlhaka ya E le yona e laetša lentšu le le tšwelelago ka bontši. Ga go felele fao, se se ikgethilego ka yona ke gore e laetša sekafoko se se tšwelelago ka go la nngele le sa ka go la go ja ga lentšu leo. Ka go realo, e laetša tšhomišo ya lentšu leo ka go utulla dikafoko tše di panago mmogo le lona lentšu leo.
Go na le dikarolo tše dingwe tša Voyant tšeo re sego ra bolela ka tšona sengwalong se, ebile o hlohleletšwa go ikgwathela tšona maitekelong a gago a go ithuta le go šomiša Voyant. Ke tše dintše tše Voyant e ka di dirago, eupša go na le tšeo e ka se go direlego tšona. Se se re bušetša ntlheng ye e kgwathilwego matsenong mabapi le bokgoni bja Voyant bja go utullela monyakišiši tshedimošo ya go tanya šedi ntle le go badišiša sengwalo go tloga mathomong go fihla mafelelong. Voyant e kgona go bala palo ya mantšu sengwalong eupša seo e ka se kego ya se dira ke go go utullela moko goba morero wa sengwalo. Se se laetša gabotse gore go tloga go le bohlokwa gore o ipalele sengwalo le go kwešiša diteng tša sona gore o kgone go hlatholla tswalano magareng ga mantšu ao a tšwelelago kudu sengwalong bjale ka ge go tšweleditšwe seswantšhong. Ka mantšu a mangwe, Voyant e ka se go direle phatišišo eupša go molaleng gore e tla go nolofaletša morero wa go dira phatišišo. Ge o na le kgahlego ya go ithuta kutšwana ka ga Voyant, o se ke wa diega go ikgokaganya le SADiLaR gore o dire kgopelo ya go fiwa tlhahlo ntle le tefo. Gore o hwetše tshedimošo ka botlalo mabapi le go dira kgopelo ya tlhahlo ya Voyant go tšwa go SADiLaR, kgotla mo.
[1] Gore o fihlelele sedirišwa sa Voyant, kgotla mo https://voyant-tools.org/
[2] O ka fihlelela sengwalo se go bolelwago ka sona go https://nalibali.org/story-library/multilingual-stories/temo-le-mahodu-dimela
[3] Lemoga gore mo sengwalong se, lereo le mantšu/lentšu le šomišwa ntle le go šetša gore ke karolo efe ya sehlophantšu goba mohuta ofe wa sekantšu.
Voyant as enabler tool for text analysis
The research process involves scrutinising texts with a particular aim in mind. However, some tasks are hard to carry out manually when a text is long. For example, counting how many times the name of a particular character appears in a novel can be a daunting task. Fortunately, Voyant[1] can perform this task in the blink of an eye and provide convincing, credible and accurate results. Voyant is a web-based application that is well known in the Digital Humanities. It is mainly used for text analysis, with the aim of arriving at or confirming a research idea. The application helps to bring to light aspects of a text that might interest the researcher, without the researcher having to read the entire document. We will return to this point in the conclusion of this discussion.
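To make the counting task concrete, here is a minimal Python sketch of what such a count involves. It is not Voyant's own code, and Voyant's tokenisation may treat punctuation and word boundaries differently; the function name `count_occurrences` and the sample sentence are hypothetical and are not taken from the Nal'ibali story.

```python
import re


def count_occurrences(text: str, name: str) -> int:
    """Count whole-word, case-insensitive occurrences of `name` in `text`."""
    # \b marks word boundaries, so "Temo" is not counted inside a longer word.
    pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
    return len(pattern.findall(text))


# Hypothetical sample text, not taken from the Nal'ibali story.
sample = "Temo went to the field. Later, Temo came home. TEMO!"
print(count_occurrences(sample, "Temo"))  # 3
```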
The illustration below is labelled A to E and shows some of the Voyant features that can be used to analyse text: cirrus (A), reader (B), trends (C), summary (D) and contexts (E). Our discussion is based on these features, and we briefly explain the function of each. The text shown in the Voyant illustration was taken from a story published online by Nal’ibali[2]. It was added to Voyant by cutting the text from the document and pasting it into the application (“cut & paste”).
Voyant illustration
The function of the cirrus feature, marked A in the illustration, is to visualise the words that appear most frequently in the document. These words are shown in different colours and are displayed larger than the other, less frequent words. When you hover the mouse pointer over a word in the cirrus, you can see how many times that word appears in the document. The illustration shows that “Temo” is one of the most frequent words in the document. This is not surprising: Temo is the main character of the story and the events revolve around her, so her name recurs throughout.
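The idea behind the cirrus can be sketched in a few lines of Python, assuming a simple word-frequency count with an optional stopword list and a linear mapping from counts to font sizes. This is an illustration of the general technique, not Voyant's implementation; Voyant's own tokenisation, stopword lists and sizing rules may differ, and the names `top_words` and `cirrus_sizes` and the point sizes are hypothetical.

```python
import re
from collections import Counter


def top_words(text: str, n: int = 5, stopwords: frozenset[str] = frozenset()) -> list[tuple[str, int]]:
    """Return the n most frequent lower-cased words, skipping any stopwords."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common(n)


def cirrus_sizes(freqs: list[tuple[str, int]], min_pt: float = 10, max_pt: float = 40) -> dict[str, float]:
    """Scale each word's count linearly onto a font-size range, as a word cloud does."""
    if not freqs:
        return {}
    counts = [c for _, c in freqs]
    lo, hi = min(counts), max(counts)
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    return {w: min_pt + (c - lo) * (max_pt - min_pt) / span for w, c in freqs}


# Hypothetical sample text, not taken from the Nal'ibali story.
sample = "Temo and the thieves. Temo saw the plants. The thieves ran."
freqs = top_words(sample, n=3, stopwords=frozenset({"the", "and"}))
print(freqs)                # [('temo', 2), ('thieves', 2), ('saw', 1)]
print(cirrus_sizes(freqs))  # {'temo': 40.0, 'thieves': 40.0, 'saw': 10.0}
```

Without a stopword list, short function words tend to dominate such a count, which is one reason frequent forms such as “ba” can appear prominently in a cirrus.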
The reader feature, marked B in the illustration, enables the researcher (or any Voyant user) to read the document in its original form. In other words, the content of the story is displayed from beginning to end in the order in which it appears in the document. As with the cirrus, hovering the mouse pointer over a word displays how many times that word appears in the document.
The trends feature, marked C, is a graph that shows the distribution of word frequencies, i.e. how often words appear in the different sections of the document. The colours of the graph lines correspond to the words shown in the cirrus. Some frequent words share the same colour; in the illustration, for example, the words[3] “o” and “ba” both appear in light blue. Even so, hovering the mouse pointer over a graph line shows full details of the word that the line represents. In addition, the small boxes above the graph show which word each line colour represents.
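One way to picture what a trends line plots is the following Python sketch. It assumes the document is split into a fixed number of roughly equal segments (the default of 10 here is an assumption) and computes the relative frequency of one word in each segment. It illustrates the general idea only and is not Voyant's code; `segment_trend` and the sample sentence are hypothetical.

```python
import re


def segment_trend(text: str, word: str, segments: int = 10) -> list[float]:
    """Relative frequency of `word` in each of `segments` roughly equal slices of the text."""
    tokens = re.findall(r"\w+", text.lower())
    target = word.lower()
    trend = []
    for i in range(segments):
        # Integer slice boundaries so every token falls into exactly one segment.
        start = i * len(tokens) // segments
        end = (i + 1) * len(tokens) // segments
        chunk = tokens[start:end]
        trend.append(chunk.count(target) / len(chunk) if chunk else 0.0)
    return trend


# Hypothetical sample text, not taken from the Nal'ibali story.
sample = "Temo saw Temo run. The thieves hid in the field."
print(segment_trend(sample, "temo", segments=2))     # [0.4, 0.0]
print(segment_trend(sample, "thieves", segments=2))  # [0.0, 0.2]
```

Using relative rather than raw frequencies keeps segments of slightly different lengths comparable; each list would become one line on the graph.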
Feature D, the summary, is an overview of the information in the document. It shows the number of documents being analysed (in this case, one), the total number of words in the document (1 092), when the text was added to Voyant (here, just now), and the average number of words per sentence (13 for this document). Contexts, marked E, also shows the most frequent words, but what sets it apart is that it displays the words immediately to the left and right of each occurrence of a word. In this way, it reveals how the word is used by showing the phrases in which it occurs.
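Both features can be approximated in Python as a rough sketch: a few summary statistics (total words, distinct words, average words per sentence) and a simple keyword-in-context (KWIC) listing like the one contexts displays. The naive regular-expression tokenisation and sentence splitting are assumptions, so the numbers would not necessarily match Voyant's (for example, its 1 092 words for the story). The names `summary` and `kwic` and the sample sentence are hypothetical.

```python
import re


def summary(text: str) -> dict[str, float]:
    """Rough equivalents of a few summary statistics for one document."""
    words = re.findall(r"\w+", text)
    # Naive sentence split on runs of ., ! or ?; fragments without word characters are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if re.search(r"\w", s)]
    return {
        "total_words": len(words),
        "unique_words": len({w.lower() for w in words}),
        "avg_words_per_sentence": len(words) / len(sentences) if sentences else 0.0,
    }


def kwic(text: str, keyword: str, width: int = 3) -> list[tuple[str, str, str]]:
    """Keyword in context: up to `width` words to the left and right of each match."""
    tokens = re.findall(r"\w+", text)
    rows = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, token, right))
    return rows


# Hypothetical sample text, not taken from the Nal'ibali story.
sample = "Temo saw the thieves. The thieves took the plants! Temo ran home."
print(summary(sample))  # {'total_words': 12, 'unique_words': 8, 'avg_words_per_sentence': 4.0}
for left, word, right in kwic(sample, "Temo", width=2):
    print(f"{left:>12} [{word}] {right}")
```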
There are other Voyant features that were not covered in this discussion, and you are encouraged to explore them yourself as you learn to use Voyant. Voyant can do a great deal, but it also has its limitations. This brings us back to the point raised in the introduction about Voyant’s ability to bring to light aspects of a text that may interest the researcher without reading the document from beginning to end. Voyant can count the words in a document, but it cannot reveal the main message or theme of a text. It therefore remains essential to read the text yourself and understand its content, so that you can interpret the relationships between the frequent words that Voyant displays, such as those in the illustration. In other words, Voyant cannot do your research for you, but it can make the research process easier. If you are interested in learning more about Voyant, do not hesitate to contact SADiLaR to request free training. For more information on requesting Voyant training from SADiLaR, click here.
[1] The Voyant application is available at https://voyant-tools.org/
[2] The story discussed here is available at https://nalibali.org/story-library/multilingual-stories/temo-le-mahodu-dimela
[3] Note that in this text, the term word/words has been used without taking into account the parts of speech or morphemes of the forms mentioned.
SADiLaR Team: Tshivenda Researcher
SADiLaR’s Tshivenda researcher, Mr Phathutshedzo Maxwell Ramukhadi, specialises in literature and the use of digital tools. He finds the field of literature particularly interesting, and added:
“I want to familiarise myself with the field of Human Language Technology”
During the current Covid-19 situation and national lockdown, Mr Ramukhadi is working on the following projects:
- How to develop a Tshivenda digital literary corpus
- The portrayal of child characters in a Tshivenda play
- Analysis of a Tshivenda lemmatization tool
He says that he plans to finish the two articles he is currently working on within the next few months and submit them for review.
“I also want to start working on my PhD proposal.”
Mr Ramukhadi argues that digital humanities can make a greater impact in the African context because African languages have been underdeveloped for so long. In conclusion, he says:
“Digital Humanities will contribute a lot to make sure that our languages are being treated and have the same standard as the European languages”.
SADiLaR Team: Xitsonga Researcher
Author: Mieke Hofmeyr
At SADiLaR, Mr. Respect Mlambo is the Xitsonga researcher. Respect specialises in lexicography, translation and terminology.
He is currently busy writing papers and blog posts for SADiLaR in his various fields of research. While the Covid-19 situation and national lockdown are affecting the whole of South Africa and its workforce, Mr. Mlambo plans to continue writing and working on research projects in his field.
When asked about his main research interest, Mr. Mlambo kept his answer straightforward: “Lexicography”.
Mr. Respect Mlambo concluded by sharing his thoughts on the contribution that Digital Humanities can make in the African context:
“Digital Humanities will improve the functionality of African languages in various modern fields”.
SADiLaR Team: Setswana Researcher
Mrs. Valencia Wagner is SADiLaR’s Setswana language researcher. Within Setswana, she finds sociolinguistics, phonetics, phonology and digital humanities particularly interesting.
She is currently writing articles and organising virtual workshops. Her PhD studies also take up much of Mrs. Wagner’s time, and she plans to write more articles within the next few months. Furthermore, Valencia will be working on two projects:
- The Setswana and isiXhosa grammar portal
- The speech data collection project
When asked what contribution she thinks Digital Humanities can make in the African context, she had the following to say:
“Africa still has a limited exposure to various digital tools, resources & methodologies, therefore, many African researchers and scholars still rely a lot on traditional methods of conducting research. Digital humanities can transform traditional humanities by creating and integrating digital technologies into African research. Research that would have taken years to complete manually, will be much easier to undertake with the use of digital tools.”
She concluded by saying that these digital platforms could also help us to share information and grow our indigenous communities.
SADiLaR Team: Afrikaans Researcher
Author: Mieke Hofmeyr
Benito Trollip is SADiLaR’s Afrikaans researcher. In his research, he is especially interested in the ways in which meaning is constructed in language.
“I tend to focus on compounds and other word-forming processes and the way people choose to combine form and meaning. There are endless possibilities when it comes to constructing meaning and language is a literal house of abundance when it comes to these possibilities,” says Mr Trollip.
He is also interested in the legal aspects of research with regard to intellectual property rights, ownership and the distribution of data. As a SADiLaR researcher, Mr Trollip is always looking for new ways to bring language and the digital age together. He is currently finalising a dataset and an article on denominal adjectives in Afrikaans, such as eend-agtig 'duck-like'. He has also worked with a colleague, a graphic designer, on a short video on intensified adjectives in Afrikaans, such as hond-warm (literally 'dog hot', meaning 'piping hot').
SADiLaR Team: isiZulu Researcher
Author: Mieke Hofmeyr
Ms. Rooweither Mabuya is SADiLaR’s isiZulu language researcher. She finds computational linguistics interesting and is particularly interested in the development of resources for African languages.
While the Covid-19 situation and national lockdown are affecting the whole of South Africa and its workforce, Ms. Mabuya is staying focused by working on various projects.
The Carpentries: Second Online Workshop
Author: Mieke Hofmeyr
During the week of 1 to 5 June 2020, SADiLaR again collaborated with The Carpentries to host an online workshop. The daily sessions ran from 09:00 to 13:00 and followed a detailed schedule to ensure that participants gained all the knowledge and skills the workshop promised to deliver.
The workshop aimed to teach the fundamental data skills needed to conduct research, including data organisation with spreadsheets and OpenRefine, and data analysis and visualisation with R. The target audience was mainly researchers and postgraduate students with little or no prior computational experience. The lessons were domain-specific, building on learners' existing knowledge so that they could quickly apply the skills they learned to their own research. Participants were encouraged to help one another and to apply what they had learned to their research problems.
SADiLaR Team: isiNdebele Researcher
Author: Mieke Visser
Ms. Nomsa Skosana is the isiNdebele researcher at SADiLaR. She finds the research fields of terminology development and lexicography very interesting, but specializes in translation (most of her research papers are based on translation).
She submitted a paper to Euralex 2021, for which her abstract was accepted as a poster presentation (the presentation was postponed due to COVID-19), and is finishing a paper for ALASA 2020. During the national lockdown, Ms. Skosana also started a new paper based on Autshumato Machine Translation.
SADiLaR Team: Siswati Researcher
Author: Mieke Hofmeyr
Muzi Matfunjwa is SADiLaR’s Siswati researcher. His other research interests include Digital Humanities, Linguistics and Sociolinguistics. Covid-19 has not got in the way of Mr Matfunjwa’s research, and he is currently working on several projects.
“I am writing an article on the translation of collocations in the South African Constitution from English into isiZulu, Siswati and isiNdebele. I am also writing an article on the use of ParaConc to extract terminology for quadrilingual dictionary creation.”
Muzi does not focus only on the present but also looks for prospective future projects to work on. Over the next couple of months, he will finish his articles and submit them for publication.
When it comes to Digital Humanities, Mr Matfunjwa believes that it can provide advanced, contemporary research methods for African studies, especially African languages, thereby promoting research and providing resources for African languages.
Resources for African Indigenous Languages (RAIL) online workshop
Author: Mmasibidi Setaka (SADiLaR Sesotho Researcher)
The South African Centre for Digital Language Resources (SADiLaR) organised a workshop (originally expected to be held at the LREC 2020 conference in Marseille, France) in the field of African indigenous language resources. The workshop aimed to bring together researchers interested in showcasing their research and thereby boosting the field of African indigenous languages. It provided an overview of the current state of the art and emphasised the availability of African indigenous language resources, including both data and tools. Additionally, it allowed researchers interested in African indigenous languages to share information and to start discussions on improving the quality and availability of these resources. Many African indigenous languages currently have no or very limited resources available and, additionally, they are often structurally quite different from better-resourced languages, which requires the development and use of specialised techniques. By bringing together researchers from different fields (e.g., (computational) linguistics, sociolinguistics, language technology) to discuss the development of language resources for African indigenous languages, we hoped the workshop would boost research in this field.