The Covid-19 virus has been with us since late 2019 but only spread into a pandemic in early 2020. That is just seven months ago. Since then there have been thousands of periodical articles, news reports, editorials, reports, commentaries, blogs and all other manner of text documents dealing with the virus, its spread and how to contain it. Text mining and content analysis are useful tools for exploring these documents to see how our thinking and processes have evolved over time, what we got right and what we got wrong.
In their recently published paper, Mora, Kummitha, and Esposito (2020) examine how governments and public health agencies around the world have used information and communication technologies (ICTs) to detect and contain the spread of the virus, and how sociomaterial arrangements moderate the effectiveness of the technological solutions deployed by public authorities to control the Covid-19 pandemic. The answer to this question is important because, while technology may be very useful, its effectiveness can depend on how it is deployed and who deploys it.
Using keyword and combined keyword searches (and, or), they produced 2,187 pertinent documents published between January and April 2020, and were able to identify 39 technological solutions. The documents were cleaned to ensure the keywords appeared in the body of the documents and not only in headlines or references. The authors were left with 515 documents for text mining.
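To make the filtering step concrete, here is a minimal sketch of a body-only keyword filter. The keyword lists and the `keep_document` helper are hypothetical, my illustration only; the paper's actual search strings are not given in this summary.

```python
# Hypothetical keyword sets -- the paper's real search terms are not listed here.
KEYWORDS = {"covid-19", "coronavirus"}
TECH_TERMS = {"contact tracing", "thermal scanner", "drone"}

def keep_document(sections):
    """sections maps part names ('headline', 'body', 'references') to text.
    Keep a document only if the AND-combined keyword search matches in the
    body text, not merely in the headline or reference list -- mirroring
    the authors' cleaning step."""
    body = sections.get("body", "").lower()
    return (any(k in body for k in KEYWORDS)
            and any(t in body for t in TECH_TERMS))
```

A document whose headline mentions Covid-19 but whose body does not would be dropped by this filter, while one discussing coronavirus contact tracing in its body would be kept.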
“In order to apply text mining techniques based on co-occurrence data, the source documents were transformed into Rich Text Format (RTF) files. The RTF files were then processed with the content analysis software WordStat (Version 8.0.21). WordStat transformed the source documents into high-dimensional sets of unstructured textual data and made it possible to semi-automatize both data cleaning and data processing. Through the data cleaning process, the dimensionality of the dataset was reduced while preserving quality information. This process consisted in the removal of unnecessary textual information whose presence would have generated noise. In the case of academic literature, for example, the footers, headers, and details about the authors and their institutions were filtered out. Given their little semantic value, stop words were also removed by relying on WordStat dictionaries. In addition, misspellings were corrected, and variant forms of the same word were lemmatized. Following the data cleaning phase, 13,894 textual items were extracted, which include 5,917 words and 7,977 phrases. Phrases are conceptual units composed of minimum two and maximum four words. After being extracted, WordStat was tasked with measuring the strength of association between each couple of words and phrases by calculating their intra-document co-occurrence. The co-occurrence data was then normalized. In accordance with research by Eck and Waltman (2009), to normalize the data, the probabilistic affinity index Association Strength was preferred to set-theoretic measures.” (pp. 6-7)
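The co-occurrence counting and normalization described in the quote can be sketched in a few lines. This is my illustration of the general technique, not WordStat's implementation: it counts how many documents contain each term and each pair of terms, then applies Van Eck and Waltman's association strength, AS(i, j) = c_ij / (s_i · s_j), where c_ij is the number of documents containing both terms and s_i the number containing term i.

```python
from collections import Counter
from itertools import combinations

def association_strength(docs):
    """docs is a list of documents, each a list of terms.
    Returns {(term_a, term_b): normalized co-occurrence score}."""
    occ = Counter()   # s_i: number of documents containing term i
    cooc = Counter()  # c_ij: number of documents containing both i and j
    for doc in docs:
        terms = sorted(set(doc))      # count each term once per document
        occ.update(terms)
        cooc.update(combinations(terms, 2))
    # Association strength normalization (Van Eck & Waltman, 2009):
    # AS(i, j) = c_ij / (s_i * s_j)
    return {(a, b): c / (occ[a] * occ[b]) for (a, b), c in cooc.items()}
```

The probabilistic normalization matters because raw co-occurrence counts favor frequent terms; dividing by the product of the individual occurrence counts rewards pairs that appear together more often than chance would predict.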
The authors identified the various ICTs used to detect and combat the spread of Covid-19 and, based on the available literature, the relative effectiveness of each technology and how it was used. As we now know, some were more effective than others. The guidelines on how and when to use methods such as temperature-sensing devices, contact tracing, testing and other tactics are being continually updated as we learn more about the virus and how it spreads. You can read the complete article online.
References
Mora, L., Kummitha, R. K. R., & Esposito, G. (2020). Digital technology deployment and pandemic control: how sociomaterial arrangements and technological determinism undermine virus containment measures. Available at SSRN 3612338.