What did they cover? A cluster analysis of news stories published in the Botswana Daily News, January – December 2004
[Abstract] In this study, a cluster analysis of news stories published in the Botswana Daily News during the period January - December 2004 was undertaken. The study was exploratory in nature and sought to find out what topics were predominant during the study period. The approach we adopted can be divided into three phases, namely data collection, document pre-processing, and cluster analysis. The data used in the study was downloaded from the Botswana Daily News website using a simple program developed specifically for that purpose. Document pre-processing was concerned with transforming the raw documents into a format that could be directly operated upon by the various clustering algorithms. The documents themselves were represented using the vector space model, with the tf.idf term weighting scheme. We experimented with three clustering approaches, namely, direct k-way clustering, k-way clustering through repeated bisections, and agglomerative clustering. Agglomerative clustering performed poorly, and we thus discarded its results. Direct k-way clustering and k-way clustering through repeated bisections produced similar results, though the former performed better in terms of external isolation and internal cohesion of the clusters produced. Consequently, we only retained the results from direct k-way clustering, and subsequently performed a quarterly analysis of our corpus using only the direct k-way clustering algorithm. Analysis of the complete corpus identified a number of topics that were prevalent over the study period. Interestingly, a quarterly analysis of the corpus revealed other topics whose prevalence appears to have been limited to certain parts of the year.
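The vector space representation with tf.idf weighting mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which tf.idf variant was used, so the formula below (raw term frequency times log(N/df), followed by length normalization) is an assumption, as are the function names `tfidf_vectors` and `cosine` and the three invented toy documents.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one unit-length {term: weight} dict per tokenized document.

    Assumed weighting: tf * log(N / df). The study's exact variant is not
    given in the abstract.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: count * math.log(n_docs / df[t]) for t, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        # Scale to unit length so a dot product equals cosine similarity.
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors


def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    return sum(w * v.get(t, 0.0) for t, w in u.items())


# Invented toy documents, purely for illustration (not from the corpus).
docs = [
    "parliament debates budget".split(),
    "minister presents budget parliament".split(),
    "team wins football match".split(),
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

Documents that share weighted terms score a higher cosine similarity than documents with no terms in common, and a partitional clustering algorithm can use such a similarity measure when assigning documents to clusters.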
[Publisher] Stellenbosch University