Educational data mining (EDM) in a South African University: a longitudinal study of factors that affect the academic performance of computer science I students
[Abstract] The past few years have seen an increase in the number of first-year students registering in the School of Computer Science at Wits University. These students come from different backgrounds, both academically and socially. As do many other institutions, Wits University collects and stores vast amounts of data about the students it enrols and teaches. However, this data is not always used after being stored. The area of Educational Data Mining (EDM) focuses on using this stored data to find trends and patterns that could enhance knowledge about students' behavior, their academic performance and the learning environment. This longitudinal study focuses on the application of EDM techniques to obtain a better understanding of some of the factors that influence the academic performance of first-year computer science students at the University of the Witwatersrand. Knowledge obtained using these techniques could assist in increasing the number of students who complete their studies successfully, in identifying students who are at risk of failing, and in ensuring that early intervention processes can be put into place. A modified version of CRISP-DM (CRoss-Industry Standard Process for Data Mining) was used, with three data mining techniques: Classification, Clustering and Association Rule Mining. Three algorithms were compared for each of the first two techniques, while only one algorithm was used for Association Rule Mining. For classification, the three algorithms compared were the J48 Classifier, Decision Table and Naïve Bayes. The clustering algorithms used were Simple K-means, Expectation Maximization (EM) and Farthest First. Finally, the Predictive Apriori algorithm was selected for Association Rule Mining. Historical Computer Science I data, from 2006 to 2011, was used as the training data. This data set was used to find relationships within the data that could assist with predictive modeling.
For each of the selected techniques, a model was created using the training data set. These models were incorporated in a tool, the Success or Failure Determiner (SOFD), that was created specifically as part of this research. Thereafter, the test data set was put through the SOFD tool in the testing phase. Test data sets usually contain a variable whose value is predicted using the models built during the training phase. The 2012 Computer Science I data instances were used during the testing phase. The investigations brought forth both expected and interesting results. A good relationship was found between academic performance in Computer Science and three of the factors investigated: Mathematics I, mid-year mark and the module perceived to be the most difficult in the course. The relationship between Mathematics and Computer Science was expected. However, the other two factors (mid-year mark and most difficult module) are new, and may need to be further investigated in other courses or in future studies. An interesting finding from the Mathematics investigation was that Computer Science performance related more strongly to Algebra than to Calculus. Using these three factors to predict Computer Science performance could assist in improving throughput and retention rates by identifying students at risk of failing before they write their final examinations. The Association Rule Mining technique assisted in identifying the selection of courses that could yield the best overall academic performance in first year. This finding is important, since the information obtained could be used during the registration process to assist students in making the correct decisions when selecting their courses.
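The course-selection finding above rests on association rules of the form "course combination implies good overall performance". The sketch below illustrates the two basic measures behind such rules, support and confidence. It is not Predictive Apriori (the algorithm named in the abstract, which ranks rules by an estimated predictive accuracy), and the course codes, the `PASS_ALL` marker and every record are hypothetical values invented for illustration.

```python
from itertools import combinations

# Hypothetical records: the first-year courses a student took, plus a
# "PASS_ALL" marker if the student passed every course.
# These rows are invented for illustration; they are not data from the study.
records = [
    {"CS1", "MATH1", "PHYS1", "PASS_ALL"},
    {"CS1", "MATH1", "PHYS1", "PASS_ALL"},
    {"CS1", "MATH1", "STATS1", "PASS_ALL"},
    {"CS1", "MATH1", "STATS1"},
    {"CS1", "MATH1", "ECON1"},
    {"CS1", "MATH1", "ECON1"},
]


def support(itemset, rows):
    """Fraction of rows that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= row for row in rows) / len(rows)


def confidence(antecedent, consequent, rows):
    """Estimate of P(consequent | antecedent) from the rows."""
    base = support(antecedent, rows)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), rows) / base


def rules_for(target, rows, size=1, min_support=0.2):
    """Rank rules 'course combination -> target' by confidence."""
    courses = sorted(set().union(*rows) - {target})
    found = []
    for combo in combinations(courses, size):
        if support(combo, rows) >= min_support:
            found.append((combo, confidence(combo, {target}, rows)))
    return sorted(found, key=lambda rule: rule[1], reverse=True)


ranked = rules_for("PASS_ALL", records)
for combo, conf in ranked:
    print(combo, "->", "PASS_ALL", round(conf, 2))
```

Raising `size` ranks larger course combinations in the same way; the `min_support` threshold is an assumed cut-off that keeps rules backed by very few students out of the ranking.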
The overall results show that using data mining techniques and historical data collected at Wits University about first-year Computer Science (CS-1) students can assist in obtaining meaningful information and knowledge, from which a better understanding of present and future generations of CS-1 students can be derived, and solutions found to some of the academic problems and challenges facing them. Additionally, this can assist in obtaining a better understanding of the students and the factors that influence their academic performance. This study can be extended to include more courses within Wits University and other higher educational institutions.
Keywords. Educational Data Mining, CRISP-DM, Classification, Clustering, Association Rule Mining, J48 Classifier, Decision Table, Naïve Bayes, Simple K-means, Expectation Maximization, Farthest First, Predictive Apriori
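The training and testing protocol described in the abstract (2006 to 2011 data for training, 2012 data for testing) can be sketched with a small categorical Naïve Bayes classifier, one of the three classifier families the study compares. This is a minimal sketch under stated assumptions, not the SOFD tool or the study's implementation: the feature names (`maths_band`, `midyear_band`), the banding into "high" and "low", and all rows are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical rows: (year, maths_band, midyear_band, outcome).
# Invented for illustration only; not data from the study.
rows = [
    (2006, "high", "high", "pass"),
    (2007, "high", "low", "pass"),
    (2008, "low", "low", "fail"),
    (2009, "low", "high", "pass"),
    (2010, "low", "low", "fail"),
    (2011, "high", "high", "pass"),
    (2012, "high", "high", "pass"),
    (2012, "low", "low", "fail"),
]

# Temporal split as described in the abstract: 2006-2011 train, 2012 test.
train = [r for r in rows if 2006 <= r[0] <= 2011]
test = [r for r in rows if r[0] == 2012]


def fit(data):
    """Count class frequencies and per-feature value frequencies."""
    class_counts = Counter(r[-1] for r in data)
    value_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    values = defaultdict(set)            # feature index -> distinct values seen
    for _, *features, label in data:
        for i, v in enumerate(features):
            value_counts[(i, label)][v] += 1
            values[i].add(v)
    return class_counts, value_counts, values


def predict(model, features):
    """Return the class with the highest log-posterior (Laplace smoothing)."""
    class_counts, value_counts, values = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)
        for i, v in enumerate(features):
            count = value_counts[(i, label)][v]
            score += math.log((count + 1) / (n + len(values[i])))
        if score > best_score:
            best, best_score = label, score
    return best


model = fit(train)
predictions = [predict(model, r[1:-1]) for r in test]
accuracy = sum(p == r[-1] for p, r in zip(predictions, test)) / len(test)
print(predictions, accuracy)
```

Splitting by year rather than at random mirrors the setting described above, where a model built on past cohorts is evaluated on a later cohort it has not seen.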
[Publication date] [Publisher] University of the Witwatersrand