Development of a Big Data analytics demonstrator
[Abstract] The continued development of the information era has established the term "Big Data", and large datasets are now easily created and stored. Humanity is beginning to understand the value of data and, more importantly, that valuable insights are captured within it. To uncover these insights and convert them into value, various mathematical and statistical techniques are combined with powerful computing capabilities to perform analytics. This process is described by the term "data science". Machine learning is part of data analytics and is based on some of these mathematical techniques. Because industrial engineers integrate systems and incorporate new technological developments that benefit business, it is inevitable that the industrial engineering domain will also be involved in data analytics. The aim of this study was to develop a demonstrator from which the industrial engineering domain can learn and gain first-hand knowledge, in order to better understand a Big Data analytics system. This study describes how the demonstrator was developed as a system, what practical obstacles were encountered, and which techniques are currently available to analyse large datasets for new insights. An architecture was developed based on the existing but somewhat limited literature, and a hardware implementation was done accordingly. Three computers were used: the first was configured as the master node and the other two as slave nodes. Software that coordinates and executes the analysis was identified and used to analyse various test datasets available in the public domain. The datasets are in different formats, which require different machine learning techniques. These include, among others, regression under supervised learning and k-means under unsupervised learning. The performance of this system is compared with a conventional analytics configuration in which only one computer is used.
The criteria used were (1) the time to analyse a dataset using a given technique and (2) the accuracy of the predictions made by the demonstrator and by the conventional system. Results were determined for several datasets. Smaller datasets were analysed faster by the conventional system, but it could not handle larger datasets. The demonstrator performed very well with larger datasets and with all the machine learning techniques applied to it.
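The two comparison criteria, analysis time and prediction accuracy, can be illustrated with a minimal single-machine sketch. It is not the software used in the study, which the abstract does not name. The data are synthetic, and every function name (`fit_simple_regression`, `rmse`, `evaluate`, `make_synthetic`) is hypothetical. Regression stands in here for whichever technique is being measured, and root-mean-square error is assumed as the accuracy measure.

```python
import random
import time


def fit_simple_regression(xs, ys):
    """Closed-form ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def rmse(model, xs, ys):
    """Root-mean-square prediction error (assumed accuracy measure)."""
    intercept, slope = model
    squared = [(intercept + slope * x - y) ** 2 for x, y in zip(xs, ys)]
    return (sum(squared) / len(squared)) ** 0.5


def evaluate(fit, xs_train, ys_train, xs_test, ys_test):
    """Return (seconds spent fitting, error on held-out data)."""
    start = time.perf_counter()
    model = fit(xs_train, ys_train)
    elapsed = time.perf_counter() - start
    return elapsed, rmse(model, xs_test, ys_test)


def make_synthetic(n, seed=0):
    """Synthetic data: y = 2 + 3x plus Gaussian noise (illustrative only)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 + 3.0 * x + rng.gauss(0, 0.5) for x in xs]
    return xs, ys


if __name__ == "__main__":
    xs, ys = make_synthetic(10_000)
    split = 8_000  # first 80% for fitting, the rest for testing
    elapsed, error = evaluate(
        fit_simple_regression, xs[:split], ys[:split], xs[split:], ys[split:]
    )
    print(f"time: {elapsed:.4f} s, RMSE: {error:.3f}")
```

Running `evaluate` with the same data on each configuration would give one time and accuracy pair per system, which is the kind of comparison the abstract describes.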
[Publication date]  [Publisher] Stellenbosch University