Impact of corpus domain for sentiment classification: An evaluation study using supervised machine learning techniques
[Abstract] Thanks to the development of the internet, a large community can now communicate and express its opinions and preferences through multiple media such as blogs, forums, social networks and e-commerce sites. It is becoming increasingly clear that opinions published on the web are a valuable source for decision-making, so a rapidly growing field of research called "sentiment analysis" has emerged to address the problem of automatically determining the polarity (positive, negative, neutral, ...) of textual opinions. People expressing themselves in a particular domain often use domain-specific expressions; building a classifier that performs well across different domains is therefore a challenging problem. The purpose of this paper is to evaluate the impact of domain on sentiment classification when using machine learning techniques. In our study, three popular machine learning techniques, Support Vector Machines (SVM), Naive Bayes and K nearest neighbors (KNN), were applied to datasets collected from different domains. Experimental results show that Support Vector Machines outperform the other classifiers in all domains, achieving at least 74.75% accuracy with a standard deviation of 4.08.
[Publication date] [Publishing institution] Laboratory of System Analysis, Information Processing and Integrated Management, Mohammadia School of Engineers, Mohammed V University, Rabat, Morocco^1
[Validity level] Radio electronics [Subject classification]
[Keywords] Different domains; E-commerce sites; Evaluation study; K nearest neighbor (KNN); Machine learning techniques; Sentiment classification; Standard deviation; Supervised machine learning [Timeliness]
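
The domain effect summarized in the abstract can be illustrated with a minimal sketch. The code below is not the authors' implementation: it covers only a hand-written multinomial Naive Bayes classifier (not SVM or KNN), and the tiny movie and electronics review sets, as well as the names `train_nb`, `predict_nb` and `accuracy`, are hypothetical and invented purely for illustration. It trains on one domain and compares accuracy on held-out reviews from the same domain against reviews from a different domain.

```python
import math
from collections import Counter

def train_nb(docs):
    """Fit a multinomial Naive Bayes model on (text, label) pairs."""
    label_counts = Counter()   # number of documents per label
    word_counts = {}           # label -> Counter of token frequencies
    vocab = set()
    for text, label in docs:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for tok in text.lower().split():
            counts[tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab

def predict_nb(model, text):
    """Return the label with the highest log-posterior (Laplace smoothing)."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):  # sorted so ties resolve deterministically
        score = math.log(label_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in text.lower().split():
            if tok in vocab:  # tokens never seen in training carry no evidence
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, docs):
    """Fraction of documents whose predicted label matches the true label."""
    correct = sum(predict_nb(model, text) == label for text, label in docs)
    return correct / len(docs)

# Hypothetical toy data: training and test reviews from a movie domain,
# plus test reviews from an unrelated electronics domain.
MOVIES_TRAIN = [
    ("gripping plot and brilliant acting", "pos"),
    ("moving story brilliant cast", "pos"),
    ("boring plot and wooden acting", "neg"),
    ("dull story wooden cast", "neg"),
]
MOVIES_TEST = [
    ("brilliant gripping story", "pos"),
    ("boring dull acting", "neg"),
]
ELECTRONICS_TEST = [
    ("long battery life and sharp screen", "pos"),
    ("battery died and screen cracked", "neg"),
]

model = train_nb(MOVIES_TRAIN)
in_domain = accuracy(model, MOVIES_TEST)
cross_domain = accuracy(model, ELECTRONICS_TEST)
print(f"in-domain accuracy:    {in_domain:.2f}")
print(f"cross-domain accuracy: {cross_domain:.2f}")
```

On this toy data the in-domain accuracy is 1.00 and the cross-domain accuracy is 0.50, because the electronics reviews share almost no sentiment-bearing vocabulary with the movie reviews. These numbers only illustrate the mechanism and say nothing about the results reported in the paper.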