Bayesian estimation-based sentiment word embedding model for sentiment analysis
[Abstract] Sentiment word embedding has been extensively studied and used in sentiment analysis tasks. However, most existing models fail to differentiate between high-frequency and low-frequency words. As a result, the sentiment information of low-frequency words is insufficiently captured, which leads to inaccurate sentiment word embeddings and degrades the overall performance of sentiment analysis. A Bayesian estimation-based sentiment word embedding (BESWE) model, which aims to precisely extract the sentiment information of low-frequency words, is proposed. In the model, a Bayesian estimator is constructed from the co-occurrence probabilities and sentiment probabilities of words, and a novel loss function is defined for sentiment word embedding learning. Experimental results on sentiment lexicons and the Movie Review dataset show that BESWE outperforms many state-of-the-art methods, such as C&W, CBOW, GloVe, SE-HyRank and DLJT1, in sentiment analysis tasks. This demonstrates that Bayesian estimation can effectively capture the sentiment information of low-frequency words and integrate it into the word embedding through the loss function. In addition, replacing the embeddings of low-frequency words in the state-of-the-art methods with those from BESWE can significantly improve the performance of those methods in sentiment analysis tasks.
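To illustrate why a Bayesian estimator can help with low-frequency words, the following minimal sketch compares a raw frequency estimate of a word's sentiment probability with a posterior-mean estimate. It is not the BESWE estimator, whose exact form the abstract does not give. The Beta-Bernoulli model, the function names, and the `prior_mean` and `prior_strength` parameters are all assumptions chosen for illustration.

```python
def mle_sentiment_prob(pos_count, total_count):
    """Raw frequency estimate: fraction of occurrences in positive contexts."""
    return pos_count / total_count


def bayes_sentiment_prob(pos_count, total_count, prior_mean=0.5, prior_strength=10.0):
    """Posterior mean of a Bernoulli rate under a Beta prior (illustrative assumption).

    The prior is Beta(a, b) with a = prior_mean * prior_strength and
    b = (1 - prior_mean) * prior_strength, so prior_strength acts like a
    number of pseudo-observations.
    """
    a = prior_mean * prior_strength
    b = (1.0 - prior_mean) * prior_strength
    return (pos_count + a) / (total_count + a + b)


# Hypothetical counts: a rare word seen twice, a frequent word seen 1000 times.
rare = (2, 2)
frequent = (900, 1000)

for name, (pos, total) in (("rare", rare), ("frequent", frequent)):
    print(name, mle_sentiment_prob(pos, total), bayes_sentiment_prob(pos, total))
```

With only two observations, the raw estimate for the rare word jumps to 1.0, while the posterior mean stays close to the prior. For the frequent word the two estimates nearly coincide, so the prior mainly affects words with little evidence.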
[Subject classification] Mathematics (general)
[Keywords] text analysis; learning (artificial intelligence); probability; Bayes methods; natural language processing; pattern classification