Forecasting Twitter topic popularity using Bass diffusion model and machine learning
[Abstract] Today, social network websites such as Twitter are important information sources for a company's marketing, logistics, and supply chain. Sometimes a topic about a product will "explode" on a "peak day", suddenly being talked about by a large number of users. Predicting the diffusion process of a Twitter topic helps a company forecast demand and plan ahead to dispatch its products. In this study, we collected Twitter data on 220 topics covering a wide range of fields, and we created 12 features for each topic at each time stage, e.g., the number of tweets mentioning the topic per hour, the number of followers of users already mentioning the topic, and the percentage of root tweets among all tweets. The task is to predict the total mention count over the whole time horizon of 180 days, as early and as accurately as possible. To do this, we applied two approaches: fitting the curve of topic popularity (the mention count curve) with the Bass diffusion model, and using machine learning models, including k-nearest neighbors, linear regression, bagged trees, and an ensemble, to learn topic popularity as a function of the created features. The results show that the basic Bass model captures the underlying mechanism of how a Twitter topic develops, so the adoption of a Twitter topic can be treated as analogous to the diffusion of a new product. Using only the mention count, the Bass model achieves much better predictive accuracy over the whole time horizon than the machine learning models with extra features. However, even with the best model (the Bass model) and restricting attention to the subset of topics with better predictability, predictive accuracy is still not good enough before the "explosion day". This is because an "explosion" is usually triggered by news outside Twitter and is therefore hard to predict without information from outside Twitter.
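The abstract does not say how the Bass model was fitted to each mention count curve. One classic option, sketched here under that assumption, is Bass's ordinary-least-squares form of the discrete model, n_t = pm + (q - p)N_{t-1} - (q/m)N_{t-1}^2, where n_t is the number of new mentions on day t, N_{t-1} is the cumulative count before day t, m is the eventual total, p is the innovation coefficient, and q is the imitation coefficient. The helper names (`simulate_bass`, `fit_bass_ols`, `forecast_total`), the 30-day observation window, and the parameter values are hypothetical, and the demo series is synthetic rather than drawn from the 220 topics.

```python
import math
from typing import List, Tuple


def simulate_bass(m: float, p: float, q: float, days: int) -> List[float]:
    """Daily new mentions from the discrete Bass recursion
    n_t = (p + q * N_{t-1} / m) * (m - N_{t-1}), with N the running total."""
    cumulative = 0.0
    daily = []
    for _ in range(days):
        n_t = (p + q * cumulative / m) * (m - cumulative)
        daily.append(n_t)
        cumulative += n_t
    return daily


def _solve3(a: List[List[float]], y: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    aug = [row[:] + [v] for row, v in zip(a, y)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, 3):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, 4):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, 3))
        x[r] = (aug[r][3] - tail) / aug[r][r]
    return x


def fit_bass_ols(daily: List[float]) -> Tuple[float, float, float]:
    """Estimate (m, p, q) from an observed prefix of daily mention counts by
    ordinary least squares on n_t = a + b*N_{t-1} + c*N_{t-1}**2,
    where a = p*m, b = q - p, c = -q/m (hypothetical helper)."""
    scale = sum(daily)
    if scale <= 0:
        raise ValueError("need a positive number of mentions to fit")
    rows, cumulative = [], 0.0
    for n_t in daily:
        x = cumulative / scale  # rescaled regressor keeps X^T X well conditioned
        rows.append([1.0, x, x * x])
        cumulative += n_t
    # Normal equations (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * n for r, n in zip(rows, daily)) for i in range(3)]
    a, b_scaled, c_scaled = _solve3(xtx, xty)
    b, c = b_scaled / scale, c_scaled / scale ** 2  # undo the rescaling
    disc = b * b - 4 * a * c
    if c >= 0 or disc < 0:
        raise ValueError("fitted quadratic is not a valid Bass curve")
    m = (-b - math.sqrt(disc)) / (2 * c)  # positive root of a + b*N + c*N^2 = 0
    return m, a / m, -c * m


def forecast_total(daily_prefix: List[float], horizon: int = 180) -> float:
    """Forecast the total mention count over `horizon` days by re-running
    the fitted Bass recursion from day 1 (hypothetical helper)."""
    m, p, q = fit_bass_ols(daily_prefix)
    return sum(simulate_bass(m, p, q, horizon))


if __name__ == "__main__":
    # Synthetic topic with assumed parameters (not from the thesis data).
    observed = simulate_bass(m=10_000, p=0.01, q=0.3, days=180)
    first_month = observed[:30]
    print("estimated (m, p, q):", fit_bass_ols(first_month))
    print("forecast 180-day total:", forecast_total(first_month))
    print("actual 180-day total:", sum(observed))
```

Because the synthetic series follows the recursion exactly, the fit recovers m, p, and q from the first 30 days. On real, noisy counts observed only before the peak, the quadratic is often poorly determined (the sketch then raises `ValueError`), which is one way to see why forecasts made before the "explosion day" are hard.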
[Publisher] Massachusetts Institute of Technology