Assessing Spurious Correlations in Big Search Data
[摘要] Big search data offers the opportunity to identify new and potentially real-time measures and predictors of important political, geographic, social, cultural, economic, and epidemiological phenomena, measures that might serve an important role as leading indicators in forecasts and nowcasts. However, it also presents vast new risks that scientists or the public will identify meaningless and totally spurious ‘relationships’ between variables. This study is the first to quantify that risk in the context of search data. We find that spurious correlations arise at exceptionally high frequencies among probability distributions examined for random variables based upon gamma (1, 1) and Gaussian random walk distributions. Quantifying these spurious correlations and their likely magnitude for various distributions has value for several reasons. First, analysts can make progress toward accurate inference. Second, they can avoid unwarranted credulity. Third, they can demand appropriate disclosure from the study authors.
[发布日期] [发布机构]
[效力级别] [学科分类] 陶瓷学
[关键词] spurious correlation;Bonferroni;big data;big search data;Google Correlate;Google Trends;search data [时效性]