Fundamental Limits of Data Analytics in Sociotechnical Systems
[Abstract] In the Big Data era, informational systems involving humans and machines are being deployed in multifarious societal settings. Many use data analytics as subcomponents for descriptive, predictive, and prescriptive tasks, often trained using machine learning. Yet when analytics components are placed in large-scale sociotechnical systems, it is often difficult to characterize how well the systems will act, measured with criteria relevant in the world. Here, we propose a system modeling technique that treats data analytics components as "noisy black boxes" or stochastic kernels, which together with elementary stochastic analysis provides insight into fundamental performance limits. An example application is helping prioritize people's limited attention, where learning algorithms rank tasks using noisy features and people sequentially select from the ranked list. This paper demonstrates the general technique by developing a stochastic model of analytics-enabled sequential selection, derives fundamental limits using concomitants of order statistics, and assesses limits in terms of system-wide performance metrics like screening cost and value of objects selected. Connections to sample complexity for bipartite ranking are also made.
[Publication date] [Publishing institution]
[Validity level] [Subject classification] Computer networks and communications
[Keywords] concomitants of order statistics; data analytics; fundamental limits; sequential selection; sociotechnical systems; stochastic kernels [Timeliness]