The SPRAWL distributed stream dissemination system
[Abstract] Many large financial, news, and social media companies process and stream large quantities of data to customers, either over the public Internet or on their own internal networks. These customers often depend on that data being delivered in a timely and resource-efficient manner. In addition, many customers subscribe to the same or similar data products (e.g., particular types of financial feeds, or feeds of specific social media users). A naive implementation of such a data dissemination network processes and delivers redundant data repeatedly, wasting CPU and bandwidth, increasing network delays, and driving up costs. In this dissertation, we present SPRAWL, a distributed stream processing layer that addresses the wide-area data processing and dissemination problem. SPRAWL provides two key functions. First, it generates a shared and distributed multi-query plan that transmits records through the network just once and shares the computation of streaming operators that operate on the same subset of data. Second, it computes an in-network placement of complex queries (each with dozens of operators) in wide-area networks (consisting of thousands of nodes). When there are no resource (CPU, bandwidth) or query (latency) constraints, this placement is optimal and is computed in polynomial time and memory. In addition, we develop several heuristics to guarantee that the placement is near optimal when constraints are violated, and we experimentally evaluate the performance of our algorithms against an exhaustive algorithm. We also design and implement a distributed version of the SPRAWL placement algorithm in order to support wide-area networks consisting of thousands of nodes, which centralized algorithms cannot handle.
Finally, we show that SPRAWL can make complex query placement decisions on wide-area networks within seconds, and the placement can increase throughput by up to a factor of 5 and reduce dollar costs by a factor of 6 on a financial data stream processing task.
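To make the unconstrained placement problem concrete, here is a minimal sketch in Python. It is not the dissertation's algorithm. It assumes a tree-shaped query plan, a precomputed pairwise distance table between network nodes, and a cost equal to the sum of (output rate × distance) over plan edges. The names `Operator`, `place`, `out_rate`, and `pinned_to` are hypothetical and chosen only for illustration. Under these assumptions, a bottom-up dynamic program finds a minimum-cost placement in time polynomial in the number of operators and nodes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Operator:
    """One operator of a tree-shaped query plan (hypothetical structure)."""
    name: str                                  # must be unique within the plan
    out_rate: float                            # rate of data sent to the parent
    children: List["Operator"] = field(default_factory=list)
    pinned_to: Optional[str] = None            # fixed node, e.g. a data source


def place(root: Operator, nodes: List[str],
          dist: Dict[str, Dict[str, float]],
          sink: str) -> Tuple[float, Dict[str, str]]:
    """Return (total cost, {operator name: network node}) minimizing the
    sum of out_rate * distance over all plan edges, including root -> sink."""
    best = {}    # (op name, node) -> cheapest cost of op's subtree with op on node
    choice = {}  # (op name, node) -> {child name: node chosen for that child}

    def candidates(op: Operator) -> List[str]:
        return [op.pinned_to] if op.pinned_to else nodes

    def solve(op: Operator) -> None:
        for child in op.children:
            solve(child)
        for n in candidates(op):
            total, picks = 0.0, {}
            for child in op.children:
                # Cheapest way to host the child's subtree and ship its output to n.
                cost, m = min(
                    (best[(child.name, m)] + child.out_rate * dist[m][n], m)
                    for m in candidates(child)
                )
                total += cost
                picks[child.name] = m
            best[(op.name, n)] = total
            choice[(op.name, n)] = picks

    solve(root)
    total, root_node = min(
        (best[(root.name, n)] + root.out_rate * dist[n][sink], n)
        for n in candidates(root)
    )

    # Walk back down the plan to recover the chosen node for every operator.
    placement, stack = {}, [(root, root_node)]
    while stack:
        op, n = stack.pop()
        placement[op.name] = n
        for child in op.children:
            stack.append((child, choice[(op.name, n)][child.name]))
    return total, placement


# Toy example: three nodes on a line A - B - C, two pinned sources, sink at C.
nodes = ["A", "B", "C"]
dist = {
    "A": {"A": 0, "B": 1, "C": 2},
    "B": {"A": 1, "B": 0, "C": 1},
    "C": {"A": 2, "B": 1, "C": 0},
}
s1 = Operator("s1", out_rate=10, pinned_to="A")   # high-rate source
s2 = Operator("s2", out_rate=1, pinned_to="C")    # low-rate source
join = Operator("join", out_rate=1, children=[s1, s2])

cost, placement = place(join, nodes, dist, sink="C")
print(cost, placement)
```

In this toy example the join lands next to the high-rate source, since moving the low-rate stream and the small join output costs less than moving the high-rate stream. Resource and latency constraints, which the abstract says require heuristics, are deliberately left out of the sketch.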
[Publication date]  [Publishing institution] Massachusetts Institute of Technology