Accelerating Reads in Big-Data File Systems via Upward Migration of Cold Data
[Abstract] The ability to process massive amounts of data, whether to draw insights for human consumption or to train machine learning models, is transforming the world as we know it. At the heart of the infrastructure that processes this data are big-data file systems, which store the raw inputs as well as the outputs of computations. To keep up with the ever-increasing amount of data being collected, we have to continuously improve file systems so that they do not become the bottleneck in extracting value from our data. This thesis optimizes reads within big-data file systems by migrating cold data into memory. Two observations motivate it. First, many big-data applications spend a significant part of their execution blocked on disk reads. Second, the inputs for many applications are cold, so common techniques that aim to keep hot data in memory do not benefit these jobs. We develop a three-part approach to the challenge of accelerating cold data reads in big-data file systems. The first part exploits foreknowledge of applications' access patterns to initiate asynchronous migration of inputs into memory before reads occur. We analyze trace data from a production cluster at Google and find that the key ingredients for effectively migrating cold data exist in production environments. We then design and implement Ignem, a framework for migrating cold data in big-data file systems. Ignem provides substantial speedups for several applications. However, Ignem is unable to respond to bandwidth imbalance and performs poorly when the load is dynamic and heterogeneous. To address this, the second part of this thesis presents the design and implementation of DYRS, a bandwidth-aware migration scheme that adapts to the available bandwidth on storage nodes. When there is bandwidth heterogeneity, DYRS can dynamically adjust load distribution, avoid hotspots, and minimize the number of stragglers. Third, we study how different scheduling policies for migrations affect application performance. We observe that the choice of migration scheduling policy can have a significant effect on application performance, and we investigate why some ordering policies outperform others.
[Publication date]  [Publishing institution] Rice University
[Authority level] storage [Subject classification] 
[Keywords]  [Timeliness] 