Modeling Remote I/O versus Staging Tradeoff in Multi-Data Center Computing
[Abstract] In multi-data center computing, the data to be processed is not always local to the computation. This is a major challenge, especially for data-intensive Cloud computing applications, since large amounts of data must either be moved to the local sites (staging) or accessed remotely over the network (remote I/O). Cloud application developers generally choose between staging and remote I/O intuitively, without a systematic comparison specific to their application's data access patterns, because no generic model is available to them. In this paper, we propose a generic model that helps Cloud application developers choose the most appropriate data access mechanism for their specific application workloads. We define the parameters that potentially affect the end-to-end performance of multi-data center Cloud applications that need to access large datasets over the network. To test and validate our model, we implemented a series of synthetic benchmark applications that simulate the most common data access patterns encountered in Cloud applications. We show that our model provides promising results in different settings with different parameters, such as network bandwidth, server and client capabilities, and data access ratio.
[Publication date] [Publishing institution] Department of Computer Science, North American College, Houston, TX 77038, United States
[Validity level] Physics [Subject classification] Computer Science (General)
[Keywords] Application data; Cloud applications; Data intensive; End-to-end performance; Generic modeling; Large datasets; Network bandwidth; Synthetic benchmark [Timeliness]