Distributed data as a choice in PetaBricks

[Abstract] Traditionally, programming large computer systems requires programmers to manually place data and computation across all system components, such as memory, processors, and GPUs. Because each system can have a substantially different composition, the application partitioning, as well as the algorithms and data structures, has to differ from system to system. Hardcoding the partitioning is therefore not only difficult but also makes programs non-portable in performance. PetaBricks solves this problem by allowing programmers to specify multiple algorithmic choices for computing the outputs and letting the system decide how to apply these choices. Since PetaBricks can determine an optimized computation order and data placement through auto-tuning, programmers do not need to modify their programs when migrating to a new system. In this thesis, we address the problem of automatically partitioning PetaBricks programs across a cluster of distributed-memory machines. Deciding which algorithm to use, where to place data, and how to distribute computation is complicated. We simplify the decision by auto-tuning data placement and moving computation to where the most data is. Another problem is that using distributed data and a distributed scheduler can be costly. To eliminate distributed overhead, we generate multiple versions of the code for different types of data access and automatically switch to a shared-memory version when the data is local, which achieves better performance. To show that the system can scale, we run PetaBricks benchmarks on an 8-node system with a total of 96 cores and on a 64-node system with a total of 512 cores. We compare the performance with a non-distributed version of PetaBricks and, in some cases, obtain linear speedups.
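
The two placement ideas in the abstract (run a task where most of its data lives, and fall back to a shared-memory code path when all data is local) can be illustrated with a minimal sketch. This is not PetaBricks code or its API: the function names `pick_node` and `choose_variant`, the `(node_id, element_count)` input format, and the tie-breaking rule are all assumptions made for illustration only.

```python
from collections import Counter

def pick_node(region_owners):
    """Return the node holding the most input elements (hypothetical helper).

    region_owners: list of (node_id, element_count) pairs, one per input
    region that the task reads. This format is an assumption.
    """
    totals = Counter()
    for node, count in region_owners:
        totals[node] += count
    if not totals:
        raise ValueError("task has no input regions")
    # Most data wins; ties go to the lowest node id so the result is deterministic.
    return min(totals, key=lambda n: (-totals[n], n))

def choose_variant(region_owners, here):
    """Pick a code version: shared-memory if every non-empty region is on `here`."""
    nodes_with_data = {node for node, count in region_owners if count > 0}
    return "shared_memory" if nodes_with_data <= {here} else "distributed"

# Example: node 1 owns 30 of the 45 input elements, so the task would run there.
owners = [(0, 10), (1, 30), (0, 5)]
target = pick_node(owners)
variant = choose_variant(owners, here=target)
```

In this example the task still reads data from node 0, so the sketch selects the distributed version; only when every input region is on the executing node does it choose the cheaper shared-memory path.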

[Publication date] [Publisher] Massachusetts Institute of Technology

