GUMSMP: a scalable parallel Haskell implementation
[Abstract] The most widely available high performance platforms today are hierarchical, with shared memory leaves, e.g. clusters of multi-cores, or NUMA with multiple regions. The Glasgow Haskell Compiler (GHC) provides a number of parallel Haskell implementations targeting different parallel architectures. In particular, GHC-SMP supports shared memory architectures, and GHC-GUM supports distributed memory machines. Both implementations use different, but related, runtime system (RTS) mechanisms and achieve good performance. A specialised RTS for the ubiquitous hierarchical architectures is lacking. This thesis presents the design, implementation, and evaluation of a new parallel Haskell RTS, GUMSMP, that combines shared and distributed memory mechanisms to exploit hierarchical architectures more effectively. The design evaluates a variety of design choices and aims to efficiently combine scalable distributed memory parallelism, using a virtual shared heap over a hierarchical architecture, with low-overhead shared memory parallelism on shared memory nodes. Key design objectives in realising this system are to prefer local work, and to exploit mostly passive load distribution with pre-fetching. Systematic performance evaluation shows that the automatic hierarchical load distribution policies must be carefully tuned to obtain good performance. We investigate the impact of several policies including work pre-fetching, favouring inter-node work distribution, and spark segregation with different export and select policies. We present the performance results for GUMSMP, demonstrating good scalability for a set of benchmarks on up to 300 cores. Moreover, our policies provide performance improvements of up to a factor of 1.5 compared to GHC-GUM. The thesis provides a performance evaluation of distributed and shared heap implementations of parallel Haskell on a state-of-the-art physical shared memory NUMA machine.
The evaluation exposes bottlenecks in memory management, which limit scalability beyond 25 cores. We demonstrate that GUMSMP, which combines both distributed and shared heap abstractions, consistently outperforms the shared memory GHC-SMP on seven benchmarks by a factor of 3.3 on average. Specifically, we show that the best results are obtained when sharing memory only within a single NUMA region, and using distributed memory system abstractions across the regions.
[Publication date]  [Publisher] University: University of Glasgow; Department: School of Computing Science
[Validity level]  [Subject classification] 
[Keywords] parallel, multi-core [Timeliness] 