A Scalable Locality-aware Adaptive Work-stealing Scheduler for Multi-core Task Parallelism
[Abstract] Recent trends have made it clear that processor makers are committed to multicore chip designs. The number of cores per chip is increasing, while there is little or no increase in clock speed. This parallelism trend poses a significant and urgent challenge for computer software, because programs have to be written or transformed into a multi-threaded form to take full advantage of future hardware advances.
Task parallelism has been identified as one of the prerequisites for software productivity. In task parallelism, programmers focus on decomposing the problem into subcomputations that can run in parallel and leave the compiler and runtime to handle the scheduling details. This separation of concerns between task decomposition and scheduling provides productivity to the programmer but poses challenges to the runtime scheduler.
Our thesis is that work-stealing schedulers with adaptive scheduling policies and locality-awareness can provide a scalable and robust runtime foundation for multicore task parallelism. We evaluate our thesis using the new Scalable Locality-aware Adaptive Work-stealing (SLAW) runtime scheduler developed for the Habanero-Java programming language, a task-parallel variant of Java.
SLAW's adaptive task scheduling is motivated by the study of two common scheduling policies in a work-stealing scheduler, specifically the work-first and help-first policies. Both policies exhibit limitations in performance and resource usage in different situations. This variance makes it hard to determine the best policy a priori. SLAW addresses these limitations by supporting both policies simultaneously and selecting policies adaptively on a per-task basis at runtime. Our results show that SLAW achieves 0.98x to 9.2x speedup over the help-first scheduler and 0.97x to 4.5x speedup over the work-first scheduler. Further, for large irregular parallel computations, SLAW supports data sizes and achieves performance that cannot be delivered by the use of any single fixed policy.
SLAW's locality-aware scheduling framework aims to overcome the cache unfriendliness of work-stealing due to randomized stealing. The SLAW scheduler is designed for programming models where locality hints are provided to the runtime by the programmer or compiler. Our results show that locality-aware scheduling can improve performance by increasing temporal data reuse for iterative data-parallel applications.
[Publication date] [Publishing institution] Rice University
[Validity level] science [Subject classification]
[Keywords] [Timeliness]