Polymorphic Pipeline Array: A Flexible Multicore Accelerator for Mobile Multimedia Applications

[Abstract] Mobile computing in the form of smart phones, netbooks, and PDAs has become an integral part of our everyday lives. Moving ahead to the next generation of mobile devices, we believe that multimedia will become a more critical and product-differentiating feature. High-definition audio and video as well as 3D graphics provide richer interfaces and compelling capabilities. However, these algorithms also bring different computational challenges than wireless signal processing. Multimedia algorithms are more complex, featuring more control flow and variable computational requirements, and their execution time is not dominated by innermost vector loops. Further, data access is more complex: media applications typically operate on multi-dimensional vectors of data rather than single-dimensional vectors with simple strides. Thus, the design of current mobile platforms requires re-examination to account for these new application domains.

In this dissertation, we focus on the design of a programmable, low-power accelerator for multimedia algorithms referred to as a Polymorphic Pipeline Array (PPA). The PPA design is inspired by coarse-grain reconfigurable architectures (CGRAs), which consist of an array of function units connected by a mesh-style interconnect. The PPA improves upon CGRAs by attacking two major limitations: scalability, and acceleration limited to innermost loops. The large number of resources is fully utilized by exploiting both fine-grain instruction-level and coarse-grain pipeline parallelism, and the acceleration is extended beyond innermost loops to encompass the whole region of applications.

Various compiler and architectural optimizations are presented for CGRAs that form the basic building blocks of the PPA. Two compiler techniques are presented that systematically construct the schedule with intelligent heuristics. Modulo graph embedding leverages a graph embedding technique for scheduling in CGRAs, and edge-centric modulo scheduling provides a communication-oriented way to address the scheduling problem. For architectural improvement, a novel control path design is presented that leverages the token network of dataflow machines to reduce instruction memory power.

The PPA is designed with flexibility and programmability as first-order requirements to enable the hardware to be dynamically customizable to the application. The PPA exploits the pipeline parallelism found in streaming applications to create a coarse-grain hardware pipeline that executes streaming media applications.

[Publication date] [Publishing institution] University of Michigan

[Validity level] Computer Science [Subject classification]

[Keywords] Multicore Accelerator for Embedded Systems; Computer Science; Engineering; Computer Science & Engineering [Timeliness]
