Area- and energy-efficient CORDIC accelerators in deep sub-micron CMOS technologies
[Abstract] The COordinate Rotation DIgital Computer (CORDIC) algorithm is a well-known, versatile approach that is widely applied in today's SoCs, especially, but not exclusively, for digital communications. Dedicated CORDIC blocks can be implemented in deep sub-micron CMOS technologies at very low area and energy costs, which makes them attractive as hardware accelerators for Application-Specific Instruction Processors (ASIPs) and helps to overcome the well-known energy vs. flexibility conflict. Optimizing Global Navigation Satellite System (GNSS) receivers to reduce hardware complexity is an important research topic at present. In such receivers, CORDIC accelerators can be used for digital baseband processing (fixed-point) and for Position-Velocity-Time estimation (floating-point). A micro-architecture well suited to such applications is presented. This architecture is parameterized by the wordlengths and the number of iterations, and can easily be extended to floating-point data formats. Moreover, area can be traded for throughput by partially or even fully unrolling the iterations, with pipelining organized so that one CORDIC iteration is performed per cycle. From the architectural description, the macro layout can be generated fully automatically using an in-house datapath generator tool. Since the adders and shifters play an important role in the CORDIC block, they must be carefully optimized for high area and energy efficiency in the underlying technology. For this purpose, carry-select adders and logarithmic shifters have been chosen. Device dimensioning was automatically optimized with respect to dynamic and static power, area, and performance using the in-house tool. The fully sequential CORDIC block for fixed-point digital baseband processing has a wordlength of 16 bits, requires 5232 transistors, and, implemented in a 40-nm CMOS technology, occupies a silicon area of only 1560 μm².
Maximum clock frequency from circuit simulation of the extracted netlist is 768 MHz under typical and 463 MHz under worst-case technology and application corner conditions. Simulated dynamic power dissipation is 0.24 μW MHz⁻¹ at 0.9 V; static power is 38 μW in the slow corner, 65 μW in the typical corner, and 518 μW in the fast corner. The latter can be reduced by 43% in the 40-nm CMOS technology using a 0.5 V reverse back-bias. These features are compared with results from different design styles as well as with an implementation in a 28-nm CMOS technology. Interestingly, in the latter case area scales as expected, but worst-case performance and energy no longer scale well.
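The iteration that such an accelerator implements can be illustrated with a small software model. The following is a minimal sketch of a fixed-point CORDIC in rotation mode, parameterized by wordlength and number of iterations as the abstract describes. It is not the authors' micro-architecture: the function name `cordic_rotate`, the choice of two integer bits (including sign) with `wordlength - 2` fractional bits, the radian angle format, and the pre-scaling of the input by the CORDIC gain are all assumptions made for illustration. Overflow and saturation are not modelled.

```python
import math


def cordic_rotate(angle, wordlength=16, iterations=16):
    """Approximate (cos(angle), sin(angle)) for |angle| <= pi/2.

    Hypothetical software model of rotation-mode CORDIC using only
    shifts and additions on integers, one iteration per loop pass.
    """
    # Assumed number format: wordlength - 2 fractional bits.
    frac_bits = wordlength - 2
    scale = 1 << frac_bits

    # Elementary rotation angles atan(2^-i), quantized to the assumed format.
    atan_table = [round(math.atan(2.0 ** -i) * scale) for i in range(iterations)]

    # CORDIC gain; the start vector is pre-scaled by 1/gain so that the
    # result has unit length without a final multiplication.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    x = round(scale / gain)
    y = 0
    z = round(angle * scale)  # residual angle still to be rotated

    for i in range(iterations):
        # Python's >> on negative integers is an arithmetic shift,
        # matching a sign-extending hardware shifter.
        if z >= 0:
            x, y, z = x - (y >> i), y + (x >> i), z - atan_table[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + atan_table[i]

    return x / scale, y / scale


if __name__ == "__main__":
    c, s = cordic_rotate(math.pi / 6)
    print(f"cos ~ {c:.4f}, sin ~ {s:.4f}")
```

Each loop pass uses two variable shifts and three additions or subtractions, which is why the abstract singles out the shifters and adders as the components that determine area and energy. A fully sequential block reuses one such stage over several cycles, whereas an unrolled design would instantiate one stage per iteration.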
[Validity level] [Subject classification] Electronic, optical, and magnetic materials