4模余数系统反向转换器设计
摘 要: 反向转化已经成为制约剩余数系统发展的瓶颈问题,尤其对于模集合个数多于3个的模集合。针对4基数模集合{2n,22n+1,2n+1,2n-1},在新中国余数定理Ⅰ的基础上,提出了一个新的高效并行转换算法。该算法可同时处理4个模,处理数的动态范围达到5n位,乘法逆元全部采用闭合形式,电路完全基于加法器构成,硬件实现容易。理论分析表明,与同类模集合反向转换器相比,大大降低了对硬件电路的要求,明显减小了转换器的面积和电路延迟,提高了转换效率。
关键词: 新中国余数定理; 反向转换; 余数系统; VLSI
中图分类号: TN911?34 文献标识码: A 文章编号: 1004?373X(2016)02?0110?03
Design of reverse convertor for 4?moduli residue number system
Lü Xiaolan, CUI Delong
(College of Computer and Electronic Information, Guangdong University of Petrochemical Technology, Maoming 525000, China)
Abstract: The reverse conversion has become the bottleneck which restricts the development of the residue number system (RNS), especially for the moduli sets in which the number of the moduli is larger than 3. For the 4?moduli set {2n, 22n+1, 2n+1, 2n-1}, a new efficient parallel conversion algorithm is proposed based on the new Chinese Remainder Theorem I. The proposed algorithm can deal with the four moduli simultaneously, and the dynamic range of the treatment number can reach up to 5n bits. The close form is adopted by the multiplicative inverse of the moduli set, and the circuit is entirely constituted based on the summator, which is easy for hardware implementation. The theoretical analysis indicates that, compared with the congeneric moduli?set reverse converter, the system can greatly reduce the requirements for hardware circuit, obviously decrease the area and circuit delay of the converter, and improve the conversion efficiency.
Keywords: new Chinese Remainder Theorem; reverse conversion; residue number system; VLSI
0 引 言
随着大规模集成电路发展,高集成度,高精度便携式电子系统的发展,在信号处理方面,大规模的并行处理技术已经逐步取代传统的信号处理技术。基于此,剩余数系统以其特有的无权重和并行运算特性,成为大规模并行信号处理技术的最佳选择[1?2]。
剩余数系统应用的意义已经被证明,尤其在处理密集型加、减、乘、除等运算速度上占有绝对的优势。然而,由于其运算的复杂性,在剩余数系统就失去了并行性的优势,这些运算有时不得不将余数转换成二进制数后再做运算,所以会浪费大量的电路面积和延迟。为了提高此类运算电路的性能,近年来许多和研究人员开始对此领域进行研究,但是大部分主要针对比较常用的3模集合[3?4]{2n,2n+1,2n-1}。
本文针对4模集合[5]{2n,22n+1,2n+1,2n-1},在分析模集合特征的基础上,提出了一个新反向转换算法,并基于加法器实现其VLSI结构。
1 算法描述
定理1 四个两两互素的正整数m1,m2,m3,m4(i=1,2,3,4),M=m1·m2·m3·m4为可处理数据的动态范围,X模mi所得到的余数表示为[Xmi=xi]。根据新中国余数定理1(New CRT?Ⅰ),其剩余数表示[x1,x2,x3,x4RNS]对应的权重数X在0~M区间具有惟一解[4],即:
[X=x1+m1k1(x2-x1)+k2m2(x3-x2)+k3m2m3(x4-x3)m2m3m4] (1)
其中:k1,k2,k表示乘法逆元,满足[k1m1m2m3m4=1,k2m1m2m3m4=1,k3m1m2m3m4=1]。
根据文献[4],对于4基数模集合{2n,22n+1,2n+1,2n-1},当n为任意整数时,模之间两两互质。设[m1=2n],[m2=22n+1],[m3=2n+1],[m4=2n-1],此剩余数([x1],[x2],[x3],[x4])RNS对应剩余数的二进制表示为:
[x1=x1,n-1……x1,0n], [x2=x2,2n……x2,02n+1],
[x3=x3,n……x3,0n+1], [x4=x4,n-1……x4,0n]
乘法逆元计算如下所述:
乘法逆元的计算公式:
[k1m1m2m3m4=1?k12n24n-1=1?k1=23n]
证明:
[23n2n24n-1=24n=1]
[k2m1m2m3m4=1?k2=2n-1]
证明:
[2n-12n22n+122n-1=22n-122n-1+22n22n-1=1]
[k3m1m2m3m4=1?k2=2n-2]
证明:
[k32n22n+12n+12n-1=2n2n2n-1=1]
根据式(1),有:
[X=x1+2nk1(x2-x1)+k222n+1(x3-x2)+ k322n+12n+1(x4-x3)24n-1 =x1+2nY] (2)
其中:
[Y=k1(x2-x1)+k222n+1(x3-x2)+ +k322n+12n+1(x4-x3)24n-1 =23n(x2-x1)+2n-122n+1(x3-x2)+ +2n-222n+12n+1(x4-x3)24n-1 =ω2x2+ω1x1+ω3x3+ω4x424n-1 =α1+α2+α3+α424n-1] (3)
其中:
[α1=ω1x124n-1=-23nx124n-1],
[α2=ω2x2=23n-1-2n-1x224n-1],
[α3=ω3x3=23n-2-24n-2-22n-2+2n-2x324n-1,] [α4=ω4x4=2n-223n+22n+2n+1x424n-1]
2 硬件实现
定理2 若0≤v≤2n-2,则v2i模2n-1的结果相当于将n位宽二进制数v,即vn-1vn-2…v0循环左移i位。
定理3 若 0≤v≤2n-2,则(-v)2i模2n-1的结果相当于将v乘以2i模2n-1的结果按位取反。
根据定理2和定理3,[α1,α2,α3,α4]进一步表示为:
[α1=ω1x124n-1=-23nx124n-1 =x1,n-1…x1,0n&1…13n24n-1] (4)
[α2=ω2x2=23n-1-2n-1x224n-1 =x2,n…x2,0n+1&0…02n-1&x2,2n…x2,n+1n+ 1…1n&x2,2n…x2,02n+1&1…1n-124n-1] (5)
式中:符号“&”表示拼接。[α1]与[α2]进行合并,由于[2n-12n-1=0],去掉n位全1项,合并得到[α5]:
[α5=x2,n…x2,0n+1&0…02n-1&x2,2n…x2,n+1n+ x1,n-1…x1,0n&x2,2n…x2,02n+1&1…1n-124n-1 =α51+α5224n-1] (6)
其中:
[α51=x2,n…x2,0n+1&0…02n-1&x2,2n…x2,n+1n24n-1α52=x1,n-1…x1,0n&x2,2n…x2,02n+1&1…1n-124n-1] (7)[α3=ω3x3=23n-2-24n-2-22n-2+2n-2x324n-1 =0&x3,n…x3,0n+1&0…0n-1&x3,n…x3,0n+1&0…0n-2+ x3,1x3,02&1…1n-1&x3,n…x3,0n+1&1…1n-1&x3,n…x3,2n+124n-1 =α31+α32]
[α4=ω4x4=2n-223n+22n+2n+1x424n-1=x4,1x4,02&x4,n-1…x4,0n&x4,n-1…x4,0n& x4,n-1…x4,0n&x4,n-1…x4,2n-224n-1] (9)
最终,[α51,α52,α31,α32,α4]这5个数通过3级进位保留加法器(CSA),最终形成2个4n位宽的S,C;S和C通过模[24n-1]加法器得到4n位模加法器的结果Y,[Y]&x1连接,直接形成整数X;整体结构图如图1所示。
图1 模集合{2n,22n+1,2n+1,2n-1}反向转换
3 性能评估和比较
为了进行定性比较,本文与文献[4]的算法模型进行对比,采用1位全加器(FA)的面积和延迟作为所有模型基本计算单位进行比较。本研究和文献[4]全部采用目前效率最高的具有惟一表示的快速并行前缀模2n-1加法器[6]。根据文献[6],其面积按照nAFA,延时为2ntFA计算。同时,在进位保留加法器阶段,4n位进位保留其硬件按4nAFA,延时 [7]为tFA。硬件消耗和延时的理论对比数据如表1所示。从表中可以看出,在延时相同的情况下,本文所提出的剩余数至二进制转化算法模型在硬件消耗方面远远优于参考文献[4]给出的转换器算法模型。
表1 余数至二进制转换面积和延时的理论数据比较
4 结 语
文中给出4基数模集合{2n-1,2n+1,2n,22n-1-1}的剩余数至二进制数转换的优化算法,该模集合可同时4通道并行处理数据,可处理数据动态范围达5n-1位,乘法逆元全部属于闭合形式,电路基于加法器实现。理论分析结果表明,本研究的转换器算法优化,硬件实现容易,整体性能变现优异。
参考文献
[1] SZABO N S, TANAKA R I. Residue arithmetic and its applications to computer technology [M]. New York: McGraw?Hill, 1967: 1?20.
[2] MIROSLAV D L, DEJAN V T, BRIAN L E. 信号处理滤波器设计:基于Matlab和Mathematica的设计方法[M].朱义胜,董辉,译.北京:电子工业出版社,2004:250?256.
[3] WANG Y, SONG X, ABOULHAMID M, et al. Adder based residue to binary number converters for (2n-1, 2n, 2n+1) [J]. IEEE Transactions on Signal Processing, 2002, 50(7): 1772?1779.
[4] WANG Yuke. Residue?to?binary converters based on new Chinese remainder theorems [J]. IEEE Transactions on Circuits and Systems?II, 2000, 47(3): 197?205.
[5] CAO B, CHANG C H, SRIKANTHAN T. An efficient reverse converter for the 4?moduli set {2n-1, 2n, 2n+1, 22n+1} based on the new Chinese Remainder Theorem [J]. IEEE Transactions on Circuits and Systems I, 2003, 50(10): 1296?1303.
[6] PATEL R A, BENAISSA M, BOUSSAKTA S. Fast parallel?prefix architectures for modulo 2n?1 addition with a single representation of zero [J]. IEEE Transactions on Computers, 2007, 56(11): 1484?1492.
[7] PIESTRAK S J. Design of residue generators and multi?operand modular adders using carry?save adders [J]. IEEE Transactions on Computers, 1994, 43(1): 68?77.