A high-performance 1D-DCT architecture is proposed. It is based on the New Distributed Arithmetic Architecture (NEDA) algorithm [1]. Enhancements to NEDA are proposed to reduce the number of computations. Only addition operations are used: 42 additions compute the outputs of an 8x1 DCT, and no subtractions, multiplications, or ROM are needed. High throughput is achieved by pipelining the architecture. In every clock cycle, it receives eight pixels (9 bits each) as inputs and produces eight DCT coefficients (14 bits each). The delay of one pipeline stage is the delay of a 3-level 4:2 compressor tree. The architecture is implemented in 0.35 µm technology; it runs at 1.5 GHz and processes 108 Gbps of image/video sequence data.
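Two figures in the abstract can be illustrated with a short sketch. The Python below is a behavioral model only, not the authors' circuit: it models a 4:2 compressor as two cascaded carry-save (3:2) steps on non-negative integers and counts how many 4:2 levels are needed to reduce a set of operands to two words. The choice of 16 operands is an assumption made here so that the tree has exactly three levels; the abstract does not state the operand count. The function names (`compress_3_2`, `compress_4_2`, `compressor_tree`, `throughput_bps`) are hypothetical. The last function reproduces the input throughput from the stated figures: 8 pixels x 9 bits x 1.5 GHz.

```python
def compress_3_2(a, b, c):
    """Carry-save step: returns (s, carry) with a + b + c == s + carry."""
    s = a ^ b ^ c                                # bitwise sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1   # majority bits, shifted left
    return s, carry


def compress_4_2(a, b, c, d):
    """4:2 compressor modeled as two cascaded 3:2 steps.

    Returns two words whose sum equals a + b + c + d.
    """
    s1, c1 = compress_3_2(a, b, c)
    return compress_3_2(s1, c1, d)


def compressor_tree(operands):
    """Reduce non-negative integer operands to two words using 4:2 levels.

    Returns (sum_word, carry_word, levels). Each level halves the row count.
    """
    rows = list(operands)
    levels = 0
    while len(rows) > 2:
        while len(rows) % 4:          # pad with zeros to a multiple of four
            rows.append(0)
        next_rows = []
        for i in range(0, len(rows), 4):
            next_rows.extend(compress_4_2(*rows[i:i + 4]))
        rows = next_rows
        levels += 1
    while len(rows) < 2:              # fewer than two operands: pad for the caller
        rows.append(0)
    return rows[0], rows[1], levels


def throughput_bps(pixels_per_cycle, bits_per_pixel, clock_hz):
    """Input data rate in bits per second."""
    return pixels_per_cycle * bits_per_pixel * clock_hz


if __name__ == "__main__":
    # Assumed example: 16 operands collapse to 2 words in 3 levels (16 -> 8 -> 4 -> 2).
    s, c, levels = compressor_tree(range(16))
    print(levels, s + c == sum(range(16)))
    print(throughput_bps(8, 9, 1_500_000_000))
```

In this model the final `s + c` stands in for the single carry-propagate addition that follows a compressor tree; every step before it is free of carry propagation, which is why the depth of the tree, rather than a full adder chain, sets the stage delay.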
Citation:
Ahmed Shams, Magdy Bayoumi, "A 108 Gbps, 1.5 GHz 1D-DCT Architecture," 12th IEEE International Conference on Application-Specific Systems, Architectures and Processors (ASAP'00), p. 163, 2000.