In several digital signal processing algorithms, the computation is performed in consecutive stages consisting of parallel computational nodes. The stages are decoupled by data permutations where stride permutations are common because of their regularity. Parallel computation of such algorithms with reduced number of processing elements implies that several computational nodes are assigned to each element. As a drawback, permutations become more complex and require data storage. In this paper, register-based stride permutation networks are proposed for array processors where the storage requirement of the networks is relatively small, and thus, memory-based structures would be an expensive solution. The proposed networks are regular and scalable and they support any stride of power-of-two. In addition, the networks reach the lower bound in the number of registers indicating area-efficiency. Furthermore, the networks are generated without heuristics, which makes them attractive for automated design procedures.
Citation:
Tuomas J?rvinen, Perttu Salmela, Harri Sorokin, Jarmo Takala, "Stride Permutation Networks for Array Processors," asap, pp.376-386, 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors (ASAP'04), 2004