Modular reduction is a fundamental operation in cryptographic systems. Most well known modular reduction methods including Barrett?s and Montgomery?s algorithms leverage some-pre computations to avoid divisions so that the main complexity of these methods lies in a sequence of two long multiplications. For large wordlengths a multiplication which is tantamount to a linear convolution is performed via the Fast Fourier Transform (FFT) or other transform-based techniques as in the Schonhage-Strassen multiplication algorithm.
We show a fundamental property (the separation principle): in a modular reduction based on long multiplications, the linear convolution required by one of the two long multiplications can be replaced by a cyclic convolution, and the halves can be separated using other information available due to the intrinsic redundancy of the operations. This reduces the number of operations by about 25%. We demonstrate that both Barrett?s and Montgomery?s methods can be sped up by using the aforementioned fundamental principle. It is shown that a direct application of this algorithm to modular exponentiation (either using Barrett?s or Montgomery?s methods) can be expected to yield about about 17% speedup.