A modular architecture with random-access on-chip local memory is proposed for real-time motion estimation. The local memory, which needs only simple address generation, overcomes the irregular data flow of the three-step search block-matching algorithm (BMA). The architecture features simple interconnection, low memory bandwidth, and 100% processor utilization, and it achieves a throughput of up to 1/N block per clock cycle for an N x N block with a search range of dm = N/2 - 1 pixels. With pipeline interleaving, the architecture offers a feasible solution for the Grand Alliance HDTV picture format with a large search range.
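For readers unfamiliar with the three-step search, the following is a minimal software sketch of the algorithm itself, not of the proposed hardware. It assumes a sum-of-absolute-differences (SAD) matching criterion, frames stored as lists of rows, and a block size N that is a power of two; the function names `sad` and `three_step_search` are illustrative and do not come from the paper. The step sizes N/4, N/8, ..., 1 add up to N/2 - 1, which matches the search range dm stated above. Because each step is centered on the best candidate of the previous step, the reference-data addresses depend on earlier results, which is the irregular data flow the abstract refers to.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in
    `cur` and the block displaced by (dx, dy) in `ref`."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def three_step_search(cur, ref, bx, by, n=16):
    """Return the (dx, dy) displacement chosen by a three-step search.

    Assumption: n is a power of two, so the step sizes n/4, n/8, ..., 1
    reach at most n/2 - 1 pixels in each direction.
    """
    height, width = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = sad(cur, ref, bx, by, 0, 0, n)
    step = n // 4
    while step >= 1:
        cx, cy = best  # this step's 3 x 3 candidate grid is centered here
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                mx, my = cx + dx, cy + dy
                # Skip candidates whose block would leave the reference frame.
                if bx + mx < 0 or by + my < 0:
                    continue
                if bx + mx + n > width or by + my + n > height:
                    continue
                cost = sad(cur, ref, bx, by, mx, my, n)
                if cost < best_cost:
                    best, best_cost = (mx, my), cost
        step //= 2
    return best
```

The search evaluates at most nine candidates per step instead of every displacement in the window, so it can settle on a local rather than a global minimum of the matching cost.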