This paper presents a multithreaded superpipelined superscalar processor design. It is expected to sustain a rate of 5.4 instructions executed per cycle, with 4 threads on chip. Multithreading improves the superscalar CPI by interleaving the execution of threads. Operator sharing is used instead of out-of-order execution. It requires less hardware (no reservation stations, collision vectors or renamed registers) and should offer greater potential for parallelism. The arithmetic operators, including adders, shifters, a multiplier and a step divider, have been pipelined to reduce the processor cycle width to the propagation delay of a 16-bit adder. Separate, equal-length data paths controlled by a completely RISC instruction set allow efficient in-order issue and termination. Floating-point operations are emulated with integer operations, using data-dependent algorithms whose latencies are as good as those of traditional hardware implementations. A single register file holds both the integer and the floating-point data.
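As a rough illustration of the throughput figures quoted above, the following sketch converts the expected sustained rate into CPI. The 5.4 instructions per cycle and the 4 threads come from the abstract; the even split of the issue rate across threads is an assumption made here for illustration only, and the function name `cpi_from_ipc` is hypothetical.

```python
def cpi_from_ipc(ipc: float) -> float:
    """Cycles per instruction is the reciprocal of instructions per cycle."""
    if ipc <= 0:
        raise ValueError("ipc must be positive")
    return 1.0 / ipc


SUSTAINED_IPC = 5.4  # expected sustained rate, as stated in the abstract
THREADS = 4          # threads on chip, as stated in the abstract

# Aggregate CPI seen by the whole processor.
aggregate_cpi = cpi_from_ipc(SUSTAINED_IPC)

# Assumption (not from the paper): the threads share the issue rate evenly.
per_thread_ipc = SUSTAINED_IPC / THREADS
per_thread_cpi = cpi_from_ipc(per_thread_ipc)

print(f"aggregate CPI  = {aggregate_cpi:.3f}")
print(f"per-thread IPC = {per_thread_ipc:.2f}")
print(f"per-thread CPI = {per_thread_cpi:.3f}")
```

Under that even-sharing assumption, each thread would need to sustain only about 1.35 instructions per cycle for the processor as a whole to reach the quoted aggregate rate.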
Index Terms:
Architecture, Multithreaded Processors, Instruction Level Parallelism, Superscalar Processors, Superpipelined Processors
Citation:
Bernard Goossens, Duc Thang Vu, "Multithreading to Improve Cycle Width and CPI in Superpipelined Superscalar Processors," ispan, pp.36, 1996 International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN '96), 1996