In this paper, we study the on-chip network and memory hierarchy design of Godson-T, a homogeneous many-core processor. Godson-T has 64 cores, each with a private L1 cache, and 16 global L2 cache banks. All of these on-chip units are connected by a 2D $8\times8$ mesh network. Our study reveals that: (a) The global on-chip L2 cache can effectively alleviate the memory pressure caused by the data-thirsty on-chip computing engines. However, its potential is still limited by both the off-chip and the on-chip bandwidth, especially as the number of active threads increases. (b) On-chip traffic congestion is largely caused by the intensive memory access requests issued by the on-chip cores. Therefore, the design of the on-chip network must take into account the available performance of the datapath that connects the processor to main memory. (c) In theory, different applications have different communication patterns (Berkeley's view [1]). However, an application's runtime communication pattern is only determined by the design of the underlying memory hierarchy and on-chip interconnection. These conclusions are generally applicable to a wide variety of many-core processors with similar designs.
Index Terms:
many-core processor, on-chip network, communication pattern, memory hierarchy, cache
Citation:
Xu Wang, Ge Gan, Joseph Manzano, Dongrui Fan, Shuxu Guo, "A Quantitative Study of the On-Chip Network and Memory Hierarchy Design for Many-Core Processor," icpads, pp. 689-696, 2008 14th IEEE International Conference on Parallel and Distributed Systems, 2008.