The performance of modern multiprocessor systems is increasingly limited by interconnection delays or long latencies of memory subsystems. Instruction-level multithreading is a technique to tolerate such long latencies by switching from one instruction thread to another and continuing instruction execution concurrently with the long-latency operations. Using timed Petri net models, the paper analyzes performance limitations introduces by different components of distributed-memory multithreaded multiprocessor systems. Simulation results are used to compare performance improvements obtained by replicating critical components of the system to those obtained using components with better performance characteristics.