This paper presents the algorithms and implementation of a high-performance functional unit used for multiple interpolation applications. Graphics processing units (GPUs) frequently perform two classes of floating-point interpolation within programmable shaders: per-pixel attribute interpolation and transcendental function approximation. We present a design that efficiently performs both classes of interpolation on a shared functional unit. Enhanced minimax approximations with quadratic interpolation minimize lookup-table sizes and datapath widths for fully pipelined function approximation. Rectangular multipliers support both sign-magnitude and two's complement inputs of variable widths. Superpipelining is used throughout the design to increase operating frequency and interpolation throughput while maximizing area efficiency.
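As a rough illustration of table-based quadratic interpolation for function approximation, the following Python sketch approximates a function on [1, 2) by splitting the interval into 2^m segments, storing three coefficients per segment, and evaluating a quadratic in the offset within the segment. It is a software model under stated assumptions, not the design described here: the coefficients come from a simple three-point fit rather than an enhanced minimax approximation, arithmetic is double-precision floating point rather than fixed-width datapaths, and the names `build_table` and `approx` and the choice m = 6 are hypothetical.

```python
def build_table(f, m):
    """Return 2**m tuples (c0, c1, c2), one per segment of [1, 2).

    Assumption: coefficients are fitted through the segment start, midpoint,
    and end. A minimax fit would give a smaller worst-case error.
    """
    n = 1 << m
    h = 1.0 / n  # segment width
    table = []
    for i in range(n):
        x0 = 1.0 + i * h
        y0, y1, y2 = f(x0), f(x0 + h / 2), f(x0 + h)
        # Quadratic p(t) = c0 + c1*t + c2*t^2 with p(0)=y0, p(h/2)=y1, p(h)=y2
        c0 = y0
        c2 = 2.0 * (y0 - 2.0 * y1 + y2) / (h * h)
        c1 = (y2 - y0) / h - c2 * h
        table.append((c0, c1, c2))
    return table


def approx(table, m, x):
    """Approximate f(x) for x in [1, 2) using the per-segment quadratic."""
    n = 1 << m
    scaled = (x - 1.0) * n
    i = min(int(scaled), n - 1)  # segment index (the upper m bits of x - 1)
    t = (scaled - i) / n         # offset within the segment (the lower bits)
    c0, c1, c2 = table[i]
    return c0 + t * (c1 + t * c2)  # Horner evaluation of the quadratic


if __name__ == "__main__":
    m = 6
    recip = build_table(lambda x: 1.0 / x, m)
    worst = max(abs(approx(recip, m, 1 + k / 1000) - 1 / (1 + k / 1000))
                for k in range(1000))
    print(f"segments: {len(recip)}, worst sampled error: {worst:.3e}")
```

The sketch shows the trade-off the abstract refers to: a higher-order interpolant reaches a given accuracy with fewer table entries than a lower-order one, at the cost of extra multiply-add work per evaluation.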