The hierarchical structure of real-life data-dominated applications limits the exploration space for high-level optimisations. This limitation is often overcome by function inlining. However, inlining increases the basic block code size, which causes a significant growth in instruction cache misses and thus a performance slow-down. This effect has been confirmed by experiments with our applications.
We have developed a novel methodology for selective function inlining, steered by a cost/gain balance, to trade off power and performance. Although this results in a speed-up, the increase in instruction cache misses is still present, i.e. the memory power consumption is higher. This implies the possibility of Pareto-optimal trade-offs between memory power and performance. Our methodology is demonstrated on an MPEG-4 video decoder.
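To make the idea of cost/gain-steered selection and Pareto-optimal trade-offs concrete, the following is a minimal illustrative sketch, not the methodology itself. It assumes each candidate call site carries a hypothetical estimated gain (cycles saved) and cost (code-size growth), picks sites greedily by gain per byte under an assumed code-size budget, and filters (memory power, execution time) points down to the non-dominated ones. The names `CallSite`, `select_inlines`, and `pareto_front`, as well as the greedy heuristic, are assumptions introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallSite:
    name: str    # hypothetical call-site label
    gain: float  # assumed estimate of cycles saved by inlining this site
    cost: int    # assumed code-size growth in bytes if inlined


def select_inlines(sites, size_budget):
    """Greedily pick call sites by gain per byte until the budget is used up."""
    chosen = []
    remaining = size_budget
    # Highest gain/cost ratio first; max(..., 1) avoids division by zero.
    ranked = sorted(sites, key=lambda s: s.gain / max(s.cost, 1), reverse=True)
    for site in ranked:
        if site.gain > 0 and site.cost <= remaining:
            chosen.append(site.name)
            remaining -= site.cost
    return chosen


def pareto_front(points):
    """Keep (memory_power, exec_time) points that no other point dominates.

    A point q dominates p when q is no worse in both dimensions and differs
    from p; lower is better for both power and time.
    """
    front = []
    for p in points:
        dominated = any(
            q != p and q[0] <= p[0] and q[1] <= p[1] for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(front)
```

Sweeping `size_budget` over a range and measuring each resulting configuration would yield the candidate points passed to `pareto_front`; the measurements themselves would have to come from profiling or simulation, which this sketch does not model.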