Further thoughts on FPGA co-processing and performance
I’d like to go one layer down on a point introduced in the last post: why you can’t lump all non-software approaches into one hardware bucket. Software suffers in terms of performance in two fundamental ways: heavy CPU loading when tasks are complex, and kernel-to-application-space context switching when iteration counts are high (i.e., millions of anything per second). Let’s go through how hardware helps with each.
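As a rough back-of-envelope illustration of those two costs, here is a minimal sketch in Python. The function name and every number in it are hypothetical, chosen only to show how one workload can be dominated by computation and another by context switching; none of it is measured data.

```python
def software_cost_seconds(items, cpu_seconds_per_item,
                          switches_per_item, seconds_per_switch):
    """Split total software time into (compute, context-switch) seconds.

    All parameters are illustrative assumptions, not measurements.
    """
    compute = items * cpu_seconds_per_item
    switching = items * switches_per_item * seconds_per_switch
    return compute, switching


# Hypothetical workload A: few items, heavy computation per item
# (for example, a batch of simulations).
heavy = software_cost_seconds(1_000, 1e-3, 1, 2e-6)

# Hypothetical workload B: millions of cheap items per second,
# each one crossing the kernel/application boundary once.
chatty = software_cost_seconds(5_000_000, 1e-7, 1, 2e-6)

for name, (compute, switching) in (("heavy", heavy), ("chatty", chatty)):
    dominant = "compute" if compute > switching else "context switching"
    print(f"{name}: compute={compute:.3f}s, "
          f"switching={switching:.3f}s, dominated by {dominant}")
```

Under these assumed numbers, workload A spends nearly all of its time computing, while workload B spends most of its time switching contexts, which is why the two problems call for different kinds of hardware help.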
Issue 1: Heavy CPU Loading/CPU Offload
CPU-intensive operations are difficult for general-purpose software to execute on general-purpose CPUs. Examples might be Monte Carlo simulations, complex algorithms or complex transformations of large data records. Think of it this way: let’s say that the CPU cost of running simulations in software is as follows:

Intel promoting on-board FPGAs to address low-latency financial market
Rik Turner of CBR recently wrote an interesting story about Intel trying to break into the low-latency financial services space. Intel is courting FPGA chip manufacturers and solution providers that leverage FPGAs, asking them to partner with Intel as it launches its Nehalem technology, faster front-side bus and FPGA co-processing capabilities.
Of course, the idea of hosting FPGA co-processing is not new; AMD has been offering a version of this approach for over two years. Intel is clearly playing catch-up here. It’s also not surprising that Intel would be concerned about any key market moving processing to specialized hardware that can outperform software on Intel processors by 10, 20 or even 50 times, especially if one box of non-Intel special-purpose hardware can replace the work of 10 to 30 Intel boxes running software.
This is a classic case of “if you can’t beat ’em, join ’em,” which has been a successful strategy for Intel in the past. The question is, who exactly will they be joining?


