I've noticed that calls to the Xilinx FFT bit-accurate C simulation model are very slow. Has anyone else noticed this?
I am working on a hybrid fixed/floating-point digital signal processing application, and I frequently call the bit-accurate simulation function (anywhere from 1,000 to 1,000,000 times per run). As I attempt to run longer simulations, the runtime is becoming my critical path.
The program is single-threaded right now, and I intend to write a multi-threaded version soon. But first, I ran a profiler on my single-threaded code to see what kind of speedup I should expect. I wasn't very surprised to find that 99% of my execution time is spent inside the xilinx_ip_xfft_v5_0_bitacc_simulate() function. I WAS surprised to find that 55.6% of that function's execution time is spent in malloc() and free() calls. I have some nice call graphs and an Excel spreadsheet I'd be willing to share if someone from Xilinx would like to take a look.
Why not move all of the memory allocations into the "create" and "destroy" state calls? If this were done, all memory allocations would be performed once, when the state is created. Since 55.6% of the function's time goes to malloc() and free(), removing those calls from the simulate path could give a speedup of over 2x (at best about 1/(1 - 0.556), or roughly 2.25x, for the function itself). On top of that, I fear that multi-threading my code won't lead to much speedup, because of the spin-locks in the heap allocation kernel calls.
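To make the suggestion concrete, here is a rough sketch of the pattern I have in mind. I don't have the Xilinx source, so everything here (sim_state, sim_create, sim_simulate, sim_destroy, and the scratch buffers) is made up for illustration and is not the actual xfft model API. The "math" is only a pass-through placeholder; the point is that the per-call function touches no heap functions at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical state object; NOT the real Xilinx bit-accurate model API. */
typedef struct {
    size_t  n;        /* transform length the scratch buffers were sized for */
    double *work_re;  /* scratch buffers owned by the state */
    double *work_im;
} sim_state;

/* All heap allocation happens here, once per state. */
sim_state *sim_create(size_t n)
{
    if (n == 0)
        return NULL;

    sim_state *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;

    s->n = n;
    s->work_re = malloc(n * sizeof *s->work_re);
    s->work_im = malloc(n * sizeof *s->work_im);
    if (s->work_re == NULL || s->work_im == NULL) {
        free(s->work_re);   /* free(NULL) is a no-op */
        free(s->work_im);
        free(s);
        return NULL;
    }
    return s;
}

/* Hot path: reuses the preallocated scratch space, no malloc()/free(). */
int sim_simulate(sim_state *s,
                 const double *in_re, const double *in_im,
                 double *out_re, double *out_im)
{
    assert(s != NULL);

    /* Stage the input in the state's scratch buffers. */
    for (size_t i = 0; i < s->n; ++i) {
        s->work_re[i] = in_re[i];
        s->work_im[i] = in_im[i];
    }

    /* Placeholder for the real transform: just copy the data through. */
    for (size_t i = 0; i < s->n; ++i) {
        out_re[i] = s->work_re[i];
        out_im[i] = s->work_im[i];
    }
    return 0;
}

/* All heap release happens here, once per state. */
void sim_destroy(sim_state *s)
{
    if (s == NULL)
        return;
    free(s->work_re);
    free(s->work_im);
    free(s);
}
```

With a layout like this, a caller would create the state once, call the simulate function as many times as needed, and destroy the state at the end. In a multi-threaded version I would expect each thread to own its own state, so the inner loop would never contend for the heap.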
Now, before anyone replies to this thread and says, "but that's a lot of memory allocations hanging around!", my reply is this: if a user is worried about running out of memory, they can create and destroy the state as needed. The memory is going to get allocated anyway. The only difference will be better control over how often the costly heap access kernel functions are called. I could see the memory possibly being wasted when simulating a core with a reconfigurable transform length. A pipelined 64K-point FFT would need roughly 1 GB. At that point, however, the math complexity will probably start to eclipse the memory allocation penalties.
I'd be more than happy to make the changes to the source myself, if someone from Xilinx were kind enough to share it.
Cheers. Petersen Curt EM Photonics, Inc. Newark, DE