| Speed: |
| * If you want to use multiple cores, then compile with -openmp or -fopenmp (see your compiler docs). |
| Realize that larger FFTs will reap more benefit than smaller FFTs. This generally uses more CPU time, but |
| less wall time. |
| |
| * experiment with compiler flags |
| Special thanks to Oscar Lesta. He suggested some compiler flags |
| for gcc that make a big difference. They shave 10-15% off |
| execution time on some systems. Try some combination of: |
| -march=pentiumpro |
| -ffast-math |
| -fomit-frame-pointer |
| |
| * If the input data has no imaginary component, use the kiss_fftr code under tools/. |
| Real ffts are roughly twice as fast as complex. |
| |
| * If you can rearrange your code to do 4 FFTs in parallel and you are on a recent Intel or AMD machine, |
| then you might want to experiment with the USE_SIMD code. See README.simd |
| |
| |
| Reducing code size: |
| * remove some of the butterflies. There are currently butterflies optimized for radices |
| 2,3,4,5. It is worth mentioning that you can still use FFT sizes that contain |
| other factors, they just won't be quite as fast. You can decide for yourself |
| whether to keep radix 2 or 4. If you do some work in this area, let me |
| know what you find. |
| |
| * For platforms where ROM/code space is more plentiful than RAM, |
| consider creating a hardcoded kiss_fft_state. In other words, decide which |
| FFT size(s) you want and make a structure with the correct factors and twiddles. |
| |
| * Frank van der Hulst offered numerous suggestions for smaller code size and correct operation |
| on embedded targets. "I'm happy to help anyone who is trying to implement KISSFFT on a micro" |
| |
| Some of these were rolled into the mainline code base: |
| - using long casts to promote intermediate results of short*short multiplication |
| - delaying allocation of buffers that are sometimes unused. |
| In some cases, it may be desirable to limit capability in order to better suit the target: |
| - predefining the twiddle tables for the desired fft size. |