Tree - bw/opentoonz - Cool Bug Repo

bw / opentoonz

Blame thirdparty/kiss_fft130/TIPS

shun_iwasawa	a35b8f	`Speed:`
shun_iwasawa	a35b8f	`* If you want to use multiple cores, compile with -openmp or -fopenmp (see your compiler docs).`
shun_iwasawa	a35b8f	`Larger FFTs benefit more from this than smaller FFTs. Multithreading generally uses more total CPU time, but`
shun_iwasawa	a35b8f	`less wall-clock time.`
shun_iwasawa	a35b8f
shun_iwasawa	a35b8f	`* Experiment with compiler flags.`
shun_iwasawa	a35b8f	`Special thanks to Oscar Lesta. He suggested some compiler flags`
shun_iwasawa	a35b8f	`for gcc that make a big difference. They shave 10-15% off`
shun_iwasawa	a35b8f	`execution time on some systems. Try some combination of:`
shun_iwasawa	a35b8f	`-march=pentiumpro`
shun_iwasawa	a35b8f	`-ffast-math`
shun_iwasawa	a35b8f	`-fomit-frame-pointer`
shun_iwasawa	a35b8f
shun_iwasawa	a35b8f	`* If the input data has no imaginary component, use the kiss_fftr code under tools/.`
shun_iwasawa	a35b8f	`Real FFTs are roughly twice as fast as complex FFTs.`
shun_iwasawa	a35b8f
shun_iwasawa	a35b8f	`* If you can rearrange your code to do 4 FFTs in parallel and you are on a recent Intel or AMD machine,`
shun_iwasawa	a35b8f	`then you might want to experiment with the USE_SIMD code. See README.simd`
shun_iwasawa	a35b8f
shun_iwasawa	a35b8f
shun_iwasawa	a35b8f	`Reducing code size:`
shun_iwasawa	a35b8f	`* Remove some of the butterflies. There are currently butterflies optimized for radices`
shun_iwasawa	a35b8f	`2, 3, 4 and 5. You can still use FFT sizes that contain`
shun_iwasawa	a35b8f	`other factors; they just won't be quite as fast. You can decide for yourself`
shun_iwasawa	a35b8f	`whether to keep radix 2 or 4. If you do some work in this area, let me`
shun_iwasawa	a35b8f	`know what you find.`
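
Before removing butterflies, it can help to know which radices your FFT sizes actually need. The following is a minimal sketch, assuming the only optimized radices are 2, 3, 4 and 5 as stated above. Both function names are hypothetical helpers written for this note, not part of the library.

```c
#include <assert.h>

/* Returns 1 if n factors entirely into 2, 3 and 5, else 0.
 * Radix 4 needs no separate check because 4 = 2 * 2. */
static int has_only_optimized_radices(int n)
{
    static const int radices[] = {2, 3, 5};
    if (n < 1)
        return 0;
    for (int i = 0; i < 3; ++i)
        while (n % radices[i] == 0)
            n /= radices[i];
    return n == 1;
}

/* Smallest size >= n whose factors are all 2, 3 or 5. */
static int next_optimized_size(int n)
{
    assert(n >= 1);
    while (!has_only_optimized_radices(n))
        ++n;
    return n;
}
```

For example, a size of 480 (2^5 * 3 * 5) passes the check, while a size of 7 does not and would round up to 8.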
shun_iwasawa	a35b8f
shun_iwasawa	a35b8f	`* For platforms where ROM/code space is more plentiful than RAM,`
shun_iwasawa	a35b8f	`consider creating a hardcoded kiss_fft_state. In other words, decide which`
shun_iwasawa	a35b8f	`FFT size(s) you want and make a structure with the correct factors and twiddles.`
shun_iwasawa	a35b8f
shun_iwasawa	a35b8f	`* Frank van der Hulst offered numerous suggestions for smaller code size and correct operation`
shun_iwasawa	a35b8f	`on embedded targets. "I'm happy to help anyone who is trying to implement KISSFFT on a micro"`
shun_iwasawa	a35b8f
shun_iwasawa	a35b8f	`Some of these were rolled into the mainline code base:`
shun_iwasawa	a35b8f	`- using long casts to promote intermediate results of short*short multiplication.`
shun_iwasawa	a35b8f	`- delaying allocation of buffers that are sometimes unused.`
shun_iwasawa	a35b8f	`In some cases, it may be desirable to limit capability in order to better suit the target:`
shun_iwasawa	a35b8f	`- predefining the twiddle tables for the desired FFT size.`
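
The long-cast point matters on targets where int is only 16 bits: there, short*short is evaluated in 16-bit int and can overflow before the result is scaled. A minimal sketch of the idea for Q15 fixed-point data follows. The function name q15_mul and the rounding constant are assumptions for illustration, not the library's actual macros.

```c
#include <stdint.h>

/* Multiply two Q15 values, promoting to long (at least 32 bits) first
 * so the intermediate product cannot overflow a 16-bit int.
 * Assumes arithmetic right shift for negative values, and does not
 * saturate the single overflow case (-32768 * -32768). */
static int16_t q15_mul(int16_t a, int16_t b)
{
    long wide = (long)a * (long)b;       /* full-width product, Q30 */
    wide += 1L << 14;                    /* round to nearest */
    return (int16_t)(wide >> 15);        /* scale back to Q15 */
}
```

With this helper, 0.5 * 0.5 in Q15 (16384 * 16384) yields 8192, which is 0.25.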
