KISS FFT - A mixed-radix Fast Fourier Transform based upon the principle,
"Keep It Simple, Stupid."

There are many great FFT libraries already around. Kiss FFT is not trying
to be better than any of them. It only attempts to be a reasonably efficient,
moderately useful FFT that can use fixed-point or floating-point data types and
can be incorporated into someone's C program in a few minutes with trivial licensing.

USAGE:

The basic usage for 1-d complex FFT is:

    #include "kiss_fft.h"

    // mem and lenmem are passed as 0, so the state is allocated for you
    kiss_fft_cfg cfg = kiss_fft_alloc( nfft ,is_inverse_fft ,0,0 );

    while ...

        ... // put kth sample in cx_in[k].r and cx_in[k].i

        kiss_fft( cfg , cx_in , cx_out );

        ... // transformed. DC is in cx_out[0].r and cx_out[0].i

    free(cfg);

Note: frequency-domain data is stored from DC up to 2pi (the sampling frequency),
    so cx_out[0] is the DC bin of the FFT
    and cx_out[nfft/2] is the Nyquist bin (it only exists when nfft is even)
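
As a rough illustration of the bin layout in the note above, the helper below
maps a bin index to the frequency it represents. It is only a sketch and not
part of kiss_fft: the name bin_frequency_hz and the sample_rate_hz parameter
are made up for this example. Bins above nfft/2 are reported as negative
frequencies, which is the usual way to read the upper half of a spectrum that
runs from DC up to 2pi.

```c
#include <assert.h>

/* Hypothetical helper (not part of kiss_fft): frequency in Hz represented
 * by bin k of an nfft-point complex FFT of data sampled at sample_rate_hz.
 * Bins 0..nfft/2 run from DC up to Nyquist; the bins above that hold the
 * negative frequencies. */
double bin_frequency_hz(int k, int nfft, double sample_rate_hz)
{
    assert(nfft > 0 && k >= 0 && k < nfft);
    if (k > nfft / 2)
        k -= nfft;      /* upper half wraps around to negative frequencies */
    return (double)k * sample_rate_hz / (double)nfft;
}
```

For example, with nfft=1024 and 44100 Hz sampling, bin 512 is the Nyquist bin
at 22050 Hz, and bin 1023 sits one bin below DC.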

Declarations are in "kiss_fft.h", along with a brief description of the
functions you'll need to use.

Code definitions for 1-d complex FFTs are in kiss_fft.c.

You can do other cool stuff with the extras you'll find in tools/:

    * multi-dimensional FFTs
    * real-optimized FFTs (returns the positive half-spectrum: (nfft/2+1) complex frequency bins)
    * fast convolution FIR filtering (not available for fixed point)
    * spectrum image creation

The core FFT and most tools/ code can be compiled to use float, double,
Q15 short or Q31 samples. The default is float.
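
For readers who have not met the Q15 format: a 16-bit integer v stands for the
value v/32768. The sketch below shows one common way to convert to and from
Q15 and to multiply two Q15 numbers. It is an illustration under stated
assumptions, not the code kiss_fft uses internally; the names Q15_ONE,
q15_from_double, q15_to_double and q15_mul are invented here, and the multiply
assumes that right-shifting a negative integer is an arithmetic shift, as it is
on common compilers.

```c
#include <assert.h>
#include <stdint.h>

/* Q15: a 16-bit signed integer v stands for v / 32768, so the
 * representable range is [-1, 1 - 2^-15]. (Names are hypothetical.) */
#define Q15_ONE 32768

/* Convert a double to Q15, rounding to nearest and saturating at the ends. */
int16_t q15_from_double(double x)
{
    double scaled;

    assert(x == x);                             /* NaN has no Q15 value */
    scaled = x * Q15_ONE;
    scaled += (scaled >= 0.0) ? 0.5 : -0.5;     /* round half away from zero */
    if (scaled > 32767.0)
        return INT16_MAX;
    if (scaled < -32768.0)
        return INT16_MIN;
    return (int16_t)scaled;                     /* cast truncates toward zero */
}

double q15_to_double(int16_t v)
{
    return (double)v / Q15_ONE;
}

/* Multiply two Q15 values. The 32-bit product has 30 fractional bits, so
 * add half an LSB and shift right by 15 to get back to Q15.
 * Assumes an arithmetic right shift for negative values. */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;
    int32_t rounded = (product + (1 << 14)) >> 15;

    if (rounded > INT16_MAX)
        rounded = INT16_MAX;                    /* only happens for -1 * -1 */
    return (int16_t)rounded;
}
```

The saturation in q15_mul matters for exactly one input pair: -1 times -1
would be +1, which Q15 cannot represent.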


BACKGROUND:

I started coding this because I couldn't find a fixed-point FFT that didn't
use assembly code. I started with floating-point numbers so I could get the
theory straight before working on fixed-point issues. In the end, I had a
little bit of code that could be recompiled easily to do FFTs with short, float
or double (other types should be easy too).

Once I got my FFT working, I was curious about the speed compared to
a well-respected and highly optimized FFT library. I don't want to criticize
this great library, so let's call it FFT_BRANDX.
During this process, I learned:

    1. FFT_BRANDX has more than 100K lines of code. The core of kiss_fft is about 500 lines (complex 1-d).
    2. It took me an embarrassingly long time to get FFT_BRANDX working.
    3. A simple program using FFT_BRANDX is 522KB. A similar program using kiss_fft is 18KB (without optimizing for size).
    4. FFT_BRANDX is roughly twice as fast as KISS FFT in default mode.

It is wonderful that free, highly optimized libraries like FFT_BRANDX exist.
But such libraries carry a huge burden of complexity necessary to extract every
last bit of performance.

Sometimes simpler is better, even if it's not better.

FREQUENTLY ASKED QUESTIONS:
    Q: Can I use kissfft in a project with a ___ license?
    A: Yes. See LICENSE below.

    Q: Why don't I get the output I expect?
    A: The two most common causes of this are:
        1) scaling: is there a constant multiplier between what you got and what you want?
        2) mixed build environment: all code must be compiled with the same preprocessor
           definitions for FIXED_POINT and kiss_fft_scalar

    Q: Will you write/debug my code for me?
    A: Probably not unless you pay me. I am happy to answer pointed and topical questions, but
       I may refer you to a book, a forum, or some other resource.


PERFORMANCE:
    (on Athlon XP 2100+, with gcc 2.96, float data type)

    Kiss performed 10000 1024-pt complex FFTs in 0.63 s of CPU time.
    For comparison, it took md5sum twice as long to process the same amount of data.

    Transforming 5 minutes of CD-quality audio takes less than a second (nfft=1024).

DO NOT:
    ... use Kiss if you need the Fastest Fourier Transform in the World
    ... ask me to add features that will bloat the code

UNDER THE HOOD:

Kiss FFT uses a decimation-in-time, mixed-radix, out-of-place FFT. If you give it an input buffer
and output buffer that are the same, a temporary buffer will be created to hold the data.

No static data is used. The core routines of kiss_fft are thread-safe (but not all of the tools directory).

No scaling is done for the floating-point version (for speed), so a forward transform
followed by an inverse transform returns the input multiplied by nfft.
Scaling is done in both directions (forward and inverse) for the fixed-point version (for overflow prevention).
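
To make the floating-point scaling convention concrete, here is a tiny
stand-alone sketch. It does not call kiss_fft at all: it uses a naive 4-point
DFT, chosen because its twiddle factors are exactly 1, -j, -1 and j, and runs
it forward and then inverse without dividing by the length. The type cpx and
the functions dft4_unscaled and round_trip4 are hypothetical names for this
example only.

```c
#include <assert.h>

typedef struct { double r, i; } cpx;    /* hypothetical type for this sketch */

/* exp(-j*2*pi*m/4) for m = 0..3; these four values are exact */
static const cpx W4[4] = { {1, 0}, {0, -1}, {-1, 0}, {0, 1} };

/* Naive unscaled 4-point DFT. A non-zero `inverse` conjugates the twiddles.
 * Neither direction divides by the length, which mirrors the "no scaling"
 * convention described for the floating-point build. */
void dft4_unscaled(const cpx in[4], cpx out[4], int inverse)
{
    int k, n;

    assert(in != out);                  /* this sketch is out-of-place only */
    for (k = 0; k < 4; ++k) {
        out[k].r = 0.0;
        out[k].i = 0.0;
        for (n = 0; n < 4; ++n) {
            cpx w = W4[(k * n) % 4];
            if (inverse)
                w.i = -w.i;
            out[k].r += in[n].r * w.r - in[n].i * w.i;
            out[k].i += in[n].r * w.i + in[n].i * w.r;
        }
    }
}

/* Forward then inverse: out[] ends up as 4 * in[], so the caller has to
 * divide by the length to get the original samples back. */
void round_trip4(const cpx in[4], cpx out[4])
{
    cpx freq[4];

    dft4_unscaled(in, freq, 0);
    dft4_unscaled(freq, out, 1);
}
```

Feeding the samples 1, 2, 3, 4 through round_trip4 gives 4, 8, 12, 16. A
constant factor like this between what you got and what you expected is the
usual sign of a scaling mismatch.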

Optimized butterflies are used for factors 2, 3, 4, and 5; other factors fall back to a slower generic butterfly.
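
A mixed-radix FFT first has to split nfft into the radices it will process.
The sketch below shows one plausible way to do that, preferring 4, then 2, then
the odd numbers 3, 5, 7 and so on. It is written for this README as an
illustration and is not copied from kiss_fft.c; the function name
factor_radices and its max_factors parameter are assumptions.

```c
#include <assert.h>

/* Hypothetical sketch: split nfft into radices, trying 4 first, then 2,
 * then 3, 5, 7, ... The radices are written to factors[] and the number
 * of radices is returned (it must not exceed max_factors). */
int factor_radices(int nfft, int *factors, int max_factors)
{
    int count = 0;
    int p = 4;

    assert(nfft >= 1);
    while (nfft > 1) {
        while (nfft % p != 0) {
            if (p == 4)
                p = 2;
            else if (p == 2)
                p = 3;
            else
                p += 2;
            if (p * p > nfft)
                p = nfft;       /* nothing smaller divides it: rest is prime */
        }
        assert(count < max_factors);
        factors[count++] = p;
        nfft /= p;
    }
    return count;
}
```

With this ordering, nfft=1000 splits into 4, 2, 5, 5, 5 and nfft=1024 into
five radix-4 stages, so sizes built from 2, 3 and 5 never need a generic
butterfly.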

The real (i.e. not complex) optimization code only works for even-length FFTs. It does two half-length
FFTs in parallel (packed into the real and imaginary parts), and then combines them via twiddling. The result is
nfft/2+1 complex frequency bins from DC to Nyquist. If you don't know what this means, search the web.
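
The packing trick is easier to follow on a small fixed size. The sketch below
works it through for 8 real samples: the even samples go into the real part
and the odd samples into the imaginary part of a 4-point complex sequence, one
complex DFT is taken, and the two half-length spectra are separated and
combined with 8-point twiddles. This is a textbook version written for
illustration, not the code in tools/; the names cpx, SQRT_HALF, W4, W8 and
real_dft8 are hypothetical, and a naive DFT stands in for the FFT.

```c
#include <assert.h>

typedef struct { double r, i; } cpx;        /* hypothetical type */

#define SQRT_HALF 0.70710678118654752440    /* sqrt(1/2) */

/* exp(-j*2*pi*m/4), m = 0..3: twiddles of the 4-point complex DFT */
static const cpx W4[4] = { {1, 0}, {0, -1}, {-1, 0}, {0, 1} };
/* exp(-j*2*pi*k/8), k = 0..3: twiddles that recombine the two halves */
static const cpx W8[4] = {
    {1, 0}, {SQRT_HALF, -SQRT_HALF}, {0, -1}, {-SQRT_HALF, -SQRT_HALF}
};

/* 8 real samples in, 8/2+1 = 5 complex bins (DC to Nyquist) out. */
void real_dft8(const double x[8], cpx X[5])
{
    cpx z[4], Z[4];
    int k, n;

    assert(x != 0 && X != 0);

    /* pack: even samples -> real part, odd samples -> imaginary part */
    for (n = 0; n < 4; ++n) {
        z[n].r = x[2 * n];
        z[n].i = x[2 * n + 1];
    }
    /* one 4-point complex DFT handles both half-length sequences at once */
    for (k = 0; k < 4; ++k) {
        Z[k].r = 0.0;
        Z[k].i = 0.0;
        for (n = 0; n < 4; ++n) {
            cpx w = W4[(k * n) % 4];
            Z[k].r += z[n].r * w.r - z[n].i * w.i;
            Z[k].i += z[n].r * w.i + z[n].i * w.r;
        }
    }
    /* separate the two spectra, then combine: X[k] = E[k] + W8[k] * O[k] */
    for (k = 0; k < 4; ++k) {
        int m = (4 - k) % 4;
        cpx e, o;                   /* DFTs of the even and odd samples */
        e.r = 0.5 * (Z[k].r + Z[m].r);
        e.i = 0.5 * (Z[k].i - Z[m].i);
        o.r = 0.5 * (Z[k].i + Z[m].i);
        o.i = 0.5 * (Z[m].r - Z[k].r);
        X[k].r = e.r + W8[k].r * o.r - W8[k].i * o.i;
        X[k].i = e.i + W8[k].r * o.i + W8[k].i * o.r;
    }
    /* Nyquist bin: sum of even samples minus sum of odd samples */
    X[4].r = Z[0].r - Z[0].i;
    X[4].i = 0.0;
}
```

The bins above Nyquist are not stored because, for real input, they are the
complex conjugates of the bins below it.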

The fast convolution filtering uses the overlap-scrap method (also known as
overlap-save), slightly modified to put the scrap at the tail.
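
For orientation, here is a sketch of the textbook overlap-scrap bookkeeping.
Note the assumptions: it keeps the scrap at the head of each block (the
unmodified method, not the tail variant used in kiss_fastfir.c), and it
computes each circular convolution directly instead of with FFTs so that it
stays self-contained. The names overlap_scrap_filter and MAX_BLOCK are made up
for this example.

```c
#include <assert.h>

#define MAX_BLOCK 64    /* arbitrary limit for this sketch */

/* Textbook overlap-scrap (overlap-save) FIR filtering.
 * x[0..nx-1]: input, h[0..nh-1]: filter taps, y[0..nx-1]: output,
 * block: length of each circular convolution (nh <= block <= MAX_BLOCK).
 * A real implementation would do each circular convolution as
 * FFT -> multiply -> inverse FFT; it is done directly here so that the
 * block bookkeeping stays visible. */
void overlap_scrap_filter(const double *x, int nx,
                          const double *h, int nh,
                          double *y, int block)
{
    double buf[MAX_BLOCK];
    int scrap, step, pos, n, m;

    assert(nh >= 1 && block >= nh && block <= MAX_BLOCK);
    scrap = nh - 1;             /* outputs per block spoiled by wrap-around */
    step = block - scrap;       /* valid outputs per block */

    for (pos = 0; pos < nx; pos += step) {
        /* load `scrap` old samples followed by `step` new ones */
        for (n = 0; n < block; ++n) {
            int idx = pos - scrap + n;
            buf[n] = (idx >= 0 && idx < nx) ? x[idx] : 0.0;
        }
        /* circular convolution; the first `scrap` outputs are thrown away,
         * so they are simply never computed */
        for (n = scrap; n < block && pos + n - scrap < nx; ++n) {
            double acc = 0.0;
            for (m = 0; m < nh; ++m)
                acc += h[m] * buf[(n - m + block) % block];
            y[pos + n - scrap] = acc;
        }
    }
}
```

Each block yields block-(nh-1) valid outputs, so larger blocks waste a smaller
share of the work on scrap.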

LICENSE:
    Revised BSD License, see COPYING for verbiage.
    Basically, "free to use and change, give credit where due, no guarantees"
    Note that this license is compatible with the GPL at one end of the spectrum and closed, commercial software at
    the other end. See http://www.fsf.org/licensing/licenses

    A commercial license is available which removes the requirement for attribution. Contact me for details.


TODO:
    *) Add real optimization for odd-length FFTs
    *) Document/revisit the input/output FFT scaling
    *) Make doc describing the overlap (tail) scrap fast convolution filtering in kiss_fastfir.c
    *) Test all the ./tools/ code with fixed point (kiss_fastfir.c doesn't work, maybe others)

AUTHOR:
    Mark Borgerding
    Mark@Borgerding.net