Blob Blame Raw
        Weird Performance

1. If you see serious performance loss (extremely low performance),
   probably you created too many threads or process. Basically GotoBLAS
   assumes that available cores that you specify are exclusively for
   BLAS computation. Even one small thread/process conflicts with BLAS
   threads, performance will become worse. 

   The best solution is to reduce your number of threads or insert
   some synchronization mechanism and suspend your threads until BLAS
   operation is finished.


2. Simlar problem may happen under virtual machine. If supervisor
   allocates different cores for each scheduling, BLAS performnace
   will be bad. This is because BLAS also utilizes all cache,
   unexpected re-schedule for different core may result of heavy
   performance loss.


Anyway, if you see any weird performance loss, it means your code or
algorithm is not optimal.