	GotoBLAS2 FAQ

1. General

1.1  Q Can I find a useful paper about GotoBLAS2?

     A Check the publication list at the following URL.

     http://www.cs.utexas.edu/users/flame/Publications/index.htm

    11. Kazushige Goto and Robert A. van de Geijn, "Anatomy of
        High-Performance Matrix Multiplication," ACM Transactions on
        Mathematical Software, accepted.

    15. Kazushige Goto and Robert van de Geijn, "High-Performance
        Implementation of the Level-3 BLAS," ACM Transactions on
        Mathematical Software, submitted.


1.2  Q Does GotoBLAS2 work with Hyper-Threading (SMT)?

     A Yes, it will work. GotoBLAS2 detects Hyper-Threading and
       avoids scheduling threads on the same core.


1.3  Q When I type "make", the following error occurs. What's wrong?

	$shell> make
	"./Makefile.rule", line 58: Missing dependency operator
	"./Makefile.rule", line 61: Need an operator
	...

     A This error occurs because the "make" you ran is not GNU make.
       Some binary packages install GNU make as "gmake", so it is
       worth trying that name.
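
       As a minimal sketch of that check, the helper below picks
       whichever of "gmake" or "make" identifies itself as GNU make.
       The function name find_gnu_make is hypothetical, and the test
       assumes GNU make prints "GNU Make" in its --version output.

```shell
# Hypothetical helper: print the first candidate that is GNU make.
find_gnu_make() {
    for candidate in gmake make; do
        if command -v "$candidate" >/dev/null 2>&1 &&
           "$candidate" --version 2>/dev/null | grep -q "GNU Make"; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

if GNU_MAKE=$(find_gnu_make); then
    echo "Build with: $GNU_MAKE"
else
    echo "GNU make was not found on PATH" >&2
fi
```

       If neither name is GNU make, install it from your system's
       package collection before building.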


1.4  Q Function "xxx" is slow. Why?

     A GotoBLAS2 has many well-optimized functions, but it is far
       from perfect. Level 1/2 function performance in particular
       depends on how you call BLAS. You should understand what
       happens between your function and GotoBLAS2 by using the
       profile-enabled version or hardware performance counters.
       Again, please don't regard GotoBLAS2 as a black box.


1.5  Q I have a commercial C compiler and want to compile GotoBLAS2 with
       it. Is it possible?

     A Every function that affects performance is written in
       assembler; C code is used only for wrappers around the
       assembler functions and for complicated functions. I also use
       many inline assembler functions, and unfortunately most
       commercial compilers can't handle inline assembler. Therefore
       you should use gcc.


1.6  Q I use an OpenMP compiler. How can I use GotoBLAS2 with it?

     A Please understand that OpenMP is a compromise as a way to use
       threads. If you want to use OpenMP-based code with GotoBLAS2,
       you should enable "USE_OPENMP=1" in Makefile.rule.
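
       The edit can be sketched as follows. This runs on a stand-in
       copy of Makefile.rule, and the commented-out spelling
       "# USE_OPENMP = 1" is an assumption about how the real file
       lists the option; check your own copy before applying it.

```shell
# Stand-in for Makefile.rule; the commented-out option line is an
# assumption about the real file's contents.
rule_file=$(mktemp)
printf '%s\n' '# USE_THREAD = 0' '# USE_OPENMP = 1' > "$rule_file"

# Uncomment only the USE_OPENMP line, leaving every other line as is.
sed 's/^#[[:space:]]*\(USE_OPENMP[[:space:]]*=.*\)$/\1/' "$rule_file" \
    > "$rule_file.new" && mv "$rule_file.new" "$rule_file"

cat "$rule_file"
```

       After the change, rebuild the library with GNU make and compile
       your own code with your compiler's OpenMP option (for gcc,
       -fopenmp).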


1.7  Q Could you tell me how to use the profiled library?

     A You need to build and link your application with the -pg
       option. After executing your application, "gmon.out" is
       generated in your current directory.

       $shell> gprof <your application name> gmon.out

       Each sample counts as 0.01 seconds.
	 %   cumulative   self              self     total
	time   seconds   seconds    calls  Ks/call  Ks/call  name
	89.86    975.02   975.02    79317     0.00     0.00  .dgemm_kernel
	 4.19   1020.47    45.45       40     0.00     0.00  .dlaswp00N
	 2.28   1045.16    24.69     2539     0.00     0.00  .dtrsm_kernel_LT
	 1.19   1058.03    12.87    79317     0.00     0.00  .dgemm_otcopy
	 1.05   1069.40    11.37     4999     0.00     0.00  .dgemm_oncopy
       ....

       I think the profiled BLAS library is really useful for your
       research. Please find the bottleneck of your application and
       improve it.

1.8  Q Is the number of threads limited?

     A Basically, there is no limit on the number of threads. You
       can specify as many threads as you want, but a larger number
       of threads consumes extra resources. I recommend specifying
       the minimum number of threads you need.
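
       A minimal sketch of setting a small thread count from the
       shell. It assumes GotoBLAS2 reads the count from the
       GOTO_NUM_THREADS environment variable at startup (or from
       OMP_NUM_THREADS in an OpenMP build); the application name in
       the comment is hypothetical.

```shell
# Assumption: GotoBLAS2 reads its thread count from GOTO_NUM_THREADS
# (or OMP_NUM_THREADS for an OpenMP build) when the program starts.
GOTO_NUM_THREADS=2
export GOTO_NUM_THREADS
echo "GotoBLAS2 threads requested: $GOTO_NUM_THREADS"

# ./your_application    (hypothetical program linked against GotoBLAS2)
```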


2. Architecture-Specific Issues or Implementation

2.1 Q GotoBLAS2 seems to support any combination of OS and
      architecture. Is that possible?

    A The combinations are limited to OS and architecture pairs
      that currently exist. For example, the combination of OSX
      and SPARC is impossible. But it would be possible with slight
      modification if such a combination appeared in front of us.


2.2 Q I have POWER architecture systems. Do I need extra work?

    A Although the POWER architecture defines a special instruction,
      like CPUID, to detect the correct architecture, it is
      privileged and can't be accessed by a user process. So you
      have to set your architecture manually in getarch.c.


2.3 Q I can't create a DLL on Cygwin (Error 53). What's wrong?

    A Make sure that lib.exe and mspdb80.dll from Microsoft Visual
      Studio are in your PATH. The easiest way to check is the
      'which' command.

    $shell> which lib.exe
    /cygdrive/c/Program Files/Microsoft Visual Studio/VC98/bin/lib.exe
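
      Because 'which' reports only files it treats as executable, a
      plain existence scan of PATH can be a useful cross-check for
      the DLL as well. The sketch below is one way to do that; the
      function name find_on_path is hypothetical.

```shell
# Hypothetical helper: print the first PATH entry that contains the
# named file, whether or not the file is marked executable.
# The body runs in a subshell so the IFS change stays local.
find_on_path() (
    IFS=:
    for dir in $PATH; do
        if [ -e "${dir:-.}/$1" ]; then
            echo "${dir:-.}/$1"
            exit 0
        fi
    done
    exit 1
)

for name in lib.exe mspdb80.dll; do
    find_on_path "$name" || echo "$name: not found in any PATH directory"
done
```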