GotoBLAS2 FAQ

1. General

1.1 Q Are there any useful papers about GotoBLAS2?

A See the following URL:

http://www.cs.utexas.edu/users/flame/Publications/index.htm

11. Kazushige Goto and Robert A. van de Geijn, "Anatomy of
High-Performance Matrix Multiplication," ACM Transactions on
Mathematical Software, accepted.

15. Kazushige Goto and Robert van de Geijn, "High-Performance
Implementation of the Level-3 BLAS," ACM Transactions on
Mathematical Software, submitted.


1.2 Q Does GotoBLAS2 work with Hyper-Threading (SMT)?

A Yes, it works. GotoBLAS2 detects Hyper-Threading and
avoids scheduling threads on the same physical core.


1.3 Q When I type "make", the following error occurs. What's wrong?

    $shell> make
    "./Makefile.rule", line 58: Missing dependency operator
    "./Makefile.rule", line 61: Need an operator
    ...

A This error occurs because you are not using GNU make. Some binary
packages install GNU make as "gmake", so it is worth trying that.

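The check described above can be scripted. The following is a minimal
sketch, assuming a POSIX shell. The function name find_gnu_make is made
up for this illustration and is not part of GotoBLAS2; it relies only on
the fact that GNU make prints "GNU Make" in its --version output.

```shell
# find_gnu_make is a hypothetical helper, not part of GotoBLAS2.
# It prints the first candidate command that reports itself as GNU make.
find_gnu_make() {
    for candidate in gmake make; do
        if command -v "$candidate" >/dev/null 2>&1 &&
           "$candidate" --version </dev/null 2>/dev/null | grep -q "GNU Make"; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

if gnu_make=$(find_gnu_make); then
    echo "GNU make is available as: $gnu_make"
else
    echo "GNU make was not found in PATH" >&2
fi
```

If the function prints "gmake", build GotoBLAS2 with gmake instead of make.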

1.4 Q Function "xxx" is slow. Why?

A GotoBLAS2 has many well-optimized functions, but it is
far from perfect. In particular, the performance of Level 1/2
functions depends on how you call BLAS. You should understand
what happens between your function and GotoBLAS2 by using the
profile-enabled version or hardware performance counters. Again,
please don't regard GotoBLAS2 as a black box.


1.5 Q I have a commercial C compiler and want to compile GotoBLAS2 with
it. Is it possible?

A All functions that affect performance are written in assembler;
C code is used only for wrappers around the assembler functions or
for complicated functions. I also use many inline assembler functions,
and unfortunately most commercial compilers can't handle inline
assembler. Therefore you should use gcc.


1.6 Q I use an OpenMP compiler. How can I use GotoBLAS2 with it?

A Please understand that OpenMP is a compromise as a way of using
threads. If you want to use OpenMP-based code with GotoBLAS2, you
should enable "USE_OPENMP=1" in Makefile.rule.


1.7 Q Could you tell me how to use the profiled library?

A You need to build and link your application with the -pg
option. After you run your application, "gmon.out" is
generated in your current directory.

    $shell> gprof <your application name> gmon.out

    Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total
     time   seconds   seconds    calls  Ks/call  Ks/call  name
     89.86    975.02   975.02    79317     0.00     0.00  .dgemm_kernel
      4.19   1020.47    45.45       40     0.00     0.00  .dlaswp00N
      2.28   1045.16    24.69     2539     0.00     0.00  .dtrsm_kernel_LT
      1.19   1058.03    12.87    79317     0.00     0.00  .dgemm_otcopy
      1.05   1069.40    11.37     4999     0.00     0.00  .dgemm_oncopy
     ....

I think the profiled BLAS library is really useful for your
research. Please find the bottleneck of your application and
improve it.

1.8 Q Is the number of threads limited?

A Basically, there is no limit on the number of threads. You
can specify as many threads as you want, but a larger
number of threads consumes extra resources. I recommend
specifying the minimum number of threads you need.

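As an illustration of keeping the thread count small, the sketch below
caps a requested thread count at the number of online CPUs before an
application is started. It assumes a POSIX shell and that the thread
count is passed through the GOTO_NUM_THREADS environment variable, which
comes from the GotoBLAS2 installation notes rather than from this FAQ.
The helper clamp_threads is hypothetical and the request of 4 threads is
arbitrary. Note that the online CPU count includes SMT logical CPUs.

```shell
# clamp_threads is a hypothetical helper: print the smaller of the
# requested thread count ($1) and the available CPU count ($2).
clamp_threads() {
    if [ "$1" -lt "$2" ]; then
        echo "$1"
    else
        echo "$2"
    fi
}

# Number of online CPUs; fall back to 1 if getconf cannot report it.
ncpu=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || ncpu=1
case "$ncpu" in
    ''|*[!0-9]*) ncpu=1 ;;
esac

# 4 is an arbitrary example request, not a recommended value.
GOTO_NUM_THREADS=$(clamp_threads 4 "$ncpu")
export GOTO_NUM_THREADS
echo "GOTO_NUM_THREADS=$GOTO_NUM_THREADS"
```

An application started from the same shell afterwards inherits the
exported value.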

2. Architecture-Specific Issues or Implementation

2.1 Q GotoBLAS2 seems to support any combination of OS and
architecture. Is that possible?

A The combinations are limited to the OSes and architectures that
currently exist together. For example, the combination of OS X with
SPARC is impossible. But it would become possible with slight
modification if such a combination appeared.


2.2 Q I have POWER architecture systems. Do I need extra work?

A Although the POWER architecture defines a special instruction,
similar to CPUID, for detecting the exact architecture, it is
privileged and can't be accessed by a user process. So you have to
set your architecture manually in getarch.c.


2.3 Q I can't create DLL on Cygwin (Error 53). What's wrong?

A Make sure that lib.exe and mspdb80.dll from Microsoft Visual
Studio are in your PATH. The easiest way to check is the 'which'
command.

    $shell> which lib.exe
    /cygdrive/c/Program Files/Microsoft Visual Studio/VC98/bin/lib.exe

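'which' normally reports only files it considers executable, so
depending on file permissions it may not find a DLL. The following is a
minimal sketch, assuming a POSIX shell, that instead looks for a plain
file with the given name in each PATH directory. The function name
find_in_path is made up for this illustration.

```shell
# find_in_path is a hypothetical helper: print the full path of the
# first regular file named $1 found in a PATH directory.
# The body runs in a subshell, so changing IFS does not leak out.
find_in_path() (
    IFS=:
    for dir in $PATH; do
        # An empty PATH entry means the current directory.
        if [ -f "${dir:-.}/$1" ]; then
            printf '%s\n' "${dir:-.}/$1"
            exit 0
        fi
    done
    exit 1
)

for name in lib.exe mspdb80.dll; do
    find_in_path "$name" || echo "$name: not found in PATH"
done
```

Because only ":" is used as the separator, directories that contain
spaces, such as "Program Files", are handled correctly.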