StarPU Handbook
The behavior of the StarPU library and tools can be tuned with the following configure options.
Enable checking that spinlocks are taken and released properly.
Increase the verbosity of the debugging messages. This can be disabled at runtime by setting the environment variable STARPU_SILENT to any value.
$ STARPU_SILENT=1 ./vector_scal
Specify that tests and examples should be run on a smaller data set, i.e. allowing a faster execution time.
Enable some exhaustive checks which take a really long time.
Specify that hwloc should be used by StarPU. hwloc should be found by means of the tool pkg-config.
Specify that hwloc should be used by StarPU. hwloc should be found in the directory specified by prefix.
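As a sketch of how this can be used (assuming the option for pointing StarPU at an hwloc installation is spelled --with-hwloc, as in current StarPU releases; the prefix below is only illustrative), one might first check which hwloc pkg-config would pick up and then pass an explicit prefix:
$ pkg-config --modversion hwloc         # shows the hwloc version pkg-config would use, if any
$ ./configure --with-hwloc=/opt/hwloc   # assumed option name; /opt/hwloc is an illustrative prefix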
The tools doxygen and latex (plus the packages latex-xcolor and texlive-latex-extra) are required to generate the documentation.
Additionally, the script configure recognizes many variables, which can be listed by typing ./configure --help. For example, ./configure NVCCFLAGS="-arch sm_13" adds a flag for the compilation of CUDA kernels.
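For instance, the recognized options and variables can be listed, and the NVCCFLAGS example mentioned above can be passed directly on the configure command line:
$ ./configure --help                    # lists the options and variables recognized by configure
$ ./configure NVCCFLAGS="-arch sm_13"   # pass an extra flag for the compilation of CUDA kernels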
Use at most count CPU cores. This information is then available as the macro STARPU_MAXCPUS.
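As an illustration (assuming this option is spelled --enable-maxcpus, which is how current StarPU releases name it), limiting the build to 12 CPU cores could look like:
$ ./configure --enable-maxcpus=12   # assumed option name; STARPU_MAXCPUS would then be 12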
Disable the use of CPUs of the machine. Only GPUs etc. will be used.
Use at most count CUDA devices. This information is then available as the macro STARPU_MAXCUDADEVS.
Disable the use of CUDA, even if a valid CUDA installation was detected.
Search for CUDA under prefix, which should notably contain the file include/cuda.h.
Search for CUDA headers under dir, which should notably contain the file cuda.h. This defaults to /include appended to the value given to --with-cuda-dir.
Search for CUDA libraries under dir, which should notably contain the CUDA shared libraries (e.g. libcuda.so). This defaults to /lib appended to the value given to --with-cuda-dir.
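A minimal sketch using the --with-cuda-dir option referenced above (the prefix is illustrative; any installation containing include/cuda.h and the CUDA libraries works):
$ ./configure --with-cuda-dir=/usr/local/cuda   # illustrative prefix; headers and libraries are then searched under its include/ and lib/ subdirectories by default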
Use at most count OpenCL devices. This information is then available as the macro STARPU_MAXOPENCLDEVS.
Search for an OpenCL implementation under prefix, which should notably contain include/CL/cl.h (or include/OpenCL/cl.h on Mac OS).
Search for OpenCL headers under dir, which should notably contain CL/cl.h (or OpenCL/cl.h on Mac OS). This defaults to /include appended to the value given to --with-opencl-dir.
Search for an OpenCL library under dir, which should notably contain the OpenCL shared libraries (e.g. libOpenCL.so). This defaults to /lib appended to the value given to --with-opencl-dir.
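Similarly, a sketch pointing configure at an OpenCL installation through the --with-opencl-dir option referenced above (the prefix is illustrative):
$ ./configure --with-opencl-dir=/usr/local/opencl   # illustrative prefix; CL/cl.h and libOpenCL.so are then searched under include/ and lib/ by default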
Enable considering the provided OpenCL implementation as a simulator, i.e. use the kernel duration returned by OpenCL profiling information as wallclock time instead of the actual measured real time. This requires simgrid support.
Allow for at most count codelet implementations for the same target device. This information is then available as the macro STARPU_MAXIMPLEMENTATIONS.
Allow for at most count scheduling contexts. This information is then available as the macro STARPU_NMAX_SCHED_CTXS.
Disable asynchronous copies between CPU and GPU devices. The AMD implementation of OpenCL is known to fail when copying data asynchronously. When using this implementation, it is therefore necessary to disable asynchronous data transfers.
Disable asynchronous copies between CPU and OpenCL devices. The AMD implementation of OpenCL is known to fail when copying data asynchronously. When using this implementation, it is therefore necessary to disable asynchronous data transfers.
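As a hedged example (assuming the options are spelled --disable-asynchronous-copy and --disable-asynchronous-opencl-copy, as in current StarPU releases), a build targeting the AMD OpenCL implementation could disable these transfers at configure time:
$ ./configure --disable-asynchronous-opencl-copy   # assumed option name; disables asynchronous CPU-OpenCL copies
$ ./configure --disable-asynchronous-copy          # assumed option name; disables asynchronous CPU-GPU copies altogether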
Disable the SOCL extension (SOCL OpenCL Extensions). By default, it is enabled when an OpenCL implementation is found.
Disable the StarPU-Top interface (StarPU-Top Interface). By default, it is enabled when the required dependencies are found.
Disable the GCC plug-in (C Extensions). By default, it is enabled when the GCC compiler provides a plug-in support.
Use the compiler mpicc at path for StarPU-MPI (MPI Support).
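A sketch, assuming the option is named --with-mpicc as in current StarPU releases (the path below is only an example):
$ ./configure --with-mpicc=/opt/openmpi/bin/mpicc   # assumed option name; illustrative path to the MPI compiler wrapper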
Enable the gathering of various data statistics (Data Statistics); see ../../src/datawizard/datastats.c.
Define the maximum number of buffers that tasks will be able to take as parameters, then available as the macro STARPU_NMAXBUFS.
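For illustration (assuming the corresponding option is named --enable-maxbuffers, as in current StarPU releases), the limit is raised at configure time and is then reflected by STARPU_NMAXBUFS:
$ ./configure --enable-maxbuffers=16   # assumed option name; tasks could then take up to 16 data buffers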
Enable the use of a data allocation cache to avoid the cost of data allocation with CUDA. Still experimental.
Enable the use of OpenGL for the rendering of some examples.