Diagnostics (distributed)¶
The Dask distributed scheduler provides live feedback in two forms:
- An interactive dashboard containing many plots and tables with live information
- A progress bar suitable for interactive use in consoles or notebooks
Dashboard¶
If Bokeh is installed then the dashboard will start up automatically whenever the scheduler is created. For local use this happens when you create a client with no arguments:
from dask.distributed import Client
client = Client() # start distributed scheduler locally. Launch dashboard
It is typically served at http://localhost:8787/status, but may be served elsewhere if this port is taken. The address of the dashboard is displayed if you are in a Jupyter Notebook, or it can be queried from client.scheduler_info()['services'].
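As a minimal sketch of the lookup described above, the helper below turns the mapping returned by client.scheduler_info()['services'] into a dashboard URL. The names dashboard_url and main, the default host, and the assumption that the mapping stores the dashboard port under the key 'dashboard' (or 'bokeh' in older releases) are illustrative and not taken from this page, so check the mapping printed by your own installation.

```python
def dashboard_url(services, host="localhost"):
    """Build a dashboard URL from a {service name: port} mapping.

    Assumption: the dashboard port is stored under 'dashboard'
    (newer releases) or 'bokeh' (older releases).
    """
    for name in ("dashboard", "bokeh"):
        if name in services:
            return f"http://{host}:{services[name]}/status"
    return None  # no dashboard service was reported


def main():
    # Imported lazily so the helper above can be used without a cluster.
    from dask.distributed import Client

    client = Client()  # start a local scheduler, as in the example above
    try:
        services = client.scheduler_info()["services"]
        print("services:", services)
        print("dashboard:", dashboard_url(services))
    finally:
        client.close()
```

Calling main() from a script (ideally under an `if __name__ == "__main__":` guard) should print the services mapping and the derived address.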
There are numerous pages with information about task runtimes, communication, statistical profiling, load balancing, memory use, and much more. For more information we recommend the video guide above.
Client([address, loop, timeout, …]): Connect to and drive computation on a distributed Dask cluster
Capture diagnostics¶
get_task_stream([client, plot, filename]): Collect task stream within a context block
Client.profile([key, start, stop, workers, …]): Collect statistical profiling information about recent work
You can capture some of the same information that the dashboard presents for offline processing using the get_task_stream and Client.profile functions. These capture the start and stop time of every task and transfer, as well as the results of a statistical profiler.
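from dask.distributed import get_task_stream

# x is a Dask collection and client is an existing Client
with get_task_stream(plot='save', filename="task-stream.html") as ts:
    x.compute()

client.profile(filename="dask-profile.html")  # save the statistical profile
history = ts.data  # records for the tasks run inside the block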
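For offline processing, the records in ts.data can be summarised with ordinary Python. The sketch below is an illustration rather than part of the Dask API: it assumes each record is a dict with a "startstops" entry holding either dicts with "action", "start" and "stop" keys or (action, start, stop) tuples, which may differ between versions. The function name seconds_per_action and the sample records are hypothetical.

```python
from collections import defaultdict


def seconds_per_action(records):
    """Sum elapsed seconds per action (e.g. compute, transfer)."""
    totals = defaultdict(float)
    for record in records:
        for entry in record.get("startstops", ()):
            if isinstance(entry, dict):
                # Assumed newer layout: one dict per interval
                action, start, stop = entry["action"], entry["start"], entry["stop"]
            else:
                # Assumed older layout: (action, start, stop) tuples
                action, start, stop = entry
            totals[action] += stop - start
    return dict(totals)


# Hand-written sample records standing in for ts.data
sample = [
    {"key": "inc-1",
     "startstops": [{"action": "compute", "start": 10.0, "stop": 10.5}]},
    {"key": "add-2",
     "startstops": [("transfer", 11.0, 11.25), ("compute", 11.25, 12.0)]},
]
totals = seconds_per_action(sample)
print(totals)
```

With a real task stream you would pass history (that is, ts.data) instead of sample.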
Progress bar¶
progress(*futures, **kwargs): Track progress of futures
The dask.distributed progress bar differs from the ProgressBar used for local diagnostics. The progress function takes a Dask object that is executing in the background:
# Single machine progress bar
from dask.diagnostics import ProgressBar

with ProgressBar():
    x.compute()

# Distributed scheduler progress bar
from dask.distributed import Client, progress

client = Client()  # use dask.distributed by default
x = x.persist()    # start computation in the background
progress(x)        # watch progress
x.compute()        # convert to final result when done if desired
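The progress function can also be given futures directly, as its signature progress(*futures, **kwargs) suggests. The following is a small sketch under stated assumptions: slow_square and main are hypothetical names, and Client(processes=False) is used only to keep the example inside a single process.

```python
import time


def slow_square(n):
    """A deliberately slow task so the progress bar has something to show."""
    time.sleep(0.1)
    return n * n


def main():
    # Imported lazily so slow_square can be used without a cluster.
    from dask.distributed import Client, progress

    client = Client(processes=False)  # assumption: thread-based local cluster
    try:
        futures = client.map(slow_square, range(20))  # runs in the background
        progress(futures)  # in a console this blocks until the work completes
        return client.gather(futures)
    finally:
        client.close()
```

Calling main() from a console session should show the bar advancing and then return the list of squares.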
External Documentation¶
More in-depth technical documentation about Dask’s distributed scheduler is available at https://distributed.dask.org/en/latest
API¶
dask.distributed.progress(*futures, **kwargs)¶
Track progress of futures
This operates differently in the notebook and the console:
- Notebook: This returns immediately, leaving an IPython widget on screen
- Console: This blocks until the computation completes
Parameters:
futures: Futures
A list of futures or keys to track
notebook: bool (optional)
Running in the notebook or not (defaults to guess)
multi: bool (optional)
Track different functions independently (defaults to True)
complete: bool (optional)
Track all keys (True) or only keys that have not yet run (False) (defaults to True)
Notes
In the notebook, the output of progress must be the last statement in the cell. Typically, this means calling progress at the end of a cell.
Examples
>>> progress(futures)  # doctest: +SKIP
[########################################] | 100% Completed | 1.7s
dask.distributed.get_task_stream(client=None, plot=False, filename='task-stream.html')¶
Collect task stream within a context block
This provides diagnostic information about every task that was run during the time when this block was active.
This must be used as a context manager.
Parameters:
plot: boolean, str
If true then also return a Bokeh figure. If plot == 'save' then save the figure to a file.
filename: str (optional)
The filename to save to if you set plot='save'
See also
Client.get_task_stream: Function version of this context manager
Examples
>>> with get_task_stream() as ts:
...     x.compute()
>>> ts.data
[...]
Get back a Bokeh figure and optionally save to a file
>>> with get_task_stream(plot='save', filename='task-stream.html') as ts:
...     x.compute()
>>> ts.figure
<Bokeh Figure>
To share this file with others you may wish to upload and serve it online. A common way to do this is to upload the file as a gist, and then serve it on https://rawgit.com
$ pip install gist
$ gist task-stream.html
https://gist.github.com/8a5b3c74b10b413f612bb5e250856ceb
You can then navigate to that site, click the “Raw” button to the right of the task-stream.html file, and then provide that URL to https://rawgit.com. This process should provide a shareable link that others can use to see your task stream plot.