Time series data¶
A major use case for xarray is multi-dimensional time-series data. Accordingly, we’ve copied many of features that make working with time-series data in pandas such a joy to xarray. In most cases, we rely on pandas for the core functionality.
Creating datetime64 data¶
xarray uses the numpy dtypes datetime64[ns] and timedelta64[ns] to
represent datetime data, which offer vectorized (if sometimes buggy) operations
with numpy and smooth integration with pandas.
To convert to or create regular arrays of datetime64 data, we recommend
using pandas.to_datetime() and pandas.date_range():
In [1]: pd.to_datetime(['2000-01-01', '2000-02-02'])
Out[1]: DatetimeIndex(['2000-01-01', '2000-02-02'], dtype='datetime64[ns]', freq=None)
In [2]: pd.date_range('2000-01-01', periods=365)
Out[2]:
DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04',
'2000-01-05', '2000-01-06', '2000-01-07', '2000-01-08',
'2000-01-09', '2000-01-10',
...
'2000-12-21', '2000-12-22', '2000-12-23', '2000-12-24',
'2000-12-25', '2000-12-26', '2000-12-27', '2000-12-28',
'2000-12-29', '2000-12-30'],
dtype='datetime64[ns]', length=365, freq='D')
Alternatively, you can supply arrays of Python datetime objects. These get
converted automatically when used as arguments in xarray objects:
In [3]: import datetime
In [4]: xr.Dataset({'time': datetime.datetime(2000, 1, 1)})
Out[4]:
<xarray.Dataset>
Dimensions: ()
Data variables:
time datetime64[ns] 2000-01-01
When reading or writing netCDF files, xarray automatically decodes datetime and
timedelta arrays using CF conventions (that is, by using a units
attribute like 'days since 2000-01-01').
You can manual decode arrays in this form by passing a dataset to
decode_cf():
In [5]: attrs = {'units': 'hours since 2000-01-01'}
In [6]: ds = xr.Dataset({'time': ('time', [0, 1, 2, 3], attrs)})
In [7]: xr.decode_cf(ds)
Out[7]:
<xarray.Dataset>
Dimensions: (time: 4)
Coordinates:
* time (time) datetime64[ns] 2000-01-01 2000-01-01T01:00:00 ...
Data variables:
*empty*
One unfortunate limitation of using datetime64[ns] is that it limits the
native representation of dates to those that fall between the years 1678 and
2262. When a netCDF file contains dates outside of these bounds, dates will be
returned as arrays of netcdftime.datetime objects.
Datetime indexing¶
xarray borrows powerful indexing machinery from pandas (see Indexing and selecting data).
This allows for several useful and suscinct forms of indexing, particularly for datetime64 data. For example, we support indexing with strings for single items and with the slice object:
In [8]: time = pd.date_range('2000-01-01', freq='H', periods=365 * 24)
In [9]: ds = xr.Dataset({'foo': ('time', np.arange(365 * 24)), 'time': time})
In [10]: ds.sel(time='2000-01')
Out[10]:
<xarray.Dataset>
Dimensions: (time: 744)
Coordinates:
* time (time) datetime64[ns] 2000-01-01 2000-01-01T01:00:00 ...
Data variables:
foo (time) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ...
In [11]: ds.sel(time=slice('2000-06-01', '2000-06-10'))
Out[11]:
<xarray.Dataset>
Dimensions: (time: 240)
Coordinates:
* time (time) datetime64[ns] 2000-06-01 2000-06-01T01:00:00 ...
Data variables:
foo (time) int64 3648 3649 3650 3651 3652 3653 3654 3655 3656 3657 ...
You can also select a particular time by indexing with a
datetime.time object:
In [12]: ds.sel(time=datetime.time(12))
Out[12]:
<xarray.Dataset>
Dimensions: (time: 365)
Coordinates:
* time (time) datetime64[ns] 2000-01-01T12:00:00 2000-01-02T12:00:00 ...
Data variables:
foo (time) int64 12 36 60 84 108 132 156 180 204 228 252 276 300 ...
For more details, read the pandas documentation.
Datetime components¶
Similar to pandas, the components of datetime objects contained in a
given DataArray can be quickly computed using a special .dt accessor.
In [13]: time = time = pd.date_range('2000-01-01', freq='6H', periods=365 * 4)
In [14]: ds = xr.Dataset({'foo': ('time', np.arange(365 * 24)), 'time': time})
ValueErrorTraceback (most recent call last)
<ipython-input-14-6d0a5b014145> in <module>()
----> 1 ds = xr.Dataset({'foo': ('time', np.arange(365 * 24)), 'time': time})
/srv/build/xarray/python-xarray-0.9.6/xarray/core/dataset.py in __init__(self, data_vars, coords, attrs, compat)
351 coords = {}
352 if data_vars is not None or coords is not None:
--> 353 self._set_init_vars_and_dims(data_vars, coords, compat)
354 if attrs is not None:
355 self.attrs = attrs
/srv/build/xarray/python-xarray-0.9.6/xarray/core/dataset.py in _set_init_vars_and_dims(self, data_vars, coords, compat)
366
367 variables, coord_names, dims = merge_data_and_coords(
--> 368 data_vars, coords, compat=compat)
369
370 self._variables = variables
/srv/build/xarray/python-xarray-0.9.6/xarray/core/merge.py in merge_data_and_coords(data, coords, compat, join)
363 objs = [data, coords]
364 explicit_coords = coords.keys()
--> 365 return merge_core(objs, compat, join, explicit_coords=explicit_coords)
366
367
/srv/build/xarray/python-xarray-0.9.6/xarray/core/merge.py in merge_core(objs, compat, join, priority_arg, explicit_coords, indexes)
422 assert_unique_multiindex_level_names(variables)
423
--> 424 dims = calculate_dimensions(variables)
425
426 for dim, size in dims.items():
/srv/build/xarray/python-xarray-0.9.6/xarray/core/dataset.py in calculate_dimensions(variables)
108 raise ValueError('conflicting sizes for dimension %r: '
109 'length %s on %r and length %s on %r' %
--> 110 (dim, size, k, dims[dim], last_used[dim]))
111 return dims
112
ValueError: conflicting sizes for dimension 'time': length 1460 on 'time' and length 8760 on 'foo'
In [15]: ds.time.dt.hour
Out[15]:
<xarray.DataArray 'hour' (time: 8760)>
array([ 0, 1, 2, ..., 21, 22, 23])
Coordinates:
* time (time) datetime64[ns] 2000-01-01 2000-01-01T01:00:00 ...
In [16]: ds.time.dt.dayofweek
Out[16]:
<xarray.DataArray 'dayofweek' (time: 8760)>
array([5, 5, 5, ..., 5, 5, 5])
Coordinates:
* time (time) datetime64[ns] 2000-01-01 2000-01-01T01:00:00 ...
The .dt accessor works on both coordinate dimensions as well as
multi-dimensional data.
xarray also supports a notion of “virtual” or “derived” coordinates for datetime components implemented by pandas, including “year”, “month”, “day”, “hour”, “minute”, “second”, “dayofyear”, “week”, “dayofweek”, “weekday” and “quarter”:
In [17]: ds['time.month']
Out[17]:
<xarray.DataArray 'month' (time: 8760)>
array([ 1, 1, 1, ..., 12, 12, 12])
Coordinates:
* time (time) datetime64[ns] 2000-01-01 2000-01-01T01:00:00 ...
In [18]: ds['time.dayofyear']
Out[18]:
<xarray.DataArray 'dayofyear' (time: 8760)>
array([ 1, 1, 1, ..., 365, 365, 365])
Coordinates:
* time (time) datetime64[ns] 2000-01-01 2000-01-01T01:00:00 ...
For use as a derived coordinate, xarray adds 'season' to the list of
datetime components supported by pandas:
In [19]: ds['time.season']
Out[19]:
<xarray.DataArray 'season' (time: 8760)>
array(['DJF', 'DJF', 'DJF', ..., 'DJF', 'DJF', 'DJF'],
dtype='|S3')
Coordinates:
* time (time) datetime64[ns] 2000-01-01 2000-01-01T01:00:00 ...
In [20]: ds['time'].dt.season
Out[20]:
<xarray.DataArray 'season' (time: 8760)>
array(['DJF', 'DJF', 'DJF', ..., 'DJF', 'DJF', 'DJF'],
dtype='|S3')
Coordinates:
* time (time) datetime64[ns] 2000-01-01 2000-01-01T01:00:00 ...
The set of valid seasons consists of ‘DJF’, ‘MAM’, ‘JJA’ and ‘SON’, labeled by the first letters of the corresponding months.
You can use these shortcuts with both Datasets and DataArray coordinates.
Resampling and grouped operations¶
Datetime components couple particularly well with grouped operations (see GroupBy: split-apply-combine) for analyzing features that repeat over time. Here’s how to calculate the mean by time of day:
In [21]: ds.groupby('time.hour').mean()
Out[21]:
<xarray.Dataset>
Dimensions: (hour: 24)
Coordinates:
* hour (hour) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ...
Data variables:
foo (hour) float64 4.368e+03 4.369e+03 4.37e+03 4.371e+03 4.372e+03 ...
For upsampling or downsampling temporal resolutions, xarray offers a
resample() method building on the core functionality
offered by the pandas method of the same name. Resample uses essentially the
same api as resample in pandas.
For example, we can downsample our dataset from hourly to 6-hourly:
In [22]: ds.resample('6H', dim='time', how='mean')
Out[22]:
<xarray.Dataset>
Dimensions: (time: 1460)
Coordinates:
* time (time) datetime64[ns] 2000-01-01 2000-01-01T06:00:00 ...
Data variables:
foo (time) float64 2.5 8.5 14.5 20.5 26.5 32.5 38.5 44.5 50.5 56.5 ...
Resample also works for upsampling, in which case intervals without any
values are marked by NaN:
In [23]: ds.resample('30Min', 'time')
Out[23]:
<xarray.Dataset>
Dimensions: (time: 17519)
Coordinates:
* time (time) datetime64[ns] 2000-01-01 2000-01-01T00:30:00 ...
Data variables:
foo (time) float64 0.0 nan 1.0 nan 2.0 nan 3.0 nan 4.0 nan 5.0 nan ...
Of course, all of these resampling and groupby operation work on both Dataset and DataArray objects with any number of additional dimensions.
For more examples of using grouped operations on a time dimension, see Toy weather data.