What’s New¶
v0.9.2 (2 April, 2017)¶
This minor release includes bug fixes and backwards compatible enhancements.
Enhancements¶
- .rolling() on Dataset is now supported (GH859). By Keisuke Fujii.
- When bottleneck version 1.1 or later is installed, use bottleneck for rolling var, argmin, argmax, and rank computations. Also, rolling median now accepts a min_periods argument (GH1276). By Joe Hamman.
- When .plot() is called on a 2D DataArray and only one dimension is specified with x= or y=, the other dimension is now guessed (GH1291). By Vincent Noel.
- Added new method assign_attrs() to DataArray and Dataset, a chained-method compatible implementation of the dict.update method on attrs (GH1281). By Henry S. Harrison.
- Added new autoclose=True argument to open_mfdataset() to explicitly close opened files when not in use, preventing an OS Error related to too many open files (GH1198). Note that the default is autoclose=False, which is consistent with previous xarray behavior. By Phillip J. Wolfram.
- The repr() of Dataset and DataArray attributes uses a similar format to coordinates and variables, with vertically aligned entries truncated to fit on a single line (GH1319). Hopefully this will stop people writing data.attrs = {} and discarding metadata in notebooks for the sake of cleaner output. The full metadata is still available as data.attrs. By Zac Hatfield-Dodds.
- Enhanced the test suite with @slow and @flaky decorators, which are controlled via the --run-flaky and --skip-slow command line arguments to py.test (GH1336). By Stephan Hoyer and Phillip J. Wolfram.
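As a minimal sketch of the chained-method style that assign_attrs() enables (the array contents and attribute names are made up for illustration, and the snippet assumes a reasonably recent xarray):

```python
import xarray as xr

# Hypothetical example data; the attribute names are illustrative only.
distance = xr.DataArray([1.0, 2.0, 3.0], dims="x", attrs={"units": "m"})

# assign_attrs() returns a new object, so it can sit at the end of a method chain.
converted = (distance / 1000).assign_attrs(units="km", long_name="distance")

print(converted.attrs)
```

Because the method returns a new object rather than mutating attrs in place, it composes with other chained calls in the same way as assign_coords().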
Bug fixes¶
- Rolling operations now preserve the original dimension order (GH1125). By Keisuke Fujii.
- Fixed sel with method='nearest' on Python 2.7 and 64-bit Windows (GH1140). By Stephan Hoyer.
- Fixed where with drop=True for empty masks (GH1341). By Stephan Hoyer and Phillip J. Wolfram.
v0.9.1 (30 January 2017)¶
Renamed the “Unindexed dimensions” section in the Dataset and
DataArray repr (added in v0.9.0) to “Dimensions without coordinates”
(GH1199).
v0.9.0 (25 January 2017)¶
This major release includes five months’ worth of enhancements and bug fixes from 24 contributors, including some significant changes that are not fully backwards compatible. Highlights include:
- Coordinates are now optional in the xarray data model, even for dimensions.
- Changes to caching, lazy loading and pickling to improve xarray’s experience for parallel computing.
- Improvements for accessing and manipulating pandas.MultiIndex levels.
- Many new methods and functions, including quantile(), cumsum(), cumprod(), combine_first(), set_index(), reset_index(), reorder_levels(), full_like(), zeros_like(), ones_like(), open_dataarray(), compute(), Dataset.info(), testing.assert_equal(), testing.assert_identical(), and testing.assert_allclose().
Breaking changes¶
Index coordinates for each dimension are now optional, and no longer created by default (GH1017). You can identify such dimensions without coordinates by their appearance in the list of “Dimensions without coordinates” in the Dataset or DataArray repr:

    In [1]: xr.Dataset({'foo': (('x', 'y'), [[1, 2]])})
    Out[1]:
    <xarray.Dataset>
    Dimensions:  (x: 1, y: 2)
    Dimensions without coordinates: x, y
    Data variables:
        foo      (x, y) int64 1 2

This has a number of implications:
- align() and reindex() can now raise an error if dimension labels are missing and dimensions have different sizes.
- Because pandas does not support missing indexes, methods such as to_dataframe/from_dataframe and stack/unstack no longer roundtrip faithfully on all inputs. Use reset_index() to remove undesired indexes.
- Dataset.__delitem__ and drop() no longer delete/drop variables that have dimensions matching a deleted/dropped variable.
- DataArray.coords.__delitem__ is now allowed on variables matching dimension names.
- .sel and .loc now handle indexing along a dimension without coordinate labels by doing integer based indexing. See Missing coordinate labels for an example.
- indexes is no longer guaranteed to include all dimension names as keys. The new method get_index() has been added to always return an index for a dimension, falling back to a default RangeIndex if necessary.
The default behavior of merge is now compat='no_conflicts', so some merges will now succeed in cases that previously raised xarray.MergeError. Set compat='broadcast_equals' to restore the previous default. See Merging with ‘no_conflicts’ for more details.
Reading values no longer always caches values in a NumPy array (GH1128). Caching of .values on variables read from netCDF files on disk is still the default when open_dataset() is called with cache=True. By Guido Imperiale and Stephan Hoyer.
Pickling a Dataset or DataArray linked to a file on disk no longer caches its values into memory before pickling (GH1128). Instead, pickle stores file paths and restores objects by reopening file references. This enables preliminary, experimental use of xarray for opening files with dask.distributed. By Stephan Hoyer.
Coordinates used to index a dimension are now loaded eagerly into pandas.Index objects, instead of loading the values lazily. By Guido Imperiale.
Automatic levels for 2d plots are now guaranteed to land on vmin and vmax when these kwargs are explicitly provided (GH1191). The automated level selection logic also changed slightly. By Fabien Maussion.
DataArray.rename() behavior changed to strictly change the DataArray.name if called with a string argument, or strictly change coordinate names if called with a dict-like argument. By Markus Gonser.
By default, to_netcdf() adds a _FillValue = NaN attribute to float types. By Frederic Laliberte.
repr on DataArray objects uses a shortened display for NumPy array data that is less likely to overflow onto multiple pages (GH1207). By Stephan Hoyer.
xarray no longer supports Python 3.3, versions of dask prior to v0.9.0, or versions of bottleneck prior to v1.0.
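A small sketch of what optional coordinates mean in practice, assuming a reasonably recent xarray (the dataset mirrors the repr example above; the variable names has_x_index and fallback are illustrative):

```python
import xarray as xr

# Build a dataset without supplying any coordinate labels.
ds = xr.Dataset({"foo": (("x", "y"), [[1, 2]])})

# Neither dimension has an index, so `indexes` has no entry for them.
has_x_index = "x" in ds.indexes

# get_index() falls back to a default integer range for such dimensions.
fallback = ds.get_index("y")

print(has_x_index, list(fallback))
```

Code that previously looped over ds.indexes expecting one entry per dimension may want to call get_index() instead.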
Deprecations¶
- Renamed the Coordinate class from xarray’s low level API to IndexVariable. Variable.to_variable and Variable.to_coord have been renamed to to_base_variable() and to_index_variable().
- Deprecated supplying coords as a dictionary to the DataArray constructor without also supplying an explicit dims argument. The old behavior encouraged relying on the iteration order of dictionaries, which is a bad practice (GH727).
- Removed a number of methods deprecated since v0.7.0 or earlier: load_data, vars, drop_vars, dump, dumps and the variables keyword argument to Dataset.
- Removed the dummy module that enabled import xray.
Enhancements¶
- Added new method combine_first() to DataArray and Dataset, based on the pandas method of the same name (see Combine). By Chun-Wei Yuan.
- Added the ability to change default automatic alignment (arithmetic_join="inner") for binary operations via set_options() (see Automatic alignment). By Chun-Wei Yuan.
- Add checking of attr names and values when saving to netCDF, raising useful error messages if they are invalid (GH911). By Robin Wilson.
- Added ability to save DataArray objects directly to netCDF files using to_netcdf(), and to load directly from netCDF files using open_dataarray() (GH915). These remove the need to convert a DataArray to a Dataset before saving as a netCDF file, and deal with names to ensure a perfect ‘roundtrip’ capability. By Robin Wilson.
- Multi-index levels are now accessible as “virtual” coordinate variables, e.g., ds['time'] can pull out the 'time' level of a multi-index (see Coordinates). sel also accepts providing multi-index levels as keyword arguments, e.g., ds.sel(time='2000-01') (see Multi-level indexing). By Benoit Bovy.
- Added set_index, reset_index and reorder_levels methods to easily create and manipulate (multi-)indexes (see Set and reset index). By Benoit Bovy.
- Added the compat option 'no_conflicts' to merge, allowing the combination of xarray objects with disjoint (GH742) or overlapping (GH835) coordinates as long as all present data agrees. By Johnnie Gray. See Merging with ‘no_conflicts’ for more details.
- It is now possible to set concat_dim=None explicitly in open_mfdataset() to disable inferring a dimension along which to concatenate. By Stephan Hoyer.
- Added methods DataArray.compute(), Dataset.compute(), and Variable.compute() as a non-mutating alternative to load(). By Guido Imperiale.
- Added DataArray and Dataset methods cumsum() and cumprod(). By Phillip J. Wolfram.
- New properties Dataset.sizes and DataArray.sizes for providing consistent access to dimension length on both Dataset and DataArray (GH921). By Stephan Hoyer.
- New keyword argument drop=True for sel(), isel() and squeeze() for dropping scalar coordinates that arise from indexing a DataArray (GH242). By Stephan Hoyer.
- New top-level functions full_like(), zeros_like(), and ones_like(). By Guido Imperiale.
- Overriding a preexisting attribute with register_dataset_accessor() or register_dataarray_accessor() now issues a warning instead of raising an error (GH1082). By Stephan Hoyer.
- Options for axes sharing between subplots are exposed to FacetGrid and plot(), so axes sharing can be disabled for polar plots. By Bas Hoonhout.
- New utility functions assert_equal(), assert_identical(), and assert_allclose() for asserting relationships between xarray objects, designed for use in a pytest test suite.
- figsize, size and aspect plot arguments are now supported for all plots (GH897). See Controlling the figure size for more details. By Stephan Hoyer and Fabien Maussion.
- New info() method to summarize Dataset variables and attributes. The method prints to a buffer (e.g. stdout) with output similar to what the command line utility ncdump -h produces (GH1150). By Joe Hamman.
- Added the ability to write unlimited netCDF dimensions with the scipy and netcdf4 backends via the new encoding attribute or via the unlimited_dims argument to to_netcdf(). By Joe Hamman.
- New quantile() method to calculate quantiles from DataArray objects (GH1187). By Joe Hamman.
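To illustrate two of the entries above, here is a hedged sketch of merging with compat='no_conflicts' and of combine_first(). The data are invented for the example, and the snippet assumes a reasonably recent xarray in which both calls are still available:

```python
import numpy as np
import xarray as xr

# Two hypothetical datasets on the same coordinates; each holds a value the other lacks.
left = xr.Dataset({"v": ("x", [1.0, np.nan])}, coords={"x": [0, 1]})
right = xr.Dataset({"v": ("x", [np.nan, 2.0])}, coords={"x": [0, 1]})

# 'no_conflicts' accepts the merge because the two sides never disagree
# at a position where both have data.
merged = xr.merge([left, right], compat="no_conflicts")

# combine_first() keeps values from the calling object and fills its gaps
# (including labels it does not have) from the argument.
a = xr.DataArray([1.0, np.nan], dims="x", coords={"x": [0, 1]})
b = xr.DataArray([10.0, 20.0, 30.0], dims="x", coords={"x": [0, 1, 2]})
filled = a.combine_first(b)

print(merged["v"].values, filled.values)
```

If the two datasets held different non-missing values at the same label, the merge would be expected to raise xarray.MergeError instead.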
Bug fixes¶
- groupby_bins now restores empty bins by default (GH1019). By Ryan Abernathey.
- Fix issues for dates outside the valid range of pandas timestamps (GH975). By Mathias Hauser.
- Unstacking produced flipped array after stacking decreasing coordinate values (GH980). By Stephan Hoyer.
- Setting dtype via the encoding parameter of to_netcdf failed if the encoded dtype was the same as the dtype of the original array (GH873). By Stephan Hoyer.
- Fix issues with variables where both attributes _FillValue and missing_value are set to NaN (GH997). By Marco Zühlke.
- .where() and .fillna() now preserve attributes (GH1009). By Fabien Maussion.
- Applying broadcast() to an xarray object based on the dask backend won’t accidentally convert the array from dask to numpy anymore (GH978). By Guido Imperiale.
- Dataset.concat() now preserves variables order (GH1027). By Fabien Maussion.
- Fixed an issue with pcolormesh (GH781). A new infer_intervals keyword gives control on whether the cell intervals should be computed or not. By Fabien Maussion.
- Grouping over a dimension with non-unique values with groupby gives correct groups. By Stephan Hoyer.
- Fixed accessing coordinate variables with non-string names from .coords. By Stephan Hoyer.
- rename() now simultaneously renames the array and any coordinate with the same name, when supplied via a dict (GH1116). By Yves Delley.
- Fixed sub-optimal performance in certain operations with object arrays (GH1121). By Yves Delley.
- Fix .groupby(group) when group has datetime dtype (GH1132). By Jonas Sølvsteen.
- Fixed a bug with facetgrid (the norm keyword was ignored, GH1159). By Fabien Maussion.
- Resolved a concurrency bug that could cause Python to crash when simultaneously reading and writing netCDF4 files with dask (GH1172). By Stephan Hoyer.
- Fix to make .copy() actually copy dask arrays, which will be relevant for future releases of dask in which dask arrays will be mutable (GH1180). By Stephan Hoyer.
- Fix opening NetCDF files with multi-dimensional time variables (GH1229). By Stephan Hoyer.
Performance improvements¶
- isel_points() and sel_points() now use vectorised indexing in numpy and dask (GH1161), which can result in several orders of magnitude speedup. By Jonathan Chambers.
v0.8.2 (18 August 2016)¶
This release includes a number of bug fixes and minor enhancements.
Breaking changes¶
- broadcast() and concat() now auto-align inputs, using join=outer. Previously, these functions raised ValueError for non-aligned inputs. By Guido Imperiale.
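The outer-join alignment described above can be sketched as follows. The arrays are invented for illustration, and the snippet assumes a reasonably recent xarray:

```python
import xarray as xr

# Two arrays whose labels along "x" only partly overlap.
a = xr.DataArray([1, 2], dims="x", coords={"x": [0, 1]})
b = xr.DataArray([3, 4], dims="x", coords={"x": [1, 2]})

# broadcast() aligns on the union of labels and fills the gaps with NaN.
a2, b2 = xr.broadcast(a, b)

print(a2["x"].values, a2.values)
```

Both results share the union of the input labels, so positions that were absent from one input show up as NaN in its broadcast counterpart.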
Enhancements¶
- New documentation on Transitioning from pandas.Panel to xarray. By Maximilian Roos.
- New Dataset and DataArray methods to_dict() and from_dict() to allow easy conversion between dictionaries and xarray objects (GH432). See dictionary IO for more details. By Julia Signell.
- Added exclude and indexes optional parameters to align(), and exclude optional parameter to broadcast(). By Guido Imperiale.
- Better error message when assigning variables without dimensions (GH971). By Stephan Hoyer.
- Better error message when reindex/align fails due to duplicate index values (GH956). By Stephan Hoyer.
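A brief sketch of the dictionary round trip provided by to_dict() and from_dict(), using made-up data and assuming a reasonably recent xarray:

```python
import xarray as xr

# Hypothetical array with a coordinate, a name and one attribute.
da = xr.DataArray(
    [1, 2, 3],
    dims="x",
    coords={"x": [10, 20, 30]},
    name="foo",
    attrs={"units": "m"},
)

# Convert to plain Python containers, then rebuild the array from them.
as_dict = da.to_dict()
restored = xr.DataArray.from_dict(as_dict)

print(sorted(as_dict))
```

The intermediate dictionary holds only plain Python containers, which is convenient for serialising small objects to JSON.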
Bug fixes¶
- Ensure xarray works with h5netcdf v0.3.0 for arrays with dtype=str (GH953). By Stephan Hoyer.
- Dataset.__dir__() (i.e. the method python calls to get autocomplete options) failed if one of the dataset’s keys was not a string (GH852). By Maximilian Roos.
- Dataset constructor can now take arbitrary objects as values (GH647). By Maximilian Roos.
- Clarified copy argument for reindex() and align(), which now consistently always return new xarray objects (GH927).
- Fix open_mfdataset with engine='pynio' (GH936). By Stephan Hoyer.
- groupby_bins sorted bin labels as strings (GH952). By Stephan Hoyer.
- Fix bug introduced by v0.8.0 that broke assignment to datasets when both the left and right side have the same non-unique index values (GH956).
v0.8.1 (5 August 2016)¶
Bug fixes¶
- Fix bug in v0.8.0 that broke assignment to Datasets with non-unique indexes (GH943). By Stephan Hoyer.
v0.8.0 (2 August 2016)¶
This release includes four months of new features and bug fixes, including several breaking changes.
Breaking changes¶
- Dropped support for Python 2.6 (GH855).
- Indexing on multi-index now drops levels, which is consistent with pandas. It also changes the name of the dimension / coordinate when the multi-index is reduced to a single index (GH802).
- Contour plots no longer add a colorbar by default (GH866). Filled contour plots are unchanged.
- DataArray.values and .data now always return a NumPy array-like object, even for 0-dimensional arrays with object dtype (GH867). Previously, .values returned native Python objects in such cases. To convert the values of scalar arrays to Python objects, use the .item() method.
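The difference between .values and .item() on a scalar array can be sketched like this (a minimal illustration with an integer scalar, assuming a reasonably recent xarray):

```python
import numpy as np
import xarray as xr

scalar = xr.DataArray(5)

# .values stays in NumPy land: a 0-dimensional array.
values = scalar.values

# .item() unwraps the scalar into a native Python object.
native = scalar.item()

print(type(values).__name__, type(native).__name__)
```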
Enhancements¶
- Groupby operations now support grouping over multidimensional variables. A new method called groupby_bins() has also been added to allow users to specify bins for grouping. The new features are described in Multidimensional Grouping and Working with Multidimensional Coordinates. By Ryan Abernathey.
- DataArray and Dataset method where() now supports a drop=True option that clips coordinate elements that are fully masked. By Phillip J. Wolfram.
- New top level merge() function allows for combining variables from any number of Dataset and/or DataArray variables. See Merge for more details. By Stephan Hoyer.
- DataArray and Dataset method resample() now supports the keep_attrs=False option that determines whether variable and dataset attributes are retained in the resampled object. By Jeremy McGibbon.
- Better multi-index support in DataArray and Dataset sel() and loc() methods, which now behave more closely to pandas and which also accept dictionaries for indexing based on given level names and labels (see Multi-level indexing). By Benoit Bovy.
- New (experimental) decorators register_dataset_accessor() and register_dataarray_accessor() for registering custom xarray extensions without subclassing. They are described in the new documentation page on xarray Internals. By Stephan Hoyer.
- Round trip boolean datatypes. Previously, writing boolean datatypes to netCDF formats would raise an error since netCDF does not have a bool datatype. This feature reads/writes a dtype attribute to boolean variables in netCDF files. By Joe Hamman.
- 2D plotting methods now have two new keywords (cbar_ax and cbar_kwargs), allowing more control on the colorbar (GH872). By Fabien Maussion.
- New Dataset method filter_by_attrs(), akin to netCDF4.Dataset.get_variables_by_attributes, to easily filter data variables using their attributes. By Filipe Fernandes.
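Two of the features above lend themselves to a short sketch: registering an accessor and using where() with drop=True. The accessor name "demo", the class DemoAccessor and its variable_count() method are hypothetical, chosen only for this example, and the snippet assumes a reasonably recent xarray:

```python
import numpy as np
import xarray as xr


# "demo" is a hypothetical accessor name used only in this sketch.
@xr.register_dataset_accessor("demo")
class DemoAccessor:
    def __init__(self, xarray_obj):
        self._obj = xarray_obj

    def variable_count(self):
        """Return the number of data variables in the wrapped dataset."""
        return len(self._obj.data_vars)


ds = xr.Dataset({"a": ("x", [1, 2]), "b": ("x", [3, 4])})
count = ds.demo.variable_count()

# where(..., drop=True) trims labels along which every element is masked.
da = xr.DataArray(
    np.arange(6).reshape(2, 3),
    dims=("x", "y"),
    coords={"x": [0, 1], "y": [10, 20, 30]},
)
trimmed = da.where(da > 3, drop=True)

print(count, trimmed.shape)
```

Here only x=1 and y=20, 30 survive, because every other row or column is fully masked by the condition.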
Bug fixes¶
- Attributes were being retained by default for some resampling operations when they should not. With the keep_attrs=False option, they will no longer be retained by default. This may be backwards-incompatible with some scripts, but the attributes may be kept by adding the keep_attrs=True option. By Jeremy McGibbon.
- Concatenating xarray objects along an axis with a MultiIndex or PeriodIndex preserves the nature of the index (GH875). By Stephan Hoyer.
- Fixed bug in arithmetic operations on DataArray objects whose dimensions are numpy structured arrays or recarrays (GH861, GH837). By Maciek Swat.
- decode_cf_timedelta now accepts arrays with ndim > 1 (GH842). This fixes issue GH665. By Filipe Fernandes.
- Fix a bug where xarray.ufuncs that take two arguments would incorrectly use numpy functions instead of dask.array functions (GH876). By Stephan Hoyer.
- Support for pickling functions from xarray.ufuncs (GH901). By Stephan Hoyer.
- Variable.copy(deep=True) no longer converts MultiIndex into a base Index (GH769). By Benoit Bovy.
- Fixes for groupby on dimensions with a multi-index (GH867). By Stephan Hoyer.
- Fix printing datasets with unicode attributes on Python 2 (GH892). By Stephan Hoyer.
- Fixed incorrect test for dask version (GH891). By Stephan Hoyer.
- Fixed dim argument for isel_points/sel_points when a pandas.Index is passed. By Stephan Hoyer.
- contour() now plots the correct number of contours (GH866). By Fabien Maussion.
v0.7.2 (13 March 2016)¶
This release includes two new, entirely backwards compatible features and several bug fixes.
Enhancements¶
New DataArray method DataArray.dot() for calculating the dot product of two DataArrays along shared dimensions. By Dean Pospisil.
Rolling window operations on DataArray objects are now supported via a new DataArray.rolling() method. For example:

    In [2]: import xarray as xr; import numpy as np

    In [3]: arr = xr.DataArray(np.arange(0, 7.5, 0.5).reshape(3, 5), dims=('x', 'y'))

    In [4]: arr
    Out[4]:
    <xarray.DataArray (x: 3, y: 5)>
    array([[ 0. ,  0.5,  1. ,  1.5,  2. ],
           [ 2.5,  3. ,  3.5,  4. ,  4.5],
           [ 5. ,  5.5,  6. ,  6.5,  7. ]])
    Coordinates:
      * x        (x) int64 0 1 2
      * y        (y) int64 0 1 2 3 4

    In [5]: arr.rolling(y=3, min_periods=2).mean()
    Out[5]:
    <xarray.DataArray (x: 3, y: 5)>
    array([[  nan,  0.25,  0.5 ,  1.  ,  1.5 ],
           [  nan,  2.75,  3.  ,  3.5 ,  4.  ],
           [  nan,  5.25,  5.5 ,  6.  ,  6.5 ]])
    Coordinates:
      * x        (x) int64 0 1 2
      * y        (y) int64 0 1 2 3 4

See Rolling window operations for more details. By Joe Hamman.
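For the dot product entry, a minimal sketch with invented data (assuming a reasonably recent xarray) might look like this:

```python
import numpy as np
import xarray as xr

# A 2 x 3 matrix and a length-3 vector that share the "y" dimension.
matrix = xr.DataArray(np.arange(6).reshape(2, 3), dims=("x", "y"))
vector = xr.DataArray([1, 2, 3], dims="y")

# dot() sums over the shared dimension, leaving only "x".
product = matrix.dot(vector)

print(product.dims, product.values)
```

Dimensions are matched by name rather than by position, so the operands do not need to be transposed into a particular axis order first.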
Bug fixes¶
- Fixed an issue where plots using pcolormesh and Cartopy axes were being distorted by the inference of the axis interval breaks. This change chooses not to modify the coordinate variables when the axes have the attribute projection, allowing Cartopy to handle the extent of pcolormesh plots (GH781). By Joe Hamman.
- 2D plots now better handle additional coordinates which are not DataArray dimensions (GH788). By Fabien Maussion.
v0.7.1 (16 February 2016)¶
This is a bug fix release that includes two small, backwards compatible enhancements. We recommend that all users upgrade.
Enhancements¶
Bug fixes¶
- Restore checks for shape consistency between data and coordinates in the DataArray constructor (GH758).
- Single dimension variables no longer transpose as part of a broader .transpose. This behavior was causing pandas.PeriodIndex dimensions to lose their type (GH749).
- Dataset labels remain as their native type on .to_dataset. Previously they were coerced to strings (GH745).
- Fixed a bug where replacing a DataArray index coordinate would improperly align the coordinate (GH725).
- DataArray.reindex_like now maintains the dtype of complex numbers when reindexing leads to NaN values (GH738).
- Dataset.rename and DataArray.rename support the old and new names being the same (GH724).
- Fix from_dataframe() for DataFrames with Categorical column and a MultiIndex index (GH737).
- Fixes to ensure xarray works properly after the upcoming pandas v0.18 and NumPy v1.11 releases.
Acknowledgments¶
The following individuals contributed to this release:
- Edward Richards
- Maximilian Roos
- Rafael Guedes
- Spencer Hill
- Stephan Hoyer
v0.7.0 (21 January 2016)¶
This major release includes a redesign of DataArray
internals, as well as new methods for reshaping, rolling and shifting
data. It includes preliminary support for pandas.MultiIndex,
as well as a number of other features and bug fixes, several of which
offer improved compatibility with pandas.
New name¶
The project formerly known as “xray” is now “xarray”, pronounced “x-array”! This avoids a namespace conflict with the entire field of x-ray science. Renaming our project seemed like the right thing to do, especially because some scientists who work with actual x-rays are interested in using this project in their work. Thanks for your understanding and patience in this transition. You can now find our documentation and code repository at new URLs:
To ease the transition, we have simultaneously released v0.7.0 of both
xray and xarray on the Python Package Index. These packages are
identical. For now, import xray still works, except it issues a
deprecation warning. This will be the last xray release. Going forward, we
recommend switching your import statements to import xarray as xr.
Breaking changes¶
The internal data model used by DataArray has been rewritten to fix several outstanding issues (GH367, GH634, this stackoverflow report). Internally, DataArray is now implemented in terms of ._variable and ._coords attributes instead of holding variables in a Dataset object.
This refactor ensures that if a DataArray has the same name as one of its coordinates, the array and the coordinate no longer share the same data.
In practice, this means that creating a DataArray with the same name as one of its dimensions no longer automatically uses that array to label the corresponding coordinate. You will now need to provide coordinate labels explicitly. Here’s the old behavior:

    In [6]: xray.DataArray([4, 5, 6], dims='x', name='x')
    Out[6]:
    <xray.DataArray 'x' (x: 3)>
    array([4, 5, 6])
    Coordinates:
      * x        (x) int64 4 5 6

and the new behavior (compare the values of the x coordinate):

    In [7]: xray.DataArray([4, 5, 6], dims='x', name='x')
    Out[7]:
    <xray.DataArray 'x' (x: 3)>
    array([4, 5, 6])
    Coordinates:
      * x        (x) int64 0 1 2

It is no longer possible to convert a DataArray to a Dataset with xray.DataArray.to_dataset() if it is unnamed. This will now raise ValueError. If the array is unnamed, you need to supply the name argument.
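The two changes above can be sketched as follows. The snippet uses the current xarray import name rather than xray, the variable name "foo" is made up, and behavior is assumed to match a reasonably recent release:

```python
import xarray as xr

# Coordinate labels are passed explicitly instead of being inferred from the name.
unnamed = xr.DataArray([4, 5, 6], dims="x", coords={"x": [4, 5, 6]})

# Converting an unnamed array without a name is expected to fail.
try:
    unnamed.to_dataset()
    raised = False
except ValueError:
    raised = True

# Supplying the name argument makes the conversion well defined.
ds = unnamed.to_dataset(name="foo")

print(raised, list(ds.data_vars))
```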
Enhancements¶
Basic support for MultiIndex coordinates on xray objects, including indexing, stack() and unstack():

    In [8]: df = pd.DataFrame({'foo': range(3),
       ...:                    'x': ['a', 'b', 'b'],
       ...:                    'y': [0, 0, 1]})
       ...:

    In [9]: s = df.set_index(['x', 'y'])['foo']

    In [10]: arr = xray.DataArray(s, dims='z')

    In [11]: arr
    Out[11]:
    <xray.DataArray 'foo' (z: 3)>
    array([0, 1, 2])
    Coordinates:
      * z        (z) object ('a', 0) ('b', 0) ('b', 1)

    In [12]: arr.indexes['z']
    Out[12]:
    MultiIndex(levels=[[u'a', u'b'], [0, 1]],
               labels=[[0, 1, 1], [0, 0, 1]],
               names=[u'x', u'y'])

    In [13]: arr.unstack('z')
    Out[13]:
    <xray.DataArray 'foo' (x: 2, y: 2)>
    array([[  0.,  nan],
           [  1.,   2.]])
    Coordinates:
      * x        (x) object 'a' 'b'
      * y        (y) int64 0 1

    In [14]: arr.unstack('z').stack(z=('x', 'y'))
    Out[14]:
    <xray.DataArray 'foo' (z: 4)>
    array([  0.,  nan,   1.,   2.])
    Coordinates:
      * z        (z) object ('a', 0) ('a', 1) ('b', 0) ('b', 1)

See Stack and unstack for more details.
Warning
xray’s MultiIndex support is still experimental, and we have a long to-do list of desired additions (GH719), including better display of multi-index levels when printing a Dataset, and support for saving datasets with a MultiIndex to a netCDF file. User contributions in this area would be greatly appreciated.
Support for reading GRIB, HDF4 and other file formats via PyNIO. See Formats supported by PyNIO for more details.
Better error message when a variable is supplied with the same name as one of its dimensions.
Plotting: more control on colormap parameters (GH642). vmin and vmax will not be silently ignored anymore. Setting center=False prevents automatic selection of a divergent colormap.
New shift() and roll() methods for shifting/rotating datasets or arrays along a dimension:

    In [15]: array = xray.DataArray([5, 6, 7, 8], dims='x')

    In [16]: array.shift(x=2)
    Out[16]:
    <xarray.DataArray (x: 4)>
    array([ nan,  nan,   5.,   6.])
    Dimensions without coordinates: x

    In [17]: array.roll(x=2)