rsdmx quickstart guide

The goal of this document is to get you up and running with rsdmx as quickly as possible.

rsdmx provides a set of classes and methods to read data and metadata documents exchanged through the Statistical Data and Metadata Exchange (SDMX) framework.

SDMX - a short introduction

The SDMX framework provides two sets of standard specifications to facilitate the exchange of statistical data: standard formats and web-service specifications.

SDMX makes it possible to disseminate both data (a dataset) and metadata (the description of the dataset).

For this, the SDMX standard provides various types of documents, also known as messages: some carry data, others carry metadata.

For more information about the SDMX standards, you can visit the SDMX website, or this introduction by EUROSTAT.

How to deal with SDMX in R

rsdmx offers a low-level set of tools to read data and metadata in the SDMX-ML format. Its strategy is to make this as easy as possible for the user: a single function, readSDMX, handles every case, whether the document contains data or metadata, and whether the data source is local or remote.

Let's see then how to use rsdmx!

Install rsdmx

rsdmx can be installed from CRAN or from its development repository hosted on GitHub. For the latter, you will need the devtools package; then run:

devtools::install_github("opensdmx/rsdmx")
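For the CRAN release, the standard installer should be enough. The guard around it is only a convenience, sketched here so that the package is not reinstalled when it is already present:

```r
# Install the CRAN release of rsdmx only if it is not already installed
if (!requireNamespace("rsdmx", quietly = TRUE)) {
  install.packages("rsdmx")
}
```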

Load rsdmx

To load rsdmx in R, do the following:

library(rsdmx)

Read dataset documents

This section shows how to read SDMX dataset documents, either from remote data sources or from local SDMX files.

Read remote datasets

using the raw approach (specifying the complete request URL)

The following code snippet shows how to read a dataset from a remote data source, taking as an example the OECD StatExtracts portal: http://stats.oecd.org/restsdmx/sdmx.ashx/GetData/MIG/TOT../OECD?startTime=2000&endTime=2011

myUrl <- "http://stats.oecd.org/restsdmx/sdmx.ashx/GetData/MIG/TOT../OECD?startTime=2000&endTime=2011"
dataset <- readSDMX(myUrl)
stats <- as.data.frame(dataset)
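The request URL above has a regular shape: dataset, dimension filter, agency, then the time range. As a sketch of that structure, the helper below rebuilds the same URL from its parts. `build_oecd_url` is a hypothetical function, not part of rsdmx, and reading the last path segment as the agency is an assumption based on the URL itself.

```r
# Hypothetical helper (not part of rsdmx): assemble an OECD GetData request URL.
# `filter` holds one entry per dimension; an empty string leaves that
# dimension unconstrained.
build_oecd_url <- function(dataset, filter, start, end, agency = "OECD") {
  base <- "http://stats.oecd.org/restsdmx/sdmx.ashx/GetData"
  key <- paste(filter, collapse = ".")  # c("TOT", "", "") becomes "TOT.."
  sprintf("%s/%s/%s/%s?startTime=%s&endTime=%s",
          base, dataset, key, agency, start, end)
}

myUrl <- build_oecd_url("MIG", c("TOT", "", ""), start = 2000, end = 2011)
print(myUrl)
```

The resulting string can be passed to readSDMX exactly like a hand-written URL.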

You can try it out with other datasources, such as from the EUROSTAT portal: http://ec.europa.eu/eurostat/SDMX/diss-web/rest/data/cdh_e_fos/..PC.FOS1.BE/?startperiod=2005&endPeriod=2011

The online rsdmx documentation also provides a list of data providers, from both international and national institutions: https://github.com/opensdmx/rsdmx/wiki#read-remote-datasets.

using the helper approach

The service providers mentioned above are known to rsdmx, which lets users call readSDMX with the helper parameters. The list of known service providers looks like this:

## [1] "ECB"   "ESTAT" "OECD"  "FAO"   "ILO"
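A listing like the one above can be produced along these lines. This is a sketch that assumes getSDMXServiceProviders() returns an S4 object whose `providers` slot holds one entry per provider, each with an `agencyId` slot. The set of providers returned depends on the installed rsdmx version.

```r
library(rsdmx)

# Retrieve the providers registered by default in rsdmx
providers <- getSDMXServiceProviders()

# Extract the agency identifier of each provider (slot names assumed, see above)
agency_ids <- sapply(slot(providers, "providers"),
                     function(x) slot(x, "agencyId"))
print(agency_ids)
```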

Note that it is also possible to add an SDMX service provider at runtime. To have a new SDMX service provider registered by default, please contact me!

Let's see what this looks like when querying an OECD data source:

## http://stats.oecd.org/restsdmx/sdmx.ashx/GetData/MIG/TOT../OECD?startPeriod=2010&endPeriod=2011
##   CO2 VAR GEN COU attrs.df obsTime obsValue OBS_STATUS
## 1 TOT B11 TOT AUS      P1Y    2010   206714       <NA>
## 2 TOT B11 TOT AUS      P1Y    2011   210704       <NA>
## 3 TOT B11 TOT AUT      P1Y    2010    96896       <NA>
## 4 TOT B11 TOT AUT      P1Y    2011   109921       <NA>
## 5 TOT B11 TOT BEL      P1Y    2010   113582          e
## 6 TOT B11 TOT BEL      P1Y    2011   117948       <NA>
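The call behind this output is not shown. A minimal sketch follows, assuming a recent rsdmx release in which the helper parameters are named `providerId`, `resource`, `flowRef` and `key`. Older releases used different argument names, and the OECD endpoint may have changed since this guide was written. `read_provider_data` is a hypothetical wrapper introduced only for this example.

```r
library(rsdmx)

# Hypothetical wrapper around the readSDMX helper parameters (assumed names).
# `key` is a list with one entry per dimension; NULL leaves a dimension
# unconstrained.
read_provider_data <- function(provider_id, flow, key, start, end) {
  sdmx <- readSDMX(providerId = provider_id, resource = "data",
                   flowRef = flow, key = key, start = start, end = end)
  as.data.frame(sdmx)
}
```

With network access, `head(read_provider_data("OECD", "MIG", list("TOT", NULL, NULL), 2010, 2011))` would be the helper-based counterpart of the raw URL request shown earlier.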

Read local datasets

This example shows you how to use rsdmx with local SDMX files, previously downloaded from EUROSTAT.

#bulk download from Eurostat
tdir <- tempdir() #temp folder
tf <- tempfile(tmpdir = tdir) #temp file
download.file("http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&file=data%2Frd_e_gerdsc.sdmx.zip", tf, mode = "wb") #binary mode keeps the zip intact on Windows
sdmx_files <- unzip(tf, exdir = tdir)

#read local SDMX (set isURL = FALSE)
sdmx <- readSDMX(sdmx_files[2], isURL = FALSE)
stats <- as.data.frame(sdmx)

By default, readSDMX assumes that the data source is remote. To read a local file, add isURL = FALSE.
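Picking the file by position (`sdmx_files[2]`) depends on the order in which the archive is unpacked. A slightly more defensive sketch follows, assuming the archive holds a structure file ending in `.dsd.xml` next to a data file ending in `.sdmx.xml`. The paths below are illustrative, not taken from an actual download.

```r
# Illustrative paths standing in for the result of unzip()
sdmx_files <- c("/tmp/rd_e_gerdsc.dsd.xml", "/tmp/rd_e_gerdsc.sdmx.xml")

# Keep only the data file, whatever its position in the vector
data_file <- grep("\\.sdmx\\.xml$", sdmx_files, value = TRUE)
stopifnot(length(data_file) == 1)
print(data_file)
```

`data_file` can then be passed to readSDMX together with isURL = FALSE.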

Read metadata documents

This section shows how to read SDMX metadata documents, including concepts, codelists and a complete data structure definition (DSD).

Concepts

Read concept schemes from the FAO data portal

csUrl <- "http://data.fao.org/sdmx/registry/conceptscheme/FAO/ALL/LATEST/?detail=full&references=none&version=2.1"
csobj <- readSDMX(csUrl)
csdf <- as.data.frame(csobj)
## Warning in as.data.frame.SDMXConcepts(csobj): Using first conceptScheme referenced in SDMXConcepts object: 
## 
##                Specify 'conceptSchemeId' argument for a specific conceptScheme
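The warning suggests passing `conceptSchemeId` explicitly. Here is a sketch of how that might look, assuming the parsed object keeps its schemes in a `conceptSchemes` slot whose elements carry an `id` slot. These slot names are an assumption; `slotNames(csobj)` shows what is actually available. `concepts_for_scheme` is a hypothetical helper.

```r
library(rsdmx)

# Hypothetical helper: convert one concept scheme, chosen by position,
# to a data.frame
concepts_for_scheme <- function(csobj, index = 1) {
  # Collect the identifiers of all concept schemes (assumed slot names)
  ids <- sapply(slot(csobj, "conceptSchemes"), function(x) slot(x, "id"))
  # Name the scheme explicitly instead of relying on the default
  as.data.frame(csobj, conceptSchemeId = ids[index])
}
```

Calling `concepts_for_scheme(csobj)` on the object read above should then select the first scheme by name rather than by default.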

Codelists

Read codelists from the FAO data portal

clUrl <- "http://data.fao.org/sdmx/registry/codelist/FAO/CL_FAO_MAJOR_AREA/0.1"
clobj <- readSDMX(clUrl)
cldf <- as.data.frame(clobj)

Data Structure Definition (DSD)

This example illustrates how to read a complete DSD using an OECD StatExtracts portal data source.

dsdUrl <- "http://stats.oecd.org/restsdmx/sdmx.ashx/GetDataStructure/TABLE1"
dsd <- readSDMX(dsdUrl)

rsdmx is implemented in an object-oriented way, with S4 classes and methods. The properties of S4 objects are called slots and can be accessed with the slot function. The following code snippet extracts the list of codelists contained in the DSD document and reads one codelist as a data.frame.

#get codelists from DSD
cls <- slot(dsd, "codelists")

#get list of codelists
codelists <- sapply(slot(cls, "codelists"), function(x) slot(x, "id"))

#get a codelist
codelist <- as.data.frame(slot(dsd, "codelists"), codelistId = "CL_TABLE1_FLOWS") 

In a similar way, the concepts of the dataset can be extracted from the DSD and read as a data.frame.

#get concepts from DSD
concepts <- as.data.frame(slot(dsd, "concepts"))
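For readers new to S4, the slot access used in the snippets above can be tried on a toy class that does not depend on rsdmx. The class `ToyCodelist` and its slots are made up for this example.

```r
library(methods)

# A made-up S4 class with two slots
setClass("ToyCodelist", representation(id = "character", codes = "character"))

cl <- new("ToyCodelist", id = "CL_DEMO", codes = c("A", "B"))

slotNames(cl)    # names of the slots defined by the class
slot(cl, "id")   # access by name, as done with the DSD object
cl@codes         # the @ operator is the equivalent shorthand

# The same sapply/slot pattern, applied to a list of toy objects
ids <- sapply(list(cl, new("ToyCodelist", id = "CL_OTHER", codes = "X")),
              function(x) slot(x, "id"))
print(ids)
```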