[ VIGRA Homepage | Function Index | Class Index | Namespaces | File List | Main Page ]

Feature Accumulators

The namespace vigra::acc provides the function vigra::acc::extractFeatures() along with associated statistics functors and accumulator classes. Together, they provide a framework for efficient compution of a wide variety of statistical features, both globally for an entire image, and locally for each region defined by a label array. Many different statistics can be composed out of a small number of fundamental statistics and suitable modifiers. The user simply selects the desired statistics by means of their tags (see below), and a template meta-program automatically generates an efficient functor that computes exactly those statistics.

The function extractFeatures() scans the data in as few passes as the selected statstics permit (usually one or two passes are sufficient). Statistics are computed by accurate incremental algorithms, whose internal state is maintained by accumulator objects. The state is updated by passing data to the accumulator one sample at a time. Accumulators are grouped within an accumulator chain. Dependencies between accumulators in the accumulator chain are automatically resolved and missing dependencies are inserted. For example, to compute the mean, you also need to count the number of samples. This allows accumulators to offload some of their computations on other accumulators, making the algorithms more efficient. Each accumulator only sees data in the appropriate pass through the data, called its "working pass".

#include <vigra/accumulator.hxx>

<b>Basic statistics:</b>
- PowerSum<N> ( \form#0)
- AbsPowerSum<N> ( \form#1)
- Skewness, UnbiasedSkewness
- Kurtosis, UnbiasedKurtosis
- Minimum, Maximum
- FlatScatterMatrix (flattened upper-triangular part of scatter matrix)
- 4 histogram classes (see \ref histogram "below")
- StandardQuantiles (0%, 10%, 25%, 50%, 75%, 90%, 100%)
- ArgMinWeight, ArgMaxWeight (store data or coordinate where weight assumes its minimal or maximal value)
- CoordinateSystem (identity matrix of appropriate size)

<b>Modifiers:</b> (S is the statistc to be modified)
- Normalization
  <table border="0">
  <tr><td> DivideByCount<S>        </td><td>  S/Count           </td></tr>
  <tr><td> RootDivideByCount<S>    </td><td>  sqrt( S/Count )     </td></tr>
  <tr><td> DivideUnbiased<S>       </td><td>  S/(Count-1)       </td></tr>
  <tr><td> RootDivideUnbiased<S> &nbsp; &nbsp;  </td><td>  sqrt( S/(Count-1) ) </td></tr>
  </table>    
- Data preparation:
  <table border="0">
  <tr><td>  Central<S>   </td><td> substract mean before computing S </td></tr>
  <tr><td>  Principal<S> </td><td> project onto PCA eigenvectors   </td></tr>
  <tr><td>  Whitened<S> &nbsp; &nbsp;  </td><td> scale to unit variance after PCA   </td></tr>
  <tr><td>  Coord<S>        </td><td> compute S from pixel coordinates rather than from pixel values    </td></tr>
  <tr><td>  Weighted<S>     </td><td> compute weighted version of S   </td></tr>
  <tr><td>  Global<S>       </td><td> compute S globally rather than per region (per region is default if labels are given)   </td></tr>
  </table>

Aliases for a couple of important features are implemented (mainly as <tt>typedef FullName Alias</tt>). The alias names are equivalent to full names. 
Here are some examples for supported alias names (these examples also show how to compose statistics from the fundamental statistics and modifiers):

<table border="0">
<tr><th> Alias           </th><th>   Full Name                 </th></tr>
<tr><td> Count           </td><td>  PowerSum<0>                </td></tr>
<tr><td> Sum             </td><td>  PowerSum<1>                </td></tr>
<tr><td> SumOfSquares    </td><td>  PowerSum<2>                </td></tr>
<tr><td> Mean            </td><td>  DivideByCount<PowerSum<1>> </td></tr>
<tr><td> RootMeanSquares &nbsp; </td><td>  RootDivideByCount<PowerSum<2>> </td></tr>
<tr><td> Moment<N>       </td><td>  DivideByCount<PowerSum<N>>  </td></tr>
<tr><td> Variance        </td><td>  DivideByCount<Central<PowerSum<2>>>  </td></tr>
<tr><td> StdDev          </td><td>  RootDivideByCount<Central<PowerSum<2>>>  </td></tr>
<tr><td> Covariance      </td><td>  DivideByCount<FlatScatterMatrix> </td></tr>
<tr><td> RegionCenter    </td><td>  Coord<Mean>                </td></tr>
<tr><td> CenterOfMass    </td><td>  Weighted<Coord<Mean>>      </td></tr>
</table>

There are a few <b>rules for composing statistics</b>:
- modifiers can be specified in any order, but are internally transformed to standard order: Global<Weighted<Coord<normalization<data preparation<basic statistic
- only one normalization modifier and one data preparation modifier (Central or Principal or Whitened) is permitted 
- Count ignores all modifiers except Global and Weighted
- Sum ignores Central and Principal, because sum would be zero
- ArgMinWeight and ArgMaxWeight are automatically Weighted


Here is an example how to use \ref acc::AccumulatorChain to compute statistics. (To use Weighted<> or Coord<> modifiers, see below):
#include <vigra/multi_array.hxx>
#include <vigra/impex.hxx>
#include <vigra/accumulator.hxx>
using namespace vigra::acc;
typedef double DataType;
int size = 1000;
AccumulatorChain<DataType,
a;
std::cout << "passes required: " << a.passesRequired() << std::endl;
extractFeatures(data.begin(), data.end(), a);
std::cout << "Mean: " << get<Mean>(a) << std::endl;
std::cout << "Variance: " << get<Variance>(a) << std::endl;

The acc::AccumulatorChain object contains the selected statistics and their dependencies. Statistics have to be wrapped with acc::Select. The statistics are computed with the acc::extractFeatures function and the statistics can be accessed with acc::get .

Rules and notes:

The Accumulators can also be used with vector-valued data (vigra::RGBValue, vigra::TinyVector, vigra::MultiArray or vigra::MultiArrayView):

typedef vigra::RGBValue<double> DataType;
AccumulatorChain<DataType, Select<...> > a;
...

To compute weighted statistics (Weighted<>) or statistics over coordinates (Coord<>), the accumulator chain can be used with CoupledScanOrderIterator. The coupled iterator provides simultaneous access to several images (e.g. weight and data) and pixel coordinates. The first parameter in the accumulator chain is the type of the CoupledHandle. The indeces at which the CoupledHandle holds the data, weights etc. can be specified inside the Select wrapper.

These index specifiers are: (INDEX is of type int)

- DataArg<INDEX>: CoupledHandle holds data at index 'INDEX' (default INDEX=1)
- LabelArg<INDEX>: CoupledHandle holds labels at index 'INDEX' (default INDEX=2)
- WeightArg<INDEX>: CoupledHandle holds weights at index 'INDEX' (default INDEX=outermost index)

Pixel coordinates are always at index 0.

using namespace vigra::acc;
vigra::MultiArray<3, double> data(...), weights(...);
typedef vigra::CoupledIteratorType<3, double, double>::type Iterator; //type of the CoupledScanOrderIterator
typedef Iterator::value_type Handle; //type of the corresponding CoupledHandle
Select<DataArg<1>, WeightArg<2>, //where to look in the Handle (coordinates are always arg 0)
Mean, Variance, //statistics over values
Coord<Mean>, Coord<Variance>, //statistics over coordinates,
Weighted<Mean>, Weighted<Variance>, //weighted values,
Weighted<Coord<Mean> > > > //weighted coordinates.
a;
Iterator start = createCoupledIterator(data, weights); //coord->index 0, data->index 1, weights->index 2
Iterator end = start.getEndIterator();
extractFeatures(start,end,a);

To compute region statistics, use acc::AccumulatorChainArray :

using namespace vigra::acc;
typedef Iterator::value_type Handle;
Select<DataArg<1>, LabelArg<2>, //where to look in the Handle (coordinates are always arg 0)
Mean, Variance, //per-region statistics over values
Coord<Mean>, Coord<Variance>, //per-region statistics over coordinates
Global<Mean>, Global<Variance> > > //global statistics
a;
Iterator start = createCoupledIterator(data, labels);
Iterator end = start.getEndIterator();
a.ignoreLabel(0); //statistics will not be computed for region 0 (e.g. background)
extractFeatures(start,end,a);
int regionlabel = ...;
std::cout << get<Mean>(a, regionlabel) << std::endl; //get Mean of region with label 'regionlabel'
In some application it will be known only at run-time which statistics have to be computed. An Accumulator with <b>run-time activation</b> is provided by the \ref acc::DynamicAccumulatorChain class. One specifies a set of statistics at compile-time and from this set one can activate the needed statistics at run-time:
using namespace vigra::acc;
activate<Mean>(a); //at run-time
a.activate("Minimum"); //same as activate<Minimum>(a) (alias names are not recognized)
extractFeatures(data.begin(), data.end(), a);
std::cout << "Mean: " << get<Mean>(a) << std::endl; //ok
//std::cout << "Maximum: " << get<Maximum>(a) << std::endl; // run-time error because Maximum not activated

Likewise, for run-time activation of region statistics, use acc::DynamicAccumulatorChainArray.

Accumulator merging (e.g. for parallelization or hierarchical segmentation) is possible for many accumulators:

using namespace vigra::acc;
extractFeatures(data.begin(), data.end(), a); //process entire data set at once
extractFeatures(data.begin(), data.begin()+data.size()/2, a1); //process first half
extractFeatures(data.begin()+data.size()/2, data.end(), a2); //process second half
a1 += a2; // merge: a1 now equals a0 (with numerical tolerances)

Not all statistics can be merged (e.g. Principal usually cannot, except for some important specializations). A statistic can be merged if the "+=" operator is supported (see the documentation of that particular statistic). If the accumulator chain only requires one pass to collect the data, it is also possible to just apply the extractFeatures() function repeatedly:

using namespace vigra::acc;
extractFeatures(data.begin(), data.begin()+data.size()/2, a); // this works because
extractFeatures(data.begin()+data.size()/2, data.end(), a); // all statistics only work in pass 1

Four kinds of histograms are currently implemented:

IntegerHistogram Data values are equal to bin indices
UserRangeHistogram User provides lower and upper bounds for linear range mapping from values to indices.
AutoRangeHistogram Range mapping bounds are defiend by minimum and maximum of the data (2 passes needed!)
GlobalRangeHistogram   Likewise, but use global min/max rather than region min/max as AutoRangeHistogram will
- The number of bins is specified at compile time (as template parameter int BinCount) or at run-time (if BinCount is zero at compile time). In the first case the return type of the accumulator is TinyVector<double, BinCount> (number of bins cannot be changed). In the second case, the return type is MultiArray<1, double> and the number of bins must be set before seeing data (see example below). 
- If UserRangeHistogram is used, the bounds for the linear range mapping from values to indices must be set before seeing data (see below).
- Options can be set by passing an instance of HistogramOptions to the accumulator chain (same options for all histograms in the chain) or by directly calling the appropriate member functions of the accumulators.
- Merging is supported if the range mapping of the histograms is the same.
- Histogram accumulators have two members for outliers (left_outliers, right_outliers).

With the StandardQuantiles class, <b>histogram quantiles</b> (0%, 10%, 25%, 50%, 75%, 90%, 100%) are computed from a given histgram using linear interpolation. The return type is TinyVector<double, 7> .

Usage:

using namespace vigra::acc;
typedef double DataType;
typedef UserRangeHistogram<40> SomeHistogram; //binCount set at compile time
typedef UserRangeHistogram<0> SomeHistogram2; // binCount must be set at run-time
typedef AutoRangeHistogram<0> SomeHistogram3;
//set options for all histograms in the accumulator chain:
vigra::HistogramOptions histogram_opt;
histogram_opt = histogram_opt.setBinCount(50);
//histogram_opt = histogram_opt.setMinMax(0.1, 0.9); // this would set min/max for all three histograms, but range bounds
// shall be set automatically by min/max of data for SomeHistogram3
a.setHistogramOptions(histogram_opt);
// set options for a specific histogram in the accumulator chain:
getAccumulator<SomeHistogram>(a).setMinMax(0.1, 0.9); // number of bins must be set before setting min/max
getAccumulator<SomeHistogram2>(a).setMinMax(0.0, 1.0);
extractFeatures(data.begin(), data.end(), a);
vigra::TinyVector<double, 40> hist = get<SomeHistogram>(a);
vigra::MultiArray<1, double> hist2 = get<SomeHistogram2>(a);
vigra::TinyVector<double, 7> quant = get<Quantiles3>(a);
double right_outliers = getAccumulator<SomeHistogram>(a).right_outliers;

© Ullrich Köthe (ullrich.koethe@iwr.uni-heidelberg.de)
Heidelberg Collaboratory for Image Processing, University of Heidelberg, Germany

html generated using doxygen and Python
vigra 1.9.0 (Sat Oct 5 2013)