java.lang.Object
  org.apache.commons.math3.ml.clustering.Clusterer<T>
    org.apache.commons.math3.ml.clustering.KMeansPlusPlusClusterer<T>

Type Parameters:
T - type of the points to cluster

public class KMeansPlusPlusClusterer<T extends Clusterable>
extends Clusterer<T>
Clustering algorithm based on the k-means++ algorithm of David Arthur and Sergei Vassilvitskii.
Nested Class Summary | |
---|---|
static class | KMeansPlusPlusClusterer.EmptyClusterStrategy: Strategies to use for replacing an empty cluster. |
Field Summary | |
---|---|
private KMeansPlusPlusClusterer.EmptyClusterStrategy | emptyStrategy: Selected strategy for empty clusters. |
private int | k: The number of clusters. |
private int | maxIterations: The maximum number of iterations. |
private RandomGenerator | random: Random generator for choosing initial centers. |
Constructor Summary | |
---|---|
KMeansPlusPlusClusterer(int k) | Build a clusterer. |
KMeansPlusPlusClusterer(int k, int maxIterations) | Build a clusterer. |
KMeansPlusPlusClusterer(int k, int maxIterations, DistanceMeasure measure) | Build a clusterer. |
KMeansPlusPlusClusterer(int k, int maxIterations, DistanceMeasure measure, RandomGenerator random) | Build a clusterer. |
KMeansPlusPlusClusterer(int k, int maxIterations, DistanceMeasure measure, RandomGenerator random, KMeansPlusPlusClusterer.EmptyClusterStrategy emptyStrategy) | Build a clusterer. |
Method Summary | |
---|---|
private int | assignPointsToClusters(List<CentroidCluster<T>> clusters, Collection<T> points, int[] assignments): Adds the given points to the closest Cluster. |
private Clusterable | centroidOf(Collection<T> points, int dimension): Computes the centroid for a set of points. |
private List<CentroidCluster<T>> | chooseInitialCenters(Collection<T> points): Use K-means++ to choose the initial centers. |
List<CentroidCluster<T>> | cluster(Collection<T> points): Runs the K-means++ clustering algorithm. |
KMeansPlusPlusClusterer.EmptyClusterStrategy | getEmptyClusterStrategy(): Returns the KMeansPlusPlusClusterer.EmptyClusterStrategy used by this instance. |
private T | getFarthestPoint(Collection<CentroidCluster<T>> clusters): Get the point farthest from its cluster center. |
int | getK(): Return the number of clusters this instance will use. |
int | getMaxIterations(): Returns the maximum number of iterations this instance will use. |
private int | getNearestCluster(Collection<CentroidCluster<T>> clusters, T point): Returns the nearest Cluster to the given point. |
private T | getPointFromLargestNumberCluster(Collection<? extends Cluster<T>> clusters): Get a random point from the Cluster with the largest number of points. |
private T | getPointFromLargestVarianceCluster(Collection<CentroidCluster<T>> clusters): Get a random point from the Cluster with the largest distance variance. |
RandomGenerator | getRandomGenerator(): Returns the random generator this instance will use. |
Methods inherited from class org.apache.commons.math3.ml.clustering.Clusterer |
---|
distance, getDistanceMeasure |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
private final int k
private final int maxIterations
private final RandomGenerator random
private final KMeansPlusPlusClusterer.EmptyClusterStrategy emptyStrategy
Constructor Detail |
---|
public KMeansPlusPlusClusterer(int k)
Build a clusterer.
The default strategy for handling empty clusters that may appear during algorithm iterations is to split the cluster with the largest distance variance.
The Euclidean distance will be used as the default distance measure.
Parameters:
k - the number of clusters to split the data into

public KMeansPlusPlusClusterer(int k, int maxIterations)
Build a clusterer.
The default strategy for handling empty clusters that may appear during algorithm iterations is to split the cluster with the largest distance variance.
The Euclidean distance will be used as the default distance measure.
Parameters:
k - the number of clusters to split the data into
maxIterations - the maximum number of iterations to run the algorithm for. If negative, no maximum will be used.

public KMeansPlusPlusClusterer(int k, int maxIterations, DistanceMeasure measure)
Build a clusterer.
The default strategy for handling empty clusters that may appear during algorithm iterations is to split the cluster with the largest distance variance.
Parameters:
k - the number of clusters to split the data into
maxIterations - the maximum number of iterations to run the algorithm for. If negative, no maximum will be used.
measure - the distance measure to use

public KMeansPlusPlusClusterer(int k, int maxIterations, DistanceMeasure measure, RandomGenerator random)
Build a clusterer.
The default strategy for handling empty clusters that may appear during algorithm iterations is to split the cluster with the largest distance variance.
Parameters:
k - the number of clusters to split the data into
maxIterations - the maximum number of iterations to run the algorithm for. If negative, no maximum will be used.
measure - the distance measure to use
random - random generator to use for choosing initial centers

public KMeansPlusPlusClusterer(int k, int maxIterations, DistanceMeasure measure, RandomGenerator random, KMeansPlusPlusClusterer.EmptyClusterStrategy emptyStrategy)
Build a clusterer.
Parameters:
k - the number of clusters to split the data into
maxIterations - the maximum number of iterations to run the algorithm for. If negative, no maximum will be used.
measure - the distance measure to use
random - random generator to use for choosing initial centers
emptyStrategy - strategy to use for handling empty clusters that may appear during algorithm iterations

Method Detail |
---|
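public int getK()
Return the number of clusters this instance will use.

public int getMaxIterations()
Returns the maximum number of iterations this instance will use.

public RandomGenerator getRandomGenerator()
Returns the random generator this instance will use.

public KMeansPlusPlusClusterer.EmptyClusterStrategy getEmptyClusterStrategy()
Returns the KMeansPlusPlusClusterer.EmptyClusterStrategy used by this instance.
Returns:
the KMeansPlusPlusClusterer.EmptyClusterStrategy used by this instance
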
public List<CentroidCluster<T>> cluster(Collection<T> points) throws MathIllegalArgumentException, ConvergenceException
Runs the K-means++ clustering algorithm.
Specified by:
cluster in class Clusterer<T extends Clusterable>
Parameters:
points - the points to cluster
Throws:
MathIllegalArgumentException - if the data points are null or the number of clusters is larger than the number of data points
ConvergenceException - if an empty cluster is encountered and the emptyStrategy is set to ERROR

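As a rough illustration of the kind of iteration cluster performs, the following sketch runs a Lloyd-style loop on one-dimensional points. It is not the Commons Math implementation: the class LloydLoopSketch, the method run, the use of plain double values, and the stopping rule (stop once no assignment changes) are assumptions made for this example. The treatment of a negative maxIterations follows the constructor documentation, and an empty cluster raises an exception, loosely mirroring the ERROR strategy.

```java
import java.util.Arrays;

/** Illustrative Lloyd-style loop on 1-D points; not the Commons Math code. */
final class LloydLoopSketch {

    /** Returns the final centers, starting from the given initial centers. */
    static double[] run(double[] points, double[] initialCenters, int maxIterations) {
        if (initialCenters.length == 0) {
            throw new IllegalArgumentException("at least one center is required");
        }
        // A negative limit means "no maximum", as the constructor docs state.
        final int max = (maxIterations < 0) ? Integer.MAX_VALUE : maxIterations;
        final double[] centers = initialCenters.clone();
        final int[] assignments = new int[points.length];
        Arrays.fill(assignments, -1); // -1 marks "not assigned yet"

        for (int iteration = 0; iteration < max; iteration++) {
            // Assignment step: attach each point to its nearest center.
            int changes = 0;
            for (int i = 0; i < points.length; i++) {
                int nearest = 0;
                for (int c = 1; c < centers.length; c++) {
                    if (Math.abs(points[i] - centers[c]) < Math.abs(points[i] - centers[nearest])) {
                        nearest = c;
                    }
                }
                if (nearest != assignments[i]) {
                    assignments[i] = nearest;
                    changes++;
                }
            }
            if (changes == 0) {
                break; // assumed convergence test: no point moved
            }

            // Update step: move each center to the mean of its points.
            final double[] sums = new double[centers.length];
            final int[] counts = new int[centers.length];
            for (int i = 0; i < points.length; i++) {
                sums[assignments[i]] += points[i];
                counts[assignments[i]]++;
            }
            for (int c = 0; c < centers.length; c++) {
                if (counts[c] == 0) {
                    // Analogous to the ERROR strategy; the real class throws ConvergenceException.
                    throw new IllegalStateException("empty cluster: " + c);
                }
                centers[c] = sums[c] / counts[c];
            }
        }
        return centers;
    }
}
```

For example, run(new double[] {1, 2, 3, 10, 11, 12}, new double[] {1, 2}, -1) settles on the centers 2.0 and 11.0 after two updates.
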
private int assignPointsToClusters(List<CentroidCluster<T>> clusters, Collection<T> points, int[] assignments)
Adds the given points to the closest Cluster.
Parameters:
clusters - the Clusters to add the points to
points - the points to add to the given Clusters
assignments - assignments of points to clusters

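The assignment step can be pictured with the sketch below. It assumes that the int result counts the points whose assignment changed since the previous call, which this page does not state. The class AssignmentSketch, its method names, and the representation of points and centers as double[] arrays are hypothetical and chosen only for illustration; the real method works on CentroidCluster and Clusterable objects and uses the configured DistanceMeasure.

```java
/** Illustrative assignment step; not the Commons Math code. */
final class AssignmentSketch {

    /** Index of the center closest to the point (squared Euclidean distance). */
    static int nearestCenter(double[][] centers, double[] point) {
        int nearest = -1;
        double best = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.length; c++) {
            double sum = 0.0;
            for (int d = 0; d < point.length; d++) {
                double diff = point[d] - centers[c][d];
                sum += diff * diff;
            }
            if (sum < best) {
                best = sum;
                nearest = c;
            }
        }
        return nearest;
    }

    /**
     * Records the nearest center of each point in assignments and returns
     * how many entries changed (an assumed meaning of the return value).
     */
    static int assignPoints(double[][] centers, double[][] points, int[] assignments) {
        int changed = 0;
        for (int i = 0; i < points.length; i++) {
            int nearest = nearestCenter(centers, points[i]);
            if (nearest != assignments[i]) {
                assignments[i] = nearest;
                changed++;
            }
        }
        return changed;
    }
}
```

A caller can treat a result of 0 as a sign that the clustering has stabilised.
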
private List<CentroidCluster<T>> chooseInitialCenters(Collection<T> points)
Use K-means++ to choose the initial centers.
Parameters:
points - the points to choose the initial centers from

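The k-means++ seeding idea can be sketched as follows: pick the first center uniformly at random, then pick each further center with probability proportional to its squared distance from the nearest center chosen so far. This is a minimal sketch under stated assumptions, not the library's code: SeedingSketch and its methods are hypothetical names, points are plain double[] arrays, the distance is Euclidean, and java.util.Random stands in for the RandomGenerator the class actually uses.

```java
import java.util.Random;

/** Illustrative k-means++ seeding; not the Commons Math code. */
final class SeedingSketch {

    /** Squared Euclidean distance between two points of equal dimension. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Returns the indices of k distinct points chosen as initial centers. */
    static int[] chooseInitialCenters(double[][] points, int k, Random random) {
        final int n = points.length;
        if (k < 1 || k > n) {
            throw new IllegalArgumentException("k must be between 1 and the number of points");
        }
        final int[] centers = new int[k];
        final boolean[] taken = new boolean[n];
        final double[] minDistSq = new double[n];

        // First center: uniform choice among all points.
        centers[0] = random.nextInt(n);
        taken[centers[0]] = true;
        for (int i = 0; i < n; i++) {
            minDistSq[i] = squaredDistance(points[i], points[centers[0]]);
        }

        for (int c = 1; c < k; c++) {
            // Total weight of the points not yet chosen.
            double total = 0.0;
            for (int i = 0; i < n; i++) {
                if (!taken[i]) {
                    total += minDistSq[i];
                }
            }
            // Draw a point with probability proportional to its weight.
            final double r = random.nextDouble() * total;
            double cumulative = 0.0;
            int next = -1;
            for (int i = 0; i < n; i++) {
                if (taken[i]) {
                    continue;
                }
                next = i; // fallback if all remaining weights are zero
                cumulative += minDistSq[i];
                if (cumulative > r) {
                    break;
                }
            }
            centers[c] = next;
            taken[next] = true;
            // Refresh each point's distance to its nearest chosen center.
            for (int i = 0; i < n; i++) {
                minDistSq[i] = Math.min(minDistSq[i], squaredDistance(points[i], points[next]));
            }
        }
        return centers;
    }
}
```

Passing a seeded Random makes the choice reproducible, which is the usual reason to supply a custom generator to the constructor.
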
private T getPointFromLargestVarianceCluster(Collection<CentroidCluster<T>> clusters) throws ConvergenceException
Get a random point from the Cluster with the largest distance variance.
Parameters:
clusters - the Clusters to search
Throws:
ConvergenceException - if clusters are all empty

private T getPointFromLargestNumberCluster(Collection<? extends Cluster<T>> clusters) throws ConvergenceException
Get a random point from the Cluster with the largest number of points.
Parameters:
clusters - the Clusters to search
Throws:
ConvergenceException - if clusters are all empty

private T getFarthestPoint(Collection<CentroidCluster<T>> clusters) throws ConvergenceException
Get the point farthest from its cluster center.
Parameters:
clusters - the Clusters to search
Throws:
ConvergenceException - if clusters are all empty

private int getNearestCluster(Collection<CentroidCluster<T>> clusters, T point)
Returns the nearest Cluster to the given point.
Parameters:
clusters - the Clusters to search
point - the point to find the nearest Cluster for
Returns:
the nearest Cluster to the given point

private Clusterable centroidOf(Collection<T> points, int dimension)
Computes the centroid for a set of points.
Parameters:
points - the set of points
dimension - the point dimension
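To make the centroid computation and the largest-variance selection more concrete, here is a small sketch. It is an illustration under assumptions rather than the library's code: EmptyClusterSketch and its methods are hypothetical names, clusters are plain arrays of double[] points, the distance is Euclidean, and the variance is the population variance of the distances from each point to its cluster centroid (the real class may compute the variance differently).

```java
/** Illustrative centroid and largest-variance selection; not the Commons Math code. */
final class EmptyClusterSketch {

    /** Component-wise mean of the given points. */
    static double[] centroidOf(double[][] points, int dimension) {
        if (points.length == 0) {
            throw new IllegalArgumentException("cannot compute the centroid of no points");
        }
        final double[] centroid = new double[dimension];
        for (double[] p : points) {
            for (int i = 0; i < dimension; i++) {
                centroid[i] += p[i];
            }
        }
        for (int i = 0; i < dimension; i++) {
            centroid[i] /= points.length;
        }
        return centroid;
    }

    /** Euclidean distance between two points of equal dimension. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Population variance of the point-to-centroid distances (an assumed definition). */
    static double distanceVariance(double[][] cluster, int dimension) {
        final double[] center = centroidOf(cluster, dimension);
        final double[] distances = new double[cluster.length];
        double mean = 0.0;
        for (int i = 0; i < cluster.length; i++) {
            distances[i] = distance(cluster[i], center);
            mean += distances[i];
        }
        mean /= cluster.length;
        double variance = 0.0;
        for (double d : distances) {
            variance += (d - mean) * (d - mean);
        }
        return variance / cluster.length;
    }

    /** Index of the non-empty cluster with the largest distance variance. */
    static int largestVarianceCluster(double[][][] clusters, int dimension) {
        int best = -1;
        double bestVariance = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < clusters.length; c++) {
            if (clusters[c].length == 0) {
                continue; // empty clusters cannot donate a point
            }
            double variance = distanceVariance(clusters[c], dimension);
            if (variance > bestVariance) {
                bestVariance = variance;
                best = c;
            }
        }
        if (best < 0) {
            // The real methods throw ConvergenceException when all clusters are empty.
            throw new IllegalStateException("all clusters are empty");
        }
        return best;
    }
}
```

A replacement center for an empty cluster could then be drawn at random from the cluster at the returned index.
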