MLpp
ml::Clustering::KMeans Class Reference

Naive K-means clustering method.

#include <KMeans.hpp>


Public Member Functions

 KMeans (unsigned int number_clusters)
 Constructs a K-means model ready to fit.
 
bool fit (Eigen::Ref< const Eigen::MatrixXd > data) override
 Fits the model.
 
unsigned int number_clusters () const override
 Returns the number of clusters.
 
const std::vector< unsigned int > & labels () const override
 Returns a const reference to the resulting cluster labels for each data point. The values make sense only if fitting converged successfully.
 
const Eigen::MatrixXd & centroids () const override
 Returns a const reference to the matrix of cluster centroids (in columns).
 
void set_seed (unsigned int seed)
 Sets PRNG seed.
 
void set_absolute_tolerance (double absolute_tolerance)
 Sets absolute tolerance for convergence test: || old centroids - new centroids ||^2 < absolute tolerance.
 
void set_maximum_steps (unsigned int maximum_steps)
 Sets maximum number of K-means steps.
 
void set_number_initialisations (unsigned int number_initialisations)
 Sets number of initialisations to try, to find the clusters with lowest inertia.
 
void set_centroids_initialiser (std::shared_ptr< const CentroidsInitialiser > centroids_initialiser)
 Sets centroids initialiser.
 
void set_verbose (bool verbose)
 Switches between verbose and quiet mode.
 
std::pair< unsigned int, double > assign_label (Eigen::Ref< const Eigen::VectorXd > x) const
 Given a data point x, assign it to its cluster and return the correct label and squared Euclidean distance to the assigned centroid.
 
double inertia () const
 Sum of squared distances to the nearest centroid.
 
bool converged () const override
 Reports if the model converged.
 
Public Member Functions inherited from ml::Clustering::Model
virtual ~Model ()
 Virtual destructor.
 

Detailed Description

Naive K-means clustering method.

The algorithm is considered converged if exactly the same cluster assignments are chosen twice, or if the sum of squared differences between the new and old centroids is lower than the tolerance.
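The two stopping rules above can be illustrated with a minimal, self-contained sketch. It is not the MLpp implementation: the helper names squared_centroid_shift and has_converged are hypothetical, centroids are stored as flat std::vector buffers of equal length instead of Eigen matrices, and "chosen twice" is assumed to mean two consecutive steps.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of MLpp): sum of squared differences between
// two centroid sets stored as flat buffers of equal length.
double squared_centroid_shift(const std::vector<double>& old_centroids,
                              const std::vector<double>& new_centroids)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < old_centroids.size(); ++i) {
        const double diff = old_centroids[i] - new_centroids[i];
        sum += diff * diff;
    }
    return sum;
}

// Hypothetical helper: true if either stopping rule holds.
// Assumption: "chosen twice" is read as "unchanged between consecutive steps".
bool has_converged(const std::vector<unsigned int>& old_labels,
                   const std::vector<unsigned int>& new_labels,
                   const std::vector<double>& old_centroids,
                   const std::vector<double>& new_centroids,
                   double absolute_tolerance)
{
    if (old_labels == new_labels) {
        return true;  // identical cluster assignments
    }
    // || old centroids - new centroids ||^2 < absolute tolerance
    return squared_centroid_shift(old_centroids, new_centroids) < absolute_tolerance;
}
```

In this sketch the label comparison comes first because it needs no floating-point arithmetic; the order does not change the result.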

Constructor & Destructor Documentation

◆ KMeans()

ml::Clustering::KMeans::KMeans ( unsigned int  number_clusters)

Constructs a K-means model ready to fit.

Parameters
[in] number_clusters  Number of clusters.
Exceptions
std::invalid_argument  If number_clusters == 0.

Member Function Documentation

◆ fit()

bool ml::Clustering::KMeans::fit ( Eigen::Ref< const Eigen::MatrixXd >  data)
override virtual

Fits the model.

Parameters
[in] data  Matrix (column-major order) with a data point in every column.
Returns
true if fitting converged successfully.
Exceptions
std::invalid_argument  If data has no rows, or if the sample size (number of columns in data) is too low.

Implements ml::Clustering::Model.
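To make the expected data layout concrete, the following sketch shows how data points sit in a column-major buffer: the coordinates of data point j occupy the contiguous range starting at offset j * rows. The helper column_of is hypothetical and uses a flat std::vector in place of Eigen::MatrixXd; it assumes the caller passes a valid column index.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of MLpp): copies data point j out of a
// column-major buffer with `rows` coordinates per point.
// Assumption: (j + 1) * rows <= column_major.size().
std::vector<double> column_of(const std::vector<double>& column_major,
                              std::size_t rows,
                              std::size_t j)
{
    const auto first = column_major.begin() + static_cast<std::ptrdiff_t>(j * rows);
    return std::vector<double>(first, first + static_cast<std::ptrdiff_t>(rows));
}
```

For example, the buffer {1, 2, 3, 4, 5, 6} with rows = 2 holds three two-dimensional data points: (1, 2), (3, 4) and (5, 6).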

◆ number_clusters()

unsigned int ml::Clustering::KMeans::number_clusters ( ) const
inline override virtual

Returns the number of clusters.

The value makes sense only if fitting converged successfully.

Implements ml::Clustering::Model.

◆ centroids()

const Eigen::MatrixXd& ml::Clustering::KMeans::centroids ( ) const
inline override virtual

Returns a const reference to the matrix of cluster centroids (in columns).

A centroid represents the central location of a cluster, e.g. the mean of all points in the cluster. The values make sense only if fitting converged successfully.

Implements ml::Clustering::Model.

◆ set_seed()

void ml::Clustering::KMeans::set_seed ( unsigned int  seed)

Sets PRNG seed.

Parameters
[in] seed  PRNG seed.

◆ set_absolute_tolerance()

void ml::Clustering::KMeans::set_absolute_tolerance ( double  absolute_tolerance)

Sets absolute tolerance for convergence test: || old centroids - new centroids ||^2 < absolute tolerance.

Parameters
[in] absolute_tolerance  Absolute tolerance.
Exceptions
std::domain_error  If absolute_tolerance < 0.

◆ set_maximum_steps()

void ml::Clustering::KMeans::set_maximum_steps ( unsigned int  maximum_steps)

Sets maximum number of K-means steps.

Parameters
[in] maximum_steps  Maximum number of steps.
Exceptions
std::invalid_argument  If maximum_steps < 2.

◆ set_number_initialisations()

void ml::Clustering::KMeans::set_number_initialisations ( unsigned int  number_initialisations)

Sets number of initialisations to try, to find the clusters with lowest inertia.

Parameters
[in] number_initialisations  Number of initialisations.
Exceptions
std::invalid_argument  If number_initialisations < 1.
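The idea of trying several initialisations and keeping the result with the lowest inertia can be sketched as follows. This is an illustration only, not the MLpp implementation: BestRun and best_of are hypothetical names, and run_once stands in for one complete K-means run that returns its final inertia.

```cpp
#include <functional>
#include <limits>
#include <stdexcept>

// Hypothetical result type: which run won and with what inertia.
struct BestRun {
    unsigned int index;
    double inertia;
};

// Hypothetical helper (not part of MLpp): runs `run_once` for each
// initialisation index and keeps the run with the lowest inertia.
BestRun best_of(unsigned int number_initialisations,
                const std::function<double(unsigned int)>& run_once)
{
    if (number_initialisations < 1) {
        throw std::invalid_argument("number_initialisations must be at least 1");
    }
    BestRun best{0, std::numeric_limits<double>::infinity()};
    for (unsigned int i = 0; i < number_initialisations; ++i) {
        const double inertia = run_once(i);
        if (inertia < best.inertia) {
            best = BestRun{i, inertia};  // strictly better: keep this run
        }
    }
    return best;
}
```

Ties keep the earliest run in this sketch; whether MLpp does the same is not stated in this reference.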

◆ set_centroids_initialiser()

void ml::Clustering::KMeans::set_centroids_initialiser ( std::shared_ptr< const CentroidsInitialiser >  centroids_initialiser)

Sets centroids initialiser.

Parameters
[in] centroids_initialiser  Pointer to CentroidsInitialiser implementation.
Exceptions
std::invalid_argument  If centroids_initialiser is null.

◆ set_verbose()

void ml::Clustering::KMeans::set_verbose ( bool  verbose)
inline

Switches between verbose and quiet mode.

Parameters
[in] verbose  true if we want verbose output.

◆ assign_label()

std::pair<unsigned int, double> ml::Clustering::KMeans::assign_label ( Eigen::Ref< const Eigen::VectorXd >  x) const

Given a data point x, assign it to its cluster and return the correct label and squared Euclidean distance to the assigned centroid.

Parameters
[in] x  Data point with correct dimension.
Exceptions
std::invalid_argument  If x.size() != centroids().rows().
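A nearest-centroid assignment of this kind can be sketched without Eigen as shown below. The function nearest_centroid is a hypothetical stand-in, not the MLpp code: centroids are passed explicitly as a vector of coordinate vectors, and a dimension mismatch is reported with std::invalid_argument to mirror the documented behaviour.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for assign_label (not part of MLpp).
// centroids[k] holds the coordinates of centroid k.
// Returns the label of the nearest centroid and the squared Euclidean
// distance to it.
std::pair<unsigned int, double> nearest_centroid(
    const std::vector<std::vector<double>>& centroids,
    const std::vector<double>& x)
{
    if (centroids.empty()) {
        throw std::invalid_argument("no centroids");
    }
    unsigned int best_label = 0;
    double best_distance = std::numeric_limits<double>::infinity();
    for (std::size_t k = 0; k < centroids.size(); ++k) {
        if (centroids[k].size() != x.size()) {
            throw std::invalid_argument("dimension mismatch");
        }
        double squared_distance = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i) {
            const double diff = x[i] - centroids[k][i];
            squared_distance += diff * diff;
        }
        if (squared_distance < best_distance) {
            best_distance = squared_distance;
            best_label = static_cast<unsigned int>(k);
        }
    }
    return {best_label, best_distance};
}
```

Returning the squared distance avoids a square root and matches the quantity that is summed to obtain the inertia.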

◆ inertia()

double ml::Clustering::KMeans::inertia ( ) const
inline

Sum of squared distances to the nearest centroid.

Returns
Non-negative number.
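As an illustration of the quantity described here, the sketch below sums, over all data points, the squared Euclidean distance to the nearest centroid. The names Point, squared_distance and inertia_of are hypothetical; the code uses std::vector instead of Eigen and assumes all points and centroids share the same dimension and that at least one centroid exists.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<double>;  // hypothetical alias for this sketch

// Squared Euclidean distance between two points of equal dimension.
double squared_distance(const Point& a, const Point& b)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double diff = a[i] - b[i];
        sum += diff * diff;
    }
    return sum;
}

// Hypothetical helper (not part of MLpp): sum of squared distances from
// each data point to its nearest centroid. Assumes centroids is non-empty.
double inertia_of(const std::vector<Point>& data,
                  const std::vector<Point>& centroids)
{
    double total = 0.0;
    for (const Point& x : data) {
        double nearest = std::numeric_limits<double>::infinity();
        for (const Point& c : centroids) {
            nearest = std::min(nearest, squared_distance(x, c));
        }
        total += nearest;
    }
    return total;
}
```

Every term in the sum is a squared distance, which is why the result is non-negative.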
