Naive K-means clustering method.
#include <KMeans.hpp>
Public Member Functions | |
KMeans (unsigned int number_clusters) | |
Constructs a K-means model ready to fit. | |
bool | fit (Eigen::Ref< const Eigen::MatrixXd > data) override |
Fits the model. | |
unsigned int | number_clusters () const override |
Returns the number of clusters. | |
const std::vector< unsigned int > & | labels () const override |
Returns a const reference to the resulting cluster label of each data point. The values make sense only if fitting converged successfully. | |
const Eigen::MatrixXd & | centroids () const override |
Returns a const reference to the matrix of cluster centroids (in columns). | |
void | set_seed (unsigned int seed) |
Sets PRNG seed. | |
void | set_absolute_tolerance (double absolute_tolerance) |
Sets absolute tolerance for convergence test: || old centroids - new centroids ||^2 < absolute tolerance. | |
void | set_maximum_steps (unsigned int maximum_steps) |
Sets maximum number of K-means steps. | |
void | set_number_initialisations (unsigned int number_initialisations) |
Sets number of initialisations to try, to find the clusters with lowest inertia. | |
void | set_centroids_initialiser (std::shared_ptr< const CentroidsInitialiser > centroids_initialiser) |
Sets centroids initialiser. | |
void | set_verbose (bool verbose) |
Switches between verbose and quiet mode. | |
std::pair< unsigned int, double > | assign_label (Eigen::Ref< const Eigen::VectorXd > x) const |
Given a data point x, assigns it to a cluster and returns the cluster label and the squared Euclidean distance to the assigned centroid. | |
double | inertia () const |
Sum of squared distances to the nearest centroid. | |
bool | converged () const override |
Reports if the model converged. | |
Public Member Functions inherited from ml::Clustering::Model | |
virtual | ~Model () |
Virtual destructor. | |
Naive K-means clustering method.
Converges if exactly the same cluster assignments are chosen twice in a row, or if the sum of squared differences between the new and old centroids is lower than the absolute tolerance.
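The two stopping criteria above can be illustrated with a minimal, library-independent sketch. The helper names `squared_centroid_shift` and `has_converged` are hypothetical and are not part of `ml::Clustering::KMeans`; centroids are assumed to be stored as flat arrays of coordinates rather than as an Eigen matrix.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the library): squared norm of the
// difference between old and new centroids, both stored as flat arrays.
double squared_centroid_shift(const std::vector<double>& old_centroids,
                              const std::vector<double>& new_centroids) {
    assert(old_centroids.size() == new_centroids.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < old_centroids.size(); ++i) {
        const double diff = old_centroids[i] - new_centroids[i];
        sum += diff * diff;
    }
    return sum;
}

// Sketch of the stopping rule: identical labels in two consecutive steps,
// or a centroid shift below the absolute tolerance.
bool has_converged(const std::vector<unsigned int>& old_labels,
                   const std::vector<unsigned int>& new_labels,
                   const std::vector<double>& old_centroids,
                   const std::vector<double>& new_centroids,
                   double absolute_tolerance) {
    if (old_labels == new_labels) {
        return true;
    }
    return squared_centroid_shift(old_centroids, new_centroids) < absolute_tolerance;
}
```

Either criterion alone is enough to stop: unchanged labels imply the next centroid update would reproduce the same centroids, while the tolerance test catches runs where labels still flip but the centroids barely move.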
ml::Clustering::KMeans::KMeans | ( | unsigned int | number_clusters | ) |
Constructs a K-means model ready to fit.
[in] | number_clusters | Number of clusters. |
std::invalid_argument | If number_clusters == 0. |
bool ml::Clustering::KMeans::fit | ( | Eigen::Ref< const Eigen::MatrixXd > | data | ) | override virtual |
Fits the model.
[in] | data | Matrix (column-major order) with a data point in every column. |
Returns true if fitting converged successfully.
std::invalid_argument | If data has no rows, or if the sample size (number of columns in data) is too low. |
Implements ml::Clustering::Model.
unsigned int ml::Clustering::KMeans::number_clusters | ( | ) | const | inline override virtual |
Returns the number of clusters.
The value makes sense only if fitting converged successfully.
Implements ml::Clustering::Model.
const Eigen::MatrixXd & ml::Clustering::KMeans::centroids | ( | ) | const | inline override virtual |
Returns a const reference to the matrix of cluster centroids (in columns).
A centroid represents the central location of a cluster, e.g. the mean of all points in the cluster. The values make sense only if fitting converged successfully.
Implements ml::Clustering::Model.
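As an illustration of a centroid computed as the mean of a cluster's points, here is a minimal sketch that does not depend on the library. The `Point` alias and the `mean_centroid` function are hypothetical names chosen for this example; the library itself stores centroids as columns of an `Eigen::MatrixXd`.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical point type for this sketch: one coordinate per dimension.
using Point = std::vector<double>;

// Returns the coordinate-wise mean of all points assigned to one cluster.
Point mean_centroid(const std::vector<Point>& cluster_points) {
    if (cluster_points.empty()) {
        throw std::invalid_argument("mean_centroid: cluster has no points");
    }
    Point centroid(cluster_points.front().size(), 0.0);
    for (const Point& p : cluster_points) {
        if (p.size() != centroid.size()) {
            throw std::invalid_argument("mean_centroid: dimension mismatch");
        }
        for (std::size_t d = 0; d < p.size(); ++d) {
            centroid[d] += p[d];
        }
    }
    for (double& coordinate : centroid) {
        coordinate /= static_cast<double>(cluster_points.size());
    }
    return centroid;
}
```

For the two points (0, 0) and (2, 4), the function returns (1, 2).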
void ml::Clustering::KMeans::set_seed | ( | unsigned int | seed | ) |
Sets PRNG seed.
[in] | seed | PRNG seed. |
void ml::Clustering::KMeans::set_absolute_tolerance | ( | double | absolute_tolerance | ) |
Sets absolute tolerance for convergence test: || old centroids - new centroids ||^2 < absolute tolerance.
[in] | absolute_tolerance | Absolute tolerance. |
std::domain_error | If absolute_tolerance < 0. |
void ml::Clustering::KMeans::set_maximum_steps | ( | unsigned int | maximum_steps | ) |
Sets maximum number of K-means steps.
[in] | maximum_steps | Maximum number of steps. |
std::invalid_argument | If maximum_steps < 2. |
void ml::Clustering::KMeans::set_number_initialisations | ( | unsigned int | number_initialisations | ) |
Sets number of initialisations to try, to find the clusters with lowest inertia.
[in] | number_initialisations | Number of initialisations. |
std::invalid_argument | If number_initialisations < 1. |
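Selecting the best of several initialisations amounts to keeping the run with the lowest inertia. The sketch below assumes each run has already finished and reported its inertia; `index_of_lowest_inertia` is a hypothetical helper, not a library function, and the library may track the best run differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <stdexcept>
#include <vector>

// Returns the index of the run with the smallest inertia.
// Ties are resolved in favour of the earliest run.
std::size_t index_of_lowest_inertia(const std::vector<double>& inertias) {
    if (inertias.empty()) {
        throw std::invalid_argument("index_of_lowest_inertia: no runs given");
    }
    const auto best = std::min_element(inertias.begin(), inertias.end());
    return static_cast<std::size_t>(std::distance(inertias.begin(), best));
}
```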
void ml::Clustering::KMeans::set_centroids_initialiser | ( | std::shared_ptr< const CentroidsInitialiser > | centroids_initialiser | ) |
Sets centroids initialiser.
[in] | centroids_initialiser | Pointer to CentroidsInitialiser implementation. |
std::invalid_argument | If centroids_initialiser is null. |
void ml::Clustering::KMeans::set_verbose | ( | bool | verbose | ) | inline |
Switches between verbose and quiet mode.
[in] | verbose | true if we want verbose output. |
std::pair<unsigned int, double> ml::Clustering::KMeans::assign_label | ( | Eigen::Ref< const Eigen::VectorXd > | x | ) | const |
Given a data point x, assigns it to a cluster and returns the cluster label and the squared Euclidean distance to the assigned centroid.
[in] | x | Data point with correct dimension. |
std::invalid_argument | If x.size() != centroids().rows(). |
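The label assignment can be sketched without the library as a nearest-centroid search. This assumes the standard K-means rule (the assigned cluster is the one whose centroid is closest in squared Euclidean distance); the `Point` alias and the `nearest_centroid` function are hypothetical names, and centroids are passed in explicitly instead of being read from a fitted model.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical point type for this sketch: one coordinate per dimension.
using Point = std::vector<double>;

// Returns (label, squared Euclidean distance) of the centroid closest to x.
// Ties are resolved in favour of the lowest label.
std::pair<unsigned int, double> nearest_centroid(const std::vector<Point>& centroids,
                                                 const Point& x) {
    if (centroids.empty()) {
        throw std::invalid_argument("nearest_centroid: no centroids given");
    }
    unsigned int best_label = 0;
    double best_distance = std::numeric_limits<double>::infinity();
    for (std::size_t k = 0; k < centroids.size(); ++k) {
        if (centroids[k].size() != x.size()) {
            throw std::invalid_argument("nearest_centroid: dimension mismatch");
        }
        double distance = 0.0;
        for (std::size_t d = 0; d < x.size(); ++d) {
            const double diff = x[d] - centroids[k][d];
            distance += diff * diff;
        }
        if (distance < best_distance) {
            best_distance = distance;
            best_label = static_cast<unsigned int>(k);
        }
    }
    return {best_label, best_distance};
}
```

With centroids (0, 0) and (10, 0), the point (9, 0) receives label 1 with squared distance 1.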
double ml::Clustering::KMeans::inertia | ( | ) | const | inline |
Sum of squared distances to the nearest centroid.
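Inertia can be illustrated with a minimal, library-independent sketch that sums, over all data points, the squared Euclidean distance to the closest centroid. The `Point` alias and the `inertia_sketch` function are hypothetical names; the library computes this quantity internally from the fitted model.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical point type for this sketch: one coordinate per dimension.
using Point = std::vector<double>;

// Sum over all data points of the squared distance to the nearest centroid.
double inertia_sketch(const std::vector<Point>& data,
                      const std::vector<Point>& centroids) {
    if (centroids.empty()) {
        throw std::invalid_argument("inertia_sketch: no centroids given");
    }
    double total = 0.0;
    for (const Point& x : data) {
        double nearest = std::numeric_limits<double>::infinity();
        for (const Point& c : centroids) {
            if (c.size() != x.size()) {
                throw std::invalid_argument("inertia_sketch: dimension mismatch");
            }
            double distance = 0.0;
            for (std::size_t d = 0; d < x.size(); ++d) {
                const double diff = x[d] - c[d];
                distance += diff * diff;
            }
            nearest = std::min(nearest, distance);
        }
        total += nearest;
    }
    return total;
}
```

Lower inertia means tighter clusters, which is why it serves as the criterion for comparing runs started from different initialisations.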