MLpp
ml::Statistics Namespace Reference

Statistical functions.

Functions

template<class Iter >
std::pair< double, double > sse_and_mean (const Iter begin, const Iter end)
 Calculates the mean and the sum of squared errors (SSE) of a sample.
 
template<class Iter >
double sse (const Iter begin, const Iter end)
 Calculates the sum of squared errors (SSE) of a sample.
 
template<class Iter >
std::pair< double, unsigned int > gini_index_and_mode (const Iter begin, const Iter end, const unsigned int K)
 Calculates the Gini index and the mode of the sample.
 
template<class Iter >
double gini_index (const Iter begin, const Iter end, const unsigned int K)
 Calculates the Gini index of the sample.
 
template<class Iter >
unsigned int mode (const Iter begin, const Iter end, const unsigned int K)
 Calculates the mode (most frequent value) of a sample.
 
template<class R >
R covariance (const std::vector< R > &xs, const std::vector< R > &ys)
 Calculates the sample covariance of two vectors.
 
double covariance (Eigen::Ref< const Eigen::VectorXd > xs, Eigen::Ref< const Eigen::VectorXd > ys)
 Calculates the sample covariance of two vectors.
 

Detailed Description

Statistical functions.

Function Documentation

◆ sse_and_mean()

template<class Iter >
std::pair<double, double> ml::Statistics::sse_and_mean (const Iter begin, const Iter end)

Calculates the mean and the sum of squared errors (SSE) of a sample.

Given a range [begin, end) with N values, calculates

\( \mathrm{SSE} = \sum_{i=1}^{N} (x_i - \bar{x})^2 \)

and

\( \bar{x} = N^{-1} \sum_{i=1}^{N} x_i \).

Parameters
[in] begin: Iterator pointing to the beginning of the range of sample values.
[in] end: Iterator pointing one past the end of the range of sample values.
Template Parameters
Iter: Iterator type.
Returns
\( (\mathrm{SSE}, \bar{x}) \) pair.
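For illustration, here is a single-pass sketch of the quantity sse_and_mean returns. It uses Welford's online algorithm, which is a choice made for this example and not necessarily how MLpp implements the function. The name sse_and_mean_sketch is hypothetical, and returning (0, 0) for an empty range is an assumption, since the reference does not state that case.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical helper (not part of MLpp): computes (SSE, mean) in one pass
// using Welford's online update. Returns (0, 0) for an empty range
// (assumption: the empty-range behaviour is not documented).
template <class Iter>
std::pair<double, double> sse_and_mean_sketch(const Iter begin, const Iter end) {
    double mean = 0.0;   // running mean of the values seen so far
    double sse = 0.0;    // running sum of squared deviations from the mean
    std::size_t n = 0;   // number of values seen so far
    for (Iter it = begin; it != end; ++it) {
        const double x = static_cast<double>(*it);
        ++n;
        const double delta = x - mean;           // deviation from the old mean
        mean += delta / static_cast<double>(n);  // update the running mean
        sse += delta * (x - mean);               // old deviation times new deviation
    }
    return {sse, mean};
}
```

For the eight values 2, 4, 4, 4, 5, 5, 7, 9 the sketch yields a mean of 5 and an SSE of 32, up to floating-point rounding. A single pass avoids traversing the range twice and is numerically more stable than accumulating the sum of squares directly.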

◆ sse()

template<class Iter >
double ml::Statistics::sse (const Iter begin, const Iter end)

Calculates the sum of squared errors (SSE) of a sample.

Given a range [begin, end) with N values, calculates

\( \mathrm{SSE} = \sum_{i=1}^{N} (x_i - \bar{x})^2 \),

where

\( \bar{x} = N^{-1} \sum_{i=1}^{N} x_i \).

Parameters
[in] begin: Iterator pointing to the beginning of the range of sample values.
[in] end: Iterator pointing one past the end of the range of sample values.
Template Parameters
Iter: Iterator type.
Returns
SSE.
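The formula above can be evaluated directly in two passes: one to obtain \(\bar{x}\), one to sum the squared deviations. The sketch below does exactly that with std::accumulate; the name sse_sketch is hypothetical, it is not presented as MLpp's implementation, and returning 0 for an empty range is an assumption.

```cpp
#include <cassert>
#include <iterator>
#include <numeric>

// Hypothetical helper (not part of MLpp): two-pass SSE.
// Pass 1 computes the mean, pass 2 sums squared deviations from it.
// Requires forward iterators, since the range is traversed twice.
// Returns 0 for an empty range (assumption: not documented).
template <class Iter>
double sse_sketch(const Iter begin, const Iter end) {
    const auto n = std::distance(begin, end);
    if (n == 0) {
        return 0.0;
    }
    const double mean = std::accumulate(begin, end, 0.0) / static_cast<double>(n);
    return std::accumulate(begin, end, 0.0, [mean](double acc, const auto& x) {
        const double d = static_cast<double>(x) - mean;  // deviation from the mean
        return acc + d * d;
    });
}
```

For example, for the values 1, 2, 3, 4 the mean is 2.5 and the SSE is 2.25 + 0.25 + 0.25 + 2.25 = 5.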

◆ gini_index_and_mode()

template<class Iter >
std::pair<double, unsigned int> ml::Statistics::gini_index_and_mode (const Iter begin, const Iter end, const unsigned int K)

Calculates the Gini index and the mode of the sample.

The Gini index is defined as

\( \sum_{k=1}^K \hat{p}_k (1 - \hat{p}_k) \)

where \(\hat{p}_k\) is the relative frequency of class k in the sample.

The range [begin, end) must contain class labels in [0, K - 1].

Parameters
[in] begin: Iterator pointing to the beginning of the range of sample values.
[in] end: Iterator pointing one past the end of the range of sample values.
[in] K: Number of classes (must be positive).
Template Parameters
Iter: Iterator type.
Returns
(Gini index, mode) pair, where the mode is the most frequent class. If begin == end, the returned mode is K.
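A one-pass counting sketch of what gini_index_and_mode computes: tally each label, convert the counts to relative frequencies, then accumulate the Gini index and track the largest count. The name gini_index_and_mode_sketch is hypothetical. Breaking ties toward the smallest label and returning a Gini value of 0.0 for an empty range are assumptions; only the empty-range mode of K comes from the reference.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of MLpp). Counts each class in one pass,
// then computes G = sum_k p_k (1 - p_k) and the most frequent class.
// Assumptions: labels lie in [0, K - 1]; ties go to the smallest label;
// an empty range yields (0.0, K). Only "mode == K" is documented above.
template <class Iter>
std::pair<double, unsigned int> gini_index_and_mode_sketch(const Iter begin, const Iter end,
                                                           const unsigned int K) {
    assert(K > 0);
    std::vector<std::size_t> counts(K, 0);
    std::size_t n = 0;
    for (Iter it = begin; it != end; ++it) {
        const auto label = static_cast<unsigned int>(*it);
        assert(label < K);
        ++counts[label];
        ++n;
    }
    if (n == 0) {
        return {0.0, K};
    }
    double gini = 0.0;
    unsigned int mode = 0;
    for (unsigned int k = 0; k < K; ++k) {
        const double p = static_cast<double>(counts[k]) / static_cast<double>(n);
        gini += p * (1.0 - p);
        if (counts[k] > counts[mode]) {
            mode = k;  // strict '>' keeps the smallest label on ties
        }
    }
    return {gini, mode};
}
```

With the labels 0, 0, 1, 2 and K = 3, the relative frequencies are 1/2, 1/4 and 1/4, so the sketch returns a Gini index of 0.625 and a mode of 0.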

◆ gini_index()

template<class Iter >
double ml::Statistics::gini_index (const Iter begin, const Iter end, const unsigned int K)

Calculates the Gini index of the sample.

The Gini index is defined as

\( \sum_{k=1}^K \hat{p}_k (1 - \hat{p}_k) \)

where \(\hat{p}_k\) is the relative frequency of class k in the sample.
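Because the relative frequencies \(\hat{p}_k\) sum to 1, the definition has an equivalent form that is often easier to evaluate by hand. The worked example uses an illustrative sample (labels 0, 0, 1, 2 with K = 3), not data from this documentation.

```latex
\sum_{k=1}^{K} \hat{p}_k (1 - \hat{p}_k)
  = \sum_{k=1}^{K} \hat{p}_k - \sum_{k=1}^{K} \hat{p}_k^2
  = 1 - \sum_{k=1}^{K} \hat{p}_k^2

% Example: classes 0, 1, 2 have relative frequencies 1/2, 1/4, 1/4
1 - \left( \tfrac{1}{4} + \tfrac{1}{16} + \tfrac{1}{16} \right)
  = 1 - \tfrac{3}{8} = \tfrac{5}{8} = 0.625
```

The index is 0 when every sample belongs to one class and reaches its maximum, \(1 - 1/K\), when all K classes are equally frequent.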

The range [begin, end) must contain class labels in [0, K - 1].

Parameters
[in] begin: Iterator pointing to the beginning of the range of sample values.
[in] end: Iterator pointing one past the end of the range of sample values.
[in] K: Number of classes (must be positive).
Template Parameters
Iter: Iterator type.
Returns
Gini index.

◆ mode()

template<class Iter >
unsigned int ml::Statistics::mode (const Iter begin, const Iter end, const unsigned int K)

Calculates the mode (most frequent value) of a sample.

The sample is assumed to contain values in the [0, K - 1] range.

Parameters
[in] begin: Iterator pointing to the beginning of the range of sample values.
[in] end: Iterator pointing one past the end of the range of sample values.
[in] K: Number of distinct values (must be positive).
Template Parameters
Iter: Iterator type.
Returns
Mode of the sample.
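Since the values are known to lie in [0, K - 1], the mode can be found with a fixed-size histogram instead of a map or a sort. The sketch below is hypothetical (mode_sketch is not an MLpp function). It relies on std::max_element returning the first largest element, so ties resolve to the smallest value; that tie-breaking rule, and returning K for an empty range (borrowed from the gini_index_and_mode convention), are assumptions about mode() itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper (not part of MLpp): histogram the values, then pick
// the bin with the largest count. std::max_element returns the first
// maximum, so ties resolve to the smallest value (assumed tie-breaking).
// Returns K for an empty range (assumption borrowed from gini_index_and_mode).
template <class Iter>
unsigned int mode_sketch(const Iter begin, const Iter end, const unsigned int K) {
    assert(K > 0);
    if (begin == end) {
        return K;
    }
    std::vector<std::size_t> counts(K, 0);
    for (Iter it = begin; it != end; ++it) {
        const auto value = static_cast<unsigned int>(*it);
        assert(value < K);
        ++counts[value];
    }
    const auto top = std::max_element(counts.begin(), counts.end());
    return static_cast<unsigned int>(std::distance(counts.begin(), top));
}
```

The cost is O(N + K) time and O(K) memory, which suits small K such as class labels.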

◆ covariance() [1/2]

template<class R >
R ml::Statistics::covariance (const std::vector< R > &xs, const std::vector< R > &ys)

Calculates the sample covariance of two vectors.

Parameters
[in] xs: X values.
[in] ys: Y values.
Template Parameters
R: Scalar value type.
Returns
Sample covariance (unbiased estimate of population covariance) or NaN if xs.size() < 2.
Exceptions
std::invalid_argument: If xs.size() != ys.size().
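A sketch that follows the contract documented above: size mismatch throws std::invalid_argument, fewer than two points gives NaN, and otherwise the result is the unbiased estimate with an N - 1 denominator. The name covariance_sketch is hypothetical, the two-pass approach is an illustrative choice rather than MLpp's confirmed implementation, and R is assumed to be a floating-point type so that a quiet NaN exists.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of MLpp) mirroring the documented contract.
// Assumes R is a floating-point type (so quiet_NaN is meaningful).
template <class R>
R covariance_sketch(const std::vector<R>& xs, const std::vector<R>& ys) {
    if (xs.size() != ys.size()) {
        throw std::invalid_argument("covariance_sketch: xs and ys differ in size");
    }
    const std::size_t n = xs.size();
    if (n < 2) {
        return std::numeric_limits<R>::quiet_NaN();
    }
    // Pass 1: sample means.
    R mean_x = 0;
    R mean_y = 0;
    for (std::size_t i = 0; i < n; ++i) {
        mean_x += xs[i];
        mean_y += ys[i];
    }
    mean_x /= static_cast<R>(n);
    mean_y /= static_cast<R>(n);
    // Pass 2: sum of products of deviations, divided by N - 1 (unbiased).
    R sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += (xs[i] - mean_x) * (ys[i] - mean_y);
    }
    return sum / static_cast<R>(n - 1);
}
```

For xs = ys = {1, 2, 3} the deviations are -1, 0, 1, so the sketch returns (1 + 0 + 1) / 2 = 1, the sample variance of that vector.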

◆ covariance() [2/2]

double ml::Statistics::covariance (Eigen::Ref< const Eigen::VectorXd > xs, Eigen::Ref< const Eigen::VectorXd > ys)

Calculates the sample covariance of two vectors.
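For reference, the standard unbiased sample covariance estimator is given below. The "unbiased estimate" wording under Returns suggests this is what both overloads compute, although the implementation itself is not shown in this reference.

```latex
\operatorname{cov}(x, y) = \frac{1}{N - 1} \sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y}),
\qquad
\bar{x} = N^{-1} \sum_{i=1}^{N} x_i, \quad \bar{y} = N^{-1} \sum_{i=1}^{N} y_i
```

The \(N - 1\) denominator is what makes the estimate unbiased. It also explains the NaN result for fewer than two points: with \(N = 1\) both the sum and the denominator are zero.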

Parameters
xs: X values.
ys: Y values.
Returns
Sample covariance (unbiased estimate of population covariance) or NaN if xs.size() < 2.
Exceptions
std::invalid_argument: If xs.size() != ys.size().