MLpp
ml::DecisionTrees Namespace Reference

Helper functions and classes for decision trees.

Classes

struct  LeafNode
 Terminal node, which returns a constant prediction value for feature vectors that end up in it.
 
struct  Node
 Tree node. Nodes are split (non-terminal) or leaf (terminal).
 
struct  SplitNode
 Non-terminal node, which splits data depending on a threshold value of some feature.
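As a rough illustration of the three node kinds listed above, the following sketch models a leaf that returns a constant and a split node that routes on a feature threshold. The names `SketchNode`, `SketchLeaf`, `SketchSplit` and `make_sketch_stump`, and the convention that values below the threshold go to the lower child, are assumptions made for illustration; they are not taken from the MLpp headers.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins for Node, LeafNode and SplitNode; not the MLpp definitions.
struct SketchNode {
    virtual ~SketchNode() = default;
    // Returns the prediction for one feature vector.
    virtual double predict(const std::vector<double>& x) const = 0;
};

// Terminal node: constant prediction.
struct SketchLeaf : SketchNode {
    double value;
    explicit SketchLeaf(double v) : value(v) {}
    double predict(const std::vector<double>&) const override { return value; }
};

// Non-terminal node: compares one feature against a threshold.
struct SketchSplit : SketchNode {
    std::size_t feature_index;
    double threshold;
    std::unique_ptr<SketchNode> lower;   // taken when x[feature_index] < threshold (assumed convention)
    std::unique_ptr<SketchNode> higher;  // taken otherwise

    SketchSplit(std::size_t idx, double thr,
                std::unique_ptr<SketchNode> lo, std::unique_ptr<SketchNode> hi)
        : feature_index(idx), threshold(thr), lower(std::move(lo)), higher(std::move(hi)) {
        assert(lower && higher);  // a split node always has two children
    }

    double predict(const std::vector<double>& x) const override {
        return x.at(feature_index) < threshold ? lower->predict(x) : higher->predict(x);
    }
};

// Builds a one-split tree: predicts 1.0 if x[0] < 0.5, otherwise 2.0.
inline std::unique_ptr<SketchNode> make_sketch_stump() {
    return std::make_unique<SketchSplit>(0, 0.5,
                                         std::make_unique<SketchLeaf>(1.0),
                                         std::make_unique<SketchLeaf>(2.0));
}
```

Calling `make_sketch_stump()->predict(x)` walks from the root to a leaf, which is the behaviour the class summaries describe.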
 

Functions

std::tuple< RegressionTree, double, double > regression_tree_auto_prune (Eigen::Ref< const Eigen::MatrixXd > X, Eigen::Ref< const Eigen::VectorXd > y, unsigned int max_split_levels, unsigned int min_sample_size, const std::vector< double > &alphas, const unsigned int num_folds)
 Grows a regression tree with pruning.
 
std::tuple< ClassificationTree, double, double > classification_tree_auto_prune (Eigen::Ref< const Eigen::MatrixXd > X, Eigen::Ref< const Eigen::VectorXd > y, unsigned int max_split_levels, unsigned int min_sample_size, const std::vector< double > &alphas, const unsigned int num_folds)
 Grows a classification tree with pruning.
 
RegressionTree regression_tree (Eigen::Ref< const Eigen::MatrixXd > X, Eigen::Ref< const Eigen::VectorXd > y, unsigned int max_split_levels, unsigned int min_sample_size)
 Grows a regression tree without pruning.
 
ClassificationTree classification_tree (Eigen::Ref< const Eigen::MatrixXd > X, Eigen::Ref< const Eigen::VectorXd > y, unsigned int max_split_levels, unsigned int min_sample_size)
 Grows a classification tree without pruning.
 
template<typename Y >
void cost_complexity_prune (DecisionTree< Y > &tree, const double alpha)
 Performs cost-complexity pruning in-place.
 
double regression_tree_mean_squared_error (const RegressionTree &tree, Eigen::Ref< const Eigen::MatrixXd > X, Eigen::Ref< const Eigen::VectorXd > y)
 Calculates tree mean squared error (MSE) for a sample.
 
double classification_tree_accuracy (const ClassificationTree &tree, Eigen::Ref< const Eigen::MatrixXd > X, Eigen::Ref< const Eigen::VectorXd > y)
 Calculates tree accuracy for a sample.
 
double classification_tree_misclassification_rate (const ClassificationTree &tree, Eigen::Ref< const Eigen::MatrixXd > X, Eigen::Ref< const Eigen::VectorXd > y)
 Calculates tree misclassification rate for a sample.
 

Detailed Description

Helper functions and classes for decision trees.

Functions for manipulating decision trees.

Function Documentation

◆ regression_tree_auto_prune()

std::tuple<RegressionTree, double, double> ml::DecisionTrees::regression_tree_auto_prune ( Eigen::Ref< const Eigen::MatrixXd >  X,
Eigen::Ref< const Eigen::VectorXd >  y,
unsigned int  max_split_levels,
unsigned int  min_sample_size,
const std::vector< double > &  alphas,
const unsigned int  num_folds 
)

Grows a regression tree with pruning.

Parameters
[in] X: Independent variables (column-wise, i.e. one data point per column).
[in] y: Dependent variable.
[in] max_split_levels: Maximum number of split nodes on the way to any leaf node.
[in] min_sample_size: Minimum sample size which can be split (at least 2).
[in] alphas: Candidate alphas for pruning, to be selected by cross-validation. If this vector is empty, no pruning is done. If it has just one element, this value is used for pruning. If it has more than one, the one with the smallest k-fold cross-validation test error is used.
[in] num_folds: Number of folds for cross-validation. Ignored if cross-validation is not done.
Returns
Tuple of: trained regression tree, chosen alpha (NaN if no pruning was done) and minimum cross-validation test error (NaN if no cross-validation was done).
Exceptions
std::invalid_argument: If min_sample_size < 2, y.size() < 2 or X.cols() != y.size().
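The rule for alphas has three cases, and the returned NaN values follow from them. A minimal sketch of that selection logic is given below. `select_alpha`, `was_cross_validated` and the `cv_error` callback are hypothetical names introduced here; the callback stands in for the k-fold cross-validation test error, and the tie-breaking rule is an assumption. The real function additionally grows and prunes the tree.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical helper mirroring the documented rule for `alphas`.
// Returns {chosen alpha, minimum CV error}, with NaN where the reference says NaN.
inline std::pair<double, double> select_alpha(
    const std::vector<double>& alphas,
    const std::function<double(double)>& cv_error) {
    const double nan = std::numeric_limits<double>::quiet_NaN();
    if (alphas.empty()) {
        return {nan, nan};             // no pruning, no cross-validation
    }
    if (alphas.size() == 1) {
        return {alphas.front(), nan};  // prune with the single alpha, no cross-validation
    }
    assert(cv_error && "cv_error must be callable when several alphas are given");
    double best_alpha = alphas.front();
    double best_error = cv_error(best_alpha);
    for (std::size_t i = 1; i < alphas.size(); ++i) {
        const double err = cv_error(alphas[i]);
        if (err < best_error) {        // ties keep the earlier candidate (an assumption)
            best_error = err;
            best_alpha = alphas[i];
        }
    }
    return {best_alpha, best_error};
}

// True if the second tuple element carries a real cross-validation error.
inline bool was_cross_validated(const std::pair<double, double>& result) {
    return !std::isnan(result.second);
}
```

Because NaN never compares equal to anything, callers should test the returned values with `std::isnan` rather than with `==`.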

◆ classification_tree_auto_prune()

std::tuple<ClassificationTree, double, double> ml::DecisionTrees::classification_tree_auto_prune ( Eigen::Ref< const Eigen::MatrixXd >  X,
Eigen::Ref< const Eigen::VectorXd >  y,
unsigned int  max_split_levels,
unsigned int  min_sample_size,
const std::vector< double > &  alphas,
const unsigned int  num_folds 
)

Grows a classification tree with pruning.

Parameters
[in] X: Features (column-wise, i.e. one data point per column).
[in] y: Class indices.
[in] max_split_levels: Maximum number of split nodes on the way to any leaf node.
[in] min_sample_size: Minimum sample size which can be split (at least 2).
[in] alphas: Candidate alphas for pruning, to be selected by cross-validation. If this vector is empty, no pruning is done. If it has just one element, this value is used for pruning. If it has more than one, the one with the smallest k-fold cross-validation test error is used.
[in] num_folds: Number of folds for cross-validation. Ignored if cross-validation is not done.
Returns
Tuple of: trained classification tree, chosen alpha (NaN if no pruning was done) and minimum cross-validation test error (NaN if no cross-validation was done).
Exceptions
std::invalid_argument: If min_sample_size < 2, y.size() < 2 or X.cols() != y.size().
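For orientation, k-fold cross-validation partitions the sample into num_folds disjoint test sets and trains on the remainder each time. The sketch below shows one common contiguous partition. How MLpp actually assigns data points to folds is not stated in this reference, so `fold_ranges` and its argument checks are illustrative assumptions only.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical helper: splits indices 0..n-1 into num_folds contiguous
// half-open ranges [begin, end) whose sizes differ by at most one.
inline std::vector<std::pair<std::size_t, std::size_t>> fold_ranges(
    std::size_t n, std::size_t num_folds) {
    if (num_folds == 0 || num_folds > n) {
        throw std::invalid_argument("num_folds must be in [1, n]");
    }
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    ranges.reserve(num_folds);
    const std::size_t base = n / num_folds;
    const std::size_t extra = n % num_folds;  // the first `extra` folds get one more point
    std::size_t begin = 0;
    for (std::size_t k = 0; k < num_folds; ++k) {
        const std::size_t size = base + (k < extra ? 1 : 0);
        ranges.emplace_back(begin, begin + size);
        begin += size;
    }
    assert(begin == n);  // every index belongs to exactly one fold
    return ranges;
}
```

Each range would serve once as the test set while the remaining columns of X and entries of y form the training set.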

◆ regression_tree()

RegressionTree ml::DecisionTrees::regression_tree ( Eigen::Ref< const Eigen::MatrixXd >  X,
Eigen::Ref< const Eigen::VectorXd >  y,
unsigned int  max_split_levels,
unsigned int  min_sample_size 
)

Grows a regression tree without pruning.

Parameters
[in] X: Independent variables (column-wise, i.e. one data point per column).
[in] y: Dependent variable.
[in] max_split_levels: Maximum number of split nodes on the way to any leaf node.
[in] min_sample_size: Minimum sample size which can be split (at least 2).
Returns
Trained regression tree.
Exceptions
std::invalid_argument: If min_sample_size < 2, y.size() < 2 or X.cols() != y.size().
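The exception conditions amount to a simple precondition check on the inputs. The sketch below restates them with plain sizes in place of Eigen types. `check_tree_inputs` and `tree_inputs_ok` are hypothetical names, and the order in which the real function tests the conditions is not documented here.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical precondition check mirroring the documented exception conditions.
// num_columns plays the role of X.cols(), num_targets the role of y.size().
inline void check_tree_inputs(std::size_t num_columns, std::size_t num_targets,
                              unsigned int min_sample_size) {
    if (min_sample_size < 2) {
        throw std::invalid_argument("min_sample_size must be at least 2");
    }
    if (num_targets < 2) {
        throw std::invalid_argument("y must contain at least 2 values");
    }
    if (num_columns != num_targets) {
        throw std::invalid_argument("X.cols() must equal y.size()");
    }
}

// Convenience wrapper for callers who prefer a boolean result.
inline bool tree_inputs_ok(std::size_t num_columns, std::size_t num_targets,
                           unsigned int min_sample_size) {
    try {
        check_tree_inputs(num_columns, num_targets, min_sample_size);
        return true;
    } catch (const std::invalid_argument&) {
        return false;
    }
}
```

Note that the size check compares the number of columns of X with the length of y, consistent with data points being stored column-wise.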

◆ classification_tree()

ClassificationTree ml::DecisionTrees::classification_tree ( Eigen::Ref< const Eigen::MatrixXd >  X,
Eigen::Ref< const Eigen::VectorXd >  y,
unsigned int  max_split_levels,
unsigned int  min_sample_size 
)

Grows a classification tree without pruning.

Parameters
[in] X: Classification features (column-wise, i.e. one data point per column).
[in] y: Class indices.
[in] max_split_levels: Maximum number of split nodes on the way to any leaf node.
[in] min_sample_size: Minimum sample size which can be split (at least 2).
Returns
Trained classification tree.
Exceptions
std::invalid_argument: If min_sample_size < 2, y.size() < 2 or X.cols() != y.size().

◆ cost_complexity_prune()

template<typename Y >
void ml::DecisionTrees::cost_complexity_prune ( DecisionTree< Y > &  tree,
const double  alpha 
)

Performs cost-complexity pruning in-place.

Parameters
[in,out] tree: Tree to be pruned.
[in] alpha: Cost of complexity per node.
Template Parameters
Y: Decision tree output value type.
Exceptions
std::domain_error: If alpha < 0.
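To make the role of alpha concrete, the sketch below prunes a toy binary tree bottom-up for a fixed alpha, collapsing a split into a leaf whenever that does not increase the total cost, taken here to be error + alpha * (number of nodes). This reads "cost of complexity per node" literally; the exact criterion used by MLpp is not given in this reference. `PNode`, `SubtreeCost`, `prune_sketch` and `make_two_leaf_tree` are hypothetical names, not part of the library.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>

// Hypothetical binary tree node. `error` is the training error the node
// would contribute if it were a leaf. Split nodes have both children.
struct PNode {
    double error = 0.0;
    std::unique_ptr<PNode> left;
    std::unique_ptr<PNode> right;
    bool is_leaf() const { return !left; }
};

struct SubtreeCost {
    double error;       // total error of the leaves in the (pruned) subtree
    std::size_t nodes;  // number of nodes in the (pruned) subtree
};

// Prunes in place and returns the cost components of the pruned subtree.
inline SubtreeCost prune_sketch(PNode& node, double alpha) {
    if (alpha < 0) {
        throw std::domain_error("alpha must be non-negative");
    }
    if (node.is_leaf()) {
        return {node.error, 1};
    }
    const SubtreeCost l = prune_sketch(*node.left, alpha);
    const SubtreeCost r = prune_sketch(*node.right, alpha);
    const SubtreeCost kept{l.error + r.error, l.nodes + r.nodes + 1};
    const double cost_as_leaf = node.error + alpha;
    const double cost_kept = kept.error + alpha * static_cast<double>(kept.nodes);
    if (cost_as_leaf <= cost_kept) {  // on ties, prefer the smaller tree
        node.left.reset();
        node.right.reset();
        return {node.error, 1};
    }
    return kept;
}

// Builds a root with two leaves, for experimenting with different alphas.
inline std::unique_ptr<PNode> make_two_leaf_tree(double root_error, double left_error,
                                                 double right_error) {
    auto root = std::make_unique<PNode>();
    root->error = root_error;
    root->left = std::make_unique<PNode>();
    root->left->error = left_error;
    root->right = std::make_unique<PNode>();
    root->right->error = right_error;
    return root;
}
```

With root error 10 and leaf errors 1 and 2, alpha = 1 keeps the split (cost 6 against 11), while alpha = 4 collapses it (cost 14 against 15). Larger alpha values therefore yield smaller trees.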

◆ regression_tree_mean_squared_error()

double ml::DecisionTrees::regression_tree_mean_squared_error ( const RegressionTree &  tree,
Eigen::Ref< const Eigen::MatrixXd >  X,
Eigen::Ref< const Eigen::VectorXd >  y 
)

Calculates tree mean squared error (MSE) for a sample.

MSE for tree \( f \) is defined as \( N^{-1} \sum_{i=1}^N (y_i - f(\vec{X}_i))^2 \).

Parameters
[in] tree: Regression tree.
[in] X: Independent variables (column-wise).
[in] y: Dependent variable.
Returns
MSE, or NaN if y.size() == 0.
Exceptions
std::invalid_argument: If X.cols() != y.size().
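The MSE definition can be checked by hand with a small standalone function. The sketch below takes the tree predictions \( f(\vec{X}_i) \) as an already computed vector instead of evaluating a tree. `mse_sketch` is a hypothetical name, and only the NaN-on-empty and size-mismatch behaviour is taken from the reference above.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical helper: mean squared error between observed values y
// and predictions (one prediction per data point).
inline double mse_sketch(const std::vector<double>& y,
                         const std::vector<double>& predictions) {
    if (y.size() != predictions.size()) {
        throw std::invalid_argument("y and predictions must have the same size");
    }
    if (y.empty()) {
        return std::numeric_limits<double>::quiet_NaN();  // documented result for an empty sample
    }
    double sum = 0.0;
    for (std::size_t i = 0; i < y.size(); ++i) {
        const double residual = y[i] - predictions[i];
        sum += residual * residual;
    }
    return sum / static_cast<double>(y.size());
}
```

For y = (1, 2, 3, 4) and predictions (1, 2, 3, 6) the only non-zero residual is 2, so the result is 4 / 4 = 1.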

◆ classification_tree_accuracy()

double ml::DecisionTrees::classification_tree_accuracy ( const ClassificationTree &  tree,
Eigen::Ref< const Eigen::MatrixXd >  X,
Eigen::Ref< const Eigen::VectorXd >  y 
)

Calculates tree accuracy for a sample.

Accuracy for classification tree \( f \) is defined as \( N^{-1} \sum_{i=1}^N \mathbb{1}_{y_i = f(\vec{X}_i)} \).

Parameters
[in] tree: Classification tree.
[in] X: Classification features (column-wise).
[in] y: Class indices.
Exceptions
std::invalid_argument: If X.cols() != y.size().
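The accuracy formula counts the fraction of data points whose predicted class equals the true class. The sketch below computes it from two vectors of class indices, stored as doubles to mirror the Eigen::VectorXd argument. `accuracy_sketch` is a hypothetical name, and returning NaN for an empty sample is an assumption, since the reference above does not state that case.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical helper: fraction of data points with predicted == true class.
inline double accuracy_sketch(const std::vector<double>& y,
                              const std::vector<double>& predicted) {
    if (y.size() != predicted.size()) {
        throw std::invalid_argument("y and predicted must have the same size");
    }
    if (y.empty()) {
        return std::numeric_limits<double>::quiet_NaN();  // assumption: 0/0 reported as NaN
    }
    std::size_t hits = 0;
    for (std::size_t i = 0; i < y.size(); ++i) {
        if (y[i] == predicted[i]) {  // class indices are integer-valued, so == is exact
            ++hits;
        }
    }
    return static_cast<double>(hits) / static_cast<double>(y.size());
}
```

For true classes (0, 1, 1, 2) and predictions (0, 1, 2, 2), three of four points match, giving 0.75.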

◆ classification_tree_misclassification_rate()

double ml::DecisionTrees::classification_tree_misclassification_rate ( const ClassificationTree &  tree,
Eigen::Ref< const Eigen::MatrixXd >  X,
Eigen::Ref< const Eigen::VectorXd >  y 
)
inline

Calculates tree misclassification rate for a sample.

Misclassification rate for classification tree \( f \) is defined as \( N^{-1} \sum_{i=1}^N \mathbb{1}_{y_i \neq f(\vec{X}_i)} \).

Parameters
[in] tree: Classification tree.
[in] X: Classification features (column-wise).
[in] y: Class indices.
Exceptions
std::invalid_argument: If X.cols() != y.size().
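Since the indicator in the formula above is the complement of the one used for accuracy, the misclassification rate equals one minus the accuracy on the same sample. The sketch below counts mismatches directly with `std::inner_product`. `misclassification_rate_sketch` is a hypothetical name, and the NaN result for an empty sample is an assumption rather than documented behaviour.

```cpp
#include <cstddef>
#include <functional>
#include <limits>
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical helper: fraction of data points with predicted != true class.
inline double misclassification_rate_sketch(const std::vector<double>& y,
                                            const std::vector<double>& predicted) {
    if (y.size() != predicted.size()) {
        throw std::invalid_argument("y and predicted must have the same size");
    }
    if (y.empty()) {
        return std::numeric_limits<double>::quiet_NaN();  // assumption: 0/0 reported as NaN
    }
    // Sums the indicator (y[i] != predicted[i]) over all data points.
    const std::size_t misses = std::inner_product(
        y.begin(), y.end(), predicted.begin(), std::size_t{0},
        std::plus<>(), std::not_equal_to<>());
    return static_cast<double>(misses) / static_cast<double>(y.size());
}
```

For true classes (0, 1, 1, 2) and predictions (0, 1, 2, 2), one of four points is wrong, giving 0.25.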