Helper functions and classes for decision trees.
Classes | |
struct | LeafNode |
Terminal node, which returns a constant prediction value for every feature vector that reaches it. | |
struct | Node |
Tree node. A node is either a split (non-terminal) node or a leaf (terminal) node. | |
struct | SplitNode |
Non-terminal node, which splits data depending on a threshold value of some feature. | |
Functions | |
std::tuple< RegressionTree, double, double > | regression_tree_auto_prune (Eigen::Ref< const Eigen::MatrixXd > X, Eigen::Ref< const Eigen::VectorXd > y, unsigned int max_split_levels, unsigned int min_sample_size, const std::vector< double > &alphas, const unsigned int num_folds) |
Grows a regression tree with pruning. | |
std::tuple< ClassificationTree, double, double > | classification_tree_auto_prune (Eigen::Ref< const Eigen::MatrixXd > X, Eigen::Ref< const Eigen::VectorXd > y, unsigned int max_split_levels, unsigned int min_sample_size, const std::vector< double > &alphas, const unsigned int num_folds) |
Grows a classification tree with pruning. | |
RegressionTree | regression_tree (Eigen::Ref< const Eigen::MatrixXd > X, Eigen::Ref< const Eigen::VectorXd > y, unsigned int max_split_levels, unsigned int min_sample_size) |
Grows a regression tree without pruning. | |
ClassificationTree | classification_tree (Eigen::Ref< const Eigen::MatrixXd > X, Eigen::Ref< const Eigen::VectorXd > y, unsigned int max_split_levels, unsigned int min_sample_size) |
Grows a classification tree without pruning. | |
template<typename Y > | |
void | cost_complexity_prune (DecisionTree< Y > &tree, const double alpha) |
Performs cost-complexity pruning in-place. | |
double | regression_tree_mean_squared_error (const RegressionTree &tree, Eigen::Ref< const Eigen::MatrixXd > X, Eigen::Ref< const Eigen::VectorXd > y) |
Calculates tree mean squared error (MSE) for a sample. | |
double | classification_tree_accuracy (const ClassificationTree &tree, Eigen::Ref< const Eigen::MatrixXd > X, Eigen::Ref< const Eigen::VectorXd > y) |
Calculates tree accuracy for a sample. | |
double | classification_tree_misclassification_rate (const ClassificationTree &tree, Eigen::Ref< const Eigen::MatrixXd > X, Eigen::Ref< const Eigen::VectorXd > y) |
Calculates tree misclassification rate for a sample. | |
Helper functions and classes for decision trees.
Functions for manipulating decision trees.
std::tuple<RegressionTree, double, double> ml::DecisionTrees::regression_tree_auto_prune | ( | Eigen::Ref< const Eigen::MatrixXd > | X, |
Eigen::Ref< const Eigen::VectorXd > | y, | ||
unsigned int | max_split_levels, | ||
unsigned int | min_sample_size, | ||
const std::vector< double > & | alphas, | ||
const unsigned int | num_folds | ||
) |
Grows a regression tree with pruning.
[in] | X | Independent variables (column-wise). |
[in] | y | Dependent variable. |
[in] | max_split_levels | Maximum number of split nodes on the way to any leaf node. |
[in] | min_sample_size | Minimum sample size which can be split (at least 2). |
[in] | alphas | Candidate alphas for pruning to be selected by cross-validation. If this vector is empty, no pruning is done. If it has just one element, this value is used for pruning. If it has more than one, the one with smallest k-fold cross-validation test error is used. |
[in] | num_folds | Number of folds for cross-validation. Ignored if cross-validation is not done. |
std::invalid_argument | If min_sample_size < 2, y.size() < 2 or X.cols() != y.size().
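The three cases described for alphas can be sketched as follows. This is only an illustration of the documented selection rule, not the library's implementation: AlphaChoice, select_alpha and the cv_error callback (standing in for whatever routine computes the k-fold cross-validation test error for a given alpha) are hypothetical names.

```cpp
#include <functional>
#include <limits>
#include <vector>

// Hypothetical result type: whether to prune at all and, if so, with which alpha.
struct AlphaChoice {
    bool prune;    // false when no pruning should be done
    double alpha;  // meaningful only if prune is true
};

// Hypothetical helper mirroring the rule documented for `alphas`.
// `cv_error` is an assumed callback returning the cross-validation
// test error obtained when pruning with the given alpha.
AlphaChoice select_alpha(const std::vector<double>& alphas,
                         const std::function<double(double)>& cv_error) {
    if (alphas.empty()) {
        return {false, 0.0};  // empty vector: no pruning
    }
    if (alphas.size() == 1) {
        return {true, alphas.front()};  // single candidate: used directly
    }
    // Several candidates: keep the one with the smallest test error.
    double best_alpha = alphas.front();
    double best_error = std::numeric_limits<double>::infinity();
    for (const double alpha : alphas) {
        const double error = cv_error(alpha);
        if (error < best_error) {
            best_error = error;
            best_alpha = alpha;
        }
    }
    return {true, best_alpha};
}
```

In this sketch ties are resolved in favour of the candidate that appears first in alphas; the page does not say how the library resolves ties.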
std::tuple<ClassificationTree, double, double> ml::DecisionTrees::classification_tree_auto_prune | ( | Eigen::Ref< const Eigen::MatrixXd > | X, |
Eigen::Ref< const Eigen::VectorXd > | y, | ||
unsigned int | max_split_levels, | ||
unsigned int | min_sample_size, | ||
const std::vector< double > & | alphas, | ||
const unsigned int | num_folds | ||
) |
Grows a classification tree with pruning.
[in] | X | Features (column-wise). |
[in] | y | Class indices. |
[in] | max_split_levels | Maximum number of split nodes on the way to any leaf node. |
[in] | min_sample_size | Minimum sample size which can be split (at least 2). |
[in] | alphas | Candidate alphas for pruning to be selected by cross-validation. If this vector is empty, no pruning is done. If it has just one element, this value is used for pruning. If it has more than one, the one with smallest k-fold cross-validation test error is used. |
[in] | num_folds | Number of folds for cross-validation. Ignored if cross-validation is not done. |
std::invalid_argument | If min_sample_size < 2, y.size() < 2 or X.cols() != y.size().
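For readers unfamiliar with k-fold cross-validation, the role of num_folds can be illustrated with a simple index partition. This page does not say how the library assigns samples to folds; the round-robin scheme and the make_folds name below are assumptions made purely for illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: for each fold, the indices of the samples held out
// as that fold's test set. Samples are assigned round-robin (an assumption,
// not the library's documented behaviour).
std::vector<std::vector<std::size_t>> make_folds(std::size_t num_samples,
                                                 unsigned int num_folds) {
    if (num_folds == 0) {
        throw std::invalid_argument("num_folds must be positive");
    }
    std::vector<std::vector<std::size_t>> folds(num_folds);
    for (std::size_t i = 0; i < num_samples; ++i) {
        folds[i % num_folds].push_back(i);
    }
    return folds;
}
```

Each candidate alpha would then be scored by training on all folds but one and testing on the held-out fold, averaging the test error over the folds.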
RegressionTree ml::DecisionTrees::regression_tree | ( | Eigen::Ref< const Eigen::MatrixXd > | X, |
Eigen::Ref< const Eigen::VectorXd > | y, | ||
unsigned int | max_split_levels, | ||
unsigned int | min_sample_size | ||
) |
Grows a regression tree without pruning.
[in] | X | Independent variables (column-wise). |
[in] | y | Dependent variable. |
[in] | max_split_levels | Maximum number of split nodes on the way to any leaf node. |
[in] | min_sample_size | Minimum sample size which can be split (at least 2). |
std::invalid_argument | If min_sample_size < 2, y.size() < 2 or X.cols() != y.size().
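The exception conditions imply that each column of X is one sample, so the number of columns must match the length of y. A minimal standard-library sketch of these checks is shown below; Columns and check_tree_inputs are hypothetical names, and a vector of columns stands in for the Eigen matrix.

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a column-wise matrix: each inner vector is one
// column of X, i.e. the feature values of a single sample.
using Columns = std::vector<std::vector<double>>;

// Mirrors the documented preconditions; throws std::invalid_argument
// when any of them is violated.
void check_tree_inputs(const Columns& X, const std::vector<double>& y,
                       unsigned int min_sample_size) {
    if (min_sample_size < 2) {
        throw std::invalid_argument("min_sample_size must be at least 2");
    }
    if (y.size() < 2) {
        throw std::invalid_argument("need at least 2 samples");
    }
    if (X.size() != y.size()) {  // corresponds to X.cols() != y.size()
        throw std::invalid_argument("number of columns of X must equal y.size()");
    }
}
```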
ClassificationTree ml::DecisionTrees::classification_tree | ( | Eigen::Ref< const Eigen::MatrixXd > | X, |
Eigen::Ref< const Eigen::VectorXd > | y, | ||
unsigned int | max_split_levels, | ||
unsigned int | min_sample_size | ||
) |
Grows a classification tree without pruning.
[in] | X | Classification features (column-wise). |
[in] | y | Class indices. |
[in] | max_split_levels | Maximum number of split nodes on the way to any leaf node. |
[in] | min_sample_size | Minimum sample size which can be split (at least 2). |
std::invalid_argument | If min_sample_size < 2, y.size() < 2 or X.cols() != y.size().
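As a rough size bound, assuming every split node has exactly two children (which the threshold-based SplitNode suggests but this page does not state), a tree grown with max_split_levels \( = L \) satisfies

\[ \#\text{leaf nodes} \le 2^{L}, \qquad \#\text{split nodes} \le 2^{L} - 1. \]

For example, \( L = 3 \) allows at most 8 leaf nodes and 7 split nodes, while \( L = 0 \) yields a single leaf.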
void ml::DecisionTrees::cost_complexity_prune | ( | DecisionTree< Y > & | tree, |
const double | alpha | ||
) |
Performs cost-complexity pruning in-place.
[in,out] | tree | Tree to be pruned. |
[in] | alpha | Cost of complexity per node. |
Y | Decision tree output value type. |
std::domain_error | If alpha < 0. |
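To make the role of alpha concrete, the sketch below prunes a toy binary tree by minimising total error plus alpha times the number of nodes, collapsing a split into a leaf whenever that does not increase this cost. It is an illustration under stated assumptions, not the library's algorithm: ToyNode, SubtreeCost, prune_subtree and toy_cost_complexity_prune are hypothetical, the per-node penalty follows the parameter description above, and the library may count complexity or order the collapses differently.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>

// Toy node for illustration only; the library's Node, SplitNode and
// LeafNode structs are not reproduced here.
struct ToyNode {
    double error = 0.0;               // total error if this node were a leaf
    std::unique_ptr<ToyNode> lower;   // both children null for a leaf,
    std::unique_ptr<ToyNode> higher;  // both non-null for a split

    bool is_leaf() const { return !lower && !higher; }
};

struct SubtreeCost {
    double error;       // sum of leaf errors in the (pruned) subtree
    std::size_t nodes;  // number of nodes in the (pruned) subtree
};

// Prunes bottom-up and returns the error and node count of what remains.
SubtreeCost prune_subtree(ToyNode& node, double alpha) {
    if (node.is_leaf()) {
        return {node.error, 1};
    }
    const SubtreeCost lo = prune_subtree(*node.lower, alpha);
    const SubtreeCost hi = prune_subtree(*node.higher, alpha);
    const SubtreeCost kept{lo.error + hi.error, lo.nodes + hi.nodes + 1};
    const double kept_cost = kept.error + alpha * static_cast<double>(kept.nodes);
    const double collapsed_cost = node.error + alpha;  // a single leaf node
    if (collapsed_cost <= kept_cost) {
        node.lower.reset();
        node.higher.reset();
        return {node.error, 1};
    }
    return kept;
}

// In-place pruning; rejects negative alpha like the documented function.
void toy_cost_complexity_prune(ToyNode& root, double alpha) {
    if (alpha < 0) {
        throw std::domain_error("alpha must be non-negative");
    }
    prune_subtree(root, alpha);
}
```

With a root of error 10 and two leaves of errors 2 and 3, the split survives for alpha below 2.5 (since \( 5 + 3\alpha < 10 + \alpha \)) and is collapsed for alpha of 2.5 or more.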
double ml::DecisionTrees::regression_tree_mean_squared_error | ( | const RegressionTree & | tree, |
Eigen::Ref< const Eigen::MatrixXd > | X, | ||
Eigen::Ref< const Eigen::VectorXd > | y | ||
) |
Calculates tree mean squared error (MSE) for a sample.
MSE for tree \( f \) is defined as \( N^{-1} \sum_{i=1}^N (y_i - f(\vec{X}_i))^2 \).
[in] | tree | Regression tree. |
[in] | X | Independent variables (column-wise). |
[in] | y | Dependent variable. |
y.size() == 0.
std::invalid_argument | If X.cols() != y.size().
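The MSE formula above can be written out directly. The sketch below uses only the standard library: Sample, Predictor and mean_squared_error are hypothetical names, a callable stands in for the fitted tree \( f \), and returning NaN for an empty sample is this sketch's own choice rather than documented library behaviour.

```cpp
#include <cstddef>
#include <functional>
#include <limits>
#include <stdexcept>
#include <vector>

// One sample: the feature values stored in a single column of X.
using Sample = std::vector<double>;
// Hypothetical stand-in for a fitted regression tree f.
using Predictor = std::function<double(const Sample&)>;

// Computes N^{-1} * sum_i (y_i - f(X_i))^2.
double mean_squared_error(const Predictor& f, const std::vector<Sample>& X,
                          const std::vector<double>& y) {
    if (X.size() != y.size()) {
        throw std::invalid_argument("X and y must describe the same number of samples");
    }
    if (y.empty()) {
        // Assumption of this sketch: the mean over zero samples is NaN.
        return std::numeric_limits<double>::quiet_NaN();
    }
    double sum = 0.0;
    for (std::size_t i = 0; i < y.size(); ++i) {
        const double residual = y[i] - f(X[i]);
        sum += residual * residual;
    }
    return sum / static_cast<double>(y.size());
}
```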
double ml::DecisionTrees::classification_tree_accuracy | ( | const ClassificationTree & | tree, |
Eigen::Ref< const Eigen::MatrixXd > | X, | ||
Eigen::Ref< const Eigen::VectorXd > | y | ||
) |
Calculates tree accuracy for a sample.
Accuracy for classification tree \( f \) is defined as \( N^{-1} \sum_{i=1}^N \mathbb{1}_{y_i = f(\vec{X}_i)} \).
[in] | tree | Classification tree. |
[in] | X | Classification features (column-wise). |
[in] | y | Class indices. |
std::invalid_argument | If X.cols() != y.size().
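The accuracy definition is a simple fraction of matching predictions, as the following standard-library sketch shows. Sample, Classifier and accuracy are hypothetical names, a callable stands in for the fitted tree \( f \), class indices are stored as double to mirror the Eigen::VectorXd argument, and the NaN result for an empty sample is this sketch's own choice.

```cpp
#include <cstddef>
#include <functional>
#include <limits>
#include <stdexcept>
#include <vector>

// One sample: the feature values stored in a single column of X.
using Sample = std::vector<double>;
// Hypothetical stand-in for a fitted classification tree f,
// returning a class index stored as double.
using Classifier = std::function<double(const Sample&)>;

// Computes N^{-1} * (number of samples with y_i == f(X_i)).
double accuracy(const Classifier& f, const std::vector<Sample>& X,
                const std::vector<double>& y) {
    if (X.size() != y.size()) {
        throw std::invalid_argument("X and y must describe the same number of samples");
    }
    if (y.empty()) {
        // Assumption of this sketch: undefined for an empty sample.
        return std::numeric_limits<double>::quiet_NaN();
    }
    std::size_t correct = 0;
    for (std::size_t i = 0; i < y.size(); ++i) {
        if (f(X[i]) == y[i]) {
            ++correct;
        }
    }
    return static_cast<double>(correct) / static_cast<double>(y.size());
}
```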
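double ml::DecisionTrees::classification_tree_misclassification_rate | ( | const ClassificationTree & | tree, |
Eigen::Ref< const Eigen::MatrixXd > | X, | ||
Eigen::Ref< const Eigen::VectorXd > | y | ||
) |
inline |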
Calculates tree misclassification rate for a sample.
Misclassification rate for classification tree \( f \) is defined as \( N^{-1} \sum_{i=1}^N \mathbb{1}_{y_i \neq f(\vec{X}_i)} \).
[in] | tree | Classification tree. |
[in] | X | Classification features (column-wise). |
[in] | y | Class indices. |
std::invalid_argument | If X.cols() != y.size().
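Since every sample is either classified correctly or not, the two indicators sum to one, so for \( N \ge 1 \) the misclassification rate is the complement of the accuracy defined for classification_tree_accuracy:

\[ N^{-1} \sum_{i=1}^N \mathbb{1}_{y_i \neq f(\vec{X}_i)} = N^{-1} \sum_{i=1}^N \left( 1 - \mathbb{1}_{y_i = f(\vec{X}_i)} \right) = 1 - N^{-1} \sum_{i=1}^N \mathbb{1}_{y_i = f(\vec{X}_i)}. \]

For instance, a tree with accuracy 0.75 on a sample has a misclassification rate of 0.25 on the same sample.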