Helper functions and classes for decision trees.
Classes | |
| struct | LeafNode |
| Terminal node, which returns a constant prediction value for every data point that reaches it. | |
| struct | Node |
| Tree node. A node is either a split (non-terminal) node or a leaf (terminal) node. | |
| struct | SplitNode |
| Non-terminal node, which splits data depending on a threshold value of some feature. | |
Functions | |
| std::tuple< RegressionTree, double, double > | regression_tree_auto_prune (Eigen::Ref< const Eigen::MatrixXd > X, Eigen::Ref< const Eigen::VectorXd > y, unsigned int max_split_levels, unsigned int min_sample_size, const std::vector< double > &alphas, const unsigned int num_folds) |
| Grows a regression tree with pruning. | |
| std::tuple< ClassificationTree, double, double > | classification_tree_auto_prune (Eigen::Ref< const Eigen::MatrixXd > X, Eigen::Ref< const Eigen::VectorXd > y, unsigned int max_split_levels, unsigned int min_sample_size, const std::vector< double > &alphas, const unsigned int num_folds) |
| Grows a classification tree with pruning. | |
| RegressionTree | regression_tree (Eigen::Ref< const Eigen::MatrixXd > X, Eigen::Ref< const Eigen::VectorXd > y, unsigned int max_split_levels, unsigned int min_sample_size) |
| Grows a regression tree without pruning. | |
| ClassificationTree | classification_tree (Eigen::Ref< const Eigen::MatrixXd > X, Eigen::Ref< const Eigen::VectorXd > y, unsigned int max_split_levels, unsigned int min_sample_size) |
| Grows a classification tree without pruning. | |
| template<typename Y > | |
| void | cost_complexity_prune (DecisionTree< Y > &tree, const double alpha) |
| Performs cost-complexity pruning in-place. | |
| double | regression_tree_mean_squared_error (const RegressionTree &tree, Eigen::Ref< const Eigen::MatrixXd > X, Eigen::Ref< const Eigen::VectorXd > y) |
| Calculates tree mean squared error (MSE) for a sample. | |
| double | classification_tree_accuracy (const ClassificationTree &tree, Eigen::Ref< const Eigen::MatrixXd > X, Eigen::Ref< const Eigen::VectorXd > y) |
| Calculates tree accuracy for a sample. | |
| double | classification_tree_misclassification_rate (const ClassificationTree &tree, Eigen::Ref< const Eigen::MatrixXd > X, Eigen::Ref< const Eigen::VectorXd > y) |
| Calculates tree misclassification rate for a sample. | |
Functions for manipulating decision trees.
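To illustrate how split and leaf nodes might cooperate during prediction, here is a minimal standalone sketch. The types `SketchNode`, `SketchLeaf` and `SketchSplit` are hypothetical and are not the library's `Node`, `LeafNode` and `SplitNode` definitions. The convention that values below the threshold go to the lower child is also an assumption.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical node types for illustration only; not the library's definitions.
struct SketchNode {
    virtual ~SketchNode() = default;
    // Returns the prediction for one data point given as a feature vector.
    virtual double predict(const std::vector<double>& x) const = 0;
};

// Terminal node: returns the same constant for every point that reaches it.
struct SketchLeaf : SketchNode {
    explicit SketchLeaf(double v) : value(v) {}
    double predict(const std::vector<double>&) const override { return value; }
    double value;
};

// Non-terminal node: routes a point by comparing one feature with a threshold.
// Assumption: x[feature] < threshold goes to `lower`, everything else to `higher`.
struct SketchSplit : SketchNode {
    SketchSplit(std::size_t f, double t,
                std::unique_ptr<SketchNode> lo, std::unique_ptr<SketchNode> hi)
        : feature(f), threshold(t), lower(std::move(lo)), higher(std::move(hi)) {}
    double predict(const std::vector<double>& x) const override {
        return x[feature] < threshold ? lower->predict(x) : higher->predict(x);
    }
    std::size_t feature;
    double threshold;
    std::unique_ptr<SketchNode> lower;
    std::unique_ptr<SketchNode> higher;
};
```

A one-split tree built as `SketchSplit root(0, 0.5, std::make_unique<SketchLeaf>(-1.0), std::make_unique<SketchLeaf>(1.0))` returns the lower leaf's value for any point whose first feature is below 0.5 and the higher leaf's value otherwise.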
| std::tuple<RegressionTree, double, double> ml::DecisionTrees::regression_tree_auto_prune | ( | Eigen::Ref< const Eigen::MatrixXd > | X, |
| Eigen::Ref< const Eigen::VectorXd > | y, | ||
| unsigned int | max_split_levels, | ||
| unsigned int | min_sample_size, | ||
| const std::vector< double > & | alphas, | ||
| const unsigned int | num_folds | ||
| ) |
Grows a regression tree with pruning.
| [in] | X | Independent variables (column-wise). |
| [in] | y | Dependent variable. |
| [in] | max_split_levels | Maximum number of split nodes on the way to any leaf node. |
| [in] | min_sample_size | Minimum sample size which can be split (at least 2). |
| [in] | alphas | Candidate alphas for pruning to be selected by cross-validation. If this vector is empty, no pruning is done. If it has just one element, this value is used for pruning. If it has more than one, the one with smallest k-fold cross-validation test error is used. |
| [in] | num_folds | Number of folds for cross-validation. Ignored if cross-validation is not done. |
| std::invalid_argument | If min_sample_size < 2, y.size() < 2 or X.cols() != y.size(). |
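The three-way rule for `alphas` can be sketched as a standalone function. The names `AlphaChoice` and `select_alpha` and the `cv_errors` argument are hypothetical; the sketch assumes the cross-validation test error for each candidate has already been computed, and it does not reproduce how the library computes it.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <stdexcept>
#include <vector>

// Hypothetical result type: whether to prune at all and, if so, with which alpha.
struct AlphaChoice {
    bool prune;    // false when there are no candidate alphas
    double alpha;  // meaningful only when prune is true
};

// Hypothetical helper following the documented rule for `alphas`.
// cv_errors[i] is assumed to be the k-fold cross-validation test error for alphas[i];
// it is inspected only when there is more than one candidate.
AlphaChoice select_alpha(const std::vector<double>& alphas,
                         const std::vector<double>& cv_errors) {
    if (alphas.empty()) {
        return {false, 0.0};  // empty vector: no pruning
    }
    if (alphas.size() == 1) {
        return {true, alphas.front()};  // single candidate: use it, no cross-validation
    }
    if (cv_errors.size() != alphas.size()) {
        throw std::invalid_argument("one cross-validation error per alpha is required");
    }
    // Several candidates: take the one with the smallest cross-validation error.
    const auto best = std::min_element(cv_errors.begin(), cv_errors.end());
    const auto index = static_cast<std::size_t>(std::distance(cv_errors.begin(), best));
    return {true, alphas[index]};
}
```

With candidates `{0.0, 0.1, 1.0}` and errors `{0.5, 0.2, 0.4}`, the sketch selects `0.1`. Ties are resolved in favour of the first minimum, which is a choice of this sketch.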
| std::tuple<ClassificationTree, double, double> ml::DecisionTrees::classification_tree_auto_prune | ( | Eigen::Ref< const Eigen::MatrixXd > | X, |
| Eigen::Ref< const Eigen::VectorXd > | y, | ||
| unsigned int | max_split_levels, | ||
| unsigned int | min_sample_size, | ||
| const std::vector< double > & | alphas, | ||
| const unsigned int | num_folds | ||
| ) |
Grows a classification tree with pruning.
| [in] | X | Features (column-wise). |
| [in] | y | Class indices. |
| [in] | max_split_levels | Maximum number of split nodes on the way to any leaf node. |
| [in] | min_sample_size | Minimum sample size which can be split (at least 2). |
| [in] | alphas | Candidate alphas for pruning to be selected by cross-validation. If this vector is empty, no pruning is done. If it has just one element, this value is used for pruning. If it has more than one, the one with smallest k-fold cross-validation test error is used. |
| [in] | num_folds | Number of folds for cross-validation. Ignored if cross-validation is not done. |
| std::invalid_argument | If min_sample_size < 2, y.size() < 2 or X.cols() != y.size(). |
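To make the role of `num_folds` concrete, the sketch below shows one possible way to partition sample indices into folds. The helper `fold_test_indices` is hypothetical, and the round-robin assignment is an assumption; the library may use contiguous blocks, shuffling or another scheme.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: indices of the samples held out for testing in fold `fold`
// when `n` samples are dealt round-robin into `num_folds` folds.
// All remaining indices would form the training set for that fold.
std::vector<std::size_t> fold_test_indices(std::size_t n, unsigned int num_folds,
                                           unsigned int fold) {
    if (num_folds == 0 || fold >= num_folds) {
        throw std::invalid_argument("fold must be smaller than num_folds");
    }
    std::vector<std::size_t> indices;
    for (std::size_t i = fold; i < n; i += num_folds) {
        indices.push_back(i);
    }
    return indices;
}
```

In k-fold cross-validation, the tree is grown on the training part of each fold, pruned with a candidate alpha, and scored on the held-out part; the per-fold test errors are then averaged for each candidate.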
| RegressionTree ml::DecisionTrees::regression_tree | ( | Eigen::Ref< const Eigen::MatrixXd > | X, |
| Eigen::Ref< const Eigen::VectorXd > | y, | ||
| unsigned int | max_split_levels, | ||
| unsigned int | min_sample_size | ||
| ) |
Grows a regression tree without pruning.
| [in] | X | Independent variables (column-wise). |
| [in] | y | Dependent variable. |
| [in] | max_split_levels | Maximum number of split nodes on the way to any leaf node. |
| [in] | min_sample_size | Minimum sample size which can be split (at least 2). |
| std::invalid_argument | If min_sample_size < 2, y.size() < 2 or X.cols() != y.size(). |
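The documented preconditions imply that `X` holds one data point per column, so `X.cols()` must equal `y.size()`. A minimal sketch of the same checks, using a hypothetical function `check_tree_inputs` that takes plain sizes rather than Eigen objects:

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in for the documented precondition checks.
// num_columns plays the role of X.cols(), num_targets the role of y.size().
void check_tree_inputs(std::size_t num_columns, std::size_t num_targets,
                       unsigned int min_sample_size) {
    if (min_sample_size < 2) {
        throw std::invalid_argument("min_sample_size must be at least 2");
    }
    if (num_targets < 2) {
        throw std::invalid_argument("at least 2 data points are required");
    }
    if (num_columns != num_targets) {
        throw std::invalid_argument("X must have one column per element of y");
    }
}
```

The order of the checks and the error messages are choices of this sketch.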
| ClassificationTree ml::DecisionTrees::classification_tree | ( | Eigen::Ref< const Eigen::MatrixXd > | X, |
| Eigen::Ref< const Eigen::VectorXd > | y, | ||
| unsigned int | max_split_levels, | ||
| unsigned int | min_sample_size | ||
| ) |
Grows a classification tree without pruning.
| [in] | X | Classification features (column-wise). |
| [in] | y | Class indices. |
| [in] | max_split_levels | Maximum number of split nodes on the way to any leaf node. |
| [in] | min_sample_size | Minimum sample size which can be split (at least 2). |
| std::invalid_argument | If min_sample_size < 2, y.size() < 2 or X.cols() != y.size(). |
| void ml::DecisionTrees::cost_complexity_prune | ( | DecisionTree< Y > & | tree, |
| const double | alpha | ||
| ) |
Performs cost-complexity pruning in-place.
| [in,out] | tree | Tree to be pruned. |
| [in] | alpha | Cost of complexity per node. |
| Y | Decision tree output value type. |
| std::domain_error | If alpha < 0. |
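As a rough illustration of the idea behind cost-complexity pruning, the sketch below compares the penalised cost of keeping a subtree with the cost of collapsing it into a single leaf. The helpers `cost_complexity` and `should_collapse` are hypothetical. The sketch assumes the penalty is `alpha` times the number of nodes, in line with the parameter description; whether the library counts all nodes or only leaves, and how it treats ties, is not stated here.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: penalised cost = total error + alpha for each node.
double cost_complexity(double total_error, std::size_t num_nodes, double alpha) {
    if (alpha < 0) {
        throw std::domain_error("alpha must be non-negative");
    }
    return total_error + alpha * static_cast<double>(num_nodes);
}

// Hypothetical helper: collapsing a subtree into one leaf pays off when the
// leaf's penalised cost does not exceed the subtree's penalised cost.
bool should_collapse(double leaf_error, double subtree_error,
                     std::size_t subtree_nodes, double alpha) {
    return cost_complexity(leaf_error, 1, alpha)
        <= cost_complexity(subtree_error, subtree_nodes, alpha);
}
```

For a subtree with 3 nodes and error 8 that would become a leaf with error 10, the sketch keeps the subtree at `alpha = 0.5` (10.5 against 9.5) and collapses it at `alpha = 2` (12 against 14). Larger values of `alpha` therefore produce smaller trees.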
| double ml::DecisionTrees::regression_tree_mean_squared_error | ( | const RegressionTree & | tree, |
| Eigen::Ref< const Eigen::MatrixXd > | X, | ||
| Eigen::Ref< const Eigen::VectorXd > | y | ||
| ) |
Calculates tree mean squared error (MSE) for a sample.
MSE for tree \( f \) is defined as \( N^{-1} \sum_{i=1}^N (y_i - f(\vec{X}_i))^2 \).
| [in] | tree | Regression tree. |
| [in] | X | Independent variables (column-wise). |
| [in] | y | Dependent variable. |
y.size() == 0.
| std::invalid_argument | If X.cols() != y.size(). |
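The MSE formula can be sketched on plain vectors. The function `mean_squared_error` is hypothetical and takes precomputed predictions \( f(\vec{X}_i) \) instead of a tree; rejecting an empty sample is a choice of this sketch, not a statement about the library's behaviour.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: N^{-1} * sum_i (y_i - prediction_i)^2.
// predictions[i] stands for the tree output f(X_i) for the i-th data point.
double mean_squared_error(const std::vector<double>& y,
                          const std::vector<double>& predictions) {
    if (y.size() != predictions.size()) {
        throw std::invalid_argument("y and predictions must have the same size");
    }
    if (y.empty()) {
        // Choice of this sketch: refuse to average over zero data points.
        throw std::invalid_argument("sample must not be empty");
    }
    double sum_squares = 0.0;
    for (std::size_t i = 0; i < y.size(); ++i) {
        const double residual = y[i] - predictions[i];
        sum_squares += residual * residual;
    }
    return sum_squares / static_cast<double>(y.size());
}
```

For targets `{1, 2, 3, 4}` and predictions `{1, 2, 3, 6}`, only the last residual is non-zero, so the result is \( 2^2 / 4 = 1 \).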
| double ml::DecisionTrees::classification_tree_accuracy | ( | const ClassificationTree & | tree, |
| Eigen::Ref< const Eigen::MatrixXd > | X, | ||
| Eigen::Ref< const Eigen::VectorXd > | y | ||
| ) |
Calculates tree accuracy for a sample.
Accuracy for classification tree \( f \) is defined as \( N^{-1} \sum_{i=1}^N \mathbb{1}_{y_i = f(\vec{X}_i)} \).
| [in] | tree | Classification tree. |
| [in] | X | Classification features (column-wise). |
| [in] | y | Class indices. |
| std::invalid_argument | If X.cols() != y.size(). |
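The accuracy formula can be sketched in the same way. The function `accuracy` is hypothetical, takes precomputed class predictions, and uses unsigned integers for class indices, whereas the documented functions receive them in an `Eigen::VectorXd`. Rejecting an empty sample is again a choice of this sketch.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: fraction of data points whose predicted class equals the true class.
// predicted[i] stands for the tree output f(X_i) for the i-th data point.
double accuracy(const std::vector<unsigned int>& y,
                const std::vector<unsigned int>& predicted) {
    if (y.size() != predicted.size()) {
        throw std::invalid_argument("y and predicted must have the same size");
    }
    if (y.empty()) {
        // Choice of this sketch: refuse to average over zero data points.
        throw std::invalid_argument("sample must not be empty");
    }
    std::size_t num_correct = 0;
    for (std::size_t i = 0; i < y.size(); ++i) {
        if (y[i] == predicted[i]) {
            ++num_correct;
        }
    }
    return static_cast<double>(num_correct) / static_cast<double>(y.size());
}
```

For true classes `{0, 1, 1, 2}` and predictions `{0, 1, 0, 2}`, three of the four points are classified correctly, which gives an accuracy of 0.75.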
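| double ml::DecisionTrees::classification_tree_misclassification_rate | ( | const ClassificationTree & | tree, |
| Eigen::Ref< const Eigen::MatrixXd > | X, | ||
| Eigen::Ref< const Eigen::VectorXd > | y | ||
| ) | inline |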
Calculates tree misclassification rate for a sample.
Misclassification rate for classification tree \( f \) is defined as \( N^{-1} \sum_{i=1}^N \mathbb{1}_{y_i \neq f(\vec{X}_i)} \).
| [in] | tree | Classification tree. |
| [in] | X | Classification features (column-wise). |
| [in] | y | Class indices. |
| std::invalid_argument | If X.cols() != y.size(). |
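Since every data point is either classified correctly or misclassified, the misclassification rate is the complement of the accuracy on the same sample. Using the definitions given for the two measures:

\[ N^{-1} \sum_{i=1}^N \mathbb{1}_{y_i \neq f(\vec{X}_i)} = N^{-1} \sum_{i=1}^N \left( 1 - \mathbb{1}_{y_i = f(\vec{X}_i)} \right) = 1 - N^{-1} \sum_{i=1}^N \mathbb{1}_{y_i = f(\vec{X}_i)} \]

For example, if 3 of \( N = 4 \) data points are classified correctly, the accuracy is \( 3/4 = 0.75 \) and the misclassification rate is \( 1/4 = 0.25 \).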