Class GradientTreeBoost
- All Implemented Interfaces:
Serializable, ToDoubleFunction<Tuple>, SHAP<Tuple>, TreeSHAP, DataFrameRegression, Regression<Tuple>
Generic gradient boosting at the t-th step fits a regression tree to the pseudo-residuals. Let J be the number of its leaves. The tree partitions the input space into J disjoint regions and predicts a constant value in each region. The parameter J controls the maximum allowed level of interaction between variables in the model. With J = 2 (decision stumps), no interaction between variables is allowed. With J = 3 the model may include effects of the interaction between up to two variables, and so on. Hastie et al. comment that typically 4 <= J <= 8 works well for boosting and that results are fairly insensitive to the choice of J in this range; J = 2 is insufficient for many applications, and J > 10 is unlikely to be required.
Fitting the training set too closely can degrade the model's generalization ability. Several so-called regularization techniques reduce this over-fitting effect by constraining the fitting procedure. One natural regularization parameter is the number of gradient boosting iterations T (i.e. the number of trees in the model when the base learner is a decision tree). Increasing T reduces the error on the training set, but setting it too high may lead to over-fitting. An optimal value of T is often selected by monitoring prediction error on a separate validation data set.
Another regularization approach is shrinkage, which multiplies the update term by a parameter η (called the "learning rate"). Empirically, it has been found that small learning rates (such as η < 0.1) yield dramatic improvements in a model's generalization ability over gradient boosting without shrinkage (η = 1). However, this comes at the price of increased computational time during both training and prediction: a lower learning rate requires more iterations.
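As a rough illustration of the shrinkage update, the sketch below boosts one-split stumps on a single feature under squared-error loss, where the pseudo-residuals are simply y - F(x). It is not the Smile implementation: the class name ShrinkageSketch, the boost method, and the choice of the mean response as the initial model are assumptions made for this example.

```java
import java.util.Arrays;

/** Hypothetical sketch: squared-error boosting of one-split stumps with learning rate eta. */
class ShrinkageSketch {

    /** Returns the training mean squared error after each of the T iterations. */
    static double[] boost(double[] x, double[] y, int T, double eta) {
        int n = x.length;
        double[] f = new double[n];
        // Initial model: the mean response (an assumption of this sketch).
        Arrays.fill(f, Arrays.stream(y).average().orElse(0.0));
        double[] mse = new double[T];

        for (int t = 0; t < T; t++) {
            // Pseudo-residuals under squared-error loss are y - F(x).
            double[] r = new double[n];
            for (int i = 0; i < n; i++) r[i] = y[i] - f[i];

            // Fit a stump (J = 2) to the residuals by exhaustive threshold search.
            double bestSse = Double.POSITIVE_INFINITY;
            double thr = Double.NaN, left = 0.0, right = 0.0;
            for (double c : x) {
                double sumL = 0.0, sumR = 0.0;
                int nL = 0, nR = 0;
                for (int i = 0; i < n; i++) {
                    if (x[i] <= c) { sumL += r[i]; nL++; } else { sumR += r[i]; nR++; }
                }
                if (nL == 0 || nR == 0) continue; // not a real split
                double mL = sumL / nL, mR = sumR / nR, sse = 0.0;
                for (int i = 0; i < n; i++) {
                    double d = r[i] - (x[i] <= c ? mL : mR);
                    sse += d * d;
                }
                if (sse < bestSse) { bestSse = sse; thr = c; left = mL; right = mR; }
            }

            // Shrunken update: F_t(x) = F_{t-1}(x) + eta * h_t(x).
            double sq = 0.0;
            for (int i = 0; i < n; i++) {
                double h = Double.isNaN(thr) ? 0.0 : (x[i] <= thr ? left : right);
                f[i] += eta * h;
                sq += (y[i] - f[i]) * (y[i] - f[i]);
            }
            mse[t] = sq / n;
        }
        return mse;
    }
}
```

Calling boost with the same data and different eta values gives a feel for the trade-off: a smaller eta applies only a fraction of each stump's correction, so the training error falls more slowly per iteration.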
Soon after the introduction of gradient boosting, Friedman proposed a minor modification to the algorithm, motivated by Breiman's bagging method. Specifically, he proposed that at each iteration of the algorithm, a base learner should be fit on a subsample of the training set drawn at random without replacement. Friedman observed a substantial improvement in gradient boosting's accuracy with this modification.
Subsample size is some constant fraction f of the size of the training set. When f = 1, the algorithm is deterministic and identical to the one described above. Smaller values of f introduce randomness into the algorithm and help prevent over-fitting, acting as a kind of regularization. The algorithm also becomes faster, because regression trees have to be fit to smaller datasets at each iteration. Typically, f is set to 0.5, meaning that one half of the training set is used to build each base learner.
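One way to draw such a subsample is a partial Fisher-Yates shuffle over the row indices, sketched below. The class SubsampleSketch and its sample method are hypothetical names for this example, and rounding n * f to the nearest integer is an assumption, not necessarily what this class does internally.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch: draw a fraction f of the row indices without replacement. */
class SubsampleSketch {

    /** Returns round(n * f) distinct indices from 0..n-1, chosen uniformly at random. */
    static int[] sample(int n, double f, Random rng) {
        if (f <= 0.0 || f > 1.0) {
            throw new IllegalArgumentException("f must be in (0, 1]: " + f);
        }
        int m = (int) Math.round(n * f);
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        // Partial Fisher-Yates shuffle: after i steps the first i entries
        // form a uniform random sample without replacement.
        for (int i = 0; i < m; i++) {
            int j = i + rng.nextInt(n - i);
            int tmp = idx[i];
            idx[i] = idx[j];
            idx[j] = tmp;
        }
        return Arrays.copyOf(idx, m);
    }
}
```

With f = 1 every row is returned, so each tree sees the full training set; with f = 0.5 each tree is fit on half of the rows.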
Also, like in bagging, sub-sampling allows one to define an out-of-bag estimate of the prediction performance improvement by evaluating predictions on those observations which were not used in the building of the next base learner. Out-of-bag estimates help avoid the need for an independent validation dataset, but often underestimate actual performance improvement and the optimal number of iterations.
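The out-of-bag estimate can be sketched as the change in error on the rows left out of the current subsample. The helper below is illustrative only: OobSketch and its methods are hypothetical names, and squared error stands in for whichever loss is actually being optimized.

```java
import java.util.Arrays;

/** Hypothetical sketch: out-of-bag estimate of the improvement from one boosting step. */
class OobSketch {

    /** Marks the rows that were NOT drawn into the subsample. */
    static boolean[] outOfBag(int n, int[] inBag) {
        boolean[] oob = new boolean[n];
        Arrays.fill(oob, true);
        for (int i : inBag) oob[i] = false;
        return oob;
    }

    /** Mean squared error over the out-of-bag rows only (NaN if there are none). */
    static double oobMse(double[] y, double[] pred, boolean[] oob) {
        double sum = 0.0;
        int count = 0;
        for (int i = 0; i < y.length; i++) {
            if (oob[i]) {
                double d = y[i] - pred[i];
                sum += d * d;
                count++;
            }
        }
        return count == 0 ? Double.NaN : sum / count;
    }

    /** Positive values mean the newly added tree reduced the out-of-bag error. */
    static double improvement(double[] y, double[] before, double[] after, boolean[] oob) {
        return oobMse(y, before, oob) - oobMse(y, after, oob);
    }
}
```

Here before and after are the model's predictions for all rows immediately before and after the new base learner is added; only the out-of-bag rows contribute to the estimate.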
Gradient tree boosting implementations often also use regularization by limiting the minimum number of observations in trees' terminal nodes. It's used in the tree building process by ignoring any splits that lead to nodes containing fewer than this number of training set instances. Imposing this limit helps to reduce variance in predictions at leaves.
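The effect of a minimum leaf size on split selection can be sketched for a single numeric feature as follows. This is a simplified illustration under squared-error impurity, not the tree builder used by this class; NodeSizeSketch and bestSplit are hypothetical names.

```java
import java.util.Arrays;

/** Hypothetical sketch: best single-feature split subject to a minimum leaf size. */
class NodeSizeSketch {

    /**
     * Returns the threshold that most reduces squared error, or NaN when every
     * candidate split would leave a child with fewer than nodeSize rows.
     */
    static double bestSplit(double[] x, double[] y, int nodeSize) {
        int n = x.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        // Sort row indices by feature value.
        Arrays.sort(order, (a, b) -> Double.compare(x[a], x[b]));

        double total = 0.0;
        for (double v : y) total += v;

        double bestScore = Double.NEGATIVE_INFINITY;
        double bestThr = Double.NaN;
        double sumL = 0.0;
        for (int k = 0; k < n - 1; k++) {
            sumL += y[order[k]];
            int nL = k + 1, nR = n - nL;
            // Cannot split between identical feature values.
            if (x[order[k]] == x[order[k + 1]]) continue;
            // Ignore splits that would create a node smaller than nodeSize.
            if (nL < nodeSize || nR < nodeSize) continue;
            double sumR = total - sumL;
            // Maximizing this score is equivalent to minimizing the children's total SSE.
            double score = sumL * sumL / nL + sumR * sumR / nR;
            if (score > bestScore) {
                bestScore = score;
                bestThr = (x[order[k]] + x[order[k + 1]]) / 2.0;
            }
        }
        return bestThr;
    }
}
```

Raising nodeSize rules out splits that isolate a handful of rows, so each leaf value is averaged over more observations, which is the variance reduction mentioned above.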
References
- J. H. Friedman. Greedy Function Approximation: A Gradient Boosting Machine, 1999.
- J. H. Friedman. Stochastic Gradient Boosting, 1999.
-
Nested Class Summary
Nested classes/interfaces inherited from interface smile.regression.DataFrameRegression
DataFrameRegression.Trainer<M extends DataFrameRegression>
-
Constructor Summary
Constructor
- Description
GradientTreeBoost(Formula formula, RegressionTree[] trees, double b, double shrinkage, double[] importance)
- Constructor.
-
Method Summary
Modifier and Type, Method
- Description
static GradientTreeBoost fit(Formula formula, DataFrame data)
- Fits a gradient tree boosting for regression.
static GradientTreeBoost fit(Formula formula, DataFrame data, Properties params)
- Fits a gradient tree boosting for regression.
static GradientTreeBoost fit(Formula formula, DataFrame data, Loss loss, int ntrees, int maxDepth, int maxNodes, int nodeSize, double shrinkage, double subsample)
- Fits a gradient tree boosting for regression.
Formula formula()
- Returns the model formula.
double[] importance()
- Returns the variable importance.
double predict(Tuple x)
- Predicts the dependent variable of an instance.
StructType schema()
- Returns the schema of predictors.
int size()
- Returns the number of trees in the model.
double[][] test(DataFrame data)
- Tests the model on a validation dataset.
RegressionTree[] trees()
- Returns the decision trees.
void trim(int ntrees)
- Trims the tree model set to a smaller size in case of over-fitting.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface smile.regression.DataFrameRegression
predict
Methods inherited from interface smile.regression.Regression
applyAsDouble, online, predict, predict, predict, update, update, update
-
Constructor Details
-
GradientTreeBoost
public GradientTreeBoost(Formula formula, RegressionTree[] trees, double b, double shrinkage, double[] importance)
Constructor. Fits a gradient tree boosting for regression.
- Parameters:
formula
- a symbolic description of the model to be fitted.
trees
- forest of regression trees.
b
- the intercept.
shrinkage
- the shrinkage parameter in (0, 1], which controls the learning rate of the procedure.
importance
- variable importance.
-
-
Method Details
-
fit
Fits a gradient tree boosting for regression.
- Parameters:
formula
- a symbolic description of the model to be fitted.
data
- the data frame of the explanatory and response variables.
- Returns:
- the model.
-
fit
Fits a gradient tree boosting for regression.
- Parameters:
formula
- a symbolic description of the model to be fitted.
data
- the data frame of the explanatory and response variables.
params
- the hyper-parameters.
- Returns:
- the model.
-
fit
public static GradientTreeBoost fit(Formula formula, DataFrame data, Loss loss, int ntrees, int maxDepth, int maxNodes, int nodeSize, double shrinkage, double subsample)
Fits a gradient tree boosting for regression.
- Parameters:
formula
- a symbolic description of the model to be fitted.
data
- the data frame of the explanatory and response variables.
loss
- loss function for regression. By default, least absolute deviation is employed for robust regression.
ntrees
- the number of iterations (trees).
maxDepth
- the maximum depth of the tree.
maxNodes
- the maximum number of leaf nodes in the tree.
nodeSize
- the number of instances in a node below which the tree will not split. Setting nodeSize = 5 generally gives good results.
shrinkage
- the shrinkage parameter in (0, 1], which controls the learning rate of the procedure.
subsample
- the sampling fraction for stochastic tree boosting.
- Returns:
- the model.
-
formula
Description copied from interface: DataFrameRegression
Returns the model formula.
- Specified by:
formula in interface DataFrameRegression
- Specified by:
formula in interface TreeSHAP
- Returns:
- the model formula.
-
schema
Description copied from interface: DataFrameRegression
Returns the schema of predictors.
- Specified by:
schema in interface DataFrameRegression
- Returns:
- the schema of predictors.
-
importance
public double[] importance()
Returns the variable importance. Every time a node is split on a variable, the impurity criterion for the two descendant nodes is less than that of the parent node. Adding up the decreases for each individual variable over all trees in the forest gives a simple measure of variable importance.
- Returns:
- the variable importance
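The accumulation described above amounts to a per-variable sum of impurity decreases over every split in every tree. The following sketch shows that bookkeeping in isolation; ImportanceSketch and its Split record of (feature index, impurity decrease) are hypothetical stand-ins for this example and are not types from this library.

```java
/** Hypothetical sketch: sum impurity decreases per variable over all trees. */
class ImportanceSketch {

    /** One internal node: the variable it splits on and the resulting impurity decrease. */
    static class Split {
        final int feature;
        final double decrease;

        Split(int feature, double decrease) {
            this.feature = feature;
            this.decrease = decrease;
        }
    }

    /** trees[t] holds the splits of tree t; p is the number of predictors. */
    static double[] importance(int p, Split[][] trees) {
        double[] imp = new double[p];
        for (Split[] tree : trees) {
            for (Split s : tree) {
                // Credit the decrease to the variable used by this split.
                imp[s.feature] += s.decrease;
            }
        }
        return imp;
    }
}
```

A variable that is never chosen for a split keeps an importance of zero under this scheme.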
-
size
public int size()
Returns the number of trees in the model.
- Returns:
- the number of trees in the model
-
trees
Description copied from interface: TreeSHAP
Returns the decision trees.
-
trim
public void trim(int ntrees)
Trims the tree model set to a smaller size in case of over-fitting. If extra decision trees in the model don't improve the performance, we may also remove them to reduce the model size and speed up prediction.
- Parameters:
ntrees
- the new (smaller) size of tree model set.
-
predict
Description copied from interface: Regression
Predicts the dependent variable of an instance.
- Specified by:
predict in interface Regression<Tuple>
- Parameters:
x
- an instance.
- Returns:
- the predicted value of dependent variable.
-
test
Tests the model on a validation dataset.
- Parameters:
data
- the test data set.
- Returns:
- the predictions with the first 1, 2, ..., regression trees.
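The per-prefix predictions can be used to pick a tree count by validation error. The sketch below assumes the returned array is laid out so that row i holds the predictions for every validation instance using the first i + 1 trees; that layout, the class name TreeCountSketch, and the use of mean squared error are assumptions of this example.

```java
/** Hypothetical sketch: choose the number of trees with the lowest validation MSE. */
class TreeCountSketch {

    /**
     * predictions[i][k] is assumed to be the prediction for validation row k
     * using only the first i + 1 trees; y holds the true responses.
     * Returns the tree count (1-based) with the smallest mean squared error,
     * or 0 if predictions is empty.
     */
    static int bestSize(double[][] predictions, double[] y) {
        int best = 0;
        double bestMse = Double.POSITIVE_INFINITY;
        for (int i = 0; i < predictions.length; i++) {
            double sum = 0.0;
            for (int k = 0; k < y.length; k++) {
                double d = y[k] - predictions[i][k];
                sum += d * d;
            }
            double mse = sum / y.length;
            // Strict comparison keeps the smallest model among ties.
            if (mse < bestMse) {
                bestMse = mse;
                best = i + 1;
            }
        }
        return best;
    }
}
```

The resulting count could then be passed to trim(int ntrees) to drop the trees beyond that point.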
-