Class RidgeRegression

java.lang.Object
smile.regression.RidgeRegression

public class RidgeRegression extends Object
Ridge Regression. Coefficient estimates for multiple linear regression models rely on the independence of the model terms. When terms are correlated and the columns of the design matrix X have an approximate linear dependence, the matrix X'X becomes close to singular. As a result, the least-squares estimate becomes highly sensitive to random errors in the observed response Y, producing a large variance.

Ridge regression is one method to address these issues. In ridge regression, the matrix X'X is perturbed to make its determinant appreciably different from 0.
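The perturbation is usually written as adding a positive constant to the diagonal of X'X. In the standard textbook form below, λ (the ridge parameter) and I (the identity matrix) are notation not defined on this page; for any λ > 0 the matrix being inverted is positive definite, so the estimate is well defined even when X'X itself is singular.

```latex
\hat{\beta}_{\mathrm{ridge}} = \left( X^{\top} X + \lambda I \right)^{-1} X^{\top} Y, \qquad \lambda > 0
```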

Ridge regression is a kind of Tikhonov regularization, which is the most commonly used method of regularization of ill-posed problems. Ridge regression shrinks the regression coefficients by imposing a penalty on their size. By allowing a small amount of bias in the estimates, more reasonable coefficients may often be obtained. Often, small amounts of bias lead to dramatic reductions in the variance of the estimated model coefficients.
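The shrinkage effect is easiest to see with a single centered predictor and no intercept, where the ridge slope has the closed form sum(x*y) / (sum(x*x) + lambda). The sketch below is plain Java and does not call the library; the class and method names are hypothetical.

```java
class RidgeShrinkage {
    /** Ridge slope for one centered predictor: sum(x*y) / (sum(x*x) + lambda). */
    static double slope(double[] x, double[] y, double lambda) {
        double sxy = 0.0;
        double sxx = 0.0;
        for (int i = 0; i < x.length; i++) {
            sxy += x[i] * y[i];
            sxx += x[i] * x[i];
        }
        return sxy / (sxx + lambda);
    }

    public static void main(String[] args) {
        double[] x = {-2, -1, 0, 1, 2};   // already centered
        double[] y = {-4, -2, 0, 2, 4};   // y = 2x exactly
        // The slope moves from the least-squares value toward zero as lambda grows,
        // but stays nonzero for every finite lambda.
        for (double lambda : new double[] {0.0, 1.0, 10.0, 100.0}) {
            System.out.printf("lambda=%.1f slope=%.4f%n", lambda, slope(x, y, lambda));
        }
    }
}
```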

Another interpretation of ridge regression is available through Bayesian estimation. In this setting, the belief that the weights should be small is encoded in a prior distribution.

The penalty term is unfair if the predictor variables are not on the same scale. Therefore, if we know that the variables are not measured in the same units, we typically scale the columns of X (to have sample variance 1), and then we perform ridge regression.
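A minimal sketch of the scaling step, assuming the data is held in a row-major double[][] and that no column is constant (a constant column has zero variance and cannot be scaled this way). The helper names are hypothetical, and this section does not say whether the library applies such scaling internally.

```java
class ColumnScaling {
    /** Sample variance of column j, using the n - 1 denominator. */
    static double sampleVariance(double[][] x, int j) {
        int n = x.length;
        double mean = 0.0;
        for (double[] row : x) mean += row[j];
        mean /= n;
        double ss = 0.0;
        for (double[] row : x) ss += (row[j] - mean) * (row[j] - mean);
        return ss / (n - 1);
    }

    /** Returns a copy of x with every column divided by its sample standard deviation. */
    static double[][] scaleToUnitVariance(double[][] x) {
        int n = x.length;
        int p = x[0].length;
        double[][] z = new double[n][p];
        for (int j = 0; j < p; j++) {
            double sd = Math.sqrt(sampleVariance(x, j));
            for (int i = 0; i < n; i++) z[i][j] = x[i][j] / sd;
        }
        return z;
    }
}
```

After scaling, a one-unit change means one standard deviation in every column, so the same lambda penalizes each coefficient on a comparable footing.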

When including an intercept term in the regression, we usually leave this coefficient unpenalized. Otherwise, adding a constant to the vector y would change the fitted coefficients instead of simply shifting the intercept by that constant. If we center the columns of X, then the intercept estimate is just the mean of y.
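The role of the unpenalized intercept can be checked on a one-predictor problem. The sketch below minimizes sum((y - b0 - b1*x)^2) + lambda*b1^2 in closed form by centering x and y first; it is plain Java with hypothetical names, not the library's implementation.

```java
class UnpenalizedIntercept {
    /** Returns {intercept, slope}; only the slope is penalized. */
    static double[] fit(double[] x, double[] y, double lambda) {
        int n = x.length;
        double mx = 0.0;
        double my = 0.0;
        for (int i = 0; i < n; i++) {
            mx += x[i];
            my += y[i];
        }
        mx /= n;
        my /= n;
        double sxy = 0.0;
        double sxx = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - mx;
            sxy += dx * (y[i] - my);
            sxx += dx * dx;
        }
        double slope = sxy / (sxx + lambda);
        // In centered coordinates the intercept is mean(y); mapping back to the
        // original x scale gives mean(y) - slope * mean(x).
        double intercept = my - slope * mx;
        return new double[] {intercept, slope};
    }
}
```

Adding a constant c to every y[i] leaves the slope unchanged and moves the intercept by exactly c.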

Ridge regression does not set coefficients exactly to zero unless λ = ∞, in which case they're all zero. Hence, ridge regression cannot perform variable selection, and even though it performs well in terms of prediction accuracy, it does poorly in terms of offering a clear interpretation.

  • Constructor Details

    • RidgeRegression

      public RidgeRegression()
  • Method Details

    • fit

      public static LinearModel fit(Formula formula, DataFrame data)
      Fits a ridge regression model.
      Parameters:
      formula - a symbolic description of the model to be fitted.
      data - the data frame of the explanatory and response variables. NO NEED to include a constant column of 1s for bias.
      Returns:
      the model.
    • fit

      public static LinearModel fit(Formula formula, DataFrame data, Properties params)
      Fits a ridge regression model. The hyper-parameters in params include:
      • smile.ridge.lambda is the shrinkage/regularization parameter. Large lambda means more shrinkage. Choosing an appropriate value of lambda is important, and also difficult.
      • smile.ridge.standard.error is a boolean. If true, compute the estimated standard errors of the parameter estimates.
      Parameters:
      formula - a symbolic description of the model to be fitted.
      data - the data frame of the explanatory and response variables. NO NEED to include a constant column of 1s for bias.
      params - the hyper-parameters.
      Returns:
      the model.
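A minimal sketch of building the params argument with java.util.Properties, using the two keys listed above. The helper class and method names are hypothetical, and the library call appears only as a comment because it depends on a formula and data frame not shown here.

```java
import java.util.Properties;

class RidgeParams {
    /** Builds a Properties object holding the two documented ridge hyper-parameters. */
    static Properties ridgeParams(double lambda, boolean standardError) {
        Properties params = new Properties();
        // Properties stores strings, so both values are converted explicitly.
        params.setProperty("smile.ridge.lambda", Double.toString(lambda));
        params.setProperty("smile.ridge.standard.error", Boolean.toString(standardError));
        return params;
    }

    public static void main(String[] args) {
        Properties params = ridgeParams(0.1, true);
        // Intended use, given a Formula and DataFrame:
        // LinearModel model = RidgeRegression.fit(formula, data, params);
        System.out.println(params.getProperty("smile.ridge.lambda"));
    }
}
```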
    • fit

      public static LinearModel fit(Formula formula, DataFrame data, double lambda)
      Fits a ridge regression model.
      Parameters:
      formula - a symbolic description of the model to be fitted.
      data - the data frame of the explanatory and response variables. NO NEED to include a constant column of 1s for bias.
      lambda - the shrinkage/regularization parameter. Large lambda means more shrinkage. Choosing an appropriate value of lambda is important, and also difficult.
      Returns:
      the model.
    • fit

      public static LinearModel fit(Formula formula, DataFrame data, double[] weights, double[] lambda, double[] beta0)
      Fits a generalized ridge regression model that minimizes a weighted least squares criterion augmented with a generalized ridge penalty:
      
           (Y - X*beta)' * W * (Y - X*beta) + (beta - beta0)' * lambda * (beta - beta0)
       
      Parameters:
      formula - a symbolic description of the model to be fitted.
      data - the data frame of the explanatory and response variables. NO NEED to include a constant column of 1s for bias.
      weights - sample weights.
      lambda - the shrinkage/regularization parameter. Large lambda means more shrinkage. Choosing an appropriate value of lambda is important, and also difficult. Its length may be 1 so that its value is applied to all variables.
      beta0 - generalized ridge penalty target. Its length may be 1 so that its value is applied to all variables.
      Returns:
      the model.
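To make the generalized criterion concrete, the sketch below evaluates it for a given beta. It assumes that W is the diagonal matrix of the sample weights, that lambda acts as a diagonal matrix of per-variable penalties, and that there is no intercept column; these are reading assumptions, not statements about the library's internals. Arrays of length 1 are broadcast to all variables, as described for lambda and beta0. All names are hypothetical.

```java
class GeneralizedRidgeObjective {
    /** Returns a[j], or a[0] when a has length 1 (the broadcast case). */
    static double at(double[] a, int j) {
        return a.length == 1 ? a[0] : a[j];
    }

    /** Weighted squared error plus the generalized ridge penalty, evaluated at beta. */
    static double objective(double[][] x, double[] y, double[] weights,
                            double[] lambda, double[] beta0, double[] beta) {
        double loss = 0.0;
        for (int i = 0; i < x.length; i++) {
            double fitted = 0.0;
            for (int j = 0; j < beta.length; j++) fitted += x[i][j] * beta[j];
            double residual = y[i] - fitted;
            loss += weights[i] * residual * residual;
        }
        double penalty = 0.0;
        for (int j = 0; j < beta.length; j++) {
            double d = beta[j] - at(beta0, j);
            penalty += at(lambda, j) * d * d;
        }
        return loss + penalty;
    }
}
```

With beta0 set to zero and equal weights this reduces to the ordinary ridge criterion; a nonzero beta0 shrinks the coefficients toward that target instead of toward zero.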