Class LinearModel
- All Implemented Interfaces:
Serializable, ToDoubleFunction<Tuple>, DataFrameRegression, Regression<Tuple>
Once a regression model has been constructed, it may be important to confirm the goodness of fit of the model and the statistical significance of the estimated parameters. Commonly used checks of goodness of fit include the R-squared, analysis of the pattern of residuals and hypothesis testing. Statistical significance can be checked by an F-test of the overall fit, followed by t-tests of individual parameters.
Interpretations of these diagnostic tests rest heavily on the model assumptions. Although examination of the residuals can be used to invalidate a model, the results of a t-test or F-test are sometimes more difficult to interpret when the model's assumptions are violated. For example, if the error term is not normally distributed, then in small samples the estimated parameters will not follow normal distributions, which complicates inference. With relatively large samples, however, a central limit theorem can be invoked so that hypothesis testing may proceed using asymptotic approximations.
- See Also:
-
Nested Class Summary
Nested classes/interfaces inherited from interface smile.regression.DataFrameRegression
DataFrameRegression.Trainer<M extends DataFrameRegression>
-
Constructor Summary
Constructor | Description
LinearModel(Formula formula, StructType schema, Matrix X, double[] y, double[] w, double b) | Constructor.
-
Method Summary
Modifier and Type | Method | Description
double | adjustedRSquared() | Returns the adjusted R² statistic.
double[] | coefficients() | Returns the linear coefficients without intercept.
int | df() | Returns the degrees of freedom of the residual standard error.
double | error() | Returns the residual standard error.
double[] | fittedValues() | Returns the fitted values.
Formula | formula() | Returns the model formula.
double | ftest() | Returns the F-statistic of goodness-of-fit.
double | intercept() | Returns the intercept.
boolean | online() | Returns true if this is an online learner.
double | predict(double[] x) | Predicts the dependent variable of an instance.
double[] | predict(DataFrame df) | Predicts the dependent variables of a data frame.
double | predict(Tuple x) | Predicts the dependent variable of an instance.
double | pvalue() | Returns the p-value of the goodness-of-fit test.
double[] | residuals() | Returns the residuals, i.e., the response minus the fitted values.
double | RSquared() | Returns the R² statistic.
double | RSS() | Returns the residual sum of squares.
StructType | schema() | Returns the schema of predictors.
String | toString() |
double[][] | ttest() | Returns the t-test of the coefficients (including intercept).
void | update(double[] x, double y) | Growing-window recursive least squares with lambda = 1.
void | update(double[] x, double y, double lambda) | Recursive least squares.
void | update(DataFrame data) | Updates the model online with a new data frame.
void | update(Tuple data) | Updates the model online with a new training instance.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface smile.regression.Regression
applyAsDouble, predict, predict, predict, update, update, update
-
Constructor Details
-
LinearModel
Constructor.
- Parameters:
formula - a symbolic description of the model to be fitted.
schema - the schema of input data.
X - the design matrix.
y - the response variable.
w - the linear weights.
b - the intercept.
-
-
Method Details
-
formula
Description copied from interface: DataFrameRegression
Returns the model formula.
- Specified by:
formula in interface DataFrameRegression
- Returns:
- the model formula.
-
schema
Description copied from interface: DataFrameRegression
Returns the schema of predictors.
- Specified by:
schema in interface DataFrameRegression
- Returns:
- the schema of predictors.
-
ttest
public double[][] ttest()
Returns the t-test of the coefficients (including intercept). The first column holds the coefficients, the second the standard errors of the coefficients, the third the t-scores for the null hypothesis that the coefficient is zero, and the fourth the p-values of the test. The last row corresponds to the intercept.
- Returns:
- the t-test of the coefficients.
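To make the table layout concrete, here is a small standalone Java sketch. It does not call Smile: it assumes a single predictor, fits y = b + w·x by ordinary least squares in closed form, and returns rows in the same order as described above (slope first, intercept last). The class `SimpleTTest` is hypothetical, and the p-value column is left out because it needs the Student's t CDF.

```java
/** Hypothetical sketch: OLS t-statistics for a single-predictor model y = b + w*x. */
class SimpleTTest {
    /** Rows: slope, then intercept (last). Columns: estimate, standard error, t-score. */
    static double[][] ttest(double[] x, double[] y) {
        int n = x.length;
        double xbar = 0, ybar = 0;
        for (int i = 0; i < n; i++) { xbar += x[i]; ybar += y[i]; }
        xbar /= n;
        ybar /= n;
        double sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sxx += (x[i] - xbar) * (x[i] - xbar);
            sxy += (x[i] - xbar) * (y[i] - ybar);
        }
        double w = sxy / sxx;              // least-squares slope
        double b = ybar - w * xbar;        // least-squares intercept
        double rss = 0;
        for (int i = 0; i < n; i++) {
            double r = y[i] - (b + w * x[i]);
            rss += r * r;
        }
        double sigma = Math.sqrt(rss / (n - 2));                   // residual standard error, df = n - 2
        double seW = sigma / Math.sqrt(sxx);                       // standard error of the slope
        double seB = sigma * Math.sqrt(1.0 / n + xbar * xbar / sxx); // standard error of the intercept
        return new double[][] {
            {w, seW, w / seW},             // t-score for H0: slope = 0
            {b, seB, b / seB}              // last row: intercept
        };
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5};
        double[] y = {3, 5, 6, 9, 11};
        for (double[] row : ttest(x, y)) {
            System.out.printf("%10.4f %10.4f %10.4f%n", row[0], row[1], row[2]);
        }
    }
}
```

For this data the fit is w = 2 and b = 0.8, and the slope's t-score is about 12.25.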
-
coefficients
public double[] coefficients()
Returns the linear coefficients without intercept.
- Returns:
- the linear coefficients without intercept.
-
intercept
public double intercept()
Returns the intercept.
- Returns:
- the intercept.
-
residuals
public double[] residuals()
Returns the residuals, i.e., the response minus the fitted values.
- Returns:
- the residuals.
-
fittedValues
public double[] fittedValues()
Returns the fitted values.
- Returns:
- the fitted values.
-
RSS
public double RSS()
Returns the residual sum of squares.
- Returns:
- the residual sum of squares.
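The relationship between fitted values, residuals, and the residual sum of squares can be sketched in plain Java. This is an illustration rather than Smile's code; the class `ResidualSketch` is hypothetical and the fitted values are assumed to come from an already-fitted line.

```java
import java.util.Arrays;

/** Hypothetical sketch: residuals and residual sum of squares from responses and fitted values. */
class ResidualSketch {
    static double[] residuals(double[] y, double[] fitted) {
        double[] r = new double[y.length];
        for (int i = 0; i < y.length; i++) r[i] = y[i] - fitted[i];  // response minus fitted value
        return r;
    }

    static double rss(double[] residuals) {
        double sum = 0;
        for (double r : residuals) sum += r * r;                     // sum of squared residuals
        return sum;
    }

    public static void main(String[] args) {
        double[] y = {3, 5, 6, 9, 11};
        double[] fitted = {2.8, 4.8, 6.8, 8.8, 10.8};                // from the line y = 0.8 + 2x
        double[] r = residuals(y, fitted);
        System.out.println(Arrays.toString(r) + " RSS = " + rss(r));
    }
}
```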
-
error
public double error()
Returns the residual standard error.
- Returns:
- the residual standard error.
-
df
public int df()
Returns the degrees of freedom of the residual standard error.
- Returns:
- the degrees of freedom of the residual standard error.
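The residual standard error and its degrees of freedom are linked by the textbook relation error = sqrt(RSS / (n - p)). The sketch below assumes that p counts every estimated parameter, including the intercept; it is a standalone illustration with a hypothetical class name, not Smile's implementation.

```java
/** Hypothetical sketch: residual degrees of freedom and residual standard error. */
class ResidualErrorSketch {
    /** n observations minus p estimated parameters (intercept counted in p, by assumption). */
    static int df(int n, int p) {
        return n - p;
    }

    /** sqrt(RSS / df): an estimate of the standard deviation of the error term. */
    static double error(double rss, int n, int p) {
        return Math.sqrt(rss / df(n, p));
    }

    public static void main(String[] args) {
        // RSS = 0.8 from five observations fitted with one slope and an intercept
        System.out.println("df = " + df(5, 2) + ", error = " + error(0.8, 5, 2));
    }
}
```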
-
RSquared
public double RSquared()
Returns the R² statistic. In regression, the coefficient of determination R² is a statistical measure of how well the regression line approximates the real data points. An R² of 1.0 indicates that the regression line fits the data perfectly.
In ordinary least-squares regression, R² never decreases as variables are added to the model. This illustrates a drawback of one possible use of R², where one might keep adding variables until "there is no more improvement". This leads to the alternative approach of looking at the adjusted R².
- Returns:
- the R² statistic.
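For a least-squares fit with an intercept, R² = 1 - RSS/TSS, where TSS is the total sum of squares of the response around its mean. A standalone sketch, assuming the fitted values are already available (the class `RSquaredSketch` is hypothetical):

```java
/** Hypothetical sketch: coefficient of determination R^2 = 1 - RSS / TSS. */
class RSquaredSketch {
    static double rSquared(double[] y, double[] fitted) {
        double mean = 0;
        for (double v : y) mean += v;
        mean /= y.length;
        double rss = 0, tss = 0;
        for (int i = 0; i < y.length; i++) {
            rss += (y[i] - fitted[i]) * (y[i] - fitted[i]);  // variation left unexplained
            tss += (y[i] - mean) * (y[i] - mean);            // total variation around the mean
        }
        return 1.0 - rss / tss;
    }

    public static void main(String[] args) {
        double[] y = {3, 5, 6, 9, 11};
        double[] fitted = {2.8, 4.8, 6.8, 8.8, 10.8};        // from the line y = 0.8 + 2x
        System.out.println(rSquared(y, fitted));
    }
}
```

Here RSS = 0.8 and TSS = 40.8, so R² is about 0.980.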
-
adjustedRSquared
public double adjustedRSquared()
Returns the adjusted R² statistic. The adjusted R² has almost the same interpretation as R², but it penalizes the statistic as extra variables are included in the model.
- Returns:
- the adjusted R² statistic.
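The penalty can be seen with the common definition adjusted R² = 1 - (1 - R²)(n - 1)/(n - p), assuming n observations and p parameters including the intercept. The numbers below are made up for illustration, and the class name is hypothetical.

```java
/** Hypothetical sketch: adjusted R^2 with n observations and p parameters (intercept included). */
class AdjustedRSquaredSketch {
    static double adjusted(double r2, int n, int p) {
        return 1.0 - (1.0 - r2) * (n - 1) / (n - p);
    }

    public static void main(String[] args) {
        // Made-up numbers: one extra predictor nudges R^2 from 0.800 to 0.805 with n = 20.
        double before = adjusted(0.800, 20, 2);
        double after = adjusted(0.805, 20, 3);
        System.out.printf("adjusted R^2: %.4f -> %.4f%n", before, after);
    }
}
```

Although R² rises slightly, the adjusted R² falls from about 0.789 to about 0.782, signalling that the extra variable does not pay for its degree of freedom.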
-
ftest
public double ftest()
Returns the F-statistic of goodness-of-fit.
- Returns:
- the F-statistic of goodness-of-fit.
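The overall F-test usually compares the fitted model against the intercept-only model: F = ((TSS - RSS)/(p - 1)) / (RSS/(n - p)). The sketch assumes that definition, with p counting the intercept; it is a plain-Java illustration with a hypothetical class name.

```java
/** Hypothetical sketch: overall F statistic versus the intercept-only model. */
class FStatisticSketch {
    static double fStatistic(double tss, double rss, int n, int p) {
        double regressionMeanSquare = (tss - rss) / (p - 1);  // explained variation per slope
        double residualMeanSquare = rss / (n - p);            // unexplained variation per residual df
        return regressionMeanSquare / residualMeanSquare;
    }

    public static void main(String[] args) {
        // TSS and RSS of y = {3, 5, 6, 9, 11} against the fitted line y = 0.8 + 2x
        System.out.println(fStatistic(40.8, 0.8, 5, 2));
    }
}
```

This yields about 150. With a single predictor, F equals the square of the slope's t-score.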
-
pvalue
public double pvalue()
Returns the p-value of the goodness-of-fit test.
- Returns:
- the p-value of goodness-of-fit test.
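Assuming the p-value is the upper-tail probability of the F statistic under an F(p - 1, n - p) distribution, it can be computed with the regularized incomplete beta function: P(F > f) = I_x(d2/2, d1/2) with x = d2/(d2 + d1·f). The sketch below uses the well-known Lanczos log-gamma approximation and a Lentz continued fraction; it is a self-contained illustration, not the routine Smile uses, and all names are hypothetical.

```java
/** Hypothetical sketch: upper-tail p-value of an F statistic via the regularized incomplete beta. */
class FTestPValue {
    /** Lanczos approximation of ln Gamma(x) for x > 0. */
    static double logGamma(double x) {
        double[] cof = {76.18009172947146, -86.50532032941677, 24.01409824083091,
                        -1.231739572450155, 0.1208650973866179e-2, -0.5395239384953e-5};
        double y = x;
        double tmp = x + 5.5;
        tmp -= (x + 0.5) * Math.log(tmp);
        double ser = 1.000000000190015;
        for (double c : cof) ser += c / ++y;
        return -tmp + Math.log(2.5066282746310005 * ser / x);
    }

    /** Continued fraction for the incomplete beta function (modified Lentz's method). */
    static double betaContinuedFraction(double a, double b, double x) {
        final double eps = 1e-14, tiny = 1e-300;
        double qab = a + b, qap = a + 1, qam = a - 1;
        double c = 1, d = 1 - qab * x / qap;
        if (Math.abs(d) < tiny) d = tiny;
        d = 1 / d;
        double h = d;
        for (int m = 1; m <= 300; m++) {
            int m2 = 2 * m;
            double aa = m * (b - m) * x / ((qam + m2) * (a + m2));      // even step
            d = 1 + aa * d; if (Math.abs(d) < tiny) d = tiny;
            c = 1 + aa / c; if (Math.abs(c) < tiny) c = tiny;
            d = 1 / d;
            h *= d * c;
            aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2));    // odd step
            d = 1 + aa * d; if (Math.abs(d) < tiny) d = tiny;
            c = 1 + aa / c; if (Math.abs(c) < tiny) c = tiny;
            d = 1 / d;
            double delta = d * c;
            h *= delta;
            if (Math.abs(delta - 1) < eps) break;                        // converged
        }
        return h;
    }

    /** Regularized incomplete beta function I_x(a, b). */
    static double regularizedBeta(double a, double b, double x) {
        if (x <= 0) return 0;
        if (x >= 1) return 1;
        double front = Math.exp(logGamma(a + b) - logGamma(a) - logGamma(b)
                + a * Math.log(x) + b * Math.log(1 - x));
        if (x < (a + 1) / (a + b + 2)) return front * betaContinuedFraction(a, b, x) / a;
        return 1 - front * betaContinuedFraction(b, a, 1 - x) / b;       // symmetry for faster convergence
    }

    /** P(F > f) for F ~ F(df1, df2). */
    static double pvalue(double f, int df1, int df2) {
        return regularizedBeta(0.5 * df2, 0.5 * df1, df2 / (df2 + df1 * f));
    }

    public static void main(String[] args) {
        // F = 150 with 1 and 3 degrees of freedom (one slope plus intercept, five observations)
        System.out.println(pvalue(150.0, 1, 3));
    }
}
```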
-
predict
public double predict(double[] x)
Predicts the dependent variable of an instance.
- Parameters:
x - an instance.
- Returns:
- the predicted value of the dependent variable.
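For a linear model, predicting from a raw feature vector amounts to the intercept plus the dot product of the coefficients with the instance. The sketch assumes that x is already laid out in the same order as the coefficients; the class `LinearPredict` and its numbers are hypothetical.

```java
/** Hypothetical sketch of a linear prediction: intercept plus dot product. */
class LinearPredict {
    static double predict(double[] w, double b, double[] x) {
        if (x.length != w.length) {
            throw new IllegalArgumentException("expected " + w.length + " features, got " + x.length);
        }
        double y = b;                                    // start from the intercept
        for (int j = 0; j < w.length; j++) y += w[j] * x[j];
        return y;
    }

    public static void main(String[] args) {
        double[] w = {2.0, -1.0};                        // made-up coefficients
        System.out.println(predict(w, 0.5, new double[] {3.0, 4.0}));  // 0.5 + 6.0 - 4.0 = 2.5
    }
}
```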
-
predict
Description copied from interface: Regression
Predicts the dependent variable of an instance.
- Specified by:
predict in interface Regression<Tuple>
- Parameters:
x - an instance.
- Returns:
- the predicted value of the dependent variable.
-
predict
Description copied from interface: DataFrameRegression
Predicts the dependent variables of a data frame.
- Specified by:
predict in interface DataFrameRegression
- Parameters:
df - the data frame.
- Returns:
- the predicted values.
-
update
Updates the model online with a new training instance.
- Parameters:
data - the training data.
-
update
Updates the model online with a new data frame.
- Parameters:
data - the training data.
-
online
public boolean online()
Description copied from interface: Regression
Returns true if this is an online learner.
- Specified by:
online in interface Regression<Tuple>
- Returns:
- true if this is an online learner.
-
update
public void update(double[] x, double y)
Growing-window recursive least squares (RLS) with lambda = 1. RLS updates an ordinary least-squares fit as samples arrive sequentially.
- Parameters:
x - training instance.
y - response variable.
-
update
public void update(double[] x, double y, double lambda)
Recursive least squares. RLS updates an ordinary least-squares fit as samples arrive sequentially.
In some adaptive configurations it is useful not to give equal importance to all historical data but to assign higher weights to the most recent data (and thus gradually forget the oldest). This may happen when the phenomenon underlying the data is non-stationary, or when we want to approximate a nonlinear dependence with a linear model that is local in time. Both situations are common in adaptive control problems.
- Parameters:
x - training instance.
y - response variable.
lambda - the forgetting factor in (0, 1]. The smaller lambda is, the smaller the contribution of previous samples to the covariance matrix. This makes the filter more sensitive to recent samples, which means more fluctuation in the filter coefficients. The lambda = 1 case is referred to as the growing-window RLS algorithm. In practice, lambda is usually chosen between 0.98 and 1.
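The recursion behind both update overloads can be sketched in plain Java. This is the standard exponentially weighted RLS update, not Smile's source: it keeps the weights and a running inverse Gram matrix P. As an assumption for a cold start, P begins as delta times the identity; an already-fitted model would typically seed w and P from the batch solution instead. The class `RlsSketch` and its methods are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch of recursive least squares (RLS) with a forgetting factor. */
class RlsSketch {
    private final double[] w;     // w[0] is the intercept, w[1..p] the slopes (assumed layout)
    private final double[][] P;   // running inverse of the exponentially weighted Gram matrix
    private final double lambda;  // forgetting factor in (0, 1]; 1 = growing window

    RlsSketch(int p, double lambda, double delta) {
        this.w = new double[p + 1];
        this.P = new double[p + 1][p + 1];
        for (int i = 0; i <= p; i++) P[i][i] = delta;  // P0 = delta * I, an assumed cold start
        this.lambda = lambda;
    }

    void update(double[] features, double y) {
        int d = w.length;
        double[] x = new double[d];
        x[0] = 1.0;                                     // constant term for the intercept
        System.arraycopy(features, 0, x, 1, d - 1);

        double[] px = new double[d];                    // px = P x
        for (int i = 0; i < d; i++)
            for (int j = 0; j < d; j++) px[i] += P[i][j] * x[j];

        double denom = lambda;                          // lambda + x' P x
        for (int i = 0; i < d; i++) denom += x[i] * px[i];

        double err = y;                                 // a priori error y - w' x
        for (int i = 0; i < d; i++) err -= w[i] * x[i];

        for (int i = 0; i < d; i++) w[i] += px[i] / denom * err;  // w += k * err, k = P x / denom

        for (int i = 0; i < d; i++)                     // P = (P - k x' P) / lambda
            for (int j = 0; j < d; j++) P[i][j] = (P[i][j] - px[i] * px[j] / denom) / lambda;
    }

    double[] weights() { return w.clone(); }

    /** Streams y = 1 + 2x for 50 samples, then y = 1 - 2x for 50 samples. */
    static double[] fit(double lambda) {
        RlsSketch rls = new RlsSketch(1, lambda, 1e6);
        for (int i = 0; i < 100; i++) {
            double x = i % 10;
            double slope = i < 50 ? 2.0 : -2.0;         // the relationship changes halfway
            rls.update(new double[] {x}, 1.0 + slope * x);
        }
        return rls.weights();
    }

    public static void main(String[] args) {
        System.out.println("lambda = 1.0: " + Arrays.toString(fit(1.0)));
        System.out.println("lambda = 0.9: " + Arrays.toString(fit(0.9)));
    }
}
```

With lambda = 1 every sample counts equally, so the slope averages the two regimes and ends near 0. With lambda = 0.9 the older regime is largely forgotten and the slope ends close to -2, illustrating the trade-off described for the forgetting factor.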
-
toString
-