Package-level declarations

Regression analysis.

Regression analysis comprises techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables. Most commonly, regression analysis estimates the conditional expectation of the dependent variable given the independent variables; the estimation target is thus a function of the independent variables, called the regression function. Regression analysis is widely used for prediction and forecasting.

Types

object gpr

Gaussian Process for Regression.

Functions

fun cart(formula: Formula, data: DataFrame, maxDepth: Int = 20, maxNodes: Int = 0, nodeSize: Int = 5): RegressionTree

Regression tree. A classification/regression tree is learned by splitting the training set into subsets based on an attribute value test. This process is repeated on each derived subset in a recursive manner called recursive partitioning. The recursion terminates when all samples in the subset at a node have the same value of the target variable, or when splitting no longer adds value to the predictions.
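
A minimal usage sketch (the CSV file and its "y" response column are illustrative assumptions):

import smile.data.formula.Formula
import smile.io.Read
import smile.regression.cart

val data = Read.csv("train.csv")                       // hypothetical dataset with a numeric "y" column
val tree = cart(Formula.lhs("y"), data, maxDepth = 10)
val fitted = tree.predict(data)                        // in-sample predictions, one per row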

fun gbm(formula: Formula, data: DataFrame, loss: Loss = Loss.lad(), ntrees: Int = 500, maxDepth: Int = 20, maxNodes: Int = 6, nodeSize: Int = 5, shrinkage: Double = 0.05, subsample: Double = 0.7): GradientTreeBoost

Gradient boosted regression trees.
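
A sketch of a fit that swaps the default least-absolute-deviation loss for squared error (dataset, column name, and hyperparameter values are hypothetical):

import smile.base.cart.Loss
import smile.data.formula.Formula
import smile.io.Read
import smile.regression.gbm

val data = Read.csv("train.csv")
// Fewer trees and stronger shrinkage than the defaults, purely for illustration.
val boost = gbm(Formula.lhs("y"), data, loss = Loss.ls(), ntrees = 300, shrinkage = 0.1)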

fun <T> gpr(x: Array<T>, y: DoubleArray, kernel: MercerKernel<T>, noise: Double, normalize: Boolean = true, tol: Double = 1.0E-5, maxIter: Int = 0): GaussianProcessRegression<T>

Gaussian Process for Regression. A Gaussian process is a stochastic process whose realizations consist of random values associated with every point in a range of times (or of space) such that each such random variable has a normal distribution. Moreover, every finite collection of those random variables has a multivariate normal distribution.
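
A toy sketch with a Gaussian kernel (the bandwidth and noise values are illustrative, not recommendations):

import smile.math.kernel.GaussianKernel
import smile.regression.gpr

val x = arrayOf(doubleArrayOf(0.0), doubleArrayOf(1.0), doubleArrayOf(2.0), doubleArrayOf(3.0))
val y = doubleArrayOf(0.0, 0.8, 0.9, 0.1)
val model = gpr(x, y, GaussianKernel(1.0), noise = 0.01)
val mean = model.predict(doubleArrayOf(1.5))           // posterior mean at a new point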

fun lasso(formula: Formula, data: DataFrame, lambda: Double, tol: Double = 0.001, maxIter: Int = 5000): LinearModel

Least absolute shrinkage and selection operator. The Lasso is a shrinkage and selection method for linear regression. It minimizes the usual sum of squared errors, with a bound on the sum of the absolute values of the coefficients (i.e. L1-regularized). It has connections to soft-thresholding of wavelet coefficients, forward stage-wise regression, and boosting methods.
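
A brief sketch (file, column name, and the lambda value are assumptions; in practice lambda is tuned, e.g. by cross-validation):

import smile.data.formula.Formula
import smile.io.Read
import smile.regression.lasso

val data = Read.csv("train.csv")
val model = lasso(Formula.lhs("y"), data, lambda = 0.1)   // the L1 penalty drives some coefficients to zero
println(model)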

fun lm(formula: Formula, data: DataFrame, method: String = "qr", stderr: Boolean = true, recursive: Boolean = true): LinearModel

Fitting linear models (ordinary least squares). In linear regression, the model specification is that the dependent variable is a linear combination of the parameters (but need not be linear in the independent variables). The residual is the difference between the value of the dependent variable predicted by the model, and the true value of the dependent variable. Ordinary least squares obtains parameter estimates that minimize the sum of squared residuals, SSE (also denoted RSS).
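
A minimal OLS sketch; Formula.lhs("y") regresses a hypothetical "y" column on all remaining columns:

import smile.data.formula.Formula
import smile.io.Read
import smile.regression.lm

val data = Read.csv("train.csv")
val ols = lm(Formula.lhs("y"), data)    // QR factorization by default; stderr = true adds standard errors
println(ols)                            // prints an R-style coefficient summary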

fun randomForest(formula: Formula, data: DataFrame, ntrees: Int = 500, mtry: Int = 0, maxDepth: Int = 20, maxNodes: Int = 500, nodeSize: Int = 5, subsample: Double = 1.0): RandomForest

Random forest for regression. Random forest is an ensemble method that consists of many decision trees; for regression it outputs the average of the predictions of the individual trees (rather than the majority vote used in classification). The method combines the bagging idea with the random selection of features.
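
A usage sketch (dataset and hyperparameter values are illustrative; mtry = 0 lets the library pick the number of features tried per split):

import smile.data.formula.Formula
import smile.io.Read
import smile.regression.randomForest

val data = Read.csv("train.csv")
val forest = randomForest(Formula.lhs("y"), data, ntrees = 300, nodeSize = 10)
val predictions = forest.predict(data)  // each prediction is averaged over the trees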

fun <T> rbfnet(x: Array<T>, y: DoubleArray, neurons: Array<RBF<T>>, normalized: Boolean = false): RBFNetwork<T>

Radial basis function networks. A radial basis function network is an artificial neural network that uses radial basis functions as activation functions; its output is a linear combination of radial basis functions of the inputs. Such networks are used in function approximation, time series prediction, and control.
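
A toy sketch, assuming RBF.fit(x, k) from smile.base.rbf, which places Gaussian neurons at k-means centers:

import smile.base.rbf.RBF
import smile.regression.rbfnet

val x = Array(50) { i -> doubleArrayOf(i / 10.0) }
val y = DoubleArray(50) { i -> kotlin.math.sin(i / 10.0) }
val neurons = RBF.fit(x, 10)            // 10 neurons; the count is illustrative
val net = rbfnet(x, y, neurons)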

fun rbfnet(x: Array<DoubleArray>, y: DoubleArray, k: Int, normalized: Boolean = false): RBFNetwork<DoubleArray>

Trains a Gaussian RBF network whose neuron centers are chosen by k-means.
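
This overload does the neuron placement internally; a self-contained sketch on toy data:

import smile.regression.rbfnet

val x = Array(50) { i -> doubleArrayOf(i / 10.0) }
val y = DoubleArray(50) { i -> kotlin.math.cos(i / 10.0) }
val net = rbfnet(x, y, k = 16)          // 16 Gaussian neurons at k-means centers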

fun ridge(formula: Formula, data: DataFrame, lambda: Double): LinearModel

Ridge Regression. When the predictor variables are highly correlated amongst themselves, the coefficients of the resulting least squares fit may be very imprecise. By allowing a small amount of bias in the estimates, more reasonable coefficients can often be obtained: ridge regression shrinks the regression coefficients by imposing an L2 penalty on their size, and small amounts of bias frequently lead to dramatic reductions in the variance of the estimated model coefficients. Ridge regression was originally developed to overcome the singularity of the X'X matrix; that matrix is perturbed so as to make its determinant appreciably different from 0.
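
A brief sketch (lambda = 1.0 is purely illustrative; larger values shrink the coefficients harder):

import smile.data.formula.Formula
import smile.io.Read
import smile.regression.ridge

val data = Read.csv("train.csv")        // hypothetical dataset with a response column "y"
val model = ridge(Formula.lhs("y"), data, lambda = 1.0)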

fun <T> svm(x: Array<T>, y: DoubleArray, kernel: MercerKernel<T>, eps: Double, C: Double, tol: Double = 0.001): KernelMachine<T>

Support vector regression. Like SVM for classification, the model produced by SVR depends only on a subset of the training data, because the cost function ignores any training data close to the model prediction (within a threshold).
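
A toy sketch with a Gaussian kernel (eps, C, and the bandwidth are illustrative; eps is the width of the error-insensitive band and C the soft-margin cost), assuming the returned KernelMachine exposes predict(T) like the other regression models here:

import smile.math.kernel.GaussianKernel
import smile.regression.svm

val x = Array(50) { i -> doubleArrayOf(i / 10.0) }
val y = DoubleArray(50) { i -> kotlin.math.sin(i / 10.0) }
val machine = svm(x, y, GaussianKernel(0.5), eps = 0.1, C = 10.0)
val yhat = machine.predict(doubleArrayOf(2.5))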