Package smile.feature.selection
Class GAFE
java.lang.Object
smile.feature.selection.GAFE
Genetic algorithm based feature selection. This method uses a Genetic
Algorithm (GA) to search for many (random) subsets of variables with good
expected classification power. The "fitness" of each subset is determined by
how well a given classification method classifies the samples using only
those variables. Once many such subsets have been obtained, the one with the
best performance may be used as the selected features. Alternatively, the
frequencies with which individual variables are selected may be analyzed
further: the most frequently selected variables may be presumed to be the
most relevant to sample distinction and are then used for prediction.
Although GA avoids brute-force search, it is still much slower than
univariate feature selection.
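The frequency analysis mentioned in the description can be sketched as follows. This is a minimal illustration, not part of Smile: the class SelectionFrequency and its methods count and topK are hypothetical names, and the sketch assumes each subset is already available as an array of 0/1 values (for example, extracted from the bit strings of the last generation).

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical helper (not part of Smile): ranks features by selection frequency. */
final class SelectionFrequency {
    /** Counts, for each feature, how many subsets (rows of 0/1 bits) include it. */
    static int[] count(byte[][] subsets, int length) {
        int[] freq = new int[length];
        for (byte[] bits : subsets) {
            for (int j = 0; j < length; j++) {
                if (bits[j] == 1) freq[j]++;
            }
        }
        return freq;
    }

    /** Returns the indices of the k most frequently selected features, most frequent first. */
    static int[] topK(int[] freq, int k) {
        Integer[] order = new Integer[freq.length];
        for (int j = 0; j < order.length; j++) order[j] = j;
        // Stable sort by descending frequency; ties keep ascending index order.
        Arrays.sort(order, Comparator.comparingInt((Integer j) -> -freq[j]));
        int[] top = new int[Math.min(k, order.length)];
        for (int i = 0; i < top.length; i++) top[i] = order[i];
        return top;
    }
}
```

Calling count on the subsets and then topK with a chosen k gives the column indices of the features that appear most often, which can then be used to train the final model.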
References
- Leping Li and Clarice R. Weinberg. Gene Selection and Sample Classification Using a Genetic Algorithm/k-Nearest Neighbor Method.
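Constructor Summary
- GAFE()
  Constructor.
- GAFE(Selection selection, int elitism, Crossover crossover, double crossoverRate, double mutationRate)
  Constructor.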
Method Summary
Modifier and Type, Method, Description
- apply
  Genetic algorithm based feature selection for classification.
- static Fitness<BitString> fitness(double[][] x, double[] y, double[][] testx, double[] testy, RegressionMetric metric, BiFunction<double[][], double[], Regression<double[]>> trainer)
  Returns the fitness of the regression model.
- static Fitness<BitString> fitness(double[][] x, int[] y, double[][] testx, int[] testy, ClassificationMetric metric, BiFunction<double[][], int[], Classifier<double[]>> trainer)
  Returns the fitness of the classification model.
- static Fitness<BitString> fitness(String y, DataFrame train, DataFrame test, ClassificationMetric metric, BiFunction<Formula, DataFrame, DataFrameClassifier> trainer)
  Returns the fitness of the classification model.
- static Fitness<BitString> fitness(String y, DataFrame train, DataFrame test, RegressionMetric metric, BiFunction<Formula, DataFrame, DataFrameRegression> trainer)
  Returns the fitness of the regression model.
Constructor Details
GAFE
public GAFE()
Constructor.
GAFE
public GAFE(Selection selection, int elitism, Crossover crossover, double crossoverRate, double mutationRate)
Constructor.
Parameters:
selection - the selection strategy.
elitism - the number of best chromosomes to copy to new population.
crossover - the strategy of crossover operation.
crossoverRate - the crossover rate.
mutationRate - the mutation rate.
Method Details
apply
Genetic algorithm based feature selection for classification.
Parameters:
size - the population size of Genetic Algorithm.
generation - the maximum number of iterations.
length - the length of bit string, i.e. the number of features.
fitness - the fitness function.
Returns:
bit strings of last generation.
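To make the roles of size, generation, length, and fitness concrete, here is a simplified, self-contained sketch of a genetic algorithm over bit strings. It is not Smile's implementation: the class BitStringGA and its evolve method are hypothetical, bit strings are modeled as boolean[] rather than BitString, and binary tournament selection with single-point crossover are simple stand-ins chosen for brevity. The elitism, crossoverRate, and mutationRate arguments mirror the constructor parameters documented above, and the seed argument is added only to make runs reproducible.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;
import java.util.function.ToDoubleFunction;

/** Hypothetical, simplified GA over bit strings (not Smile's implementation). */
final class BitStringGA {
    /** Returns the last generation, sorted from highest to lowest fitness. */
    static boolean[][] evolve(int size, int generation, int length,
                              ToDoubleFunction<boolean[]> fitness,
                              int elitism, double crossoverRate,
                              double mutationRate, long seed) {
        Random rng = new Random(seed);
        // Random initial population: each bit marks whether a feature is selected.
        boolean[][] pop = new boolean[size][length];
        for (boolean[] c : pop) {
            for (int j = 0; j < length; j++) c[j] = rng.nextBoolean();
        }
        sort(pop, fitness);

        for (int g = 0; g < generation; g++) {
            boolean[][] next = new boolean[size][];
            // Elitism: copy the best chromosomes unchanged into the new population.
            for (int i = 0; i < elitism; i++) next[i] = pop[i].clone();
            for (int i = elitism; i < size; i++) {
                boolean[] a = tournament(pop, rng);
                boolean[] b = tournament(pop, rng);
                boolean[] child = a.clone();
                if (rng.nextDouble() < crossoverRate) {
                    // Single-point crossover: take the tail of the second parent.
                    int cut = rng.nextInt(length);
                    System.arraycopy(b, cut, child, cut, length - cut);
                }
                for (int j = 0; j < length; j++) {
                    // Mutation: flip each bit with a small probability.
                    if (rng.nextDouble() < mutationRate) child[j] = !child[j];
                }
                next[i] = child;
            }
            pop = next;
            sort(pop, fitness);
        }
        return pop;
    }

    /** Sorts the population by descending fitness. */
    private static void sort(boolean[][] pop, ToDoubleFunction<boolean[]> fitness) {
        Arrays.sort(pop, Comparator.comparingDouble(fitness).reversed());
    }

    /** Binary tournament on a sorted population: the lower index is the fitter one. */
    private static boolean[] tournament(boolean[][] pop, Random rng) {
        return pop[Math.min(rng.nextInt(pop.length), rng.nextInt(pop.length))];
    }
}
```

With elitism of at least 1 and a deterministic fitness function, the best fitness in this sketch never decreases from one generation to the next, which is the reason for copying the best chromosomes forward.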
fitness
public static Fitness<BitString> fitness(double[][] x, int[] y, double[][] testx, int[] testy, ClassificationMetric metric, BiFunction<double[][], int[], Classifier<double[]>> trainer)
Returns the fitness of the classification model.
Parameters:
x - training samples.
y - training labels.
testx - testing samples.
testy - testing labels.
metric - classification metric.
trainer - the lambda to train a model.
Returns:
the fitness of model.
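The idea behind such a fitness function can be sketched without any Smile types: a bit string masks the feature columns, the trainer fits a model on the reduced training set, and the model is scored on the reduced test set. Everything named here is an assumption for illustration: MaskedFitness, select, and accuracy are hypothetical, boolean[] stands in for BitString, ToIntFunction<double[]> stands in for Classifier<double[]>, and plain accuracy stands in for a ClassificationMetric. Scoring an empty subset as 0 is also a choice made for this sketch.

```java
import java.util.function.BiFunction;
import java.util.function.ToDoubleFunction;
import java.util.function.ToIntFunction;

/** Hypothetical sketch (not Smile's code): fitness of a feature subset. */
final class MaskedFitness {
    /** Keeps only the columns of x whose bit is set. */
    static double[][] select(double[][] x, boolean[] bits) {
        int p = 0;
        for (boolean b : bits) if (b) p++;
        double[][] z = new double[x.length][p];
        for (int i = 0; i < x.length; i++) {
            for (int j = 0, k = 0; j < bits.length; j++) {
                if (bits[j]) z[i][k++] = x[i][j];
            }
        }
        return z;
    }

    /** Builds a fitness function: test accuracy of a model trained on the selected columns. */
    static ToDoubleFunction<boolean[]> accuracy(
            double[][] x, int[] y, double[][] testx, int[] testy,
            BiFunction<double[][], int[], ToIntFunction<double[]>> trainer) {
        return bits -> {
            int selected = 0;
            for (boolean b : bits) if (b) selected++;
            // Assumption of this sketch: an empty subset gets the worst score.
            if (selected == 0) return 0.0;

            ToIntFunction<double[]> model = trainer.apply(select(x, bits), y);
            double[][] z = select(testx, bits);
            int correct = 0;
            for (int i = 0; i < z.length; i++) {
                if (model.applyAsInt(z[i]) == testy[i]) correct++;
            }
            return (double) correct / z.length;
        };
    }
}
```

The trainer is passed as a lambda so that the same fitness wrapper works with any learning algorithm, which matches the role of the trainer parameter documented above.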
fitness
public static Fitness<BitString> fitness(double[][] x, double[] y, double[][] testx, double[] testy, RegressionMetric metric, BiFunction<double[][], double[], Regression<double[]>> trainer)
Returns the fitness of the regression model.
Parameters:
x - training samples.
y - training response.
testx - testing samples.
testy - testing response.
metric - regression metric.
trainer - the lambda to train a model.
Returns:
the fitness of model.
fitness
public static Fitness<BitString> fitness(String y, DataFrame train, DataFrame test, ClassificationMetric metric, BiFunction<Formula, DataFrame, DataFrameClassifier> trainer)
Returns the fitness of the classification model.
Parameters:
y - the column name of class labels.
train - training data.
test - testing data.
metric - classification metric.
trainer - the lambda to train a model.
Returns:
the fitness of model.
fitness
public static Fitness<BitString> fitness(String y, DataFrame train, DataFrame test, RegressionMetric metric, BiFunction<Formula, DataFrame, DataFrameRegression> trainer)
Returns the fitness of the regression model.
Parameters:
y - the column name of response variable.
train - training data.
test - testing data.
metric - regression metric.
trainer - the lambda to train a model.
Returns:
the fitness of model.