Class InformationValue
- All Implemented Interfaces:
Comparable<InformationValue>
Information value (IV) is a good measure of the predictive power of a feature. It also helps flag suspicious features. Unlike other feature selection methods, the features selected using IV might not be the best feature set for building a non-linear model.
Information Value | Predictive power |
---|---|
<0.02 | Useless |
0.02 to 0.1 | Weak predictors |
0.1 to 0.3 | Medium predictors |
0.3 to 0.5 | Strong predictors |
>0.5 | Suspicious |
Weight of evidence (WoE) is defined for each bin or category of a feature as WoE = ln(percentage of events / percentage of non-events). Note that the conditional log odds is exactly what a logistic regression model tries to predict.
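As an illustration, the WoE formula can be sketched in plain Java from per-bin counts. The class and method names (`WoeSketch`, `woe`, `iv`) are hypothetical and not part of this class. The IV computed here uses the standard definition, the sum over bins of (percentage of events minus percentage of non-events) times WoE, and the sketch assumes every bin holds at least one event and one non-event.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration only; not part of the documented API. */
final class WoeSketch {
    /** WoE per bin: ln(share of all events in the bin / share of all non-events in the bin). */
    static double[] woe(int[] events, int[] nonEvents) {
        double totalEvents = Arrays.stream(events).sum();
        double totalNonEvents = Arrays.stream(nonEvents).sum();
        double[] woe = new double[events.length];
        for (int i = 0; i < events.length; i++) {
            double pctEvents = events[i] / totalEvents;
            double pctNonEvents = nonEvents[i] / totalNonEvents;
            // Assumes both shares are positive; a zero count would give an infinite WoE.
            woe[i] = Math.log(pctEvents / pctNonEvents);
        }
        return woe;
    }

    /** IV: sum over bins of (share of events - share of non-events) * WoE. */
    static double iv(int[] events, int[] nonEvents) {
        double totalEvents = Arrays.stream(events).sum();
        double totalNonEvents = Arrays.stream(nonEvents).sum();
        double[] woe = woe(events, nonEvents);
        double iv = 0.0;
        for (int i = 0; i < events.length; i++) {
            iv += (events[i] / totalEvents - nonEvents[i] / totalNonEvents) * woe[i];
        }
        return iv;
    }

    public static void main(String[] args) {
        // Made-up counts for three bins of one feature.
        int[] events = {10, 30, 60};
        int[] nonEvents = {40, 40, 20};
        System.out.println("WoE = " + Arrays.toString(woe(events, nonEvents)));
        System.out.println("IV  = " + iv(events, nonEvents));
    }
}
```

Each term of the IV sum is non-negative, because the difference in shares and the WoE of a bin always have the same sign.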
WoE values of a categorical variable can be used to convert a categorical feature to a numerical feature. If a continuous feature does not have a linear relationship with the log odds, the feature can be binned into groups and a new feature created by replacing each bin with its WoE value. Therefore, WoE is a good variable transformation method for logistic regression.
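A minimal sketch of the binning transformation for a numeric feature is shown below. `WoeEncoderSketch` is a hypothetical class, not the transform returned by this API. It assumes that the break points are strictly increasing interior cut points, that bin i covers the half-open interval from break i-1 (inclusive) to break i (exclusive), and that there is one WoE value per bin. The convention this class actually uses for `breaks` may differ.

```java
import java.util.Arrays;

/** Hypothetical WoE encoder for illustration only; not part of the documented API. */
final class WoeEncoderSketch {
    private final double[] breaks; // interior cut points, strictly increasing (assumed)
    private final double[] woe;    // one WoE value per bin, so breaks.length + 1 entries

    WoeEncoderSketch(double[] breaks, double[] woe) {
        if (woe.length != breaks.length + 1) {
            throw new IllegalArgumentException("expected one WoE value per bin");
        }
        this.breaks = breaks.clone();
        this.woe = woe.clone();
    }

    /** Index of the bin containing x; a value equal to a break goes to the upper bin. */
    int bin(double x) {
        int pos = Arrays.binarySearch(breaks, x);
        // When x is not found, binarySearch returns -(insertionPoint) - 1.
        return pos >= 0 ? pos + 1 : -pos - 1;
    }

    /** Replaces a raw feature value with the WoE of its bin. */
    double encode(double x) {
        return woe[bin(x)];
    }

    /** Encodes a whole column of raw values. */
    double[] encodeAll(double[] xs) {
        double[] out = new double[xs.length];
        for (int i = 0; i < xs.length; i++) {
            out[i] = encode(xs[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up breaks and WoE values for a feature such as age.
        WoeEncoderSketch enc = new WoeEncoderSketch(
                new double[] {30, 50}, new double[] {-0.5, 0.1, 0.8});
        System.out.println(Arrays.toString(enc.encodeAll(new double[] {25, 30, 49.9, 70})));
    }
}
```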
When a numerical feature is binned in ascending order and the WoE values follow a linear trend across the bins, we know that the feature has the right linear relationship with the target. However, if the feature's WoE is non-linear, we should either discard it or consider some other variable transformation to ensure linearity. Hence, WoE helps check whether a feature has a linear relationship with the dependent variable before it is used in the model. Though WoE and IV are highly useful, they should be used only with logistic regression.
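One simple way to screen for this is sketched below. It checks only that the WoE values are monotonic across the ordered bins, which is a weaker condition than linearity and is used here as a rough proxy, not as the check this class performs. The class name `WoeTrendSketch` is hypothetical.

```java
/** Hypothetical trend check for illustration only; not part of the documented API. */
final class WoeTrendSketch {
    /**
     * Returns true if the WoE values, listed in ascending bin order,
     * never change direction (all non-decreasing or all non-increasing).
     */
    static boolean isMonotonic(double[] woe) {
        boolean nonDecreasing = true;
        boolean nonIncreasing = true;
        for (int i = 1; i < woe.length; i++) {
            if (woe[i] < woe[i - 1]) nonDecreasing = false;
            if (woe[i] > woe[i - 1]) nonIncreasing = false;
        }
        return nonDecreasing || nonIncreasing;
    }

    public static void main(String[] args) {
        // Made-up WoE values for two features, bins in ascending order.
        System.out.println(isMonotonic(new double[] {-0.8, -0.1, 0.4, 0.9}));
        System.out.println(isMonotonic(new double[] {0.5, -0.2, 0.3}));
    }
}
```

A feature that fails this screen is a candidate for re-binning or another transformation, as described above.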
WoE is better than one-hot encoding because it replaces a feature with a single numerical column and so does not increase the complexity of the model.
-
Field Summary
-
Constructor Summary
Constructor | Description |
---|---|
InformationValue(String feature, double iv, double[] woe, double[] breaks) | Constructor. |
-
Method Summary
Modifier and Type | Method | Description |
---|---|---|
int | compareTo(InformationValue other) | |
static InformationValue[] | fit(data, clazz) | Calculates the information value. |
static InformationValue[] | fit(data, clazz, nbins) | Calculates the information value. |
String | toString() | |
static String | toString(InformationValue[] iv) | Returns a string representation of the array of information values. |
static ColumnTransform | toTransform(InformationValue[] values) | Returns the data transformation that converts feature values to their weight of evidence. |
-
Field Details
-
feature
The feature name.
-
iv
public final double iv
Information value.
-
woe
public final double[] woe
Weight of evidence.
-
breaks
public final double[] breaks
Breakpoints of intervals for numerical variables.
-
-
Constructor Details
-
InformationValue
Constructor.
- Parameters:
feature - The feature name.
iv - Information value.
woe - Weight of evidence.
breaks - Breakpoints of intervals for numerical variables.
-
-
Method Details
-
compareTo
- Specified by:
compareTo in interface Comparable<InformationValue>
-
toString
-
toString
Returns a string representation of the array of information values.
- Parameters:
iv - the array of information values.
- Returns:
a string representation of information values
-
toTransform
Returns the data transformation that converts feature values to their weight of evidence.
- Parameters:
values - the information value objects of features.
- Returns:
the transform.
-
fit
Calculates the information value.
- Parameters:
data - the data frame of the explanatory and response variables.
clazz - the column name of binary class labels.
- Returns:
the information value.
-
fit
Calculates the information value.
- Parameters:
data - the data frame of the explanatory and response variables.
clazz - the column name of binary class labels.
nbins - the number of bins to discretize numeric variables in WOE calculation.
- Returns:
the information value.
-