SelectKBest
What is SelectKBest?
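SelectKBest is one of the most commonly used feature selection methods in machine learning. It is a filter-based method: each feature is scored on its own against the target, independently of any model. In scikit-learn it is available as sklearn.feature_selection.SelectKBest.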
SelectKBest uses a statistical test, such as the chi-squared test, the ANOVA F-test, or the mutual information score, to score and rank the features by their relationship with the output variable. It then keeps the K features with the highest scores as the final feature subset.
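The score, rank, and select steps can be sketched by hand and compared with what SelectKBest returns. This is a minimal illustration, not the library's internal implementation. The iris dataset, the f_classif scorer, and k = 2 are assumptions chosen for the example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

# Illustrative data: 150 samples, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)
k = 2  # assumed value for this example

# Step 1: score every feature against the target (ANOVA F-test here).
scores, p_values = f_classif(X, y)

# Step 2: rank the features and keep the indices of the k highest scores.
top_k_manual = np.sort(np.argsort(scores)[-k:])

# The same selection performed by SelectKBest.
selector = SelectKBest(score_func=f_classif, k=k).fit(X, y)
top_k_selector = selector.get_support(indices=True)

print("manual:     ", top_k_manual)
print("SelectKBest:", top_k_selector)
```

Both approaches should report the same column indices when no two features have tied scores.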
Syntax
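SelectKBest has two parameters: the score function (score_func) and the number of features to keep (k). In scikit-learn, score_func defaults to f_classif and k defaults to 10.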
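A minimal usage sketch of the two parameters follows. The iris dataset and k=2 are assumptions made for illustration only.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# score_func: how each feature is scored; k: how many features to keep.
selector = SelectKBest(score_func=f_classif, k=2)

# fit_transform scores the features and returns only the selected columns.
X_new = selector.fit_transform(X, y)

print(X.shape, X_new.shape)    # (150, 4) (150, 2)
print(selector.scores_)        # one score per original feature
print(selector.get_support())  # boolean mask of the kept features
```

After fitting, scores_ holds the score of every feature and get_support() shows which ones were kept, which is useful for mapping the reduced matrix back to column names.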
Score function
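The score function evaluates the importance of each feature. It takes the feature matrix and the target and returns one score per feature; some score functions also return p-values. Several score functions are available.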
Some of the commonly used score functions (score_func) in SelectKBest:
f_regression: Used for regression problems. Computes the F-value of a univariate linear regression test between each feature and the target.
mutual_info_regression: Used for regression problems. Estimates the mutual information between each feature and a continuous target.
f_classif: Used for classification problems. Computes the ANOVA F-value between each feature and the target.
mutual_info_classif: Used for classification problems. Estimates the mutual information between each feature and a discrete target.
chi2: Used for classification problems. Computes the chi-squared statistic between each feature and the target. The features must be non-negative, for example counts or frequencies.
A related selector is SelectPercentile. It is not a score function but an alternative to SelectKBest: it selects the highest-scoring X% of the features based on the score_func.
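SelectPercentile is used in the same way as SelectKBest, with a percentile argument in place of k. The sketch below assumes the iris dataset and the chi2 scorer, both chosen only for illustration. chi2 is valid here because all iris features are non-negative.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectPercentile, chi2

X, y = load_iris(return_X_y=True)

# Keep the top 50% of the features, ranked by their chi-squared score.
selector = SelectPercentile(score_func=chi2, percentile=50)
X_new = selector.fit_transform(X, y)

print(X.shape, X_new.shape)  # (150, 4) (150, 2)
print(selector.get_support(indices=True))
```

A percentile is convenient when the number of input features varies between datasets, since the share of features kept stays the same.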
How to select the right score function?
For regression, the most commonly used score functions are f_regression and mutual_info_regression.
For classification, the most commonly used score functions are chi2, mutual_info_classif, and f_classif.
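As a rough sketch of this choice, the example below runs the regression scorers on a continuous target and the classification scorers on a discrete one. The datasets, the values of k, and the fixed random_state (passed through functools.partial so the mutual information estimates are reproducible) are assumptions for illustration.

```python
from functools import partial

from sklearn.datasets import load_iris, make_regression
from sklearn.feature_selection import (
    SelectKBest,
    chi2,
    f_classif,
    f_regression,
    mutual_info_classif,
    mutual_info_regression,
)

# Regression: continuous target (synthetic data, illustrative only).
X_reg, y_reg = make_regression(
    n_samples=200, n_features=10, n_informative=3, random_state=0
)
reg_scorers = {
    "f_regression": f_regression,
    "mutual_info_regression": partial(mutual_info_regression, random_state=0),
}
reg_selected = {
    name: SelectKBest(score_func=func, k=3).fit(X_reg, y_reg).get_support(indices=True)
    for name, func in reg_scorers.items()
}

# Classification: discrete target. Iris features are non-negative, so chi2 applies.
X_clf, y_clf = load_iris(return_X_y=True)
clf_scorers = {
    "chi2": chi2,
    "f_classif": f_classif,
    "mutual_info_classif": partial(mutual_info_classif, random_state=0),
}
clf_selected = {
    name: SelectKBest(score_func=func, k=2).fit(X_clf, y_clf).get_support(indices=True)
    for name, func in clf_scorers.items()
}

for name, idx in {**reg_selected, **clf_selected}.items():
    print(f"{name}: selected feature indices {idx}")
```

Different scorers can select different features on the same data. The F-tests capture linear relationships, while mutual information can also pick up non-linear ones at a higher computational cost.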
Commands