public class ChiSqSelector
extends Object
implements scala.Serializable
Chi-squared feature selector. Supported selection methods: numTopFeatures, percentile, fpr.
- numTopFeatures chooses a fixed number of top features according to a chi-squared test.
- percentile is similar but chooses a fraction of all features instead of a fixed number.
- fpr chooses all features whose p-value is below a threshold, thus controlling the false
positive rate of selection.
By default, the selection method is numTopFeatures, with the default number of top features
set to 50.

| Constructor and Description |
|---|
| ChiSqSelector() |
| ChiSqSelector(int numTopFeatures): Equivalent to calling this() followed by setNumTopFeatures(numTopFeatures). |

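
The three selection methods described above differ only in how they pick features once each feature has a p-value from the chi-squared test. The following is a minimal sketch of those three rules in plain Java, assuming the p-values are already computed. The class `SelectionRuleSketch` and its methods are hypothetical names used for illustration; this is not Spark's implementation, and details such as tie-breaking and rounding of the fraction are assumptions.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.stream.IntStream;

/** Illustrative sketch only; names and rounding behavior are assumptions, not Spark's code. */
public class SelectionRuleSketch {

    /** numTopFeatures-style rule: indices of the k smallest p-values, returned in index order. */
    public static int[] topK(double[] pValues, int k) {
        int limit = Math.max(0, Math.min(k, pValues.length));
        return IntStream.range(0, pValues.length)
                .boxed()
                .sorted(Comparator.comparingDouble((Integer i) -> pValues[i]))
                .limit(limit)
                .mapToInt(Integer::intValue)
                .sorted()
                .toArray();
    }

    /** percentile-style rule: keep a fraction of all features (truncated to a whole count). */
    public static int[] topFraction(double[] pValues, double fraction) {
        int k = (int) (pValues.length * fraction);
        return topK(pValues, k);
    }

    /** fpr-style rule: keep every feature whose p-value is strictly below the threshold. */
    public static int[] belowThreshold(double[] pValues, double threshold) {
        return IntStream.range(0, pValues.length)
                .filter(i -> pValues[i] < threshold)
                .toArray();
    }

    public static void main(String[] args) {
        // Made-up p-values for six features, for demonstration only.
        double[] p = {0.20, 0.001, 0.04, 0.0005, 0.60, 0.03};
        System.out.println(Arrays.toString(topK(p, 2)));
        System.out.println(Arrays.toString(topFraction(p, 0.5)));
        System.out.println(Arrays.toString(belowThreshold(p, 0.05)));
    }
}
```

Note that the first two rules always return a fixed count of features, while the threshold rule can return anything from no features to all of them, depending on the data.
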
| Modifier and Type | Method and Description |
|---|---|
| ChiSqSelectorModel | fit(RDD<LabeledPoint> data): Returns a ChiSquared feature selector. |
| double | fpr() |
| static String | FPR(): String name for `fpr` selector type. |
| int | numTopFeatures() |
| static String | NumTopFeatures(): String name for numTopFeatures selector type. |
| double | percentile() |
| static String | Percentile(): String name for percentile selector type. |
| String | selectorType() |
| ChiSqSelector | setFpr(double value) |
| ChiSqSelector | setNumTopFeatures(int value) |
| ChiSqSelector | setPercentile(double value) |
| ChiSqSelector | setSelectorType(String value) |
| static String[] | supportedSelectorTypes(): Set of selector types that ChiSqSelector supports. |

public ChiSqSelector()
public ChiSqSelector(int numTopFeatures)
numTopFeatures - (undocumented)
public static String NumTopFeatures()
String name for numTopFeatures selector type.
public static String Percentile()
String name for percentile selector type.
public static String FPR()
String name for `fpr` selector type.
public static String[] supportedSelectorTypes()
public int numTopFeatures()
public double percentile()
public double fpr()
public String selectorType()
public ChiSqSelector setNumTopFeatures(int value)
public ChiSqSelector setPercentile(double value)
public ChiSqSelector setFpr(double value)
public ChiSqSelector setSelectorType(String value)
public ChiSqSelectorModel fit(RDD<LabeledPoint> data)
data - an RDD[LabeledPoint] containing the labeled dataset with categorical features.
Real-valued features will be treated as categorical for each distinct value.
Apply a feature discretizer before using this function.
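
Because every distinct real value counts as its own category, a continuous feature can produce a very large number of categories unless it is binned first. The following is a minimal sketch of one simple discretizer, equal-width binning, in plain Java. The class `DiscretizeSketch`, its methods, and the bin width of 16 are hypothetical choices for illustration; they are not part of the Spark API, and other discretization schemes may suit a given dataset better.

```java
import java.util.Arrays;

/** Illustrative sketch only; an assumed equal-width binning step, not a Spark class. */
public class DiscretizeSketch {

    /** Maps each value to the index of its equal-width bin: floor(value / binWidth). */
    public static double[] discretize(double[] values, double binWidth) {
        if (binWidth <= 0) {
            throw new IllegalArgumentException("binWidth must be positive");
        }
        double[] binned = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            binned[i] = Math.floor(values[i] / binWidth);
        }
        return binned;
    }

    /** Number of distinct values, i.e. how many categories the feature would be treated as having. */
    public static long distinctCount(double[] values) {
        return Arrays.stream(values).distinct().count();
    }

    public static void main(String[] args) {
        // Made-up raw feature values, for demonstration only.
        double[] raw = {0.0, 3.5, 15.9, 16.0, 100.2, 255.0};
        double[] binned = discretize(raw, 16.0);
        System.out.println("categories before: " + distinctCount(raw));
        System.out.println("categories after:  " + distinctCount(binned));
        System.out.println(Arrays.toString(binned));
    }
}
```

In a real pipeline the same transformation would be applied to each feature of every LabeledPoint before the data is passed to fit.
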