A decision tree model must contain a key column, input columns, and one predictable column.
The Microsoft Decision Trees algorithm supports specific input column content types, predictable column content types, and modeling flags, which are listed in the following table.
| | |
|---|---|
| Input column content types | Continuous, Cyclical, Discrete, Discretized, Key, Table, and Ordered |
| Predictable column content types | Continuous, Cyclical, Discrete, Discretized, Table, and Ordered |
| Modeling flags | MODEL_EXISTENCE_ONLY, NOT NULL, and REGRESSOR |
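The column requirements above can be checked mechanically before a model is defined. The following Python sketch is illustrative only: the `validate_columns` helper, the `role` labels, and the dictionary layout are hypothetical and are not part of Analysis Services or DMX. Only the content type and modeling flag names are taken from the table.

```python
# Supported values, copied from the table above.
INPUT_CONTENT_TYPES = {
    "Continuous", "Cyclical", "Discrete", "Discretized", "Key", "Table", "Ordered",
}
PREDICTABLE_CONTENT_TYPES = {
    "Continuous", "Cyclical", "Discrete", "Discretized", "Table", "Ordered",
}
MODELING_FLAGS = {"MODEL_EXISTENCE_ONLY", "NOT NULL", "REGRESSOR"}


def validate_columns(columns):
    """Return a list of problems found in a hypothetical column layout.

    Each column is a dict with 'name', 'role' ('key', 'input', or 'predict'),
    'content_type', and an optional 'flags' list. This layout is an
    assumption made for the example, not a product format.
    """
    problems = []
    roles = [col["role"] for col in columns]

    # A model needs a key column, input columns, and a predictable column.
    for required in ("key", "input", "predict"):
        if required not in roles:
            problems.append(f"missing {required} column")

    for col in columns:
        # Predictable columns accept a narrower set of content types (no Key).
        if col["role"] == "predict":
            allowed = PREDICTABLE_CONTENT_TYPES
        else:
            allowed = INPUT_CONTENT_TYPES
        if col["content_type"] not in allowed:
            problems.append(
                f"{col['name']}: content type {col['content_type']} "
                f"is not supported for a {col['role']} column"
            )
        for flag in col.get("flags", []):
            if flag not in MODELING_FLAGS:
                problems.append(f"{col['name']}: unsupported modeling flag {flag}")
    return problems


# Hypothetical column layout used only to exercise the check.
example = [
    {"name": "CustomerKey", "role": "key", "content_type": "Key"},
    {"name": "Age", "role": "input", "content_type": "Continuous", "flags": ["REGRESSOR"]},
    {"name": "Gender", "role": "input", "content_type": "Discrete"},
    {"name": "YearlyIncome", "role": "predict", "content_type": "Continuous"},
]
print(validate_columns(example))
```

An empty list means the layout satisfies the constraints listed in the table. The server performs its own validation when a model is created, so a check like this is only a convenience.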
All Microsoft algorithms support a common set of functions. However, the Microsoft Decision Trees algorithm supports additional functions, listed in the following table.
For a list of the functions that are common to all Microsoft algorithms, see Data Mining Algorithms. For more information about how to use these functions, see Data Mining Extensions (DMX) Function Reference.
The Microsoft Decision Trees algorithm supports using the Predictive Model Markup Language (PMML) to create mining models.
The Microsoft Decision Trees algorithm supports several parameters that affect the performance and accuracy of the resulting mining model. The following table describes each parameter.
| Parameter | Description |
|---|---|
| MAXIMUM_INPUT_ATTRIBUTES | Defines the number of input attributes that the algorithm can handle before it invokes feature selection. Set this value to 0 to turn off feature selection. The default is 255. |
| MAXIMUM_OUTPUT_ATTRIBUTES | Defines the number of output attributes that the algorithm can handle before it invokes feature selection. Set this value to 0 to turn off feature selection. The default is 255. |
| SCORE_METHOD | Determines the method that is used to calculate the split score. Available options: Entropy (1), Bayesian with K2 Prior (2), or Bayesian Dirichlet Equivalent (BDE) Prior (3). The default is 3. |
| SPLIT_METHOD | Determines the method that is used to split the node. Available options: Binary (1), Complete (2), or Both (3). The default is 3. |
| MINIMUM_SUPPORT | Determines the minimum number of leaf cases that is required to generate a split in the decision tree. The default is 10. |
| COMPLEXITY_PENALTY | Controls the growth of the decision tree. A low value increases the number of splits, and a high value decreases the number of splits. The default value is based on the number of attributes for a particular model: 0.5 for 1 through 9 attributes, 0.9 for 10 through 99 attributes, and 0.99 for 100 or more attributes. |
| FORCED_REGRESSOR | Forces the algorithm to use the indicated columns as regressors, regardless of the importance of the columns as calculated by the algorithm. This parameter is only used for decision trees that are predicting a continuous attribute. |
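The defaults in the table can be summarized in a short Python sketch. This is not how the algorithm is implemented; the function names are hypothetical. Reading "can handle before it invokes feature selection" as "feature selection runs when the attribute count exceeds the limit" is an assumption.

```python
def default_complexity_penalty(attribute_count):
    """Return the default COMPLEXITY_PENALTY for a given number of attributes."""
    if attribute_count < 1:
        raise ValueError("attribute_count must be at least 1")
    if attribute_count <= 9:
        return 0.5
    if attribute_count <= 99:
        return 0.9
    return 0.99


def feature_selection_invoked(attribute_count, maximum_attributes=255):
    """Report whether feature selection would apply.

    Assumption: selection applies when the count exceeds the limit.
    A limit of 0 turns feature selection off, as stated in the table.
    """
    if maximum_attributes == 0:
        return False
    return attribute_count > maximum_attributes


def default_parameters(attribute_count):
    """Collect the documented defaults for a model with attribute_count attributes."""
    return {
        "MAXIMUM_INPUT_ATTRIBUTES": 255,
        "MAXIMUM_OUTPUT_ATTRIBUTES": 255,
        "SCORE_METHOD": 3,   # Bayesian Dirichlet Equivalent (BDE) Prior
        "SPLIT_METHOD": 3,   # Both
        "MINIMUM_SUPPORT": 10,
        "COMPLEXITY_PENALTY": default_complexity_penalty(attribute_count),
    }


# Example: a model with 12 attributes falls in the 10 through 99 range.
print(default_parameters(12)["COMPLEXITY_PENALTY"])
print(feature_selection_invoked(300))     # limit of 255 exceeded
print(feature_selection_invoked(300, 0))  # feature selection turned off
```

FORCED_REGRESSOR is omitted from the dictionary because the table gives no default for it.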