Finding importance of features with forests of trees

In a classification problem, not all features contribute equally to predicting the label of a record. Classification algorithms use different approaches to measure which features matter most. For example, XGBoost offers three importance measures: weight (how many times a feature is used to split), cover (how many samples those splits affect), and gain (the average improvement in the objective from those splits).

In the following example, I use a forest of randomized trees (scikit-learn's ExtraTreesClassifier) to evaluate the importance of features on a classification task with synthetic data and labels.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Create synthetic data and labels for the classification model.
# n_features=10 matches the ranking printed below; the remaining
# parameters are illustrative assumptions.
X, y = make_classification(n_samples=1000,
                           n_features=10,
                           n_informative=3,
                           random_state=0)

# Use ExtraTreesClassifier for the model
forest = ExtraTreesClassifier(n_estimators=150,
                              random_state=0)

# Train the model using the synthetic data
forest.fit(X, y)

# Compute feature importances and sort them in descending order
importances = forest.feature_importances_
indices = np.argsort(importances)[::-1]

# Print the feature ranking
print("Feature ranking:")
for f in range(X.shape[1]):
    print("{0}. feature {1}: {2}".format(f + 1, indices[f], importances[indices[f]]))

The above code will give the following output:

Feature ranking:

  1. feature 3: 0.7275378616791859
  2. feature 5: 0.04782038837422823
  3. feature 1: 0.029892177779750752
  4. feature 7: 0.02936523022130539
  5. feature 4: 0.02884410936359628
  6. feature 9: 0.028589731215027326
  7. feature 0: 0.02791793639608188
  8. feature 2: 0.027551788416172958
  9. feature 6: 0.026578769169484397
  10. feature 8: 0.025902007385166802
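The importances reported above are impurity-based averages over all the trees in the forest. A useful sanity check is to look at how much the per-tree importances vary: a small spread means the ranking is stable across the ensemble. A minimal sketch, assuming the same synthetic setup as above (the exact dataset parameters are illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Recreate a comparable setup (parameters are illustrative assumptions)
X, y = make_classification(n_samples=1000, n_features=10,
                           n_informative=3, random_state=0)
forest = ExtraTreesClassifier(n_estimators=150, random_state=0).fit(X, y)

# Each fitted tree exposes its own feature_importances_; the forest's
# importances are their mean, and the standard deviation across trees
# shows how stable each feature's score is.
per_tree = np.array([tree.feature_importances_ for tree in forest.estimators_])
std = per_tree.std(axis=0)

for i in np.argsort(forest.feature_importances_)[::-1]:
    print(f"feature {i}: {forest.feature_importances_[i]:.4f} "
          f"(std across trees: {std[i]:.4f})")
```

If the standard deviation of a feature is comparable to its mean importance, its position in the ranking should not be over-interpreted.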
