port2077/cfi-random-forests

Conditional Feature Importance for Tree-Based Models in Python

An implementation of conditional feature importance, following the paper at https://pmc.ncbi.nlm.nih.gov/articles/PMC2491635/

Decision Trees / Random Forests

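import pandas as pd
import joblib

# dt_cfi_scores and rf_cfi_scores come from this repository; import them before running.

# Load the fitted decision tree and the held-out test set.
model = joblib.load('model.joblib')
df = pd.read_csv('test_df.csv', index_col=False)

# Separate the target column from the features.
target = df['label'].copy()
test_df = df.drop('label', axis=1)

# Use rf_cfi_scores instead for a random forest.
score = dt_cfi_scores(model, test_df, target)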
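
The idea behind conditional permutation can be sketched as follows. This is a minimal illustration, not the code in this repository: the function name `conditional_permute` and its arguments are hypothetical, and it assumes the cut points for the conditioning features are already known (for example, taken from a tree's split thresholds). The feature of interest is shuffled only among rows that fall in the same grid cell of the conditioning features, so its relationship with them is roughly preserved.

```python
import numpy as np


def conditional_permute(x, conditioning, cut_points, rng):
    """Permute x only within cells of the grid defined by cut_points.

    x            : 1-D array, the feature to permute (hypothetical argument).
    conditioning : 2-D array (n_samples, n_cond), features to condition on.
    cut_points   : list with one 1-D array of thresholds per conditioning column.
    rng          : numpy.random.Generator used for shuffling.
    """
    x = np.asarray(x)
    conditioning = np.asarray(conditioning)
    if conditioning.shape[1] == 0:
        # Nothing to condition on: fall back to an ordinary permutation.
        return rng.permutation(x)

    # Bin index of every row along each conditioning feature.
    # right=True mirrors the "value <= threshold goes left" tree convention.
    bins = np.column_stack([
        np.digitize(conditioning[:, j], np.sort(cut_points[j]), right=True)
        for j in range(conditioning.shape[1])
    ])

    # Rows with identical bin tuples share a grid cell.
    _, cell = np.unique(bins, axis=0, return_inverse=True)
    cell = cell.ravel()

    permuted = x.copy()
    for c in np.unique(cell):
        idx = np.flatnonzero(cell == c)
        permuted[idx] = x[rng.permutation(idx)]
    return permuted


# Example: one conditioning feature z with a single cut point at 0.5.
rng = np.random.default_rng(0)
z = np.array([0.1, 0.2, 0.3, 0.8, 0.9, 1.0])
x = np.array([1, 2, 3, 10, 20, 30])
x_perm = conditional_permute(x, z.reshape(-1, 1), [np.array([0.5])], rng)
```

In the example, the first three values of `x` can only be swapped among themselves, and likewise the last three, because `z` places them in different cells.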

XGBoost

XGBoost models are built from boosted trees, so each tree depends on the ones before it. The trees inside the model therefore cannot be used to generate the split points for the permuted dataset. Instead, a decision tree trained on the same training dataset can be used to generate them. The XGBoost model is then scored on both the validation dataset and the permuted dataset.

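import pandas as pd
import joblib
import xgboost as xgb

# xgb_cfi_scores comes from this repository; import it before running.

# Load the fitted XGBoost model.
model = xgb.XGBClassifier()
model.load_model('xgboost.json')

# Decision tree trained on the same training data; it supplies the split points.
dt = joblib.load('model.joblib')

df = pd.read_csv('test_df.csv', index_col=False)
target = df['label'].copy()
test_df = df.drop('label', axis=1)
score = xgb_cfi_scores(model, dt, test_df, target)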
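
The surrogate-tree approach described above might look roughly like the sketch below. It is not the repository's `xgb_cfi_scores`: the names `split_points` and `surrogate_cfi` are hypothetical, accuracy is assumed as the score, and each feature is conditioned on every other feature the surrogate tree splits on (an assumption made for brevity). It works on NumPy arrays, so it also assumes the model's `predict` accepts them.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def split_points(tree_model, n_features):
    """Collect the thresholds a fitted sklearn tree uses, keyed by feature index."""
    t = tree_model.tree_
    points = {j: [] for j in range(n_features)}
    for feat, thr in zip(t.feature, t.threshold):
        if feat >= 0:  # negative feature indices mark leaf nodes
            points[int(feat)].append(float(thr))
    return {j: np.unique(v) for j, v in points.items()}


def surrogate_cfi(model, surrogate_tree, X, y, seed=0):
    """Accuracy drop per feature after permuting it within surrogate-tree cells."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    cuts = split_points(surrogate_tree, n_features)
    baseline = np.mean(model.predict(X) == y)

    scores = np.zeros(n_features)
    for j in range(n_features):
        # Condition on the other features that the surrogate tree splits on.
        others = [k for k in range(n_features) if k != j and len(cuts[k]) > 0]
        if others:
            bins = np.column_stack([
                np.digitize(X[:, k], cuts[k], right=True) for k in others
            ])
            _, cell = np.unique(bins, axis=0, return_inverse=True)
            cell = cell.ravel()
        else:
            cell = np.zeros(len(X), dtype=np.int64)  # single cell: plain permutation

        # Shuffle feature j only among rows in the same cell.
        X_perm = X.copy()
        for c in np.unique(cell):
            idx = np.flatnonzero(cell == c)
            X_perm[idx, j] = X[rng.permutation(idx), j]

        scores[j] = baseline - np.mean(model.predict(X_perm) == y)
    return scores


# Example on synthetic data; the tree doubles as the scored model here,
# whereas in the XGBoost case `model` would be the boosted model instead.
data_rng = np.random.default_rng(0)
X_demo = data_rng.normal(size=(200, 3))
y_demo = (X_demo[:, 0] > 0).astype(int)
tree = DecisionTreeClassifier(random_state=0).fit(X_demo, y_demo)
demo_scores = surrogate_cfi(tree, tree, X_demo, y_demo)
```

Conditioning on many features at once makes the cells small, which limits how much a feature can actually be shuffled; restricting the conditioning set to features correlated with the one being permuted is one way to keep the cells usable.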
