Conditional Feature Importance for Tree-Based Models in Python
Code implementing conditional feature importance as described in the paper https://pmc.ncbi.nlm.nih.gov/articles/PMC2491635/
import pandas as pd
import joblib

# dt_cfi_scores and rf_cfi_scores are provided by this repository's code.
model = joblib.load('model.joblib')  # fitted decision tree
df = pd.read_csv('test_df.csv', index_col=False)
target = df['label'].copy()
test_df = df.drop('label', axis=1)
score = dt_cfi_scores(model, test_df, target)  # use rf_cfi_scores for a random forest
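For readers new to the idea, here is a minimal sketch of the conditional permutation step, assuming the usual reading of the linked paper: a feature is shuffled only among rows that fall in the same cell of a grid defined by split points on a correlated feature. The function name `conditional_permute` and the toy data are hypothetical and are not part of this repository; `dt_cfi_scores` may be implemented differently.

```python
import random
from bisect import bisect_left
from collections import defaultdict


def conditional_permute(values, conditioning, split_points, seed=0):
    """Shuffle `values` only among rows whose `conditioning` value lies in
    the same interval between consecutive split points (hypothetical helper)."""
    rng = random.Random(seed)
    cuts = sorted(split_points)

    # Group row indices by cell. bisect_left sends z == threshold to the
    # left cell, matching the "value <= threshold goes left" convention.
    cells = defaultdict(list)
    for row, z in enumerate(conditioning):
        cells[bisect_left(cuts, z)].append(row)

    # Shuffle within each cell and write the values back in place.
    permuted = list(values)
    for rows in cells.values():
        shuffled = [values[r] for r in rows]
        rng.shuffle(shuffled)
        for r, v in zip(rows, shuffled):
            permuted[r] = v
    return permuted


x = [10, 11, 12, 13, 50, 51, 52, 53]          # feature to permute
z = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]  # correlated feature
x_perm = conditional_permute(x, z, split_points=[0.5])
```

Values of `x` only move between rows on the same side of the split at 0.5, so the relationship between `x` and `z` is roughly preserved. The importance of a feature is then the drop in model score when the original column is replaced by its conditionally permuted version.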
XGBoost models are built from boosted trees, so each tree depends on the trees before it. For this reason, the trees inside the model cannot be used to generate the split points for the permuted dataset. Instead, a decision tree trained on the same training dataset can be used to generate them. The XGBoost model is then scored on both the validation dataset and the permuted dataset.
import pandas as pd
import joblib
import xgboost as xgb

# xgb_cfi_scores is provided by this repository's code.
model = xgb.XGBClassifier()
model.load_model('xgboost.json')
dt = joblib.load('model.joblib')  # decision tree trained on the same training data
df = pd.read_csv('test_df.csv', index_col=False)
target = df['label'].copy()
test_df = df.drop('label', axis=1)
score = xgb_cfi_scores(model, dt, test_df, target)
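As a rough illustration of how split points could be read from the separate decision tree, the sketch below collects the thresholds of a fitted scikit-learn tree, grouped by feature index. It assumes the tree is a scikit-learn `DecisionTreeClassifier`; the helper `split_points_by_feature` and the toy data are hypothetical, and `xgb_cfi_scores` may obtain its split points in another way.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def split_points_by_feature(tree):
    """Return {feature index: sorted unique thresholds} for a fitted
    scikit-learn decision tree (hypothetical helper)."""
    features = tree.tree_.feature      # negative values mark leaf nodes
    thresholds = tree.tree_.threshold
    points = {}
    for f, t in zip(features, thresholds):
        if f >= 0:  # skip leaves, which have no split
            points.setdefault(int(f), set()).add(float(t))
    return {f: sorted(ts) for f, ts in points.items()}


# Toy data: the label depends only on whether feature 0 exceeds 0.5.
rng = np.random.default_rng(0)
X = rng.random((200, 2))
y = (X[:, 0] > 0.5).astype(int)

surrogate = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
points = split_points_by_feature(surrogate)
print(points)
```

The resulting thresholds define the grid within which each feature is permuted before the XGBoost model is scored on the permuted data.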