Nice method.
I am looking at the code and the thesis more closely, as the method appears to be very useful.
I don't fully understand the point of the perturbation strategy, and it is not fully explained in the thesis.
I started reading the code and I spotted some bugs.
Firstly
https://github.com/adebayoj/fairml/blob/master/fairml/orthogonal_projection.py#L120
takes the strategy but ignores it, see:
https://github.com/adebayoj/fairml/blob/master/fairml/orthogonal_projection.py#L217
Also, I think that with the `constant_zero` and median perturbation strategies this loop is redundant:
https://github.com/adebayoj/fairml/blob/master/fairml/orthogonal_projection.py#L205
Each run ignores `random_sample_selected` anyway, so each run should produce the same `output_difference_col` and `total_difference` (because `data_col_ptb` and `total_ptb_data` are identical in each run).
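To illustrate the redundancy, here is a minimal sketch. It is not the fairml code: `perturb_constant`, `predict`, and the strategy name `constant_median` are hypothetical stand-ins I made up for the example. It only shows that when the perturbed column is a constant, repeating the run changes nothing.

```python
import numpy as np


def perturb_constant(column, strategy):
    # Hypothetical helper (not the fairml function): replace a column
    # with a constant value that depends only on the column itself.
    if strategy == "constant_zero":
        return np.zeros_like(column)
    if strategy == "constant_median":
        return np.full_like(column, np.median(column))
    raise ValueError(f"unknown strategy: {strategy}")


def predict(X):
    # Stand-in for any deterministic black-box model.
    return X @ np.array([1.0, -2.0, 0.5])


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
baseline = predict(X)

differences = []
for _ in range(5):  # the "runs" loop
    X_ptb = X.copy()
    X_ptb[:, 0] = perturb_constant(X[:, 0], "constant_zero")
    differences.append(predict(X_ptb) - baseline)

# No randomness enters the loop body, so every run gives the same result.
all_identical = all(np.array_equal(differences[0], d) for d in differences[1:])
print(all_identical)
```

With a random-sample strategy the loop would make sense, since each run would draw a different perturbation; with a constant one, a single run seems to be enough.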
Finally, it would be great if the documentation explained the purpose of `direct_input_pertubation_strategy` in more detail. Is it necessary at all to "zero out" a column? Why?
It appears to me that just by orthogonalising the other columns you already take away the effect of the subject column. It is not clear to me why zeroing out is required on top of that. Is it to be certain the effect of the column is not present?
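For reference, this is what I understand by orthogonalising, as a minimal sketch. It is my own assumption of the idea, not the fairml implementation, and `orthogonalise` is a hypothetical name. It removes from another column its linear projection onto the subject column.

```python
import numpy as np


def orthogonalise(other, subject):
    # Hypothetical helper: subtract from `other` its projection onto `subject`,
    # leaving a vector with zero dot product with `subject`.
    return other - (other @ subject) / (subject @ subject) * subject


rng = np.random.default_rng(1)
subject = rng.normal(size=200)
# A second column that is strongly linearly related to the subject column.
other = 0.8 * subject + rng.normal(size=200)

other_orth = orthogonalise(other, subject)

dot_before = float(other @ subject)
residual_dot = float(other_orth @ subject)
print(dot_before, residual_dot)
```

After this step the other column carries no linear component of the subject column, which is why I am asking what the extra zeroing of the subject column itself is meant to achieve.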
Many thanks for the code by the way!