Keep meaningful categorical labels #70

Open
felixschmitz wants to merge 2 commits into main from keep-meaningful-categorical-labels

@felixschmitz
Collaborator

Closes #66

  series=out["med_subjective_status_pequiv"],
- value_for_comparison=5,
- comparison_type="leq",
+ value_for_comparison=["Zufriedenstellend", "Weniger gut", "Schlecht"],
Collaborator Author

The logic for creating med_subjective_status_dummy_pequiv came straight from the old repository. I just noticed that it differs from how med_subjective_status_dummy_pl is created. To combine the two variables correctly later on, I adapted the definition of med_subjective_status_dummy_pequiv here.

Collaborator

It definitely makes sense to harmonize them -- but why this way around and not the other?

Collaborator Author

We are using med_subjective_status_dummy_pequiv in the calculation for a frailty score. The other dummy variables there are 1/True if a medical condition is present, e.g. med_schwierigkeiten_anziehen_pequiv is True for individuals with the condition.
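
A minimal sketch of the label-based definition discussed here, in plain Python. The helper `label_dummy` is hypothetical and is not the repository's `create_dummy`; plain lists stand in for the pandas Series, and the label "Sehr gut" is an assumed example value. The dummy is True when the condition is present and missing values stay missing.

```python
# Labels that count as "condition present", taken from the diff above.
POOR_HEALTH_LABELS = {"Zufriedenstellend", "Weniger gut", "Schlecht"}


def label_dummy(values, present_labels):
    """Return True/False per value; None (missing) stays None."""
    return [None if v is None else v in present_labels for v in values]


# "Sehr gut" is an assumed label, used only for illustration.
status = ["Sehr gut", "Schlecht", None, "Zufriedenstellend"]
dummy = label_dummy(status, POOR_HEALTH_LABELS)
```

Comparing against labels rather than a numeric code keeps the definition readable and independent of how the categories happen to be numbered.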

  [
- "med_schwierigkeit_treppen_pl",
- "med_schwierigkeit_taten_pl",
+ "med_schwierigkeiten_treppen_dummy_pl",
Collaborator Author

I think it is better to use the _dummy_ versions of the two variables here rather than the full scale. See also med_subjective_status_dummy_pl below.

Collaborator

Might be reasonable -- but it seems to be a change relative to what we had before, so it would be useful to explain why you think so.

Piecing the evidence together, it seems like the previous thing was combining two incompatible variables?

Collaborator Author

For one, the previous version had most of the variables in the dummy representation, indicating whether or not a condition is present. These two variables/conditions are no different from the others and should hence also use the dummy representation. Further, the previous version had values greater than 1 in these two variables, "giving them more mass" in the calculation of the frailty score (the mean of all medical condition variables provided).
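
A small worked example of the "more mass" point, assuming the frailty score is a plain unweighted mean as described. All numbers are made up for illustration.

```python
# Hypothetical respondent: three conditions coded as 0/1 dummies and one
# item left on its intensity scale with the value 2.
mixed = [1, 0, 0, 2]
# Same respondent with the scale item recoded to a dummy (condition present).
all_dummies = [1, 0, 0, 1]

score_mixed = sum(mixed) / len(mixed)
score_dummies = sum(all_dummies) / len(all_dummies)
```

With the scale item left as is, that single condition contributes twice as much to the mean as any of the dummies, so the two scores differ (0.75 versus 0.5) even though the respondent reports the same two conditions.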

Collaborator

@hmgaudecker left a comment

Very nice, thanks! My comments are less about substantive issues and more about clarifying what information others (well, me) need in order to review without digging things up.

Collaborator Author

@felixschmitz left a comment

Some thoughts on the handling of variables to calculate the frailty scores. What do you think about transforming all medical condition variables to dummies, omitting self-reported intensity categories?

@hmgaudecker
Collaborator

Thanks for the explanations -- they were exactly what I was looking for!

> What do you think about transforming all medical condition variables to dummies, omitting self-reported intensity categories?

I would actually prefer it the other way around:

  1. Keep only the information-preserving variables in our pipeline
  2. Only convert them to dummies in the function calculating the frailty score.
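
The two steps above could look roughly like the following sketch, in which the pipeline keeps the categorical labels and the conversion to dummies happens only inside the scoring function. Everything here is an assumption for illustration: the function `frailty_score`, the mapping `PRESENT_LABELS`, the variable name med_subjective_status_pl, the label "Gut", and the choice to skip missing items.

```python
# Hypothetical mapping from variable name to the labels that count as
# "condition present". Names and labels mirror the thread but are assumptions.
PRESENT_LABELS = {
    "med_schwierigkeit_treppen_pl": {"[1] Stark", "[2] Ein wenig"},
    "med_subjective_status_pl": {"Zufriedenstellend", "Weniger gut", "Schlecht"},
}


def frailty_score(row):
    """Mean of condition dummies for one respondent; missing items are skipped."""
    dummies = [
        row[name] in labels
        for name, labels in PRESENT_LABELS.items()
        if row.get(name) is not None
    ]
    return sum(dummies) / len(dummies) if dummies else None


# The row keeps the full categorical information; dummies exist only inside
# frailty_score.
row = {
    "med_schwierigkeit_treppen_pl": "[2] Ein wenig",
    "med_subjective_status_pl": "Gut",
}
score = frailty_score(row)
```

The design choice is that no information is discarded in the stored data, and the definition of "condition present" lives in one place next to the score that depends on it.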

- out["med_schwierigkeit_treppen_dummy_pl"] = create_dummy(
- series=out["med_schwierigkeit_treppen_pl"],
- value_for_comparison=[1, 2],
+ out["med_schwierigkeiten_treppen_dummy_pl"] = create_dummy(
Collaborator Author

Since we want to merge with med_schwierigkeiten_treppen_pequiv in combine_modules/pequiv_pl.py, we have to define this variable here. Otherwise we would calculate it when calculating the pl frailty score and combining variables from the two modules.

Collaborator

@hmgaudecker, Jan 19, 2026

So med_schwierigkeiten_treppen_pequiv is a dummy right from the start?

(in that case, I'd be seriously worried whether we actually want to combine the two variables)

Collaborator Author

Correct, the med_ variables in _pequiv are dummies. In _pl they are categorical variables with some intensity information (e.g. ["[3] Gar nicht", "[2] Ein wenig", "[1] Stark"]), which we convert to a dummy where observations with ["[2] Ein wenig", "[1] Stark"] are coded as 1.
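
A rough sketch of what harmonizing and combining the two sources might look like. The recoding follows the comment above; the combine rule (use the _pequiv value where observed, otherwise fall back to the recoded _pl value) is purely an assumption, as are the helper names `pl_to_dummy` and `coalesce` and the sample data.

```python
# Labels coded as 1, taken from the comment above.
PRESENT = {"[2] Ein wenig", "[1] Stark"}


def pl_to_dummy(label):
    """Map a _pl intensity label to 0/1; None (missing) stays None."""
    return None if label is None else int(label in PRESENT)


def coalesce(primary, fallback):
    """Take primary where observed, else fallback (hypothetical combine rule)."""
    return [p if p is not None else f for p, f in zip(primary, fallback)]


# Made-up observations for four respondents.
pequiv = [1, None, 0, None]
pl = ["[2] Ein wenig", "[1] Stark", None, None]
combined = coalesce(pequiv, [pl_to_dummy(v) for v in pl])
```

In this toy data the second respondent is missing in _pequiv and gets a value from _pl, which is the kind of gain that combining the variables would provide.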

Collaborator

Can you tell me how much we gain by combining the variables?

Development

Successfully merging this pull request may close these issues.

ENH: Get rid of int_categorical dtype when n_outcomes <= 5

2 participants