Hi, thank you for the great work and for releasing the code.
I am trying to reproduce the results from the paper on the TCGA dataset, and I noticed an issue related to the GlobalCancerCellDensity feature. As far as I can tell, this feature corresponds to NoOfNuclei.CancerEpithelium based on the S5 appendix.
After running SlideFeatureExtractor.py, I see NoOfNuclei.CancerEpithelium reported in both GlobalRoiBasedFeatures.csv and RoiFeatureSummary_Means.csv.
Since this is a global feature, my understanding is that the correct value should come from GlobalRoiBasedFeatures.csv. However, to be thorough, I checked both tables. Unfortunately, neither of them matches the value reported in Table S27 – TCGA-Feature-Raw in the appendix.
For example, for case TCGA-A8-A099, I obtain:
| Source | Value |
|---|---|
| GlobalRoiBasedFeatures.csv | 149255.0 |
| RoiFeatureSummary_Means.csv | 984.8576498111884 |
| Appendix S27 | 78256.02185 |
Neither computed value corresponds to the appendix value.
Because 78256.02185 is not an integer, it seems that some normalization (e.g., by area, ROI weighting, tissue fraction, etc.) must be applied. However, this normalization step does not appear to be documented in the paper or the code, so it is unclear how GlobalCancerCellDensity was actually computed.
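To make the discrepancy concrete, here is a minimal sketch that computes the divisor each of my values would need in order to reproduce the appendix value. It only uses the three numbers quoted above. The helper name `implied_divisor` is mine and is not part of the repository, and the assumption that the normalization is a single division is only a guess.

```python
# Values reported above for case TCGA-A8-A099.
global_count = 149255.0        # from GlobalRoiBasedFeatures.csv
roi_mean = 984.8576498111884   # from RoiFeatureSummary_Means.csv
appendix_value = 78256.02185   # from Appendix Table S27


def implied_divisor(computed: float, published: float) -> float:
    """Return d such that computed / d == published.

    Hypothetical helper: assumes the published value is the computed
    value divided by one normalization factor (e.g. an area).
    """
    return computed / published


global_divisor = implied_divisor(global_count, appendix_value)
roi_divisor = implied_divisor(roi_mean, appendix_value)

print(f"global count / appendix value = {global_divisor:.4f}")
print(f"ROI mean / appendix value     = {roi_divisor:.6f}")
```

The global count is roughly 1.9 times the appendix value, so if a single divisor was applied (for example an area term), it would have to be about that size for this case. I could not find a quantity in the outputs that obviously matches it.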
Could you please clarify the exact computation used to derive GlobalCancerCellDensity in the paper?
Specifically:
1. Is GlobalCancerCellDensity directly equal to NoOfNuclei.CancerEpithelium, or was additional normalization applied?
2. If normalization is required, what variables are involved (e.g., tissue area, ROI area, number of patches, magnification, etc.)?
3. Which of the two tables (global or ROI means) corresponds to the value used in the published results?
A clarification on this would greatly help in reproducing the reported results.
Thank you again for providing the code and data!