Fed-FDR: Federated Feature Selection with False Discovery Rate Control

Outline

Description
Fed-FDR workflow
Repository layout
Software requirements
Reproducing the simulation studies
Reproducing the real data analysis
Notes on output and reproducibility
Support

1. Description

This repository contains code to reproduce the simulation studies and the real data analysis reported in the manuscript. All code is written in R. Results are written to .rds files and figures are produced from dedicated plotting scripts.

In this repository, we include a synthetic dataset, sample_data_to_run.csv, generated to approximate the structure of the real-world COVID-19 pediatric dataset. It contains 3,990 patients across 34 clinical sites, with 243 binary covariates and one binary outcome. The marginal distributions and correlation structure of the features were designed to resemble those of the original EHR dataset, so that the synthetic data are representative for testing and reproducing the analysis pipeline.
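Before running the pipeline, it can help to confirm that the sample file has the shape described above. The sketch below uses a hypothetical helper, `summarize_sample_data`, which is not part of the repository. It assumes only that the file is a comma-separated table with a header row, and it makes no assumption about column names.

```r
# Hypothetical helper, not part of the repository: report the basic shape of
# a CSV file so it can be compared with the numbers quoted in this README.
summarize_sample_data <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  dat <- read.csv(path, stringsAsFactors = FALSE)
  # A column counts as binary if all of its non-missing values are 0 or 1.
  is_binary <- vapply(
    dat,
    function(col) all(col[!is.na(col)] %in% c(0, 1)),
    logical(1)
  )
  list(
    n_rows = nrow(dat),
    n_cols = ncol(dat),
    n_binary_cols = sum(is_binary)
  )
}

# Example call; the path is an assumption about where the file is stored.
# str(summarize_sample_data(file.path("use_case", "sample_data_to_run.csv")))
```

The binary-column count will include the outcome as well as the covariates, and any site identifier column would be counted separately.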

2. Fed-FDR Workflow

Stage I:
- Each collaborating site $k \in \{1, \ldots, K\}$ fits a GLM–Lasso to obtain its support $\hat{S}^{(k)}$, which is then shared with all other sites.
- Each collaborating site fits a refined de-sparsified Lasso using the aggregated support $\hat{S}^{(-k)} = \bigcup_{j \neq k} \hat{S}^{(j)}$.
- Each collaborating site transfers the resulting estimator $\hat{\beta}_{\hat{S}^{(-k)}}$ to the central site.

Stage II:
- The central site constructs mirror statistics to select the final support while controlling the FDR.

NOTE: Privacy-Preserving Distributed Algorithms (PDA) is a framework of statistical and machine learning methods that enables secure analysis across multiple institutions without sharing individual patient data (IPD). In this document, PDA refers to the central site.
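The support aggregation in Stage I and the selection step in Stage II can be illustrated with a short sketch. This is not the repository's implementation: `loo_support_union` and `mirror_select` are hypothetical names, and the mirror statistic used here (a sign-product form built from two estimates of each coefficient, with a knockoff-style cutoff rule) is an assumption borrowed from the general mirror-statistics literature. The manuscript defines the construction that Fed-FDR actually uses.

```r
# Stage I (sketch): union of the supports reported by every site except k.
# `supports` is a list with one vector of selected feature indices per site.
loo_support_union <- function(supports, k) {
  sort(unique(unlist(supports[-k])))
}

# Stage II (sketch): select features from mirror statistics at target FDR q.
# b1 and b2 are two estimates of the same coefficient vector (assumed inputs).
mirror_select <- function(b1, b2, q = 0.1) {
  # Large positive values suggest a signal; null features are roughly
  # symmetric around zero, so the negative tail estimates false positives.
  m <- sign(b1 * b2) * (abs(b1) + abs(b2))
  # Candidate cutoffs: the observed nonzero magnitudes, smallest first.
  cutoffs <- sort(unique(abs(m[m != 0])))
  for (cutoff in cutoffs) {
    # Estimated false discovery proportion at this cutoff.
    fdp_hat <- sum(m <= -cutoff) / max(sum(m >= cutoff), 1)
    if (fdp_hat <= q) {
      return(which(m >= cutoff))
    }
  }
  integer(0)  # no cutoff meets the target level
}
```

For example, `loo_support_union(list(c(1, 2), c(2, 3), 5), k = 1)` combines the supports of sites 2 and 3 only, which mirrors the definition of $\hat{S}^{(-k)}$ above.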

3. Repository layout

Simulation studies

Folder: simulation_result

Scripts to run:

simulation_n500p500.R
simulation_n500p1000.R
simulation_n1000p500.R
simulation_scalebility.R

Real data application

Folder: use_case

Main script:

Table1.R

Support file loaded by the main script:

Fed_simulation_functions.R

Sample dataset:

sample_data_to_run.csv

4. Software requirements

R version 4.4.1 or later.
RStudio is recommended for interactive work.
Base R packages only, unless a script prompts you to install an additional package.
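A quick way to confirm that the installed R meets the stated minimum is to compare `getRversion()` with the required version string. This is a minimal check, and the variable names are illustrative only.

```r
# Minimum R version stated in this README.
required <- "4.4.1"

# getRversion() returns a version object that can be compared with a string.
version_ok <- getRversion() >= required

if (!version_ok) {
  warning("R ", as.character(getRversion()),
          " is older than the required version ", required)
}
```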

5. Reproducing the simulation studies

Open R or RStudio.
Set the working directory to the repository root.

Run one or more of the simulation scripts listed above. For example:

source("simulation_result/simulation_n500p500.R")
source("simulation_result/simulation_n500p1000.R")
source("simulation_result/simulation_n1000p500.R")
source("simulation_result/simulation_scalebility.R")

Each script writes its outputs as .rds files inside simulation_result.
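To inspect the saved results without rerunning a simulation, the .rds files can be loaded with `readRDS`. The helper below, `read_rds_dir`, is a hypothetical convenience function and not part of the repository. It makes no assumption about what each file contains.

```r
# Hypothetical helper: read every .rds file in a directory into a named list.
read_rds_dir <- function(dir) {
  paths <- list.files(dir, pattern = "\\.rds$", full.names = TRUE)
  results <- lapply(paths, readRDS)
  names(results) <- basename(paths)
  results
}

# Example call, assuming the working directory is the repository root:
# results <- read_rds_dir("simulation_result")
# str(results, max.level = 1)
```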

To recreate the figures in the manuscript, run:

source("simulation_result/Figure1.R")
source("simulation_result/Figure2.R")
source("simulation_result/Figure3.R")

6. Reproducing the real data analysis

Open R or RStudio.
Set the working directory to the folder use_case.
Ensure the sample dataset sample_data_to_run.csv is present in the same folder.
Run the main script:
```
source("Table1.R")
```
The file Fed_simulation_functions.R is sourced automatically by Table1.R.
Outputs are written as .rds files inside use_case.
To produce the ROC figure from the manuscript, run:
```
source("Figure4.R")
```

7. Notes on output and reproducibility

All scripts set their own random seeds when applicable. If you require exact replication, do not modify those seeds.
Figures are regenerated from the .rds result files. If you delete or relocate those files, recreate them by rerunning the corresponding simulation or analysis script.
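The effect of a fixed seed can be seen in a small example. The seed value below is arbitrary and chosen for illustration only; it is not one of the seeds used by the scripts.

```r
# Resetting the same seed reproduces the same random draws.
set.seed(2024)  # arbitrary illustrative seed
first <- rnorm(5)

set.seed(2024)
second <- rnorm(5)

# The two vectors should match exactly.
identical(first, second)
```

Changing a seed in any script changes the draws that follow it, so results would then differ from those reported.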

8. Support

For questions about the code or the study design, please open an issue in the repository.
