GitHub Issue Log: Support migrating existing R script to JupyterLab (R Docker): packages, paths, and DB connectivity
Context
We have a fully functional R script, developed in RStudio, for our analysis. We are requesting guidance on the most efficient, recommended way to adapt and run this existing code in the new JupyterLab environment (R Docker image) so that we do not have to re-create our work. A meeting with a data engineer has not yet been scheduled; we would like to align this issue with that discussion once it is.
Reference script: shakya_sepsis_july2025-Long.R. Credentials are suppressed. Line numbers in this issue refer to the original script; the shared copy may differ slightly.
1. R package availability
Request: Please confirm whether the following R packages can be installed (or are pre-installed) in the R Docker image used in the ML Workspace / JupyterLab:
| Package | Purpose in script |
|---|---|
| rstudioapi | Used only for script directory / working directory (see §2). |
| tidyverse | dplyr, tidyr, ggplot2, purrr, stringr, tibble, lubridate, etc. |
| data.table | rbindlist for combining batched query results. |
| DatabaseConnector | PostgreSQL connection and querySql for OMOP CDM. |
| caret | nearZeroVar for feature screening. |
| janitor | make_clean_names for variable names. |
| mice | Multiple imputation. |
| fpc | clusterboot, kmeansCBI for cluster stability. |
| FSA | dunnTest for post-hoc pairwise comparisons. |
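Note: DatabaseConnector connects to PostgreSQL through JDBC, so it needs a Java runtime (via rJava) and the PostgreSQL JDBC driver. Please confirm whether Java is available in the Docker image and whether any additional driver setup (e.g. a pre-downloaded driver folder) is required.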
Request (install from session): Are users allowed to install packages from CRAN from within the R session (e.g. via install.packages(..., repos = "http://cran.us.r-project.org") for updates or extra packages), or must all packages be pre-installed in the image?
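For our own testing while this is answered, a minimal R sketch that reports which of the packages above are missing and whether a java executable is visible could look like the following. check_environment() is a hypothetical helper of ours; it only inspects the library paths, PATH and JAVA_HOME, and installs nothing.

```r
# Packages listed in section 1 of this issue
required_pkgs <- c("rstudioapi", "tidyverse", "data.table", "DatabaseConnector",
                   "caret", "janitor", "mice", "fpc", "FSA")

# Hypothetical helper: report missing packages and Java visibility without
# loading anything. system.file(package = p) returns "" when p is not installed.
check_environment <- function(pkgs) {
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                      logical(1))
  list(
    missing_packages = pkgs[!installed],
    java_on_path     = unname(nzchar(Sys.which("java"))),
    java_home        = Sys.getenv("JAVA_HOME", unset = NA)
  )
}

status <- check_environment(required_pkgs)
print(status)

# If in-session installs turn out to be allowed (open question above),
# only the missing packages would need installing, for example:
# install.packages(status$missing_packages)
```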
2. RStudio-specific code (adaptation for JupyterLab)
The script currently relies on RStudio in two places and will need equivalent patterns in JupyterLab:
Working directory (lines 23–24)
Current: path <- dirname(rstudioapi::getActiveDocumentContext()$path) followed by setwd(path).
Issue: rstudioapi calls only work when R is running inside RStudio; a JupyterLab R kernel has no equivalent “active document” whose path can be queried.
Request: What is the recommended way to set the working directory in the R kernel (e.g., fixed project path, environment variable, or “notebook directory” equivalent) so we can replace this without breaking file paths?
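One interim pattern we are considering is sketched below: take the project directory from an environment variable if it is set, otherwise fall back to the kernel's current directory. PROJECT_DIR is a name we made up for illustration, not a documented ML Workspace setting, and the fallback assumes the kernel starts in the notebook's folder, which still needs to be confirmed.

```r
# Hypothetical replacement for dirname(rstudioapi::getActiveDocumentContext()$path).
# PROJECT_DIR is an assumed environment variable name, not a platform setting.
resolve_project_dir <- function(env_var = "PROJECT_DIR") {
  candidate <- Sys.getenv(env_var, unset = "")
  if (nzchar(candidate)) {
    if (!dir.exists(candidate)) {
      stop("Directory given in ", env_var, " does not exist: ", candidate)
    }
    return(normalizePath(candidate))
  }
  # Fallback: the kernel's current directory. Jupyter usually starts a
  # notebook's kernel in the notebook's folder (to be confirmed for this setup).
  normalizePath(getwd())
}

path <- resolve_project_dir()
setwd(path)
```

Because nothing here depends on RStudio, the same lines would also work if the script is later run non-interactively with Rscript.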
Graphics device (lines 4–6)
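Current: while (!is.null(dev.list())) { dev.off(dev.list()["RStudioGD"]) } to clear the RStudio graphics device.
Issue: The loop assumes a device named RStudioGD. Outside RStudio that device does not exist, so if any other device is open, dev.list()["RStudioGD"] is NA and dev.off() stops with an error instead of closing it.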
Request: What is the recommended way to clear or close plots in the R kernel under JupyterLab?
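A device-agnostic sketch we could use in the meantime: base R's graphics.off() closes every open device regardless of its name, and figures meant for outcomes/ can be written through an explicit file device that is always closed afterwards. save_figure() is a hypothetical helper; whether closing devices interferes with the kernel's inline plot display is part of the question above. For ggplot objects, ggplot2::ggsave() is an existing alternative.

```r
# graphics.off() closes all open devices without relying on a device name.
graphics.off()

# Hypothetical helper: draw into a file device and always close it afterwards.
save_figure <- function(file, draw, device = grDevices::png, ...) {
  device(file, ...)
  on.exit(grDevices::dev.off(), add = TRUE)  # runs even if draw() fails
  draw()
  invisible(file)
}

# Example (illustrative): base plot to PNG; for a ggplot object p,
# pass draw = function() print(p).
# save_figure("outcomes/cluster_results.png", function() plot(1:10),
#             width = 1200, height = 800, res = 150)
```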
3. Selective environment cleanup
The script uses selective removal of objects to keep a clean workspace and separation of concerns (e.g. rm(list = ls()[!ls() %in% c("path", "ipak", "conn", "connectionDetails")]) after DB setup, and a helper stable_clean_environment() that keeps only a defined set of names).
Request: What is the recommended way in the R kernel under JupyterLab to achieve the same (remove all objects except a chosen set), or should we rely on restarting the kernel or a different workflow?
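The script's rm(list = ls()[...]) pattern is plain base R, so it should carry over unchanged as long as notebook cells are evaluated in the global environment (our assumption about the R kernel). Below is a sketch of it as a reusable, hypothetical helper. Note that a kernel restart also drops conn, so the DB connection would have to be re-created afterwards.

```r
# Hypothetical helper mirroring the script's rm(list = ls()[!ls() %in% c(...)])
# pattern: remove everything in `envir` except the named objects.
keep_only <- function(keep, envir = globalenv()) {
  # Never drop the helper itself when it lives in the same environment
  drop <- setdiff(ls(envir = envir), c(keep, "keep_only"))
  rm(list = drop, envir = envir)
  invisible(drop)
}

# Usage after DB setup, with the names the script currently keeps:
# keep_only(c("path", "ipak", "conn", "connectionDetails"))
```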
4. File paths and output location
The script:
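Creates and uses a subfolder outcomes/ for cached RDS files and figures.
Depends on a pre-existing file, outcomes/var_selection.rds (see script ~line 471). This file is read and must exist (or be generated) in the new environment.
Other files under outcomes/ used by the script include:
outcomes/measurement_snapshot_72_after_before_batch_July2025.rds
outcomes/selected_var_imputed_data_july2025.rds
outcomes/cluster_results_1000_iter.rds
outcomes/observed_imputed_distribution.png
outcomes/cluster_results.png
outcomes/DunningZtest.png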
Request: How does persistent storage and project-relative pathing work in the ML Workspace? Is there a project or home directory that persists across sessions, and how should we set the working directory so that paths like outcomes/... work consistently?
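Whatever the persistent location turns out to be, the script could build paths relative to one root and fail early if var_selection.rds is missing. A sketch with two hypothetical helpers (outcomes_dir() and require_input()); only the folder and file names come from the script.

```r
# Hypothetical helper: ensure <root>/outcomes exists and return its path.
outcomes_dir <- function(root = getwd()) {
  dir <- file.path(root, "outcomes")
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)  # no-op if present
  dir
}

# Hypothetical helper: stop with a clear message when a required input is missing.
require_input <- function(file) {
  if (!file.exists(file)) {
    stop("Required input not found: ", file,
         ". Copy it into the workspace or regenerate it first.")
  }
  file
}

out <- outcomes_dir()
var_selection_path <- file.path(out, "var_selection.rds")
# var_selection <- readRDS(require_input(var_selection_path))
```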
5. Database connectivity
The script connects to Azure PostgreSQL (OMOP CDM) using DatabaseConnector with host, user, password, and database name (script lines 31–44). Credentials are currently hardcoded.
Request:
Confirm that outbound connections to Azure PostgreSQL (or the provided OMOP database) are allowed from the JupyterLab/R Docker environment.
Recommend the supported way to provide credentials (e.g., environment variables, secrets service, or config file) so we can remove hardcoded credentials and comply with NIST/CHoRUS policies.
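Until the supported mechanism is confirmed, this is roughly how we would remove the hardcoded values if environment variables are the answer. The OMOP_DB_* variable names and both helpers are hypothetical. DatabaseConnector::createConnectionDetails() and its dbms, server, port, user, password and pathToDriver arguments are the package's own, with server given as "host/database" for PostgreSQL; DATABASECONNECTOR_JAR_FOLDER is the variable DatabaseConnector consults for the JDBC driver folder.

```r
# Hypothetical variable names: OMOP_DB_HOST, OMOP_DB_PORT, OMOP_DB_NAME,
# OMOP_DB_USER, OMOP_DB_PASSWORD.
read_db_credentials <- function(prefix = "OMOP_DB_") {
  keys <- c("HOST", "PORT", "NAME", "USER", "PASSWORD")
  vals <- Sys.getenv(paste0(prefix, keys), unset = NA)
  names(vals) <- tolower(keys)
  if (is.na(vals[["port"]]) || !nzchar(vals[["port"]])) {
    vals[["port"]] <- "5432"  # PostgreSQL default port
  }
  missing <- names(vals)[is.na(vals) | !nzchar(vals)]
  if (length(missing) > 0) {
    # Report only the names of missing settings, never any values
    stop("Missing database settings: ", paste(missing, collapse = ", "))
  }
  as.list(vals)
}

# Hypothetical helper: build connection details without hardcoded secrets.
make_connection_details <- function(creds,
                                    driver_dir = Sys.getenv("DATABASECONNECTOR_JAR_FOLDER")) {
  DatabaseConnector::createConnectionDetails(
    dbms = "postgresql",
    server = paste0(creds$host, "/", creds$name),
    port = creds$port,
    user = creds$user,
    password = creds$password,
    pathToDriver = driver_dir
  )
}

# connectionDetails <- make_connection_details(read_db_credentials())
# conn <- DatabaseConnector::connect(connectionDetails)
# ... DatabaseConnector::querySql(conn, sql) ...
# DatabaseConnector::disconnect(conn)
```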
6. Cached / long-running steps
The script uses RDS files to cache heavy steps (measurements, prior conditions, imputation, cluster stability) so that re-runs do not repeat long queries or computations.
Request: Confirm that read/write access to a persistent directory (e.g. outcomes/) is supported and that such files persist across sessions (or that we can rely on a documented persistent volume/mount).
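For context, the caching pattern we rely on can be summarised by the hypothetical helper below (a generalisation for illustration, not the script's literal code). It writes to a temporary file and renames it, so a session that dies mid-save does not leave a truncated .rds that a later run would treat as a valid cache.

```r
# Hypothetical helper: reuse a cached RDS when present, otherwise compute and save.
cached_rds <- function(file, compute, refresh = FALSE) {
  if (!refresh && file.exists(file)) {
    message("Using cached result: ", file)
    return(readRDS(file))
  }
  result <- compute()
  dir.create(dirname(file), showWarnings = FALSE, recursive = TRUE)
  tmp <- paste0(file, ".tmp")
  saveRDS(result, tmp)                       # write the full object first
  if (!file.rename(tmp, file)) stop("Could not move cache into place: ", file)
  result
}

# Illustrative usage (imputation_input is a placeholder name):
# imputed <- cached_rds("outcomes/selected_var_imputed_data_july2025.rds",
#                       function() mice::mice(imputation_input, m = 20))
```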
7. Progress bars
The script uses utils::txtProgressBar in batched DB steps (e.g. get_measurements_before_after_event_batched, get_prior_conditions_batched). In JupyterLab, text progress bars can behave differently (buffering, display).
Request: Do txtProgressBar (or other progress APIs) work as expected in the R kernel, or is there a recommended alternative for long-running steps?
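If txtProgressBar turns out to render poorly, one fallback we could use is line-based progress via message(), which produces complete lines that should read the same in a notebook cell and in a batch log. run_batched() is a hypothetical helper; the batched query functions in the script would supply fn.

```r
# Hypothetical batching helper with one timestamped progress line per batch.
run_batched <- function(items, batch_size, fn) {
  batches <- split(items, ceiling(seq_along(items) / batch_size))
  n <- length(batches)
  results <- vector("list", n)
  started <- Sys.time()
  for (i in seq_len(n)) {
    results[[i]] <- fn(batches[[i]])
    mins <- as.numeric(difftime(Sys.time(), started, units = "mins"))
    message(sprintf("[%s] batch %d/%d done (%.1f min elapsed)",
                    format(Sys.time(), "%H:%M:%S"), i, n, mins))
    flush.console()  # flush buffered console output (a no-op on many front ends)
  }
  results
}

# Illustrative usage (person_ids and query_batch are placeholders):
# measurements <- data.table::rbindlist(run_batched(person_ids, 500, query_batch))
```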
8. Long-running and resource-heavy steps
The script includes heavy operations (large batched DB queries, mice with m = 20, clusterboot with B = 1000).
Request: Are there limits on cell runtime or memory for the R kernel (e.g. timeouts, OOM), and is there a recommended way to run long jobs (e.g. chunking, running as a batch script, or increasing resources)?
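One option we would like confirmed as acceptable is running the full script as a separate Rscript process with its output in a log file, so the job does not depend on a notebook cell staying open. A sketch using base R's system2(); run_script() is hypothetical, and the script would first need its rstudioapi lines replaced because rstudioapi does not work under Rscript. Whether such processes survive a kernel restart or session timeout is part of the question above.

```r
# Hypothetical helper: run an R script in its own Rscript process, logging
# stdout and stderr to one file.
run_script <- function(script, log_file, wait = FALSE) {
  rscript <- file.path(R.home("bin"), "Rscript")
  # When stdout and stderr name the same file, system2() sends both into it
  system2(rscript, args = shQuote(script),
          stdout = log_file, stderr = log_file, wait = wait)
}

# run_script("shakya_sepsis_july2025-Long.R", "outcomes/run.log")
# tail(readLines("outcomes/run.log"))  # check progress later
```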
9. Reproducibility and “no re-creation of work”
Request:
Any documented “migration checklist” or template for moving an RStudio R script into JupyterLab (e.g., replace rstudioapi with X, use Y for working directory, use Z for plots).
Confirmation that the same R version and key package versions can be fixed (e.g., via Docker image tag or a lockfile) so our analysis remains reproducible.
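As a lightweight record alongside any image tag, we could write the R version and key package versions to a CSV with the hypothetical helper below (base R only). If the image allows it, renv::snapshot() from the renv package would produce a full renv.lock lockfile instead; whether renv is available is an assumption.

```r
# Hypothetical helper: record R and package versions for later comparison.
record_versions <- function(pkgs, file) {
  versions <- vapply(pkgs, function(p) {
    if (nzchar(system.file(package = p))) {
      as.character(utils::packageVersion(p))
    } else {
      NA_character_  # not installed in this environment
    }
  }, character(1))
  out <- data.frame(package = c("R", pkgs),
                    version = c(as.character(getRversion()), unname(versions)),
                    stringsAsFactors = FALSE)
  utils::write.csv(out, file, row.names = FALSE)
  invisible(out)
}

# record_versions(c("tidyverse", "data.table", "DatabaseConnector", "caret",
#                   "janitor", "mice", "fpc", "FSA"), "outcomes/package_versions.csv")
```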
10. External integrations (for tracking only)
We understand that integrations with VS Code, GitHub, or tools like Claude are not currently supported; we will log those as separate requests. This issue is focused only on running our existing R script inside the provided JupyterLab R environment.
Summary checklist for this issue
Install or allow installation of: rstudioapi, tidyverse, data.table, DatabaseConnector, caret, janitor, mice, fpc, FSA in the R Docker image.
Clarify whether users can install packages from CRAN from within the R session, or must rely on pre-installed packages.
Document or support a replacement for the script-directory lookup (rstudioapi::getActiveDocumentContext()$path), such as a working-directory or notebook-directory equivalent.
Document the recommended way to clear/display plots in R under JupyterLab (replacing RStudioGD usage).
Document the recommended way to perform selective environment cleanup in R under JupyterLab (remove all except a chosen set of objects), or recommend a workflow (e.g. kernel restart).
Clarify persistent storage and project-relative paths for outcomes/ and other inputs/outputs.
Confirm Azure PostgreSQL (or OMOP DB) connectivity and credential management best practices.
Confirm Java/JDBC availability for DatabaseConnector if required.
Confirm progress bars (txtProgressBar or alternative) work as expected in the R kernel.
Clarify limits on cell runtime/memory and recommended approach for long-running jobs (chunking, batch script, or increased resources).
Provide or link to a migration guide for RStudio → JupyterLab R scripts.