Skip to content

Merge TPR simulation function into Development#6

Open
swaraj-neu wants to merge 2 commits intodevelfrom
feat/tpr-function
Open

Merge TPR simulation function into Development#6
swaraj-neu wants to merge 2 commits intodevelfrom
feat/tpr-function

Conversation

@swaraj-neu
Copy link

@swaraj-neu swaraj-neu commented Mar 20, 2026

Transfer the TPR simulation function from MSstatsShiny and export it for external use

Motivation and Context

This PR moves the True Positive Rate (TPR) simulation and interactive plotting functionality from MSstatsShiny into MSstatsResponse and exports it for external use. The goal is to provide standalone functions for sweeping experimental designs (varying concentrations and replicates) to estimate detection power (TPR) and to visualize results interactively with plotly.

Short solution summary: two new exported functions were added — run_tpr_simulation(rep_range, n_proteins = 1000) to run grid simulations across concentration counts (2–9) and replicate counts (bounded to max 5), and plot_tpr_power_curve(simulation_results) to produce a two-panel interactive plotly visualization (Strong / Weak interactions). Package metadata and NAMESPACE were updated and man pages for the new functions were added.

Detailed Changes

  • New R source

    • Added R/TPR_Power_Curve.R defining:
      • run_tpr_simulation(rep_range, n_proteins = 1000)
        • Validates rep_range (numeric length-2, min ≤ max), enforces max replicates = 5.
        • Uses hardcoded CONC_MAP for concentration ladders (k = 2..9).
        • Builds grid over replicate levels and k values; calls futureExperimentSimulation() per combination with deterministic seeding.
        • Extracts and returns Hit_Rates_Data filtered to "TPR (Strong)" and "TPR (Weak)" with columns Interaction, TPR, N_rep, NumConcs.
        • Wraps individual runs in tryCatch and warns on failures; errors if all runs fail.
      • plot_tpr_power_curve(simulation_results)
        • Builds ggplot2 line+point panels (NumConcs × TPR) with linetype mapped to replicate count (max 5).
        • Requires plotly (stop() with message if missing), converts ggplots to plotly and composes a 1×2 subplot with shared Y axis and annotations.
        • Uses custom linetype mapping and theme_bw styling.
  • NAMESPACE

    • Exports: run_tpr_simulation, plot_tpr_power_curve
    • Added imports: dplyr::if_else, ggplot2::scale_linetype_manual, ggplot2::theme_bw (plus other ggplot2/dplyr functions referenced via roxygen imports in file).
  • DESCRIPTION

    • Added plotly to Suggests.
  • Documentation (man/)

    • Added man/run_tpr_simulation.Rd
    • Added man/plot_tpr_power_curve.Rd
  • Documentation removals

    • Removed man pages: ConvertGroupToNumericDose.Rd, DoseResponseFit.Rd, FutureExperimentSimulation.Rd, PredictIC50Parallel.Rd, VisualizeResponseProtein.Rd (these .Rd files are missing in the repo tree).

Unit Tests

  • No new unit tests were added for run_tpr_simulation() or plot_tpr_power_curve().
  • Existing test suite references TPR/hit-rate behavior (e.g., test-ExperimentalDesignSimulation.R) but there are no dedicated tests validating:
    • run_tpr_simulation argument validation, grid coverage, or error handling.
    • Correct filtering and column structure of run_tpr_simulation output.
    • plot_tpr_power_curve behavior when plotly is missing or when replicate levels exceed 5.
      Recommendation: add unit tests for input validation, a small deterministic simulation mock (or stub futureExperimentSimulation) to validate output structure/contents, and a test for plot_tpr_power_curve requiring plotly (or using requireNamespace mocking).

Coding Guidelines / Policy Violations

  • Documentation mismatch: multiple previously present .Rd files were removed (ConvertGroupToNumericDose, DoseResponseFit, FutureExperimentSimulation, PredictIC50Parallel, VisualizeResponseProtein) while their corresponding R implementations remain in source and exports appear unchanged. Exported functions should have corresponding documentation; removing .Rd files for still-exported functions creates a documentation/export inconsistency and violates the guideline that all exported/public functions must be documented. Either restore/regenerate these man pages or remove the associated exports/implementations as appropriate.
  • Hardcoded plotting limit: run_tpr_simulation enforces max replicates = 5 due to plotting linetype availability. This coupling of simulation input constraints to plotting aesthetics may be surprising; consider decoupling or documenting the rationale clearly.
  • External dependency handling: plot_tpr_power_curve requires plotly at runtime (requireNamespace check). Because plotly is only in Suggests, callers in non-interactive contexts may encounter runtime errors; acceptable if documented, but tests and examples should account for optional dependency.

@swaraj-neu swaraj-neu requested a review from tonywu1999 March 20, 2026 20:50
@swaraj-neu swaraj-neu self-assigned this Mar 20, 2026
@swaraj-neu swaraj-neu added the enhancement New feature or request label Mar 20, 2026
@coderabbitai
Copy link

coderabbitai bot commented Mar 20, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f6b30687-ca8d-4042-a433-5603fb8d2d75

📥 Commits

Reviewing files that changed from the base of the PR and between 5a26bdc and 8b7eaf9.

📒 Files selected for processing (1)
  • R/TPR_Power_Curve.R
🚧 Files skipped from review as they are similar to previous changes (1)
  • R/TPR_Power_Curve.R

📝 Walkthrough

Walkthrough

Adds TPR power-curve simulation and interactive plotting: new R/TPR_Power_Curve.R with run_tpr_simulation() and plot_tpr_power_curve(), updates NAMESPACE and DESCRIPTION (adds plotly to Suggests), removes five existing man pages and adds two new man pages for the TPR functions.

Changes

Cohort / File(s) Summary
Package Configuration
DESCRIPTION
Added plotly to Suggests.
Namespace & Imports
NAMESPACE
Exported run_tpr_simulation, plot_tpr_power_curve; imported if_else (dplyr), scale_linetype_manual, theme_bw (ggplot2).
TPR Power Curve Implementation
R/TPR_Power_Curve.R
New file adding run_tpr_simulation(rep_range, n_proteins=1000) (sweeps replicate counts and concentration ladder sizes, calls futureExperimentSimulation, collects TPR results) and plot_tpr_power_curve(simulation_results) (builds ggplot panels, requires plotly::ggplotly, combines into interactive 1×2 subplot).
Removed Documentation
man/ConvertGroupToNumericDose.Rd, man/DoseResponseFit.Rd, man/FutureExperimentSimulation.Rd, man/PredictIC50Parallel.Rd, man/VisualizeResponseProtein.Rd
Five man pages deleted (documentation-only removals).
New Documentation
man/run_tpr_simulation.Rd, man/plot_tpr_power_curve.Rd
Added man pages documenting run_tpr_simulation() and plot_tpr_power_curve() (parameters, usage, return values).

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Runner as run_tpr_simulation
    participant Simulator as futureExperimentSimulation
    participant Plotter as plot_tpr_power_curve
    participant GG as ggplot2
    participant Plotly as plotly

    User->>Runner: rep_range, n_proteins
    loop for each (N_rep, NumConcs)
        Runner->>Simulator: invoke simulation (params, template)
        Simulator-->>Runner: Hit_Rates_Data
        Runner->>Runner: filter TPR Strong/Weak, add N_rep & NumConcs
    end
    Runner-->>User: consolidated data.frame

    User->>Plotter: simulation_results
    Plotter->>GG: create Strong panel
    GG-->>Plotter: ggplot (Strong)
    Plotter->>GG: create Weak panel
    GG-->>Plotter: ggplot (Weak)
    Plotter->>Plotly: ggplotly + subplot(1x2)
    Plotly-->>Plotter: interactive subplot
    Plotter-->>User: plotly object (1×2)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇✨ I hopped through ladders, counts, and code,
measuring TPR down each winding road.
Two panels now sparkle with interactive cheer —
Strong and Weak sing what testers want to hear.
— a rabbit celebrating plots and nodes 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Title check ❓ Inconclusive The title 'Merge TPR simulation function into Development' is vague and generic, using 'Merge' and 'into Development' which are standard version-control operations that don't convey what actually changed. Use a more descriptive title like 'Add TPR power curve simulation and visualization functions' to clearly indicate the main additions to the codebase.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/tpr-function

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
R/TPR_Power_Curve.R (1)

34-46: Consider simplifying by removing the intermediate sim_args list.

The sim_args list is created and then immediately unpacked in the function call. This adds verbosity without benefit.

♻️ Simplified version
   run_one <- function(n_rep, k_conc, seed = 123) {
     set.seed(seed + n_rep * 100 + k_conc)
     concs_k <- CONC_MAP[[as.character(k_conc)]]
 
-    sim_args <- list(
-      N_proteins = n_proteins,
-      N_rep = n_rep,
-      Concentrations = concs_k,
-      IC50_Prediction = FALSE
-    )
-
-    temp_res <- futureExperimentSimulation(
-      N_proteins = sim_args$N_proteins,
-      N_rep = sim_args$N_rep,
-      Concentrations = sim_args$Concentrations,
-      IC50_Prediction = sim_args$IC50_Prediction
-    )
+    temp_res <- futureExperimentSimulation(
+      N_proteins = n_proteins,
+      N_rep = n_rep,
+      Concentrations = concs_k,
+      IC50_Prediction = FALSE
+    )
     temp_res$Hit_Rates_Data |>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/TPR_Power_Curve.R` around lines 34 - 46, The sim_args list is unnecessary
boilerplate: remove the sim_args variable and call futureExperimentSimulation
directly with the original variables (n_proteins, n_rep, concs_k, and FALSE) by
passing them to the corresponding parameters N_proteins, N_rep, Concentrations,
and IC50_Prediction in the futureExperimentSimulation call so you only use the
function name futureExperimentSimulation and the original symbols n_proteins,
n_rep, concs_k, and FALSE.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@R/TPR_Power_Curve.R`:
- Around line 91-92: The linetype mapping currently uses a fixed vector ltypes
and assigns ltype_values <- setNames(ltypes[seq_along(rep_levels)],
as.character(rep_levels)), which will produce NAs when length(rep_levels) >
length(ltypes); change the assignment to repeat or recycle ltypes to cover all
replicate levels (e.g., generate a vector of length length(rep_levels) by
repeating ltypes with length.out = length(rep_levels) or using rep(ltypes,
length.out = length(rep_levels))) and then call setNames(...) so ltype_values
maps every element of rep_levels to a valid linetype; refer to the variables
ltypes, ltype_values and rep_levels when making the change.
- Around line 26-28: The function run_tpr_simulation currently assumes rep_range
is a two-element integer vector; add input validation at the top of
run_tpr_simulation to check that rep_range is a length-2 numeric (or integer)
vector with no NA/NaN/Inf values, both elements are whole numbers (or can be
safely coerced to integers), and rep_range[1] <= rep_range[2]; if any check
fails, stop with a clear error message mentioning rep_range and expected form
(e.g., "rep_range must be a length-2 integer vector with min <= max"). After
validation, coerce to integers (if needed) before creating rep_grid to avoid
downstream surprises when using rep_grid.

---

Nitpick comments:
In `@R/TPR_Power_Curve.R`:
- Around line 34-46: The sim_args list is unnecessary boilerplate: remove the
sim_args variable and call futureExperimentSimulation directly with the original
variables (n_proteins, n_rep, concs_k, and FALSE) by passing them to the
corresponding parameters N_proteins, N_rep, Concentrations, and IC50_Prediction
in the futureExperimentSimulation call so you only use the function name
futureExperimentSimulation and the original symbols n_proteins, n_rep, concs_k,
and FALSE.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0b0cfe25-885d-428f-890c-886741a75f85

📥 Commits

Reviewing files that changed from the base of the PR and between 8739ce4 and 5a26bdc.

📒 Files selected for processing (10)
  • DESCRIPTION
  • NAMESPACE
  • R/TPR_Power_Curve.R
  • man/ConvertGroupToNumericDose.Rd
  • man/DoseResponseFit.Rd
  • man/FutureExperimentSimulation.Rd
  • man/PredictIC50Parallel.Rd
  • man/VisualizeResponseProtein.Rd
  • man/plot_tpr_power_curve.Rd
  • man/run_tpr_simulation.Rd
💤 Files with no reviewable changes (5)
  • man/ConvertGroupToNumericDose.Rd
  • man/VisualizeResponseProtein.Rd
  • man/PredictIC50Parallel.Rd
  • man/DoseResponseFit.Rd
  • man/FutureExperimentSimulation.Rd

Comment on lines +26 to +28
run_tpr_simulation <- function(rep_range, n_proteins = 1000) {
k_grid <- as.integer(names(CONC_MAP))
rep_grid <- seq(rep_range[1], rep_range[2])
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add input validation for rep_range.

The function assumes rep_range is a valid integer vector of length 2 with min <= max, but doesn't validate this. Invalid input could cause confusing errors or unexpected behavior.

🛡️ Proposed fix to add validation
 run_tpr_simulation <- function(rep_range, n_proteins = 1000) {
+  if (!is.numeric(rep_range) || length(rep_range) != 2) {
+    stop("rep_range must be an integer vector of length 2, e.g., c(2, 5).")
+  }
+  if (rep_range[1] > rep_range[2]) {
+    stop("rep_range[1] (min) must be <= rep_range[2] (max).")
+  }
   k_grid <- as.integer(names(CONC_MAP))
   rep_grid <- seq(rep_range[1], rep_range[2])
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/TPR_Power_Curve.R` around lines 26 - 28, The function run_tpr_simulation
currently assumes rep_range is a two-element integer vector; add input
validation at the top of run_tpr_simulation to check that rep_range is a
length-2 numeric (or integer) vector with no NA/NaN/Inf values, both elements
are whole numbers (or can be safely coerced to integers), and rep_range[1] <=
rep_range[2]; if any check fails, stop with a clear error message mentioning
rep_range and expected form (e.g., "rep_range must be a length-2 integer vector
with min <= max"). After validation, coerce to integers (if needed) before
creating rep_grid to avoid downstream surprises when using rep_grid.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant