Refactor C++ logic to correctly handle NaN sentinel initializations #1268

Open
madhavgairola wants to merge 2 commits into NOAA-FIMS:dev from madhavgairola:fix-missing-values


@madhavgairola

This PR replaces the floating-point sentinel value -999 with proper NaN handling in the C++ codebase.

Summary of changes

  • Replaced floating-point sentinel initializations with std::numeric_limits<double>::quiet_NaN()
  • Updated conditional checks that previously compared against -999 to use std::isnan(...)
  • Added the required headers (<cmath>, <limits>) where needed
  • Updated unit tests where necessary:
    • Replaced equality checks involving NaN with EXPECT_TRUE(std::isnan(...))
    • Updated one floating-point comparison from EXPECT_DOUBLE_EQ to EXPECT_NEAR to account for small floating-point precision differences introduced by NaN initialization
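The reason the equality checks had to become std::isnan(...) calls can be shown with a small sketch. The helper name is_missing and the function demo_nan_sentinel are hypothetical and are not taken from the FIMS sources; the assertions only document standard IEEE 754 behaviour.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper, not part of FIMS: true when x marks a missing value.
inline bool is_missing(double x) { return std::isnan(x); }

// Executable notes on how a NaN sentinel behaves.
inline void demo_nan_sentinel() {
  const double missing = std::numeric_limits<double>::quiet_NaN();

  // NaN never compares equal to anything, including itself, so an
  // equality test against the sentinel can never detect it.
  assert(!(missing == missing));
  assert(missing != missing);

  // std::isnan is the reliable check.
  assert(is_missing(missing));

  // The old sentinel is an ordinary number as far as std::isnan is concerned.
  assert(!is_missing(-999.0));
}
```

Calling demo_nan_sentinel() from a test or from main runs the checks. The same reasoning applies to gtest: an equality expectation against a NaN value cannot pass, which is why the tests switched to EXPECT_TRUE(std::isnan(...)).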

Scope

  • Only floating-point sentinel values were refactored in this PR
  • Integer sentinel values (-999 used with integer types) were intentionally left unchanged, as the issue suggests handling them in a later refactor using std::optional
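For context on the deferred integer refactor, here is a minimal sketch of what replacing an integer -999 sentinel with std::optional could look like. It assumes C++17, and the Observation type, its age_bin field, and age_bin_or are hypothetical names invented for illustration, not the design the issue will necessarily adopt.

```cpp
#include <cassert>
#include <optional>

// Hypothetical record type; the field name is illustrative only.
struct Observation {
  std::optional<int> age_bin;  // an empty optional means "missing"
};

// Returns the stored value, or `fallback` when the value is missing.
inline int age_bin_or(const Observation& obs, int fallback) {
  return obs.age_bin.value_or(fallback);
}

inline void demo_optional_sentinel() {
  Observation present{3};
  Observation missing{};  // value-initialized, so age_bin is empty

  assert(present.age_bin.has_value());
  assert(!missing.age_bin.has_value());
  assert(age_bin_or(missing, 0) == 0);

  // With std::optional, -999 is just an ordinary integer value.
  Observation odd{-999};
  assert(odd.age_bin.has_value());
}
```

Integers have no NaN equivalent, which is why a separate mechanism such as std::optional is needed for them.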

Validation

  • Project builds successfully
  • All unit tests pass locally (ctest → 100%)

Future work (not included in this PR)

  • JSON serialization support for NaN / Inf
  • R-side replacement of -999 with NA_real_
  • Removal of na_value after the integer refactor

@github-actions
Contributor

github-actions bot commented Mar 5, 2026

Thank you for contributing to FIMS and opening your first PR here! We are happy to have your contributions. Please ensure that the PR is made to the dev branch and let us know if you need any help! Also, we encourage you to introduce yourself to the community on the introduction thread in our Discussions.

@kellijohnson-NOAA
Contributor

@madhavgairola thanks for making these changes so quickly. I am wondering if I told you to format the code incorrectly though 🤦‍♀️ because it looks like many files changed just because of formatting and not because of actual changes that you made. Here are my instructions that I gave on the issue

clang-format -i --style="{BasedOnStyle: Google, SortIncludes: false}" $(find ./inst/include ./src ./tests/gtest -name "*.hpp" -o -name "*.cpp")

and, for example, there are changes where the header files are re-sorted even though SortIncludes: false is in the instructions. Do you know why there are so many formatting changes? Like I mentioned, I might have specified how to run clang-format incorrectly. If needed, I can try to run it on your branch to see if it reduces the number of changes, but I thought I would ask you first. Thanks.

@madhavgairola
Author

Thanks for pointing that out.
I ran clang-format from a Windows/PowerShell environment, so I couldn’t use the exact find command from the instructions and used a PowerShell equivalent instead. It looks like that ended up formatting more files than the ones I actually changed. I might also be on a slightly different clang-format version locally, which could explain the include reordering even with SortIncludes: false.
My goal was just to format the files I touched, but clearly it affected more than that. I’m happy to clean this up, either by reverting the formatting-only changes or by rerunning clang-format the way the repo expects.
If it’s easier, I’m also totally fine with you running clang-format on my branch. Just let me know what you’d prefer!

@kellijohnson-NOAA
Contributor

Why don't you try and minimize the formatting changes the best you can and then we will go from there 😁

@madhavgairola
Author

Thanks for the suggestion! I reverted the formatting-only changes so the PR now only includes the files with the actual logic updates. The diff should be much smaller now.

@kellijohnson-NOAA
Contributor

I installed your branch and changed an input value for landings in the data frame from a viable number to NA, NA_real_, and 0 and all three things led to the model not being able to optimize. So, a quick question. What value should we be using in R for missing values that should not be evaluated by the likelihood?

@madhavgairola
Author

I took a closer look at how the C++ side decides whether to skip evaluating data in the likelihood.
Right now data_object.hpp still defines na_value as -999. In the likelihood loops (for example in catch_at_age.hpp), missing observations are identified by comparing values against that sentinel, e.g. if (observed_data->at(i) != observed_data->na_value).
Since this PR intentionally left the integer sentinel refactor and na_value unchanged (based on the note in the issue about removing it after the integer refactor), the model still expects -999 to be passed from the R side in order to recognize and skip missing observations.
So to answer your question directly: for now it looks like -999 should still be used in R for missing values that shouldn't be evaluated by the likelihood. If NA_real_ or 0 is passed instead, the current checks won’t catch it, so the likelihood still tries to evaluate it, which seems to be what’s causing the optimization to fail.
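The behaviour described above can be reproduced in a small sketch. It assumes a simplified loop in which std::log stands in for the real likelihood contribution; kNaValue and both function names are hypothetical, and only the shape of the `!= na_value` comparison comes from the code quoted in this comment.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical stand-in for the sentinel stored on the data object.
constexpr double kNaValue = -999.0;

// Skips only entries equal to the sentinel, like the check quoted above.
// std::log is a placeholder for the real likelihood term.
inline double sum_log_sentinel_only(const std::vector<double>& observed) {
  double total = 0.0;
  for (double x : observed) {
    if (x != kNaValue) total += std::log(x);
  }
  return total;
}

// Same loop, but NaN is also treated as missing.
inline double sum_log_skip_nan(const std::vector<double>& observed) {
  double total = 0.0;
  for (double x : observed) {
    if (x != kNaValue && !std::isnan(x)) total += std::log(x);
  }
  return total;
}

inline void demo_missing_value_checks() {
  const double missing = std::numeric_limits<double>::quiet_NaN();

  // -999 is recognized and skipped: log(1) + log(1) == 0.
  assert(sum_log_sentinel_only({1.0, kNaValue, 1.0}) == 0.0);

  // NaN != -999 is true, so NaN is evaluated and poisons the sum.
  assert(std::isnan(sum_log_sentinel_only({1.0, missing, 1.0})));

  // 0 is not the sentinel either; log(0) is -infinity.
  assert(std::isinf(sum_log_sentinel_only({1.0, 0.0, 1.0})));

  // An explicit std::isnan check restores the skip for NaN inputs.
  assert(sum_log_skip_nan({1.0, missing, 1.0}) == 0.0);
}
```

A non-finite objective value of this kind is consistent with the optimizer failures reported for NA_real_ and 0, although the real likelihood code is more involved than this placeholder.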

@kellijohnson-NOAA
Contributor

Okay, I got back around to testing this and I am not getting a successful json. I ran a model with -999 values in the catch stream which used to lead to -inf in the json which was causing an error. I am still getting the following error message

Error: lexical error: invalid string in json text.
                        "uncertainty": nan,
                     (right here) ------^

because {jsonlite} cannot handle reading the file.

@kellijohnson-NOAA
Contributor

@madhavgairola I just wanted to check that you received my previous comment and what the status of this PR is?

@madhavgairola
Author

Thanks for the follow-up, and sorry for the delay — I missed your earlier message.

I see the issue now. I checked and found that with the switch to NaN on the C++ side, those values are getting written directly into the JSON output, but since JSON doesn’t support NaN, it breaks parsing on the R side (jsonlite).

I haven’t handled the JSON serialization layer yet in this PR, so that part is definitely incomplete right now.

Would you prefer that I extend this PR to handle NaN/Inf serialization (e.g., mapping to null or a string), or should that be handled in a separate follow-up PR?

@kellijohnson-NOAA
Contributor

That would be great if we can do it all in one PR because we cannot merge it in if downstream workflows are breaking.

@github-actions
Contributor

github-actions bot commented Mar 22, 2026

🎨 Chore: code formatting workflow

Our automated workflows cannot run on forks because of permission issues, and thus, we ask that you run the following code locally and push any changes that are created to your feature branch. You will only be reminded of this once per PR. Thank you!

Format C++ code

  1. Install clang-format version 18.0.0
  2. Run the following command from the repository root:
    clang-format -i --style="{BasedOnStyle: Google, SortIncludes: false}" $(find ./inst/include ./src ./tests/gtest -name "*.hpp" -o -name "*.cpp")

Format R code

  1. Install {styler} and {roxygen2}
  2. Run the following commands in R from the repository root:
styler::style_pkg() # Style R code
roxygen2::roxygenise() # Update documentation
styler::style_pkg() # Style R code again
roxygen2::roxygenise() # Update documentation again
usethis::use_tidy_description() # Style DESCRIPTION file

Push changes

  1. Commit the formatting with a commit message of "Chore: format feature branch"
  2. Push to your fork

@madhavgairola
Author

Hey! I just pushed a fix for the JSON serialization issue. Could you try it out and see if it works on your end?

@kellijohnson-NOAA
Contributor

Okay, I tested this again using

library(FIMS)
# Load sample data
data("data_big")

data_big[data_big$type == "landings" & data_big$timing == 2, "value"] <- NA_real_

# Prepare data for FIMS model
data_4_model <- FIMSFrame(data_big)

# Create parameters
parameters <- data_4_model |>
  create_default_configurations() |>
  create_default_parameters(data = data_4_model)

# Run the model without optimization (optimize = FALSE)
fit <- parameters |>
  initialize_fims(data = data_4_model) |>
  fit_fims(optimize = FALSE)

Then, I had to augment reshape_json_output() with the following helper function instead of jsonlite::read_json()

read_json_with_invalid_numbers <- function(path, simplifyVector = FALSE) {
  txt <- paste(readLines(path, warn = FALSE, encoding = "UTF-8"), collapse = "\n")

  # Replace only unquoted invalid numeric tokens with JSON null
  txt <- gsub(
    '(?<!["A-Za-z0-9_])(?:NaN|nan|Infinity|-Infinity)(?!["A-Za-z0-9_])',
    "null",
    txt,
    perl = TRUE
  )

  jsonlite::fromJSON(txt, simplifyVector = simplifyVector)
}

I can now read the JSON, but I am running into problems with other helpers inside of reshape_json_output(). Also, I cannot run the model with optimize = TRUE because it leads to NA with nlminb.

So, I am honestly not sure what we have gained by changing the code in the C++ because I am still running into all the same problems that I was before. Can you please explain in a little bit more plain language and less C++ speak what the changes you have made are trying to do so I make sure that I am testing the right thing?

@madhavgairola
Author

Thanks for running those tests and bringing this up. I can explain exactly what’s happening here, as there are two separate things colliding in your test script that are causing this confusion.

  1. The NA_real_ issue (C++ optimization crash): As I mentioned in my previous message and noted in the PR description, this PR intentionally does not touch the R data inputs. The internal data sentinel that tells the C++ likelihood loop to skip an observation is still hardcoded to -999. If you pass NA_real_ from R right now, the C++ engine doesn't recognize it as missing data, tries to evaluate it as a real observation, produces NaN in the objective function, and immediately breaks nlminb. To test this, you still have to pass -999 from R for missing inputs until the std::optional refactor handles the input streams in a later PR.

  2. The JSON parsing issue (string coercion): You were totally correct that my recent C++ output updates were breaking the downstream R helpers! In my last commit, I updated the C++ to serialize uninitialized parameters and default uncertainties as "NaN" strings so jsonlite wouldn't crash. However, when jsonlite reads an array like [0.5, "NaN"], it coerces the entire R vector to character, which breaks your downstream numeric helpers inside reshape_json_output().

To fix this, I just pushed an update to the C++ code to output standard unquoted JSON null instead of strings. jsonlite will now convert those null values directly into NA without coercing the surrounding arrays into strings, so you shouldn't need your custom regex anymore.
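As an illustration of the null-based approach, here is a minimal sketch of a serializer that never emits nan or inf tokens. The functions to_json_number and to_json_array are hypothetical helpers written for this comment and are not the code pushed to the branch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: formats one double as a JSON value.
// JSON has no NaN or Infinity literals, so non-finite values become null.
inline std::string to_json_number(double x) {
  if (!std::isfinite(x)) return "null";
  std::ostringstream out;
  // max_digits10 keeps enough digits to round-trip a double.
  out.precision(std::numeric_limits<double>::max_digits10);
  out << x;
  return out.str();
}

// Hypothetical helper: formats a vector of doubles as a JSON array.
inline std::string to_json_array(const std::vector<double>& values) {
  std::string out = "[";
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (i > 0) out += ", ";
    out += to_json_number(values[i]);
  }
  out += "]";
  return out;
}

inline void demo_json_non_finite() {
  const double nan_value = std::numeric_limits<double>::quiet_NaN();
  const double inf_value = std::numeric_limits<double>::infinity();

  assert(to_json_number(nan_value) == "null");
  assert(to_json_number(inf_value) == "null");
  assert(to_json_number(-inf_value) == "null");

  // The array stays numeric-or-null; no quoted strings are mixed in.
  assert(to_json_array({0.5, nan_value}) == "[0.5, null]");
}
```

One trade-off of this design is that NaN, Inf, and -Inf all collapse to the same null, so the R side cannot tell them apart after parsing.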

Regarding the Scope of the Issue: On a separate note, I wanted to respectfully bring up the scope of this work. I originally picked this up precisely because it was labeled as a "good first issue" focusing strictly on C++ sentinel replacement. However, replacing the internal sentinels has inevitably cascaded into touching cross-language optimization bindings, R-side downstream architecture constraints, and JSON deserialization workflows. While I am happy to navigate these layers and help implement this properly, I would respectfully suggest removing the "good first issue" label, as the downstream complexity involved here is far beyond an intro-level task.

If you pull the newest commit and test the model using -999 for your R inputs, does the JSON output parse natively into your helpers?
