Refactor C++ logic to correctly handle NaN sentinel initializations by madhavgairola · Pull Request #1268 · NOAA-FIMS/FIMS

madhavgairola · 2026-03-05T19:19:00Z

This PR replaces the floating-point sentinel value -999 with proper NaN handling in the C++ codebase.

Summary of changes

- Replaced floating-point sentinel initializations with std::numeric_limits<double>::quiet_NaN()
- Updated conditional checks that previously compared against -999 to use std::isnan(...)
- Added the required headers (<cmath>, <limits>) where needed
- Updated unit tests where necessary:
  - Replaced equality checks involving NaN with EXPECT_TRUE(std::isnan(...))
  - Updated one floating-point comparison from EXPECT_DOUBLE_EQ to EXPECT_NEAR to account for small floating-point precision differences introduced by NaN initialization
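Here is a minimal sketch of the pattern the list describes. It uses a hypothetical ParameterSketch struct rather than an actual FIMS class, and it shows why the -999 checks become std::isnan checks: NaN compares unequal to every value, including itself.

```cpp
#include <cmath>
#include <limits>

// Hypothetical stand-in for a FIMS member that previously defaulted to -999.
// The struct and field names are illustrative, not the actual FIMS code.
struct ParameterSketch {
  // Before: double uncertainty = -999.0;
  double uncertainty = std::numeric_limits<double>::quiet_NaN();

  // Before: return uncertainty != -999.0;
  // An == or != test against NaN cannot detect it, because NaN is unequal
  // to everything (itself included), so std::isnan is the reliable check.
  bool is_set() const { return !std::isnan(uncertainty); }
};
```

A default-constructed ParameterSketch reports is_set() as false, and p.uncertainty == p.uncertainty evaluates to false. That behavior is why the tests use EXPECT_TRUE(std::isnan(...)) instead of equality checks. Compiling with -ffast-math can break NaN checks like this one.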

Scope

- Only floating-point sentinel values were refactored in this PR
- Integer sentinel values (-999 used with integer types) were intentionally left unchanged, as the issue suggests handling them in a later refactor using std::optional
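Integer types have no NaN value (std::numeric_limits<int>::has_quiet_NaN is false), so the integer sentinels need a different approach. The sketch below is a hypothetical version of the std::optional refactor mentioned in the issue. IndexSketch and index_or are illustrative names, not FIMS code.

```cpp
#include <optional>

// Hypothetical illustration of the deferred integer refactor: an int field
// that used -999 to mean "not set" becomes std::optional<int>.
struct IndexSketch {
  // Before: int first_year_index = -999;
  std::optional<int> first_year_index;  // empty optional means "not set"
};

// Returns the stored index, or the caller-supplied fallback when unset,
// instead of comparing the field against -999.
int index_or(const IndexSketch& s, int fallback) {
  return s.first_year_index.value_or(fallback);
}
```

With this approach, "missing" lives in the type itself. There is no magic number that could collide with a legitimate value.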

Validation

- Project builds successfully
- All unit tests pass locally (ctest → 100%)

Future work (not included in this PR)

- JSON serialization support for NaN / Inf
- R-side replacement of -999 with NA_real_
- Removal of na_value after the integer refactor

github-actions · 2026-03-05T19:19:09Z

Thank you for contributing to FIMS and opening your first PR here! We are happy to have your contributions. Please ensure that the PR is made to the dev branch and let us know if you need any help! Also, we encourage you to introduce yourself to the community on the introduction thread in our Discussions.

kellijohnson-NOAA · 2026-03-05T20:51:02Z

@madhavgairola thanks for making these changes so quickly. I am wondering if I told you to format the code incorrectly though 🤦‍♀️ because it looks like many files changed just because of formatting and not because of actual changes that you made. Here are the instructions I gave on the issue:

clang-format -i --style="{BasedOnStyle: Google, SortIncludes: false}" $(find ./inst/include ./src ./tests/gtest -name "*.hpp" -o -name "*.cpp")

and, for example, there are changes where the header includes are re-sorted even though SortIncludes: false is in the instructions. Do you know why there are so many formatting changes? Like I mentioned, I might have specified how to run clang-format incorrectly. If needed, I can try to run it on your branch to see if it reduces the number of changes, but I thought I would ask you first. Thanks.

madhavgairola · 2026-03-05T21:02:36Z

Thanks for pointing that out.
I ran clang-format from a Windows/PowerShell environment, so I couldn't use the exact find command from the instructions and used a PowerShell equivalent instead. It looks like that ended up formatting more files than the ones I actually changed. I might also be on a slightly different clang-format version locally, which could explain the include reordering even with SortIncludes: false.
My goal was just to format the files I touched, but clearly it affected more than that. I'm happy to clean this up, either by reverting the formatting-only changes or by rerunning clang-format the way the repo expects.
If it’s easier, I’m also totally fine with you running clang-format on my branch. Just let me know what you’d prefer!

kellijohnson-NOAA · 2026-03-05T21:05:27Z

Why don't you try and minimize the formatting changes the best you can and then we will go from there 😁

madhavgairola · 2026-03-06T05:21:24Z

Thanks for the suggestion! I reverted the formatting-only changes so the PR now only includes the files with the actual logic updates. The diff should be much smaller now.

kellijohnson-NOAA · 2026-03-06T20:41:38Z

I installed your branch and changed an input value for landings in the data frame from a viable number to NA, NA_real_, and 0 and all three things led to the model not being able to optimize. So, a quick question. What value should we be using in R for missing values that should not be evaluated by the likelihood?

madhavgairola · 2026-03-07T04:07:26Z

I took a closer look at how the C++ side decides whether to skip evaluating data in the likelihood.
Right now data_object.hpp still defines na_value as -999. In the likelihood loops (for example in catch_at_age.hpp), missing observations are identified by comparing values against that sentinel, e.g. if (observed_data->at(i) != observed_data->na_value).
Since this PR intentionally left the integer sentinel refactor and na_value unchanged (based on the note in the issue about removing it after the integer refactor), the model still expects -999 to be passed from the R side in order to recognize and skip missing observations.
So to answer your question directly: for now it looks like -999 should still be used in R for missing values that shouldn't be evaluated by the likelihood. If NA_real_ or 0 is passed instead, the current checks won’t catch it, so the likelihood still tries to evaluate it, which seems to be what’s causing the optimization to fail.
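The following simplified sketch makes the explanation above concrete. It assumes a sum-of-squares stand-in for the real likelihood, and all function names are hypothetical. R's NA_real_ arrives in C++ as a NaN bit pattern. For such a value, observed != na_value is true, so the observation is evaluated and the NaN propagates into the total. The second function is a hypothetical NaN-aware check, not something this PR implements.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the na_value sentinel in data_object.hpp.
constexpr double kNaValue = -999.0;

// Simplified stand-in for a likelihood loop: sum of squared residuals that
// skips observations equal to the sentinel (the current style of check).
double sum_sq_skip_sentinel(const std::vector<double>& observed,
                            const std::vector<double>& expected) {
  double total = 0.0;
  for (std::size_t i = 0; i < observed.size(); ++i) {
    if (observed[i] != kNaValue) {  // a NaN input also passes this test
      const double r = observed[i] - expected[i];
      total += r * r;  // a NaN residual makes the whole total NaN
    }
  }
  return total;
}

// Hypothetical NaN-aware variant: treats both -999 and NaN as missing.
double sum_sq_skip_missing(const std::vector<double>& observed,
                           const std::vector<double>& expected) {
  double total = 0.0;
  for (std::size_t i = 0; i < observed.size(); ++i) {
    if (observed[i] == kNaValue || std::isnan(observed[i])) continue;
    const double r = observed[i] - expected[i];
    total += r * r;
  }
  return total;
}
```

A NaN objective value like the one from the first function is what an optimizer such as nlminb cannot recover from.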

kellijohnson-NOAA · 2026-03-10T12:01:08Z

Okay, I got back around to testing this and I am not getting a valid JSON file. I ran a model with -999 values in the catch stream, which used to lead to -inf in the JSON and cause an error. I am still getting the following error message:

Error: lexical error: invalid string in json text.
                        "uncertainty": nan,
                     (right here) ------^

because {jsonlite} cannot handle reading the file.

kellijohnson-NOAA · 2026-03-20T21:08:23Z

@madhavgairola I just wanted to check that you received my previous comment and what the status of this PR is?

madhavgairola · 2026-03-21T14:34:51Z

Thanks for the follow-up, and sorry for the delay; I missed your earlier message.

I see the issue now. With the switch to NaN on the C++ side, those values are written directly into the JSON output. JSON doesn't support NaN, so parsing breaks on the R side (jsonlite).

I haven’t handled the JSON serialization layer yet in this PR, so that part is definitely incomplete right now.

Would you prefer that I extend this PR to handle NaN/Inf serialization (e.g., mapping to null or a string), or should that be handled in a separate follow-up PR?

kellijohnson-NOAA · 2026-03-21T16:04:00Z

That would be great if we can do it all in one PR because we cannot merge it in if downstream workflows are breaking.

github-actions · 2026-03-22T05:27:37Z

🎨 Chore: code formatting workflow

Our automated workflows cannot run on forks because of permission issues, and thus, we ask that you run the following code locally and push any changes that are created to your feature branch. You will only be reminded of this once per PR. Thank you!

Format C++ code

Install clang-format version 18.0.0

Run the following command from the repository root:

clang-format -i --style="{BasedOnStyle: Google, SortIncludes: false}" $(find ./inst/include ./src ./tests/gtest -name "*.hpp" -o -name "*.cpp")

Format R code

Install {styler} and {roxygen2}
Run the following commands in R from the repository root:

styler::style_pkg() # Style R code
roxygen2::roxygenise() # Update documentation
styler::style_pkg() # Style R code again
roxygen2::roxygenise() # Update documentation again
usethis::use_tidy_description() # Style DESCRIPTION file

Push changes

Commit the formatting with a commit message of "Chore: format feature branch"
Push to your fork

madhavgairola · 2026-03-22T06:06:48Z

Hey! I just pushed a fix for the JSON serialization issue. Could you try it out and see if it works on your end?

kellijohnson-NOAA · 2026-03-29T18:57:41Z

Okay, I tested this again using

library(FIMS)
# Load sample data
data("data_big")

data_big[data_big$type == "landings" & data_big$timing == 2, "value"] <- NA_real_

# Prepare data for FIMS model
data_4_model <- FIMSFrame(data_big)

# Create parameters
parameters <- data_4_model |>
  create_default_configurations() |>
  create_default_parameters(data = data_4_model)

# Run the model without optimization
fit <- parameters |>
  initialize_fims(data = data_4_model) |>
  fit_fims(optimize = FALSE)

Then, I had to replace jsonlite::read_json() inside reshape_json_output() with the following helper function:

read_json_with_invalid_numbers <- function(path, simplifyVector = FALSE) {
  txt <- paste(readLines(path, warn = FALSE, encoding = "UTF-8"), collapse = "\n")

  # Replace only unquoted invalid numeric tokens with JSON null
  txt <- gsub(
    '(?<!["A-Za-z0-9_])(?:NaN|nan|Infinity|-Infinity)(?!["A-Za-z0-9_])',
    "null",
    txt,
    perl = TRUE
  )

  jsonlite::fromJSON(txt, simplifyVector = simplifyVector)
}

First, I can now read the JSON, but I am running into problems with other helpers inside of reshape_json_output(). Second, I cannot run the model with optimize = TRUE because it leads to NA with nlminb.

So, I am honestly not sure what we have gained by changing the C++ code, because I am still running into the same problems I had before. Can you please explain, in plainer language and with less C++ jargon, what your changes are trying to do so I can make sure I am testing the right thing?

madhavgairola · 2026-04-04T03:13:40Z

Thanks for running those tests and bringing this up. I can explain exactly what’s happening here, as there are two separate things colliding in your test script that are causing this confusion.

The NA_real_ issue (C++ optimization crash): As I mentioned in my previous message and noted in the PR description, this PR intentionally does not touch the R data inputs. The data sentinel that tells the C++ likelihood loop to skip an observation is still hardcoded to -999. If you pass NA_real_ from R right now, the C++ code doesn't recognize it as missing data. Instead, it evaluates the value in the likelihood as if it were a real observation, which produces NaN in the objective function and causes nlminb to fail. To test this, you still have to pass -999 from R for missing inputs until the std::optional refactor handles the input streams in a later PR.

The JSON parsing issue (string coercion): You were correct that my recent C++ output updates were breaking the downstream R helpers. In my last commit, I updated the C++ to serialize uninitialized parameters and default uncertainties as "NaN" strings so jsonlite wouldn't crash. However, when jsonlite reads an array like [0.5, "NaN"], it coerces the entire R vector to character, which breaks the numeric helpers inside reshape_json_output().

To fix this, I just pushed an update to the C++ code that outputs standard, unquoted JSON null instead of text strings. jsonlite converts null values in a numeric array to NA without coercing the array to character, so you shouldn't need your custom regex anymore.
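Here is a minimal sketch of the null approach described above. The helper names (json_number, json_array) are hypothetical, and this is not the actual code in commit 0eaccc8. Non-finite values become null. Finite values are written with max_digits10 significant digits so they round-trip exactly.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: JSON text for one double. NaN, Inf, and -Inf have
// no JSON representation, so they are emitted as null.
std::string json_number(double value) {
  if (!std::isfinite(value)) return "null";
  std::ostringstream out;
  // max_digits10 (17 for IEEE double) significant digits round-trip exactly.
  out.precision(std::numeric_limits<double>::max_digits10);
  out << value;
  return out.str();
}

// Hypothetical helper: JSON array text for a vector of doubles.
std::string json_array(const std::vector<double>& values) {
  std::string result = "[";
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (i > 0) result += ", ";
    result += json_number(values[i]);
  }
  result += "]";
  return result;
}
```

Compared with writing quoted "NaN" strings, null keeps the array numeric, so jsonlite can simplify it to a numeric vector containing NA. The trade-off is that NaN, Inf, and -Inf all collapse to the same null/NA on the R side.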

Regarding the Scope of the Issue: On a separate note, I wanted to respectfully bring up the scope of this work. I originally picked this up precisely because it was labeled as a "good first issue" focusing strictly on C++ sentinel replacement. However, replacing the internal sentinels has inevitably cascaded into touching cross-language optimization bindings, R-side downstream architecture constraints, and JSON deserialization workflows. While I am happy to navigate these layers and help implement this properly, I would respectfully suggest removing the "good first issue" label, as the downstream complexity involved here is far beyond an intro-level task.

If you pull the newest commit and test the model using -999 for your R inputs, does the JSON output parse natively into your helpers?

madhavgairola force-pushed the fix-missing-values branch from 75c96c2 to 93475fb on March 6, 2026 05:10

kellijohnson-NOAA force-pushed the dev branch from 7736831 to 2e56884 on March 16, 2026 15:41

fix(JSON): output NaN and Inf as valid string literals (52c7851)

madhavgairola force-pushed the fix-missing-values branch from 15e7e4c to 52c7851 on March 23, 2026 02:17

fix(JSON): output null for missing numerics to support jsonlite parsing (0eaccc8)


2 participants