ReportFpgOutputForObservationalModel needs to only report the barcode alleles#159

Open

stitova-idm wants to merge 2 commits intoEMOD-Hub:mainfrom

stitova-idm:gh158_FPGObsModelReportFix

Collaborator

stitova-idm commented Feb 5, 2026

Added code to make sure that only the "neutral"/barcode alleles are being reported on in the report. We do not want to track other ones (such as PfEMP1, MSP, HRP2, Drug Resistance) if they are present.


          Added code to make sure that only the "neutral"/barcode alleles are b…

f3f6417

…eing reported on in the report. We do not want to track other ones (such as PfEMP1, MSP, HRP2, Drug Resistance) if they are present.

stitova-idm requested review from Bridenbecker and Copilot

February 6, 2026 17:59

Copilot AI reviewed

View reviewed changes

Copilot AI left a comment

Pull request overview

Updates ReportFpgOutputForObservationalModel so the numpy allele/root outputs only include neutral “barcode” SNPs rather than the full parasite genome loci.

Changes:

Set reported genome dimension to the number of barcode SNP indices (instead of total base pairs).
Filter allele and root arrays to only include values at barcode indices when writing numpy output.
Add a fast-path for cases where the stored sequence/root arrays already appear to be barcode-only.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

reporters/ReportFpgOutputForObservationalModel.cpp Outdated Show resolved Hide resolved

reporters/ReportFpgOutputForObservationalModel.cpp

                           };
-                          write_data_func write_func_roots = [ this ]( size_t num_bytes, std::ofstream& file )
+                          write_data_func write_func_roots = [ this , &r_barcode_indexes, &snp_data_size ]( size_t num_bytes, std::ofstream& file )

Copilot AI Feb 6, 2026

Both lambdas capture snp_data_size by reference even though it’s a local variable. Capturing it by value avoids potential lifetime issues (if the callable is stored/used later) and makes the capture intent clearer. Same applies to capturing r_barcode_indexes by reference if you don’t require it.

Copilot uses AI. Check for mistakes.

reporters/ReportFpgOutputForObservationalModel.cpp

Comment on lines +314 to +323

+                                  if(r_barcode_indexes.size() == m_GenomeList[0].GetNucleotideSequence().size())
                                   {
-                                      file.write( (char *)r_genome.GetNucleotideSequence().data(), genome_size );
-                                      bytes_written += genome_size;
+                                      // in case we are storing only barcode SNPs as the full nucleotide sequence
+                                      for( const ParasiteGenome& r_genome : m_GenomeList )
+                                      {
+                                          file.write( (char*)r_genome.GetNucleotideSequence().data(), snp_data_size );
+                                          bytes_written += snp_data_size ;
+                                      }
+                                  }
+                                  else

Copilot AI Feb 6, 2026

Using r_barcode_indexes.size() == GetNucleotideSequence().size() as a proxy for 'sequence is already barcode-only' can silently produce incorrect output if a non-barcode sequence happens to have the same length as the barcode list (or the storage format changes). A safer approach is to always select by r_barcode_indexes, or gate the fast-path behind an explicit/authoritative indicator (e.g., a config flag or a genome/parasite-genetics API that declares the stored representation is barcode-only).

Copilot uses AI. Check for mistakes.

reporters/ReportFpgOutputForObservationalModel.cpp

+                                      // in case we are storing only barcode SNPs as the full nucleotide sequence
+                                      for( const ParasiteGenome& r_genome : m_GenomeList )
+                                      {
+                                          file.write( (char*)r_genome.GetNucleotideSequence().data(), snp_data_size );

Copilot AI Feb 6, 2026

These file.write calls use C-style casts and pass a size_t length. Prefer reinterpret_cast<const char*>(...) (or std::as_bytes-style patterns where applicable) and cast the byte count to std::streamsize to match the iostream API, improving type-safety and avoiding signed/unsigned narrowing pitfalls.

Copilot uses AI. Check for mistakes.

reporters/ReportFpgOutputForObservationalModel.cpp

+                                              release_assert( index < r_genome.GetNucleotideSequence().size() );
+                                              snps.push_back( r_genome.GetNucleotideSequence()[index] );
+                                          }
+                                          file.write( (char*)snps.data(), snp_data_size );

Copilot AI Feb 6, 2026

These file.write calls use C-style casts and pass a size_t length. Prefer reinterpret_cast<const char*>(...) (or std::as_bytes-style patterns where applicable) and cast the byte count to std::streamsize to match the iostream API, improving type-safety and avoiding signed/unsigned narrowing pitfalls.

Copilot uses AI. Check for mistakes.

reporters/ReportFpgOutputForObservationalModel.cpp

+                                      // in case we are storing only barcode SNPs as the full nucleotide sequence
+                                      for(const ParasiteGenome& r_genome : m_GenomeList)
+                                      {
+                                          file.write( (char*)r_genome.GetAlleleRoots().data(), snp_data_size );

Copilot AI Feb 6, 2026

These file.write calls use C-style casts and pass a size_t length. Prefer reinterpret_cast<const char*>(...) (or std::as_bytes-style patterns where applicable) and cast the byte count to std::streamsize to match the iostream API, improving type-safety and avoiding signed/unsigned narrowing pitfalls.

Copilot uses AI. Check for mistakes.

reporters/ReportFpgOutputForObservationalModel.cpp

+                                              release_assert( index < r_genome.GetAlleleRoots().size() );
+                                              snps.push_back( r_genome.GetAlleleRoots()[index] );
+                                          }
+                                          file.write( (char*)snps.data(), snp_data_size );

Copilot AI Feb 6, 2026

These file.write calls use C-style casts and pass a size_t length. Prefer reinterpret_cast<const char*>(...) (or std::as_bytes-style patterns where applicable) and cast the byte count to std::streamsize to match the iostream API, improving type-safety and avoiding signed/unsigned narrowing pitfalls.

Copilot uses AI. Check for mistakes.

reporters/ReportFpgOutputForObservationalModel.cpp Outdated

    
                          // --- file such that they can be read as a 2D array.

                          // --------------------------------------------------------------------------------

                          write_data_func write_func_alleles = [ this ]( size_t num_bytes, std::ofstream& file ) 

                          write_data_func write_func_alleles = [this, &r_barcode_indexes, &snp_data_size]( size_t num_bytes, std::ofstream& file )

Copilot AI Feb 6, 2026

The allele and root write lambdas duplicate the same filtering logic (size check, allocate snps, loop over genomes, loop over indices, write bytes, update bytes_written). Consider extracting a shared helper that takes a function/ref returning the underlying std::vector<int32_t>& for a genome (nucleotide sequence vs allele roots). This reduces duplication and lowers the chance of future fixes being applied to only one path.

Copilot uses AI. Check for mistakes.

reporters/ReportFpgOutputForObservationalModel.cpp

                           };
-                          write_data_func write_func_roots = [ this ]( size_t num_bytes, std::ofstream& file )
+                          write_data_func write_func_roots = [ this , &r_barcode_indexes, &snp_data_size ]( size_t num_bytes, std::ofstream& file )

Copilot AI Feb 6, 2026

The allele and root write lambdas duplicate the same filtering logic (size check, allocate snps, loop over genomes, loop over indices, write bytes, update bytes_written). Consider extracting a shared helper that takes a function/ref returning the underlying std::vector<int32_t>& for a genome (nucleotide sequence vs allele roots). This reduces duplication and lowers the chance of future fixes being applied to only one path.

Copilot uses AI. Check for mistakes.


          Update reporters/ReportFpgOutputForObservationalModel.cpp

47ff836

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet