Skip to content

ReportFpgOutputForObservationalModel needs to only report the barcode alleles#159

Open
stitova-idm wants to merge 2 commits intoEMOD-Hub:mainfrom
stitova-idm:gh158_FPGObsModelReportFix
Open

ReportFpgOutputForObservationalModel needs to only report the barcode alleles#159
stitova-idm wants to merge 2 commits intoEMOD-Hub:mainfrom
stitova-idm:gh158_FPGObsModelReportFix

Conversation

@stitova-idm
Copy link
Collaborator

Added code to make sure that only the "neutral"/barcode alleles are being reported on in the report. We do not want to track other ones (such as PfEMP1, MSP, HRP2, Drug Resistance) if they are present.

…eing reported on in the report. We do not want to track other ones (such as PfEMP1, MSP, HRP2, Drug Resistance) if they are present.
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates ReportFpgOutputForObservationalModel so the numpy allele/root outputs only include neutral “barcode” SNPs rather than the full parasite genome loci.

Changes:

  • Set reported genome dimension to the number of barcode SNP indices (instead of total base pairs).
  • Filter allele and root arrays to only include values at barcode indices when writing numpy output.
  • Add a fast-path for cases where the stored sequence/root arrays already appear to be barcode-only.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

};

write_data_func write_func_roots = [ this ]( size_t num_bytes, std::ofstream& file )
write_data_func write_func_roots = [ this , &r_barcode_indexes, &snp_data_size ]( size_t num_bytes, std::ofstream& file )
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both lambdas capture snp_data_size by reference even though it’s a local variable. Capturing it by value avoids potential lifetime issues (if the callable is stored/used later) and makes the capture intent clearer. Same applies to capturing r_barcode_indexes by reference if you don’t require it.

Copilot uses AI. Check for mistakes.
Comment on lines +314 to +323
if(r_barcode_indexes.size() == m_GenomeList[0].GetNucleotideSequence().size())
{
file.write( (char *)r_genome.GetNucleotideSequence().data(), genome_size );
bytes_written += genome_size;
// in case we are storing only barcode SNPs as the full nucleotide sequence
for( const ParasiteGenome& r_genome : m_GenomeList )
{
file.write( (char*)r_genome.GetNucleotideSequence().data(), snp_data_size );
bytes_written += snp_data_size ;
}
}
else
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using r_barcode_indexes.size() == GetNucleotideSequence().size() as a proxy for 'sequence is already barcode-only' can silently produce incorrect output if a non-barcode sequence happens to have the same length as the barcode list (or the storage format changes). A safer approach is to always select by r_barcode_indexes, or gate the fast-path behind an explicit/authoritative indicator (e.g., a config flag or a genome/parasite-genetics API that declares the stored representation is barcode-only).

Copilot uses AI. Check for mistakes.
// in case we are storing only barcode SNPs as the full nucleotide sequence
for( const ParasiteGenome& r_genome : m_GenomeList )
{
file.write( (char*)r_genome.GetNucleotideSequence().data(), snp_data_size );
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These file.write calls use C-style casts and pass a size_t length. Prefer reinterpret_cast<const char*>(...) (or std::as_bytes-style patterns where applicable) and cast the byte count to std::streamsize to match the iostream API, improving type-safety and avoiding signed/unsigned narrowing pitfalls.

Copilot uses AI. Check for mistakes.
release_assert( index < r_genome.GetNucleotideSequence().size() );
snps.push_back( r_genome.GetNucleotideSequence()[index] );
}
file.write( (char*)snps.data(), snp_data_size );
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These file.write calls use C-style casts and pass a size_t length. Prefer reinterpret_cast<const char*>(...) (or std::as_bytes-style patterns where applicable) and cast the byte count to std::streamsize to match the iostream API, improving type-safety and avoiding signed/unsigned narrowing pitfalls.

Copilot uses AI. Check for mistakes.
// in case we are storing only barcode SNPs as the full nucleotide sequence
for(const ParasiteGenome& r_genome : m_GenomeList)
{
file.write( (char*)r_genome.GetAlleleRoots().data(), snp_data_size );
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These file.write calls use C-style casts and pass a size_t length. Prefer reinterpret_cast<const char*>(...) (or std::as_bytes-style patterns where applicable) and cast the byte count to std::streamsize to match the iostream API, improving type-safety and avoiding signed/unsigned narrowing pitfalls.

Copilot uses AI. Check for mistakes.
release_assert( index < r_genome.GetAlleleRoots().size() );
snps.push_back( r_genome.GetAlleleRoots()[index] );
}
file.write( (char*)snps.data(), snp_data_size );
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These file.write calls use C-style casts and pass a size_t length. Prefer reinterpret_cast<const char*>(...) (or std::as_bytes-style patterns where applicable) and cast the byte count to std::streamsize to match the iostream API, improving type-safety and avoiding signed/unsigned narrowing pitfalls.

Copilot uses AI. Check for mistakes.
// --- file such that they can be read as a 2D array.
// --------------------------------------------------------------------------------
write_data_func write_func_alleles = [ this ]( size_t num_bytes, std::ofstream& file )
write_data_func write_func_alleles = [this, &r_barcode_indexes, &snp_data_size]( size_t num_bytes, std::ofstream& file )
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The allele and root write lambdas duplicate the same filtering logic (size check, allocate snps, loop over genomes, loop over indices, write bytes, update bytes_written). Consider extracting a shared helper that takes a function/ref returning the underlying std::vector<int32_t>& for a genome (nucleotide sequence vs allele roots). This reduces duplication and lowers the chance of future fixes being applied to only one path.

Copilot uses AI. Check for mistakes.
};

write_data_func write_func_roots = [ this ]( size_t num_bytes, std::ofstream& file )
write_data_func write_func_roots = [ this , &r_barcode_indexes, &snp_data_size ]( size_t num_bytes, std::ofstream& file )
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The allele and root write lambdas duplicate the same filtering logic (size check, allocate snps, loop over genomes, loop over indices, write bytes, update bytes_written). Consider extracting a shared helper that takes a function/ref returning the underlying std::vector<int32_t>& for a genome (nucleotide sequence vs allele roots). This reduces duplication and lowers the chance of future fixes being applied to only one path.

Copilot uses AI. Check for mistakes.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Malaria FPG: with "Var_Gene_Randomness_Type": FIXED_NEIGHBORHOOD , the var genes are added to variants.npy and roots.npy, but shouldn't be

2 participants