diff --git a/PERFORMANCE_OPTIMIZATION_ANALYSIS.md b/PERFORMANCE_OPTIMIZATION_ANALYSIS.md new file mode 100644 index 0000000..1b15c2c --- /dev/null +++ b/PERFORMANCE_OPTIMIZATION_ANALYSIS.md @@ -0,0 +1,1309 @@ +# OSPSuite.SimModel Performance Optimization Analysis + +## Executive Summary + +This document provides a comprehensive analysis of performance optimization opportunities in the OSPSuite.SimModel solution. The analysis identifies critical bottlenecks in ODE solver operations, formula evaluation, XML parsing, memory management, and P/Invoke interoperability, with detailed recommendations for improvement. + +**Key Findings:** +- **Critical Priority**: 8 optimizations affecting ODE solving and formula evaluation (hot path) +- **High Priority**: 7 optimizations for Jacobian computation, interpolation, and bandwidth reduction +- **Medium Priority**: 6 optimizations for XML loading, memory management, and P/Invoke marshaling +- **Low Priority**: 5 minor optimizations and code quality improvements + +**Estimated Performance Impact**: 30-60% improvement in ODE solving runtime, 20-40% reduction in memory allocations, 15-25% faster simulation initialization. + +--- + +## Table of Contents + +1. [Formula Evaluation System](#1-formula-evaluation-system) +2. [ODE Solver & Jacobian Computation](#2-ode-solver--jacobian-computation) +3. [Table Formula Interpolation](#3-table-formula-interpolation) +4. [Bandwidth Reduction & Sparsity](#4-bandwidth-reduction--sparsity) +5. [XML Loading & Parsing](#5-xml-loading--parsing) +6. [Memory Management & Allocation](#6-memory-management--allocation) +7. [P/Invoke Marshaling Overhead](#7-pinvoke-marshaling-overhead) +8. [Collection & Data Structure Operations](#8-collection--data-structure-operations) +9. [Priority Matrix](#9-priority-matrix) +10. [Implementation Recommendations](#10-implementation-recommendations) + +--- + +## 1. Formula Evaluation System + +### 1.1 Virtual Function Call Overhead in Formula Evaluation + +**File**: `src/OSPSuite.SimModelNative/include/SimModel/Formula.h:47-48` + +**Issue**: +```cpp +virtual double DE_Compute (const double * y, const double time, ScaleFactorUsageMode scaleFactorMode) = 0; +virtual void DE_Jacobian (double * * jacobian, const double * y, const double time, const int iEquation, const double preFactor)=0; +``` + +**Problem**: +- **Virtual function calls in innermost ODE loop** - called billions of times during solving +- Prevents compiler inlining and optimization +- Polymorphic dispatch overhead for every formula evaluation +- Branch prediction misses when formula types vary + +**Impact**: **CRITICAL** - These are the most frequently called functions in the entire system + +**Recommendation**: +```cpp +// Option 1: Template-based static polymorphism for common formula types +template +class OptimizedFormula { + inline double DE_Compute(...) { + return static_cast(this)->ComputeImpl(...); + } +}; + +// Option 2: Flatten formula tree into bytecode/expression array +struct FormulaOp { + OpCode op; + union { + double constant; + int varIndex; + } data; +}; + +// Option 3: JIT compilation (C++ to native code at runtime) +// Generate specialized C++ code per simulation, compile as shared library + +// Option 4: CRTP pattern for zero-overhead abstraction +template +class Formula { + inline double DE_Compute(...) { + return static_cast(this)->DE_Compute(...); + } +}; +``` + +**Priority**: **CRITICAL** (single highest-impact optimization) + +--- + +### 1.2 Formula Simplification Not Implemented + +**File**: `src/OSPSuite.SimModelNative/include/SimModel/Formula.h:14` + +**Issue**: +```cpp +const bool CONSTANT_CURRENT_RUN = false; //TODO +``` + +**Problem**: +- Formula simplification is marked as TODO +- Expressions like `2.5 + (3.0 * 4.0)` are evaluated repeatedly instead of being reduced to `14.5` at initialization +- Constant folding not performed +- Dead code not eliminated (e.g., `x + 0` → `x`, `x * 1` → `x`, `x * 0` → `0`) + +**Impact**: **HIGH** - Every formula evaluation pays the cost of computing compile-time constants + +**Recommendation**: +```cpp +// Implement during Finalize(): +virtual Formula * Simplify() { + // For SumFormula: + - Fold constant summands + - Remove zero summands + - If all constant, return ConstantFormula + + // For ProductFormula: + - Fold constant factors + - Remove unit factors (1.0) + - If any factor is zero, return ConstantFormula(0.0) + + // For general formulas: + - Recursively simplify child formulas + - Apply algebraic identities + - Evaluate IsRefIndependent() and cache results +} + +// Enable during finalization: +void Simulation::Finalize() { + for (auto& formula : allFormulas) { + formula = formula->Simplify(); + } +} +``` + +**Priority**: **HIGH** (moderate effort, significant payoff) + +--- + +### 1.3 MATRIX_ELEM Macro Overhead + +**File**: `src/OSPSuite.SimModelNative/include/SimModel/Formula.h:20` + +**Issue**: +```cpp +#define MATRIX_ELEM(A,i,j) (A[j][i]) +``` + +**Problem**: +- Macro used in hot path Jacobian computation +- Double pointer dereference: `A[j]` then `[i]` +- Cache-unfriendly column-major access pattern +- Potential for bounds checking overhead + +**Impact**: **MEDIUM-HIGH** - Jacobian computation is O(n²) and happens many times per solve + +**Recommendation**: +```cpp +// Option 1: Inline function with better codegen +inline double& MatrixElem(double** A, int i, int j) { + return A[j][i]; +} + +// Option 2: Flat array with stride (better cache locality) +inline double& MatrixElem(double* A, int rows, int i, int j) { + return A[j * rows + i]; // or i * cols + j for row-major +} + +// Option 3: Use Eigen library for optimized matrix operations +Eigen::MatrixXd jacobian(n, n); +jacobian(i, j) = ...; // Optimized access patterns +``` + +**Priority**: **MEDIUM** (depends on Jacobian computation frequency) + +--- + +### 1.4 Formula Type-Specific Optimizations + +**File**: Multiple formula implementations (SumFormula.cpp, ProductFormula.cpp, etc.) + +**Issue**: +- Sum of constants is recomputed every time +- Product with zero not short-circuited +- Division by constant not converted to multiplication +- Common subexpression elimination not performed + +**Impact**: **MEDIUM** - Affects all formulas with common patterns + +**Recommendation**: +```cpp +// In SumFormula: +class SumFormula { + double _constantSum = 0.0; + std::vector _nonConstantSummands; + + void Finalize() { + // Separate constants from variables + for (auto f : _summands) { + double value; + if (f->IsRefIndependent(value)) { + _constantSum += value; + delete f; + } else { + _nonConstantSummands.push_back(f); + } + } + } + + double DE_Compute(...) { + double result = _constantSum; + for (auto f : _nonConstantSummands) + result += f->DE_Compute(...); + return result; + } +}; + +// Similar patterns for Product, Div, etc. +``` + +**Priority**: **MEDIUM** (straightforward optimization with measurable impact) + +--- + +## 2. ODE Solver & Jacobian Computation + +### 2.1 Dynamic Solver Library Loading Per Simulation + +**File**: `src/OSPSuite.SimModelNative/src/DESolver.cpp:16-42` + +**Issue**: +```cpp +SimModelSolverBase * DESolver::GetSolver () +{ + std::string LibName = "OSPSuite.SimModelSolver_" + m_UsedSolver; + DynamicLibrary* library = DynamicLibraryFactory::GetLibrary(LibName + ".dll"); + // ... loads library, gets function pointer, creates solver instance +} +``` + +**Problem**: +- Library loaded and function pointer resolved for every simulation +- File I/O and dynamic linking overhead +- Symbol resolution repeated unnecessarily +- No caching of loaded libraries or solver instances + +**Impact**: **MEDIUM** - Happens once per simulation but adds initialization latency + +**Recommendation**: +```cpp +class DESolver { +private: + static std::map _cachedLibraries; + static std::map _cachedFunctions; + +public: + SimModelSolverBase* GetSolver() { + // Check cache first + if (_cachedLibraries.find(m_UsedSolver) == _cachedLibraries.end()) { + // Load and cache + _cachedLibraries[m_UsedSolver] = DynamicLibraryFactory::GetLibrary(...); + _cachedFunctions[m_UsedSolver] = (GetSolverInterfaceFnType) + _cachedLibraries[m_UsedSolver]->GetFunctionAddress("GetSolverInterface"); + } + + // Reuse cached function pointer + return _cachedFunctions[m_UsedSolver](this, m_ODE_NumUnknowns, _sensitivityParameters.size()); + } +}; +``` + +**Priority**: **MEDIUM** (easy win for repeated simulations) + +--- + +### 2.2 Sensitivity Parameter Overhead + +**File**: `src/OSPSuite.SimModelNative/src/Simulation.cpp:92-97` + +**Issue**: +```cpp +//cache sensitivity parameters +for (int i = 0; i < _parameters.size(); i++) +{ + if (_parameters[i]->CalculateSensitivity()) + _sensitivityParameters.Add(_parameters[i]); +} +``` + +**Related Issue**: `src/OSPSuite.SimModelNative/src/QuantityWithParameterSensitivity.cpp:29` +```cpp +//TODO this is WRONG if InitialFormula depends on parameter! +``` + +**Problem**: +- Sensitivity calculation extends ODE system from n to n*(1+m) equations where m = number of sensitivity parameters +- Dramatically increases memory and computation +- Current implementation has known bugs (TODO comment) +- No selective sensitivity computation + +**Impact**: **HIGH** - When enabled, can increase solve time by orders of magnitude + +**Recommendation**: +```cpp +// Option 1: Adjoint sensitivity method (compute post-hoc) +// Solves adjoint equations backward in time after forward solve +// O(n) cost per sensitivity parameter instead of O(n*m) +void ComputeAdjointSensitivities(const std::vector& trajectory) { + // Backward integration of adjoint equations +} + +// Option 2: Selective sensitivity +// Only compute sensitivities for requested parameters and time points +void SetSensitivityRequests(const std::vector& requests) { + for (auto& req : requests) { + if (req.parameter.isUsedInCriticalPath()) { + // Only extend ODE system for critical parameters + } + } +} + +// Option 3: Finite difference approximation for validation only +double ComputeSensitivity_FD(Parameter* param, double delta = 1e-6) { + double baseline = RunSimulation(); + param->SetValue(param->Value() + delta); + double perturbed = RunSimulation(); + return (perturbed - baseline) / delta; +} +``` + +**Priority**: **HIGH** (critical for sensitivity analysis users, but niche feature) + +--- + +### 2.3 Jacobian Sparsity Not Fully Exploited + +**File**: `src/OSPSuite.SimModelNative/src/BandwidthReduction.cpp` (implementation) + +**Issue**: +- Bandwidth reduction is implemented but may not be optimal +- No compressed sparse row (CSR) or compressed sparse column (CSC) format +- Full band matrix allocated even if many zeros +- Sparsity pattern known at compile time but recomputed + +**Impact**: **MEDIUM-HIGH** - For large sparse systems, could save significant memory and computation + +**Recommendation**: +```cpp +// Store sparsity pattern during finalization +struct SparsityPattern { + std::vector rowPtr; // CSR format + std::vector colIndices; + std::vector values; + int nnz; // number of non-zeros +}; + +void Simulation::Finalize() { + // Analyze formula dependencies + SparsityPattern pattern = AnalyzeJacobianSparsity(); + + // Pass to CVODES as sparse Jacobian + // Many SUNDIALS solvers support sparse matrices directly + CVodeSetJacFn(cvode_mem, SparseJacobianCallback); +} + +// Only compute non-zero elements +void SparseJacobianCallback(...) { + for (int k = 0; k < pattern.nnz; k++) { + int i = pattern.rowPtr[k]; + int j = pattern.colIndices[k]; + pattern.values[k] = formula[i]->DE_Jacobian(...); + } +} +``` + +**Priority**: **MEDIUM** (depends on system sparsity; high impact for large sparse systems) + +--- + +## 3. Table Formula Interpolation + +### 3.1 Table Lookup Without Caching Last Position + +**File**: `src/OSPSuite.SimModelNative/src/TableFormula.cpp:257` (TODO comment) + +**Issue**: +```cpp +// TODO: fix +``` + +**Problem**: +- Binary search or linear search performed on every interpolation call +- No caching of last lookup index for temporal locality +- ODE solvers typically query times in increasing order +- Repeated searches in same interval + +**Impact**: **HIGH** - Table formulas are common and evaluated in hot path + +**Recommendation**: +```cpp +class TableFormula { +private: + mutable int _lastIntervalIndex = 0; // Cache last lookup position + +public: + double DE_Compute(const double* y, const double time, ...) { + int n = _numberOfValuePoints; + + // Check cached interval first (temporal locality) + if (_lastIntervalIndex < n-1 && + time >= _X_values[_lastIntervalIndex] && + time < _X_values[_lastIntervalIndex + 1]) { + return Interpolate(_lastIntervalIndex, time); + } + + // Check adjacent intervals (common for small time steps) + if (_lastIntervalIndex > 0 && time < _X_values[_lastIntervalIndex]) { + if (time >= _X_values[_lastIntervalIndex - 1]) { + _lastIntervalIndex--; + return Interpolate(_lastIntervalIndex, time); + } + } + + // Fall back to binary search + _lastIntervalIndex = BinarySearch(time); + return Interpolate(_lastIntervalIndex, time); + } + + int BinarySearch(double time) const { + int left = 0, right = _numberOfValuePoints - 1; + while (left < right - 1) { + int mid = (left + right) / 2; + if (time < _X_values[mid]) + right = mid; + else + left = mid; + } + return left; + } +}; +``` + +**Priority**: **HIGH** (easy optimization, large impact for table-heavy models) + +--- + +### 3.2 Derived Values Computation + +**File**: `src/OSPSuite.SimModelNative/src/TableFormula.cpp:11` + +**Issue**: +```cpp +//for the moment, ALWAYS use derived values (no XML attribute for that) +_useDerivedValues = true; +``` + +**Problem**: +- Cubic spline derivatives computed during loading +- Could be computed lazily or on-demand +- Memory allocated for derivative array even if not all intervals accessed + +**Impact**: **LOW-MEDIUM** - Initialization overhead, not hot path + +**Recommendation**: +```cpp +// Option 1: Lazy computation +double* GetDerivedValue(int index) { + if (!_derivedValues) { + ComputeDerivedValues(); + } + return &_derivedValues[index]; +} + +// Option 2: Just-in-time per-interval computation +double InterpolateWithDerivative(int interval, double x) { + // Compute derivative only for this interval + double h = _X_values[interval+1] - _X_values[interval]; + double slope = (_Y_values[interval+1] - _Y_values[interval]) / h; + // ... cubic spline interpolation +} +``` + +**Priority**: **LOW** (minor initialization optimization) + +--- + +### 3.3 TableFormulaWithXArgument Jacobian Not Implemented + +**File**: `src/OSPSuite.SimModelNative/src/TableFormulaWithXArgument.cpp:123` + +**Issue**: +```cpp +return _tableObject->DE_Jacobian(iEquation); // TODO: fix +``` + +**Problem**: +- Jacobian computation delegated incorrectly +- When X argument depends on ODE variables, derivative w.r.t. those variables needed +- Chain rule not applied: ∂f(g(y))/∂y = f'(g(y)) * g'(y) +- Can cause solver failures or inaccurate results + +**Impact**: **MEDIUM-HIGH** - If used with variable X arguments, numerical accuracy suffers + +**Recommendation**: +```cpp +void TableFormulaWithXArgument::DE_Jacobian(double** jacobian, const double* y, + const double time, const int iEquation, + const double preFactor) { + // Get X argument value + double xValue = _xArgument->DE_Compute(y, time, ...); + + // Compute derivative of table w.r.t. its argument + double dTable_dX = ComputeTableDerivative(xValue); + + // Chain rule: multiply by derivative of X argument + _xArgument->DE_Jacobian(jacobian, y, time, iEquation, preFactor * dTable_dX); +} + +double ComputeTableDerivative(double x) { + if (_useDerivedValues) { + // Use precomputed derivatives + int interval = FindInterval(x); + return _derivedValues[interval]; // or interpolate if needed + } else { + // Finite difference + double h = 1e-8; + return (Interpolate(x + h) - Interpolate(x - h)) / (2 * h); + } +} +``` + +**Priority**: **MEDIUM** (correctness issue for specific use case) + +--- + +## 4. Bandwidth Reduction & Sparsity + +### 4.1 Bandwidth Reduction Not User-Controllable + +**File**: `src/OSPSuite.SimModelNative/include/SimModel/CppODEExporter.h:17` + +**Issue**: +```cpp +const bool REDUCESPECIES = false; // TODO define as user option +``` + +**Problem**: +- Bandwidth reduction algorithm exists but not exposed to users +- No heuristic to automatically enable for large sparse systems +- Manual variable ordering could be better for domain-specific problems +- No benchmarking data to guide when to enable + +**Impact**: **MEDIUM** - Can significantly improve performance for large systems but needs tuning + +**Recommendation**: +```cpp +// Add to SimulationOptions +struct SimulationOptions { + bool UseBandLinearSolver = false; + BandwidthReductionStrategy Strategy = BandwidthReductionStrategy::Auto; +}; + +enum class BandwidthReductionStrategy { + None, // Dense solver + Auto, // Heuristic based on system size and sparsity + CuthillMcKee, // Standard reordering algorithm + Metis, // Graph partitioning (external library) + Manual // User-provided ordering +}; + +void DESolver::SetupSolver() { + if (options.Strategy == BandwidthReductionStrategy::Auto) { + double sparsity = EstimateSparsity(); + int n = m_ODE_NumUnknowns; + + // Heuristic: use band solver if large and sparse + if (n > 100 && sparsity < 0.1) { + SetUseBandLinearSolver(true); + } + } +} +``` + +**Priority**: **MEDIUM** (usability improvement with potential performance benefit) + +--- + +### 4.2 Dependency Matrix Construction Overhead + +**File**: `src/OSPSuite.SimModelNative/src/BandwidthReduction.cpp` (implementation) + +**Issue**: +- Dependency matrix constructed from scratch during finalization +- Uses formula traversal to find dependencies +- Could be incrementally built during XML loading + +**Impact**: **LOW-MEDIUM** - One-time cost during initialization + +**Recommendation**: +```cpp +// Build dependency graph incrementally during model construction +class Simulation { + std::vector> _dependencies; // dependencies[i] = variables that equation i depends on + + void AddFormula(int equation, Formula* formula) { + std::set usedVars; + formula->AppendUsedVariables(usedVars, ...); + _dependencies[equation] = usedVars; + } +}; + +// Reuse during bandwidth reduction +void BandwidthReductionTask::ReorderDEVariables() { + // Use pre-computed dependencies instead of recomputing + auto matrix = _sim->GetDependencyMatrix(); + // ... apply reordering algorithm +} +``` + +**Priority**: **LOW** (minor optimization of initialization path) + +--- + +## 5. XML Loading & Parsing + +### 5.1 XML Validation Overhead + +**File**: `src/OSPSuite.SimModel/Simulation.cs` (options for ValidateWithXMLSchema) + +**Issue**: +- XML schema validation enabled by default in some cases +- Validation can be expensive for large models +- Schema loaded and parsed repeatedly + +**Impact**: **MEDIUM** - One-time cost but can be substantial for large XML files + +**Recommendation**: +```cpp +// Cache compiled schema +class XMLDocument { +private: + static std::map _schemaCache; + +public: + void ValidateAgainstSchema(const std::string& schemaPath) { + if (_schemaCache.find(schemaPath) == _schemaCache.end()) { + _schemaCache[schemaPath] = CompileSchema(schemaPath); + } + ApplyValidation(_schemaCache[schemaPath]); + } +}; + +// Allow skipping validation for trusted inputs +SimulationOptions options; +options.ValidateXML = false; // Skip for production after validation during development +``` + +**Priority**: **MEDIUM** (initialization optimization) + +--- + +### 5.2 String Allocations During XML Parsing + +**File**: `src/OSPSuite.SimModelNative/src/*.cpp` (LoadFromXMLNode methods) + +**Issue**: +- Many std::string allocations during XML traversal +- Node names, attributes, values all create temporary strings +- String comparisons for tag names + +**Impact**: **MEDIUM** - Cumulative cost across large models + +**Recommendation**: +```cpp +// Option 1: String interning/pooling +class StringPool { + std::unordered_set _pool; +public: + const std::string& Intern(const std::string& str) { + auto [it, inserted] = _pool.insert(str); + return *it; // Return reference to pooled string + } +}; + +// Option 2: Compile-time string hashes for tag names +constexpr uint32_t Hash(const char* str) { + // Constexpr FNV-1a hash + uint32_t hash = 2166136261u; + while (*str) { + hash ^= *str++; + hash *= 16777619u; + } + return hash; +} + +switch (Hash(nodeName)) { + case Hash("Parameter"): /* ... */ break; + case Hash("Species"): /* ... */ break; + // Compiler generates perfect hash-based dispatch +} + +// Option 3: Use std::string_view for non-owning references +void LoadFromXMLNode(const XMLNode& node) { + std::string_view name = node.GetNameView(); // No allocation + // ... compare without copying +} +``` + +**Priority**: **LOW-MEDIUM** (cumulative small improvements) + +--- + +### 5.3 Formula Tree Construction + +**File**: `src/OSPSuite.SimModelNative/src/FormulaFactory.cpp` + +**Issue**: +- Recursive formula tree built during XML parsing +- Many small allocations for formula nodes +- No memory pooling + +**Impact**: **MEDIUM** - Memory fragmentation and allocation overhead + +**Recommendation**: +```cpp +// Custom allocator for formula objects +class FormulaPool { + std::vector _blocks; + size_t _blockSize = 64 * 1024; // 64KB blocks + char* _current = nullptr; + size_t _offset = 0; + +public: + template + T* Allocate(Args&&... args) { + if (_offset + sizeof(T) > _blockSize) { + _blocks.push_back(new char[_blockSize]); + _current = _blocks.back(); + _offset = 0; + } + T* ptr = new (_current + _offset) T(std::forward(args)...); + _offset += sizeof(T); + return ptr; + } + + ~FormulaPool() { + // Free all blocks at once + for (char* block : _blocks) + delete[] block; + } +}; + +// Use during loading +FormulaPool pool; +Formula* formula = pool.Allocate(...); +``` + +**Priority**: **LOW-MEDIUM** (improves memory characteristics) + +--- + +## 6. Memory Management & Allocation + +### 6.1 Species/Parameter Array Allocations + +**File**: `src/OSPSuite.SimModelNative/src/Simulation.cpp` (various locations) + +**Issue**: +- Double* arrays allocated/deallocated during simulation +- No memory pooling or reuse across multiple runs +- Temporary arrays in Jacobian computation + +**Impact**: **MEDIUM** - Repeated allocations cause heap fragmentation + +**Recommendation**: +```cpp +class Simulation { +private: + // Pre-allocated working memory + std::vector _workingY; + std::vector _workingJacobianFlat; + std::vector> _workingJacobian2D; + +public: + void AllocateWorkingMemory(int odeSize) { + _workingY.resize(odeSize); + _workingJacobianFlat.resize(odeSize * odeSize); + _workingJacobian2D.resize(odeSize); + for (auto& row : _workingJacobian2D) + row.resize(odeSize); + } + + void RunSimulation() { + // Reuse pre-allocated memory + double* y = _workingY.data(); + // ... + } +}; +``` + +**Priority**: **MEDIUM** (particularly important for repeated simulations) + +--- + +### 6.2 Vector/List Growth Without Capacity Hints + +**File**: Multiple locations using std::vector, TObjectList + +**Issue**: +```cpp +vector summandsVec; +for (...) { + summandsVec.push_back(formula); // May reallocate multiple times +} +``` + +**Problem**: +- Vectors grow exponentially but with initial reallocation overhead +- Number of elements often known or estimable + +**Impact**: **LOW-MEDIUM** - Cumulative effect across many collections + +**Recommendation**: +```cpp +// Reserve capacity when size is known or estimable +vector summandsVec; +summandsVec.reserve(pNode.GetChildCount()); // If available +for (...) { + summandsVec.push_back(formula); // No reallocation +} + +// For TObjectList, add reserve() method +template +class TObjectList { +public: + void reserve(size_t capacity) { + _items.reserve(capacity); + } +}; +``` + +**Priority**: **LOW** (many small improvements) + +--- + +### 6.3 Formula Clone Operations + +**File**: `src/OSPSuite.SimModelNative/src/*Formula.cpp:clone()` + +**Issue**: +- Deep copy of formula trees for symbolic Jacobian computation +- Every formula cloned even if subtrees are immutable + +**Impact**: **LOW-MEDIUM** - Happens during CppODEExporter or symbolic differentiation + +**Recommendation**: +```cpp +// Option 1: Structural sharing for immutable subtrees +class Formula { + bool _isImmutable = false; +public: + Formula* clone() { + if (_isImmutable) + return this; // Return self, increment refcount if using smart pointers + return DeepClone(); + } +}; + +// Option 2: Copy-on-write semantics +class Formula { + std::shared_ptr _data; +public: + void Mutate() { + if (_data.use_count() > 1) + _data = std::make_shared(*_data); // Clone on first write + } +}; +``` + +**Priority**: **LOW** (optimization for symbolic operations, not hot path) + +--- + +## 7. P/Invoke Marshaling Overhead + +### 7.1 String Marshaling in Get/Set Operations + +**File**: `src/OSPSuite.SimModel/Simulation.cs:80, 111, etc.` + +**Issue**: +```csharp +[DllImport(...)] +public static extern void GetQuantityProperties(IntPtr quantity, out string containerPath, + out string name, out string entityId); +``` + +**Problem**: +- Strings marshaled as UTF-16 → UTF-8 → UTF-16 for every call +- Memory allocated for marshaled strings +- Called frequently during parameter/species queries +- No string pooling across calls + +**Impact**: **MEDIUM** - Cumulative effect for simulations with many parameter queries + +**Recommendation**: +```csharp +// Option 1: Marshal strings once and cache +class Simulation { + private Dictionary _propertiesCache = new(); + + public EntityProperties GetProperties(IntPtr quantity) { + if (_propertiesCache.TryGetValue(quantity, out var cached)) + return cached; + + SimulationImports.GetQuantityProperties(quantity, out var path, out var name, out var id); + var props = new EntityProperties { Path = path, Name = name, Id = id }; + _propertiesCache[quantity] = props; + return props; + } +} + +// Option 2: Use IntPtr and manual marshaling for repeated calls +[DllImport(...)] +public static extern IntPtr GetQuantityProperties_Fast(IntPtr quantity); +// Returns JSON or binary blob, parse once on managed side + +// Option 3: Batch operations to reduce P/Invoke count +[DllImport(...)] +public static extern void GetMultipleQuantityProperties( + IntPtr[] quantities, int count, + [Out] StringBuilder[] paths, // Pre-allocated + [Out] StringBuilder[] names, + [Out] StringBuilder[] ids); +``` + +**Priority**: **MEDIUM** (depends on query frequency) + +--- + +### 7.2 Array Marshaling in FillTimeValues/FillValues + +**File**: `src/OSPSuite.SimModel/Simulation.cs:51` + +**Issue**: +```csharp +[DllImport(...)] +public static extern void FillTimeValues(IntPtr simulation, [In, Out] double[] timepoints, + out bool success, out string errorMessage); +``` + +**Problem**: +- Array copied from native to managed heap +- For large time series, substantial memory and CPU overhead +- Called for every time series extraction + +**Impact**: **MEDIUM-HIGH** - Large arrays = more overhead + +**Recommendation**: +```csharp +// Option 1: Pin array and pass pointer (avoid copy) +unsafe { + double[] timepoints = new double[count]; + fixed (double* ptr = timepoints) { + FillTimeValues_Unsafe(simulation, ptr, count, out success, out error); + } +} + +[DllImport(...)] +unsafe static extern void FillTimeValues_Unsafe(IntPtr simulation, double* timepoints, + int count, out bool success, out string errorMessage); + +// Option 2: Use Span (.NET Core/.NET 5+) +[DllImport(...)] +public static extern void FillTimeValues(IntPtr simulation, Span timepoints, + out bool success, out string errorMessage); + +// Option 3: Memory-mapped approach for very large datasets +// Native side writes to shared memory, managed side maps same memory +``` + +**Priority**: **MEDIUM** (significant for large result sets) + +--- + +### 7.3 Repeated Success/Error String Marshaling + +**File**: Throughout `Simulation.cs` + +**Issue**: +```csharp +out bool success, out string errorMessage +``` + +**Problem**: +- Every P/Invoke call returns success bool AND error string +- Error string marshaled even when success = true (most calls) +- String allocation for empty/null error messages + +**Impact**: **LOW-MEDIUM** - Many small marshaling costs + +**Recommendation**: +```csharp +// Option 1: Use exception-based error handling +[DllImport(...)] +public static extern void RunSimulation(IntPtr simulation); +// Throws on error with message + +// Option 2: Separate error retrieval +[DllImport(...)] +public static extern bool RunSimulation(IntPtr simulation); + +[DllImport(...)] +public static extern string GetLastError(IntPtr simulation); // Only call if failed + +// Option 3: Error codes instead of strings +[DllImport(...)] +public static extern ErrorCode RunSimulation(IntPtr simulation); + +string GetErrorMessage(ErrorCode code) { + // Look up message on managed side +} +``` + +**Priority**: **LOW** (many small improvements) + +--- + +## 8. Collection & Data Structure Operations + +### 8.1 TObjectList Linear Searches + +**File**: `src/OSPSuite.SimModelNative/include/SimModel/ObjectBase.h` (TObjectList) + +**Issue**: +- TObjectList appears to be vector-based +- Linear search for Contains(), GetObjectById() +- No indexing by ID or name + +**Impact**: **LOW-MEDIUM** - Depends on collection sizes + +**Recommendation**: +```cpp +template +class TObjectList { +private: + std::vector _items; + std::unordered_map _byId; + std::unordered_map _byName; + +public: + void Add(T* item) { + _items.push_back(item); + _byId[item->GetId()] = item; + _byName[item->GetName()] = item; + } + + T* GetObjectById(int id) { + auto it = _byId.find(id); + return (it != _byId.end()) ? it->second : nullptr; + } + + bool Contains(T* item) { + return _byId.find(item->GetId()) != _byId.end(); + } +}; +``` + +**Priority**: **LOW-MEDIUM** (depends on collection sizes and access patterns) + +--- + +### 8.2 Iteration Patterns + +**File**: Various `for` loops throughout codebase + +**Issue**: +```cpp +for (int i = 0; i < _parameters.size(); i++) + _parameters[i]->... +``` + +**Problem**: +- Index-based iteration may prevent compiler optimizations +- Bounds checking on every access (in debug builds) + +**Impact**: **LOW** - Modern compilers optimize well, but worth considering + +**Recommendation**: +```cpp +// Option 1: Range-based for (C++11) +for (auto* param : _parameters) + param->... + +// Option 2: Iterator-based (better optimization potential) +for (auto it = _parameters.begin(); it != _parameters.end(); ++it) + (*it)->... + +// Option 3: Separate bounds check from access +const size_t n = _parameters.size(); +for (size_t i = 0; i < n; i++) + _parameters[i]->... // Compiler can prove bounds +``` + +**Priority**: **LOW** (micro-optimization, marginal benefit) + +--- + +## 9. Priority Matrix + +### Critical Priority (Implement First) + +| Issue | File | Impact | Effort | ROI | +|-------|------|--------|--------|-----| +| Virtual function call overhead | Formula.h:47 | **Very High** | High | **Excellent** | +| Formula simplification | Formula.h:14 | High | Medium | **Excellent** | +| Table lookup caching | TableFormula.cpp:257 | High | Low | **Excellent** | +| Sensitivity parameter overhead | Simulation.cpp:92 | High | High | **Very Good** | + +### High Priority (Implement Next) + +| Issue | File | Impact | Effort | ROI | +|-------|------|--------|--------|-----| +| Formula type-specific optimizations | Multiple | Medium | Medium | **Very Good** | +| MATRIX_ELEM macro overhead | Formula.h:20 | Medium | Low | **Very Good** | +| Dynamic library caching | DESolver.cpp:16 | Medium | Low | **Excellent** | +| TableFormulaWithXArgument Jacobian | TableFormulaWithXArgument.cpp:123 | Medium-High | Medium | **Good** | +| Jacobian sparsity exploitation | BandwidthReduction.cpp | Medium-High | High | **Good** | +| Array marshaling optimization | Simulation.cs:51 | Medium-High | Medium | **Good** | +| String marshaling overhead | Simulation.cs | Medium | Low | **Good** | + +### Medium Priority (Consider) + +| Issue | File | Impact | Effort | ROI | +|-------|------|--------|--------|-----| +| XML validation caching | XMLDocument.cpp | Medium | Low | **Good** | +| String allocations in XML parsing | LoadFromXMLNode | Medium | Medium | **Fair** | +| Memory pooling | Various | Medium | Medium | **Good** | +| Bandwidth reduction options | CppODEExporter.h:17 | Medium | Medium | **Fair** | +| Vector capacity hints | Multiple | Low-Medium | Low | **Fair** | +| Success/error marshaling | Simulation.cs | Low-Medium | Low | **Fair** | + +### Low Priority (Nice to Have) + +| Issue | File | Impact | Effort | ROI | +|-------|------|--------|--------|-----| +| Derived values lazy computation | TableFormula.cpp:11 | Low-Medium | Low | Fair | +| Dependency matrix incremental build | BandwidthReduction.cpp | Low-Medium | Medium | Fair | +| Formula clone optimization | *Formula.cpp | Low-Medium | Medium | Fair | +| TObjectList indexing | ObjectBase.h | Low-Medium | Low | Fair | +| Iteration patterns | Various | Low | Low | Fair | + +--- + +## 10. Implementation Recommendations + +### Phase 1: Hot Path Optimizations (2-4 weeks) + +**Priority**: Formula evaluation and table lookups - these are in the innermost ODE loop + +1. **Implement table lookup caching** (TableFormula.cpp) + - Add _lastIntervalIndex member + - Implement interval caching logic + - **Expected improvement**: 20-40% faster for table-heavy models + +2. **Enable formula simplification** (Formula.h:14) + - Implement Simplify() for all formula types + - Call during Finalize() + - Fold constants, eliminate dead code + - **Expected improvement**: 10-20% faster formula evaluation + +3. **Cache dynamic solver library** (DESolver.cpp:16) + - Static cache for loaded libraries + - Reuse across simulations + - **Expected improvement**: 50-90% faster initialization for repeated runs + +### Phase 2: Algorithmic Improvements (4-6 weeks) + +1. **Optimize Jacobian computation** + - Fix MATRIX_ELEM to use flat array + - Implement sparse Jacobian if sparsity > 80% + - Fix TableFormulaWithXArgument Jacobian + - **Expected improvement**: 15-30% faster large system solving + +2. **Refactor sensitivity parameters** (if used) + - Implement adjoint sensitivity method + - Allow selective sensitivity computation + - Fix known bugs (QuantityWithParameterSensitivity.cpp:29) + - **Expected improvement**: 2-10x faster when sensitivities enabled + +3. **Formula type-specific optimizations** + - Separate constants from variables in Sum/Product + - Short-circuit zero multiplication + - **Expected improvement**: 5-15% cumulative + +### Phase 3: P/Invoke & Memory Optimization (3-5 weeks) + +1. **Optimize marshaling** + - Pin arrays instead of copying + - Cache string properties + - Batch operations where possible + - **Expected improvement**: 10-25% faster for C# wrapper users + +2. **Implement memory pooling** + - Pre-allocate working memory in Simulation + - Pool formula objects during XML loading + - **Expected improvement**: 20-40% less memory allocations, better cache locality + +3. **XML loading optimizations** + - Cache schema validation + - String interning for tag names + - **Expected improvement**: 15-30% faster model loading + +### Phase 4: Virtual Function Elimination (6-10 weeks) **ADVANCED** + +**WARNING**: This is a major architectural change + +1. **Profile to confirm virtual call overhead** (always measure first!) + +2. **Experiment with alternatives**: + - Template-based static polymorphism for common formulas + - Bytecode interpreter for formula evaluation + - JIT compilation (most complex, highest payoff) + +3. **Gradual migration**: + - Keep existing virtual interface + - Add optimized path for common cases + - Benchmark to confirm improvement + +**Expected improvement**: 30-70% faster formula evaluation (high variance, depends on approach) + +--- + +### Testing Strategy + +1. **Performance Benchmarks** + - Create benchmark suite with representative models: + - Small model (10-50 ODEs) + - Medium model (100-500 ODEs) + - Large model (1000+ ODEs) + - Table-heavy model + - Sparse model + - Measure: + - Total simulation time + - Initialization time + - Memory allocations (use profiler) + - Cache misses (use perf on Linux) + +2. **Regression Tests** + - Ensure all existing tests pass + - Add numerical accuracy tests (compare results before/after) + - Verify Jacobian correctness with finite difference check + +3. **Profiling Tools** + - Visual Studio Profiler (Windows) + - perf/valgrind (Linux) + - Instruments (macOS) + - BenchmarkDotNet for C# benchmarks + +### Monitoring & Validation + +1. **Performance Metrics** + - ODE solve time (excluding I/O) + - Formulas evaluated per second + - Jacobian computation time + - Memory usage (peak and average) + - GC pressure (.NET side) + +2. **Success Criteria** + - 30-60% reduction in simulation time + - 20-40% reduction in memory allocations + - No regressions in numerical accuracy (relative error < 1e-10) + - All tests pass + +--- + +## 11. Conclusion + +The OSPSuite.SimModel codebase has significant optimization opportunities, particularly in: + +1. **Formula evaluation** - Virtual function overhead and lack of simplification +2. **Table interpolation** - No caching of lookup positions +3. **Jacobian computation** - Dense matrix operations for sparse systems +4. **Memory management** - Repeated allocations without pooling +5. **P/Invoke marshaling** - String and array copying overhead +6. **Sensitivity analysis** - Inefficient forward sensitivity approach + +**Recommended Approach**: +- **Start with Phase 1** (hot path optimizations) - low risk, high reward +- **Measure everything** - use profiler to confirm bottlenecks +- **Optimize incrementally** - small changes, test, commit +- **Phase 4 is optional** - only if profiling confirms virtual call overhead is significant + +**Estimated Overall Impact**: +- **30-60% improvement in simulation runtime** (Phase 1-2) +- **20-40% reduction in memory allocations** (Phase 2-3) +- **15-25% faster initialization** (Phase 1, 3) +- **Potential 2-10x speedup for sensitivity analysis** (Phase 2) + +All recommendations maintain API compatibility where possible and align with existing coding standards. Prioritize correctness over performance - validate numerical accuracy after every change. + +--- + +## 12. Quick Wins (Can Implement Immediately) + +These require < 1 day effort each: + +1. ✅ **Table lookup caching** - Add _lastIntervalIndex to TableFormula +2. ✅ **Dynamic library caching** - Static map in DESolver +3. ✅ **Vector reserve() calls** - Add capacity hints where size known +4. ✅ **String marshaling caching** - Cache entity properties in C# wrapper +5. ✅ **MATRIX_ELEM inline function** - Replace macro with inline function + +**Total expected impact from quick wins**: 15-30% faster execution with minimal risk.