Supporting repository for:
"Agentic AI-Enhanced Digital Twins for Smart City Civil Infrastructure:
A Secure, Autonomous and Auditable Management Framework."
— Manuscript under review in PLOS ONE.
Note: This repository contains a Monte Carlo simulation framework designed to characterize the theoretical performance bounds of an Agentic Digital Twin architecture. It is a conceptual prototype, and metrics generated are synthetic representations based on parameterized distributions.
- Overview
- Configurations
- Degradation Model
- Parameter Calibration
- Detection Mechanisms
- Mitigation Success Model
- Pipeline Latency Model
- Dataset Specifications
- Statistical Analysis
- Reproducibility Protocol
- References
- License
This repository provides a fully reproducible simulation framework and synthetic dataset for evaluating monitoring architectures in smart city civil infrastructure systems.
Unlike parameter-table approaches, performance in this framework emerges from the physics-grounded simulation mechanics — detection latency and mitigation success are computed outcomes, not pre-assigned distributions.
| ID | Configuration | Detection Mechanism | Orchestration |
|---|---|---|---|
| `rules` | Baseline (Static) | Static threshold crossing | Manual/Rules |
| `dt` | DT Baseline | Kalman Filter (Predictive) | Automated Alert |
| `dt_single_agent` | Ablation 1 | Kalman Filter (Predictive) | Single-Agent Dispatch |
| `dt_multi_no_chain` | Ablation 2 | Multi-Agent Adaptive Loop | No Blockchain Audit |
| `agentic_full` | Proposed Model | Multi-Agent Adaptive Loop | Blockchain-Anchored Audit |
Each incident is generated from a discrete-time stochastic degradation process:
| Symbol | Definition |
|---|---|
| `D(t) ∈ [0, 1]` | Structural degradation index at timestep t |
| `α ~ N(α_mean, α_std)` | Per-run exponential drift coefficient representing fatigue accumulation (Paris & Erdogan, 1963; AASHTO LRFD, 2020) |
| `S(t)` | Poisson-gated shock: S(t) = max(N(μ_s, σ_s), 0) with probability λ_shock per step, else S(t) = 0 |
The simulation runs at Δt = 0.1 hr resolution. Sensor observations are corrupted by Gaussian noise ε ~ N(0, 0.02²), consistent with commercial strain gauge SHM systems (e.g., HBM QuantumX noise floor specifications).
Degradation rates are calibrated to produce timescales of 20–200 hours, consistent with observed fatigue crack growth rates in steel bridge members:
| Complexity | α_mean | λ_shock | μ_shock | Reference |
|---|---|---|---|---|
| 🟢 Low | 0.0030 | 0.010 | 0.040 | Mori & Ellingwood (1994) |
| 🟡 Medium | 0.0060 | 0.030 | 0.080 | Frangopol et al. (2004) |
| 🔴 High | 0.0120 | 0.060 | 0.120 | Strauss et al. (2008) |
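The degradation process and calibration above can be sketched as a short simulation loop. This is a minimal illustration under stated assumptions, not the code in `scripts/simulation.py`: the update rule `D ← min(D · exp(α · Δt) + S, 1)`, the treatment of α as a per-hour rate, the shock spread `sigma_shock`, the initial index `d0`, and the function name are all hypothetical. Only Δt = 0.1 hr, the sensor noise of 0.02, and the shock gating come from the text.

```python
import math
import random

DT_HOURS = 0.1        # simulation resolution stated in the text
SENSOR_SIGMA = 0.02   # sensor noise standard deviation stated in the text


def simulate_incident(alpha, lam_shock, mu_shock, sigma_shock=0.02,
                      d0=0.10, n_steps=2000, rng=None):
    """Return (true_path, observed_path) for one synthetic incident.

    ASSUMPTIONS (not from the source): the multiplicative update rule,
    sigma_shock, d0 and n_steps are illustrative choices.
    """
    rng = rng or random.Random()
    d = d0
    true_path, observed = [], []
    for _ in range(n_steps):
        # Shock fires with probability lam_shock per step, floored at zero.
        shock = 0.0
        if rng.random() < lam_shock:
            shock = max(rng.gauss(mu_shock, sigma_shock), 0.0)
        # Assumed exponential drift plus shock, clipped to [0, 1].
        d = min(max(d * math.exp(alpha * DT_HOURS) + shock, 0.0), 1.0)
        true_path.append(d)
        # Sensor reading = true state + Gaussian noise.
        observed.append(d + rng.gauss(0.0, SENSOR_SIGMA))
    return true_path, observed
```

Calling `simulate_incident(0.0060, 0.030, 0.080, rng=random.Random(42))` uses the Medium row of the calibration table; the resulting trajectory depends on the assumed update rule and should not be compared against the archived dataset.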
The `rules` baseline generates an alarm when the noisy sensor reading first exceeds a static threshold τ_rules = 0.70. This threshold approximates recommended intervention levels in standard infrastructure condition indices (e.g., FHWA bridge condition rating).
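The static rule amounts to a first-crossing search over the noisy readings. The sketch below assumes the readings arrive as a plain list sampled every 0.1 hr; the function name is hypothetical.

```python
TAU_RULES = 0.70   # static threshold from the text
DT_HOURS = 0.1     # simulation resolution from the text


def first_threshold_crossing(observations, threshold=TAU_RULES):
    """Return the index of the first reading strictly above the threshold,
    or None if the threshold is never exceeded."""
    for step, value in enumerate(observations):
        if value > threshold:
            return step
    return None


# Hypothetical readings: the alarm fires at step 2, i.e. 0.2 hr into the run.
step = first_threshold_crossing([0.10, 0.69, 0.71, 0.90])
alarm_time_hours = None if step is None else step * DT_HOURS
```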
The Digital Twin configurations (`dt`, `dt_single_agent`) implement a scalar Kalman filter (Kalman, 1960) that tracks the structural state:
Prediction:
Update:
An alert fires when the projected state 15 steps ahead (1.5 hours) exceeds
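For orientation, a textbook scalar Kalman filter with a look-ahead alert can be sketched as follows. This is not the repository's filter: the state-transition factor `a`, the process noise `q`, the measurement noise `r`, and the class and function names are assumptions, and the alert level is left as a required argument because its value is not stated here. Only the 15-step horizon comes from the text.

```python
class ScalarKalman:
    """Textbook scalar Kalman filter with linear dynamics x_{k+1} = a * x_k.

    ASSUMPTION: a, q and r are illustrative; the source does not list them.
    """

    def __init__(self, x0, p0, a, q, r):
        self.x, self.p = x0, p0
        self.a, self.q, self.r = a, q, r

    def step(self, z):
        # Prediction: propagate the state estimate and its variance.
        x_pred = self.a * self.x
        p_pred = self.a ** 2 * self.p + self.q
        # Update: blend prediction and measurement using the Kalman gain.
        gain = p_pred / (p_pred + self.r)
        self.x = x_pred + gain * (z - x_pred)
        self.p = (1.0 - gain) * p_pred
        return self.x, self.p

    def project(self, steps):
        # Mean of the state `steps` ahead under the assumed dynamics.
        return self.x * self.a ** steps


def first_predictive_alert(observations, kf, alert_level, horizon=15):
    """Return the first step at which the projected state exceeds alert_level."""
    for step, z in enumerate(observations):
        kf.step(z)
        if kf.project(horizon) > alert_level:
            return step
    return None
```

With `a > 1` the projection grows with the horizon, so the alert can fire before the raw reading crosses the same level; with `a = 1` the projection equals the current estimate and the filter only smooths the noise.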
The agentic layer transitions from deterministic thresholds to Uncertainty-Aware Risk Management. It utilizes the Digital Twin's covariance matrix
The agent maintains a shock-context memory and triggers a mitigation plan only when the PoF exceeds 15%. This approach minimizes "alarm fatigue" while ensuring high-risk incidents are addressed with probabilistic certainty, consistent with Bayesian decision theory in structural engineering (Mori & Ellingwood, 1994).
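One way to turn a state estimate and its variance into a Probability of Failure is a Gaussian tail probability. The sketch below assumes the PoF is the probability that the state exceeds the critical level D = 0.85 used elsewhere in this README, and that the state estimate is Gaussian; both are assumptions, the function names are hypothetical, and the shock-context memory is not modeled. Only the 15% trigger comes from the text.

```python
from statistics import NormalDist

POF_TRIGGER = 0.15   # trigger level from the text
D_CRITICAL = 0.85    # ASSUMPTION: critical level reused from the mitigation model


def probability_of_failure(x_hat, variance, d_critical=D_CRITICAL):
    """P(D >= d_critical) for a Gaussian estimate N(x_hat, variance)."""
    if variance <= 0.0:
        # Degenerate case: no uncertainty left in the estimate.
        return 1.0 if x_hat >= d_critical else 0.0
    return 1.0 - NormalDist(mu=x_hat, sigma=variance ** 0.5).cdf(d_critical)


def should_trigger_mitigation(x_hat, variance):
    """Trigger a plan only when the PoF exceeds the 15% level."""
    return probability_of_failure(x_hat, variance) > POF_TRIGGER
```

Under these assumptions an estimate one standard deviation below the critical level gives a PoF of roughly 16% and triggers a plan, while an estimate two standard deviations below does not.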
To evaluate the system under realistic operational stress, I implemented a Dynamic Cognitive Fatigue model based on Wickens' Multiple Resource Theory:
- Workload-Latency Coupling: Operator pipeline latency is no longer static. It scales exponentially with the number of decisions per hour, $W$:

  $$\Delta t_{\text{pipeline}} = \Delta t_{\text{base}} \cdot \exp(0.04 \cdot \max(W - 15, 0))$$

- Stress Threshold: Once workload exceeds 15 decisions/hour, operator response times increase rapidly, simulating cognitive saturation.
- Shedding Logic: The Agentic DT reduces this load by autonomous plan generation, keeping the operator in the "optimal performance" zone.
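The workload-latency coupling can be written directly from the formula above. The function names are hypothetical; the constants 0.04 and 15 are the ones given in the text.

```python
import math


def fatigue_multiplier(workload, threshold=15.0, rate=0.04):
    """exp(rate * max(W - threshold, 0)): 1.0 at or below the stress threshold."""
    return math.exp(rate * max(workload - threshold, 0.0))


def fatigued_pipeline_latency(base_latency_s, workload):
    """Base pipeline latency scaled by the cognitive fatigue multiplier."""
    return base_latency_s * fatigue_multiplier(workload)
```

For example, a workload of 40 decisions/hour is 25 above the threshold, so the multiplier is exp(0.04 × 25) = e ≈ 2.72, while any workload at or below 15 leaves the base latency unchanged.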
The framework includes a Total Lifecycle Cost (TLC) model to quantify the business case for Agentic Digital Twins:
| Cost Item | Value | Description |
|---|---|---|
| `COST_MITIGATION` | $12,000 | Cost of preventive structural maintenance |
| `COST_FAILURE` | $1,500,000 | Total cost of catastrophic structural collapse |
| `SYSTEM_OVERHEAD` | Variable | Operating cost (Blockchain/Audit/Compute) |
Total Cost =
📊 Result: While the Agentic DT has higher operating overhead (due to blockchain audit trails), it significantly reduces the Expected Annual Loss (EAL) by preventing rare but catastrophic failures, resulting in a >50% reduction in Total Lifecycle Cost compared to rule-based baselines.
Mitigation success is not pre-assigned. For each incident, a Bernoulli trial is drawn with probability:
where margin_hours = time from detection to projected critical failure (D ≥ 0.85), and σ(·) is the logistic function.
| Detection Margin | P(success) | Outcome |
|---|---|---|
| 0 hr | ~11% | 🔴 Critical |
| 2.5 hr | ~50% | 🟡 Marginal |
| 5.0 hr | ~88% | 🟢 Good |
| 7.5 hr | ~96% (capped at 95%) | 🟢 Excellent |
Earlier detection → longer margin → higher success probability. Performance differences across configurations emerge directly from this mechanism.
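The margin-to-success mechanism can be illustrated with a capped logistic followed by a Bernoulli draw. The midpoint of 2.5 hr and the 95% cap are read from the table; the steepness `k = 0.8` is a hypothetical value chosen so that the 0 hr and 5 hr rows are roughly reproduced, and it is not taken from the source. Function names are also hypothetical.

```python
import math
import random

MIDPOINT_HOURS = 2.5   # margin at which P(success) is about 50% (from the table)
P_CAP = 0.95           # cap stated in the table
K_STEEPNESS = 0.8      # ASSUMPTION: illustrative steepness, not from the source


def success_probability(margin_hours, k=K_STEEPNESS):
    """Capped logistic in the detection margin."""
    logistic = 1.0 / (1.0 + math.exp(-k * (margin_hours - MIDPOINT_HOURS)))
    return min(logistic, P_CAP)


def draw_mitigation_success(margin_hours, rng):
    """Bernoulli trial: True with probability success_probability(margin_hours)."""
    return rng.random() < success_probability(margin_hours)
```

Because the probability is monotone in the margin, any configuration that detects earlier obtains a higher success rate under this sketch without that rate being assigned directly.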
Algorithmic detection time is augmented by operator/system pipeline latency (seconds), derived from human factors analysis of control room workflows (Hart & Staveland, 1988; NASA-TLX):
| Configuration | Pipeline Mean (s) | Pipeline SD (s) | Workflow Description |
|---|---|---|---|
| `rules` | 42 | 8 | Sensor alert → manual dashboard review → phone dispatch |
| `dt` | 18 | 4 | Automated alert → operator screen confirmation → dispatch |
| `dt_single_agent` | 18 | 4 | Single-agent predictive alert → dispatch |
| `dt_multi_no_chain` | 6 | 2 | Multi-agent plan → push notification → dispatch |
| `agentic_full` | 6 | 2 | Autonomous multi-agent plan → push notification → dispatch |
Total latency = algorithmic detection delay + pipeline latency.
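A minimal sketch of the latency composition follows. The table gives only a mean and SD per configuration, so drawing the pipeline latency from a Gaussian floored at zero is an assumption, as are the function and dictionary names. A detection delay expressed in simulation steps would first need converting to seconds (one step of 0.1 hr is 360 s).

```python
import random

# (mean, SD) of pipeline latency in seconds, from the table above.
PIPELINE_PARAMS = {
    "rules": (42.0, 8.0),
    "dt": (18.0, 4.0),
    "dt_single_agent": (18.0, 4.0),
    "dt_multi_no_chain": (6.0, 2.0),
    "agentic_full": (6.0, 2.0),
}


def sample_total_latency(config, detection_delay_s, rng):
    """Total latency = algorithmic detection delay + sampled pipeline latency.

    ASSUMPTION: pipeline latency ~ Normal(mean, SD), floored at zero.
    """
    mean_s, sd_s = PIPELINE_PARAMS[config]
    pipeline_s = max(rng.gauss(mean_s, sd_s), 0.0)
    return detection_delay_s + pipeline_s
```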
To evaluate the framework's robustness against adversarial conditions, the simulation includes a Sensor Spoofing Attack model:
- Attack Model: Malicious actors cap sensor readings at `τ_spoof = 0.35` once structural degradation begins, masking the onset of failure (D ≥ 0.40).
- Detection (Agentic Only): The agentic layer implements a Noise Floor Audit. It monitors the stochastic variance of the signal; because digital spoofing/capping results in an unnaturally flat signal (`σ_window < 0.5 · σ_noise`), the agent flags a Data Integrity Violation.
- Impact:
  - Baseline Models: Fail to detect masked incidents, leading to `success = 0` and catastrophic failure.
  - Agentic Framework: Detects the spoofing via physical-model decoupling and triggers a fail-safe mitigation plan.
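The Noise Floor Audit can be sketched as a rolling standard-deviation check against the rule `σ_window < 0.5 · σ_noise`. The window length of 20 samples, the use of the population standard deviation, and the function name are assumptions; σ_noise = 0.02 is the sensor noise level given earlier in this README.

```python
import statistics

SENSOR_SIGMA = 0.02   # sensor noise standard deviation from the text


def noise_floor_violation(readings, window=20, sigma_noise=SENSOR_SIGMA, ratio=0.5):
    """Return the index at which a window first looks unnaturally flat, else None.

    ASSUMPTION: `window` is an illustrative length, not taken from the source.
    """
    for end in range(window, len(readings) + 1):
        sigma_window = statistics.pstdev(readings[end - window:end])
        if sigma_window < ratio * sigma_noise:
            return end - 1   # index of the last sample in the flagged window
    return None


# Hypothetical signals: a healthy reading that keeps its noise floor,
# and the same reading capped at tau_spoof = 0.35 from sample 50 onward.
healthy = [0.30 + (0.02 if i % 2 else -0.02) for i in range(100)]
spoofed = healthy[:50] + [0.35] * 50
```

In this toy case `noise_floor_violation(healthy)` finds nothing, while the capped signal is flagged within one window length of the attack onset.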
File: `data/synthetic_agentic_dt_dataset.csv`
5 configurations × 30 runs × 120 incidents = 18,000 incident records
| Column | Description | Type |
|---|---|---|
| `run_id` | Independent simulation run (0–29) | Integer |
| `config` | Configuration ID (`rules`, `dt`, `dt_single_agent`, `dt_multi_no_chain`, `agentic_full`) | Categorical |
| `incident_id` | Incident index within run (0–119) | Integer |
| `complexity` | Scenario complexity (`low`, `medium`, `high`) | Categorical |
| `latency_s` | Total detection + pipeline latency (seconds) | Float |
| `success` | Mitigation success (1 = successful plan executed) | Boolean |
| `workload` | Operator workload (decisions/hour) | Float |
| `justified` | Blockchain-anchored audit trail present | Boolean |
| `alpha` | Per-run degradation drift coefficient | Float |
| `noise_sigma` | Shock magnitude noise parameter | Float |
| `is_attacked` | Sensor spoofing attack simulated (True/False) | Boolean |
| `attack_detected` | System successfully identified data tampering | Boolean |
| `pof` | Maximum Probability of Failure recorded at detection | Float |
| `fatigue_mult` | Cognitive fatigue multiplier applied to latency | Float |
| `total_cost` | Total economic cost of the incident ($) | Float |
| `window_var_feature` | Minimum windowed variance of sensor signal (ML feature) | Float |
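A quick sanity check of a regenerated file against this specification might look like the sketch below. It uses only the standard library, and the helper name is hypothetical; the column names and the expected record count are the ones listed above.

```python
import csv

EXPECTED_COLUMNS = [
    "run_id", "config", "incident_id", "complexity", "latency_s", "success",
    "workload", "justified", "alpha", "noise_sigma", "is_attacked",
    "attack_detected", "pof", "fatigue_mult", "total_cost", "window_var_feature",
]
EXPECTED_ROWS = 5 * 30 * 120   # configurations x runs x incidents


def summarize_dataset(lines):
    """Return (missing_columns, n_rows) for an iterable of CSV lines."""
    reader = csv.DictReader(lines)
    present = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in present]
    n_rows = sum(1 for _ in reader)
    return missing, n_rows
```

Typical use would be `with open("data/synthetic_agentic_dt_dataset.csv", newline="") as handle: missing, n_rows = summarize_dataset(handle)`, followed by comparing `n_rows` with `EXPECTED_ROWS`.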
The analysis script (scripts/analysis.py) produces a comprehensive battery of tests:
| # | Method | Purpose |
|---|---|---|
| 1 | Descriptive Statistics | Mean, SD, 95% CI per config × complexity |
| 2 | Shapiro-Wilk Test | Normality screening with non-parametric fallback |
| 3 | Welch's t-tests | Pairwise latency comparisons with Bonferroni correction |
| 4 | Mann-Whitney U | Non-parametric complement with rank-biserial r |
| 5 | Chi-squared Tests | Pairwise mitigation success rate comparisons |
| 6 | Two-way ANOVA | Type II SS — Config × Complexity for latency and success |
| 6b | Mixed-Effects Model | Linear Mixed Model accounting for run-level nesting |
| 7 | Tukey HSD Post-hoc | All pairwise group comparisons |
| 8 | Effect Sizes | Cohen's d (parametric) and rank-biserial r (non-parametric) |
| 9 | Sensitivity Analysis | Spearman ρ of α and σ with outcomes |
| 10 | Run-level Aggregation | Table 1 format for manuscript reporting |
| 11 | Cyber-Physical Resilience | ML-validated attack detection (Train/Test split) |
| 12 | Economic ROI | Total Lifecycle Cost comparison across architectures |
| 13 | Human Factors | Cognitive fatigue impact on pipeline latency |
| 14 | Reliability Validation | Out-of-sample RF classifier consistency check |
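Two of the listed steps, the Bonferroni correction and Cohen's d, are simple enough to illustrate with the standard library. The sketch assumes a pooled-standard-deviation form of Cohen's d and a family-wise significance level of 0.05; neither is confirmed by the text, and `scripts/analysis.py` may compute them differently.

```python
import math
from itertools import combinations
from statistics import mean, stdev


def cohens_d(sample_a, sample_b):
    """Cohen's d using the pooled sample standard deviation (assumed variant)."""
    n_a, n_b = len(sample_a), len(sample_b)
    pooled_var = ((n_a - 1) * stdev(sample_a) ** 2 +
                  (n_b - 1) * stdev(sample_b) ** 2) / (n_a + n_b - 2)
    return (mean(sample_a) - mean(sample_b)) / math.sqrt(pooled_var)


def bonferroni_alpha(n_groups, alpha=0.05):
    """Per-comparison significance level for all pairwise comparisons."""
    n_pairs = len(list(combinations(range(n_groups), 2)))
    return alpha / n_pairs
```

With five configurations there are ten pairwise comparisons, so under the assumed 0.05 family-wise level each individual test would be judged at 0.005.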
Python 3.10+
numpy 1.26.4
pandas 2.2.2
scipy 1.11.4
statsmodels 0.14.1
matplotlib 3.8.4
seaborn 0.13.2
scikit-learn 1.4.2
tqdm 4.66.1
agentic-dt-framework/
├── 📁 data/
│ └── synthetic_agentic_dt_dataset.csv
├── 📁 results/
│ ├── 📊 latency_boxplot.png
│ ├── 📊 success_rate_barplot.png
│ ├── 📊 workload_violinplot.png
│ ├── 📊 attack_detection_rate.png
│ ├── 📊 lifecycle_cost_comparison.png
│ ├── 📊 fatigue_impact_scatter.png
│ ├── 📄 resilience_analysis.json
│ └── 📄 economic_analysis.json
├── 📁 scripts/
│ ├── simulation.py
│ └── analysis.py
├── requirements.txt
└── README.md
# Install dependencies
pip install -r requirements.txt
# Regenerate dataset (deterministic, seed=42)
cd scripts
python simulation.py
# Run full statistical analysis (including Mixed-Effects Modeling and Logistic Regression)
# This will generate all plots and JSON files in the /results folder
python analysis.py

✅ The regenerated dataset is guaranteed to exactly match the archived version (seed=42). To run specific ablation studies, modify the `CONFIGS` list in `simulation.py`.
- Endsley, M.R. (1995). Toward a theory of situation awareness in dynamic systems. Human Factors, 37(1), 32–64.
- Farrar, C.R. & Worden, K. (2012). Structural Health Monitoring: A Machine Learning Perspective. Wiley.
- Frangopol, D.M., et al. (2004). Maintenance, monitoring, safety, risk and resilience of deteriorating systems. J. Struct. Eng.
- Hart, S.G. & Staveland, L.E. (1988). Development of NASA-TLX. Human Mental Workload, 1, 139–183.
- Kalman, R.E. (1960). A new approach to linear filtering and prediction problems. J. Basic Eng., 82(1), 35–45.
- Mori, Y. & Ellingwood, B.R. (1994). Maintaining reliability of concrete structures. J. Struct. Eng., 120(3), 824–845.
- Paris, P. & Erdogan, F. (1963). A critical analysis of crack propagation laws. J. Basic Eng., 85(4), 528–533.
- Strauss, A., et al. (2008). Stochastic finite elements and experimental investigations of the durability of concrete structures. Structural Safety, 30(5), 380–395.
Released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
You are free to share and adapt the material provided appropriate credit is given.
Built with ❤️ for reproducible civil infrastructure research · DOI: 10.5281/zenodo.18843087
