This project analyzes single-cell RNA sequencing (scRNA-seq) data from Ravindra et al. (https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3001143) to investigate the cellular response to SARS-CoV-2 infection.
Key Objectives:
- Characterize Cellular Dynamics: Understand changes in cellular composition during SARS-CoV-2 infection in Human Bronchial Epithelial Cells (HBECs).
- Gene Expression Mapping: Map the expression of ACE2 and other viral entry factors across distinct cell populations.
- Replication: Utilize a modified Scanpy pipeline to replicate key figures from the original study.
Experimental Design:
- Model: In vitro Human Bronchial Epithelial Cells (HBECs).
- Conditions:
  - Mock: No treatment (Control).
  - 1 dpi: 1 Day Post-Infection.
  - 2 dpi: 2 Days Post-Infection.
  - 3 dpi: 3 Days Post-Infection.
The analysis was executed using a standard Scanpy pipeline within a Jupyter Notebook, divided into two distinct phases.
- Setup: Import necessary libraries (Scanpy, Anndata, bbknn, Decoupler, Pandas, etc.).
- QC & Filtering:
  - Assessed metrics: Mitochondrial (%MT), Ribosomal (%RB), and Hemoglobin (%HB) content.
  - Removed low-quality cells: retained cells with %MT < 10.
  - Removed doublets/empty wells: retained cells with more than 200 detected genes and genes detected in more than 3 cells.
- Feature Selection: Identification of Highly Variable Genes (HVGs).
- Dimensionality Reduction: Principal Component Analysis (PCA).
- Clustering: UMAP projection and Leiden clustering.
- Cell Annotation: Automated annotation using Decoupler and PanglaoDB (top-scoring cell type per cluster).
- Trajectory Inference: Tracked differentiation trajectories and ACE2 expression changes using PAGA.
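
The QC thresholds listed above can be illustrated on a toy count table. The block below is a minimal sketch in plain pandas, not the notebook's actual code: the function name `qc_filter`, its parameters, the order of the filtering steps, and the toy matrix are all assumptions made for illustration. In a Scanpy workflow, the same filtering is typically done with `sc.pp.calculate_qc_metrics`, `sc.pp.filter_cells`, and `sc.pp.filter_genes`.

```python
import numpy as np
import pandas as pd


def qc_filter(counts, max_pct_mt=10.0, min_genes=200, min_cells=3):
    """Filter a cells x genes count table (hypothetical helper).

    Keeps cells with %MT < max_pct_mt and more than `min_genes` detected
    genes, then keeps genes detected in more than `min_cells` kept cells.
    """
    # Human mitochondrial genes are conventionally prefixed with "MT-".
    is_mt = counts.columns.str.startswith("MT-")
    total = counts.sum(axis=1)
    # Percentage of each cell's counts coming from mitochondrial genes.
    # Cells with zero total counts get NaN and are dropped by the comparison.
    pct_mt = 100.0 * counts.loc[:, is_mt].sum(axis=1) / total.replace(0, np.nan)
    n_genes = (counts > 0).sum(axis=1)
    keep_cells = (pct_mt < max_pct_mt) & (n_genes > min_genes)
    filtered = counts.loc[keep_cells]
    # Count, for each gene, the remaining cells in which it is detected.
    n_cells = (filtered > 0).sum(axis=0)
    return filtered.loc[:, n_cells > min_cells]


# Toy data (made up): 5 cells x 5 genes, with small thresholds so the
# effect of each filter is visible.
toy = pd.DataFrame(
    [
        [1, 2, 3, 0, 0],  # c1: passes
        [0, 4, 2, 1, 0],  # c2: passes
        [1, 1, 1, 0, 7],  # c3: 70% mitochondrial counts, removed
        [0, 0, 5, 0, 0],  # c4: only one detected gene, removed
        [2, 0, 1, 3, 0],  # c5: passes
    ],
    index=["c1", "c2", "c3", "c4", "c5"],
    columns=["ACE2", "ENO2", "KRT5", "FOXJ1", "MT-CO1"],
)

filtered = qc_filter(toy, max_pct_mt=10.0, min_genes=2, min_cells=1)
print(filtered)
```

With the defaults (`max_pct_mt=10.0`, `min_genes=200`, `min_cells=3`) the function mirrors the thresholds stated in the list; the smaller values are used here only because the toy table has five genes.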
- Integration: Concatenated samples to create a merged AnnData object.
- Batch Correction: Applied BB-kNN to align batches, mirroring the paper's methodology.
  - Figure: Before batch correction
  - Figure: After batch correction
- Visualization & Plotting:
  - Figure: Barplot of proportional change in cell composition per sample.
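
The proportions behind a composition barplot like this one could be computed from the per-cell annotations. The following is a hedged sketch on made-up labels: the column names `sample` and `cell_type` are assumptions about how the merged object's `.obs` table might be organized, and the toy values are not results from this analysis.

```python
import pandas as pd

# Toy per-cell annotations (made up); in practice this would be the
# `.obs` table of the merged AnnData object.
obs = pd.DataFrame(
    {
        "sample": ["Mock"] * 4 + ["3dpi"] * 4,
        "cell_type": [
            "Basal", "Basal", "Ciliated", "Goblet",
            "Basal", "Ciliated", "Ciliated", "Goblet",
        ],
    }
)

# Fraction of each cell type within each sample (each row sums to 1).
composition = pd.crosstab(obs["sample"], obs["cell_type"], normalize="index")

# Change in proportion relative to the Mock sample.
change_vs_mock = composition - composition.loc["Mock"]

print(composition)
print(change_vs_mock)
```

Normalizing within each sample rather than using raw counts keeps samples with different cell numbers comparable. A stacked barplot can then be drawn from `composition`, for example with `composition.plot(kind="bar", stacked=True)`.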

We observed distinct shifts in cell population abundance across the infection timeline (see Figure above):
- Mock: Diverse epithelial landscape including Airway Epithelial, Goblet, Basal, Ciliated, Clara, and Ionocytes.
- Infection Response (1–3 dpi):
- 📈 Increased: Ciliated cells (dramatic increase), Goblet cells (peak at Day 2), and Neuroendocrine cells.
- 📉 Decreased: Basal cells (dramatic reduction) and general Airway Epithelial cells.
The observed population dynamics align with known physiological responses to SARS-CoV-2:
- Antiviral Defense: The acute increase in Goblet cells (mucus production to trap pathogens) and Ciliated cells (sweeping mucus/debris) suggests an active effort to clear the virus.
- Sensing & Signaling: The rise in Neuroendocrine cells indicates early sensing of viral abnormalities, potentially relaying signals to the immune system.
- Differentiation: The reduction in Basal cells (progenitors) concurrent with the rise in differentiated lineages (Ciliated/Goblet) suggests that stem-like cells are actively differentiating to replenish the epithelium and fight the infection.
Note on Discrepancies: Our analysis detected "Alveolar macrophages," "Pulmonary Alveolar Type I/II," and "Mesothelial cells." Because the dataset is derived from in vitro Human Bronchial Epithelial Cells, these are likely annotation artifacts arising from the reference database (PanglaoDB). Additionally, unlike the original paper, Tuft cells were not identified, likely due to differences in the marker database or clustering resolution.
A) Figure: ACE2 and ENO2 levels across the infection timeline
- ACE2 (Viral Entry): ACE2 serves as a reliable marker for infection susceptibility. While overall expression is low, the number of ACE2+ cells and expression levels increase over time compared to Mock samples.
- ENO2: In contrast to ACE2, ENO2 is highly expressed in Mock samples but is dramatically downregulated upon infection, the inverse of the ACE2 trend.
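
Both quantities discussed above, the share of marker-positive cells and the mean expression level per condition, can be summarized with a short helper. This is an illustrative sketch only: `marker_summary` is a hypothetical function, a cell is counted as positive when its expression value is above zero (an assumption), and the toy values are made up rather than taken from the dataset.

```python
import pandas as pd


def marker_summary(expr, condition, gene):
    """Per-condition fraction of positive cells and mean expression.

    `expr` is a cells x genes table and `condition` a per-cell label
    Series sharing the same index (hypothetical layout).
    """
    values = expr[gene]
    positive = values > 0  # assumption: any non-zero value counts as positive
    return pd.DataFrame(
        {
            "frac_positive": positive.groupby(condition).mean(),
            "mean_expr": values.groupby(condition).mean(),
        }
    )


# Toy expression values (made up) for six cells in two conditions.
cells = ["c1", "c2", "c3", "c4", "c5", "c6"]
expr = pd.DataFrame(
    {"ACE2": [0, 0, 1, 0, 2, 1], "ENO2": [3, 2, 4, 0, 1, 0]},
    index=cells,
)
condition = pd.Series(
    ["Mock", "Mock", "Mock", "3dpi", "3dpi", "3dpi"], index=cells
)

ace2 = marker_summary(expr, condition, "ACE2")
eno2 = marker_summary(expr, condition, "ENO2")
print(ace2)
print(eno2)
```

Reporting the positive fraction alongside the mean helps with sparsely expressed genes, where a change in the mean may reflect more expressing cells, higher expression per cell, or both.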
B) Figure: Trajectory and PAGA analysis of dynamic changes in ACE2 at 3 dpi

- Target Populations: Pseudotime analysis at 3 dpi reveals that Airway Goblet cells have the highest ACE2 abundance. Biologically, this suggests Goblet cells are primary targets for viral entry in this model; if infection disables them, the virus may be able to infiltrate deeper into the tissue.
Key Takeaways:
- ✅ Dynamic Remodeling: Infection triggers a shift from stem-like Basal cells toward defensive lineages (Ciliated, Goblet).
- ✅ ACE2 Kinetics: Expression increases over the infection timeline, though overall levels remain low.
- ✅ Cell Specificity: ACE2 levels are higher in differentiated cells, suggesting they are primary viral targets.
Future Directions:
- ⏳ Refine Annotation: Improve resolution to resolve ambiguous labels (e.g., "Epithelial cells," "Alveolar" artifacts).
- ⏳ Enrichment Analysis: Perform pathway enrichment to understand signaling mechanisms within specific clusters.
- ⏳ In Vivo Validation: Compare findings with models that capture the full physiological heterogeneity of the lung.
| Name | Version | Source | Description |
|---|---|---|---|
| scanpy | 1.11.5 | pypi | Core single-cell analysis |
| anndata | 0.11.4 | conda-forge | Data storage format |
| bbknn | 1.6.0 | pypi | Batch correction (Phase 2) |
| scrublet | 0.2.3 | pypi | Doublet detection |
| decoupler | 2.1.2 | pypi | Cell type annotation |
| scvi-tools | 1.3.3 | conda-forge | Probabilistic modeling |
| matplotlib | 3.10.7 | pypi | Plotting core |
| seaborn | 0.13.2 | pypi | Statistical data visualization |
| fa2-modified | 0.4 | pypi | ForceAtlas2 layout |
| igraph | 0.11.9 | pypi | Graph theory operations |
| umap-learn | 0.5.9.post2 | pypi | Dimensionality reduction |
| leidenalg | 0.10.2 | pypi | Community detection (Clustering) |
The full list of dependencies is available in the `scanpy_env.yml` file.







