Create genetics.py #109
Adding SQL table structure for genetics related data
Pull request overview
This PR adds a new genetics module to define SQLAlchemy ORM models for storing genomic variant data, GWAS summary statistics, and patient genotype information. The module establishes the database schema for genetics-related analysis.
Key Changes:
- Introduces three new ORM models: GenomicVariant, GWASStatistic, and SampleGenotype
- Establishes relationships between variants and their associated GWAS statistics
- Defines a VariantType class for categorizing genetic variants
| @@ -0,0 +1,50 @@ | |||
| from sqlalchemy import Column, Integer, String, Float, ForeignKey, Enum | |||
The Enum import from sqlalchemy is unused. The VariantType class is defined as a plain Python class (inheriting from str) rather than using SQLAlchemy's Enum type, so this import should be removed.
| from sqlalchemy import Column, Integer, String, Float, ForeignKey, Enum | |
| from sqlalchemy import Column, Integer, String, Float, ForeignKey |
| from sqlalchemy import Column, Integer, String, Float, ForeignKey, Enum | ||
| from sqlalchemy.orm import relationship | ||
| from biosurfer.core.models.base import Base, TablenameMixin | ||
| from biosurfer.core.models.nonpersistent import Position |
The Position import is unused in this file and should be removed.
| from biosurfer.core.models.nonpersistent import Position |
| from sqlalchemy.orm import relationship | ||
| from biosurfer.core.models.base import Base, TablenameMixin | ||
| from biosurfer.core.models.nonpersistent import Position | ||
| from biosurfer.core.constants import Strand |
The Strand import is unused in this file and should be removed.
| from biosurfer.core.constants import Strand |
| class VariantType(str): | ||
| SNP = "SNP" | ||
| INDEL = "INDEL" | ||
|
|
The VariantType class is defined but never used in the schema. If variant types need to be stored, consider adding a variant_type column to GenomicVariant using SQLAlchemy's Enum type. If this class is intended for future use, consider removing it until needed to avoid unused code.
| class VariantType(str): | |
| SNP = "SNP" | |
| INDEL = "INDEL" |
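If the variant type is meant to be persisted, one possible shape is sketched below. It is only an illustration under stated assumptions: Base here is a local stand-in for biosurfer's Base, the table name is hard-coded rather than produced by TablenameMixin, and the variant_type column is hypothetical (it is not part of this PR). VariantType is rewritten as a standard enum.Enum so that SQLAlchemy's Enum type can use it.

```python
import enum

from sqlalchemy import Column, Enum, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()  # stand-in for biosurfer.core.models.base.Base


class VariantType(enum.Enum):
    SNP = "SNP"
    INDEL = "INDEL"


class GenomicVariant(Base):
    # Assumed table name; the PR derives it from TablenameMixin instead.
    __tablename__ = "genomicvariant"

    id = Column(Integer, primary_key=True)
    # Hypothetical column: restricts stored values to the VariantType members.
    variant_type = Column(Enum(VariantType), nullable=False)


# In-memory SQLite database, used here only to exercise the mapping.
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(GenomicVariant(variant_type=VariantType.SNP))
    session.commit()
    # Values come back as VariantType members, not raw strings.
    stored_type = session.query(GenomicVariant).one().variant_type
```

With this layout the Enum import at the top of the file would no longer be unused.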
| Stores summary statistics for a variant. | ||
| """ | ||
| id = Column(Integer, primary_key=True) | ||
| variant_id = Column(Integer, ForeignKey("genomicvariant.id")) |
The foreign key reference uses lowercase 'genomicvariant' but the actual table name from TablenameMixin would typically be 'genomic_variant' (snake_case). Verify that the TablenameMixin generates table names matching this reference, or update the foreign key to use the correct table name format.
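Which string is correct depends on how TablenameMixin derives the table name, and that implementation is not shown in this diff. The sketch below contrasts the two common conventions with plain Python; lowercase_name and snake_case_name are hypothetical helpers written for this comparison, not biosurfer functions.

```python
import re


def lowercase_name(class_name: str) -> str:
    """Convention A: the class name lowercased, with no separators."""
    return class_name.lower()


def snake_case_name(class_name: str) -> str:
    """Convention B: the class name converted from CamelCase to snake_case."""
    # Split where a lowercase letter or digit is followed by an uppercase letter.
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", class_name)
    # Split an acronym from a following capitalized word (e.g. GWAS|Statistic).
    name = re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", "_", name)
    return name.lower()


for model in ("GenomicVariant", "GWASStatistic", "SampleGenotype"):
    print(model, lowercase_name(model), snake_case_name(model))

# The ForeignKey target string has to match whichever convention is in use.
fk_if_lowercase = f"{lowercase_name('GenomicVariant')}.id"
fk_if_snake_case = f"{snake_case_name('GenomicVariant')}.id"
```

A way to sidestep the question is to pass the column object itself, for example ForeignKey(GenomicVariant.id), provided GenomicVariant is defined before the referencing class.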
| Stores individual patient genotypes (from VCF). | ||
| """ | ||
| id = Column(Integer, primary_key=True) | ||
| variant_id = Column(Integer, ForeignKey("genomicvariant.id")) |
The same concern applies to this foreign key in SampleGenotype: it references lowercase 'genomicvariant', whereas TablenameMixin may generate 'genomic_variant' (snake_case). Verify the generated table name, or update the foreign key to match it.
| class SampleGenotype(Base, TablenameMixin): | ||
| """ | ||
| Stores individual patient genotypes (from VCF). | ||
| """ | ||
| id = Column(Integer, primary_key=True) | ||
| variant_id = Column(Integer, ForeignKey("genomicvariant.id")) | ||
| sample_id = Column(String, index=True) | ||
| genotype = Column(String) # e.g., "0/1", "1/1" |
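The SampleGenotype model lacks a relationship back to GenomicVariant, so code holding a genotype must query the variant separately by variant_id. Consider adding a relationship such as variant = relationship('GenomicVariant') to navigate from a genotype to its variant. For navigation in both directions, pair it with a matching relationship on GenomicVariant using back_populates.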
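A minimal sketch of such a two-way mapping follows, under the same assumptions as before: Base is a local stand-in, table names are hard-coded instead of coming from TablenameMixin, and the genotypes attribute on GenomicVariant is a hypothetical name chosen for illustration.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()  # stand-in for biosurfer.core.models.base.Base


class GenomicVariant(Base):
    __tablename__ = "genomicvariant"  # assumed name

    id = Column(Integer, primary_key=True)
    # Hypothetical reverse side: all genotypes recorded for this variant.
    genotypes = relationship("SampleGenotype", back_populates="variant")


class SampleGenotype(Base):
    __tablename__ = "samplegenotype"  # assumed name

    id = Column(Integer, primary_key=True)
    variant_id = Column(Integer, ForeignKey("genomicvariant.id"))
    sample_id = Column(String, index=True)
    genotype = Column(String)  # e.g., "0/1", "1/1"
    # Forward side: the variant this genotype call refers to.
    variant = relationship("GenomicVariant", back_populates="genotypes")


engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    variant = GenomicVariant()
    # "sample_1" is a placeholder identifier, not real data.
    variant.genotypes.append(SampleGenotype(sample_id="sample_1", genotype="0/1"))
    session.add(variant)  # the appended genotype is saved along with the variant
    session.commit()

    loaded = session.query(SampleGenotype).one()
    same_variant = loaded.variant is variant  # genotype -> variant
    n_genotypes = len(variant.genotypes)      # variant -> genotypes
```

back_populates keeps both attributes in sync in memory, so appending to variant.genotypes also sets the genotype's variant without a separate assignment.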
added db handling for VCF/GWAS data
attempt to map var to isoforms
adding n-term blocks check for var
PPARG example
added cli command for the feature
added pysam to lib
updated the function
Adding SQL table structure for genetics related data