Create genetics.py #109

Open
MayankMurali wants to merge 13 commits into sheynkman-lab:main from MayankMurali:biosurfer-genet-1

Conversation

@MayankMurali
Collaborator

Adding SQL table structure for genetics-related data

@MayankMurali MayankMurali requested a review from Copilot January 5, 2026 20:44
@MayankMurali MayankMurali self-assigned this Jan 5, 2026

Copilot AI left a comment

Pull request overview

This PR adds a new genetics module to define SQLAlchemy ORM models for storing genomic variant data, GWAS summary statistics, and patient genotype information. The module establishes the database schema for genetics-related analysis.

Key Changes:

  • Introduces three new ORM models: GenomicVariant, GWASStatistic, and SampleGenotype
  • Establishes relationships between variants and their associated GWAS statistics
  • Defines a VariantType class for categorizing genetic variants

@@ -0,0 +1,50 @@
from sqlalchemy import Column, Integer, String, Float, ForeignKey, Enum

Copilot AI Jan 5, 2026

The Enum import from sqlalchemy is unused. The VariantType class is defined as a plain Python class (inheriting from str) rather than using SQLAlchemy's Enum type, so this import should be removed.

Suggested change
- from sqlalchemy import Column, Integer, String, Float, ForeignKey, Enum
+ from sqlalchemy import Column, Integer, String, Float, ForeignKey

Copilot uses AI. Check for mistakes.
from sqlalchemy import Column, Integer, String, Float, ForeignKey, Enum
from sqlalchemy.orm import relationship
from biosurfer.core.models.base import Base, TablenameMixin
from biosurfer.core.models.nonpersistent import Position

Copilot AI Jan 5, 2026

The Position import is unused in this file and should be removed.

Suggested change
- from biosurfer.core.models.nonpersistent import Position

from sqlalchemy.orm import relationship
from biosurfer.core.models.base import Base, TablenameMixin
from biosurfer.core.models.nonpersistent import Position
from biosurfer.core.constants import Strand

Copilot AI Jan 5, 2026

The Strand import is unused in this file and should be removed.

Suggested change
- from biosurfer.core.constants import Strand

Comment on lines +7 to +10
class VariantType(str):
    SNP = "SNP"
    INDEL = "INDEL"


Copilot AI Jan 5, 2026

The VariantType class is defined but never used in the schema. If variant types need to be stored, consider adding a variant_type column to GenomicVariant using SQLAlchemy's Enum type. If this class is intended for future use, consider removing it until needed to avoid unused code.

Suggested change
- class VariantType(str):
-     SNP = "SNP"
-     INDEL = "INDEL"
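
If the class is kept rather than removed, the alternative mentioned in the comment above (a `variant_type` column backed by SQLAlchemy's `Enum` type) could look roughly like the sketch below. It is only an illustration under stated assumptions: `Base` is a local stand-in for biosurfer's `Base`, the table name is hard-coded because `TablenameMixin` is not available here, and the `variant_type` column is hypothetical and not part of the PR.

```python
import enum

from sqlalchemy import Column, Enum, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base

# Stand-in for biosurfer's Base; the real module would import it instead.
Base = declarative_base()


class VariantType(str, enum.Enum):
    """A real enumeration, so SQLAlchemy's Enum type can map it to a column."""
    SNP = "SNP"
    INDEL = "INDEL"


class GenomicVariant(Base):
    # Hard-coded only because TablenameMixin is not available in this sketch.
    __tablename__ = "genomicvariant"

    id = Column(Integer, primary_key=True)
    # Hypothetical column: not part of the PR as submitted.
    variant_type = Column(Enum(VariantType), nullable=False)


# Round-trip one row through an in-memory SQLite database.
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(GenomicVariant(variant_type=VariantType.INDEL))
    session.commit()
    stored_type = session.query(GenomicVariant).one().variant_type
```

With this variant, the `Enum` import flagged earlier in the review would be in use, and `VariantType` members still compare equal to their string values because the class also inherits from `str`.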

    Stores summary statistics for a variant.
    """
    id = Column(Integer, primary_key=True)
    variant_id = Column(Integer, ForeignKey("genomicvariant.id"))

Copilot AI Jan 5, 2026

The foreign key reference uses lowercase 'genomicvariant' but the actual table name from TablenameMixin would typically be 'genomic_variant' (snake_case). Verify that the TablenameMixin generates table names matching this reference, or update the foreign key to use the correct table name format.
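
One way to carry out the verification suggested above is to read the generated name from the model and let SQLAlchemy resolve the foreign key. The sketch below is an assumption-laden illustration, not the project's code: the `TablenameMixin` shown is a hypothetical stand-in that lowercases the class name (the real mixin in `biosurfer.core.models.base` may behave differently), and `Base` is a local declarative base.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import declarative_base, declared_attr

# Stand-in for biosurfer's Base.
Base = declarative_base()


class TablenameMixin:
    """Hypothetical stand-in: ASSUMES the table name is the lowercased class name."""

    @declared_attr
    def __tablename__(cls):
        return cls.__name__.lower()


class GenomicVariant(Base, TablenameMixin):
    id = Column(Integer, primary_key=True)


class GWASStatistic(Base, TablenameMixin):
    id = Column(Integer, primary_key=True)
    # Derive the target from the model so it cannot drift from the mixin's naming.
    variant_id = Column(Integer, ForeignKey(f"{GenomicVariant.__tablename__}.id"))


# create_all raises NoReferencedTableError if a ForeignKey names a missing table.
Base.metadata.create_all(create_engine("sqlite://"))

# Inspect which table the foreign key actually resolved to.
fk = next(iter(GWASStatistic.__table__.c.variant_id.foreign_keys))
referenced_table = fk.column.table.name
print(GenomicVariant.__tablename__, referenced_table)
```

Running the same two checks against the real models (printing `GenomicVariant.__tablename__` and creating the tables on an in-memory SQLite engine) would show whether `"genomicvariant.id"` or `"genomic_variant.id"` is the correct reference.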

    Stores individual patient genotypes (from VCF).
    """
    id = Column(Integer, primary_key=True)
    variant_id = Column(Integer, ForeignKey("genomicvariant.id"))

Copilot AI Jan 5, 2026

The foreign key reference uses lowercase 'genomicvariant' but the actual table name from TablenameMixin would typically be 'genomic_variant' (snake_case). Verify that the TablenameMixin generates table names matching this reference, or update the foreign key to use the correct table name format.

Comment on lines +43 to +50
class SampleGenotype(Base, TablenameMixin):
    """
    Stores individual patient genotypes (from VCF).
    """
    id = Column(Integer, primary_key=True)
    variant_id = Column(Integer, ForeignKey("genomicvariant.id"))
    sample_id = Column(String, index=True)
    genotype = Column(String)  # e.g., "0/1", "1/1"

Copilot AI Jan 5, 2026

The SampleGenotype model lacks a relationship back to GenomicVariant, making it difficult to efficiently query variants from a genotype. Consider adding a relationship field like variant = relationship('GenomicVariant') for bidirectional navigation.
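
A minimal sketch of the bidirectional relationship suggested above, using `back_populates` so both sides stay in sync. The assumptions are labeled in the code: `Base` is a local stand-in, the table names are hard-coded in place of `TablenameMixin`, and the `genotypes` collection name on `GenomicVariant` is hypothetical.

```python
from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.orm import declarative_base, relationship

# Stand-in for biosurfer's Base.
Base = declarative_base()


class GenomicVariant(Base):
    # Hard-coded only because TablenameMixin is not available in this sketch.
    __tablename__ = "genomicvariant"

    id = Column(Integer, primary_key=True)
    # Hypothetical attribute name; the PR does not define this collection.
    genotypes = relationship("SampleGenotype", back_populates="variant")


class SampleGenotype(Base):
    """Stores individual patient genotypes (from VCF)."""
    __tablename__ = "samplegenotype"

    id = Column(Integer, primary_key=True)
    variant_id = Column(Integer, ForeignKey("genomicvariant.id"))
    sample_id = Column(String, index=True)
    genotype = Column(String)  # e.g., "0/1", "1/1"
    # Navigate from a genotype call back to its variant.
    variant = relationship("GenomicVariant", back_populates="genotypes")


# Both directions are kept in sync in memory, before any database round-trip.
variant = GenomicVariant()
call = SampleGenotype(sample_id="sample-1", genotype="0/1", variant=variant)
```

Here `call.variant` returns the variant and `variant.genotypes` contains the call, so queries can start from either side.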
