Pindora_shield

Problem Statement

Drug discovery is a highly time-consuming and resource-intensive process. Existing computational tools for molecular generation and optimization often suffer from limited diversity, slow exploration, and poor usability for researchers. Despite recent advances in AI-driven molecular generation, there remains a gap between research models and practical, researcher-friendly systems capable of rapidly proposing novel, chemically valid drug candidates.

Live Project :

stat-vision.xyz

Demo Video

Check out idea overview video!

![Pindora Shield Working]

Detailed Description

Traditional drug discovery pipelines require years of experimentation and extensive trial-and-error, leading to high costs and delayed treatments. (-DrugPaatentWatch)
Existing AI-based molecular generators such as VAE and early GAN models often struggle with mode collapse, limited novelty, or unstable training. (-Royal Society of Chemistry)
Many recent transformer-based approaches demonstrate promising results in academic settings but lack validation, reproducibility, or integration into usable research workflows. (arxiv.org)
Researchers struggle to generate diverse, optimized, chemically valid molecules quickly, as models often produce invalid structures, fail synthesizability checks, or overlook multi-property optimization in practical drug pipelines. (ICLR Conference -openreview.org)

#Disease–Protein Interaction Network : Diseases are driven by complex biological networks, not isolated proteins.

A single disease can involve multiple protein–protein interactions, feedback loops, and pathways.
Targeting one protein can influence several downstream biological effects.
Motivation: Our system starts by identifying relevant target proteins associated with the disease before molecule generation

This justifies why target identification is a critical first step in our workflow. ** Lead Optimization via Molecular Variants **

In real-world drug discovery, a single lead molecule is rarely optimal.
Chemists generate multiple structural analogs of a lead compound.
Each analog can show drastically different potency (µM → nM).
Motivation: Instead of manually creating analogs, our system automatically

This motivates our TangGen-based alternative SMILES generation stage #** Target Mechanism** :

Diseases often arise from enzyme-catalyzed reactions (rate-limiting steps).
Inhibiting a specific enzyme can block disease progression (e.g., aromatase → estrogen → breast tumor)
Effective drugs must:
Bind the target enzyme
Reduce reaction rateMaintain chemical feasibility Motivation: Generated molecules must be evaluated not just structurally, but pharmacologically. This motivates multi-factor evaluation (IC50, association score, phase, target relevance

Overview :

The proposed system operates in four major stages:

1 -> Disease & Target Identification 2 -> Molecular Retrieval & Analysis 3 -> De Novo Molecule Generation 4 -> Comparative Evaluation & Optimization

Each stage is designed to reduce manual intervention while ensuring chemical validity, diversity, and optimization across multiple drug-relevant properties.

Work FLow

Pindora Shield follows a structured pipeline where disease-specific molecular candidates are generated and evaluated across multiple drug-relevant dimensions.

Step 1: Disease Context & Target Identification The system accepts disease-related inputs and identifies relevant target proteins using curated biomedical knowledge sources. This establishes biological relevance before molecular analysis begins.

Step 2: Molecular Processing (SMILES Representation) Known molecules linked to identified targets are retrieved and represented using standardized SMILES strings, providing a consistent molecular format for downstream analysis.

Step 3: De Novo Molecule Generation (TangGen) Based on the disease and target context, a generative model (TangGen) is used to generate alternative candidate molecules.

Generates diverse and novel SMILES strings
Ensures chemical validity and structural diversity
Explores chemical space beyond known drugs

This step produces a large pool of candidate molecular SMILES.

Step 4: Multi-Model Molecular Evaluation Each generated SMILES is evaluated individually using five specialized, trained models:

IC50 prediction (potency)
Target association score
Clinical phase likelihood
Target relevance (exploratory) Rather than using a single multi-task model, each property is predicted by an independent, optimized model, improving accuracy and interpretability.
- In effect, every SMILES generated by TangGen is passed one-by-one through all five models, producing a comprehensive property profile.

Step 5: Filtering & Shortlisting Predicted outputs are combined to filter low-potential candidates and shortlist molecules based on pharmacological relevance and feasibility.

Step 6: Comparative Analysis & Ranking Shortlisted candidates are compared with existing medicines, and a ranked comparative analysis is produced to support research-level decision making.

Failure Handling & System Reliability

Pindora Shield is designed with defensive failure-handling mechanisms to prevent system crashes and ensure graceful recovery during runtime issues.

Isolated Component Execution:
Generative models, predictive evaluators, backend APIs, and frontend rendering operate as independent modules. A failure in one component does not cascade across the entire system.
Graceful API Degradation:
Backend endpoints validate inputs and handle unexpected responses (e.g., invalid JSON, empty outputs) without crashing the service. Errors are returned as structured responses instead of unhandled exceptions.
Model-Level Fault Tolerance:
Each predictive model runs independently. If a single model fails or times out, the system continues evaluating other properties and flags partial results instead of terminating execution.
Frontend Error Guarding:
The frontend includes runtime checks for non-JSON or malformed backend responses and displays informative status messages instead of breaking the UI.
Stateless Backend Design:
APIs remain stateless, enabling safe retries, redeployment, or migration without data corruption or recovery complexity.

Together, these mechanisms ensure robustness, prevent full-system crashes, and allow controlled recovery during partial failures or infrastructure instability.

Dataset & Inputs

Public biomedical datasets (e.g., ChEMBL, Open Targets) used for disease–target and drug information
Molecular data represented using SMILES strings with curated physicochemical descriptors
Dataset cleaned, deduplicated, and structured for training multiple predictive models These enhancements will strengthen performance and enable scalable, system-level deployment.

Team Contributions

The development of Pindora Shield was a collaborative effort, with responsibilities distributed across model development, backend engineering, frontend implementation, and system integration.

Kashif

Led the development of the TenGAN / TangGen molecular generation model.
Worked on generative modeling, training, and validation of molecule generation pipelines.
Contributed to backend logic related to molecule generation and model execution.

Jagdish

Worked on machine learning evaluation models, including individual predictive models for molecular properties.
Assisted in backend integration for ML inference and result aggregation.
Contributed to frontend logic related to result handling and data flow.

Yash

Implemented major parts of the backend API layer.
Developed and integrated the 3D molecule visualization module, including WebGL-based molecule rendering.
Assisted with backend–frontend communication and testing.

Sameer

Led the development of the frontend application, including UI design and user interaction flows.
Handled frontend–backend integration and API consumption.
Supported integration, debugging, and cross-component testing across the system.

All team members collaborated where needed to resolve integration issues, debug system-level problems, and ensure smooth end-to-end functionality.

Name		Name	Last commit message	Last commit date
Latest commit History 131 Commits
.github/workflows		.github/workflows
3d_models		3d_models
Tengan		Tengan
__pycache__		__pycache__
data		data
docs		docs
frontend		frontend
matriX_model		matriX_model
models		models
routes		routes
tests		tests
utils		utils
.DS_Store		.DS_Store
.gitignore		.gitignore
flow.png		flow.png
git		git
image-1.png		image-1.png
image-2.png		image-2.png
image.png		image.png
main.py		main.py
package-lock.json		package-lock.json
pindora copy.py		pindora copy.py
pindora.py		pindora.py
requirements.txt		requirements.txt
tengan_overview.png		tengan_overview.png
vercel.json		vercel.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Pindora_shield

Problem Statement

Live Project :

Demo Video

Detailed Description

Overview :

Work FLow

Failure Handling & System Reliability

Team Contributions

Kashif

Jagdish

Yash

Sameer

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Pindora_shield

Problem Statement

Live Project :

Demo Video

Detailed Description

Overview :

**Work FLow **

Failure Handling & System Reliability

Team Contributions

Kashif

Jagdish

Yash

Sameer

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Work FLow

Packages