
Milan933-coder/Molecular_Extractor


🧬 Molecular Property Prediction: Radius of Gyration (Rg)

This repository contains code and methodology for predicting the Radius of Gyration (Rg) of molecules using multiple molecular representations and deep learning architectures. The goal is to combine graph-based and text-based molecular encodings to learn the structural relationships that drive polymer conformations.

🚀 Project Overview

We predict the Radius of Gyration (Rg), a structural descriptor computed with RDKit, by combining several feature extraction methods applied to the molecular data:

Graph-based representation: CMPNN (Communicative Message Passing Neural Network)

Text-based representation: ChemBERTa (Transformer trained on SMILES)

Fingerprint-based representation: Morgan Fingerprints

Rule-based features: RDKit Descriptors

These features are combined and passed to a neural network that performs the property regression.
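ChemBERTa, like other transformer encoders, returns one vector per SMILES token, so getting a fixed-length molecule embedding requires a pooling step. The README does not state which pooling is used. The sketch below assumes masked mean pooling over the non-padding tokens; taking the first token's vector is another common choice. In Hugging Face `transformers`, the two inputs would correspond to the model output's `last_hidden_state` and the tokenizer's `attention_mask`. The function name `masked_mean_pool` is hypothetical.

```python
import numpy as np

def masked_mean_pool(token_embeddings, attention_mask):
    """Average token vectors over real (non-padding) tokens only.

    token_embeddings: (batch, seq_len, hidden) array, e.g. a transformer's last hidden state.
    attention_mask:   (batch, seq_len) array with 1 for real tokens and 0 for padding.
    Returns a (batch, hidden) array with one fixed-length vector per molecule.
    """
    token_embeddings = np.asarray(token_embeddings, dtype=float)
    mask = np.asarray(attention_mask, dtype=float)[:, :, None]  # broadcast over hidden dim
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero for all-padding rows
    return summed / counts

# One "molecule" with three token positions; the last position is padding.
emb = np.arange(12, dtype=float).reshape(1, 3, 4)
print(masked_mean_pool(emb, [[1, 1, 0]]))  # [[2. 3. 4. 5.]]
```

Masking matters because SMILES strings in a batch have different lengths, and averaging padding vectors would make a molecule's embedding depend on the longest string in its batch.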

🧩 Methodology

1️⃣ Data Input

Each molecule is represented by its SMILES string.

Target Rg values are computed with RDKit and used as labels for supervised training.
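For reference, the textbook mass-weighted radius of gyration of a 3D structure with atom positions r_i and masses m_i is `Rg = sqrt(sum_i m_i * |r_i - r_cm|^2 / sum_i m_i)`, where r_cm is the centre of mass. The NumPy sketch below implements that formula directly; it is not RDKit's code. RDKit's `rdMolDescriptors.CalcRadiusOfGyration` needs a molecule with an embedded 3D conformer and may follow a different definition (check the RDKit documentation), so its values need not match this function exactly. The name `radius_of_gyration` is hypothetical.

```python
import numpy as np

def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration of a set of 3D points.

    coords: (n_atoms, 3) array of Cartesian coordinates (e.g. in angstroms).
    masses: optional (n_atoms,) array of atomic masses; uniform weights if None.
    """
    coords = np.asarray(coords, dtype=float)
    if masses is None:
        masses = np.ones(len(coords))
    masses = np.asarray(masses, dtype=float)
    # Mass-weighted centre of mass
    center = (masses[:, None] * coords).sum(axis=0) / masses.sum()
    # Squared distance of each atom from the centre of mass
    sq_dist = ((coords - center) ** 2).sum(axis=1)
    return float(np.sqrt((masses * sq_dist).sum() / masses.sum()))

# Two equal masses at x = -1 and x = +1: each is exactly 1 unit from the centre.
print(radius_of_gyration([[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]))  # 1.0
```

Because Rg depends on 3D coordinates, any value derived from a SMILES string also depends on the conformer that was generated for it.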

2️⃣ Feature Extraction

| Feature Type | Method | Description |
|---|---|---|
| Graph | CMPNN | Captures molecular graph topology and atomic interactions. |
| Text | ChemBERTa | Encodes SMILES sequences using transformer embeddings. |
| Fingerprint | Morgan Fingerprints | Circular, substructure-based molecular representation. |
| Rule-based | RDKit Descriptors | Physicochemical features such as MW, TPSA, HBA/HBD, etc. |

3️⃣ Model Training

The extracted features are concatenated into a unified representation.

A neural regression head is trained to predict the Radius of Gyration (Rg).

Loss function: Mean Squared Error (MSE)

Optimizer: Adam

Training uses K-fold cross-validation to check that performance is consistent across different splits of the data.
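The training recipe above (a neural regression head, MSE loss, Adam, K-fold cross-validation) is sketched below in plain NumPy so that it runs without a deep learning framework. Everything beyond those four items is an assumption: the single 64-unit ReLU hidden layer, the learning rate, the epoch count, full-batch updates, K = 5, and the helper names `kfold_indices`, `train_head`, and `cross_validate`, which are hypothetical. A real implementation would more likely use a framework, for example PyTorch's `torch.nn.MSELoss` and `torch.optim.Adam` together with scikit-learn's `KFold`.

```python
import numpy as np

def kfold_indices(n_samples, n_splits=5, seed=0):
    """Yield (train_idx, val_idx) pairs; the validation folds partition all samples."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    indices = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(indices, n_splits)  # fold sizes differ by at most one
    for k in range(n_splits):
        train_idx = np.concatenate([folds[j] for j in range(n_splits) if j != k])
        yield train_idx, folds[k]

def init_params(n_features, hidden, rng):
    """He-style initialisation for a one-hidden-layer ReLU network."""
    return {
        "W1": rng.normal(scale=np.sqrt(2.0 / n_features), size=(n_features, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(scale=np.sqrt(1.0 / hidden), size=(hidden, 1)),
        "b2": np.zeros(1),
    }

def predict(params, X):
    h = np.maximum(X @ params["W1"] + params["b1"], 0.0)
    return (h @ params["W2"] + params["b2"]).ravel()

def train_head(X, y, hidden=64, lr=1e-2, epochs=300, seed=0,
               beta1=0.9, beta2=0.999, eps=1e-8):
    """Full-batch training with MSE loss and Adam; returns params and loss history."""
    params = init_params(X.shape[1], hidden, np.random.default_rng(seed))
    m = {k: np.zeros_like(p) for k, p in params.items()}  # Adam first moments
    v = {k: np.zeros_like(p) for k, p in params.items()}  # Adam second moments
    losses, n = [], len(y)
    for t in range(1, epochs + 1):
        h_pre = X @ params["W1"] + params["b1"]
        h = np.maximum(h_pre, 0.0)
        err = (h @ params["W2"] + params["b2"]).ravel() - y
        losses.append(float(np.mean(err ** 2)))        # MSE before this update
        d_out = (2.0 / n) * err[:, None]                # dMSE / dprediction
        d_h = (d_out @ params["W2"].T) * (h_pre > 0)    # backprop through ReLU
        grads = {"W2": h.T @ d_out, "b2": d_out.sum(axis=0),
                 "W1": X.T @ d_h, "b1": d_h.sum(axis=0)}
        for k in params:                                # Adam step with bias correction
            m[k] = beta1 * m[k] + (1 - beta1) * grads[k]
            v[k] = beta2 * v[k] + (1 - beta2) * grads[k] ** 2
            m_hat = m[k] / (1 - beta1 ** t)
            v_hat = v[k] / (1 - beta2 ** t)
            params[k] -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return params, losses

def cross_validate(X, y, n_splits=5, **train_kwargs):
    """Train a fresh head on each training split and return per-fold validation MSE."""
    scores = []
    for train_idx, val_idx in kfold_indices(len(y), n_splits):
        params, _ = train_head(X[train_idx], y[train_idx], **train_kwargs)
        err = predict(params, X[val_idx]) - y[val_idx]
        scores.append(float(np.mean(err ** 2)))
    return scores

# Synthetic stand-in for the concatenated features and Rg targets (not real data).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=200)
fold_mse = cross_validate(X, y, n_splits=5)
print([round(s, 3) for s in fold_mse], "mean:", round(float(np.mean(fold_mse)), 3))
```

Each fold trains a new model from the same initialisation, so the spread of the per-fold scores shows how sensitive the result is to the particular split rather than to the random seed.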

🧠 Model Architecture

CMPNN Encoder (Graph Embeddings) → ChemBERTa Model (SMILES Encoding) → Morgan Fingerprints → RDKit Features → Fully Connected NN → Predicts Rg
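The architecture stacks four feature blocks in front of a fully connected network, and the methodology states that they are concatenated into one representation. A minimal NumPy sketch of that concatenation follows. The block sizes (300 for CMPNN, 768 for ChemBERTa, 2048 Morgan bits, 40 RDKit descriptors), the random stand-in values, and the per-column standardisation of the continuous blocks are illustrative assumptions, not values taken from this repository.

```python
import numpy as np

rng = np.random.default_rng(0)
n_molecules = 8

# Hypothetical per-molecule feature blocks filled with random stand-in values.
feature_blocks = {
    "cmpnn": rng.normal(size=(n_molecules, 300)),                          # graph embedding
    "chemberta": rng.normal(size=(n_molecules, 768)),                      # pooled SMILES embedding
    "morgan": rng.integers(0, 2, size=(n_molecules, 2048)).astype(float),  # fingerprint bits
    "rdkit": rng.normal(size=(n_molecules, 40)),                           # selected descriptors
}

def standardize(block, eps=1e-8):
    """Zero-mean, unit-variance scaling per column, using this block's own statistics."""
    return (block - block.mean(axis=0)) / (block.std(axis=0) + eps)

# Scale the continuous blocks, keep fingerprint bits as 0/1, then concatenate column-wise.
X = np.hstack([
    standardize(feature_blocks["cmpnn"]),
    standardize(feature_blocks["chemberta"]),
    feature_blocks["morgan"],
    standardize(feature_blocks["rdkit"]),
])
print(X.shape)  # (8, 3156)
```

Scaling keeps descriptors with large numeric ranges (such as molecular weight) from dominating the concatenated vector. Under K-fold cross-validation, the means and standard deviations should be computed on the training split only and then reused for the validation split.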

About

In this .ipynb notebook, I use ChemBERTa to extract features from the SMILES strings and CMPNN to extract features from the molecular graph. I then add the top RDKit features, selected so that the correlation between any two selected features is below 0.7, and finally Morgan fingerprints.
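The correlation-based selection of RDKit features can be sketched as a greedy filter. The README gives only the 0.7 threshold, so the details here are assumptions: absolute Pearson correlation, columns visited left to right, and constant columns dropped first because their correlation is undefined. The function name `select_low_correlation_features` and the feature names in the example are hypothetical labels on synthetic data.

```python
import numpy as np

def select_low_correlation_features(X, names, threshold=0.7):
    """Greedy filter: keep a column only if |Pearson r| with every kept column < threshold.

    Columns are visited left to right, so earlier columns take priority; sort them
    by importance first if the "top" features should win.
    """
    X = np.asarray(X, dtype=float)
    varying = X.std(axis=0) > 0                 # constant columns have undefined correlation
    X = X[:, varying]
    names = [name for name, keep in zip(names, varying) if keep]
    corr = np.atleast_2d(np.abs(np.corrcoef(X, rowvar=False)))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept], X[:, kept]

# Synthetic example: feat_a_scaled is almost a copy of feat_a; constant never varies.
rng = np.random.default_rng(0)
a, b = rng.normal(size=100), rng.normal(size=100)
X = np.column_stack([a, 2 * a + 0.01 * rng.normal(size=100), b, np.ones(100)])
kept_names, X_sel = select_low_correlation_features(
    X, ["feat_a", "feat_a_scaled", "feat_b", "constant"])
print(kept_names)  # ['feat_a', 'feat_b']
```

A greedy pass keeps one representative from each group of strongly correlated descriptors. The surviving set therefore depends on column order, which is why ranking the features first matters if "top" features are meant to be preferred.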
