ChemVisualizer is a Python-based utility designed to bridge the gap between chemical nomenclature and visual molecular representation. By integrating the PubChem API and the RDKit chemoinformatics library, this program automatically retrieves molecular structures from chemical names and renders them as clear, standardized 2D diagrams.
The inspiration for this project came from my personal experience in Organic Chemistry. I noticed that many students, including myself, often struggled to visualize complex molecules just by looking at their IUPAC names.
Manual drawing is time-consuming and prone to errors. I realized that a tool which could instantly convert a name into an accurate skeletal structure would not only help me study more effectively but also serve as a valuable resource for my classmates. This program was built to turn those abstract names into intuitive visual aids, making the learning of organic nomenclature more accessible for everyone.
- Automated Data Retrieval: Interfaces with the PubChem database via `pubchempy` to fetch SMILES strings for various chemical compounds.
- In-Memory Rendering: Implements a memory-based byte-stream (using `io.BytesIO`) to handle image data. This allows the program to display structures instantly without creating temporary "junk" files on the hard drive.
- Intelligent Visualization: Leverages `RDKit` to calculate optimal 2D coordinates and generate publication-quality skeletal formulas with customizable atom indexing and bond styling.
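
The features above can be sketched roughly as follows. This is a minimal illustration, not the code in `chem_visualizer.py`: the function names `first_available_smiles`, `image_to_png_buffer`, and `visualize` are hypothetical, and the list of SMILES attribute names is an assumption made because the attribute exposed by PubChemPy has varied between releases.

```python
import io


def first_available_smiles(compound):
    """Return the first non-empty SMILES attribute found on a compound record.

    Assumption: the attribute name depends on the installed PubChemPy
    release, so a few likely names are tried in order.
    """
    for attr in ("smiles", "isomeric_smiles", "canonical_smiles"):
        value = getattr(compound, attr, None)
        if value:
            return value
    return None


def image_to_png_buffer(image):
    """Serialize a PIL-style image into an in-memory PNG byte stream."""
    buffer = io.BytesIO()
    image.save(buffer, format="PNG")  # writes to RAM, not to disk
    buffer.seek(0)  # rewind so the next reader starts at the first byte
    return buffer


def visualize(name):
    """Hypothetical end-to-end helper: name -> SMILES -> 2D depiction -> PNG bytes."""
    # Third-party imports are local so the helpers above work without them.
    import pubchempy as pcp
    from rdkit import Chem
    from rdkit.Chem import AllChem, Draw

    compounds = pcp.get_compounds(name, "name")
    if not compounds:
        raise ValueError(f"No PubChem record found for {name!r}")

    smiles = first_available_smiles(compounds[0])
    if smiles is None:
        raise ValueError(f"No SMILES string available for {name!r}")

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"RDKit could not parse SMILES {smiles!r}")

    AllChem.Compute2DCoords(mol)  # generate a 2D layout for drawing
    image = Draw.MolToImage(mol, size=(400, 400))  # returns a PIL image
    return image_to_png_buffer(image)
```

With the dependencies installed and network access available, `PIL.Image.open(visualize("caffeine")).show()` would display the structure without any file being written to the project directory.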
To run ChemVisualizer locally, you need to set up Python and install the required library dependencies. We recommend doing this in a clean virtual environment.
First, clone this repository to your local machine and navigate into the project directory:

    git clone https://github.com/ryanqian-0724/ChemVisualizer.git
    cd ChemVisualizer

This project requires Python 3.x and the following external libraries:
- RDKit: For chemoinformatics calculations and 2D structure rendering.
- PubChemPy: For interacting with the PubChem API to retrieve chemical data.
- Pillow (PIL): For image processing and handling the generated diagrams.
Instead of installing each library manually, you can install all dependencies at once using the provided requirements.txt file. Run the following command in your terminal:

    pip install -r requirements.txt

- Clone this repository to your local machine.
- Ensure all dependencies are installed.
- Run the script:

      python chem_visualizer.py

- When prompted with `Enter the compound you want to visualize:`, type a chemical name (e.g., "Ethanol", "Caffeine", or "2-methyl-2-propanol").
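
For the "ensure all dependencies are installed" step, a quick check along these lines can confirm that the three libraries are importable before running the script. The helper name `missing_modules` is hypothetical, and the import names `rdkit`, `pubchempy`, and `PIL` are assumed to match the packages listed in `requirements.txt`.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found for import."""
    return [
        name
        for name in module_names
        if importlib.util.find_spec(name) is None
    ]


# Import names assumed for RDKit, PubChemPy, and Pillow respectively.
REQUIRED = ["rdkit", "pubchempy", "PIL"]

missing = missing_modules(REQUIRED)
if missing:
    print("Missing dependencies:", ", ".join(missing))
else:
    print("All dependencies are importable.")
```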
- Nomenclature Mapping: A significant challenge was handling the discrepancy between formal IUPAC names and common names (e.g., ethanamide vs acetamide). I learned how database indexing works and how to manage these variations to improve query success rates.
- Efficient Memory Management: To optimize performance, I chose to process images in RAM using byte-streams instead of writing to the disk. This in-memory approach avoids unnecessary disk I/O and keeps the project directory clean.
- Chemoinformatics Algorithms: I explored RDKit's coordinate generation algorithms to ensure that 2D projections of 3D molecules remain clear and chemically accurate, even when including explicit hydrogen atoms.
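
Two of the points above lend themselves to a short sketch. The first part shows one possible way to retry a query under an alternative name when the first lookup returns nothing; the second draws a molecule with explicit hydrogens. Both are illustrations under stated assumptions rather than the project's actual code: `ALIASES`, `candidate_names`, `resolve`, and `depict_with_hydrogens` are hypothetical names, the alias table holds only the ethanamide/acetamide pair mentioned above, and `lookup` stands for any callable that returns a list of matches for a name.

```python
# Illustrative alias table: formal name -> common name to try as a fallback.
ALIASES = {"ethanamide": "acetamide"}


def candidate_names(name, aliases=ALIASES):
    """Return the cleaned name, followed by its alias if one is known."""
    cleaned = " ".join(name.split())  # trim and collapse stray whitespace
    candidates = [cleaned]
    alias = aliases.get(cleaned.lower())
    if alias and alias not in candidates:
        candidates.append(alias)
    return candidates


def resolve(name, lookup, aliases=ALIASES):
    """Try each candidate name until `lookup` returns a non-empty result.

    Returns (name_that_matched, first_result), or None if nothing matched.
    """
    for candidate in candidate_names(name, aliases):
        results = lookup(candidate)
        if results:
            return candidate, results[0]
    return None


def depict_with_hydrogens(smiles, size=(400, 400)):
    """Draw a molecule with explicit hydrogens; returns a PIL image."""
    # RDKit is imported locally so the name helpers work without it.
    from rdkit import Chem
    from rdkit.Chem import AllChem, Draw

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"RDKit could not parse SMILES {smiles!r}")
    mol = Chem.AddHs(mol)  # make implicit hydrogens explicit atoms
    AllChem.Compute2DCoords(mol)  # lay out all atoms, hydrogens included
    return Draw.MolToImage(mol, size=size)
```

With PubChemPy installed, `lookup` could be `lambda n: pubchempy.get_compounds(n, "name")`. Passing the lookup in as an argument keeps the fallback logic testable without network access.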
This project is open-source and available under the MIT License. It is intended for educational purposes and personal use.
Developed by Haorui Qian as a chemistry learning tool.