This repository gives simple, chemistry-based demonstrations of both an SQL and a No-SQL database structure. For each database structure, there exists an example notebook that shows how to:
- Initialize the database
- Load a schema and use it to validate example data
- Insert validated data into the database
- Query the database.
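
The four steps above can be sketched in a few lines. This is a minimal illustration, not the code from the example notebooks: it assumes Python's built-in `sqlite3` module with an in-memory database, and the `molecules` table, the `MOLECULE_SCHEMA` dictionary, and the helper functions are hypothetical names chosen for the sketch.

```python
import sqlite3

# Hypothetical schema for this sketch: required field -> expected Python type.
MOLECULE_SCHEMA = {"name": str, "smiles": str, "molecular_weight": float}


def validate(record, schema):
    """Return a list of validation errors (empty if the record is valid)."""
    errors = []
    for field, expected_type in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"{field} should be {expected_type.__name__}")
    return errors


def init_db():
    """Initialize an in-memory SQLite database with one example table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE molecules ("
        "name TEXT PRIMARY KEY, smiles TEXT NOT NULL, molecular_weight REAL NOT NULL)"
    )
    return conn


def insert_molecule(conn, record):
    """Validate a record against the schema, then insert it."""
    errors = validate(record, MOLECULE_SCHEMA)
    if errors:
        raise ValueError("; ".join(errors))
    conn.execute(
        "INSERT INTO molecules VALUES (:name, :smiles, :molecular_weight)", record
    )
    conn.commit()


conn = init_db()
insert_molecule(conn, {"name": "water", "smiles": "O", "molecular_weight": 18.015})
insert_molecule(conn, {"name": "benzene", "smiles": "c1ccccc1", "molecular_weight": 78.114})

# Query: names of all molecules heavier than 50 g/mol.
heavy = conn.execute(
    "SELECT name FROM molecules WHERE molecular_weight > ?", (50.0,)
).fetchall()
print(heavy)
```

Validating before inserting keeps malformed records out of the database, so queries can rely on every row having the expected fields and types.
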
The repository contains five key parts:
- notebooks: Directory containing the example notebooks that demonstrate file processing and the use of SQL and No-SQL databases.
- schema: Directory containing the schema for both the SQL and No-SQL database examples.
- raw_data: Directory containing the raw data files that are inserted into the databases in the examples. There are two computational data files (one from Gaussian and one from Psi4) and one experimental potentiostat file.
- file_parser.py: File containing Python code for extracting key values from the raw computational and experimental data files. These parsers are used in the examples. The processing_notebook.ipynb notebook shows the coding principles behind the code in file_parser.py.
- external_resources.md: A list of external resources that give more specific details for setting up a database.
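
As a rough idea of what extracting key values from a raw computational file can look like, the sketch below pulls the final SCF energy out of Gaussian-style output text with a regular expression. It is not the code in file_parser.py: the function name `parse_scf_energy`, the returned keys, and the sample text (with made-up energies) are assumptions for illustration only.

```python
import re

# Matches Gaussian-style lines such as:
#   SCF Done:  E(RB3LYP) =  -76.4089500000     A.U. after    8 cycles
SCF_PATTERN = re.compile(r"SCF Done:\s+E\((\S+)\)\s+=\s+(-?\d+\.\d+)")


def parse_scf_energy(text):
    """Return the method and energy from the last SCF line, or None if absent."""
    matches = SCF_PATTERN.findall(text)
    if not matches:
        return None
    # The last match corresponds to the final step reported in the output.
    method, energy = matches[-1]
    return {"method": method, "scf_energy_hartree": float(energy)}


# Illustrative sample text; the energies are invented for this sketch.
sample = """
 SCF Done:  E(RB3LYP) =  -76.4080000000     A.U. after   10 cycles
 SCF Done:  E(RB3LYP) =  -76.4089500000     A.U. after    8 cycles
"""

result = parse_scf_energy(sample)
print(result)
```

Returning a plain dictionary makes the parsed values easy to validate against a schema before they are inserted into either kind of database.
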
If you find this repository useful, please cite the following article:
Duke, R.; Bhat, V.; Risko, C. Data Storage Architectures to Accelerate Chemical Discovery: Data Accessibility for Individual Laboratories and the Community. Chemical Science 2022. https://doi.org/10.1039/d2sc05142g.