alcantarar/literature_update

Biomechanics Literature Update

This repository has been migrated to https://github.com/alcantarar/BiomchBERT, where it is being loosely maintained.

[Image: Model_Accuracy]

We use Machine Learning to predict the general topic of a biomechanics-related paper given its title. To accomplish this, we:

  1. Developed an HTML web scraper to extract the paper information and assigned paper topic from every Biomch-L Literature Update since 2010. (webscraper.py)
  2. Trained and compared multiple classification Machine Learning algorithms (keras_1.py & test_many_ML_algorithms_nn.ipynb)
  3. Created a Python notebook (literature_search.ipynb) that:
    1. Searches PubMed for biomechanics-related papers published in the past week,
    2. Uses the top-performing Machine Learning model (keras-1, a Deep Neural Network with 73.5% accuracy) to predict the topic of each of the week's papers,
    3. Compiles the papers, formats their citations, and organizes them by topic, saving the result as a .md file in Literature_Updates.
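The last step (grouping classified papers by topic and writing them out as markdown) could look roughly like the sketch below. This is only an illustration, not the notebook's actual code: the `render_update` function, the dictionary keys, the citation layout, and the placeholder records are all assumptions made for the example.

```python
from collections import defaultdict


def render_update(papers):
    """Group papers by predicted topic and return one markdown string.

    Each paper is assumed (hypothetically) to be a dict with the keys
    'topic', 'authors', 'title', 'journal', and 'year'.
    """
    by_topic = defaultdict(list)
    for paper in papers:
        by_topic[paper["topic"]].append(paper)

    lines = []
    for topic in sorted(by_topic):
        # One markdown heading per topic, topics in alphabetical order.
        lines.append(f"## {topic}")
        for p in sorted(by_topic[topic], key=lambda p: p["title"]):
            authors = ", ".join(p["authors"])
            lines.append(f"- {authors}. {p['title']}. *{p['journal']}*. {p['year']}.")
        lines.append("")  # blank line between topic sections
    return "\n".join(lines)


# Placeholder records for illustration only; these are not real papers.
papers = [
    {"topic": "Topic B", "authors": ["Author A", "Author B"],
     "title": "Placeholder title two", "journal": "Placeholder Journal", "year": 2019},
    {"topic": "Topic A", "authors": ["Author C"],
     "title": "Placeholder title one", "journal": "Placeholder Journal", "year": 2019},
    {"topic": "Topic B", "authors": ["Author D"],
     "title": "Placeholder title three", "journal": "Placeholder Journal", "year": 2019},
]

markdown = render_update(papers)
print(markdown)
```

The returned string can then be written to a .md file with an ordinary `open(path, "w")` call.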

Files

Assets

A neato gif.

Construct_Models

Contains the files used to construct the models. The two main files are keras_1.py and test_many_ML_algorithms_nn.ipynb.

  1. keras_1.py - Fits a deep neural network to the data contained in Data. Saves the models into Models. The vectorizer and label encoders are saved there as well.
  2. test_many_ML_algorithms_nn.ipynb - Fits multiple machine learning methods to the data in Data: Multinomial Naive Bayes, Logistic Regression, Stochastic Gradient Descent (SGD), Linear Support Vector Classification, and a Multi-layer Perceptron classifier. Saves the fitted models into Models. The vectorizer and label encoders are saved there as well.
  3. keras_eval.py - A small script to evaluate the keras neural network on test strings.

Data

Where the webscraped data is stored.

  1. RYANDATA.csv - The full csv file including paper number, Category/Topic, Authors, Title, Journal, Year, Volume and Issue, DOI, and Abstract. Named this way because Gary just thought he would hand the data off and not get really really caught up in this. Boy, was he wrong.
  2. RYANDATA_filt.csv - Has all the same headers as RYANDATA.csv, but filters out topics that represent less than 5% of the total papers.
  3. RYANDATA_filt_even.csv - An evenly downsampled (by topic) version of RYANDATA_filt.csv. Every topic is represented by the same number of papers in this csv.
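The two derived files can be pictured as a filter step followed by a downsampling step. The sketch below uses only the standard library and synthetic rows; the function names, the `"Topic"` key, the fixed random seed, and the choice to downsample every topic to the size of the smallest remaining one are assumptions for illustration, not a description of how the repository's files were actually produced.

```python
import random
from collections import Counter, defaultdict


def filter_rare_topics(rows, min_share=0.05):
    """Drop rows whose topic makes up less than min_share of all rows."""
    counts = Counter(row["Topic"] for row in rows)
    total = len(rows)
    keep = {topic for topic, n in counts.items() if n / total >= min_share}
    return [row for row in rows if row["Topic"] in keep]


def downsample_evenly(rows, seed=0):
    """Randomly keep the same number of rows for every topic.

    The per-topic count is the size of the smallest topic (an assumption).
    """
    by_topic = defaultdict(list)
    for row in rows:
        by_topic[row["Topic"]].append(row)
    n = min(len(group) for group in by_topic.values())
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sampled = []
    for topic in sorted(by_topic):
        sampled.extend(rng.sample(by_topic[topic], n))
    return sampled


# Synthetic rows: 60 papers in topic A, 37 in B, 3 in C (3% of the total).
rows = (
    [{"Topic": "A", "Title": f"a{i}"} for i in range(60)]
    + [{"Topic": "B", "Title": f"b{i}"} for i in range(37)]
    + [{"Topic": "C", "Title": f"c{i}"} for i in range(3)]
)

filtered = filter_rare_topics(rows)   # topic C falls below 5% and is dropped
even = downsample_evenly(filtered)    # topics A and B are cut to equal size
print(Counter(row["Topic"] for row in even))
```

Downsampling trades training data for balance: the classifier sees fewer examples overall, but no single topic dominates.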

Literature_Updates

Where weekly updates can be stored in markdown & csv format for publishing.

Models

Where all the model files are saved after being created.

  1. Keras_model - Location of all the Keras neural net files. Some neural net files are too large to upload to Git on their own, so they are split into parts. Use 7-Zip (Windows) or Keka (macOS) to recombine the parts into the model file and the weights file.
  2. Many_ML_models - Location of the files saved by the many-ML testing notebook. The MLP file needs to be recombined using 7-Zip/Keka in the same way as the Keras neural net files.

Plots

Model validation plots are saved here. Usually a confusion matrix.

Webscraper

The python file to scrape the Biomch-L forum.

literature_search.ipynb

IPython notebook that generates the literature update. Uses Biopython (v1.73 or newer) to perform a PubMed literature search, then uses a given ML model to classify the papers. Saves the results as a markdown file in Literature_Updates.
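For orientation, a PubMed search restricted to the past week can be sketched with Biopython's `Bio.Entrez` module as shown below. The search term, the helper names `build_query` and `search_past_week`, and the `max_results` default are assumptions for illustration; the notebook's actual query may differ. Biopython is imported inside the function so that the query helper can be used without it installed, and nothing contacts NCBI until `search_past_week` is called.

```python
def build_query(keyword="biomech*"):
    """Build a PubMed term limited to title/abstract (hypothetical query)."""
    return f"{keyword}[Title/Abstract]"


def search_past_week(email, term, max_results=200):
    """Return PubMed IDs for records published in the last 7 days.

    Requires Biopython and network access. NCBI asks for a contact email.
    """
    from Bio import Entrez  # imported lazily; needs Biopython installed

    Entrez.email = email
    handle = Entrez.esearch(
        db="pubmed",
        term=term,
        datetype="pdat",  # filter on publication date
        reldate=7,        # only records from the last 7 days
        retmax=max_results,
    )
    try:
        record = Entrez.read(handle)
    finally:
        handle.close()
    return list(record["IdList"])


query = build_query()
print(query)
# Example call (not executed here because it needs network access):
# ids = search_past_week("you@example.com", query)
```

The returned IDs could then be passed to `Entrez.efetch` to retrieve titles and citation details for classification.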

Unique Packages

  • BeautifulSoup is used to scrape the web for the articles that feed into the ML models.
  • Keras and Scikit-learn are used to construct ML models.
  • Biopython is used to access PubMed. Requires version 1.73 or newer.

About

Performs a search of PubMed for biomechanics-related publications and categorizes them using a machine learning algorithm trained on ~20,000 publications.
