schmelling/data_intensive_science

Welcome to the Data Intensive Science course

This repository contains the materials for a full-semester (15-week) course on data analysis, reproducibility, and open science. The course is currently in its planning phase, during which I am collecting and creating the materials I want to use. It is intended for anyone interested in learning about reproducible data analysis and data-intensive research, and I am trying to design the material so that anyone can work through it self-paced.

The course will also be taught at Heinrich Heine University. It is officially intended for biology master's students, but I will try to open it to students of all majors. I don't yet know whether other majors will recognize the course, except as a cross-disciplinary class. If you are a student who is interested in taking the class and you don't need it on your transcript, you are always welcome. Contact information, dates, and the location can be found on our lab website.

Syllabus

Lecture (90 min per week)

  • Week 1: What is Data Intensive Science all about?
  • Week 1: Version Control, Git and GitHub
  • Week 2: What is Open Science and why is it important?
  • Week 2: Python/R (including Jupyter, PyCharm, RStudio and RMarkdown)
  • Week 3: Open Access and the Changing Face of Publication
  • Week 3: The Reproducibility Crisis
  • Week 4: Content, Code, and Data Licensing
  • Week 4: Data Management Plans
  • Week 5: Data Visualization 101
  • Week 6: What is a hypothesis? Intro to statistical thinking
  • Week 6: Descriptive Statistics
  • Week 7: T-Test
  • Week 8: ANOVA
  • Week 9: Regressions
  • Week 10: Clustering
  • Week 11: Intro to Machine Learning
  • Week 12: ML1
  • Week 13: ML2
  • Week 14: Real World Data Science
  • Week 15: Exam

Ideas

  • more statistics lectures
  • a lecture about data intensive methods in biology (depending on the group)

Practical Course

  • Week 1: Version Control with Git and GitHub
  • Week 2: Basic Python (Variables, Lists, Functions)
  • Week 3: Numpy
  • Week 4: Pandas Basics
  • Week 5: Graphics (Matplotlib, Seaborn, Bokeh)
  • Week 6: Advanced Pandas + Descriptive Stats
  • Week 7: Intro to R and RStudio with RMarkdown
  • Week 7: Py2R
  • Week 8: Stats (Scipy, Statsmodels)
  • Week 9: Regressions (Scikit-Learn)
  • Week 10: Clustering (Scikit-Learn)
  • Week 11+12: Rotation-Based Learning: Initiator vs. Successor Phase
  • Week 13+14: Final Projects
  • Week 15: Presentation of Results

Contribution Instructions

Please read the Code of Conduct before working on any issue in the repository.

License

All materials are released under the Creative Commons CC0 license. More information about CC0 is available from Creative Commons.
