This repository contains the materials for a full-semester (15-week) course on data analysis, reproducibility, and open science. The course is currently in its planning phase, in which I am collecting and creating the materials I want to use. It is for anyone interested in learning about reproducible data analysis and data-intensive research, and I'll try to create the material in such a way that anyone can work through it self-paced.

The course will also be taught at Heinrich Heine University. It is officially intended for biology master's students, but I will try to admit interested students of all majors. Right now I don't know whether the course will be accepted in other majors, except as a cross-disciplinary class. If you're a student interested in taking the class and you don't care about a transcript, you're always welcome. Contact information and details about the date and location can be found on our lab website.
- Week 1: What is Data Intensive Science all about?
- Week 1: Version Control, Git and GitHub
- Week 2: What is Open Science and why is it important?
- Week 2: Python/R (including Jupyter, PyCharm, RStudio and RMarkdown)
- Week 3: Open Access and the Changing Face of Publication
- Week 3: The Reproducibility Crisis
- Week 4: Content, Code, and Data Licensing
- Week 4: Data Management Plans
- Week 5: Data Visualization 101
- Week 6: What is a hypothesis? Intro to statistical thinking
- Week 6: Descriptive Statistics
- Week 7: t-Test
- Week 8: ANOVA
- Week 9: Regressions
- Week 10: Clustering
- Week 11: Intro to Machine Learning
- Week 12: ML1
- Week 13: ML2
- Week 14: Real World Data Science
- Week 15: Exam
Ideas
- more statistics lectures
- a lecture about data-intensive methods in biology (depending on the group)
- Week 1: Version Control with Git and GitHub
- Week 2: Basic Python (Variables, Lists, Functions)
- Week 3: Numpy
- Week 4: Pandas Basics
- Week 5: Graphics (Matplotlib, Seaborn, Bokeh)
- Week 6: Advanced Pandas + Descriptive Stats
- Week 7: Intro to R and RStudio with RMarkdown
- Week 7: Py2R
- Week 8: Stats (Scipy, Statsmodels)
- Week 9: Regressions (Scikit-Learn)
- Week 10: Clustering (Scikit-Learn)
- Week 11+12: Rotation Based Learning: Initiator vs. Successor Phase
- Week 13+14: Final Projects
- Week 15: Presentation of Results
Please read the Code of Conduct before working on any issue in the repository.
All the materials are released under the Creative Commons CC0 license. You can find more information here.