ahgroup/mds-project-template

Overview

This repository is a template for a reproducible data analysis project or paper. The default example uses R, Quarto, Git, and GitHub, but the structure is workflow-first so projects can add Python, Julia, shell scripts, or other tools without major reorganization.

This template also includes lightweight guidance for AI-supported work. The goal is not to make AI do the project for you. The goal is to make AI tools useful for coding, documentation, review, and troubleshooting while keeping the project transparent, reproducible, and human-reviewed.

Prerequisites

The default example uses R, Quarto, GitHub, and a reference manager that can handle BibTeX. Zotero with the Better BibTeX plugin is a good choice.

It is also useful to have a word processor installed, such as MS Word or LibreOffice. To produce PDF output, you need a TeX distribution. TinyTeX is one option; see the Quarto PDF instructions.

The example files use these R packages: broom, dplyr, ggplot2, here, knitr, readxl, skimr, and tidyr. Install them before running the example workflow:

install.packages(c("broom", "dplyr", "ggplot2", "here", "knitr",
                   "readxl", "skimr", "tidyr"))
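If some of these packages are already installed, a small variation can first report which ones are missing. This is a minimal sketch. `missing_packages()` is a hypothetical helper and is not a file in the template.

```r
# Packages used by the example workflow (the same list as above).
pkgs <- c("broom", "dplyr", "ggplot2", "here", "knitr",
          "readxl", "skimr", "tidyr")

# Hypothetical helper: return the packages in `pkgs` that are not installed
# in any of the library paths R currently searches (.libPaths()).
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(pkgs)
message("Packages still to install: ",
        if (length(to_install) > 0) paste(to_install, collapse = ", ") else "none")
```

When `to_install` is non-empty, calling `install.packages(to_install)` installs only the absent packages. This avoids reinstalling everything on each new machine or session.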

Template Structure

The template comes with a folder structure and example files to show the kinds of content you would place in each folder. See the folder-specific readme files for more detail.

  • ai/: AI workflow notes, prompt templates, review checklists, and a short AI-use log. See ai/readme-ai.md.
  • assets/: static non-code materials such as references, CSL files, PDFs, and manually created figures. See assets/readme-assets.md.
  • code/: analysis code organized by workflow stage. See code/readme-code.md.
  • data/: raw, processed, private, and large data folders. See data/readme-data.md.
  • products/: final or near-final deliverables such as reports, manuscripts, presentations, posters, and apps. See products/readme-products.md.
  • results/: outputs generated by code, such as figures, tables, and model summaries. See results/readme-results.md.

Important project-level files:

  • readme.md: this project overview.
  • usage.md: practical instructions for running and reproducing the project.
  • AGENTS.md: extra instructions for AI coding assistants and collaborators using AI tools.
  • project-metadata.yml: concise metadata about the project, software, data, and AI-use policy.

Naming Conventions

Use descriptive file and folder names. In general:

  • use lower-case names;
  • separate words with -;
  • avoid spaces, underscores, and CamelCase unless a standard file name or file extension requires otherwise.

For example, this template has code/analysis/statistical-analysis.R.

Readme files are named by folder context, such as readme-code.md, readme-data.md, and readme-exploration.md.
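These rules can be checked mechanically. The sketch below assumes that only the part of the name before the extension must follow the convention, because some extensions are upper-case by standard (for example `.R`). `is_conventional_name()` is a hypothetical helper and is not part of the template.

```r
# Hypothetical helper: TRUE when a file name follows the naming convention
# (lower-case words or digits separated by single hyphens), ignoring the
# extension. Standard names that must keep their usual spelling can be
# listed in `exceptions`.
is_conventional_name <- function(path, exceptions = c("AGENTS.md")) {
  name <- basename(path)                    # drop folder components
  stem <- tools::file_path_sans_ext(name)   # drop the extension, e.g. ".R"
  name %in% exceptions | grepl("^[a-z0-9]+(-[a-z0-9]+)*$", stem)
}

is_conventional_name(c("code/analysis/statistical-analysis.R",
                       "readme-code.md", "Data Cleaning.R", "data_clean.R"))
```

To list project files that break the convention, run `files <- list.files(".", recursive = TRUE)` and then `files[!is_conventional_name(files)]`. This sketch does not check folder names. Add other standard names, such as a `LICENSE` file or Quarto's `_quarto.yml`, to `exceptions` if the project uses them.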

Software and Package Management

Document the software and package setup your project needs. The default example uses manually installed R packages because that approach is simple enough for students and short projects.

For R projects, renv can help manage R packages and improve long-term reproducibility. This template does not enable renv by default because it adds complexity for new users and classroom settings.

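If you decide to use renv, commit the lockfile (renv.lock) and the files renv uses to activate the environment (.Rprofile, renv/activate.R, and renv/settings.json). Do not commit the local package library (renv/library/), which renv excludes through its own renv/.gitignore.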

For Python, Julia, or other languages, document the chosen environment manager in this readme or in project-metadata.yml. Examples include virtual environments, Conda, Poetry, Julia project files, or containers. These are optional; use them when they solve a real project need.

Getting Started

This is a GitHub template repository. The best way to start a new project is to create a repository from this template.

For the example project, run the code manually. See usage.md for the run order, what each code file does, and how to render products.

AI-Supported Workflow

AI tools can help explain code, draft first-pass code, improve documentation, suggest checks, and review for reproducibility problems. They should not be treated as final authority for scientific claims, model choice, data privacy, citation accuracy, or interpretation of results.

When using AI tools:

  • Point the tool to readme.md, usage.md, AGENTS.md, project-metadata.yml, and data/readme-data.md.
  • Do not paste sensitive, private, regulated, or identifiable data into external AI tools unless the project owner has explicitly approved that workflow.
  • Ask for small, reviewable changes.
  • Rerun affected scripts or rerender affected products after meaningful changes.
  • Add a short entry to ai/ai-use-log.md for meaningful AI-assisted work.
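Adding a log entry can be as simple as appending one line to the log file. In the sketch below, both the helper `log_ai_use()` and the entry layout (date, tool, task, reviewer) are hypothetical. If ai/ai-use-log.md already uses a different format, follow that format.

```r
# Hypothetical helper: append one dated entry to the AI-use log.
# The "- date | tool | task | reviewed by" layout is an assumption.
log_ai_use <- function(tool, task, reviewed_by,
                       log_file = file.path("ai", "ai-use-log.md"),
                       date = Sys.Date()) {
  entry <- sprintf("- %s | tool: %s | task: %s | reviewed by: %s",
                   format(date, "%Y-%m-%d"), tool, task, reviewed_by)
  # append = TRUE keeps earlier entries; the file is created if missing.
  cat(entry, "\n", file = log_file, sep = "", append = TRUE)
  invisible(entry)
}
```

Run it from the project root after a meaningful AI-assisted change, for example `log_ai_use("assistant name", "suggested checks for a cleaning script", "your initials")`.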

GitHub Actions and other automated workflows can be useful for advanced users. They are intentionally not enabled by default in this template because many users will be new to Git and GitHub.

About

A template file and folder structure for an AI-supported modeling and data science project.

Languages

  • R 56.1%
  • TeX 31.8%
  • CSS 12.1%