A collection of projects demonstrating functional programming approaches to data science using F#. This repository showcases how F#'s expressive syntax, type safety, and functional paradigms make it an excellent choice for data analysis, transformation, and visualization workflows.
🔗 View Live Demo: an interactive website showcasing the project, visualizations, and code examples.
F# brings unique advantages to data science that complement traditional tools like Python and R:
| Feature | Benefit |
|---|---|
| Static Typing with Type Inference | Catch many data type errors at compile time rather than at runtime, without verbose annotations |
| Pipeline Operators | Chain transformations naturally with `\|>` |
| Immutability by Default | Safer data transformations without side effects |
| Pattern Matching | Elegant handling of missing data and edge cases |
| REPL Support | Interactive exploration via F# Interactive (FSI) |
| .NET Ecosystem | Access to mature libraries and enterprise integration |
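As a rough illustration of several rows in the table, the sketch below uses only the F# core library. The `Measurement` type, the `describe` function, and the sample values are hypothetical and are not taken from this repository.

```fsharp
// Hypothetical record type; the field names are illustrative only.
type Measurement = { Species: string; BodyMassG: float option }

// Made-up sample values, not rows from the real dataset.
// Records are immutable: none of these values can be changed after creation.
let samples =
    [ { Species = "Adelie"; BodyMassG = Some 3750.0 }
      { Species = "Adelie"; BodyMassG = None }
      { Species = "Gentoo"; BodyMassG = Some 5000.0 } ]

// A line such as { Species = "Adelie"; BodyMassG = "heavy" } would be
// rejected by the compiler, because a string is not a float option.

// Pattern matching covers both the present and the missing case.
let describe (m: Measurement) =
    match m.BodyMassG with
    | Some mass -> sprintf "%s: %.0f g" m.Species mass
    | None -> sprintf "%s: mass not recorded" m.Species

// The pipeline operator chains steps; List.choose keeps only present values.
let meanMass =
    samples
    |> List.choose (fun m -> m.BodyMassG)
    |> List.average

samples |> List.map describe |> List.iter (printfn "%s")
printfn "Mean body mass: %.1f g" meanMass
```

The snippet can be pasted into F# Interactive (`dotnet fsi`) to try it line by line.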
A complete data analysis pipeline exploring morphological measurements of penguins from the Palmer Archipelago. This project demonstrates the full data science workflow in F#.
What it demonstrates:
- Loading and exploring CSV datasets
- Data cleaning and preprocessing
- Handling missing values functionally
- Outlier detection and removal
- Feature engineering
- Interactive data visualization
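The cleaning steps listed above might be sketched as follows in plain F#, without Deedle. This is not the repository's implementation: the `Penguin` record, the helper names, the sample values, and the choice of the 1.5 × IQR rule for outliers are all assumptions made for illustration.

```fsharp
// Hypothetical row type; None marks a missing measurement.
type Penguin =
    { BillLengthMm: float option
      BillDepthMm: float option
      BodyMassG: float option }

let penguin l d m = { BillLengthMm = l; BillDepthMm = d; BodyMassG = m }

// Made-up sample rows: one has missing values, one has an implausible mass.
let rows =
    [ penguin (Some 39.1) (Some 18.7) (Some 3750.0)
      penguin None None None
      penguin (Some 39.5) (Some 17.4) (Some 3800.0)
      penguin (Some 40.3) (Some 18.0) (Some 3250.0)
      penguin (Some 36.7) (Some 19.3) (Some 3450.0)
      penguin (Some 39.3) (Some 20.6) (Some 3650.0)
      penguin (Some 38.9) (Some 17.8) (Some 9999.0) ]

// Quantile by linear interpolation on a sorted array
// (one of several common definitions).
let quantile (q: float) (sorted: float[]) =
    let pos = q * float (sorted.Length - 1)
    let lo = int (floor pos)
    let hi = min (lo + 1) (sorted.Length - 1)
    sorted.[lo] + (pos - float lo) * (sorted.[hi] - sorted.[lo])

// Bounds of the interval [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR].
let iqrBounds (values: float list) =
    let sorted = values |> List.sort |> List.toArray
    let q1, q3 = quantile 0.25 sorted, quantile 0.75 sorted
    let iqr = q3 - q1
    q1 - 1.5 * iqr, q3 + 1.5 * iqr

// Step 1: keep only rows where every measurement is present.
let complete =
    rows
    |> List.choose (fun p ->
        match p.BillLengthMm, p.BillDepthMm, p.BodyMassG with
        | Some l, Some d, Some m -> Some (l, d, m)
        | _ -> None)

// Step 2: drop rows whose body mass falls outside the IQR bounds.
let lower, upper = iqrBounds (complete |> List.map (fun (_, _, m) -> m))
let cleaned =
    complete |> List.filter (fun (_, _, m) -> m >= lower && m <= upper)

// Step 3: feature engineering, adding bill area as a fourth element.
let withBillArea =
    cleaned |> List.map (fun (l, d, m) -> l, d, m, l * d)

printfn "Kept %d of %d rows" (List.length withBillArea) (List.length rows)
```

Each step returns a new list and leaves its input untouched, so intermediate results such as `complete` remain available for inspection.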
Deedle is F#'s answer to pandas: a powerful data frame library for structured data manipulation.
open Deedle
// Load CSV into a typed data frame
let df = Frame.ReadCsv("penguins.csv", hasHeaders = true)
printfn "Rows: %d, Columns: %d" df.RowCount df.ColumnCount

F#'s pipeline operator (|>) enables readable, chainable data transformations:
// rawData and the three helper functions are assumed to be defined elsewhere
let cleanedData =
    rawData
    |> Frame.filterRows (fun _ row -> hasValidMeasurements row)
    |> Frame.mapCols (fun name col -> handleMissingValues name col)
    |> Frame.filterRows (fun _ row -> isWithinValidRange row)

Pattern matching and OptionalValue<T> make missing data explicit, so each value must be checked before it is used:
let billLength = row.TryGetAs<float>("bill_length_mm")
let billDepth = row.TryGetAs<float>("bill_depth_mm")
// Assumed column names, following the standard Palmer Penguins header
let flipperLength = row.TryGetAs<float>("flipper_length_mm")
let bodyMass = row.TryGetAs<float>("body_mass_g")
// Count the valid measurements with a pipeline
let validCount =
    [billLength.HasValue; billDepth.HasValue; flipperLength.HasValue; bodyMass.HasValue]
    |> List.filter id
    |> List.length

Plotly.NET brings interactive, publication-quality visualizations to F#:
open Plotly.NET
Chart.Scatter(x = flipperData, y = massData, mode = StyleParam.Mode.Markers)
|> Chart.withTitle "Flipper Length vs Body Mass"
|> Chart.withXAxisStyle "Flipper Length (mm)"
|> Chart.withYAxisStyle "Body Mass (g)"
|> Chart.saveHtml("plots/flipper_vs_mass.html")

Create derived features using Deedle's expressive column operations:
// Create bill area as a derived feature
let billAreaSeries = df?bill_length_mm * df?bill_depth_mm
df.AddColumn("bill_area_mm2", billAreaSeries)

- .NET 9.0 SDK or later
- F# compiler (included with .NET SDK)
# Clone the repository
git clone https://github.com/SiD-array/F-applications.git
cd F-applications
# Run the Penguin Data Analysis project
cd PenguinDataDemo
dotnet restore
dotnet run

F-applications/
├── PenguinDataDemo/ # Penguin data analysis project
│ ├── src/
│ │ ├── DataLoad.fs # Data loading utilities
│ │ ├── DataClean.fs # Data cleaning & preprocessing
│ │ └── Visualization.fs # Plotting & visualization
│ ├── plots/ # Generated HTML visualizations
│ ├── Program.fs # Main entry point
│ ├── penguins.csv # Palmer Penguins dataset
│ └── PenguinDataDemo.fsproj # Project configuration
├── docs/ # Project showcase website
│ ├── index.html # Main webpage
│ ├── style.css # Styling
│ ├── script.js # Interactions
│ └── plots/ # Embedded visualizations
├── README.md # This file
└── .gitignore
| Library | Version | Purpose |
|---|---|---|
| F# | .NET 9.0 | Functional-first programming language |
| Deedle | 3.0.0 | Data frames & series manipulation |
| FSharp.Data | 6.6.0 | Type providers for data access |
| Plotly.NET | 5.1.0 | Interactive visualizations |
| Plotly.NET.Interactive | 5.0.0 | Notebook integration |
- FsLab - F# data science packages
- F# for Fun and Profit - Comprehensive F# tutorials
- Deedle Documentation - Data frame operations
- Plotly.NET Documentation - Chart types and customization
- XPlot - Alternative plotting library
Future projects planned for this repository:
- Time Series Analysis - Stock price prediction with ML.NET
- Statistical Analysis - Hypothesis testing and regression
- Machine Learning - Classification with ML.NET and F#
- Data Pipeline - ETL workflows with type providers
Contributions of any kind are welcome, for example:
- 🐛 Bug fixes
- 📝 Documentation improvements
- ✨ New data science examples
- 🎨 Visualization enhancements
Please feel free to submit a Pull Request.
This project is open source and available for educational purposes.
SiD-array - GitHub Profile
Functional programming meets data science 📊