measure extends tidymodels with preprocessing steps for analytical measurement data such as spectroscopy, chromatography, and other instrument-generated signals. It provides a recipes-style interface for common spectral preprocessing techniques.
measure helps you:
- Convert measurement data from wide or long formats into an internal representation
- Preprocess spectra using techniques like smoothing, derivatives, and normalization
- Transform data back to wide or long format for modeling or visualization
- Handle multi-dimensional data like LC-DAD, EEM fluorescence, and 2D NMR with native nD support
- Decompose complex signals using PARAFAC, Tucker, and MCR-ALS methods
You can install the development version of measure from GitHub:
# install.packages("pak")
pak::pak("JamesHWade/measure")

The measure workflow follows the familiar recipes pattern: define a recipe, add steps, prep, and bake.
library(measure)
library(recipes)
library(ggplot2)
# NIR spectroscopy data for predicting meat composition
data(meats_long)
head(meats_long)
#> # A tibble: 6 × 6
#>      id water   fat protein channel transmittance
#>   <int> <dbl> <dbl>   <dbl>   <int>         <dbl>
#> 1     1  60.5  22.5    16.7       1          2.62
#> 2     1  60.5  22.5    16.7       2          2.62
#> 3     1  60.5  22.5    16.7       3          2.62
#> 4     1  60.5  22.5    16.7       4          2.62
#> 5     1  60.5  22.5    16.7       5          2.62
#> 6     1  60.5  22.5    16.7       6          2.62

rec <- recipe(water + fat + protein ~ ., data = meats_long) |>
  # Assign sample ID role (not used as predictor)
  update_role(id, new_role = "id") |>
  # Convert long-format measurements to internal representation
  step_measure_input_long(transmittance, location = vars(channel)) |>
  # Apply Savitzky-Golay smoothing with first derivative
  step_measure_savitzky_golay(window_side = 5, differentiation_order = 1) |>
  # Standard Normal Variate normalization
  step_measure_snv() |>
  # Convert back to wide format for modeling
  step_measure_output_wide(prefix = "nir_")

# Prep learns any parameters from training data
prepped <- prep(rec)
# Bake applies the transformations
processed <- bake(prepped, new_data = NULL)
# Result is ready for modeling
processed[1:5, 1:8]
#> # A tibble: 5 × 8
#>      id water   fat protein  nir_01  nir_02  nir_03  nir_04
#>   <int> <dbl> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
#> 1     1  60.5  22.5    16.7 -0.126  -0.110  -0.0928 -0.0745
#> 2     2  46    40.1    13.5  0.0184  0.0381  0.0601  0.0841
#> 3     3  71     8.4    20.5  0.105   0.114   0.125   0.136
#> 4     4  72.8   5.9    20.7  0.0716  0.0786  0.0871  0.0974
#> 5     5  58.3  25.5    15.5 -0.132  -0.118  -0.101  -0.0817

# Get data at intermediate step (before output conversion)
rec_for_viz <- recipe(water + fat + protein ~ ., data = meats_long) |>
  update_role(id, new_role = "id") |>
  step_measure_input_long(transmittance, location = vars(channel)) |>
  step_measure_savitzky_golay(window_side = 5, differentiation_order = 1) |>
  step_measure_snv()
processed_long <- bake(prep(rec_for_viz), new_data = NULL)
# Extract and plot a few spectra
library(tidyr)
library(dplyr)
plot_data <- processed_long |>
  slice(1:10) |>
  mutate(sample_id = row_number()) |>
  unnest(.measures)
ggplot(plot_data, aes(x = location, y = value, group = sample_id, color = factor(sample_id))) +
  geom_line(alpha = 0.7) +
  labs(
    x = "Channel",
    y = "Preprocessed Signal",
    title = "NIR Spectra After Preprocessing",
    subtitle = "Savitzky-Golay first derivative + SNV normalization",
    color = "Sample"
  ) +
  theme_minimal() +
  theme(legend.position = "none")

| Step | Description |
|---|---|
| step_measure_input_wide() | Convert wide format (measurements in columns) to internal format |
| step_measure_input_long() | Convert long format (measurements in rows) to internal format |
| step_measure_output_wide() | Convert back to wide format for modeling |
| step_measure_output_long() | Convert back to long format |

| Step | Description |
|---|---|
| step_measure_absorbance() | Convert transmittance to absorbance |
| step_measure_transmittance() | Convert absorbance to transmittance |
| step_measure_log() | Log transformation with configurable base/offset |
| step_measure_kubelka_munk() | Kubelka-Munk transformation for reflectance |
| step_measure_derivative() | Simple finite difference derivatives |
| step_measure_derivative_gap() | Gap (Norris-Williams) derivatives |

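
To make the first two conversions in the table above concrete, here is a minimal base-R sketch of the usual relationship A = -log10(T), with T expressed as a fraction. It is not the package's implementation: the helper names `to_absorbance()` and `to_transmittance()` and the sample values are hypothetical, chosen only for illustration.

```r
# Hypothetical helpers (not part of measure): convert between fractional
# transmittance (0 < T <= 1) and absorbance using A = -log10(T).
to_absorbance <- function(transmittance) -log10(transmittance)
to_transmittance <- function(absorbance) 10^(-absorbance)

# Illustrative transmittance values, not real data
trans <- c(1, 0.5, 0.1, 0.01)
abs_vals <- to_absorbance(trans)

# Converting back should recover the original values
round_trip <- to_transmittance(abs_vals)

print(abs_vals)
print(round_trip)
```
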
| Step | Description |
|---|---|
| step_measure_savitzky_golay() | Smoothing and/or differentiation |
| step_measure_snv() | Standard Normal Variate normalization |
| step_measure_msc() | Multiplicative Scatter Correction |
| step_measure_emsc() | Extended MSC with wavelength-dependent correction |
| step_measure_osc() | Orthogonal Signal Correction |

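
For readers new to Standard Normal Variate, the idea can be sketched in a few lines of base R: each spectrum is centered on its own mean and divided by its own standard deviation. The `snv()` helper and the toy matrix below are hypothetical and only illustrate the technique; `step_measure_snv()` may differ in its details.

```r
# Hypothetical helper (not part of measure): Standard Normal Variate.
# Each row is one spectrum; it is centered on its own mean and divided
# by its own standard deviation.
snv <- function(spectra) {
  row_means <- rowMeans(spectra)
  row_sds <- apply(spectra, 1, sd)
  # A length-nrow vector recycles down columns, so row i uses element i
  (spectra - row_means) / row_sds
}

# Two illustrative spectra that differ only by an offset and a scale factor
spectra <- rbind(
  c(1, 2, 3, 4, 5),
  c(12, 14, 16, 18, 20)
)
corrected <- snv(spectra)

print(corrected)
```

Because the two toy spectra differ only by additive and multiplicative effects, they should coincide after the correction, which is the kind of scatter variation SNV is meant to remove.
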
| Step | Description |
|---|---|
| step_measure_smooth_ma() | Moving average smoothing |
| step_measure_smooth_median() | Median filter (robust to spikes) |
| step_measure_smooth_gaussian() | Gaussian kernel smoothing |
| step_measure_smooth_wavelet() | Wavelet denoising |
| step_measure_filter_fourier() | Fourier low-pass/high-pass filtering |
| step_measure_despike() | Spike/outlier detection and removal |

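
The difference between a moving average and a median filter is easiest to see on a signal with a single spike. The sketch below uses only functions from base R's stats package (`stats::filter()` and `stats::runmed()`), not the measure steps themselves; the helper name `moving_average()` and the toy signal are hypothetical.

```r
# Hypothetical helper (not part of measure): centered moving average.
# The window must be odd; the ends are NA where the window does not fit.
moving_average <- function(signal, window = 3) {
  stopifnot(window %% 2 == 1)
  as.numeric(stats::filter(signal, rep(1 / window, window), sides = 2))
}

# Illustrative signal with one spike at position 4
signal <- c(1, 2, 3, 10, 3, 2, 1)

smoothed <- moving_average(signal, window = 3)

# Running median with the end values kept unchanged
median_filtered <- as.numeric(stats::runmed(signal, k = 3, endrule = "keep"))

print(smoothed)
print(median_filtered)
```

The moving average spreads the spike over its neighbours, while the running median replaces it with a typical local value.
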
| Step | Description |
|---|---|
| step_measure_normalize_sum() | Divide by sum (total intensity) |
| step_measure_normalize_max() | Divide by maximum value |
| step_measure_normalize_range() | Scale to 0-1 range |
| step_measure_normalize_vector() | L2/Euclidean normalization |
| step_measure_normalize_auc() | Divide by area under curve |
| step_measure_normalize_peak() | Normalize by peak region (tunable) |

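
The first four normalizations in the table above reduce to one-line formulas. The base-R sketch below applies them to a single made-up spectrum; it illustrates the arithmetic only and does not call the measure steps.

```r
# Illustrative spectrum (not real data)
spectrum <- c(2, 4, 6, 8)

# Divide by sum (total intensity)
norm_sum <- spectrum / sum(spectrum)

# Divide by maximum value
norm_max <- spectrum / max(spectrum)

# Scale to the 0-1 range
norm_range <- (spectrum - min(spectrum)) / (max(spectrum) - min(spectrum))

# L2/Euclidean normalization: the result has unit length
norm_vector <- spectrum / sqrt(sum(spectrum^2))

print(rbind(norm_sum, norm_max, norm_range, norm_vector))
```
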
| Step | Description |
|---|---|
| step_measure_center() | Mean centering |
| step_measure_scale_auto() | Auto-scaling (z-score) |
| step_measure_scale_pareto() | Pareto scaling |
| step_measure_scale_range() | Range scaling |
| step_measure_scale_vast() | VAST scaling |

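
As a rough guide to how the first three scalings differ, the sketch below applies the textbook definitions in base R, assuming each column is one variable measured across samples. Auto-scaling divides the centered values by the standard deviation, while Pareto scaling divides by its square root. The toy matrix is made up, and the measure steps may differ in detail.

```r
# Illustrative data: rows are samples, columns are variables
x <- cbind(a = c(1, 2, 3, 4), b = c(10, 30, 50, 70))

col_means <- colMeans(x)
col_sds <- apply(x, 2, sd)

# Mean centering: subtract each column's mean
centered <- sweep(x, 2, col_means, "-")

# Auto-scaling (z-score): centered values divided by the standard deviation
auto_scaled <- sweep(centered, 2, col_sds, "/")

# Pareto scaling: centered values divided by the square root of the SD
pareto_scaled <- sweep(centered, 2, sqrt(col_sds), "/")

print(auto_scaled)
print(pareto_scaled)
```

Pareto scaling sits between no scaling and auto-scaling: high-variance variables are shrunk, but they keep more of their weight than under a full z-score.
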
| Step | Description |
|---|---|
| step_measure_baseline_als() | Asymmetric least squares |
| step_measure_baseline_poly() | Polynomial baseline fitting |
| step_measure_baseline_rf() | Rolling ball/LOESS baseline |
| step_measure_baseline_rolling() | Rolling ball algorithm |
| step_measure_baseline_airpls() | Adaptive Iteratively Reweighted PLS |
| step_measure_baseline_arpls() | Asymmetrically Reweighted PLS |
| step_measure_baseline_snip() | SNIP (Statistics-sensitive Non-linear Iterative Peak-clipping) |
| step_measure_baseline_tophat() | Top-hat morphological filter |
| step_measure_baseline_morph() | Iterative morphological correction |
| step_measure_baseline_minima() | Local minima interpolation |
| step_measure_baseline_auto() | Automatic method selection |
| step_measure_detrend() | Polynomial detrending |

| Step | Description |
|---|---|
| step_measure_subtract_blank() | Blank/background subtraction |
| step_measure_subtract_reference() | Reference spectrum subtraction |
| step_measure_ratio_reference() | Reference ratio with optional blank |

| Step | Description |
|---|---|
| step_measure_trim() | Keep measurements within specified x-range |
| step_measure_exclude() | Remove measurements within specified range(s) |
| step_measure_resample() | Interpolate to new regular grid |

| Step | Description |
|---|---|
| step_measure_align_shift() | Cross-correlation shift alignment |
| step_measure_align_reference() | Align to external reference spectrum |
| step_measure_align_dtw() | Dynamic Time Warping alignment |
| step_measure_align_ptw() | Parametric Time Warping |
| step_measure_align_cow() | Correlation Optimized Warping (tunable) |

| Step | Description |
|---|---|
| step_measure_qc_snr() | Calculate signal-to-noise ratio |
| step_measure_qc_saturated() | Detect saturated measurements |
| step_measure_qc_outlier() | Detect outlier samples |
| step_measure_impute() | Interpolate missing values |

| Step | Description |
|---|---|
| step_measure_peaks_detect() | Detect peaks using prominence or derivative methods |
| step_measure_peaks_integrate() | Calculate peak areas |
| step_measure_peaks_filter() | Filter peaks by height, area, or count |
| step_measure_peaks_deconvolve() | Deconvolve overlapping peaks |
| step_measure_peaks_to_table() | Convert peaks to wide format for modeling |

| Step | Description |
|---|---|
| step_measure_mw_averages() | Calculate Mn, Mw, Mz, Mp, and dispersity |
| step_measure_mw_distribution() | Generate molecular weight distribution curve |
| step_measure_mw_fractions() | Calculate molecular weight fractions |

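
The averages named in the table above follow standard definitions, sketched here in base R for a toy distribution given as chain counts per molecular weight. This only illustrates the formulas: the variable names and values are made up, and a real SEC workflow would start from a detector signal and a calibration curve rather than from counts.

```r
# Illustrative distribution: number of chains at each molecular weight
n_chains <- c(1, 1)
mol_weight <- c(10000, 20000)

# Number-average molecular weight: Mn = sum(N * M) / sum(N)
mn <- sum(n_chains * mol_weight) / sum(n_chains)

# Weight-average molecular weight: Mw = sum(N * M^2) / sum(N * M)
mw <- sum(n_chains * mol_weight^2) / sum(n_chains * mol_weight)

# z-average molecular weight: Mz = sum(N * M^3) / sum(N * M^2)
mz <- sum(n_chains * mol_weight^3) / sum(n_chains * mol_weight^2)

# Dispersity is the ratio Mw / Mn
dispersity <- mw / mn

print(c(Mn = mn, Mw = mw, Mz = mz, dispersity = dispersity))
```
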
| Step | Description |
|---|---|
| step_measure_integrals() | Calculate integrated areas for specified regions |
| step_measure_ratios() | Calculate ratios between integrated regions |
| step_measure_moments() | Calculate statistical moments from spectra |
| step_measure_bin() | Reduce spectrum to fewer points via binning |

| Step | Description |
|---|---|
| step_measure_augment_noise() | Add random noise for training augmentation |
| step_measure_augment_shift() | Random x-axis shifts for shift invariance |
| step_measure_augment_scale() | Random intensity scaling |

| Step/Function | Description |
|---|---|
| step_measure_drift_qc_loess() | QC-RLSC drift correction using LOESS |
| step_measure_drift_linear() | Linear drift correction |
| step_measure_drift_spline() | Spline-based drift correction |
| step_measure_qc_bracket() | QC bracketing interpolation |
| step_measure_batch_reference() | Reference-based batch correction |
| measure_detect_drift() | Detect significant drift in QC samples |

measure provides a comprehensive suite of functions for analytical method validation, designed for compatibility with ICH Q2(R2), ISO 17025, and similar regulatory frameworks.
| Function | Description |
|---|---|
| measure_calibration_fit() | Fit weighted calibration curves (linear/quadratic) |
| measure_calibration_predict() | Predict concentrations with uncertainty |
| measure_calibration_verify() | Continuing calibration verification |
| measure_lod() / measure_loq() | Detection and quantitation limits |

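
One common way to estimate detection and quantitation limits is the calibration-curve approach described in ICH Q2, where LOD = 3.3 σ / S and LOQ = 10 σ / S, with σ the residual standard deviation and S the slope. The base-R sketch below applies it to made-up calibration data using `lm()`. It is an assumption that this matches any particular method offered by `measure_lod()` or `measure_loq()`, which may provide other estimators.

```r
# Illustrative calibration data (not real measurements)
conc <- c(1, 2, 3, 4, 5)
response <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Unweighted linear calibration curve
fit <- lm(response ~ conc)
slope <- unname(coef(fit)["conc"])

# Residual standard deviation of the regression
resid_sd <- summary(fit)$sigma

# Calibration-curve estimates of the limits
lod <- 3.3 * resid_sd / slope
loq <- 10 * resid_sd / slope

print(c(slope = slope, LOD = lod, LOQ = loq))
```
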
| Function | Description |
|---|---|
| measure_repeatability() | Within-run precision |
| measure_intermediate_precision() | Between-run precision with variance components |
| measure_reproducibility() | Between-lab precision |
| measure_gage_rr() | Gage R&R / Measurement System Analysis |
| measure_accuracy() | Bias, recovery, and accuracy assessment |
| measure_linearity() | Linearity with lack-of-fit testing |
| measure_carryover() | Carryover evaluation |

| Function | Description |
|---|---|
| measure_bland_altman() | Bland-Altman analysis with limits of agreement |
| measure_deming_regression() | Deming regression for method comparison |
| measure_passing_bablok() | Passing-Bablok non-parametric regression |
| measure_proficiency_score() | z-scores, En scores, zeta scores for PT |

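
The core of a Bland-Altman analysis is small enough to sketch in base R: take the paired differences between two methods, report their mean as the bias, and place the limits of agreement at the bias plus or minus 1.96 standard deviations. The paired values below are invented for illustration, and the sketch does not call `measure_bland_altman()`.

```r
# Illustrative paired results from two methods on the same samples
method_a <- c(10.1, 12.3, 15.2, 18.0, 20.4)
method_b <- c(10.0, 12.0, 15.5, 17.6, 20.0)

# Paired differences and their summary statistics
differences <- method_a - method_b
bias <- mean(differences)
sd_diff <- sd(differences)

# 95% limits of agreement, assuming roughly normal differences
limits <- c(
  lower = bias - 1.96 * sd_diff,
  upper = bias + 1.96 * sd_diff
)

print(bias)
print(limits)
```
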
| Function/Step | Description |
|---|---|
| measure_matrix_effect() | Quantify ion suppression/enhancement |
| step_measure_standard_addition() | Standard addition correction |
| step_measure_dilution_correct() | Back-calculate diluted concentrations |
| step_measure_surrogate_recovery() | Surrogate/internal standard recovery |

| Function | Description |
|---|---|
| measure_uncertainty_budget() | ISO GUM uncertainty budgets |
| measure_uncertainty() | Combined and expanded uncertainty |
| measure_control_limits() | Shewhart, EWMA, or CUSUM limits |
| measure_control_chart() | Westgard multi-rule control charts |
| measure_system_suitability() | System suitability testing |

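
In the simplest GUM case, with uncorrelated inputs and unit sensitivity coefficients, the combined standard uncertainty is the root sum of squares of the components, and the expanded uncertainty multiplies it by a coverage factor (k = 2 for roughly 95% coverage). The base-R sketch below shows only this arithmetic; the component names and values are hypothetical, and the measure functions may handle more general budgets.

```r
# Illustrative standard uncertainties for two independent components
standard_uncertainties <- c(calibration = 0.3, repeatability = 0.4)

# Combined standard uncertainty: root sum of squares
combined <- sqrt(sum(standard_uncertainties^2))

# Expanded uncertainty with coverage factor k = 2
coverage_factor <- 2
expanded <- coverage_factor * combined

print(c(combined = combined, expanded = expanded))
```
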
| Function | Description |
|---|---|
| measure_criteria() | Define acceptance criteria |
| measure_assess() | Evaluate data against criteria |
| criteria_bioanalytical() | FDA/EMA bioanalytical presets |
| criteria_ich_q2() | ICH Q2 validation presets |

- Getting Started - A comprehensive introduction to measure
- Preprocessing Techniques - Deep dive into available preprocessing methods
- Analytical Validation - Calibration, uncertainty, and method validation
The package includes datasets for examples and testing:
| Dataset | Technique | Samples | Description |
|---|---|---|---|
| meats_long | NIR | 215 | NIR transmittance spectra of meat samples (from modeldata) |
| bioreactors_small | Raman | 210 | Raman spectra from 15 small-scale bioreactors |
| bioreactors_large | Raman | 42 | Raman spectra from 3 large-scale bioreactors |
| hplc_chromatograms | HPLC-UV | 20 | Simulated HPLC chromatograms with 5 compounds |
| sec_chromatograms | SEC/GPC | 10 | Simulated SEC chromatograms (5 standards + 5 polymers) |
| sec_calibration | SEC/GPC | 5 | Calibration standards for molecular weight curves |
| maldi_spectra | MALDI-TOF | 16 | Simulated mass spectra (4 groups × 4 replicates) |

# Load datasets
data(meats_long)
data(glucose_bioreactors) # loads bioreactors_small and bioreactors_large
data(hplc_chromatograms)
data(sec_chromatograms)
data(sec_calibration)
data(maldi_spectra)

For additional test data beyond what’s included with measure, these sources provide publicly available analytical measurement data:
R Packages with Spectral Data:
| Package | Dataset | Technique | Description |
|---|---|---|---|
| modeldata | meats | NIR | Meat composition (wide format version) |
| prospectr | NIRsoil | NIR | Soil analysis with 825 samples |
| ChemoSpec | Various | IR, NMR | Multiple spectroscopy datasets |
| hyperSpec | Various | Raman, IR | Hyperspectral data examples |

# Example: Load NIRsoil from prospectr
# install.packages("prospectr")
data(NIRsoil, package = "prospectr")

Online Repositories:
- Mendeley Data - Search “spectroscopy”, “chromatography”, or “mass spectrometry”
- Zenodo - Open science data repository
- Kaggle Datasets - Community-contributed datasets
- NIST Chemistry WebBook - Reference spectra (IR, MS, UV-Vis)
- SDBS - Spectral Database for Organic Compounds (NMR, IR, MS)
Domain-Specific Databases:
| Database | Data Type | URL |
|---|---|---|
| MassBank | Mass spectra | https://massbank.eu/MassBank/ |
| HMDB | NMR, MS metabolomics | https://hmdb.ca/ |
| NMRShiftDB | NMR spectra | https://nmrshiftdb.nmr.uni-koeln.de/ |
| Crystallography Open Database | XRD patterns | https://www.crystallography.net/cod/ |
measure builds on the tidymodels ecosystem:
- recipes - The foundation for preprocessing pipelines
- parsnip - Unified modeling interface
- workflows - Bundle preprocessing and modeling
- tune - Hyperparameter tuning (works with measure’s tunable steps!)
For spectral analysis in R, you might also find these packages useful:
- prospectr - Spectral preprocessing functions
- ChemoSpec - Exploratory chemometrics
- mdatools - Multivariate data analysis
This package is under active development. Contributions are welcome! Please see the contributing guidelines.
Please note that the measure project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
