Aggregate Technical Replicates for Statistical Rigor — aggregate

Collapses technical replicates (e.g., multiple thermal images taken of the same animal) into a single biological data point per subject. This step is crucial for avoiding pseudoreplication in downstream statistical analyses (e.g., t-tests, ANOVA).
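To see why this matters, the following sketch compares a t-test on raw images with one on per-animal means. It uses only base R (aggregate() and t.test()), not this package, and the data frame toy and its columns AnimalID, Group, and Temp are hypothetical names chosen for illustration.

```r
set.seed(1)

# Hypothetical toy data: 6 animals (3 per group), 3 images per animal.
toy <- data.frame(
  AnimalID = rep(paste0("A", 1:6), each = 3),
  Group    = rep(c("Control", "Treated"), each = 9),
  Temp     = rnorm(18, mean = 34, sd = 0.5)
)

# Pseudoreplicated: every image is treated as an independent observation (n = 18).
t_raw <- t.test(Temp ~ Group, data = toy, var.equal = TRUE)

# One value per animal (n = 6), collapsed here with base R's aggregate().
per_animal <- aggregate(Temp ~ AnimalID + Group, data = toy, FUN = mean)
t_agg <- t.test(Temp ~ Group, data = per_animal, var.equal = TRUE)

unname(t_raw$parameter)  # degrees of freedom: 16
unname(t_agg$parameter)  # degrees of freedom: 4
```

The raw analysis claims 16 degrees of freedom although only 6 animals were measured. The aggregated analysis reports 4, which matches the number of independent subjects.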

Usage

aggregate_replicates(data, id_col, method = "mean", keep_cols = NULL)

Arguments

data: A data frame. Typically the output from compile_batch_stats or merge_clinical_data.
id_col: String. The column name representing the unique Biological Subject ID (e.g., "MouseID", "Subject_No"). Rows sharing this ID will be condensed into one.
method: String. The function used for aggregation: either "mean" (default) or "median". The median is more robust than the mean to outliers (e.g., one blurry image).
keep_cols: Character vector. Names of non-numeric metadata columns to preserve in the final output (e.g., "Group", "Genotype", "Sex", "Treatment"). Defaults to NULL.
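Because rows sharing an ID are condensed into one, each metadata column passed to keep_cols should hold a single value per subject. This page does not say how conflicting values are handled, so a pre-check along the following lines may be useful. It is a base R sketch, and df_check, n_distinct, and inconsistent are hypothetical names.

```r
# Hypothetical toy data: subject M2 has conflicting Group labels across replicates.
df_check <- data.frame(
  SampleID = c("M1", "M1", "M2", "M2"),
  Group    = c("ND", "ND", "HFD", "ND"),
  Sex      = c("M", "M", "M", "M")
)

# Count the distinct values of each metadata column within each subject.
n_distinct <- aggregate(
  df_check[c("Group", "Sex")],
  by = list(SampleID = df_check$SampleID),
  FUN = function(x) length(unique(x))
)

# Subjects for which any metadata column takes more than one value.
inconsistent <- n_distinct$SampleID[rowSums(n_distinct[c("Group", "Sex")] > 1) > 0]
inconsistent
```

Any subject listed in inconsistent (here M2) should be corrected before aggregation.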

Value

A data frame with exactly one row per unique ID. Columns are reordered so that the ID and metadata columns come first, followed by the aggregated thermal statistics and n_replicates, the number of replicates per subject.
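The behaviour described above can be approximated in base R as follows. This is a minimal sketch for illustration, not the package's source: aggregate_replicates_sketch is a hypothetical name, and taking the first metadata value per subject and dropping NA values are assumptions.

```r
# Hypothetical re-implementation for illustration only.
aggregate_replicates_sketch <- function(data, id_col, method = "mean",
                                        keep_cols = NULL) {
  method <- match.arg(method, c("mean", "median"))
  fun <- switch(method, mean = mean, median = stats::median)

  # Numeric columns other than the ID and metadata are the ones aggregated.
  is_num <- vapply(data, is.numeric, logical(1))
  num_cols <- setdiff(names(data)[is_num], c(id_col, keep_cols))

  # Split rows by subject, keeping the order in which subjects first appear.
  ids <- data[[id_col]]
  groups <- split(data, factor(ids, levels = unique(ids)))

  rows <- lapply(groups, function(g) {
    # Assumption: metadata is constant within a subject, so take the first row.
    meta <- g[1, c(id_col, keep_cols), drop = FALSE]
    # Assumption: missing values are dropped before aggregating.
    stats <- lapply(g[num_cols], fun, na.rm = TRUE)
    cbind(meta, as.data.frame(stats), n_replicates = nrow(g))
  })

  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

# Usage on a small hypothetical data frame.
demo <- data.frame(
  id  = c("a", "a", "b"),
  grp = c("x", "x", "y"),
  v   = c(1, 3, 10)
)
res <- aggregate_replicates_sketch(demo, id_col = "id", keep_cols = "grp")
res
```

The result has one row per ID, with the ID and metadata columns first, then the aggregated numeric columns, then n_replicates.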

Examples

# Create a toy dataset with repeated measurements (3 images per animal)
set.seed(42)  # make the random toy values reproducible
df_raw <- data.frame(
  SampleID = rep(paste0("M", 1:3), each = 3),
  Group = rep(c("ND", "HFD", "ND"), each = 3),
  Sex = rep("M", 9),
  Median = runif(9, 33, 36),
  IQR = runif(9, 0.5, 1.5)
)

df <- aggregate_replicates(
  data = df_raw,
  id_col = "SampleID",
  method = "median",
  keep_cols = c("Group", "Sex")
)
#> Aggregating replicates by 'SampleID' using median...
#> Aggregation complete. Reduced from 9 rows to 3 subjects.