Validation: real ECCC data vs. the Cascade Hydro pipeline

Purpose. This document is a readable baseline for the external-validation test suite (`tests/test_validation_eccc_idf.py`). It records how closely the tool reproduces the Intensity–Duration–Frequency (IDF) values published by Environment and Climate Change Canada (ECCC) for a real station, using the code at the baseline version listed below. If a future change moves these numbers, the test fails and this page shows what the agreement used to be, so model drift is visible rather than silent.

Field	Value
Reference station	Vancouver Harbour CS, BC — Climate ID 1108446 (composite)
ECCC dataset	Engineering Climate Datasets, Short-Duration Rainfall IDF, v3.40 (retrieved 2025‑12‑05)
ECCC method (stated)	Gumbel distribution, Method of Moments
Baseline recorded	2026‑06‑19
App version at baseline	0.5.11 (suite introduced in the following release)
Test file	`tests/test_validation_eccc_idf.py`
Ground-truth fixture	`tests/data/eccc_idf_1108446_vancouver_harbour.txt` (ECCC file, verbatim)
Daily-record fixture	`tests/data/eccc_daily_1108446_vancouver_harbour.csv` (fetched via the app)

These fixtures live under tests/ and are not packaged into the installed application; they exist only to gate releases.

1. Why this station is a clean benchmark

ECCC publishes, for 1108446, both the annual-maximum series (AMS) it fitted (its Table 1) and the resulting return-period depths (its Table 2a), and it states its estimator as Gumbel / Method of Moments — which is exactly this tool's IDF default. That makes the comparison genuinely apples-to-apples, and it lets us validate two distinct things separately (Layers 1 and 2 below).

2. Methodology

2.1 Gumbel (EV1) by Method of Moments

The annual maxima are fitted to a Gumbel distribution

\[ F(x) = \exp\!\left[-\exp\!\left(-\frac{x-\mu}{\beta}\right)\right], \]

with location \(\mu\) and scale \(\beta\) estimated by the method of moments from the sample mean \(\bar{x}\) and sample standard deviation \(s\) (with \(\mathrm{ddof}=1\)):

\[ \beta = \frac{s\sqrt{6}}{\pi}, \qquad \mu = \bar{x} - \gamma\,\beta, \qquad \gamma \approx 0.5772 \;\text{(Euler–Mascheroni)}. \]

The \(T\)-year return level (non-exceedance probability \(p = 1 - 1/T\)) is the inverse CDF,

\[ x_T = \mu - \beta\,\ln\!\left[-\ln\!\left(1-\tfrac{1}{T}\right)\right] = \bar{x} + K_T\,s, \qquad K_T = -\frac{\sqrt{6}}{\pi}\Big(\gamma + \ln\ln\tfrac{T}{T-1}\Big), \]

where \(K_T\) is the Gumbel frequency factor. Return periods evaluated: \(T \in \{2, 5, 10, 25, 50, 100\}\) years.
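
The formulas above can be sketched in a few lines of standard-library Python. This is an illustration only, not the shipped `fit_distribution` code: the function names and the sample series are hypothetical, and the annual maxima are made-up values rather than ECCC data.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gumbel_mom_params(ams):
    """Return (mu, beta) estimated by the method of moments."""
    mean = statistics.fmean(ams)
    s = statistics.stdev(ams)  # sample standard deviation, ddof = 1
    beta = s * math.sqrt(6.0) / math.pi
    mu = mean - EULER_GAMMA * beta
    return mu, beta


def frequency_factor(T):
    """Gumbel frequency factor K_T for a return period of T years (T > 1)."""
    return -(math.sqrt(6.0) / math.pi) * (
        EULER_GAMMA + math.log(math.log(T / (T - 1.0)))
    )


def return_level(ams, T):
    """T-year return level from the inverse Gumbel CDF."""
    mu, beta = gumbel_mom_params(ams)
    return mu - beta * math.log(-math.log(1.0 - 1.0 / T))


# Made-up annual maxima in mm, for illustration only (not ECCC data).
sample_ams = [41.2, 55.0, 38.7, 62.3, 47.9, 51.4, 44.6, 70.1, 49.3, 58.8]
levels = {T: return_level(sample_ams, T) for T in (2, 5, 10, 25, 50, 100)}
```

Both routes to \(x_T\) (the inverse CDF and \(\bar{x} + K_T\,s\)) give the same number up to floating-point error, which is a convenient self-check when reading the fitting code.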

2.2 Screening applied before fitting (Layer 2 only)

Completeness. A calendar year is retained only if its fraction of expected records (at the detected temporal resolution) meets the threshold

\[ \frac{n_{\text{obs}}}{n_{\text{exp}}} \ge 0.90 . \]

The 90% default follows the WMO convention that a derived statistic is computed only when no more than ~10% of the constituent observations are missing (Guide to Climatological Practices, WMO‑No. 100; the "3‑and‑5" missing-data rule maps to ≈90% monthly completeness).
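
As a rough sketch of the completeness rule, assuming daily resolution (so \(n_{\text{exp}}\) is simply the number of days in the calendar year): the helper name `complete_years` and the synthetic dates are hypothetical and are not taken from the app.

```python
import calendar
from collections import Counter
from datetime import date, timedelta


def complete_years(obs_dates, threshold=0.90):
    """Return the sorted years whose daily completeness meets the threshold.

    obs_dates: iterable of datetime.date, one per day with a valid observation.
    Assumes daily resolution, so n_exp is the number of days in the year.
    """
    counts = Counter(d.year for d in set(obs_dates))  # de-duplicate repeated days
    kept = []
    for year, n_obs in counts.items():
        n_exp = 366 if calendar.isleap(year) else 365
        if n_obs / n_exp >= threshold:
            kept.append(year)
    return sorted(kept)


# Synthetic illustration: 2001 is fully observed, 2002 has only 300 days.
full = [date(2001, 1, 1) + timedelta(days=i) for i in range(365)]
partial = [date(2002, 1, 1) + timedelta(days=i) for i in range(300)]
kept = complete_years(full + partial)  # 300/365 is about 0.82, so 2002 is dropped
```

Under this rule a record that starts or ends mid-year is treated like any other incomplete year, which is one reason partial first and last years tend to be excluded.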

Outliers — Bulletin 17 (Grubbs). Operating in log space (\(y = \ln x\)), the single most-extreme value is tested with the Grubbs statistic

\[ G = \frac{\max_i \lvert y_i - \bar{y}\rvert}{s_y}, \]

and the value is rejected when \(G > G_{\text{crit}}(\alpha, n)\) at \(\alpha = 0.10\) (the Bulletin 17B-recommended level). Only the high tail is screened: low annual maxima are real dry-year values, and removing them would bias the upper tail, which is the part of interest for rainfall. The low-outlier (Grubbs–Beck) step in Bulletin 17 is a flood-frequency device and is not appropriate here.
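
A minimal sketch of a high-tail screen follows, under two assumptions that this page does not confirm. First, it uses the one-sided form of the statistic (maximum minus mean), so that only the largest value can be flagged. Second, it takes \(G_{\text{crit}}\) from a widely quoted polynomial approximation to the Bulletin 17B 10 % \(K_N\) table; the tool may instead derive its critical value from the t-distribution. The function names are hypothetical.

```python
import math
import statistics


def grubbs_beck_kn(n):
    """Approximate 10 % one-sided critical value K_N (assumed source of G_crit)."""
    log_n = math.log10(n)
    return -0.9043 + 3.345 * math.sqrt(log_n) - 0.4046 * log_n


def high_outlier(ams, g_crit=None):
    """Return (is_outlier, G) for the largest value, tested in log space."""
    y = [math.log(x) for x in ams]  # requires strictly positive depths
    y_bar = statistics.fmean(y)
    s_y = statistics.stdev(y)  # ddof = 1
    g = (max(y) - y_bar) / s_y  # high-tail form: only the maximum is tested
    if g_crit is None:
        g_crit = grubbs_beck_kn(len(y))
    return g > g_crit, g


# Made-up annual maxima in mm (not ECCC data); the second list adds one spike.
clean = [40.0, 45.0, 50.0, 42.0, 48.0, 44.0, 46.0, 43.0, 47.0, 41.0]
spiked = clean + [400.0]
clean_flag, clean_g = high_outlier(clean)
spiked_flag, spiked_g = high_outlier(spiked)
```

The standardized statistic does not depend on the logarithm base, so the natural log used here is compatible with critical values tabulated for base-10 logs.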

2.3 Fixed-window adjustment (Layer 2, 24 h)

A daily gauge reports calendar-day totals, so the annual maximum of those totals underestimates the sliding 24‑hour maximum that ECCC's curves use. The standard WMO/Hershfield factor converts between the two:

\[ D^{\text{sliding}}_{24\text{h}} \approx 1.13 \times D^{\text{fixed}}_{1\text{day}} . \]

We apply \(1.13\) to the pipeline's 24 h depth (and to the sub-daily depths derived from it) before comparing to ECCC.

2.4 Error metric

All agreement figures are relative errors against the ECCC published value:

\[ \varepsilon = \frac{\lvert x_{\text{tool}} - x_{\text{ECCC}}\rvert}{x_{\text{ECCC}}} . \]
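
As a worked example of the metric combined with the 1.13 fixed-window factor, the snippet below recomputes the 5‑year, 24 h row of the Layer 2 table (raw 76.2 mm, ECCC 86.2 mm). The helper name `relative_error` is hypothetical and is not necessarily what the test suite calls it.

```python
FIXED_TO_SLIDING_24H = 1.13  # fixed-window factor from section 2.3


def relative_error(x_tool, x_eccc):
    """Relative error of the tool's value against the ECCC published value."""
    return abs(x_tool - x_eccc) / x_eccc


# 5-year, 24 h row of the Layer 2 table: raw pipeline depth and ECCC depth, in mm.
raw_mm, eccc_mm = 76.2, 86.2
adjusted_mm = FIXED_TO_SLIDING_24H * raw_mm
eps = relative_error(adjusted_mm, eccc_mm)
print(f"{adjusted_mm:.1f} mm, {100 * eps:.1f} %")  # prints "86.1 mm, 0.1 %"
```

Because the tables on this page are rounded to 0.1 mm, recomputing a row from its rounded entries can differ from the tabulated error in the last digit.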

3. Layer 1: core statistics vs. ECCC's published table

What it tests. ECCC's own AMS (Table 1) is fitted with this tool's `fit_distribution` (Gumbel, MoM), and the resulting depths are compared to ECCC's published depths (Table 2a) for every duration and return period. Because the input data and the estimator are identical to ECCC's, the only expected disagreement is ECCC's 0.1 mm table rounding. This isolates the parameter-estimation and return-level math, deterministically and offline.

Result (54 cells = 9 durations × 6 return periods):

Metric	Value
Worst-case relative error	0.74 % (15 min, 2‑yr — where 0.1 mm rounding alone is ≈0.8 %)
Mean relative error	0.17 %
24 h column	within 0.04 %
Test tolerance (pass bar)	1.5 %

Per-duration worst case:

Duration	5 min	10 min	15 min	30 min	1 h	2 h	6 h	12 h	24 h
Max rel. err.	0.66%	0.66%	0.74%	0.61%	0.36%	0.22%	0.08%	0.10%	0.04%

Interpretation. The tool reproduces ECCC's IDF statistics to rounding precision. Any future Layer‑1 error above ~1 % indicates a real change in the fitting/return-level code, not noise.

4. Layer 2: full daily-gauge pipeline vs. ECCC

What it tests. A real ECCC daily precipitation record (1925–2026, committed fixture) pushed through the entire in-app IDF pipeline — annual-maxima extraction, 90 % completeness screening, Grubbs high-tail outlier screening, Gumbel/MoM fit, log-log curve, and the ecozone sub-daily ratio derivation — then adjusted by the 1.13 fixed-window factor and compared to ECCC.

Pipeline configuration at baseline:

Setting	Value
Completeness	enabled, 90 % → 22 incomplete years excluded
Outliers	Grubbs, high tail, log space, \(\alpha = 0.10\)
Distribution / fit	Gumbel / Method of Moments
Curve form	log-log linear
Ecozone (sub-daily ratios)	Pacific Maritime (resolved by point-in-polygon)
Daily record span	1925‑11‑01 → 2026‑06‑19

The Layer 2 pass bars are intentionally looser than Layer 1's because of three structural, anticipated differences: (i) calendar-day vs. sliding-window maxima (addressed by \(1.13\)); (ii) our ~100‑year daily record vs. ECCC's 35‑year composite; and (iii) sub-daily depths come from a regional ratio table, not this station's (non-public) tipping-bucket data.

4.1 24‑hour depths (directly observed; tight)

\(T\) (yr)	Pipeline raw (mm)	× 1.13 (mm)	ECCC (mm)	Rel. err.
2	61.0	69.0	68.4	0.8 %
5	76.2	86.1	86.2	0.1 %
10	86.2	97.4	98.0	0.6 %
25	98.9	111.8	113.0	1.1 %
50	108.3	122.4	124.0	1.3 %
100	117.6	132.9	135.0	1.5 %

Pass bar: ±10 %. Observed worst case 1.5 % — the daily pipeline plus the 1.13 factor recovers ECCC's 24 h curve almost exactly.

4.2 Sub-daily depths (ratio-derived; loose)

Sub-daily depths (5 min – 12 h), after the 1.13 adjustment, fall within

\[ 0.76 \le \frac{x_{\text{tool}}}{x_{\text{ECCC}}} \le 1.23, \]

i.e. inside a ±30 % band (pass bar [0.70, 1.30]). The largest underestimate is at the shortest durations / longest return periods (e.g. 10 min, 100‑yr ≈ 0.76), where the regional ratio least represents this specific coastal station. This is consistent with ECCC's own caution for 1108446: "95 % Confidence Interval > ±25 %." These figures validate that the ratio path is physically reasonable, not that it is exact — by design.
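
The band check described above amounts to a ratio test per cell. A minimal sketch, assuming the pass bar [0.70, 1.30] quoted in this section; the function name and the placeholder depths are hypothetical, chosen only to reproduce the two extreme ratios reported here.

```python
def within_band(x_tool, x_eccc, low=0.70, high=1.30):
    """Return (passes, ratio) for one sub-daily cell against the pass bar."""
    ratio = x_tool / x_eccc
    return low <= ratio <= high, ratio


# Placeholder depths that reproduce the reported extreme ratios (0.76 and 1.23).
ok_low, ratio_low = within_band(7.6, 10.0)
ok_high, ratio_high = within_band(12.3, 10.0)

# A cell outside the band, e.g. a 35 % underestimate, would fail the check.
ok_fail, ratio_fail = within_band(6.5, 10.0)
```

Expressing the check as a ratio rather than an absolute relative error keeps the direction of the bias visible, which matters here because the underestimates and overestimates occur at different durations.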

5. How to use this for drift detection

Run the suite: `pytest tests/test_validation_eccc_idf.py -v` (offline, deterministic, part of the release gate).
If it fails, compare the reported numbers to §3/§4 here:
A Layer‑1 regression means the fitting/return-level core changed — treat as serious; the math should match ECCC to rounding.
A Layer‑2 24 h regression means the extraction/screening/curve path changed materially.
A Layer‑2 sub-daily regression means the ecozone ratio derivation changed.
If the change is intentional, update both the tolerances/expectations in the test and the baseline tables here, in the same commit, with a note on why.

6. Provenance & reproducibility

The ground-truth file is ECCC's published text product, committed verbatim (`tests/data/eccc_idf_1108446_vancouver_harbour.txt`); the test parses Table 1 (AMS) and Table 2a (depths) directly from it, so the numbers are never hand-transcribed into code.
The daily record was fetched once through the app's own ECCC downloader (`download_station_data`, climate-daily collection) and committed as a CSV so the test runs offline and identically on every machine.
Both fitting paths call the shipped analysis code (`idf_analyzer.analysis.frequency.fit_distribution` and `idf_analyzer.analysis.idf.compute_idf_table`), the same code the GUI uses, so this validates the product, not a parallel reimplementation.