Validation: real ECCC data vs. the Cascade Hydro pipeline
Purpose. This document is a readable baseline for the external-validation
test suite (tests/test_validation_eccc_idf.py). It records how closely the
tool reproduces Environment and Climate Change Canada's (ECCC) published
Intensity–Duration–Frequency (IDF) numbers for a real station, using today's
code. If a future change moves these numbers, the test fails and this page
tells you what the agreement used to be — so "model drift" is visible, not
silent.
| Field | Value |
|---|---|
| Reference station | Vancouver Harbour CS, BC — Climate ID 1108446 (composite) |
| ECCC dataset | Engineering Climate Datasets, Short-Duration Rainfall IDF, v3.40 (retrieved 2025‑12‑05) |
| ECCC method (stated) | Gumbel distribution, Method of Moments |
| Baseline recorded | 2026‑06‑19 |
| App version at baseline | 0.5.11 (suite introduced in the following release) |
| Test file | tests/test_validation_eccc_idf.py |
| Ground-truth fixture | tests/data/eccc_idf_1108446_vancouver_harbour.txt (ECCC file, verbatim) |
| Daily-record fixture | tests/data/eccc_daily_1108446_vancouver_harbour.csv (fetched via the app) |
These fixtures live under tests/ and are not packaged into the installed application; they exist only to gate releases.
1. Why this station is a clean benchmark
ECCC publishes, for 1108446, both the annual-maximum series (AMS) it fitted (its Table 1) and the resulting return-period depths (its Table 2a), and it states its estimator as Gumbel / Method of Moments — which is exactly this tool's IDF default. That makes the comparison genuinely apples-to-apples, and it lets us validate two distinct things separately (Layers 1 and 2 below).
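2. Methodology
2.1 Gumbel (EV1) by Method of Moments
The annual maxima are fitted to a Gumbel distribution
\[ F(x) = \exp\!\left[-\exp\!\left(-\frac{x-\mu}{\beta}\right)\right] \]
with location \(\mu\) and scale \(\beta\) estimated by the method of moments from the sample mean \(\bar{x}\) and sample standard deviation \(s\) (with \(\mathrm{ddof}=1\)):
\[ \beta = \frac{\sqrt{6}}{\pi}\,s, \qquad \mu = \bar{x} - \gamma\,\beta, \qquad \gamma \approx 0.5772 \]
The \(T\)-year return level (non-exceedance probability \(p = 1 - 1/T\)) is the inverse CDF,
\[ x_T = \mu - \beta \ln\!\left[-\ln\!\left(1 - \frac{1}{T}\right)\right] = \bar{x} + K_T\,s, \qquad K_T = -\frac{\sqrt{6}}{\pi}\left\{\gamma + \ln\!\left[-\ln\!\left(1 - \frac{1}{T}\right)\right]\right\} \]
where \(K_T\) is the Gumbel frequency factor. Return periods evaluated: \(T \in \{2, 5, 10, 25, 50, 100\}\) years.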
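The fit and return-level calculation described above can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the tool's `fit_distribution` implementation: the function names and the sample data are hypothetical, and it uses the large-sample frequency factor (no sample-size correction).

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_mom(annual_maxima):
    """Return (mu, beta) for a Gumbel fit by the method of moments."""
    mean = statistics.fmean(annual_maxima)
    std = statistics.stdev(annual_maxima)  # sample std, i.e. ddof = 1
    beta = math.sqrt(6.0) / math.pi * std
    mu = mean - EULER_GAMMA * beta
    return mu, beta


def return_level(mu, beta, T):
    """Inverse Gumbel CDF at non-exceedance probability p = 1 - 1/T."""
    p = 1.0 - 1.0 / T
    return mu - beta * math.log(-math.log(p))


def frequency_factor(T):
    """Gumbel frequency factor K_T such that x_T = mean + K_T * std."""
    p = 1.0 - 1.0 / T
    return -math.sqrt(6.0) / math.pi * (EULER_GAMMA + math.log(-math.log(p)))


# Hypothetical annual maxima (mm), for illustration only; not ECCC data.
ams = [52.1, 61.3, 48.7, 70.2, 55.0, 66.8, 59.4, 80.5, 45.9, 63.2]
mu, beta = fit_gumbel_mom(ams)
levels = {T: return_level(mu, beta, T) for T in (2, 5, 10, 25, 50, 100)}
```

The two forms are algebraically identical, so `return_level(mu, beta, T)` and `mean + frequency_factor(T) * std` agree to floating-point precision.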
2.2 Screening applied before fitting (Layer 2 only)
Completeness. A calendar year is retained only if its fraction of expected records (at the detected temporal resolution) meets the threshold
\[ \frac{N_{\text{observed}}}{N_{\text{expected}}} \ge 0.90 \]
The 90% default follows the WMO convention that a derived statistic is computed only when no more than ~10% of the constituent observations are missing (Guide to Climatological Practices, WMO‑No. 100; the "3‑and‑5" missing-data rule maps to ≈90% monthly completeness).
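A minimal sketch of the completeness screen, assuming a daily resolution so that the expected count is simply the number of calendar days in the year. The function name `complete_years` and the sample record are hypothetical; how the tool treats partial first and last years is not stated here and is not modelled.

```python
import calendar
from collections import Counter
from datetime import date, timedelta


def complete_years(record_dates, threshold=0.90):
    """Return the sorted years whose daily coverage meets the threshold.

    `record_dates` holds one date per non-missing observation.
    """
    counts = Counter(d.year for d in set(record_dates))  # de-duplicate days
    kept = []
    for year, n_observed in counts.items():
        n_expected = 366 if calendar.isleap(year) else 365
        if n_observed / n_expected >= threshold:
            kept.append(year)
    return sorted(kept)


# Hypothetical record: all of 2001, but only the first 300 days of 2002.
dates = [date(2001, 1, 1) + timedelta(days=i) for i in range(365)]
dates += [date(2002, 1, 1) + timedelta(days=i) for i in range(300)]
kept = complete_years(dates)  # 2002 covers 300/365 of the year and is dropped
```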
Outliers: Bulletin 17 (Grubbs). Operating in log space (\(y = \ln x\)), the single most-extreme value is tested with the Grubbs statistic
\[ G = \frac{y_{\max} - \bar{y}}{s_y} \]
where \(\bar{y}\) and \(s_y\) are the sample mean and standard deviation of the log-transformed maxima. The value is rejected when \(G > G_{\text{crit}}(\alpha, n)\) at \(\alpha = 0.10\) (the Bulletin 17B-recommended level). Only the high tail is screened: low annual maxima are real dry-year values, and removing them would bias the upper tail of interest for rainfall (the low-outlier / Grubbs–Beck step in Bulletin 17 is a flood-frequency device, not appropriate here).
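The high-tail test can be sketched as follows. The function names are hypothetical, and the critical value uses a widely quoted polynomial approximation to the 10 % one-sided values tabulated in Bulletin 17B; whether the tool uses this approximation, a table lookup, or a t-distribution formula is an assumption not confirmed by this page. Because \(G\) is scale-invariant, natural and base-10 logarithms give the same statistic.

```python
import math
import statistics


def grubbs_critical_10pct(n):
    """Approximate 10 % one-sided critical value (Bulletin 17B K_N table)."""
    log_n = math.log10(n)
    return -0.9043 + 3.345 * math.sqrt(log_n) - 0.4046 * log_n


def screen_high_outlier(annual_maxima):
    """Test only the largest value, in log space.

    Returns (is_outlier, G, G_crit).
    """
    logs = [math.log(x) for x in annual_maxima]
    mean = statistics.fmean(logs)
    std = statistics.stdev(logs)  # sample std, ddof = 1
    g = (max(logs) - mean) / std
    g_crit = grubbs_critical_10pct(len(logs))
    return g > g_crit, g, g_crit


# Hypothetical series (mm): one with an implausible spike, one evenly spread.
spiked = [50, 52, 48, 51, 49, 53, 47, 50, 52, 500]
spread = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]
spiked_flag, _, _ = screen_high_outlier(spiked)
spread_flag, _, _ = screen_high_outlier(spread)
```

Only the maximum is examined, so low values are never flagged, which matches the high-tail-only policy described above.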
2.3 Fixed-window adjustment (Layer 2, 24 h)
A daily gauge reports fixed calendar-day totals, so the annual maximum of those totals underestimates the sliding 24‑hour maximum that ECCC's curves use. The standard WMO/Hershfield factor converts between the two:
\[ P_{24\,\mathrm{h}}^{\text{sliding}} \approx 1.13 \times P_{1\,\mathrm{day}}^{\text{calendar}} \]
We apply \(1.13\) to the pipeline's 24 h depth (and to the sub-daily depths derived from it) before comparing to ECCC.
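As a worked illustration, the adjustment can be applied to the rounded raw 24 h depths reported in §4.1. The helper name is hypothetical. Because the inputs here are already rounded to 0.1 mm, the results can differ from the "× 1.13" column in §4.1 by up to 0.1 mm.

```python
FIXED_TO_SLIDING_24H = 1.13  # Hershfield factor quoted in the text


def adjust_to_sliding(depth_mm, factor=FIXED_TO_SLIDING_24H):
    """Convert a calendar-day depth to an equivalent sliding 24 h depth."""
    return depth_mm * factor


# Raw pipeline 24 h depths (mm) by return period, as tabulated in section 4.1.
raw_24h = {2: 61.0, 5: 76.2, 10: 86.2, 25: 98.9, 50: 108.3, 100: 117.6}
adjusted = {T: round(adjust_to_sliding(d), 1) for T, d in raw_24h.items()}
```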
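2.4 Error metric
All agreement figures are relative error against the ECCC published value:
\[ \varepsilon = \frac{\lvert x_{\text{tool}} - x_{\text{ECCC}} \rvert}{x_{\text{ECCC}}} \times 100\,\% \]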
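A minimal sketch of the metric, assuming it is the unsigned difference divided by the ECCC value (the result tables report errors without a sign). The function name is hypothetical.

```python
def relative_error_pct(tool_value, eccc_value):
    """Unsigned relative error against the ECCC value, in percent."""
    return abs(tool_value - eccc_value) / eccc_value * 100.0


# 5-year, 24 h row from section 4.1: adjusted pipeline depth vs. ECCC (mm).
err = relative_error_pct(86.1, 86.2)  # about 0.12 %, reported as 0.1 %
```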
3. Layer 1: core statistics vs. ECCC's published table
What it tests. ECCC's own AMS (Table 1) → this tool's
fit_distribution(Gumbel, MoM) → compare to ECCC's published depths (Table 2a),
for every duration and return period. Because the input data and the
estimator are identical to ECCC's, the only expected disagreement is ECCC's
0.1 mm table rounding. This isolates the parameter-estimation and return-level
math, deterministically and offline.
Result (54 cells = 9 durations × 6 return periods):
| Metric | Value |
|---|---|
| Worst-case relative error | 0.74 % (15 min, 2‑yr — where 0.1 mm rounding alone is ≈0.8 %) |
| Mean relative error | 0.17 % |
| 24 h column | within 0.04 % |
| Test tolerance (pass bar) | 1.5 % |
Per-duration worst case:
| Duration | 5 min | 10 min | 15 min | 30 min | 1 h | 2 h | 6 h | 12 h | 24 h |
|---|---|---|---|---|---|---|---|---|---|
| Max rel. err. | 0.66% | 0.66% | 0.74% | 0.61% | 0.36% | 0.22% | 0.08% | 0.10% | 0.04% |
Interpretation. The tool reproduces ECCC's IDF statistics to rounding precision. Any future Layer‑1 error above ~1 % indicates a real change in the fitting/return-level code, not noise.
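The pass/fail logic for Layer 1 amounts to comparing each worst-case error with the 1.5 % pass bar. The sketch below uses the baseline figures from the table above; the function name and the dictionary layout are hypothetical and do not mirror the actual test file.

```python
PASS_BAR_PCT = 1.5  # Layer 1 test tolerance recorded in this document

# Per-duration worst-case relative errors (percent) at baseline.
baseline_worst = {
    "5 min": 0.66, "10 min": 0.66, "15 min": 0.74, "30 min": 0.61,
    "1 h": 0.36, "2 h": 0.22, "6 h": 0.08, "12 h": 0.10, "24 h": 0.04,
}


def durations_over_bar(worst_by_duration, pass_bar=PASS_BAR_PCT):
    """Return the durations whose worst-case error exceeds the pass bar."""
    return [d for d, err in worst_by_duration.items() if err > pass_bar]


failures = durations_over_bar(baseline_worst)  # empty at baseline
overall_worst = max(baseline_worst.values())   # 0.74, the 15 min column

# A hypothetical drifted run in which the 1 h column rises to 2.0 %.
drifted = dict(baseline_worst, **{"1 h": 2.0})
drift_failures = durations_over_bar(drifted)
```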
4. Layer 2: full daily-gauge pipeline vs. ECCC
What it tests. A real ECCC daily precipitation record (1925–2026, committed fixture) pushed through the entire in-app IDF pipeline — annual-maxima extraction, 90 % completeness screening, Grubbs high-tail outlier screening, Gumbel/MoM fit, log-log curve, and the ecozone sub-daily ratio derivation — then adjusted by the 1.13 fixed-window factor and compared to ECCC.
Pipeline configuration at baseline:
| Setting | Value |
|---|---|
| Completeness | enabled, 90 % → 22 incomplete years excluded |
| Outliers | Grubbs, high tail, log space, \(\alpha = 0.10\) |
| Distribution / fit | Gumbel / Method of Moments |
| Curve form | log-log linear |
| Ecozone (sub-daily ratios) | Pacific Maritime (resolved by point-in-polygon) |
| Daily record span | 1925‑11‑01 → 2026‑06‑19 |
This is intentionally looser than Layer 1 because of three structural, anticipated differences: (i) calendar-day vs. sliding-window maxima (addressed by \(1.13\)); (ii) our ~100‑year daily record vs. ECCC's 35‑year composite; and (iii) sub-daily depths come from a regional ratio table, not this station's (non-public) tipping-bucket data.
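The first pipeline step, annual-maxima extraction from a daily record, can be sketched as below. This is an assumption-laden illustration: the function name, the `(date, depth)` input layout, and the use of `None` for missing observations are all hypothetical and are not taken from the tool's code.

```python
from collections import defaultdict
from datetime import date


def annual_maxima(daily_records):
    """Map each calendar year to its largest daily total (mm).

    `daily_records` is an iterable of (date, precipitation_mm) pairs;
    missing observations are given as None and skipped.
    """
    maxima = defaultdict(float)  # precipitation is non-negative, so 0.0 is safe
    for day, depth in daily_records:
        if depth is None:
            continue
        maxima[day.year] = max(maxima[day.year], depth)
    return dict(sorted(maxima.items()))


# Hypothetical daily totals (mm), for illustration only.
records = [
    (date(2001, 1, 5), 12.4),
    (date(2001, 11, 20), 48.0),
    (date(2001, 11, 21), None),
    (date(2002, 3, 2), 30.5),
    (date(2002, 10, 14), 27.1),
]
ams = annual_maxima(records)
```

These are calendar-day maxima, which is why the fixed-window factor is applied afterwards.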
4.1 The 24 h depth (directly observed; tight)
| \(T\) (yr) | Pipeline raw (mm) | × 1.13 (mm) | ECCC (mm) | Rel. err. |
|---|---|---|---|---|
| 2 | 61.0 | 69.0 | 68.4 | 0.8 % |
| 5 | 76.2 | 86.1 | 86.2 | 0.1 % |
| 10 | 86.2 | 97.4 | 98.0 | 0.6 % |
| 25 | 98.9 | 111.8 | 113.0 | 1.1 % |
| 50 | 108.3 | 122.4 | 124.0 | 1.3 % |
| 100 | 117.6 | 132.9 | 135.0 | 1.5 % |
Pass bar: ±10 %. Observed worst case 1.5 % — the daily pipeline plus the 1.13 factor recovers ECCC's 24 h curve almost exactly.
4.2 Sub-daily (ratio-derived; loose)
Sub-daily depths (5 min – 12 h), after the 1.13 adjustment, fall inside a ±30 % band
around the ECCC values (pass bar: pipeline/ECCC ratio in [0.70, 1.30]). The largest
underestimate is at the shortest durations / longest return periods
(e.g. 10 min, 100‑yr ≈ 0.76), where the regional ratio least represents this
specific coastal station. This is consistent with ECCC's own caution for
1108446: "95 % Confidence Interval > ±25 %." These figures validate that the
ratio path is physically reasonable, not that it is exact — by design.
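The sub-daily check reduces to a ratio test against the pass bar. In this sketch the function name is hypothetical and the depths are invented purely to reproduce the ≈ 0.76 ratio quoted above; they are not the station's values.

```python
RATIO_BAND = (0.70, 1.30)  # sub-daily pass bar from the text


def ratio_in_band(pipeline_mm, eccc_mm, band=RATIO_BAND):
    """Return (ratio, passed) for an adjusted pipeline depth vs. ECCC."""
    ratio = pipeline_mm / eccc_mm
    low, high = band
    return ratio, low <= ratio <= high


# Hypothetical depths (mm) chosen to give a ratio of 0.76.
ratio, passed = ratio_in_band(7.6, 10.0)

# A hypothetical failing case: a ratio of 0.60 falls below the band.
_, too_low = ratio_in_band(6.0, 10.0)
```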
5. How to use this for drift detection
- Run the suite: pytest tests/test_validation_eccc_idf.py -v (offline, deterministic, part of the release gate).
- If it fails, compare the reported numbers to §3/§4 here:
  - A Layer‑1 regression means the fitting/return-level core changed. Treat it as serious; the math should match ECCC to rounding.
  - A Layer‑2 24 h regression means the extraction/screening/curve path changed materially.
  - A Layer‑2 sub-daily regression means the ecozone ratio derivation changed.
- If the change is intentional, update both the tolerances/expectations in the test and the baseline tables here, in the same commit, with a note on why.
6. Provenance & reproducibility
- The ground-truth file is ECCC's published text product, committed verbatim (tests/data/eccc_idf_1108446_vancouver_harbour.txt); the test parses Table 1 (AMS) and Table 2a (depths) directly from it, so the numbers are never hand-transcribed into code.
- The daily record was fetched once through the app's own ECCC downloader (download_station_data, climate-daily collection) and committed as a CSV so the test runs offline and identically on every machine.
- Both fitting paths call the shipped analysis code (idf_analyzer.analysis.frequency.fit_distribution and idf_analyzer.analysis.idf.compute_idf_table), the same code the GUI uses, so this validates the product, not a parallel reimplementation.