WA Water Quality Dashboard
Interactive visualization of nitrate and orthophosphate measurements across Washington, comparing surface and ground water samples.
Project overview
This project turns raw water quality measurements into a single interactive dashboard that lets you explore nutrient patterns across Washington, compare surface vs ground water, and evaluate seasonality. The work is end-to-end: data ingestion, cleaning, relational modeling, analysis, and a public-facing dashboard.
Goals
- Make nutrient signals interpretable. Nitrate and orthophosphate are common indicators for nutrient loading and water quality concerns, but raw tables are hard to parse.
- Support comparisons that are fair. Compare groups using consistent time windows, sampling types, and aggregation rules.
- Keep lineage traceable. Every chart in the dashboard should be reproducible from the pipeline outputs.
- Standardize messy terminology. Across sources and tables, attribute names and units can be inconsistent and break cross-sheet filtering unless normalized.
- Expose uncertainty. Summaries include sample sizes and uncertainty where appropriate.
- Deliver a public-facing artifact. A clear story: data aggregating decisions + analytics + a dashboard that answers real questions.
Analyses shown in the dashboard
The dashboard mixes exploratory visualization with a small set of targeted statistical comparisons. This provides interpretable summaries that help a user answer practical questions like: “Do concentrations differ by season?” and “Do patterns look different between ground and surface water?”
- Seasonal comparison tests: summer vs winter mean comparisons per chemical and water type using bootstrap analysis.
- Confidence intervals: 95% CIs displayed alongside effect estimates to show uncertainty, not just a p value.
- Mapping and distribution views: spatial context + distribution plots to identify outliers and site-to-site variation.
Key results
The specific numbers below reflect the analytical summaries produced by the statistical tests and displayed in the dashboard (Summer vs Winter mean comparisons). Interpret them as “estimated difference between groups” with uncertainty and sample size context.
| Chemical | Water type | Effect | 95% CI | p value | Sample sizes |
|---|---|---|---|---|---|
| Nitrate | Groundwater | 1.83 | [0.80, 3.00] | 0.00 | n=72 (summer), n=24 (winter) |
| Surface Water | -0.80 | [-3.88, 1.81] | 0.55 | n=162 (summer), n=116 (winter) | |
| Orthophosphate | Groundwater | -0.01 | [-2.92, 7.62] | 0.21 | n=96 (summer), n=34 (winter) |
| Surface Water | 2.94 | [-6.42, 0.10] | 0.63 | n=137 (summer), n=80 (winter) |
What this means: Nitrate groundwater has significant evidence for seasonality between the summer and winter seasons. The positive effect value represents higher summer measurements compared to winter measurements. However, for the other 3/4 of these seasonal comparisons, the uncertainty intervals overlap zero and the p values are not small. That is evidence that the seasonal mean differences are not strongly separated for these measurements, as shown in this analysis. The more useful signal may be spatial variation, site-specific behavior, and differences by water type, which the dashboard makes easy to explore.
Data and structure
The dashboard is built from the Water Qaulity Portal (WQP) monitoring measurements that include site metadata (location and identifiers), sampling context (water type),and chemistry measurements The pipeline reshapes these into analysis-ready tables that are easy to join and filter reliably inside Tableau.
| What it contains | Why it exists | |
|---|---|---|
| sites | Site identifiers, names, lat/lon, and watershed fields. | Separates stable site metadata from measurement-level data to avoid duplication and enable mapping. |
| results | Measurement records keyed by site, date/time, water type, and chemical. | Single canonical measurement table for filtering and aggregation. |
| agg_month | Monthly aggregation by chemical and water type (counts and summary statistics). | Makes seasonality exploration fast and consistent across charts. |
| test_results | Precomputed hypothesis tests and confidence intervals for defined comparisons. | Lets the dashboard display analytical summaries without recomputing statistics inside Tableau. |
Pipeline and QA decisions
Standardization
Normalized chemical labels/units and site names to one canonical dimension.
Typed outputs
Loaded raw CSVs into staging tables, then cast into analysis tables to prevent joining issues.
Reproducibility
All derived fields and aggregations are produced in SQL Server so Tableau stays lightweight and consistent.
- Unit and format handling: enforced consistent numeric units and date parsing.
- Missingness strategy: missing values are retained as missing (not converted to zero), and summaries track sample counts. Samples at the detection limit are also inlcuded and tracked.
- Aggregation rules: monthly summaries use consistent grouping keys so charts agree with each other across pages.
Limitations and next steps
Limitations
- Non-uniform sampling: monitoring data often has uneven coverage across sites and time, which affects interpretability.
- Confounding: season is correlated with flow, land use, and sampling effort, so simple group comparisons can miss structure.
- Unit and method differences: even after cleaning, lab methods and reporting conventions can introduce variability.
Next steps
- Add stratified views: break out comparisons by watershed or region, not only statewide summaries.
- Model-based layer: add a simple hierarchical model or regression adjustment to control for site and season simultaneously.
- Data quality flags: surface detection limits, censored values, and method codes more explicitly in the dashboard.