Project overview

This project turns raw water quality measurements into a single interactive dashboard that lets you explore nutrient patterns across Washington, compare surface vs ground water, and evaluate seasonality. The work is end-to-end: data ingestion, cleaning, relational modeling, analysis, and a public-facing dashboard.

Tableau dashboard SQL pipeline Hypothesis tests + confidence intervals Geospatial exploration

Goals

  • Make nutrient signals interpretable. Nitrate and orthophosphate are common indicators for nutrient loading and water quality concerns, but raw tables are hard to parse.
  • Support comparisons that are fair. Compare groups using consistent time windows, sampling types, and aggregation rules.
  • Keep lineage traceable. Every chart in the dashboard should be reproducible from the pipeline outputs.
  • Standardize messy terminology. Across sources and tables, attribute names and units can be inconsistent and break cross-sheet filtering unless normalized.
  • Expose uncertainty. Summaries include sample sizes and uncertainty where appropriate.
  • Deliver a public-facing artifact. A clear story: data aggregating decisions + analytics + a dashboard that answers real questions.

Analyses shown in the dashboard

The dashboard mixes exploratory visualization with a small set of targeted statistical comparisons. This provides interpretable summaries that help a user answer practical questions like: “Do concentrations differ by season?” and “Do patterns look different between ground and surface water?”

  • Seasonal comparison tests: summer vs winter mean comparisons per chemical and water type using bootstrap analysis.
  • Confidence intervals: 95% CIs displayed alongside effect estimates to show uncertainty, not just a p value.
  • Mapping and distribution views: spatial context + distribution plots to identify outliers and site-to-site variation.

Key results

The specific numbers below reflect the analytical summaries produced by the statistical tests and displayed in the dashboard (Summer vs Winter mean comparisons). Interpret them as “estimated difference between groups” with uncertainty and sample size context.

Chemical Water type Effect 95% CI p value Sample sizes
Nitrate Groundwater 1.83 [0.80, 3.00] 0.00 n=72 (summer), n=24 (winter)
Surface Water -0.80 [-3.88, 1.81] 0.55 n=162 (summer), n=116 (winter)
Orthophosphate Groundwater -0.01 [-2.92, 7.62] 0.21 n=96 (summer), n=34 (winter)
Surface Water 2.94 [-6.42, 0.10] 0.63 n=137 (summer), n=80 (winter)

What this means: Nitrate groundwater has significant evidence for seasonality between the summer and winter seasons. The positive effect value represents higher summer measurements compared to winter measurements. However, for the other 3/4 of these seasonal comparisons, the uncertainty intervals overlap zero and the p values are not small. That is evidence that the seasonal mean differences are not strongly separated for these measurements, as shown in this analysis. The more useful signal may be spatial variation, site-specific behavior, and differences by water type, which the dashboard makes easy to explore.

Data and structure

The dashboard is built from the Water Qaulity Portal (WQP) monitoring measurements that include site metadata (location and identifiers), sampling context (water type),and chemistry measurements The pipeline reshapes these into analysis-ready tables that are easy to join and filter reliably inside Tableau.

What it contains Why it exists
sites Site identifiers, names, lat/lon, and watershed fields. Separates stable site metadata from measurement-level data to avoid duplication and enable mapping.
results Measurement records keyed by site, date/time, water type, and chemical. Single canonical measurement table for filtering and aggregation.
agg_month Monthly aggregation by chemical and water type (counts and summary statistics). Makes seasonality exploration fast and consistent across charts.
test_results Precomputed hypothesis tests and confidence intervals for defined comparisons. Lets the dashboard display analytical summaries without recomputing statistics inside Tableau.

Pipeline and QA decisions

Standardization

Normalized chemical labels/units and site names to one canonical dimension.

Typed outputs

Loaded raw CSVs into staging tables, then cast into analysis tables to prevent joining issues.

Reproducibility

All derived fields and aggregations are produced in SQL Server so Tableau stays lightweight and consistent.

  • Unit and format handling: enforced consistent numeric units and date parsing.
  • Missingness strategy: missing values are retained as missing (not converted to zero), and summaries track sample counts. Samples at the detection limit are also inlcuded and tracked.
  • Aggregation rules: monthly summaries use consistent grouping keys so charts agree with each other across pages.

Limitations and next steps

Limitations

  • Non-uniform sampling: monitoring data often has uneven coverage across sites and time, which affects interpretability.
  • Confounding: season is correlated with flow, land use, and sampling effort, so simple group comparisons can miss structure.
  • Unit and method differences: even after cleaning, lab methods and reporting conventions can introduce variability.

Next steps

  • Add stratified views: break out comparisons by watershed or region, not only statewide summaries.
  • Model-based layer: add a simple hierarchical model or regression adjustment to control for site and season simultaneously.
  • Data quality flags: surface detection limits, censored values, and method codes more explicitly in the dashboard.