Oral Presentation AFSS/NZFSS Joint Conference 2019

Anomaly detection in high frequency water-quality data from in-situ sensors (#36)

Catherine Leigh 1,2,3, Kerrie Mengersen 1,2, Erin E Peterson 1,2,3
  1. School of Mathematical Sciences, Science and Engineering Faculty, Queensland University of Technology, Brisbane, QLD, Australia
  2. ARC Centre of Excellence for Mathematics and Statistical Frontiers, Queensland University of Technology, Brisbane, QLD, Australia
  3. Institute for Future Environments, Queensland University of Technology, Brisbane, QLD, Australia

River water quality is increasingly monitored using automated in situ sensors, but the high volume and velocity of the data render manual anomaly detection infeasible. We present a highly transferable framework for automated anomaly detection in high-frequency data from in situ sensors, using water-quality data from rivers flowing into the Great Barrier Reef. After identifying end-user needs and defining anomalies, we ranked anomaly importance and selected suitable detection methods. High-priority anomalies included sudden isolated spikes and level shifts, most of which were classified correctly by regression-based methods such as autoregressive integrated moving average (ARIMA) models. Classifications of drift and of periods of anomalously low or high variability were more often correct when we applied mitigation, which replaces anomalous measurements with their forecasts before subsequent forecasts are made, but this inflated false positive rates. Feature-based methods also performed well on high-priority anomalies and were similarly less proficient at detecting lower-priority anomalies, resulting in high false negative rates. Unlike regression-based methods, however, all feature-based methods produced low false positive rates and have the benefit of not requiring training or optimization. Rule-based methods successfully detected impossible values and missing observations. We suggest that a combination of methods will provide optimal performance in terms of correct anomaly detection, whilst minimizing false detection rates. Furthermore, our framework emphasizes the importance of communication between end-users, freshwater scientists and anomaly detection developers for optimal outcomes with respect to both detection performance and end-user application.
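As a rough illustration of two of the ideas above, the following Python sketch shows a rule-based check for impossible values and missing observations, and a forecast-based check with optional mitigation. It is not the implementation used in this work. The function names, the bounds, the threshold and the example series are all hypothetical, and a naive "last value" forecast stands in for a fitted regression-based model such as ARIMA.

```python
import math


def rule_based_flags(values, lower, upper):
    """Label each observation as 'missing', 'impossible', or None (no rule broken).

    `lower` and `upper` are assumed physical limits for the sensor, e.g. 0 to 14 for pH.
    """
    flags = []
    for v in values:
        if v is None or (isinstance(v, float) and math.isnan(v)):
            flags.append("missing")
        elif v < lower or v > upper:
            flags.append("impossible")
        else:
            flags.append(None)
    return flags


def forecast_flags(values, threshold, mitigate=True):
    """Flag observations that differ from a one-step-ahead forecast by more than `threshold`.

    Assumption: the forecast is simply the previous value (a stand-in for ARIMA).
    With mitigate=True, a flagged observation is replaced by its forecast
    before the next forecast is made.
    """
    flags = [False]  # the first observation has no forecast to compare against
    previous = values[0]
    for v in values[1:]:
        forecast = previous
        is_anomaly = abs(v - forecast) > threshold
        flags.append(is_anomaly)
        # Mitigation: carry the forecast forward instead of the anomalous value.
        previous = forecast if (is_anomaly and mitigate) else v
    return flags


if __name__ == "__main__":
    ph = [7.1, None, -1.0, float("nan"), 15.0]
    print(rule_based_flags(ph, lower=0.0, upper=14.0))

    spike = [5.0, 5.2, 5.1, 50.0, 5.3, 5.2]        # sudden isolated spike
    shift = [5.0, 5.0, 5.0, 10.0, 10.0, 10.0]      # level shift
    for series in (spike, shift):
        print(forecast_flags(series, threshold=2.0, mitigate=True))
        print(forecast_flags(series, threshold=2.0, mitigate=False))
```

In this toy setting, mitigation stops an isolated spike from contaminating the next forecast, so only the spike itself is flagged. After a level shift, however, every later observation keeps being flagged because the forecast never adapts to the new level, which is one way mitigation can raise the number of flagged points.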