Nick Martin | Data Assimilation: Overcoming AI’s Data Uncertainty Limitations for Water Resources
Article written by Molly McCreadie

Water resources are essential for human life. Knowing how to manage water, both now and in the future, is necessary to keep using it as wisely as possible. Nick Martin and Jeremy White are examining the limitations that noisy and estimated data sets impose on artificial intelligence applications in water resources. For poor-quality data sets, they found that machine learning models perform poorly relative to tools that explicitly include physics-based descriptions of physical processes, because physics-based calculations can draw on both data and physics knowledge through data assimilation techniques.

Artificial Intelligence: What is Machine Learning?

Artificial intelligence (AI) aims to perform human tasks automatically, removing the need for manual work. Machine learning (ML) is a subset of AI in which computers learn from the data they are given to create a model. The accuracy of an ML model depends entirely on the quality and amount of data given to the model during training. As a rule, the better the data, the better the AI-ML model.

ML uses statistics and data-driven rules to carry out data analysis or comparisons. Three things are needed to build a machine learning system: input data, observed responses to that data, and prediction skill measurements, which describe how well the predicted outcome matches the observed one. Physics-informed machine learning (PIML) adds physics information to the training process so that the model produces physically plausible outcomes in response to a set of inputs.

The aim of creating an AI-ML model is for it to generalise: to predict outcomes for new input data that were not involved in training the model. When developing an AI-ML model, an initial set of data is used to train it. Training is where the model learns to predict outcomes that best match the observed outcomes corresponding to the inputs used for statistical learning. During training, problems occur when the model 'overfits' to that data set, meaning it performs well only for that specific data and does not adapt well to new information. Overfitting becomes more likely as the quality of the data decreases. For noisy and estimated (rather than measured) data, statistical learning will learn the error in the poor-quality data set and misconstrue the noise as meaningful information. These known issues hinder the model's ability to learn trends and correlations that may really exist in complex but poor-quality data sets. The more specifically the model is tuned to one particular data set, the less likely it is to adapt to new data sets.
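
As a simple illustration of overfitting (a generic Python sketch, not taken from the researchers' study), a very flexible model can be fit to a small, noisy data set and its error on unseen data compared with that of a simpler model:

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error

    # Small, noisy "training" data set: estimated rather than cleanly measured values.
    rng = np.random.default_rng(seed=0)
    x_train = np.sort(rng.uniform(0.0, 1.0, 40)).reshape(-1, 1)
    y_train = np.sin(2.0 * np.pi * x_train).ravel() + rng.normal(0.0, 0.4, 40)

    # New inputs that were not involved in training, with the outcomes we hope to predict.
    x_new = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
    y_new = np.sin(2.0 * np.pi * x_new).ravel()

    # A modest model versus a very flexible one that can memorise the noise.
    for degree in (3, 15):
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        model.fit(x_train, y_train)
        error = mean_squared_error(y_new, model.predict(x_new))
        print(f"polynomial degree {degree:2d}: error on new data = {error:.3f}")

In runs like this, the flexible model typically matches the noisy training points closely but predicts the unseen data worse, which is the generalisation failure described above.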

How Machine Learning is Applied to Water Resources

Water resources are natural or man-made sources of water that are put to beneficial use, such as rivers and streams used for drinking water or to irrigate crops. Data on water resources are typically composed of estimated values rather than observed ones. These are known as uncertain data sets because it is difficult to obtain exact, accurate values. Water resource data are also highly variable, depending on many ever-changing factors such as climate, hydrometeorology, and time. This uncertainty is a large issue for water resources, mainly because there is no quick or easy way to improve the quality of the data.

Because AI-ML models are purely data-driven, variables such as climate change are difficult to incorporate. This may be alleviated by first predicting future outcomes with physics-based models driven by projected future climate and weather conditions, and then training an AI-ML model on the generated data. In this way, climate change can be addressed with an AI-ML model when predicting future outcomes. The drawbacks are that the AI-ML model will learn to reproduce any inherent bias in the data generated from the physics-based model, which is itself driven by estimated weather, and that the same thing is modelled twice in an attempt to get the same answer both times. Modelling twice does sometimes make sense, because a trained AI-ML model will typically generate predictions much faster and more efficiently than a physics-based model. When assessing AI-ML versus physics-based future predictions, it is important to keep in mind that there are no known methods for accurately predicting the future. Consequently, sports betting is a $100+ billion industry.
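
A hypothetical Python sketch of this surrogate approach is shown below; the function physics_based_runoff() is an invented stand-in for a slow physics-based simulation, and the projected weather values are randomly generated for illustration only:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    def physics_based_runoff(precip_mm, temp_c):
        """Placeholder for a slow, physics-based runoff simulation (invented for illustration)."""
        return np.maximum(0.0, 0.6 * precip_mm - 2.0 * np.maximum(temp_c - 20.0, 0.0))

    # Projected (estimated) future weather inputs drive the physics-based model runs.
    rng = np.random.default_rng(1)
    projected_precip_mm = rng.gamma(2.0, 10.0, 500)
    projected_temp_c = rng.normal(22.0, 4.0, 500)
    inputs = np.column_stack([projected_precip_mm, projected_temp_c])
    outputs = physics_based_runoff(projected_precip_mm, projected_temp_c)

    # Train an AI-ML surrogate on the generated data; it inherits any bias in those runs,
    # but it can then produce predictions far faster than re-running the physics model.
    surrogate = RandomForestRegressor(n_estimators=200, random_state=0).fit(inputs, outputs)
    print(surrogate.predict([[35.0, 25.0]]))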

Examples of Uncertain Water Resource Data Sets

Nick Martin and his team identify three very common examples of uncertainty in water resource data sets. The first is when the amount of groundwater in an aquifer is estimated using the water level observed in a well. An aquifer is porous rock filled with groundwater, but its exact size and shape are not precisely known. Therefore, to estimate the aquifer's storage volume as accurately as possible, hydrogeologists commonly rely on the water level in a well.
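
As an illustration of this kind of estimate (with invented numbers, not values from the study), the recoverable storage of an unconfined aquifer is often approximated as the product of its footprint area, the saturated thickness inferred from the well water level, and the specific yield of the rock:

    # All numbers below are invented for illustration only.
    area_m2 = 50.0e6              # assumed aquifer footprint of 50 square kilometres
    well_water_level_m = 120.0    # water level (head) observed in the well
    aquifer_base_m = 80.0         # assumed elevation of the aquifer base
    specific_yield = 0.15         # assumed drainable porosity of the aquifer rock

    saturated_thickness_m = well_water_level_m - aquifer_base_m
    storage_m3 = area_m2 * saturated_thickness_m * specific_yield
    print(f"Estimated recoverable storage: {storage_m3:.2e} cubic metres")

Every input to this calculation except the well reading is itself an estimate, which is why the resulting storage volume is uncertain.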

Evapotranspiration is the combination of evaporation and transpiration processes that move water from the land surface into the atmosphere. Evapotranspiration rates are calculated from weather and vegetation parameters, which are the second source of uncertainty. The accuracy of these parameters depends on how many are available and on how accurately the vegetation types are characterised. The more accurate the input data, the more accurate the evapotranspiration rates calculated from them.
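
One common way this plays out in practice is the crop-coefficient approach, in which a weather-driven reference evapotranspiration is scaled by a coefficient that depends on vegetation type; the short Python sketch below uses invented values purely to show how misclassifying the vegetation shifts the estimate:

    # Invented values for illustration only.
    reference_et_mm_per_day = 5.0   # reference evapotranspiration computed from weather parameters
    crop_coefficients = {           # illustrative coefficients for different vegetation types
        "bare soil": 0.3,
        "grassland": 0.8,
        "irrigated crop": 1.1,
    }
    for vegetation, kc in crop_coefficients.items():
        print(f"{vegetation:>14}: {kc * reference_et_mm_per_day:.1f} mm/day")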

Finally, the third example is river discharge. The volume of water discharged by a river is estimated using a water stage recorder, which measures the water level, and a rating curve, which converts those water level measurements into a value for discharge. Here, the error margins depend on the flow regime of the river.
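
A rating curve is often expressed as a power law relating discharge to the recorded stage; the minimal Python sketch below uses invented coefficients to show how a water level reading is converted into a discharge value:

    def rating_curve_discharge(stage_m, a=12.0, h0=0.2, b=1.8):
        """Convert a recorded water level (stage, m) into discharge (m^3/s) via an invented power-law rating curve."""
        effective_head = max(stage_m - h0, 0.0)   # no flow below the curve's reference level
        return a * effective_head ** b

    for stage_m in (0.5, 1.0, 2.5):               # low, medium, and high flow stages
        print(f"stage {stage_m:.1f} m -> discharge {rating_curve_discharge(stage_m):.1f} m^3/s")

Because the fitted coefficients hold best for the flows used to build the curve, extrapolating to very low or very high flows is where the largest errors arise.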

Is Data Assimilation the Answer?

The main risk associated with AI-ML models based on uncertain data is that they will not generalise, leading to incorrect water resource management. A model's generalisation ability needs to be checked to ensure that it can accurately predict the outcomes of new data. Until high-quality water resource data are available, this will continue to be a risk.

Data assimilation (DA) is a concept encompassing many ways to combine physics-based models with observed data, producing estimates that leverage both the information content of the data set and the physics knowledge encoded in the model. It does this by using a forward, physics-based model that projects forward in time to predict unobserved values. These model-predicted values are combined with observed data to achieve the best possible, constrained model results, while DA also accounts for any bias or uncertainties. The combination of the forward model results and the observed values is optimised using a goodness-of-fit metric, meaning that the best models are those whose predicted values match the observed ones most closely. By identifying where the uncertainties lie and tracking how they influence the final predicted outcomes, DA provides a means to combine the advantages of large data sets with physics knowledge, while producing a complete description of the uncertainty inherent in the predicted values.
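
The scalar Python sketch below gives a generic illustration of this idea (it is not the specific DA method used by the researchers): a forward-model prediction and an observation are combined, each weighted by its uncertainty, so that the updated estimate is better constrained than either source alone:

    def assimilate(model_value, model_variance, observed_value, observation_variance):
        """Variance-weighted (Kalman-style) update of a single predicted value."""
        gain = model_variance / (model_variance + observation_variance)
        updated_value = model_value + gain * (observed_value - model_value)
        updated_variance = (1.0 - gain) * model_variance
        return updated_value, updated_variance

    # Invented example: the forward model predicts a groundwater level of 118 m with a large
    # uncertainty, while a well observation reports 121 m with a smaller uncertainty.
    value, variance = assimilate(118.0, 4.0, 121.0, 1.0)
    print(f"assimilated estimate: {value:.1f} m (remaining variance {variance:.2f})")

The update pulls the model prediction toward the more certain observation and reports a reduced uncertainty, which is the essence of how DA tracks uncertainty through to the final predicted outcomes.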

It is important to remember that the AI-ML boom is not based on theoretical advances; it is driven by increased computational power and the increased availability of large data sets. In water resources, the calculation approaches used in AI-ML for statistical learning have been employed routinely by scientists for decades. In fact, similar statistical learning approaches fell out of favour in water resources from the 1970s to the 1990s, because advances in computational methods and computing power made it feasible to routinely use a wide variety of physics-based models. These physics-based models were favoured over statistical learning approaches because they have distinct advantages relative to AI-ML when data are scarce, noisy, or composed of estimated rather than measured values.

Martin and his team believe that the first step in any water resource study is an assessment of data quality and quantity, which should guide the second step of selecting an analysis method, or model. Data assessment will identify whether: 1) complex physics-based models can be replaced by or augmented with AI-ML models, and 2) DA techniques should be used to optimally combine theoretical physics with limited and poor-quality data.

REFERENCE

https://doi.org/10.33548/SCIENTIA1346

MEET THE RESEARCHER


Nick Martin
Principal Scientist and Founding member, Vodanube LLC, Fort Collins, CO

Nick Martin obtained his BA in International Studies in 1992 and MA in International Relations in 1993, both at the Johns Hopkins University. Martin went on to obtain a BSc in Geophysics from Virginia Tech in 1999, followed by an MS in Hydrogeology from Stanford University in 2004. He is registered as a Professional Geologist and a Professional Hydrologist (Surface Water), and is a Certified Floodplain Manager.

Throughout his career as a software developer and surface water and groundwater hydrologist, Martin has focused on risk assessment, mitigation and reliability analysis, as well as resiliency and sustainability relating to climate change. He is currently an International Committee Member for the Association of State Floodplain Managers, a Legislative Committee Chairperson for the Colorado Groundwater Association, and is the Water Commissioner for the City of Fort Collins. Martin is a founding member and principal scientist at Vodanube, based in Fort Collins, CO.

CONTACT

E: nick.martin@alumni.stanford.edu

W: https://github.com/nmartin198/

LinkedIn: www.linkedin.com/in/nick-martin-aa0aa68

KEY COLLABORATORS

Dr. Jeremy White, INTERA Incorporated

FUNDING

Southwest Research Institute (SwRI), San Antonio, TX

FURTHER READING

Martin N & White J (2024) Water Resources’ AI–ML Data Uncertainty Risk and Mitigation Using Data Assimilation. Water, 16(19): 2758. https://doi.org/10.3390/w16192758
