2. Data requirements – methodological framework

2.1 Modelling categories

Model types, modelling purposes and hydrological regimes are all highly diverse, so it is not straightforward to specify how much data modelling requires. We nevertheless attempt to do so in the following, with the objective of establishing a framework for analysing and discussing data availability in relation to modelling. We emphasise that the result must by no means be seen as a standard prescribing how much data are required. The framework does not claim to provide universally applicable results, and many examples to which it does not apply can no doubt be found.

We characterise modelling in two dimensions: (a) the level of complexity; and (b) the modelling domain.

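The level of complexity is subdivided into four categories inspired by the classification of hydrological models into empirical, lumped conceptual and distributed physically-based (Refsgaard, 1996):
  • Simple models – Screening (low data requirements)
  • Intermediate models – Planning (intermediate data requirements)
  • Comprehensive models – Design (high data requirements)
  • Process studies – Research (very high data requirements)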

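The modelling domain can be subdivided into the following six groups:
  • Water resources assessment
  • Floods
  • Surface water quality and ecosystems
  • Agricultural management, including nonpoint pollution
  • Groundwater quality
  • Socioeconomic cost assessment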

Typical types and examples of model codes for the different categories are listed in Table 1.

Table 1 Types of model codes and examples of codes suitable for different domains and different levels of model complexity

| Complexity of code – Application (Data requirements) | Water Resources Assessment | Floods | Surface water quality and ecosystems | Agricultural management, including nonpoint pollution | Groundwater quality | Socioeconomic cost assessment |
|---|---|---|---|---|---|---|
| Simple models – Screening (Low) | GIS based water balance tools, MIKE BASIN | | Regression models, Vollenweider-Model, Moneris | Regression models, Export coefficient models, USLE | DRASTIC | Cost effectiveness |
| Intermediate models – Planning (Intermediate) | Sacramento, HBV | HEC-RAS 1D, Lisflood | | HBV-N, INCA, AGNPS | | Hedonic pricing method, Contingent valuation method, Cost-benefit |
| Comprehensive models – Design (High) | MIKE SHE, MODFLOW | MIKE 11, SOBEK, ISIS, Telemac | DELFT3D-ECO, WASP7, RWQM1, Aquasim | SWAT, DAISY, SOIL-N, ANIMO, WEPP, EUROSEM | PHREEQC, HST3D, RT3D | |
| Process studies – Research (Very high) | Comprehensive model codes + tailor made research codes | Comprehensive model codes + tailor made research codes | Comprehensive model codes + tailor made research codes | Comprehensive model codes + tailor made research codes | Comprehensive model codes + tailor made research codes | Comprehensive model codes + tailor made research codes |

2.2 Data requirements for the WFD implementation

The data requirements in a river basin depend on many factors. Fundamentally they depend on the economic and social/political value of the data, which in turn depends on the water use and conflicting demands in a socio-economic context and on the hydrological regime. As noted by WMO (1994), it is in practice impossible to design a monitoring network from purely objective, scientifically based criteria that take all relevant aspects into account. It is therefore also impossible to prescribe universally applicable data requirements for water management, including modelling.

One point of departure in the search for guidance on data requirements could be WMO’s Guide to Hydrological Practices (WMO, 1994), which outlines the required density of stations for a minimum network depending on physiographical region. For interior plains and hilly/undulating regions the minimum densities of gauging stations are specified as

These requirements are seen to be extremely minimal and intuitively far below what is required in a WFD context. They would result in only one surface water quality gauging station for countries such as the Netherlands and Denmark. Given the recognised eutrophication problems in these countries this is obviously far from sufficient, and in fact many more stations exist.

Another place to look for recommendations on data requirements is the CIS Guidance Documents. Blind and de Blois (2003) have done so. Data requirements for the WFD are discussed in many of the WFD guidance documents, e.g. those dealing with 'Monitoring' (EC, 2003b), 'Analysis of pressures and impacts' (EC, 2003a) and 'Planning Processes' (EC, 2003c). The monitoring guidance document (EC, 2003b) provides some general guidance on where, what and when to monitor in the different monitoring programmes. The most detailed information concerns which variables must be included in the monitoring programmes, while the required temporal and spatial resolution is in general linked to the chosen level of confidence and precision, for which a key principle is stated: “the actual precision and confidence levels achieved should enable meaningful assessments of status in time and space to be made. Member States will have to quote these levels in River Basin Management Plans and will thus be open to scrutiny and comment by others”. The acceptable level of precision and confidence is thus a subjective quantity that depends on the socio-economic interests at stake and on the risk strategy of the decision makers (e.g. Bots et al., 2007). The cost of monitoring must therefore be balanced against the potential risk and, in general, the lower the acceptable risk, the more monitoring is required.

The monitoring guidance (EC, 2003b) does, however, recognise that the required resolution depends on both the dynamics of the variables and the complexity of the system. For the temporal resolution, a frequency must be chosen that allows the detection of both short- and long-term changes of the physical system. Highly variable parameters should therefore be measured more frequently than parameters that are more stable or relatively unresponsive to short-term variation in stresses and pressures. Furthermore, the monitoring programmes must be designed on the basis of a conceptual understanding of the system, where the level of refinement needed in a model (conceptual model) is proportionate to (i) the difficulty in making the assessments or predictions required, and (ii) the potential consequences of errors in those assessments (EC, 2003b). Hence, the more complex the system is, the more monitoring data are likely to be required to understand the system or to reach a chosen confidence level.

One could then ask: when is a monitoring programme sufficient for the WFD, and does such a programme exist in Europe? This question cannot be fully answered until the first WFD cycle has been completed and evaluated after 2015. The following example may, however, shed some light on the issue. Larsen et al. (2002) conducted an inventory of the National Monitoring Programme for the Aquatic Environment in Denmark with a focus on statistical optimization. This was done in parallel with a review carried out by the European Topic Centre on Water (ETC, 2002), which concluded that the Danish monitoring programme is in general sound and adequate in terms of number of stations, spatial coverage, parameters and frequencies. Even so, Larsen et al. (2002) concluded that frequencies are far too low to test various statistical hypotheses and that several (>5) decades of measurements will often be required to identify a given trend. As an example, the number of yearly measurements (presently 10) at the existing lake stations, which aim at estimating the average level of parameters such as nutrients and selected biological indicators, should be raised by a factor of 2-5 to obtain a relative precision of 20%. (The relative precision describes the deviation of the 95% confidence interval relative to the geometric mean; thus the lower the relative precision, the better determined the value.) Another example is the groundwater monitoring stations, around 1000 wells in 70 monitoring areas, which are typically sampled once per year. This sampling frequency only provides a precision of around 50% for a given parameter (nitrate), and raising the precision to about 20% requires 2-4 times more measurements. This indicates that even a rather intensive monitoring programme may not in all situations be sufficient to fulfil the requirements of the WFD within the relatively short time span until 2015.
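The link between the number of measurements and the relative precision can be illustrated with a small calculation. The sketch below is not the method of Larsen et al. (2002): it assumes independent, log-normally distributed observations and a normal approximation for the 95% confidence interval, and the function names and the chosen variability are hypothetical.

```python
import math
from statistics import NormalDist

# Two-sided 95% quantile of the standard normal distribution (about 1.96)
Z95 = NormalDist().inv_cdf(0.975)

def relative_precision(log_sd, n):
    """Upper deviation of the 95% confidence interval relative to the geometric mean.

    log_sd is the (assumed known) standard deviation of the log-transformed data.
    """
    return math.exp(Z95 * log_sd / math.sqrt(n)) - 1.0

def samples_needed(log_sd, target):
    """Smallest number of samples that gives a relative precision <= target."""
    return math.ceil((Z95 * log_sd / math.log(1.0 + target)) ** 2)

# Hypothetical variability, chosen so that a single sample gives 50% precision
log_sd = math.log(1.5) / Z95

precision_now = relative_precision(log_sd, 1)   # about 0.5
n_for_20_percent = samples_needed(log_sd, 0.20)
print(f"precision with 1 sample: {precision_now:.2f}")
print(f"samples needed for 20%: {n_for_20_percent}")
```

Under these simplified assumptions about five times as many samples are needed to move from 50% to 20% relative precision. The factors reported by Larsen et al. (2002) rest on their own statistical analysis and are not reproduced by this sketch; the point is only that precision improves slowly, roughly with the square root of the number of measurements.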

At the Harmoni-CA WP4 workshops several participants stated that some data types required by the WFD are still not being collected. This is a challenge to be solved in the coming years (Arustiene 2005; Demetriou 2005; Crabtree 2005; Håkansson 2005; Jørgensen 2005; Pataki 2005; Purvina 2005).

2.3 Data requirements for modelling

Often data requirements for a given model code are treated as if a general answer could be given to the question: “Are there enough data for use of this model code?” Typically the indicative list of required data provided in the documentation and users’ guides of model codes is regarded as a set of fundamental requirements, with the implication that the model will provide useful results if such data are available. The model evaluation tool provided by the BMW project (http://www.rbm-toolbox.net/bmw/index.php), for instance, includes the question: “Are all the necessary data required for the implementation of the model available?”

The opposite view, that there are always enough data to set up and apply a model, is advocated by some modellers. As an extreme example, Refsgaard et al. (1999) established a catchment model for simulating groundwater contamination from nitrate leaching solely from standard European-level databases such as GISCO, EUROSTAT and EEA. Their view was that input data can always be generated from secondary data sources through various transfer functions, and that it is important to assess the corresponding model prediction uncertainty (Thorsen et al., 2001). According to this view the question “Are there enough data?” is ill posed. Instead the relevant question should be: “Is the model prediction uncertainty small enough to be of any practical use in a water management context?” This last question cannot be answered without considering the socio-economic context, i.e. how much is at stake economically and politically for stakeholders and decision makers, and it will therefore vary from case to case (Refsgaard and Henriksen, 2004; Højberg et al., 2007).
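One common way to address the question of whether prediction uncertainty is small enough is Monte Carlo simulation: uncertain inputs are sampled many times, propagated through the model, and the spread of the results is compared with what the decision context can tolerate. The sketch below is purely illustrative and is not the procedure of the cited studies. The transfer function, the input distributions, the acceptance threshold and all names are hypothetical assumptions.

```python
import math
import random
import statistics

def nitrate_concentration(load_kg_ha, recharge_mm):
    """Hypothetical transfer function: nitrate-N concentration (mg/l) in recharge.

    1 kg/ha dissolved in 1 mm of water corresponds to 100 mg/l.
    """
    return 100.0 * load_kg_ha / recharge_mm

def prediction_interval(n_runs=5000, seed=1):
    """Return the (2.5%, 50%, 97.5%) quantiles of the simulated concentration."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_runs):
        # Assumed input uncertainties, e.g. inputs derived from coarse databases
        load = rng.lognormvariate(math.log(40.0), 0.4)   # kg N/ha/year
        recharge = max(rng.gauss(300.0, 60.0), 50.0)     # mm/year, kept positive
        results.append(nitrate_concentration(load, recharge))
    cuts = statistics.quantiles(results, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return cuts[0], statistics.median(results), cuts[-1]

def uncertainty_acceptable(lower, median, upper, max_relative_width):
    """Assumed decision rule: interval width relative to the median prediction."""
    return (upper - lower) / median <= max_relative_width

low, med, high = prediction_interval()
# The threshold reflects how much is at stake; 0.5 is an arbitrary example value
ok = uncertainty_acceptable(low, med, high, max_relative_width=0.5)
print(f"95% interval: {low:.1f} to {high:.1f} mg/l, acceptable: {ok}")
```

The threshold passed to `uncertainty_acceptable` is where the socio-economic context enters: the same prediction interval may be adequate for a screening study and inadequate for a design decision.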

Depending on the model approach, the data requirements vary greatly with respect to both time series and system data such as soil maps and land use. The different model concepts provide different degrees of insight into the physical system and have different requirements for system data and for calibration and validation data, as illustrated in Figure 1. While empirical models only describe an observed relation between the variables of interest, physically-based models can, theoretically, describe all state variables of the entire model domain, and thus hold the potential to provide the most detailed and correct representation of the physical system. Most distributed models are, however, a combination of distributed and lumped descriptions, where less important areas or processes are described by a lumped approach, which lowers the data requirements of the model application.

Figure 1 Illustration of the relation between data requirement and provided insight in the physical system for different model concepts

An implicit assumption in the practical use of models is that the model concept, whether empirical, lumped or physically-based, is correct, and that the model parameters only need to be adjusted for the model to mimic the site-specific system acceptably, i.e. no modification of the governing equations of the model code is required. If, on the other hand, the model is used to validate or refine the underlying relations or equations, a much more detailed description is generally needed to study the relevant processes thoroughly, and much more data are required. This type of model use is not common in practice; it is mostly carried out in the research world with research models, where alternative formulations may be tested and compared to field observations from dense monitoring programmes designed for the specific purpose.

The data requirements for the different model types are listed in Table 1 as low/intermediate/high/very high, and are differentiated into system data and time series data in Figure 1. But how much data is that, and when is the model type selected?

2.4 How and when in the modelling process is the necessary amount of data assessed?

The modelling process may, according to the HarmoniQuA project (Refsgaard et al., 2005a; Scholten et al., 2007, www.harmoniqua.org), be decomposed into five major steps, each grouping related tasks to be performed at different stages of the model study (Figure 2). The contents of the five steps are:

STEP1 (Model Study Plan). Important issues in this step are to define the objectives and requirements of the model study. Questions to be addressed include: What is the purpose of the modelling, i.e. which questions are we seeking answers to? What are the consequences of being wrong? The latter question dictates the level of detail and precision required in the model study. What data are available, and what additional data need to be collected?

STEP2 (Data and Conceptualisation). In this step the modeller collects all the relevant knowledge about the study basin and develops an overview of the processes and their interactions in order to conceptualise how the system should be modelled in sufficient detail to meet the requirements specified in the Model Study Plan.

STEP3 (Model Set-up). With a conceptual model of the system the numerical model is constructed by transforming the conceptual model into a site-specific model that can be implemented in the model code.

STEP4 (Calibration and Validation). In this step the model's ability to reproduce the physical system is analysed. This is done first by calibration, where the model parameters are adjusted until the model response reproduces reality within a user-specified accuracy, and next by validation, where the model performance is compared to independent data not used for calibration.

STEP5 (Simulation and Evaluation). In the final step the calibrated and validated model is used to carry out simulations to meet the objectives of the model study. These simulations may produce specific results that can be used in subsequent decision making (e.g. for planning or design purposes) or to improve understanding (e.g. of the hydrological/ecological regime of the study area).
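The calibration and validation described under STEP4 can be sketched as a split-sample test. The example below is a minimal illustration, not a HarmoniQuA procedure: it assumes a single-parameter linear reservoir as the model, a grid search as the calibration method, the Nash-Sutcliffe efficiency as the performance criterion, and synthetic "observations" generated with a known parameter plus noise. All names and values are hypothetical.

```python
import random

def simulate(rain, k, storage=0.0):
    """Linear reservoir: each time step a fraction k of the storage runs off."""
    flows = []
    for p in rain:
        storage += p
        q = k * storage
        storage -= q
        flows.append(q)
    return flows

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than the mean of obs."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    return 1.0 - sse / sum((o - mean_obs) ** 2 for o in obs)

def calibrate(rain, obs, n_cal):
    """Grid search for the k that maximises NSE on the first n_cal time steps."""
    candidates = [i / 100 for i in range(1, 100)]
    return max(candidates, key=lambda k: nse(obs[:n_cal], simulate(rain, k)[:n_cal]))

# Synthetic data (hypothetical): true k = 0.3 with 5% multiplicative noise
rng = random.Random(42)
rain = [rng.expovariate(0.2) if rng.random() < 0.3 else 0.0 for _ in range(200)]
obs = [q * (1 + rng.gauss(0, 0.05)) for q in simulate(rain, 0.3)]

n_cal = 100                          # first half: calibration period
k_cal = calibrate(rain, obs, n_cal)
sim = simulate(rain, k_cal)
nse_cal = nse(obs[:n_cal], sim[:n_cal])
nse_val = nse(obs[n_cal:], sim[n_cal:])  # second half: independent validation data
print(f"k = {k_cal:.2f}, NSE calibration = {nse_cal:.3f}, NSE validation = {nse_val:.3f}")
```

The validation score is computed only on data that played no role in the parameter adjustment, which is what gives it value as a test. Skipping this step halves the observation record needed but leaves the model performance unverified.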

1. Model Study Plan
  • Identify problem
  • Define requirements
  • Assess uncertainties
  • Prepare Model Study Plan
2. Data and Conceptualisation
  • Collect and process data
  • Develop conceptual model
  • Select model code
  • Review and dialogue
3. Model Set-up
  • Construct model
  • Reassess performance criteria
  • Review and dialogue
4. Calibration and Validation
  • Model calibration
  • Model validation
  • Uncertainty assessment
  • Review and dialogue
5. Simulation and Evaluation
  • Model predictions
  • Uncertainty assessment
  • Review and dialogue
Figure 2 Decomposition of a model study into five major steps, simplified from Refsgaard et al. (2005a).

The flow chart in Figure 2 is a simplification. First of all, each of the five steps is decomposed into several tasks. More importantly, the modelling process is not simply linear as depicted in Figure 2. One of the key characteristics of the modelling process described by Refsgaard et al. (2005a) and Scholten et al. (2007) is in fact the many feedback processes and re-do loops, caused both by model-technical reasons and by decisions made by the water manager after dialogue with stakeholders. Furthermore, the modelling process interacts with the water management process. This is outlined in Refsgaard et al. (2007) and discussed in detail in Bots et al. (2007).

In a model study, data are required to conceptualise the system (STEP2). Here the data requirements depend mostly on the complexity or heterogeneity of the physical system, but also on the detail and precision with which we need to know the physical system in order to answer the questions raised in the Model Study Plan within the specified precision. For simple and homogeneous systems, or where the precision requirement is low, few data may be sufficient for an adequate understanding of the system. In highly complex and heterogeneous systems, or where the consequences of being wrong are fatal and the required precision is thus high, a higher spatial and temporal resolution of the data is generally required. In STEP4 data are needed for the calibration and validation of the model, where the requirements depend mostly on the performance criteria that have been defined for the model. If the criteria are low, it may be accepted that the model is only calibrated, without subsequent validation. This lowers the requirements for observations of the system response, but also lowers the confidence in the model results. A first assessment of the data required to achieve a desired level of accuracy in model predictions has to be made already in STEP1 and must be linked with the design of the WFD monitoring programme.

The amount of data required in a model study is thus controlled by the complexity of the physical system and the precision with which the model results are needed (model performance). The requirements can therefore not be expressed exactly, e.g. as an absolute number of measurements per unit area that is appropriate for all model studies. On the contrary, the amount of data required varies from study to study depending on the objectives and requirements of the study. The question of how much data is required for modelling is therefore not well posed. Modelling is in principle possible with all levels of data quantity and quality. Instead it should be recognised that the quality and quantity of data will affect the uncertainty of the model outputs and hence the usefulness of the model simulations (Refsgaard et al., 2005b). The relevant question is therefore what accuracy is required of the modelling. Only when this question has been answered can the data requirements be assessed in a meaningful manner.

2.5 The relation between data availability and model performance

In the scientific community there is no complete agreement on the relationship between model complexity, data availability and predictive performance. One common illustration of the conceptual relationship is shown in Figure 3. It illustrates that model performance in general increases with data availability, and that complex models are required to make full use of large data availability. It also illustrates, however, that for limited data availability moderately complex models may often perform better than very complex models. The reason is that the uncertainty of model predictions (the inverse of the predictive performance axis in Figure 3) consists of two terms: (a) the parameter uncertainty, which increases with model complexity because of an increasing number of parameters; and (b) the model structure uncertainty, which decreases with model complexity. This implies that for complex models with many ‘free’ parameters that require data for calibration, the increase in parameter uncertainty may exceed the decrease in model structure uncertainty once a certain level of complexity has been exceeded.

Figure 3 Schematic diagram of the relationship between model complexity, data availability and predictive performance (Modified from Grayson and Blöschl, 2000)
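The trade-off between the two uncertainty terms can be made concrete with a toy calculation. The functional forms below are assumptions chosen only to reproduce the qualitative behaviour described above; they are not taken from Grayson and Blöschl (2000), and the constants and names are hypothetical.

```python
def total_uncertainty(complexity, n_data, a=1.0, b=1.0):
    """Toy model of prediction uncertainty (assumed forms, for illustration only)."""
    structural = a / complexity           # decreases with model complexity
    parametric = b * complexity / n_data  # grows with complexity, shrinks with data
    return structural + parametric

def best_complexity(n_data, levels=range(1, 51)):
    """Complexity level with the lowest total uncertainty for a given amount of data."""
    return min(levels, key=lambda c: total_uncertainty(c, n_data))

for n_data in (25, 400):
    c_opt = best_complexity(n_data)
    print(f"n_data = {n_data}: best complexity = {c_opt}, "
          f"uncertainty = {total_uncertainty(c_opt, n_data):.3f}")
```

With these assumed forms the optimum sits at a complexity of 5 for 25 data points and 20 for 400 data points: more data justify a more complex model, and with few data a very complex model performs worse than a moderately complex one.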

The view illustrated in Figure 3 originates from statistical theoretical considerations, assuming that the number of free parameters requiring calibration increases with model complexity. This is, however, not always the case. For model codes with comprehensive application records there are often experience-based recommendations for setting default parameter values, so that typically only a handful of parameters are adjusted by calibration in the particular catchment. In this way secondary data sources (all previous model applications in other locations) are used. This would correspond to moving left on the data availability axis towards larger data availability.

An often posed question is whether the data collected for monitoring purposes under the WFD are also sufficient for modelling. This issue is analysed by Højberg et al. (2007). They argue that because (a) the monitoring programme should be designed to provide the data necessary for sufficiently reliable assessments; and (b) the water manager decides on the desired accuracy level of the analysis, including the use of models as required, the monitoring should in principle also be sufficient to support the necessary modelling studies. Otherwise the monitoring is not good enough. In practice this may, however, not always be the case, among other reasons because the water manager in the first WFD cycle up to 2015 may not have been able to analyse thoroughly, at the time of designing the monitoring system, what is required to achieve a certain level of accuracy in model predictions. Furthermore, WFD monitoring often does not fulfil the data needs of researchers for modelling analysis in relation to process studies, which are beyond the scope of the WFD.

2.6 Conclusions on methodological considerations regarding data requirements

Data requirements depend on the system and model complexity and on the purpose of the modelling study, which in turn depend on the socio-economic context, i.e. what the problem is, what the conflicting interests are and how much is at stake economically, socially and politically.

Therefore it is not possible to prescribe general data requirements for water management purposes, including the WFD. We will instead supplement the above general considerations by approaching the issue from the opposite side: reviewing the data availability in some European river basins and trying to relate this to the types and character of the various water management problems and the types of modelling carried out in the respective basins.

