
5. Data quality – uncertainty

In addition to the availability and accessibility of the data, the quality of the data is crucial: data that are not trustworthy are not useful. Some data are known to be of good quality, with well-documented measurement, analysis and quality control procedures. Other data are perceived to be of questionable quality, and some are even known to be of poor quality. There are many possible reasons for poor or questionable data quality.

Data quality should not be characterised categorically as simply good or poor, with the implication that the data are useful or useless, respectively. One reason is that data quality is better described on a more differentiated scale, for instance as data uncertainty expressed in terms of probability distributions. Furthermore, the usefulness of uncertain data depends on the context in which they are used. Data with some level of uncertainty may be useful for a first screening of a problem, but inadequate for a comprehensive analysis with high demands on predictive accuracy.
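The idea that the same uncertain value can be adequate for one purpose and inadequate for another can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the uncertainty is taken to be a normal distribution, the function names `interval_95` and `fit_for_purpose` are hypothetical, and the tolerance values for screening and comprehensive analysis are invented for the example.

```python
from statistics import NormalDist


def interval_95(mean, std):
    """Return the central 95% interval of a normally distributed value."""
    dist = NormalDist(mu=mean, sigma=std)
    return dist.inv_cdf(0.025), dist.inv_cdf(0.975)


def fit_for_purpose(mean, std, max_relative_half_width):
    """Check whether the 95% half-width, relative to the mean, is within a tolerance.

    The tolerance is an assumption set by the user for a given context;
    it is not prescribed by the text.
    """
    if mean == 0:
        raise ValueError("relative uncertainty is undefined for a mean of zero")
    low, high = interval_95(mean, std)
    relative_half_width = (high - low) / 2 / abs(mean)
    return relative_half_width <= max_relative_half_width


# Hypothetical measurement: mean 10.0, standard deviation 1.0 (arbitrary units).
print(fit_for_purpose(10.0, 1.0, 0.5))   # loose tolerance, e.g. a first screening
print(fit_for_purpose(10.0, 1.0, 0.05))  # strict tolerance, e.g. a comprehensive analysis
```

With these illustrative numbers the value passes the loose tolerance and fails the strict one, which is the point made above: the data are neither simply good nor simply poor.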

Data quality can be characterised in terms of data uncertainty. Uncertainty is here defined as the degree of confidence that a person has in the data and the probabilities of the data (Klauer and Brown, 2003). By adopting this definition, uncertainty is considered subjective: we cannot know what the true uncertainty is, but different persons can reach consensus about their respective uncertainty assessments. Understanding the uncertainty in environmental data and systems is essential for making robust and wise water management decisions (Funtowicz and Ravetz, 1990; Pahl-Wostl, 2002). Characterisation of data uncertainty is therefore becoming an important basis for modern water resources management. Concepts and tools for characterising and handling data uncertainty are described in Refsgaard et al. (2005b, 2006) and Brown et al. (2006). Literature surveys were made to support the assessment of uncertainty in different types of data (Van Loon and Refsgaard, 2005).

In environmental modelling studies, field data usually have a spatial and temporal scale of support that differs from the scale at which models operate. This calls for a methodology for rescaling data uncertainty from one support scale to another (Heuvelink and Pebesma, 1999; Brown et al., 2006).
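To give a feel for why uncertainty changes with the support scale, the following Python sketch averages several point measurements into one block value. It is a deliberately simple illustration and not the methodology of the cited references: it assumes that all point errors have the same standard deviation and the same pairwise correlation, and the function name `upscaled_std` and its parameters are hypothetical.

```python
import math


def upscaled_std(point_std, n_points, error_correlation=0.0):
    """Standard deviation of the mean of n equally uncertain point values.

    Assumes every pair of point errors has the same correlation, so that
    var(mean) = point_std**2 * (1 + (n - 1) * rho) / n.
    """
    if n_points < 1:
        raise ValueError("n_points must be at least 1")
    if not 0.0 <= error_correlation <= 1.0:
        raise ValueError("this sketch only handles correlations between 0 and 1")
    variance_of_mean = (
        point_std ** 2 * (1.0 + (n_points - 1) * error_correlation) / n_points
    )
    return math.sqrt(variance_of_mean)


# Hypothetical case: four point measurements, each with standard deviation 2.0.
print(upscaled_std(2.0, 4))                         # independent errors
print(upscaled_std(2.0, 4, error_correlation=1.0))  # fully correlated errors
```

Under these assumptions, independent errors partly cancel when averaged, whereas fully correlated errors do not cancel at all. The uncertainty at the model scale therefore depends on the correlation structure of the errors as well as on the point-scale uncertainty.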

The practical implications are that data uncertainty should be assessed and that information on data uncertainty should be stored as an important element of the data documentation. A problem in this respect is that standard databases today are not designed to enable storage of data uncertainty (Refsgaard et al., 2006).
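As a rough illustration of what storing uncertainty alongside the data could look like, the sketch below uses Python's built-in `sqlite3` module. The table layout, the column names and the example record are all hypothetical assumptions made for this example. They do not represent an established standard or the approach of the cited references.

```python
import sqlite3


def create_store(conn):
    """Create a table that keeps each value together with its uncertainty description."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS observation (
            id INTEGER PRIMARY KEY,
            variable TEXT NOT NULL,
            value REAL NOT NULL,
            unit TEXT NOT NULL,
            dist_type TEXT NOT NULL,   -- assumed distribution family, e.g. 'normal'
            dist_std REAL,             -- standard deviation, in the same unit as value
            assessment_note TEXT       -- how the uncertainty was assessed
        )
        """
    )


def add_observation(conn, variable, value, unit, dist_type, dist_std, note):
    """Insert one observation with its uncertainty metadata and return its id."""
    cursor = conn.execute(
        "INSERT INTO observation "
        "(variable, value, unit, dist_type, dist_std, assessment_note) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (variable, value, unit, dist_type, dist_std, note),
    )
    conn.commit()
    return cursor.lastrowid


# Hypothetical record, for illustration only.
conn = sqlite3.connect(":memory:")
create_store(conn)
row_id = add_observation(
    conn, "river_discharge", 12.4, "m3/s", "normal", 0.6, "expert judgement"
)
```

Keeping the distribution type and the assessment note next to the value means that a later user can judge whether the data are adequate for their own context, rather than inheriting an unexplained number.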

