The influence of station density on climate data homogenization

Gubler, S., Hunziker, S., Begert, M., Croci-Maspoli, M., Konzelmann, T., Brönnimann, S., Schwierz, C., Oria, C. and Rosas, G., 2017: The influence of station density on climate data homogenization. Int. J. Climatol., 37: 4670–4683. doi: 10.1002/joc.5114.

Abstract. Relative homogenization methods assume that measurements at nearby stations experience similar climate signals, and therefore rely on dense station networks with high temporal correlations. In developing countries such as Peru, however, networks often suffer from low station density. The aim of this study is to quantify the influence of network density on homogenization. To this end, the homogenization method HOMER was applied to an artificially thinned Swiss network.

Four homogenization experiments, reflecting different homogenization approaches, were examined. These approaches differ in how closely the homogenization operators interact with HOMER and in how metadata are applied. To evaluate the performance of HOMER in the sparse networks, a reference series was built by applying HOMER under the best possible conditions.

Applied in completely automatic mode, HOMER decreases the reliability of temperature records. Therefore, automatic use of HOMER is not recommended. If HOMER is applied in interactive mode, the reliability of temperature and precipitation data may be increased in sparse networks. However, breakpoints must be inserted conservatively. Information from metadata should be used only to determine the exact timing of statistically detected breaks. Insertion of additional breakpoints based solely on metadata may lead to harmful corrections due to the high noise in sparse networks.
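
Relative homogenization, as described above, compares a candidate station with nearby stations so that the shared climate signal cancels and a station-specific break stands out. The sketch below illustrates only that idea; it is not HOMER. It assumes one synthetic candidate and one synthetic reference and applies a single-break scan with the SNHT statistic (Alexandersson's standard normal homogeneity test) to their difference series. All function names and numbers are hypothetical.

```python
import random
import statistics


def difference_series(candidate, reference):
    """Candidate minus reference: the climate signal both stations share cancels."""
    return [c - r for c, r in zip(candidate, reference)]


def best_single_break(series, min_seg=5):
    """Return (k, T) for the most likely single mean shift, starting at index k.

    T(k) = k * m1**2 + (n - k) * m2**2 is the SNHT statistic, where m1 and m2
    are the means of the standardised series before and after index k.
    """
    mu = statistics.fmean(series)
    sd = statistics.pstdev(series)
    z = [(x - mu) / sd for x in series]
    n = len(z)
    best_k, best_t = None, float("-inf")
    for k in range(min_seg, n - min_seg + 1):
        m1 = statistics.fmean(z[:k])
        m2 = statistics.fmean(z[k:])
        t = k * m1 ** 2 + (n - k) * m2 ** 2
        if t > best_t:
            best_k, best_t = k, t
    return best_k, best_t


# Hypothetical data: 60 annual means sharing a noisy regional climate signal.
rng = random.Random(42)
climate = [0.02 * year + rng.gauss(0, 0.5) for year in range(60)]
reference = [c + rng.gauss(0, 0.1) for c in climate]
# The candidate gets a hypothetical +0.8 K shift from year 35 onward (e.g. a relocation).
candidate = [c + rng.gauss(0, 0.1) + (0.8 if year >= 35 else 0.0)
             for year, c in enumerate(climate)]

k, score = best_single_break(difference_series(candidate, reference))
print(f"most likely break starts at year {k} (SNHT statistic {score:.1f})")
```

In the candidate series alone, the regional variability (standard deviation 0.5) is comparable to the shift and could mask it; in the difference series only the small station noise remains, which is why relative methods depend on well-correlated neighbours.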


  1. It would be better to have two assessments, not counting those of the editors, and it would be better to have two editors responsible for a synthesis, but let’s not be too strict at this early stage.

    Both assessments seem sound and the grades well motivated. Thus I will simply average the grades.

    Impact on the larger scientific community. [75]
    Contribution to the scientific field of the journal. [85]
    The technical quality of the paper. [75]
    Importance at the time of publishing. [-]
    Importance of the research program. [-]
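
As a check, the averaged grades above can be reproduced from the two assessments below (reviewer 1 and reviewer 2); the dictionary keys are only illustrative labels:

```python
# Grades copied from the two assessments on this page.
reviewer_1 = {"impact": 80, "contribution": 90, "technical_quality": 90}
reviewer_2 = {"impact": 70, "contribution": 80, "technical_quality": 60}

# The synthesis is the plain mean of the two grades per criterion.
synthesis = {key: (reviewer_1[key] + reviewer_2[key]) / 2 for key in reviewer_1}
print(synthesis)  # {'impact': 75.0, 'contribution': 85.0, 'technical_quality': 75.0}
```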


  1. This is a very worthwhile paper and the first of its type that I am aware of which has considered the issue of station density in such a systematic way. The basic concept of using the full homogenised data set as a reference is sound, at least as a benchmark.

    My only caution is that the specific results presented in the paper are likely to depend on the regions concerned. Peru and Switzerland both have complex topography, so correlation length scales would be expected to be relatively short by global standards. (By way of comparison, the density of temperature stations in Australia is about one per 10,000 square kilometres, even sparser than Peru, but the generally flat terrain means that correlation length scales are longer.) Peru has the added complication of a strong ENSO signal in temperatures, including in coastal temperature gradients (although it is unclear from the paper how close the study area is to the coast); in fact, it is hard to imagine a more geographically challenging region for data homogenisation in general.

    The authors also observe that correlation length scales are generally shorter in Peru than in Switzerland, which they note is typical of the tropics. Note, however, that the correlations are aggregated across the entire year. My experience with tropical Australian data is that (a) there is very strong seasonality in correlation length scales for minimum temperature; in some cases, typical distances for decay to a correlation of 0.6 range from 200-300 km in the wet season to 1000-1500 km in the dry season, and (b) low daily minimum temperatures in the tropical wet season generally occur during precipitation events (especially significant thunderstorms), not as a result of radiational cooling, so it is not surprising that minimum temperature correlations become comparable to those for precipitation during the wet season.
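
The decay distances quoted above can be related to a correlation length scale if one assumes, purely for illustration, that inter-station correlation decays exponentially with distance, r(d) = exp(-d/L). The sketch fits L by least squares in log space and converts it to the distance at which r falls to 0.6. The seasonal correlations are synthetic, not Australian or Peruvian data, and the function names are hypothetical.

```python
import math


def fit_length_scale(distances_km, correlations):
    """Fit r(d) = exp(-d / L) by least squares on ln r, through the origin.

    Minimising sum((ln r + d / L)**2) over 1/L gives
    1/L = -sum(d * ln r) / sum(d**2). Only positive correlations are used.
    """
    pairs = [(d, r) for d, r in zip(distances_km, correlations) if r > 0]
    inv_length = -sum(d * math.log(r) for d, r in pairs) / sum(d * d for d, _ in pairs)
    return 1.0 / inv_length


def distance_to_correlation(length_scale_km, target=0.6):
    """Distance at which the fitted exponential model drops to the target correlation."""
    return -length_scale_km * math.log(target)


# Synthetic, noise-free correlations for five station-pair distances (hypothetical).
distances = [50, 100, 200, 400, 800]
wet_season = [math.exp(-d / 400) for d in distances]
dry_season = [math.exp(-d / 2000) for d in distances]

for label, corr in [("wet", wet_season), ("dry", dry_season)]:
    length = fit_length_scale(distances, corr)
    print(f"{label}: L = {length:.0f} km, r = 0.6 at {distance_to_correlation(length):.0f} km")
```

Fitting each season separately, rather than pooling the whole year, is what would expose the kind of seasonality described above; a single annual fit would blend the two regimes.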

    (It is interesting that most of the lowest correlations in the Swiss data set, especially for maximum temperature, are in the 50-150 km range; presumably this reflects the geographic distribution of mountain/valley station pairs, which I imagine would be the ones in Switzerland with the weakest correlations, all other things being equal?)

    Something which is mentioned only in passing is that one of the largest benefits of a homogenised data set is that it greatly reduces the spread of station trends, producing a more spatially coherent set of results, as shown in Figure 4.
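
The reduction in the spread of station trends can be quantified as the standard deviation of the per-station least-squares trends. The sketch below assumes a hypothetical network of 20 stations sharing one regional signal, where each raw series contains one random step change and the homogenised series contain none; real homogenised data would retain some residual spread.

```python
import random
import statistics


def ols_slope(y):
    """Ordinary least-squares slope of y against its time index (units per step)."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = statistics.fmean(y)
    num = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def trend_spread(stations):
    """Standard deviation of station trends, per decade for annual series."""
    return statistics.stdev([10 * ols_slope(s) for s in stations])


# Hypothetical network: 20 stations, 50 years, one shared regional signal.
rng = random.Random(1)
regional = [0.03 * t + rng.gauss(0, 0.3) for t in range(50)]
homogenised, raw = [], []
for _ in range(20):
    station = [r + rng.gauss(0, 0.1) for r in regional]
    homogenised.append(station)
    # Each raw series gets one hypothetical step change of random size and timing.
    k, shift = rng.randrange(10, 40), rng.gauss(0, 0.5)
    raw.append([v + (shift if t >= k else 0.0) for t, v in enumerate(station)])

print(f"trend spread, raw:         {trend_spread(raw):.3f} per decade")
print(f"trend spread, homogenised: {trend_spread(homogenised):.3f} per decade")
```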

    The result that the automatic method performed better (or, perhaps more accurately, less badly) on the sparse data set than the dense data set is interesting, and could perhaps use some more exploration.

    Impact on the larger scientific community. [80]
    As the first major paper of its type, this will be an important contribution to what can and cannot realistically be done in the homogenisation of sparse networks, which are particularly common in developing countries.

    Contribution to the scientific field of the journal. [90]
    This paper is definitely relevant to the journal.

    The technical quality of the paper. [90]
    The paper appears to be technically sound.

    Importance at the time of publishing. [-]
    Not relevant. New paper.

    Importance of the research program. [-]
    Not relevant. Single paper.

    1. Thank you for your assessment. I agree that, when transferring these results to other regions, it would be better to look at typical correlations between stations rather than at station density alone.

      In an upcoming paper (the manuscript is available on EarthArXiv), Ralf Lindau and I propose a method to estimate the signal-to-noise ratio (SNR) and the number of breaks of a difference time series. The SNR is the standard deviation of the break signal divided by the standard deviation of the noise. These numbers will hopefully provide an accurate way to estimate how difficult it is to homogenise a network.

      These estimates apply to a single difference time series; one would still need to choose which reference series to include when computing a network-wide average. Moreover, the SNR and the number of breaks remain apples and oranges: they cannot be compared directly. It would be valuable to have a measure that combines both, but such a measure would likely be more specific to the homogenisation method.
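
Following the definition in the reply above, the SNR of a difference series can be computed directly whenever its break signal and noise are known separately, as in simulated data. The helper names, break positions, break sizes and noise level below are hypothetical; the sketch does not estimate the SNR from an observed series, which is the harder problem the upcoming paper addresses.

```python
import random
import statistics


def step_signal(n, breaks, sizes):
    """Break signal of a difference series: each break adds its size from its index on."""
    signal = [0.0] * n
    for k, size in zip(breaks, sizes):
        for t in range(k, n):
            signal[t] += size
    return signal


def snr(break_signal, noise):
    """Standard deviation of the break signal divided by that of the noise."""
    return statistics.pstdev(break_signal) / statistics.pstdev(noise)


# Hypothetical difference series: 100 years, two breaks, white noise (sd 0.4).
rng = random.Random(0)
signal = step_signal(100, breaks=[30, 70], sizes=[0.5, -0.3])
noise = [rng.gauss(0, 0.4) for _ in range(100)]
difference = [s + e for s, e in zip(signal, noise)]  # what would actually be observed
print(f"number of breaks: 2, SNR: {snr(signal, noise):.2f}")
```

An SNR well below 1, as here, means the breaks are small relative to the noise and hard to locate, which is the situation expected when sparse networks force the use of poorly correlated reference stations.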

  2. The article studies how well homogenisation works for a low-density network, based on a Swiss dataset that could be homogenised very well because it has few inhomogeneities, a high station density and good metadata. The homogenisation method is HOMER, used in four configurations: 1) with metadata for confirmed breaks, 2) without metadata, 3) with all metadata breaks set as breaks in advance, and 4) with automatic (joint) detection.

    The high-density network homogenised with configuration 1 is used as the reference (truth). The article also compares all four homogenisation configurations for the dense network. These results are less convincing because it is less clear which configuration is best, and even similarly good configurations would show differences. However, the homogenised high-density network is likely quite close to the truth for the sparse networks and can be used as a validation dataset.

    Working with real data has its disadvantages, but the main advantage is that we can be sure the inhomogeneities are realistic. As such, real-data studies are an important independent line of evidence that complements validation studies using simulated data. One disadvantage is that homogenised data are not homogeneous data, no matter how well the homogenisation is done. Another is that there is less control over the properties of the inhomogeneities. In this study the inhomogeneities did not produce a trend error in the precipitation data, so it was not possible to study how well homogenisation can remove network-wide precipitation trend errors. Furthermore, the number of breaks was lower than typical: a conservative homogenisation approach by Kuglitsch et al. (2012) found one break per 48 years in the Swiss network, while in this study the more liberal HOMER found one break per 25 years. Both estimates are low compared to other networks.
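
To put the two break frequencies in perspective, consider a hypothetical 100-year station record (the record length is an assumption for illustration, not a figure from the paper):

```latex
\frac{100\ \text{yr}}{48\ \text{yr per break}} \approx 2.1\ \text{breaks},
\qquad
\frac{100\ \text{yr}}{25\ \text{yr per break}} = 4\ \text{breaks}
```

HOMER thus flagged almost twice as many breaks (48/25 ≈ 1.9) as the conservative approach, yet both amount to only a few breaks per century.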

    The study has some important findings that are only hinted at in the abstract.

    1) It finds that automatic homogenisation with HOMER using joint detection is not a good idea. (HOMER with pairwise detection is basically PRODIGE and has been well tested, e.g., in the HOME benchmark.) Other studies and conference contributions have also noted problems with joint detection. Joint detection can be used manually, together with pairwise detection, but it is best not used automatically until the problem is understood.

    2) HOMER could reduce the errors in the station data and the network-wide RMSE, but working only with the sparse network it could not reduce the error in the network-average trend, which is an important task of homogenisation.

    3) The breakpoints due to automation of the network around 1980 were often not detected and thus not corrected in the sparse network. These are the most important kind of breakpoints as they are likely to lead to network-wide trend errors. For such transitions the use of parallel measurements is thus paramount, especially when the network is sparse.

    4) In this study the use of metadata in the sparse networks did not improve the results.
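
Findings 2) and 3) can be illustrated with an idealised case: if every station shifts by the same amount in the same year, as might happen during a network-wide transition such as automation, the break cancels in every difference series that relative methods rely on, while the network-average trend remains biased. The sketch below assumes 10 synthetic stations for 1951-2010 and a hypothetical shift of -0.3 K in 1980; real transitions are rarely perfectly simultaneous or equal in size, so this is an extreme case.

```python
import random
import statistics


def ols_slope(y):
    """Ordinary least-squares slope of y against its time index (units per step)."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = statistics.fmean(y)
    num = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def network_mean(stations):
    """Year-by-year average over all stations."""
    return [statistics.fmean(values) for values in zip(*stations)]


# Hypothetical network: 10 stations, 1951-2010, sharing one regional signal.
rng = random.Random(7)
years = list(range(1951, 2011))
regional = [0.02 * (y - 1951) + rng.gauss(0, 0.3) for y in years]
truth = [[r + rng.gauss(0, 0.1) for r in regional] for _ in range(10)]

# Hypothetical simultaneous transition: every station shifts by -0.3 K from 1980 on.
shift = -0.3
observed = [[v + (shift if y >= 1980 else 0.0) for y, v in zip(years, s)]
            for s in truth]

# In any difference series the common shift cancels (up to rounding error).
diff_true = [a - b for a, b in zip(truth[0], truth[1])]
diff_obs = [a - b for a, b in zip(observed[0], observed[1])]
max_change = max(abs(o - t) for o, t in zip(diff_obs, diff_true))

# The network-average trend, however, inherits the shift (K per decade).
trend_error = 10 * (ols_slope(network_mean(observed)) - ols_slope(network_mean(truth)))
print(f"largest change in the difference series: {max_change:.1e}")
print(f"network-average trend error: {trend_error:+.3f} K per decade")
```

Because the least-squares slope is linear in the data, the bias equals the trend of the step function itself (about -0.075 K per decade here), independent of the random draws; no amount of pairwise comparison within the network can reveal it, which is why parallel measurements matter for such transitions.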

    Detailed comments can be found in the web annotations.

    Impact on the larger scientific community. [70]
    The findings that homogenisation in sparse networks may not be able to improve trend estimates, and that break detection in sparse networks is hard for network-wide inhomogeneities (such as automation), are potentially highly important. They need further study and confirmation. If this turns out to be a general problem, it is highly important and would be a reason to revisit this assessment and raise this grade.

    Contribution to the scientific field of the journal. [80]
    Validation studies on real data are important as an independent line of evidence complementing validation studies using stochastic data, even if the latter likely produce more accurate results. The four findings numbered above are very important.

    The technical quality of the paper. [60]
    Looks like a reliable, solid paper. Not all details of the computation and homogenisation are clear. The data and software were not published with the paper.

    Importance at the time of publishing. [-]
    Not relevant. New paper.

    Importance of the research program. [-]
    Not relevant. Single paper.
