Q01. How do you know whether homogenization was successful? Are there any conditions to be checked?
I guess we have to distinguish between homogenizing too much and homogenizing too little.
Homogenizing too little can be checked by running a homogenization method on the homogenized dataset. If you still find clear breaks, the first homogenization apparently did not work well. MASH, Climatol and possibly other software packages report the test statistic of the final data.
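As a rough illustration of such a re-check, the sketch below computes an SNHT-style statistic (standard normal homogeneity test) on a difference series between a homogenized candidate and a reference. SNHT is just one common choice and is an assumption here, not necessarily what MASH or Climatol compute; the function name and the toy data are made up for the example.

```python
from statistics import mean, stdev

def snht_statistic(diff):
    """Return (T_max, k) for an SNHT-style test on a difference series.

    k is the number of values before the most likely break position.
    """
    n = len(diff)
    if n < 3:
        raise ValueError("need at least 3 values")
    m, s = mean(diff), stdev(diff)  # sample standard deviation
    if s == 0:
        return 0.0, None  # perfectly flat series: no break signal
    z = [(x - m) / s for x in diff]  # standardized series
    total = sum(z)
    running = 0.0
    best_t, best_k = 0.0, None
    for k in range(1, n):
        running += z[k - 1]
        z1 = running / k                 # mean of the first k values
        z2 = (total - running) / (n - k)  # mean of the remaining values
        t = k * z1 ** 2 + (n - k) * z2 ** 2
        if t > best_t:
            best_t, best_k = t, k
    return best_t, best_k

# Toy difference series (candidate minus reference), invented for illustration.
flat_diff = [0.1, -0.1, 0.0, 0.1, -0.1, 0.0] * 2
step_diff = flat_diff[:6] + [x + 1.0 for x in flat_diff[6:]]  # jump after value 6

t_flat, k_flat = snht_statistic(flat_diff)
t_step, k_step = snht_statistic(step_diff)
print(t_flat, t_step, k_step)
```

A statistic that remains large on the homogenized data, compared with the critical value for the series length, would suggest that a break is still present.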
Homogenizing too much is hard to detect. If you have an automatic method, you can see this in a validation exercise using a dataset with known breaks.
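One possible way to score such a validation exercise is to match the detected breaks against the known ones and count hits, misses and false alarms. This is only a sketch: the function name, the tolerance of two time steps and the break positions are all assumptions for the example, and a full validation would also compare the corrected values with the true ones.

```python
def score_detections(true_breaks, detected_breaks, tolerance=2):
    """Match detected breaks to known breaks within +/- tolerance time steps."""
    unmatched = sorted(true_breaks)
    hits = 0
    false_alarms = 0
    for d in sorted(detected_breaks):
        # First still-unmatched known break that is close enough, if any.
        match = next((t for t in unmatched if abs(t - d) <= tolerance), None)
        if match is None:
            false_alarms += 1
        else:
            hits += 1
            unmatched.remove(match)  # each known break can be matched once
    return {"hits": hits, "misses": len(unmatched), "false_alarms": false_alarms}

# Invented positions (time-step indices) in a benchmark series with known breaks.
true_breaks = [120, 300]
detected_breaks = [119, 200, 301, 410]
scores = score_detections(true_breaks, detected_breaks)
print(scores)
```

Many false alarms relative to hits would be one indication that the method is adjusting more than the data justify.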
In the literature, high correlations between stations are sometimes used as a sign of success, and sometimes also to judge which method worked best. However, the correlations also go up when you homogenize too much, so a high correlation on its own is not a reliable indicator.
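A toy example may make this concrete. Here over-homogenization is modeled, purely as an assumption, as pulling every station towards the network mean. The two series are invented and the helper names are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def pull_towards_mean(series_list, weight):
    """Blend each series with the network mean; weight=0 leaves the data unchanged."""
    n = len(series_list)
    net_mean = [sum(vals) / n for vals in zip(*series_list)]
    return [
        [(1 - weight) * v + weight * m for v, m in zip(s, net_mean)]
        for s in series_list
    ]

# Two invented station series with genuinely different local variability.
station_a = [1.0, 2.0, 3.0, 4.0, 5.0]
station_b = [2.0, 1.0, 4.0, 3.0, 6.0]

r_before = pearson(station_a, station_b)
adj_a, adj_b = pull_towards_mean([station_a, station_b], weight=0.5)
r_after = pearson(adj_a, adj_b)
print(r_before, r_after)
```

The correlation rises after the blending even though real local differences were removed, which is why a higher correlation cannot distinguish a good homogenization from an excessive one.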
Q02. Should I only correct inhomogeneities for which there is documentary evidence (in the metadata)?
No, all breaks found by relative statistical homogenization should be corrected. It is clear from theory and from numerical tests that statistical homogenization improves the quality of the data. On the other hand, there are good reasons to expect that correcting only the inhomogeneities that are known can lead to biases. For example, the gradual deterioration of the quality of the siting due to urbanization or nearby changes is often not documented, while the resulting relocation is. The installation of new instruments, or other things that cost money, is typically well documented, while maintenance changes may be documented less well.
Q03. How do you select good reference stations for homogenization?
There are three criteria:
1. The reference should have a climate similar to that of the candidate station, so that the expected climate change and variability are similar and the difference time series with a homogeneous series would be flat. This requires knowledge of the local climate system, but a good proxy may be a similar daily and seasonal cycle and similar responses to important climatic modes.
2. The reference should correlate well with the candidate to remove as much noise as possible. Note that the inhomogeneities can influence the correlations, so it is probably best not to estimate the correlations on single pairs of stations, but to use distances and other predictors of the correlation instead.
3. The reference should be relatively homogeneous and not have too many periods with missing data.
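Parts of criteria 2 and 3 can be sketched in code: use distance as a stand-in for correlation and drop stations with too much missing data. The station records, the 300 km and 80% thresholds and the function names below are all assumptions for illustration. Climatic similarity (criterion 1) and the homogeneity of the reference still need separate judgment.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere with radius 6371 km."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = p2 - p1
    dlmb = radians(lon2 - lon1)
    h = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))

def completeness(values):
    """Fraction of non-missing values; missing values are stored as None."""
    return sum(v is not None for v in values) / len(values)

def select_references(candidate, stations, max_km=300.0, min_complete=0.8, n_max=5):
    """Return names of the nearest sufficiently complete stations."""
    scored = []
    for st in stations:
        dist = haversine_km(candidate["lat"], candidate["lon"], st["lat"], st["lon"])
        if dist <= max_km and completeness(st["values"]) >= min_complete:
            scored.append((dist, st["name"]))
    return [name for _, name in sorted(scored)[:n_max]]

# Invented network; coordinates and data are placeholders.
candidate = {"name": "X", "lat": 50.0, "lon": 7.0}
stations = [
    {"name": "A", "lat": 50.1, "lon": 7.1, "values": [1.0] * 10},
    {"name": "B", "lat": 50.2, "lon": 7.3, "values": [1.0] * 5 + [None] * 5},
    {"name": "C", "lat": 60.0, "lon": 20.0, "values": [1.0] * 10},
    {"name": "D", "lat": 50.5, "lon": 8.0, "values": [1.0] * 10},
]
references = select_references(candidate, stations)
print(references)
```

Station B is dropped for missing data and C for distance, leaving A and D ordered by distance.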