The tables show the area-averaged root-mean-square error for 2-m temperature (tas), total large-scale and convective precipitation (pr), 850-hPa zonal wind (ua850), and 850-hPa temperature (ta850) for each of the CMIP6 models with respect to relevant reference dataset (ERA5 or GPCP). The scores are computed using time-averaged fields (not trends).
For ta850 and ta850, values below topography are filtered out using the surface pressure field to make the results as consistent as possible between the models. The reason for this is that not all models provide extrapolated below the surface for the air temperature (ta) field.
For tas, ta850, and pr, values are area-averaged for the Arctic (latitudes north of 55N). For ua850, values are area-averaged for latitudes north of 40N.
One ensemble member is used for each model.
Numbers with light yellow background are multi-model values: the multi-model mean and the multi-model spread (in terms of the standard deviation).
Red/blue background indicate that the RMSE/bias/correlation values differ from the multi-model values by more than one one standard deviation, with red indicating that the values exceed the multi-model mean by one standard deviation more more and blue indicates that the values are too small by one standard deviation or more. The larger the discrepancy with respect to the multi-model values, the darker the colors.
The rightmost column shows a measure of the overall performance for each model, here defined as the average rank for the four fields investigated, the model with the lowest ranking number is considered the best.
To compute the climatology score, the models are ranked for each variable. The model with the lowest RMSE value is considered the best/highest-ranking model and recives the ranking number 1, the second best model reciving the ranking number 2, etc. Then for each model, the average rank is computed across the variables and then the models are sorted with the allover highest ranking model (with the lowest ranking number) being placed on top.