Skip to content

Estimating observation and model errors in ocean data assimilation

License

Notifications You must be signed in to change notification settings

MetOffice/ocean_error_covs

Repository files navigation

DOI

Estimating observation and model errors in ocean data assimilation

This is a quick User Guide for a friendly, python3-based code to estimate observation and model errors required in ocean data assimilation. If you're also a developer and have new ideas on how to improve the estimations of the error components in ocean data assimilation, you're welcome to contribute to this project! We're better together 🚀!

There is a protocol in place of how to create your own development branch and merge pull requests. Please, ensure you read this document carefully if you're contributing to this project for the first time.

This code is also emoji-friendly, which means that you can add emojis before your commit messages following these guidelines.

The code consists of the following features:

  • HL_error_covs - Error covariances based on observation-minus-forecast data, using Hollingsworth and Lonnberg's methodology.
  • forecast_diff_error_covs - Error covariances based on forecast differences:
  • representation_errors - Errors due to differences between the modelled representation of an observation and what is actually observed:
    • Representation errors due to unresolved, sub-grid scales in the model (see Oke & Sakov, 2007)

The structure of the code also contains:

  • modules - Various classes of methods which are used in the features above.
  • KGO - Known Good Outputs of the features above, so we can ensure that code changes don't erroneously affect the outputs of the code.
  • run_tests - How to run the code tests with different configurations. This is recommended to get familiar with how to run the code and to make sure that your code results match the KGOs above.

All major parts of the code have detailed documentation strings that give details of their input variables and behavior. To see the documentation strings either look at the code or use the python help command. For any questions contact Davi Mignac Carneiro ([email protected]) or Matthew Martin ([email protected])

For speed, several parts of the code can be run on multiple processors using Python's multiprocessing package.

Error Covariance Calculation

Observation-minus-forecast method (Hollingsworth and Lonnberg)

Two top-level subroutines are required to calculate error covariances in HL.py. The subroutine HL_cov_accum_stats calculates the accumulated error covariance statistics over a specific time period, whereas the function HL_error_covs combines multiple files that can be generated by the former subroutine and ultimately calculates the error covariances.

To calculate the accumulated error covariance statistics, the code needs to be given a list of standard files, called feedback files, containing the observations and the model interpolated at observation locations (for more details see file examples in run_tests/test_files.tar.gz). The statistics are then calculated on a regular grid specified by the user. Horizontal correlations are calculated by averaging statistics into bins at a set distance from each grid point.

Both surface and profile observations can be processed. To process profiles, specify a list of depths defining the boundaries of the depth levels on which the statistics will be calculated. In some cases observations from multiple sources can exist in one feedback file. In this case, it is possible to process the statistics from a subset of these sources by specifying a list of observation sources or observation IDs.

Once the accumulated error covariance statistics are calculated, they can be quickly combined in order to generate the final results, containing the variances, covariances, and correlations for each grid box. It also contains the mean error and the number of observations that went into the calculation. These outputs are required by the top-level subroutine in FunctionFitting.py, which fits the error covariances to a function and therefore estimates the magnitude and length-scales of the model errors.

It is important to note that additional post-processing may also be necessary, such as smoothing the fitted data, which is not currently included in the code.

Forecast difference methods

Three top-level subroutines are required to calculate error covariances in forecast_diff_error_covs.py. The subroutine calc_fld_diffs gets the forecast differences for a specific model field. The user can choose for the differences to be either based on distinct forecast lengths valid at the same time (NMC method) or on forecast time lags (Canadian Quick method). For examples of forecast files, see files in run_tests/test_files.tar.gz.

The forecast differences are then used to generate accumulated error covariance statistics over a specific time period in subroutine calc_stats_covs, whereas the subroutine combine_stats combines multiple files that can be generated by the former subroutine and ultimately calculates the model-based error covariances. In this case, error covariances are calculated using the same model resolution and distances are chosen based on a number of grid points in each direction. Keep in mind that computations on high-resolution grids will require a large amount of CPU memory!

For horizontal covariances, the user must choose a model level to perform the calculations. In addition to horizontal covariances, the option of calculating vertical covariances is enabled here, but they will require even more CPU memory. Once the error covariances are calculated, they can be used by FunctionFitting.py and fitted to a function, in order to estimate the magnitudes and length-scales of the model errors.

Representation errors

There are several methods used in the literature to compute representation errors. The only approach coded here so far is based on the sub-grid scale variability, which aims at representing unresolved scales in the model.

Two top-level subroutines are required to calculate representation errors, due to unresolved model scales, in RE.py. The subroutine RE_unresolved_scales reads feedback files within a time window and then compute the standard deviation of the observation values within each model grid cell. The subroutine calc_RE_season reads a list of standard deviation files for a specific time period (e.g. a season) and then computes their average.

Due to the lack of observation in some areas, you might end up with gaps which are filled with undefined values. Although the post-processing is not yet included here, smoothing and spreading the information of the representation error to surrounding areas may certainly be needed.

Requirements to run

Running a simple test

All the inputs for test cases of the code are located at run_tests/test_files.tar.gz. The instructions to run the tests are given below:

  • In the folder run_tests, the user should enter the path where the tests will be performed by changing the variable scratch in the script test.sh.

  • After setting the path then just run all test cases with the command bash test.sh.

  • The shell script calls a few python scripts, each correspoding to a feature of the code (e.g. HL, NMC, etc), where many different configurations are tested.

  • All the netcdf files generated by the tests are stored in the path provided by the user.

  • The user can play with changing some input parameters of the test. In each python script there is a section of changeable parameters. This is the best place to quickly become familiar on how to run the code for different cases!

⚠️ These tests must be run before you create the merge pull request, making sure that the test outputs match the KGOs.

License

BSD-3 License