Skip to content

Commit

Permalink
Update the GATK base image to a newer LTS ubuntu release (#8610)
Browse files Browse the repository at this point in the history
* Update the GATK base image to the latest Ubuntu LTS release (22.04)

* Add some additional useful utilities to the base image

* Switch to a newer conda version with a much faster solver

* Update the scripts and documentation for building the base image

* Update the VETS integration tests to allow for a small epsilon during numeric comparisons, and include the full diff output in exception messages when a mismatch is detected
  • Loading branch information
droazen committed Dec 12, 2023
1 parent 0b18579 commit 85d13d4
Show file tree
Hide file tree
Showing 14 changed files with 298 additions and 153 deletions.
4 changes: 1 addition & 3 deletions Dockerfile
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
ARG BASE_DOCKER=broadinstitute/gatk:gatkbase-3.1.0
ARG BASE_DOCKER=broadinstitute/gatk:gatkbase-3.2.0

# stage 1 for constructing the GATK zip
FROM ${BASE_DOCKER} AS gradleBuild
Expand Down Expand Up @@ -93,8 +93,6 @@ RUN conda env create -n gatk -f /gatk/gatkcondaenv.yml && \
echo "source activate gatk" >> /gatk/gatkenv.rc && \
echo "source /gatk/gatk-completion.sh" >> /gatk/gatkenv.rc && \
conda clean -afy && \
find /opt/miniconda/ -follow -type f -name '*.a' -delete && \
find /opt/miniconda/ -follow -type f -name '*.pyc' -delete && \
rm -rf /root/.cache/pip

CMD ["bash", "--init-file", "/gatk/gatkenv.rc"]
Expand Down
19 changes: 12 additions & 7 deletions scripts/docker/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@ This repo contains the scripts for creating and pushing two docker images:
- gatkbase -- a basic docker image that we do not expect to change very often. The GATK4 docker image uses this one (``FROM``)
- gatk4 -- the official docker image for GATK4. The instructions in this document pertain to this image, unless otherwise stated.

``scripts/docker/gatkbase/build_docker_base.sh`` is a script to create the gatkbase docker image.
``scripts/docker/gatkbase/build_docker_base_cloud.sh`` and ``scripts/docker/gatkbase/build_docker_base_locally.sh`` are scripts to create the gatkbase docker image.
``build_docker.sh`` is a script to create the full gatk4 docker image.
``build_docker_remote.sh`` is a script to create the full gatk4 docker image using google cloud build remotely. This is useful if you can't build the docker image locally (for example if you have an M1 Macbook) NOTE: this requires the user first specify their project with the command `gcloud config set project <PROJECT>` to a project that has access to google cloud build.

Expand Down Expand Up @@ -148,19 +148,24 @@ exit
This is a base image that does not require any files in the gatk repo (except the build script and Dockerfile, obviously). GATK docker images are dependent on this one.

**IMPORTANT**
- The gatkbase build script should be run from the ``scripts/docker/gatkbase`` directory.
- If you want to create a new version, you must modify the ``build_docker_base.sh`` script directly. Any changes should be committed to the repo.
- The gatkbase build scripts should be run from the ``scripts/docker/gatkbase`` directory.

#### Create gatkbase docker image and push it to docker hub
#### Build the gatkbase docker image on your local machine:

```bash
build_docker_base.sh -p
build_docker_base_locally.sh <docker_image_version>
```

#### Create gatkbase docker image and do not push it to docker hub
#### Build the gatkbase docker image remotely using Google Cloud Build:

```bash
build_docker_base_cloud.sh <docker_image_version>
```

#### Release a pre-built gatkbase image to the official repositories, after testing it:

```bash
build_docker_base.sh -p
release_prebuilt_base_image.sh <prebuilt_image> <version_number_for_release>
```


Expand Down
44 changes: 26 additions & 18 deletions scripts/docker/gatkbase/Dockerfile
Original file line number Diff line number Diff line change
@@ -1,12 +1,16 @@
# Using OpenJDK 17
# This Dockerfile does not require any files that are in the GATK4 repo.
FROM ubuntu:18.04
FROM ubuntu:22.04

# Avoid interactive prompts during apt installs/upgrades
ENV DEBIAN_FRONTEND noninteractive

#### Basic image utilities
RUN apt-get update && \
apt-get upgrade -y && \
apt-get install -y --no-install-recommends \
RUN apt update && \
apt full-upgrade -y && \
apt install -y --no-install-recommends \
python3 \
python3-pip \
wget \
curl \
bc \
Expand All @@ -15,14 +19,18 @@ RUN apt-get update && \
less \
bedtools \
samtools \
bcftools \
tabix \
jq \
git \
gpg-agent \
build-essential \
openjdk-17-jdk \
vim \
software-properties-common && \
apt-get -y clean && \
apt-get -y autoclean && \
apt-get -y autoremove && \
apt -y clean && \
apt -y autoclean && \
apt -y autoremove && \
rm -rf /var/lib/apt/lists/*

RUN java -version
Expand All @@ -31,11 +39,11 @@ RUN java -version
RUN echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] http://packages.cloud.google.com/apt cloud-sdk main" \
| tee -a /etc/apt/sources.list.d/google-cloud-sdk.list && curl https://packages.cloud.google.com/apt/doc/apt-key.gpg \
| apt-key --keyring /usr/share/keyrings/cloud.google.gpg add - && \
apt-get update -y && \
apt-get install -y --no-install-recommends google-cloud-cli && \
apt-get -y clean && \
apt-get -y autoclean && \
apt-get -y autoremove && \
apt update -y && \
apt install -y --no-install-recommends google-cloud-cli && \
apt -y clean && \
apt -y autoclean && \
apt -y autoremove && \
rm -rf /var/lib/apt/lists/* && \
# Remove the anthos cli tool and related files since they are very large and we / anyone using the docker are unlikely to use them
# Remove the bundled python because we have python installed separately
Expand All @@ -55,16 +63,16 @@ ENV JAVA_LIBRARY_PATH /usr/lib/jni

# Install miniconda
ENV DOWNLOAD_DIR /downloads
ENV CONDA_URL https://repo.anaconda.com/miniconda/Miniconda3-py310_23.1.0-1-Linux-x86_64.sh
ENV CONDA_MD5 = "32d73e1bc33fda089d7cd9ef4c1be542616bd8e437d1f77afeeaf7afdb019787"
ENV CONDA_URL https://repo.anaconda.com/miniconda/Miniconda3-py310_23.10.0-1-Linux-x86_64.sh
ENV CONDA_SHA256 "c7a34df472feb69805b64df6e8db58363c5ccab41cd3b40b07e3e6dfb924359a"
ENV CONDA_PATH /opt/miniconda
ENV PATH $CONDA_PATH/bin:$PATH
RUN mkdir $DOWNLOAD_DIR && \
wget -nv -O $DOWNLOAD_DIR/miniconda.sh $CONDA_URL && \
test "`md5sum $DOWNLOAD_DIR/miniconda.sh | awk -v FS=' ' '{print $1}'` = $CONDA_MD5" && \
test "$(sha256sum $DOWNLOAD_DIR/miniconda.sh | awk -v FS=' ' -v ORS='' '{print $1}')" = "$CONDA_SHA256" && \
bash $DOWNLOAD_DIR/miniconda.sh -p $CONDA_PATH -b && \
rm $DOWNLOAD_DIR/miniconda.sh && \
conda clean -afy && \
find /opt/miniconda/ -follow -type f -name '*.a' -delete && \
find /opt/miniconda/ -follow -type f -name '*.pyc' -delete && \
rm -rf /root/.cache/pip
conda config --set auto_update_conda false && \
conda config --set solver libmamba && \
rm -rf /root/.cache/pip
23 changes: 11 additions & 12 deletions scripts/docker/gatkbase/README.md
Original file line number Diff line number Diff line change
@@ -1,16 +1,15 @@
# How to update the GATK base docker image:

1. choose a new version number for the base image and manually update the version in `scripts/docker/gatkbase/build_docker_base.sh`
2. build the gatkbase image using that script and upload it to the [gatk-dev docker repo](https://hub.docker.com/r/broadinstitute/gatk-dev/) or [gcr-gatk-snapshots](us.gcr.io/broad-dsde-methods/broad-gatk-snapshots)
* cd to scripts/docker/gatkbase
* run `./build_docker_base.sh`
* `docker tag broadinstitute/gatk-dev:your-version-rc1` or whatever the correct tag is for where you want it uploaded
* `docker push tagname`
3. update the Dockerfile in the main gatk to use the image you pushed
4. commit the changes to the two docker files and to a new pull request
5. wait for the tests to pass and show it to a reviewer
6. push the base image to the official [gatk repo](https://hub.docker.com/r/broadinstitute/gatk) with the right name
7. update the main docker to point to the official version you just released
8. wait for tests to pass in travis and merge
1. In a branch, make whatever updates you need to the base image Dockerfile in `scripts/docker/gatkbase`
2. Choose a new version number for the new base image
3. Build the GATK base image by running either `build_docker_base_cloud.sh` (to build in the cloud using Google Cloud Build) or `build_docker_base_locally.sh` (to build on your local machine) from within the `scripts/docker/gatkbase` directory in your GATK clone.
* Both scripts take the version of the new image as their only argument. Eg., `build_docker_base_cloud.sh 3.0.0rc1` will create the remote image `us.gcr.io/broad-dsde-methods/gatk-base-image-staging-area:gatkbase-3.0.0rc1`
4. If you built the image locally, you'll need to manually push it to the staging area at `us.gcr.io/broad-dsde-methods/gatk-base-image-staging-area`
5. Test the new image using the GATK branch with your changes to the base image Dockerfile, by modifying the FROM clause at the top of the main GATK Dockerfile to point to the base image you staged, and submit a PR for the branch to let the test suite run.
6. Once tests pass and everything looks good, push the new base image to the release repositories using the `release_prebuilt_base_image.sh` script
* The release script takes as arguments the prebuilt image you've tested, and a final version number for release. For example: `release_prebuilt_base_image.sh us.gcr.io/broad-dsde-methods/gatk-base-image-staging-area:gatkbase-3.0.0rc1 3.0.0`
* The release script looks for the image locally first, and if not found locally it pulls it from the remote repository before releasing.
7. Update the FROM clause in the main GATK Dockerfile in your GATK PR to point to the officially-released image, and merge it once tests pass.



53 changes: 0 additions & 53 deletions scripts/docker/gatkbase/build_docker_base.sh

This file was deleted.

31 changes: 31 additions & 0 deletions scripts/docker/gatkbase/build_docker_base_cloud.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
#!/bin/bash
#
# A script that builds the GATK base image using Google Cloud Build, and pushes it to
# a staging location at us.gcr.io/broad-dsde-methods/gatk-base-image-staging-area
#
# Usage: build_docker_base_cloud.sh <docker_image_version>
#
# After staging the image, you should test it with GATK before actually releasing it
# using the release_prebuilt_base_image.sh script. You can test it by modifying the
# main GATK Dockerfile FROM clause to point temporarily at the staged base image, and
# submitting that as a PR to trigger a test suite run.
#

if [ $# -ne 1 ]; then
echo "Usage: $0 <docker_image_version>"
exit 1
fi

IMAGE_VERSION=$1
IMAGE_NAME="us.gcr.io/broad-dsde-methods/gatk-base-image-staging-area"
DOCKER_IMAGE_TAG="${IMAGE_NAME}:gatkbase-${IMAGE_VERSION}"

gcloud builds submit --tag ${DOCKER_IMAGE_TAG} --timeout=24h --machine-type n1_highcpu_32

if [ $? -ne 0 ]; then
echo "gcloud builds submit failed"
exit 1
fi

echo "Successfully published image to staging area at ${DOCKER_IMAGE_TAG}"
exit 0
33 changes: 33 additions & 0 deletions scripts/docker/gatkbase/build_docker_base_locally.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
#!/bin/bash
#
# A script that builds the GATK base image locally (but does not push it anywhere).
#
# Usage: build_docker_base_locally.sh <docker_image_version>
#
# After building the image, you should test it with GATK before actually releasing it
# using the release_prebuilt_base_image.sh script. You can test it by modifying the
# main GATK Dockerfile FROM clause to point temporarily at the candidate base image, and
# submitting that as a PR to trigger a test suite run.
#

if [ $# -ne 1 ]; then
echo "Usage: $0 <docker_image_version>"
exit 1
fi

IMAGE_VERSION=$1
IMAGE_REPO="us.gcr.io/broad-dsde-methods/gatk-base-image-staging-area"
IMAGE_FULL_TAG="${IMAGE_REPO}:gatkbase-${IMAGE_VERSION}"

# Build
echo "Building image to tag ${IMAGE_FULL_TAG}..."
docker build --squash -t "${IMAGE_FULL_TAG}" .

if [ $? -ne 0 ]; then
echo "docker build failed"
exit 1
fi

echo "docker build succeeded"
exit 0

60 changes: 60 additions & 0 deletions scripts/docker/gatkbase/release_prebuilt_base_image.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,60 @@
#!/bin/bash
#
# A script that takes a prebuilt GATK base image, and pushes it to the GATK release repositories on
# dockerhub and GCR. Use the build_docker_base_cloud.sh or build_docker_base_locally.sh scripts to
# build the image before running this script.
#
# Usage: release_prebuilt_base_image.sh <prebuilt_image> <version_number_for_release>
#
# If the prebuilt image exists locally, this script will push the local version. Otherwise,
# it will pull the image from the remote repository before pushing it to the GATK release
# repositories.
#
# prebuilt_image: The pre-built image you want to release (make sure you've tested it!)
# May be either local or remote
# version_tag_for_release: The version of the GATK base image you're releasing (eg., 3.1.0)
#

if [ $# -ne 2 ]; then
echo "Usage: $0 <prebuilt_image> <version_tag_for_release>"
exit 1
fi

PREBUILT_IMAGE="$1"
VERSION="$2"
DOCKERHUB_REPO="broadinstitute/gatk"
GCR_REPO="us.gcr.io/broad-gatk/gatk"
BASE_IMAGE_FULL_TAG="gatkbase-${VERSION}"

function fatal_error() {
echo "$1" 1>&2
exit 1
}

function docker_push() {
echo "Pushing to ${1}"
docker push "${1}"
if [ $? -ne 0 ]; then
fatal_error "Failed to push to ${1}"
fi
}

# Test if the prebuilt image exists locally, and pull it if it doesn't
docker image inspect "${PREBUILT_IMAGE}" > /dev/null 2>&1
if [ $? -ne 0 ]; then
echo "Image ${PREBUILT_IMAGE} not found locally: attempting to pull it now"
docker pull "${PREBUILT_IMAGE}"
if [ $? -ne 0 ]; then
fatal_error "Failed to pull pre-built image ${PREBUILT_IMAGE}"
fi
else
echo "Image ${PREBUILT_IMAGE} found locally: pushing it to the release repositories"
fi

docker tag "${PREBUILT_IMAGE}" "${DOCKERHUB_REPO}:${BASE_IMAGE_FULL_TAG}"
docker tag "${PREBUILT_IMAGE}" "${GCR_REPO}:${BASE_IMAGE_FULL_TAG}"

docker_push "${DOCKERHUB_REPO}:${BASE_IMAGE_FULL_TAG}"
docker_push "${GCR_REPO}:${BASE_IMAGE_FULL_TAG}"

exit 0
2 changes: 1 addition & 1 deletion scripts/gatkcondaenv.yml.template
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@ dependencies:

# core python dependencies
- conda-forge::python=3.6.10 # do not update
- pip=20.0.2 # specifying channel may cause a warning to be emitted by conda
- conda-forge::pip=21.3.1
- conda-forge::mkl=2019.5 # MKL typically provides dramatic performance increases for theano, tensorflow, and other key dependencies
- conda-forge::mkl-service=2.3.0
- conda-forge::joblib=1.1.1 # must pin joblib - versions after 1.1.1 no longer support python 3.6
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -158,17 +158,18 @@ public void testValidInputs(final String tag,
private static void assertOutputs(final String tag,
final String outputPrefix) {
// vcf.idx files are not reproducible
SystemCommandUtilsTest.runSystemCommand(String.format("h5diff %s/%s %s",
EXPECTED_TEST_FILES_DIR,
tag + LabeledVariantAnnotationsWalker.ANNOTATIONS_HDF5_SUFFIX,
outputPrefix + LabeledVariantAnnotationsWalker.ANNOTATIONS_HDF5_SUFFIX));
SystemCommandUtilsTest.runSystemCommand(String.format("diff %s/%s.vcf %s.vcf",
EXPECTED_TEST_FILES_DIR, tag, outputPrefix));
SystemCommandUtilsTest.runH5Diff(
String.format("%s/%s", EXPECTED_TEST_FILES_DIR, tag + LabeledVariantAnnotationsWalker.ANNOTATIONS_HDF5_SUFFIX),
String.format("%s", outputPrefix + LabeledVariantAnnotationsWalker.ANNOTATIONS_HDF5_SUFFIX));

SystemCommandUtilsTest.runDiff(
String.format("%s/%s.vcf", EXPECTED_TEST_FILES_DIR, tag),
String.format("%s.vcf", outputPrefix));

if (tag.contains("posUn")) {
SystemCommandUtilsTest.runSystemCommand(String.format("h5diff %s/%s %s",
EXPECTED_TEST_FILES_DIR,
tag + ExtractVariantAnnotations.UNLABELED_TAG + ExtractVariantAnnotations.ANNOTATIONS_HDF5_SUFFIX,
outputPrefix + ExtractVariantAnnotations.UNLABELED_TAG + ExtractVariantAnnotations.ANNOTATIONS_HDF5_SUFFIX));
SystemCommandUtilsTest.runH5Diff(
String.format("%s/%s", EXPECTED_TEST_FILES_DIR, tag + ExtractVariantAnnotations.UNLABELED_TAG + ExtractVariantAnnotations.ANNOTATIONS_HDF5_SUFFIX),
String.format("%s", outputPrefix + ExtractVariantAnnotations.UNLABELED_TAG + ExtractVariantAnnotations.ANNOTATIONS_HDF5_SUFFIX));
} else {
Assert.assertFalse(new File(outputPrefix + ExtractVariantAnnotations.UNLABELED_TAG + ExtractVariantAnnotations.ANNOTATIONS_HDF5_SUFFIX).exists());
}
Expand Down
Loading

0 comments on commit 85d13d4

Please sign in to comment.