
Tutorial (Mapping mutations)

Collin Tokheim edited this page Aug 3, 2024 · 10 revisions

HotMAPS pipeline

This tutorial covers how to go from mutations to identifying hotspot regions in protein structure. The major difference from the previous examples is that the mutations must first be mapped to protein structures. Subsequent steps are similar. Note that the MuPIT MySQL database is required (see [here](MySQL database)).

Mapping mutations to protein structure

Protein structure

You will need to set up a directory containing protein structures. In this example we map only a few mutations, all of which are relevant to the protein structure 1e96 used in the Quick Start. You can either set up the entire set of Protein Data Bank structures and homology models by following the exome scale tutorial, or set up only the protein structure needed for this example by following the instructions in the Quick Start.

Mutations

HotMAPS uses mutations in MAF format as input. All MAF files should be placed into a single directory (the default location is data/mutations). By default, HotMAPS finds the MAF files by examining their filenames. Each filename should start with input.TYPE. and end with .maf, where TYPE distinguishes between multiple MAF files. For example, you may have one file with "LUAD" for lung adenocarcinoma and a separate file with "PRAD" for prostate adenocarcinoma. In this tutorial, we will reuse the example data from the Quick Start:

$ wget "https://www.dropbox.com/scl/fi/7x3ss1sn91awajkmxbfhr/1e96_example.tar.gz?rlkey=zvm0vl5e6xos6ut6f8cgqz7o3&st=dkyw48zn&dl=1" -O 1e96_example.tar.gz
$ tar xvzf 1e96_example.tar.gz

For this example, there is a single MAF file (1e96_example/input.HNSC.maf) containing mutations for head and neck squamous cell carcinoma.
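The naming convention described above can be checked before running the pipeline. The following is a minimal sketch, not part of HotMAPS: the helper names maf_type and list_maf_types are hypothetical, and the sketch only assumes the input.TYPE.maf pattern and the data/mutations default stated in this tutorial.

```shell
#!/usr/bin/env bash
# Sketch: extract TYPE from MAF file names of the form input.TYPE.maf.
# maf_type and list_maf_types are hypothetical helpers, not HotMAPS commands.

maf_type() {
    local name
    name=$(basename "$1")
    case "$name" in
        input.*.maf)
            name=${name#input.}     # drop the "input." prefix
            name=${name%.maf}       # drop the ".maf" suffix
            # An empty TYPE (e.g. "input..maf") is rejected.
            [ -n "$name" ] && printf '%s\n' "$name"
            ;;
        *)
            return 1                # name does not follow the convention
            ;;
    esac
}

# Report the TYPE of every file in a mutation directory
# (defaults to data/mutations, the default location named in this tutorial).
list_maf_types() {
    local dir=${1:-data/mutations} f t
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        if t=$(maf_type "$f"); then
            echo "$f -> $t"
        else
            echo "$f -> skipped (not input.TYPE.maf)" >&2
        fi
    done
}
```

For the example data, running list_maf_types 1e96_example after sourcing the functions should report HNSC for input.HNSC.maf, assuming the archive was extracted as shown above.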

Mapping mutations

The mutations can be mapped with the prepMutations make command by specifying the directory containing the mutations (1e96_example directory) via the MUT_DIR parameter. If you do not specify the MUT_DIR directory, then the mutations are assumed to be found in the data/mutations directory.

$ make prepMutations \
      MUT_DIR=1e96_example \
      MUPIT_ANNOTATION_DIR=data/annotation/mupit_annotations \
      MYSQL_DB=mupit_modbase \
      MYSQL_USER=myuser \
      MYSQL_PASSWD=mypassword \
      HYPERMUT=500 

Here myuser is your MySQL user name, mypassword is your MySQL password, and mupit_modbase is the default database name for MuPIT. Hypermutated samples (i.e., samples containing many mutations) can reduce the statistical power of HotMAPS, so the HYPERMUT parameter filters out samples with more mutations than the given number (500 in this example). The MUPIT_ANNOTATION_DIR parameter points to the directory where the mapping from mutations to protein structure is saved.
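To get a feel for which samples a given HYPERMUT threshold might remove, you can count mutations per sample before mapping. This is only a rough sketch under stated assumptions: it assumes a tab-delimited MAF with a Tumor_Sample_Barcode header column (the standard MAF sample column), the helper hypermut_samples is hypothetical, and HotMAPS itself may count mutations differently (for example, by restricting to certain mutation types).

```shell
#!/usr/bin/env bash
# Sketch: list samples whose mutation count exceeds a threshold.
# Assumes a tab-delimited MAF with a Tumor_Sample_Barcode header column.
# hypermut_samples is a hypothetical helper, not part of HotMAPS.

hypermut_samples() {
    local maf=$1 threshold=${2:-500}
    awk -F'\t' -v max="$threshold" '
        /^#/ { next }                        # skip comment lines
        !col {                               # first non-comment line is the header
            for (i = 1; i <= NF; i++)
                if ($i == "Tumor_Sample_Barcode") col = i
            if (!col) {
                print "no Tumor_Sample_Barcode column found" > "/dev/stderr"
                exit 1
            }
            next
        }
        { count[$col]++ }                    # one row per mutation
        END {
            for (s in count)
                if (count[s] > max) print s "\t" count[s]
        }
    ' "$maf"
}
```

For example, hypermut_samples 1e96_example/input.HNSC.maf 500 | sort prints each sample with more than 500 rows alongside its count. The output order of awk's array traversal is unspecified, hence the sort.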

Running HotMAPS

Preparing HotMAPS input

Next, the input files need to be generated before moving on to the next section. Here the initial input information is retrieved from the MuPIT MySQL database, rather than downloaded as pre-built files (as was done in the Initial Setup section of the tutorial). To prepare the input files, invoke the following make command.

$ make prepareHotspotInput MYSQL_USER=myuser MYSQL_HOST=myhost MYSQL_DB=mydb 

Here myuser is your MySQL user name, myhost is the host name for MySQL, and mydb is the database name for MuPIT (default: mupit_modbase). Note that you will be prompted to type in your MySQL password.

HotMAPS

Running the HotMAPS algorithm proceeds in the same manner as in the previous tutorial's Running 3D HotMAPS section. Note that for this example you will want to use the runNormalHotspot command.