-
Notifications
You must be signed in to change notification settings - Fork 0
/
acorn_vignette.Rmd
273 lines (241 loc) · 9.92 KB
/
acorn_vignette.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
---
title: "acorn Vignette"
author: "Tychele N. Turner, Ph.D., Washington University School of Medicine"
date: "2023-05-18"
output:
pdf_document:
toc: yes
toc_depth: '3'
html_document:
toc: yes
toc_depth: 3
vignette: >
%\VignetteIndexEntry{acorn Vignette}
%\VignetteEncoding{UTF-8}
%\VignetteEngine{knitr::rmarkdown}
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
### Load the acorn Package
```{r}
library('acorn')
```
### Load in the Test DNV Data
readDNV = Reads in a de novo variant (DNV) file in the format of sample,
chromosome, genomic position, reference allele, alternate allele, and
then any optional columns. File must be tab-delimited and the file must
have the data in the order listed above (i.e., sample is field 1, chromosome
is field 2, genomic position is field 3, reference allele is field 4, and
alternate allele is field 5. The file can either be a uncompressed file or
can be a gz compressed file. Please note that the chromosome data should
take the form with a "chr" at the beginning (e.g., chr1).
Returns back a loaded in version of the DNV file that can be assigned to
an object.
```{r}
input <- readDNV(paste(path.package("acorn"),"/extdata/dnms_from_Ng_et_al_2022_Human_Mutation_paper.txt.gz",sep=""))
head(input)
str(input)
```
### Extract an Individual
extractIndividual = Extracts the DNVs out of a dnvObject from a particular
individual. Returns a DNV object containing only DNVs in the specified
individual.
```{r}
ind <- extractIndividual(input, "HG01928")
head(ind)
nrow(ind)
table(ind[,1])
```
### Extract Multiple Individuals
```{r}
ind <- extractIndividual(input, c("HG01928", "HG03915"))
head(ind)
nrow(ind)
table(ind[,1])
```
### extract SNVs
extractSNVs = Extracts single-nucleotide variants (SNVs) out from a DNV object
generated using the readDNV function. Returns a DNV object containing only SNVs.
```{r}
snvs <- extractSNVs(input)
nrow(snvs)
```
### Extract INDELs Only
extractINDELs = Extracts small insertions/deletions (INDELs) out from a DNV
object generated using the readDNV function. Returns a DNV object containing
only INDELs.
```{r}
indels <- extractINDELs(input)
nrow(indels)
```
### Extract MNVs Only (This is a very small test set)
extractMNVs = Extracts multi-nucleotide variants (MNVs) out from a DNV object
generated using the readDNV function. Returns a DNV object containing only MNVs.
```{r}
denovoMNVs <- readDNV(paste(path.package("acorn"),"/extdata/mnv_test.txt",sep=""))
mnvs <- extractMNVs(denovoMNVs)
nrow(mnvs)
```
### Calculate the MNV Lengths
calculateMNVLengths = This function will automatically grab only
the MNVs from the DNV object for the calculation of the MNV lengths
ratio. Returns the length of the MNVs, in the form of an object,
observed in the DNV object. It also returns a barplot of the MNV lengths.
```{r}
calculateMNVLengths(mnvs)
```
### Calculate the Transition/Transversion Ratio
calculateTiTvRatio = This function will automatically grab only the SNVs from
the DNV object for the calculation of the transition/transversion (Ti/Tv) ratio.
Returns the counts of transitions, the counts of transversions, the Ti/Tv ratio,
and a barplot of the different types of SNV changes observed in the DNV object.
```{r}
calculateTiTvRatio(input)
```
### Calculate the Deletion/Insertion Ratio
calculateDeletionInsertionRatio = This function will automatically grab only
the INDELs from the DNV object for the calculation of the deletion/insertion
ratio. Returns the counts of deletions, the counts of insertions, and the
deletion/insertion ratio.
```{r}
calculateDeletionInsertionRatio(input)
```
### Calculate Deletion Lengths
calculateDeletionLengths = This function will automatically grab only the
deletions from the DNV object for the calculation of the length of the
deletions. Returns the length of the deletions, in the form of an object,
observed in the DNV object. It also returns a barplot of the deletion lengths.
```{r}
dellengths <- calculateDeletionLengths(input)
head(dellengths)
```
### Calculate Insertion Lengths
calculateInsertionLengths = This function will automatically grab only the
insertions from the DNV object for the calculation of the length of the
insertions. Returns the length of the insertions, in the form of an object,
observed in the DNV object. It also returns a barplot of the insertion lengths.
```{r}
inslengths <- calculateInsertionLengths(input)
head(inslengths)
```
### Keep Only the Autosomes
extractAutosomes = Extracts the autosomes (chromosomes 1 to 22) out from a DNV
object originally generated using the readDNV function. You can also run this
on objects generated from extractSNVs, extractINDELs, or extractMNVs. Returns
a DNV object containing only DNVs on the autosomes.
```{r}
aut <- extractAutosomes(input)
nrow(aut)
table(aut[,2])
```
### Keep Only the X Chromosome
extractX = Extracts the X chromosome out from a DNV object originally generated
using the readDNV function. You can also run this on objects generated from
extractSNVs, extractINDELs, or extractMNVs. Returns a DNV object containing
only DNVs on the X chromosome.
```{r}
X <- extractX(input)
nrow(X)
table(X[,2])
```
### Keep Only the Y Chromosome (There are Mone on the Y in the Test Dataset)
extractY = Extracts the Y chromosome DNVs out from a DNV object originally
generated using the readDNV function. You can also run this on objects
generated from extractSNVs, extractINDELs, or extractMNVs. Returns a DNV
object containing only DNVs on the Y chromosome.
```{r}
Y <- extractY(input)
nrow(Y)
```
### Calculate Counts per Individual
countsPerIndividual = This function will count the DNVs from a DNV object
originally generated using the readDNV function. You can also run this on
objects generated from extractSNVs, extractINDELs, or extractMNVs. Returns
the mean of the DNV counts per individual, the standard deviation of the DNV
counts per individual, a plot of the density of the DNV counts per individual,
and an object consisting of the sample name and the counts of their DNVs that
can be assigned to another object.
```{r}
counts <- countsPerIndividual(input)
head(counts)
```
### Load in Example Data for Parental Age Analyses
```{r}
input <- readDNV(paste(path.package("acorn"),"/extdata/dnms_from_Ng_et_al_2022_Human_Mutation_paper.txt.gz",sep=""))
countExample <- read.delim(paste(path.package("acorn"),"/extdata/dnm_count_example.txt",sep=""))
parentExample <- read.delim(paste(path.package("acorn"),"/extdata/parental_age_example.txt",sep=""))
```
### Make Parental Age Object
parentalAgeObject = Takes in a counts object that is either the result of
countsPerIndividual() or is already read into an object from a file that
contains the following two fields: sample and number of DNVs. The parental
age object should be read in and contain the following fields: sample,
father age at child's birth, and mother age at child's birth. Returns back
an object with the de novo counts and parental age data together. The
fields in this file are sample, dnm_counts, fatherAge, and motherAge.
```{r}
parents <- parentalAgeObject(countExample, parentExample)
```
### Run Parental Age Analyses Including Both Mother and Father
parentalAge = This function will calculate the correlation between father's
and mother's age at birth and DNV counts per individual, the results of the
linear model taking the form: lm(formula = dnm_counts ~ fatherAge+motherAge,
data = parentalAgeObject) or the exponential model taking the form of
lm(log(dnm_counts)~fatherAge+motherAge, data=parentalAgeObject). Input
required is output from the parentalAgeObject function in this package.
Returns the results of the linear model taking the form:
lm(formula = dnm_counts ~ fatherAge + motherAge, data = parentalAgeObject) or
the exponential model taking the form
lm(log(dnm_counts)~fatherAge+motherAge, data=parentalAgeObject).
It also returns a plot of father's and mother's age at birth and DNV counts.
The default is the "linear" model.
```{r}
parentalAge(parents)
```
If you prefer to run the exponential model use:
```{r}
parentalAge(parents, modelType="exponential")
```
### Run Parental Age Analyses for Father Age Only
fatherAge = This function will calculate the correlation between father's
age at birth and DNV counts per individual, the results of the linear model
taking the form: lm(formula = dnm_counts ~ fatherAge, data = parentalAgeObject)
or the exponential model taking the form
lm(log(dnm_counts)~fatherAge, data=parentalAgeObject).
Input required is output from the parentalAgeObject function in this package.
Returns the correlation between father's age at birth and DNV counts per
individual and the results of the linear model taking the form: lm(formula =
dnm_counts ~ fatherAge, data = parentalAgeObject) or the exponetial model taking
the form lm(log(dnm_counts)~fatherAge, data=parentalAgeObject).
It also returns a plot of father's age at birth and DNV counts.
```{r}
fatherAge(parents)
```
If you prefer to run the exponential model use:
```{r}
fatherAge(parents, modelType="exponential")
```
### Run Parental Age Analyses for Mother Age Only
motherAge = This function will calculate the correlation between mother's
age at birth and DNV counts per individual, the results of the linear model
taking the form: lm(formula = dnm_counts ~ motherAge, data = parentalAgeObject)
or the exponential model taking the form
lm(log(dnm_counts)~motherAge, data=parentalAgeObject).
Input required is output from the parentalAgeObject function in this package.
Returns the correlation between mother's age at birth and DNV counts per
individual and the results of the linear model taking the form: lm(formula =
dnm_counts ~ motherAge, data = parentalAgeObject) or the exponential model
taking the form lm(log(dnm_counts)~motherAge, data=parentalAgeObject).
It also returns a plot of mother's age at birth and DNV counts.
```{r}
motherAge(parents)
```
If you prefer to run the exponential model use:
```{r}
motherAge(parents, modelType="exponential")
```
### Print the Session Information
```{r}
sessionInfo()
```