The goal of mzrtsim is to make raw data and features table simulation for LC/GC-MS based data
You can install the development version from GitHub with:
# install.packages("remotes")
remotes::install_github("yufree/mzrtsim")
You could use simmzml
to generate one mzML file.
library(mzrtsim)
data("monams1")
simmzml(db=monams1, name = 'test')
You will find test.mzML
and corresponding test.csv
with m/z,
retention time and compound name of the peaks. Here the monams1
and
monahrms1
is from the MS1 data of MassBank of North America (MoNA) and
could be downloaded from their
website. You could also
use hmdbcms
to simulate EI source data extracted from
HMDB. Here we only use the MS1 full scan
data for simulation.
You could stimulate two groups of raw data with different peak widths for the same compounds. Retention time could follow a uniform distribution. 100 compounds could be selected randomly and base peaks’ signal to noise ratio could be sample from 100 to 1000. Each group contain 10 samples and 30% compounds are changed between case and control groups.
dir.create('case')
dir.create('control')
# set different peak width for 100 compounds
pw1 <- c(rep(5,30),rep(10,40),rep(15,30))
pw2 <- c(rep(5,20),rep(10,30),rep(15,50))
# set retention time for 100 compounds
rt <- seq(10,590,length.out=100)
set.seed(1)
# select compounds from database
compound <- sample(c(1:4000),100)
set.seed(2)
# select signal to noise ration
sn <- sample(c(100:10000),100)
for(i in c(1:10)){
simmzml(name=paste0('case/case',i),db=monahrms1,pwidth = pw1,compound=compound,rtime = rt, sn=sn)
}
for(i in c(1:10)){
simmzml(name=paste0('control/control',i),db=monahrms1,pwidth = pw2,compound=compound,rtime = rt, sn=sn)
}
Then you could find 10 mzML files in case sub folder and another 10 mzML files in control sub folder, as well as corresponding csv files with m/z, retention time and compound name of the peaks.
You could also use simmzml
to stimulate tailing/leading peaks by
defining the tailing factor of the peaks. When the tailing factor is
lower than 1, the peaks are leading peaks. When the tailing factor is
larger than 1, the peaks are tailing peaks.
# leading peaks
simmzml(name='test',db=monahrms1,pwidth = 10,compound=1,rtime = 100, sn=10,tailingfactor = 0.8)
# tailing peaks
simmzml(name='test',db=monahrms1,pwidth = 10,compound=1,rtime = 100, sn=10,tailingfactor = 1.5)
You could also input a m/z vector as matrix masses. Those masses will generate background baseline signals. By default, the mass vector is from matrix samples previous published.
data(mzm)
simmzml(name='test',db=monahrms1,pwidth = 10,compound=1,rtime = 100, sn=10,matrixmz = mzm,matrix = TRUE)
You could also use mzrtsim
to make simulation of peak list.
Here we make a simulation of 100 compounds from selected database with two conditions and three batches. 5 percentage of the peaks were influenced by conditions and 10 percentage of the peaks were influenced by batch effects. Three different type could be simulated: monotonic, random and block. You could also bind batch type, for example, ‘mb’ means the simulation would contain both monotonic and block batch effects. ‘db’ means the spectra database to be used for simulation as metioned in raw data simulation section.
library(mzrtsim)
data("monams1")
simdata <- mzrtsim(ncomp = 100, ncond = 2, ncpeaks = 0.05,
nbatch = 3, nbpeaks = 0.1, npercond = 10, nperbatch = c(8, 5, 7), seed = 42, batchtype = 'mb', db=monams1)
You could save the simulated data into multiple csv files by simdata
function. simraw.csv
could be used for metaboanalyst. simraw2.csv
show the raw peaks list. simcon.csv
show peaks influenced by
conditions only.simbatchmatrix.csv
show peaks influnced by batch
effects only. simbat.csv
show peaks influenced by batch effects and
conditions. simcomp.csv
show independent peaks influenced by
conditions and batch effects. simcompchange.csv
show the conditions
changes of each groups. simblockbatchange.csv
show the block batch
changes of each groups. simmonobatchange.csv
show the monotonic batch
changes of each group.
simdata(sim,name = "sim")