Set variables for LaTeX export:
Redirect stderr
output to stdoout
so that it gets printed correctly (found on the Kitchin’s blog):
(setq org-babel-default-header-args:sh
'((:prologue . "exec 2>&1") (:epilogue . ":"))
)
(setq org-babel-use-quick-and-dirty-noweb-expansion t)
What follows is a detailed description of the analysis presented during the course. The analysis is carried out with C
programs, gnuplot
(for the figures) and some shell
commands and scripts. The same analysis (or almost the same) is “available” using R
on zenodo and using Python
on GitHub (the IPython
notebook can be visualized directly with nbviewer).
This document is the exploration of an idea: use the shell (bash
or zsh
) instead of the Python
or R
command line for interactive analysis and write the short functions performing the actual analysis in C
. The motivation for this exploration comes from two books by Ben Klemens: Modeling With Data and 21st Century C. The main advantages of C
compared to the other two languages are:
- Its stability (the programs written here are very likely to run unchanged in 20 years from now; what can be sure that this won’t be true with
Python
). - The development tools that come with it are just spectacular (see the very short and very clear book of Brian Gough An Introduction to GCC to understand what I mean by that).
Since a Bash or a Z shell are going to be used, Windows
users will have to install Cygwin, Linux
and MacOS
users should have the bash
shell by default and the zsh
shell readily available from their package manager. To dig deeper into the amazing possibilities (and spectacular editorial support) of these tools, check From Bash to Z Shell. Conquering the Command Line by Kiddle, Peek and Stephenson.
The no-shell codes are going to be written in C
, meaning that a C
compiler together with the “classical” development tools (make
, etc) are required. I’m going to use =gcc= here.
The heavy computational work is going to be performed mainly by the gsl (the GNU Scientific Library) that is easily installed through your package manager (from now one, for windows users, the “package manager” refers to the one of Cygwin
). The graphs are be generated with gnuplot; for a quick tutorial check http://physicspmb.ukzn.ac.za/index.php/Gnuplot_tutorial, for an easy to navigate set of (sophisticated) recipes check http://www.gnuplotting.org/. The data sets are in HDF5 format and the C
library, as well as the command line tools, developed by the HDF5 group are going to be heavily used here.
The literate programming approach is used here. This means that the code is broken into “manageable” pieces that are individually explained (when just reading the code is not enough), they are then pasted together to give the code that will actually get compiled. These manageable pieces are called blocks and each block gets a name like: <<name-of-the-block>>
upon definition. It is then referred to by this name when used in subsequent codes. See Schulte, Davison, Dye and Dominik (2010) A Multi-Language Computing Environment for Literate Programming and Reproducible Research for further explanations.
HDF5
and its C
library is easier to use than the one of HDF5
. Both formats are (essentially) fully supported by Python
libraries (=h5py= and =pyFits=). HDF5
is fully supported in R
by =rhdf5= while the support for FITS
is not as good (especially for writting) and is provided by =FITSio=
The data can be download from the command line using wget
as follows:
wget http://xtof.disque.math.cnrs.fr/data/Data_POMC.hdf5
Doing the conversion with Python
is rather straightforward but
requires, in addition to numpy
, to modules: =h5py= and =PyFITS=.
Once these two modules have been intalled–the =anaconda= distribution
include h5py
and astropy
that is very similar to PyFITS
–write the
following in a file called h5_to_fits.py
(see the
=PyFITS= documentation for details):
#!/usr/bin/env python3
import h5py
import pyfits
import numpy as np
hdf_data = h5py.File("Data_POMC.hdf5",'r')
time_hdf = hdf_data['time'][...]
stack_hdf = hdf_data['stack'][...]
hdf_data.close()
pyfits.writeto('Data_POMC.fits', stack_hdf.swapaxes(0,2).swapaxes(1,2))
pyfits.append('Data_POMC.fits',time_hdf)
hdulist = pyfits.open('Data_POMC.fits',mode='update')
prihdr = hdulist[0].header
prihdr.set('AUTHOR','Andreas Pippow')
prihdr.set('LENGTH',340,'Excitation wavelength (nm)')
prihdr.set('DYE','Fura-2')
prihdr.set('UNITS','ADU')
prihdr.set('REF','Joucla et al (2013) Cell Calcium. 54(2):71-85.')
prihdr['comment'] = 'A stim. (depol. induced calcium entry)\
comes at time 527'
prihdr = hdulist[1].header
prihdr.set('UNITS','second')
prihdr.set('TIME',527,'Stimulation time')
hdulist.flush()
hdulist.close()
Once this code has been saved into a file called h5_to_fits.py
, change its file permission to make executable and use it:
chmod u+x h5_to_fits.py
./h5_to_fits.py
After running this short code you will have file Data_POMC.fits
in your working directory. You can quickly visualize it with ImageJ.
We can start by using some of the “tools” that come with the HDF5
library in order to get a quick idea on the data file structure. We can do that with h5ls
that lists (by default) the first level of a file:
h5ls Data_POMC.hdf5
stack Dataset {60, 80, 168} time Dataset {168}
We see that two “Datasets” are included in the file, one named stack
(a 3-dimensional dataset) and the other named time
(one dimensional).
The more sophisticated program h5dump
gives us more details with:
h5dump -n 1 Data_POMC.hdf5
HDF5 "Data_POMC.hdf5" { FILE_CONTENTS { group / attribute /README dataset /stack attribute /stack/CCD chip dimensions attribute /stack/Dye attribute /stack/Excitation wavelength attribute /stack/Recordings performed by attribute /stack/Reference attribute /stack/Units dataset /time attribute /time/Stimulation time attribute /time/Units } }
We can visualize the content of the README
attribute with (result not shown):
h5dump -a README Data_POMC.hdf5
The content of attribute CCD chip dimensions
of Dataset stack
is most easily visualized with:
h5dump -N "CCD chip dimensions" Data_POMC.hdf5
HDF5 "Data_POMC.hdf5" { ATTRIBUTE "CCD chip dimensions" { DATATYPE H5T_STRING { STRSIZE H5T_VARIABLE; STRPAD H5T_STR_NULLTERM; CSET H5T_CSET_UTF8; CTYPE H5T_C_S1; } DATASPACE SCALAR DATA { (0): "60 x 80 pixels" } } }
We will regenerate the first figure of the course with =gnuplot= and for that we need to extract the “interesting part” of the data file Data_POMC.hdf5
. This will be done with a short C
code that opens the file, gets the stack
Dataset. By the “interesting part” we mean the specification of ranges of rows and columns of the CCD chip. The display we want to generate is a matrix of ADU time series (with the lowest interesting row of the CCD chip at the bottom and the lowest interesting column at the left). Each element of the matrix will correspond a pixel of the interesting part. The ADU time series will all be display using the same horizontal and vertical scales. Since we will display all these time series on a single plot, we will have to scale them. Our program will output data tailored for gnuplot display in ASCII format to the standard output. The data will be organized in two columns with the scaled time (first column) and scaled ADU (second column). Measurements corresponding to successive pixels are separated by a blank line (this allows us to plot the whole set of time series with a single and simple plot
command in gnuplot
, see pp 85-86 of the gnuplot manual). The output starts with “general information” on the data set entered as a gnuplot
comment, that is, following a “#”. Our presentation of the code follows what is properly called literate programming, that is, the program is broken into logical pieces—code blocks whose names appear like <<code-block-name>>
—and only its skeleton is presented first as here for our program read_POMC_stack
that should be saved in a file named read_POMC_stack.c
.
<<read_POMC_stack-include>>
<<read_POMC_stack-define>>
int main(int argc, char *argv[])
{
<<read_POMC_stack-read-args>>
<<read_POMC_stack-open-FILE-and-read-DSET>>
<<read_POMC_stack-find-min-and-max-in-DSET>>
<<read_POMC_stack-print-results>>
<<read_POMC_stack-close-file>>
return 0;
}
The first code block <<read_POMC_stack-include>>
contains the necessary include directives. We need <stdio.h>
to call the printing functions, <stdlib.h>
for the atoi
function and the other two <hdf5.h>
and <hdf5_hl.h>
to get the hdf5 library functions.
#include <stdio.h>
#include <stdlib.h>
#include <hdf5.h>
#include <hdf5_hl.h>
Code block <<read_POMC_stack-define>>
contains expansions definitions. This is where the parameters of the program are defined (to keep the code short we pass only the limits of the interesting part as arguments): the file we are going to work on (FILE
), the Dataset of interest within the file (DSET
):
#define FILE "Data_POMC.hdf5"
#define DSET "stack"
Code block <<read_POMC_stack-read-args>>
reads the four arguments of the program: first_row
, last_row
, first_col
, last_col
. The only check that is made is that four argument are given. This is very rudimentary, we should name the argument to get a proper code and make more checks, but this program is not meant for general use.
if (argc != 5) {
fprintf(stderr,"Expecting four arguments\n");
return -1;
}
size_t first_row = atoi(argv[1]);
size_t last_row = atoi(argv[2]);
size_t first_col = atoi(argv[3]);
size_t last_col = atoi(argv[4]);
Code block <<read_POMC_stack-open-FILE-and-read-DSET>>
opens the FILE
, gets information on the dimensions, initializes the number of rows (nrow
), the number of columns (ncol
) and the number of time points (nsamp
) before reading the data, storing the result in the 3D array data
:
// Open FILE
hid_t file_id = H5Fopen (FILE, H5F_ACC_RDONLY, H5P_DEFAULT);
// Get dimensions of 3D object contained in DSET
hsize_t dims[3];
H5LTget_dataset_info(file_id,DSET,dims,NULL,NULL);
size_t nrow = (size_t) dims[0];
size_t ncol = (size_t) dims[1];
size_t nsamp = (size_t) dims[2];
// Read DSET in DATA
int data[nrow][ncol][nsamp];
H5LTread_dataset_int(file_id,"/stack",data);
The strategy used in the last code block will work fine for “small” datasets since the line int data[nrow][ncol][nsamp];
automatically allocates memory in the stack
. If you run into problems running a similar code you would have to allocate “by hand” memory in the heap
with something like:
int *data = malloc(nrow*ncol*nsamp*sizeof(int));
Since such an allocation is done by hand you would have to free the memory yourself before the return
statement with:
free(data);
Within the code, statements like: data[i][j][k]
should then be replaced by: data[(i*ncol+j)*nsamp+k]
.
Code block <<read_POMC_stack-find-min-and-max-in-DSET>>
looks at each measurement (ADU) of the interesting part and find the smallest and largest values stored in variables adu_min
and adu_max
:
// Find out the smallest and largest observations
// in the selected part of DSET
int adu_max=0;
int adu_min=10000;
for (size_t i=first_row; i<last_row; i++) {
for (size_t j=first_col; j<last_col; j++) {
for (size_t k=0; k<nsamp; k++) {
double adu = data[i][j][k];
if (adu < adu_min) adu_min=adu;
if (adu > adu_max) adu_max=adu;
}
}
}
Code block <<read_POMC_stack-print-results>>
prints some general information on the dataset and on the interesting part of it (as gnuplot
comments) before printing the scaled time and adu:
// Print some info to STDOUT
printf("# Data set stack from file: %s\n",FILE);
printf("# Data set dimensions: (%d,%d,%d)\n",nrow,ncol,nsamp);
printf("# Using rows from %d (inclusive) to %d (exclusive)\n",first_row,last_row);
printf("# Using columns from %d (inclusive) to %d (exclusive)\n",first_col,last_col);
printf("# Minimal ADU in this range: %d; maximal value: %d\n",adu_min,adu_max);
double adu_delta = (double) (adu_max-adu_min);
// Write the DATA in a 2 columns format with time in the first and normalized
// ADU in the second
for (size_t i=first_row; i<last_row; i++) {
double y_min = i-first_row;
for (size_t j=first_col; j<last_col; j++) {
for (size_t k=0; k<nsamp; k++) {
double adu = (data[i][j][k]-adu_min)/adu_delta+y_min;
printf("%g %g\n",((double) k/nsamp+j-first_col),adu);
}
printf("\n");
}
}
Code block <<read_POMC_stack-close-file>>
close the hdf5 file:
H5Fclose (file_id);
Once the code has been properly “tangled” and stored in a file named read_POMC_stack.c
(in the code
sub-directory) we compile it with gcc (we could also do it with another compiler):
gcc -W -g -o code/read_POMC_stack code/read_POMC_stack.c -lhdf5 -lhdf5_hl -lm -std=gnu11
We then call our code and redirect the output to a file called stack.txt
. Row 23 is the first row of interest while row 34 is the last of interest. Column 33 is the first of interest while column 43 is the last one (don’t forget we start counting at 0 and we are using Python
convention):
./code/read_POMC_stack 23 35 33 44 > stack.txt
We can look at the first lines of stack.txt
with the head
program:
head stack.txt
# Data set stack from file: Data_POMC.hdf5 # Data set dimensions: (60,80,168) # Using rows from 23 (inclusive) to 35 (exclusive) # Using columns from 33 (inclusive) to 44 (exclusive) # Minimal ADU in this range: 261; maximal value: 1118 0 0.0536756 0.00595238 0.0793466 0.0119048 0.0548425 0.0178571 0.0700117 0.0238095 0.0723454
Our figure is then simply generated with the following gnuplot
commands:
unset key
unset tics
set xlabel ""
set ylabel ""
plot 'stack.txt' using 1:2 with lines\
linecolor rgb "black"
We get the second figure that shows the ADU time series of pixel (27,39) by first dumping the data to a file:
./code/read_POMC_stack 27 28 39 40 > sgl_pixel.txt
Then, within gnuplot
:
set xlabel "Time (s)"
set ylabel "ADU count"
set grid
set tics
unset key
plot 'sgl_pixel.txt' using (525+25*$1):(776+$2*(937-776)) \
with lines linewidth 2 linecolor rgb "blue"
We are going to use the =CFITSIO= library that should therefore be installed in order to compile the following code.
/** @file var_stab_fits.c
* @brief Stabilizes the variance given a CCD gain and read-out
* variance of data contained in a FITS file.
*
* The code is an adaptation of imarith from the set of
* exemples of the CFITSIO library.
* @author C Pouzat <[email protected]>
*/
#include <string.h>
#include <stdio.h>
#include <math.h>
#include "fitsio.h"
int main(int argc, char *argv[])
{
if (argc != 5) {
printf("Usage: var_stab_fits infile gain s2 outfile \n");
printf("\n");
printf("Stablizes at 1 the variance of infile that is:\n"
"each pixel value v is replaced by 2*sqrt(v/g+s2)\n"
"where g is the gain and s2 the read-out variance\n"
"of the CCD chip.\n");
printf("The variance stabilized version is"
"stored in outfile.\n\n");
return(0);
}
fitsfile *infptr; /* FITS file pointers */
int status = 0; /* CFITSIO status value MUST be initialized to zero! */
fits_open_file(&infptr, argv[1], READONLY, &status); /* open input images */
if (status) {
/* print error message */
fits_report_error(stderr, status);
return(status);
} else {
printf("File %s opened.\n",argv[1]);
}
int innaxis;
/* read dimensions */
fits_get_img_dim(infptr, &innaxis, &status);
long innaxes[3] = {1,1,1};
fits_get_img_size(infptr, 3, innaxes, &status);
if (status) {
/* print error message */
fits_report_error(stderr, status);
return(status);
}
int check=1;
if (innaxis > 3) {
fprintf(stderr,"Error: images with > 3"
"dimensions are not supported\n");
check = 0;
}
double gain = atof(argv[2]);
if (gain <= 0) {
fprintf(stderr,"The CCD gain must be > 0\n");
check = 0;
}
printf("The CCD chip gain is: %g\n",gain);
double s2 = atof(argv[3]);
if (s2 < 0) {
fprintf(stderr,"The CCD read-out variance must be >= 0\n");
check = 0;
}
printf("The CCD chip read-out variance is: %g\n",s2);
fitsfile *outfptr; /* FITS file pointers */
/* create the new empty output file if the above checks are OK */
if (check && !fits_create_file(&outfptr, argv[4], &status) )
{
printf("File %s opened.\n",argv[4]);
/* copy all the header keywords from first image to new output file */
fits_copy_header(infptr, outfptr, &status);
size_t npixels = innaxes[0]; /* no. of pixels to read in each row */
/* mem for 1 row */
double *inpix = malloc(npixels * sizeof(double));
if (inpix == NULL) {
fprintf(stderr,"Memory allocation error\n");
return(1);
}
/* loop over all planes of the cube (2D images have 1 plane) */
long firstpix[3] = {1,1,1};
for (firstpix[2] = 1; firstpix[2] <= innaxes[2]; firstpix[2]++) {
/* loop over all rows of the plane */
for (firstpix[1] = 1; firstpix[1] <= innaxes[1]; firstpix[1]++) {
/* Read both images as doubles, regardless of
actual datatype. Give starting pixel
coordinate and no. of pixels to read. This
version does not support undefined pixels
in the image. */
if (fits_read_pix(infptr, TDOUBLE, firstpix,
npixels, NULL, inpix, NULL, &status))
break; /* jump out of loop on error */
for(size_t ii=0; ii < npixels; ii++)
inpix[ii] = 2*sqrt(inpix[ii]/gain+s2);
/* write new values to output image */
fits_write_pix(outfptr, TDOUBLE, firstpix, npixels,
inpix, &status);
}
} /* end of loop over planes */
fits_close_file(outfptr, &status);
printf("File %s closed.\n",argv[4]);
free(inpix);
}
fits_close_file(infptr, &status);
printf("File %s closed.\n",argv[1]);
/* print any error message */
if (status) fits_report_error(stderr, status);
return(status);
}
If this code has been written in a file named var_stab_fits.c
the
compilation is done with:
gcc -Wall -g -o var_stab_fits -lcfitsio -lm var_stab_fits.c
And the function is called with:
./var_stab_fits Data_POMC.fits 0.14 290 Data_POMC_stab.fits
File Data_POMC.fits opened. The CCD chip gain is: 0.14 The CCD chip read-out variance is: 290 File Data_POMC_stab.fits opened. File Data_POMC_stab.fits closed. File Data_POMC.fits closed.
We write a short C
code that takes:
F_inf
- an asymptotic value whose default is 800.
Delta
- a jump size whose default is 150.
beta
- an inverse time constant whose default is 0.1.
t_onset
- an onset time whose default is 5.
t_0
- the time at which the observation starts (default of 0).
delta_t
- the interval between successive observations (default of 0.15).
n_obs
- the number of observations (default of 168).
The program returns a three columns text file containing:
- first column
- the observation times.
- second column
- the corresponding deterministic part.
- third column
- the observations (
deterministic part + noise
).
Here the deterministic part is given by:
\begin{equation}\label{eq:deterministic-mono-exp}
\mathtt{F}(t) = \mathtt{F\_\mathtt{inf}} + H(t-\mathtt{t\_\mathtt{onset}}) \mathtt{Delta} exp \left(-\mathtt{beta} (t-\mathtt{t\_\mathtt{onset}})\right) \, ,
\end{equation}
where t_0
+ i x delta_t
and we have:
\begin{equation}\label{eq:deterministic-mono-exp-discrete}
\mathtt{F}_i = \mathtt{F\_\mathtt{inf}} + H(\mathtt{t\_0}+i \, \mathtt{delta\_t}-\mathtt{t\_\mathtt{onset}}) \mathtt{Delta} exp \left(-\mathtt{beta} (\mathtt{t\_0}+i \, \mathtt{delta\_t}-\mathtt{t\_\mathtt{onset}})\right) \quad \mathrm{for} \quad i=0,1,\ldots,\mathtt{n\_\mathtt{obs}}-1 \, .
\end{equation}
The observations are generated by drawing a random number from a Poisson distribution whose parameter is the deterministic part at the time considered:
\begin{equation}\label{eq:obs-mono-exp-discrete}
\mathtt{Y}_i ∼ \mathtt{Pois}(\mathtt{F}_i) \, .
\end{equation}
Compared with our previous program, read_POMC_stack
, we have to do more efforts at the argument reading stage and we have to do more “interesting” computations since we have to draw random numbers, compute exponential, etc. All this technical part will be carried out by calling functions from the GSL (GNU Scientific Library
). Its Reference manual is clearly written and contains many examples that are the best way to learn how to use the library’s functions.
We now write a program sim_mono_exp
(that we store in a file called sim_mono_exp.c
in the code
sub-directory) fulfilling the requirements of the previous section. The code skeleton is:
<<sim_mono_exp-include>>
int main(int argc, char *argv[])
{
<<sim_mono_exp-usage>>
<<sim_mono_exp-read-args>>
<<sim_mono_exp-setup-rng>>
<<sim_mono_exp-simulate-and-print-results>>
<<sim_mono_exp-free-rng>>
return 0;
}
Every program that does some (half) decent command line argument reading, like sim_mono_exp
, will have a usage
output printed to the standard output (stdout
) with a short description of what the program arguments are and how to use them. This string is here defined in code block <<sim_mono_exp-usage>>
.
Code block <<sim_mono_exp-include>>
contains the include directives, getopt
is required for reading the command line arguments, math
for the exp
function and two gsl
directives are required for the random number generation functions.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <getopt.h>
#include <math.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>
Code block <<sim_mono_exp-usage>>
declares string usage
(making use of the string literal concatenation). That is what gets printed to the stdout
when something “wrong” is entered by the user:
static char usage[] = \
"usage: %s [--F_inf=real] [--Delta=real] [--beta=real] [--t_onset=real] ...\n" \
" ... [--t_0=real] [--delta_t=real] [--n_obs=int]\n\n"\
" --F_inf <positive real>: asymptotic value (default 800)\n"\
" --Delta <positive real>: signal jump amplitude (default 150)\n"\
" --beta <positive real>: inverse time constant (default 0.1)\n"\
" --t_onset <real>: the stimulus onset time (default 5)\n" \
" --t_0 <real>: observations start time (default 0)\n"\
" --delta_t <positive real>: sampling period (default 0.15)\n"\
" --n_obs <positive integer>: number of observations (default 168)\n\n";
Code block <<sim_mono_exp-read-args>>
reads the command line arguments, checks that they have acceptable values and initializes the following variables: F_inf
, Delta
, beta
, t_onset
, t_0
, delta_t
, n_obs
. Here the =getop_long= function is heavily used. This is the longest code block of the program…
double F_inf = 800;
double Delta = 150;
double beta = 0.1;
double t_onset = 5;
double t_0 = 0;
double delta_t = 0.15;
size_t n_obs = 168;
{int opt;
static struct option long_options[] = {
{"F_inf",optional_argument,NULL,'f'},
{"Delta",optional_argument,NULL,'d'},
{"beta",optional_argument,NULL,'b'},
{"t_onset",optional_argument,NULL,'o'},
{"t_0",optional_argument,NULL,'s'},
{"delta_t",optional_argument,NULL,'t'},
{"n_obs",optional_argument,NULL,'n'},
{"help",no_argument,NULL,'h'},
{NULL,0,NULL,0}
};
int long_index =0;
while ((opt = getopt_long(argc,argv,
"hf:d:b:o:s:t:n",
long_options,
&long_index)) != -1) {
switch(opt) {
case 'f':
{
F_inf = (double) atof(optarg);
if (F_inf <= 0)
{
fprintf(stderr,"The asymptotic value should be > 0.\n");
return -1;
}
}
break;
case 'd':
{
Delta = (double) atof(optarg);
if (Delta <= 0)
{
fprintf(stderr,"Delta should be > 0.\n");
return -1;
}
}
break;
case 'b':
{
beta = (double) atof(optarg);
if (beta <= 0)
{
fprintf(stderr,"beta should be > 0.\n");
return -1;
}
}
break;
case 'o':
{
t_onset = (double) atof(optarg);
}
break;
case 's':
{
t_0 = (double) atof(optarg);
}
break;
case 't':
{
delta_t = (double) atof(optarg);
if (delta_t <= 0)
{
fprintf(stderr,"delta_t should be > 0.\n");
return -1;
}
}
break;
case 'n':
{
n_obs = (size_t) atoi(optarg);
if (n_obs <= 0)
{
fprintf(stderr,"n_obs should be > 0.\n");
return -1;
}
}
break;
case 'h': printf(usage,argv[0]);
return -1;
default : fprintf(stderr,usage,argv[0]);
return -1;
}
}
}
Code block <<sim_mono_exp-setup-rng>>
sets up the a random number generator
named r
:
gsl_rng * r;
gsl_rng_env_setup();
r = gsl_rng_alloc(gsl_rng_default);
Code block <<sim_mono_exp-free-rng>>
frees the memory occupied by the rng r
:
gsl_rng_free(r);
Code block <<sim_mono_exp-simulate-and-print-results>>
computes the deterministic part (variable ideal
in the code) and generates the observations by calling gsl_ran_poisson
before writing the results to the stdout
:
for (size_t i=0; i<n_obs; i++) {
double t = t_0 + i*delta_t;
double ideal = F_inf;
if (t >= t_onset) ideal += Delta*exp(-beta*(t-t_onset));
double obs = gsl_ran_poisson(r, ideal);
printf("%g %g %g\n",t,ideal,obs);
}
Once the code has been properly “tangled” and stored in a file named sim_mono_exp.c
(in the code
sub-directory) we compile it with gcc (we could also do it with another compiler):
gcc -W -g -o code/sim_mono_exp code/sim_mono_exp.c -lgsl -lgslcblas -lm -std=gnu11
We can now check that our new program prints some hint when a wrong parameter value is given at the command line:
./code/sim_mono_exp --F_inf=-10
The asymptotic value should be > 0.
We can also get help with:
./code/sim_mono_exp --help
usage: ./code/sim_mono_exp [--F_inf=real] [--Delta=real] [--beta=real] [--t_onset=real] ... ... [--t_0=real] [--delta_t=real] [--n_obs=int] --F_inf <positive real>: asymptotic value (default 800) --Delta <positive real>: signal jump amplitude (default 150) --beta <positive real>: inverse time constant (default 0.1) --t_onset <real>: the stimulus onset time (default 5) --t_0 <real>: observations start time (default 0) --delta_t <positive real>: sampling period (default 0.15) --n_obs <positive integer>: number of observations (default 168)
We get a simulation using parameters parameter values leading results similar to the previous courses with:
./code/sim_mono_exp --F_inf=100 --Delta=900 --beta=1 --n_obs=51 \
--t_onset=0 --t_0=0 --delta_t=0.1 > sim_mono.txt
By default, the random number generator used by the GSL
and therefore by our sim_mono_exp
code is the Mersenne-Twister
of Matsumoto and Nishimura (version of 2002) and the default seed is zero (if you have no clue about random number generators, check the chapter “Random Number Generation” by Pierre L’Ecuyer in the Handbook of Computational Statistics). We can change both of these features, setting the seed at 123 and using the “taus2” generator of L’Ecuyer by setting environment variables like this:
GSL_RNG_SEED=123 GSL_RNG_TYPE=taus2 ./code/sim_mono_exp --F_inf=100 \
--Delta=900 --beta=1 --n_obs=51 --t_onset=0 --t_0=0 \
--delta_t=0.1 > sim_mono2.txt
We get a plot of the simulated data together with the deterministic curve and the two selected points with the command sequence that follows. We are making use of the substitution of system commands in backquotes capability of gnuplot
(see the section with that name on p. 42 of the User manual). In order to show our two chosen points–whose time and amplitude coordinates are located on lines 37 and 102 of the file we just generated, sim_mono.txt
–we extract the first column on line 37 to get the time of the first point with t1 = "`sed -n '37,37p' sim_mono.txt | awk '{print $1}'`"
(the =sed= command extracts the line and pipes the result to the =awk= command that extracts the first column) and the third column of the same line to get the amplitude (we repeat this two commands sequences to get the coordinates of the second point). The second data set that gets plotted is again obtained by piping the result of the execution of a shell command: "<sed -n '37,37p' sim_mono.txt"
. Since gnuplot
has limited capabilities for manipulating data files, we are using general POSIX programs like sed
and awk
to do it.
set grid
set xlabel "Time (s)"
set ylabel "Observations"
unset key
t1 = "`sed -n '4,4p' sim_mono.txt | awk '{print $1}'`"
y1 = "`sed -n '4,4p' sim_mono.txt | awk '{print $3}'`"
t2 = "`sed -n '31,31p' sim_mono.txt | awk '{print $1}'`"
y2 = "`sed -n '31,31p' sim_mono.txt | awk '{print $3}'`"
set arrow 1 from t1,0 to t1,y1 nohead dashtype '.' linecolor rgb "black" linewidth 1
set label 1 "t1" at t1+0.1,25
set arrow 2 from 0,y1 to t1,y1 nohead dashtype '.' linecolor rgb "black" linewidth 1
set label 2 "y1" at 0.1,y1-25
set arrow 3 from t2,0 to t2,y2 nohead dashtype '.' linecolor rgb "black" linewidth 1
set label 3 "t2" at t2+0.1,25
set arrow 4 from 0,y2 to t2,y2 nohead dashtype '.' linecolor rgb "black" linewidth 1
set label 4 "y2" at 0.1,y2+25
plot [0:5][0:1100] 'sim_mono.txt' using 1:3 with points pointtype 6 linecolor rgb "black",\
"" using 1:2 with lines linecolor rgb "red" linewidth 2,\
"<sed -n '4,4p' sim_mono.txt" using 1:3 with points pointtype 7 linecolor rgb "blue",\
"<sed -n '31,31p' sim_mono.txt" using 1:3 with points pointtype 7 linecolor rgb "blue"
We explore here in a simple setting (only one model parameter to estimate) the consequences at the inference level of a wrong noise model. So we start from the “simple” mono-exponential decay of the previous section, assume that only the inverse time constant, beta
, is unknown and that only two observations, like the two points of the previous figure, are available. We then consider two estimators, first the classical residual sum of square (RSS
) minimizer:
\begin{equation}\label{eq:classical-RSS}
˜{β} ≡ arg min_{β} ∑_{i=1}^2\left(y_i -\mathtt{F\_\mathtt{inf}} - H(t_i-\mathtt{t\_\mathtt{onset}})\, \mathtt{Delta}\, exp \left(-β (t_i-\mathtt{t\_\mathtt{onset}})\right) \right)^2
\end{equation}
and the minimizer of a residual sum computed from the square root of the observations and the predictions:
\begin{equation}\label{eq:sqrt-RSS}
\hat{β} ≡ arg min_{β} ∑_{i=1}^2\left(\sqrt{y_i} -\sqrt{\mathtt{F\_\mathtt{inf}} + H(t_i-\mathtt{t\_\mathtt{onset}})\, \mathtt{Delta}\, exp \left(-β (t_i-\mathtt{t\_\mathtt{onset}})\right)} \right)^2 \, .
\end{equation}
We want to study the sampling distribution of the two estimators. To that end we are going to simulate a large number of data sets
Minimizing a residual sum of squares can be done with linear or, depending on the problem, non-linear regression routines. The predictors in the two cases above (Eq. \ref{eq:classical-RSS} and \ref{eq:sqrt-RSS}) are non-linear functions of the parameter of interest (gsl
manual. We will moreover start with a single case (the one we just simulated) before embarking in the full scale simulation study. We can then just adapt the first example of the gsl
manual to suit our needs.
The functions that the non-linear regression routines of the gsl
minimize must return a vector of residuals not the sum of squared residuals. They must take three formal parameters:
- x
- a (pointer to a gsl) vector of model parameters (in the case of Eq. \ref{eq:classical-RSS} and \ref{eq:sqrt-RSS}, this vector contains a single element, the parameter $β$).
- params
- a pointer to an adequate structure (in general) containing everything fixed (or known) and required to compute the residual (in the case of Eq. \ref{eq:classical-RSS} and \ref{eq:sqrt-RSS} it should contain $t_1,t_2,y_1,y_2,\mathtt{F\_\mathtt{inf}},\mathtt{t\_\mathtt{onset}},\mathtt{Delta}$).
- f
- a (pointer to a gsl) vector that will contain the residuals.
In addition to the required function returning the residuals, we can provide a function returning the “Jacobian” matrix with has many rows as there are observations and as many column as there are parameters to estimate. Element gsl
routines using finite differences.
We will therefore construct our new program following the lines of the previous one =sim_mono_exp=, essentially changing the end part of if in order to estimate beta
instead of writing the simulated data to the standard output. We therefore define a program expl_beta_est
(that we store in a file called expl_beta_est.c
) whose somewhat “heavy” skeleton is:
<<sim_mono_exp-include>>
<<expl_beta_est-extra-include>>
<<expl_beta_est-print-result-macro>>
<<expl_beta_est-data-structure>>
<<expl_beta_est-tilde_f>>
<<expl_beta_est-tilde_df>>
<<expl_beta_est-hat_f>>
<<expl_beta_est-hat_df>>
<<expl_beta_est-callback>>
int main(int argc, char *argv[])
{
<<sim_mono_exp-usage>>
<<sim_mono_exp-read-args>>
<<sim_mono_exp-setup-rng>>
<<expl_beta_est-simulate>>
<<expl_beta_est-set-tolerance>>
<<expl_beta_est-fdf-tilde>>
<<expl_beta_est-setup-tilde-solver>>
<<expl_beta_est-inital-cost-tilde>>
<<expl_beta_est-get-beta-tilde>>
PRINT(tilde)
<<expl_beta_est-fdf-hat>>
<<expl_beta_est-setup-hat-solver>>
<<expl_beta_est-inital-cost-hat>>
<<expl_beta_est-get-beta-hat>>
PRINT(hat)
<<sim_mono_exp-free-rng>>
<<expl_beta_est-free-solver>>
<<expl_beta_est-free-covar>>
return 0;
}
We first recycle with <<sim_mono_exp-include>>
the header of our previous program and we extend it with <<expl_beta_est-extra-include>>
in order to have the signatures of the non-linear fitting routines as well as gsl_vector
and gsl_matrix
types and functions definitions:
#include <gsl/gsl_matrix.h>
#include <gsl/gsl_vector.h>
#include <gsl/gsl_blas.h>
#include <gsl/gsl_multifit_nlinear.h>
We define next a PRINT
macro in code block <<expl_beta_est-print-result-macro>>
. This is to save code writing since it allows us to write “code that generates code” since text substitution of macros is performed by the preprocessor. Calling following macro with what
set to tilde
will result in a third line having w_tilde
where there is w_##what
in the macro definition.
#define PRINT(what) { \
fprintf(stderr, "summary from method '%s/%s'\n", \
gsl_multifit_nlinear_name(w_##what), \
gsl_multifit_nlinear_trs_name(w_##what)); \
fprintf(stderr, "number of iterations: %zu\n", \
gsl_multifit_nlinear_niter(w_##what)); \
fprintf(stderr, "function evaluations: %zu\n", fdf_##what.nevalf); \
fprintf(stderr, "Jacobian evaluations: %zu\n", fdf_##what.nevaldf); \
fprintf(stderr, "reason for stopping: %s\n", \
(info_##what == 1) ? "small step size" : "small gradient"); \
fprintf(stderr, "initial |f(x)| = %f\n", sqrt(chisq0_##what)); \
fprintf(stderr, "final |f(x)| = %f\n", sqrt(chisq_##what)); \
fprintf (stderr, "beta = %.5f +/- %.5f\n", \
gsl_vector_get(w_##what->x, 0), \
1.96*sqrt(gsl_matrix_get(covar_##what,0,0))); \
fprintf (stderr, "status = %s\n", gsl_strerror (status_##what)); \
}
Code block <<expl_beta_est-data-structure>>
contains the definition of a structure, names data
that will hold what is fixed and required to compute the residuals.
struct data {
size_t n; // number of observations
double * t; // pointer to an array (of size n) holding t_1 and t_2
double * y; // pointer to an array (of size n) holding y_1 and y_2
double F_inf,Delta,t_onset; // the known model parameters
};
Code block <<expl_beta_est-tilde_f>>
contains the definition of the function returning the residuals corresponding to Eq. \ref{eq:classical-RSS}.
int
tilde_f (const gsl_vector * x, void *data,
gsl_vector * f)
{
size_t n = ((struct data *)data)->n;
double *y = ((struct data *)data)->y;
double *t = ((struct data *)data)->t;
double F_inf = ((struct data *)data)->F_inf;
double Delta = ((struct data *)data)->Delta;
double t_onset = ((struct data *)data)->t_onset;
double beta = gsl_vector_get (x, 0);
for (size_t i = 0; i < n; i++) {
/* Model Yi = F_inf + H(t-t_onset) * Delta * exp(-beta * t) */
double Yi = F_inf;
double delta_t = t[i]-t_onset;
if (delta_t >= 0.)
Yi += Delta * exp (-beta * delta_t);
gsl_vector_set (f, i, Yi - y[i]);
}
return GSL_SUCCESS;
}
Code block <<expl_beta_est-tilde_df>>
contains the definition of the function returning the Jacobian corresponding to Eq. \ref{eq:classical-RSS}.
int
tilde_df (const gsl_vector * x, void *data,
gsl_matrix * J)
{
size_t n = ((struct data *)data)->n;
double *y = ((struct data *)data)->y;
double *t = ((struct data *)data)->t;
double F_inf = ((struct data *)data)->F_inf;
double Delta = ((struct data *)data)->Delta;
double t_onset = ((struct data *)data)->t_onset;
double beta = gsl_vector_get (x, 0);
for (size_t i = 0; i < n; i++) {
/* Jacobian matrix J(i,j) = dfi / dxj, */
/* where fi = (Yi - yi)/sigma[i], */
/* Yi = F_inf + H(t-t_onset) * Delta * exp(-beta * t) */
/* and xj is the model parameter beta */
double delta_t = t[i] - t_onset;
if (delta_t >= 0) {
double e = exp(-beta * delta_t);
gsl_matrix_set (J, i, 0, -delta_t * Delta * e);
} else {
gsl_matrix_set (J, i, 0, 0);
}
}
return GSL_SUCCESS;
}
Code block <<expl_beta_est-hat_f>>
contains the definition of the function returning the residuals corresponding to Eq. \ref{eq:sqrt-RSS}.
int
hat_f (const gsl_vector * x, void *data,
gsl_vector * f)
{
size_t n = ((struct data *)data)->n;
double *y = ((struct data *)data)->y;
double *t = ((struct data *)data)->t;
double F_inf = ((struct data *)data)->F_inf;
double Delta = ((struct data *)data)->Delta;
double t_onset = ((struct data *)data)->t_onset;
double beta = gsl_vector_get (x, 0);
for (size_t i = 0; i < n; i++) {
/* Model Yi = sqrt(F_inf + H(t-t_onset) * Delta * exp(-beta * t)) */
double Yi = F_inf;
double delta_t = t[i]-t_onset;
if (delta_t >= 0)
Yi += Delta * exp (-beta * delta_t);
gsl_vector_set (f, i, sqrt(Yi) - sqrt(y[i]));
}
return GSL_SUCCESS;
}
Code block <<expl_beta_est-hat_df>>
contains the definition of the function returning the Jacobian corresponding to Eq. \ref{eq:sqrt-RSS}.
int
hat_df (const gsl_vector * x, void *data,
gsl_matrix * J)
{
size_t n = ((struct data *)data)->n;
double *y = ((struct data *)data)->y;
double *t = ((struct data *)data)->t;
double F_inf = ((struct data *)data)->F_inf;
double Delta = ((struct data *)data)->Delta;
double t_onset = ((struct data *)data)->t_onset;
double beta = gsl_vector_get (x, 0);
for (size_t i = 0; i < n; i++) {
/* Jacobian matrix J(i,j) = dfi / dxj, */
/* where fi = (Yi - yi)/sigma[i], */
/* Yi = F_inf + H(t-t_onset) * Delta * exp(-beta * t) */
/* and xj is the model parameter beta */
double delta_t = t[i] - t_onset;
if (delta_t >= 0) {
double e = exp(-beta * delta_t);
double Yi = F_inf+Delta*e;
gsl_matrix_set (J, i, 0, -0.5 * delta_t * Delta * e / sqrt(Yi));
} else {
gsl_matrix_set (J, i, 0, 0);
}
}
return GSL_SUCCESS;
}
We optionally pass to gsl
routine doing the actual optimization =gsl_multifit_nlinear_driver= a “callback” function that is called after each iteration of the solver and that can be use to print information on the progress of the procedure. This is what code black <<expl_beta_est-callback>>
does here:
void
callback(const size_t iter, void *params,
const gsl_multifit_nlinear_workspace *w)
{
gsl_vector *f = gsl_multifit_nlinear_residual(w);
gsl_vector *x = gsl_multifit_nlinear_position(w);
fprintf(stderr, "iter %2zu: beta = %.4f, |f(x)| = %.4f\n",
iter,
gsl_vector_get(x, 0),
gsl_blas_dnrm2(f));
}
Code block <<expl_beta_est-simulate>>
simulates the data after declaring vectors y
and t
to hold them. A data
structure, called d
, is then declared and initialized by designated initialization. A vector (of length 1) holding the initial guess (set to the value used to simulate the data, that is the true value) is declared and initialized and a gsl_vector_view
associated to it is declared and initialized (this is required for using the fitting routines):
double y[n_obs], t[n_obs];
for (size_t i=0; i<n_obs; i++) {
t[i] = t_0 + i*delta_t;
double ideal = F_inf;
if (t[i] >= t_onset) ideal += Delta*exp(-beta*(t[i]-t_onset));
y[i] = gsl_ran_poisson(r, ideal);
}
struct data d = { .n = n_obs, .t = t, .y = y,
.F_inf = F_inf, .Delta = Delta,
.t_onset = t_onset};
double x_init[1] = { beta }; /* starting values */
gsl_vector_view x = gsl_vector_view_array (x_init, 1);
Code block <<expl_beta_est-set-tolerance>>
contains the declaration and initialization of the tolerances used to stop the iterative fitting procedure. Check the manual to see what they mean (you should really check that).
const double xtol = 1e-8;
const double gtol = 1e-8;
const double ftol = 0.0;
5.2.2.12 <<expl_beta_est-setup-tilde-solver>>
: solver memory allocation and initialization for the first estimator
const gsl_multifit_nlinear_type *T_tilde = gsl_multifit_nlinear_trust;
gsl_multifit_nlinear_parameters fdf_params_tilde =
gsl_multifit_nlinear_default_parameters();
/* allocate workspace with default parameters */
gsl_multifit_nlinear_workspace *w_tilde = gsl_multifit_nlinear_alloc (T_tilde,
&fdf_params_tilde,
n_obs, 1);
/* initialize solver with starting point and weights */
gsl_multifit_nlinear_init (&x.vector, &fdf_tilde, w_tilde);
5.2.2.13 <<expl_beta_est-setup-hat-solver>>
: solver memory allocation and initialization for the second estimator
const gsl_multifit_nlinear_type *T_hat = gsl_multifit_nlinear_trust;
gsl_multifit_nlinear_parameters fdf_params_hat =
gsl_multifit_nlinear_default_parameters();
/* allocate workspace with default parameters */
gsl_multifit_nlinear_workspace *w_hat = gsl_multifit_nlinear_alloc (T_hat,
&fdf_params_hat,
n_obs, 1);
/* initialize solver with starting point and weights */
gsl_multifit_nlinear_init (&x.vector, &fdf_hat, w_hat);
gsl_multifit_nlinear_free (w_tilde);
gsl_multifit_nlinear_free (w_hat);
gsl_multifit_nlinear_fdf fdf_tilde;
fdf_tilde.f = tilde_f;
fdf_tilde.df = tilde_df; /* set to NULL for finite-difference Jacobian */
fdf_tilde.fvv = NULL; /* not using geodesic acceleration */
fdf_tilde.n = n_obs;
fdf_tilde.p = 1;
fdf_tilde.params = &d;
gsl_multifit_nlinear_fdf fdf_hat;
fdf_hat.f = hat_f;
fdf_hat.df = hat_df; /* set to NULL for finite-difference Jacobian */
fdf_hat.fvv = NULL; /* not using geodesic acceleration */
fdf_hat.n = n_obs;
fdf_hat.p = 1;
fdf_hat.params = &d;
We get the initial residual and the initial RSS
value for the first estimator:
gsl_vector *f_tilde = gsl_multifit_nlinear_residual(w_tilde);
double chisq0_tilde;
gsl_blas_ddot(f_tilde, f_tilde, &chisq0_tilde);
We get the initial residual and the initial RSS
value for the second estimator:
gsl_vector *f_hat = gsl_multifit_nlinear_residual(w_hat);
double chisq0_hat;
gsl_blas_ddot(f_hat, f_hat, &chisq0_hat);
We perform the least squares optimization, get the standard error and the final RSS
value for the first estimator:
/* solve the system with a maximum of 20 iterations */
int info_tilde;
int status_tilde = gsl_multifit_nlinear_driver(20, xtol, gtol, ftol,
callback, NULL, &info_tilde,
w_tilde);
/* compute covariance of best fit parameters */
gsl_matrix *J_tilde = gsl_multifit_nlinear_jac(w_tilde);
gsl_matrix *covar_tilde = gsl_matrix_alloc (1, 1);
gsl_multifit_nlinear_covar (J_tilde, 0.0, covar_tilde);
/* compute final cost */
double chisq_tilde;
gsl_blas_ddot(f_tilde, f_tilde, &chisq_tilde);
We perform the least squares optimization, get the standard error and the final RSS
value for the second estimator:
/* solve the system with a maximum of 20 iterations */
int info_hat;
int status_hat = gsl_multifit_nlinear_driver(20, xtol, gtol, ftol,
callback, NULL, &info_hat,
w_hat);
/* compute covariance of best fit parameters */
gsl_matrix *J_hat = gsl_multifit_nlinear_jac(w_hat);
gsl_matrix *covar_hat = gsl_matrix_alloc (1, 1);
gsl_multifit_nlinear_covar (J_hat, 0.0, covar_hat);
/* compute final cost */
double chisq_hat;
gsl_blas_ddot(f_hat, f_hat, &chisq_hat);
gsl_matrix_free (covar_tilde);
gsl_matrix_free (covar_hat);
Once the code has been properly “tangled” and stored in a file named expl_beta_est.c
(in the code
sub-directory) we compile it with gcc (we could also do it with another compiler):
gcc -W -g -o code/expl_beta_est code/expl_beta_est.c -lgsl -lgslcblas -lm -std=gnu11
We can run the code using a data set made of two observations made at the same times as the two points of the previous figure with:
./code/expl_beta_est --F_inf=100 --Delta=900 --beta=1 --n_obs=2 \
--t_onset=0 --t_0=0.3 --delta_t=2.5
iter 0: beta = 1.0000, |f(x)| = 10.9021 iter 1: beta = 0.9957, |f(x)| = 10.8452 iter 2: beta = 0.9955, |f(x)| = 10.8451 iter 3: beta = 0.9955, |f(x)| = 10.8451 iter 4: beta = 0.9955, |f(x)| = 10.8451 iter 5: beta = 0.9955, |f(x)| = 10.8451 iter 6: beta = 0.9955, |f(x)| = 10.8451 summary from method 'trust-region/levenberg-marquardt' number of iterations: 6 function evaluations: 10 Jacobian evaluations: 7 reason for stopping: small step size initial |f(x)| = 10.902140 final |f(x)| = 10.845086 beta = 0.99547 +/- 0.00774 status = success iter 0: beta = 1.0000, |f(x)| = 0.3816 iter 1: beta = 0.9630, |f(x)| = 0.2707 iter 2: beta = 0.9630, |f(x)| = 0.2707 iter 3: beta = 0.9629, |f(x)| = 0.2707 iter 4: beta = 0.9629, |f(x)| = 0.2707 iter 5: beta = 0.9629, |f(x)| = 0.2707 summary from method 'trust-region/levenberg-marquardt' number of iterations: 5 function evaluations: 6 Jacobian evaluations: 6 reason for stopping: small step size initial |f(x)| = 0.381617 final |f(x)| = 0.270744 beta = 0.96295 +/- 0.25700 status = success
Since our code makes several allocations (in the heap) it is a could idea to check that we did not forget to free of this allocated memory space when we quit the program. This is easily done with valgrind as illustrated next:
valgrind ./code/expl_beta_est --F_inf=100 --Delta=900 --beta=1 --n_obs=2 \
--t_onset=0 --t_0=0.3 --delta_t=2.5
==4656== Memcheck, a memory error detector ==4656== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al. ==4656== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info ==4656== Command: ./code/expl_beta_est --F_inf=100 --Delta=900 --beta=1 --n_obs=2 --t_onset=0 --t_0=0.3 --delta_t=2.5 ==4656== iter 0: beta = 1.0000, |f(x)| = 10.9021 iter 1: beta = 0.9957, |f(x)| = 10.8452 iter 2: beta = 0.9955, |f(x)| = 10.8451 iter 3: beta = 0.9955, |f(x)| = 10.8451 iter 4: beta = 0.9955, |f(x)| = 10.8451 iter 5: beta = 0.9955, |f(x)| = 10.8451 iter 6: beta = 0.9955, |f(x)| = 10.8451 summary from method 'trust-region/levenberg-marquardt' number of iterations: 6 function evaluations: 10 Jacobian evaluations: 7 reason for stopping: small step size initial |f(x)| = 10.902140 final |f(x)| = 10.845086 beta = 0.99547 +/- 0.00774 status = success iter 0: beta = 1.0000, |f(x)| = 0.3816 iter 1: beta = 0.9630, |f(x)| = 0.2707 iter 2: beta = 0.9630, |f(x)| = 0.2707 iter 3: beta = 0.9629, |f(x)| = 0.2707 iter 4: beta = 0.9629, |f(x)| = 0.2707 iter 5: beta = 0.9629, |f(x)| = 0.2707 summary from method 'trust-region/levenberg-marquardt' number of iterations: 5 function evaluations: 6 Jacobian evaluations: 6 reason for stopping: small step size initial |f(x)| = 0.381617 final |f(x)| = 0.270744 beta = 0.96295 +/- 0.25700 status = success ==4656== ==4656== HEAP SUMMARY: ==4656== in use at exit: 0 bytes in 0 blocks ==4656== total heap usage: 186 allocs, 186 frees, 10,136 bytes allocated ==4656== ==4656== All heap blocks were freed -- no leaks are possible ==4656== ==4656== For counts of detected and suppressed errors, rerun with: -v ==4656== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
We can now code a systematic simulation where a sample is generated nrep
times according to the model specified by Eq. \ref{eq:deterministic-mono-exp-discrete} and \ref{eq:obs-mono-exp-discrete}, the two estimators
The first major difference with the previous code is the loop (appearing explicitly) through which the different bootstrap replicates are generated and fitted. Array have to be allocated and later freed for storing the estimates <<beta_samp_dist-beta-range>>
and <<beta_samp_dist-write-hist>>
generates the histograms.
<<beta_samp_dist-include>>
<<expl_beta_est-data-structure>>
<<expl_beta_est-tilde_f>>
<<expl_beta_est-tilde_df>>
<<expl_beta_est-hat_f>>
<<expl_beta_est-hat_df>>
int main(int argc, char *argv[])
{
<<beta_samp_dist-usage>>
<<beta_samp_dist-args>>
<<sim_mono_exp-setup-rng>>
<<expl_beta_est-set-tolerance>>
<<beta_samp_dist-fdf-tilde>>
<<beta_samp_dist-setup-tilde-solver>>
<<beta_samp_dist-fdf-hat>>
<<beta_samp_dist-setup-hat-solver>>
<<beta_samp_dist-allocate-beta-vectors>>
for (size_t rep_idx=0; rep_idx<n_rep; rep_idx++) {
<<expl_beta_est-simulate>>
<<beta_samp_dist-get-beta-tilde>>
<<beta_samp_dist-get-beta-hat>>
<<beta_samp_dist-store-results>>
}
<<beta_samp_dist-beta-range>>
<<beta_samp_dist-allocate-hist>>
<<beta_samp_dist-fill-hist>>
<<beta_samp_dist-write-hist>>
<<beta_samp_dist-free-beta-vectors>>
<<beta_samp_dist-free-hist>>
<<sim_mono_exp-free-rng>>
<<expl_beta_est-free-solver>>
return 0;
}
We need to had #include <gsl/gsl_histogram.h>
compared to our last header so that we get access to the gsl
functions and structures dealing with histograms:
<<sim_mono_exp-include>>
<<expl_beta_est-extra-include>>
#include <gsl/gsl_histogram.h>
Our new code takes three additional formal parameters: n_rep
, n_bin
and max_iter
:
static char usage[] = \
"usage: %s [--F_inf=real] [--Delta=real] [--beta=real] [--t_onset=real] ...\n" \
" ... [--t_0=real] [--delta_t=real] [--n_obs=int] [--n_rep=int] ...\n"\
" ... [--n_bin=int] [--max_iter=int]\n\n"\
" --F_inf <positive real>: asymptotic value (default 800)\n"\
" --Delta <positive real>: signal jump amplitude (default 150)\n"\
" --beta <positive real>: inverse time constant (default 0.1)\n"\
" --t_onset <real>: the stimulus onset time (default 5)\n" \
" --t_0 <real>: observations start time (default 0)\n"\
" --delta_t <positive real>: sampling period (default 0.15)\n"\
" --n_obs <positive integer>: number of observations (default 168)\n"
" --n_rep <positive integer>: number of Monte Carlo replicates (default 10000)\n"
" --n_bin <positive integer>: number of bins for the histograms (default 50)\n"
" --max_iter <positive integer>: maximal number of optimization iterations (default 500)\n\n"
" The program write to the stdout two gsl histograms containing the (estimated) sampling\n"
" distribution of the two beta estimators separated by two blank lines\n\n";
The formal parameters reading part is modified in order to deal with the three new parameters:
double F_inf = 800;
double Delta = 150;
double beta = 0.1;
double t_onset = 5;
double t_0 = 0;
double delta_t = 0.15;
size_t n_obs = 168;
size_t n_rep = 10000;
size_t n_bin = 50;
size_t max_iter = 500;
{int opt;
static struct option long_options[] = {
{"F_inf",optional_argument,NULL,'f'},
{"Delta",optional_argument,NULL,'d'},
{"beta",optional_argument,NULL,'b'},
{"t_onset",optional_argument,NULL,'o'},
{"t_0",optional_argument,NULL,'s'},
{"delta_t",optional_argument,NULL,'t'},
{"n_obs",optional_argument,NULL,'n'},
{"n_rep",optional_argument,NULL,'r'},
{"n_bin",optional_argument,NULL,'e'},
{"max_iter",optional_argument,NULL,'m'},
{"help",no_argument,NULL,'h'},
{NULL,0,NULL,0}
};
int long_index =0;
while ((opt = getopt_long(argc,argv,
"hf:d:b:o:s:t:n:r:e:m:",
long_options,
&long_index)) != -1) {
switch(opt) {
case 'f':
{
F_inf = (double) atof(optarg);
if (F_inf <= 0)
{
fprintf(stderr,"The asymptotic value should be > 0.\n");
return -1;
}
}
break;
case 'd':
{
Delta = (double) atof(optarg);
if (Delta <= 0)
{
fprintf(stderr,"Delta should be > 0.\n");
return -1;
}
}
break;
case 'b':
{
beta = (double) atof(optarg);
if (beta <= 0)
{
fprintf(stderr,"beta should be > 0.\n");
return -1;
}
}
break;
case 'o':
{
t_onset = (double) atof(optarg);
}
break;
case 's':
{
t_0 = (double) atof(optarg);
}
break;
case 't':
{
delta_t = (double) atof(optarg);
if (delta_t <= 0)
{
fprintf(stderr,"delta_t should be > 0.\n");
return -1;
}
}
break;
case 'n':
{
n_obs = (size_t) atoi(optarg);
if (n_obs <= 0)
{
fprintf(stderr,"n_obs should be > 0.\n");
return -1;
}
}
break;
case 'r':
{
n_rep = (size_t) atoi(optarg);
if (n_rep <= 0)
{
fprintf(stderr,"n_rep should be > 0.\n");
return -1;
}
}
break;
case 'e':
{
n_bin = (size_t) atoi(optarg);
if (n_bin <= 0)
{
fprintf(stderr,"n_bin should be > 0.\n");
return -1;
}
}
break;
case 'm':
{
max_iter = (size_t) atoi(optarg);
if (max_iter <= 0)
{
fprintf(stderr,"max_iter should be > 0.\n");
return -1;
}
}
break;
case 'h': printf(usage,argv[0]);
return -1;
default : fprintf(stderr,usage,argv[0]);
return -1;
}
}
}
The code block is almost identical to <<expl_beta_est-fdf-tilde>>
, the only difference is the last assignment of the latter, fdf_tilde.params = &d;
, that occurs now within the loop since we want to fit new simulated data at each iteration.
gsl_multifit_nlinear_fdf fdf_tilde;
fdf_tilde.f = tilde_f;
fdf_tilde.df = tilde_df; /* set to NULL for finite-difference Jacobian */
fdf_tilde.fvv = NULL; /* not using geodesic acceleration */
fdf_tilde.n = n_obs;
fdf_tilde.p = 1;
The corresponding comment to the one of the previous code block applies here also.
gsl_multifit_nlinear_fdf fdf_hat;
fdf_hat.f = hat_f;
fdf_hat.df = hat_df; /* set to NULL for finite-difference Jacobian */
fdf_hat.fvv = NULL; /* not using geodesic acceleration */
fdf_hat.n = n_obs;
fdf_hat.p = 1;
Again, a code block almost identical to <<expl_beta_est-setup-tilde-solver>>
. The last statement has been removed since it must be renewed at each iteration:
const gsl_multifit_nlinear_type *T_tilde = gsl_multifit_nlinear_trust;
gsl_multifit_nlinear_parameters fdf_params_tilde =
gsl_multifit_nlinear_default_parameters();
/* allocate workspace with default parameters */
gsl_multifit_nlinear_workspace *w_tilde = gsl_multifit_nlinear_alloc (T_tilde,
&fdf_params_tilde,
n_obs, 1);
See previous point.
const gsl_multifit_nlinear_type *T_hat = gsl_multifit_nlinear_trust;
gsl_multifit_nlinear_parameters fdf_params_hat =
gsl_multifit_nlinear_default_parameters();
/* allocate workspace with default parameters */
gsl_multifit_nlinear_workspace *w_hat = gsl_multifit_nlinear_alloc (T_hat,
&fdf_params_hat,
n_obs, 1);
We just allocate heap space to store the
double *beta_tilde = malloc(n_rep*sizeof(double));
double *beta_hat = malloc(n_rep*sizeof(double));
We free the heap memory taken by our beta
arrays:
free(beta_tilde);
free(beta_hat);
This is the “workhorse” of the program for the first estimator. The new data are “assigned” to the fdf_tilde
structure, the w_tilde
solver is reinitialized and the optimization is performed. This time the maximal number of iterations is a user controlled parameter and no callback
function is called (we are going to generate
fdf_tilde.params = &d;
/* initialize solver with starting point and weights */
gsl_multifit_nlinear_init (&x.vector, &fdf_tilde, w_tilde);
/* solve the system with a maximum of max_iter iterations */
int info_tilde;
int status_tilde = gsl_multifit_nlinear_driver(max_iter, xtol, gtol, ftol,
NULL, NULL, &info_tilde,
w_tilde);
See the previous point.
fdf_hat.params = &d;
/* initialize solver with starting point and weights */
gsl_multifit_nlinear_init (&x.vector, &fdf_hat, w_hat);
/* solve the system with a maximum of max_iter iterations */
int info_hat;
int status_hat = gsl_multifit_nlinear_driver(max_iter, xtol, gtol, ftol,
NULL, NULL, &info_hat,
w_hat);
For each replicate (that is, each iteration of the loop), we check that convergence has been achieved and we store the estimated beta
at the proper place:
if (info_tilde == 0) {
fprintf(stderr,"On replicate %d the beta_tilde estimator did not converge.\n",rep_idx);
beta_tilde[rep_idx] = NAN;
} else {
beta_tilde[rep_idx] = gsl_vector_get(w_tilde->x, 0);
}
if (info_hat == 0) {
fprintf(stderr,"On replicate %d the beta_hat estimator did not converge.\n",rep_idx);
beta_hat[rep_idx] = NAN;
} else {
beta_hat[rep_idx] = gsl_vector_get(w_hat->x, 0);
}
We get the range of the estimated beta
in order to define properly the limits of our histograms:
gsl_vector_view beta_tilde_v = gsl_vector_view_array (beta_tilde, n_rep);
double beta_max, beta_min;
gsl_vector_minmax(&beta_tilde_v.vector,&beta_min,&beta_max);
gsl_vector_view beta_hat_v = gsl_vector_view_array (beta_hat, n_rep);
if (gsl_vector_max(&beta_hat_v.vector) > beta_max)
beta_max = gsl_vector_max(&beta_hat_v.vector);
if (gsl_vector_min(&beta_hat_v.vector) < beta_min)
beta_min = gsl_vector_min(&beta_hat_v.vector);
We allocate two histograms covering the same range with the same number of bins (controlled by the user):
gsl_histogram * h_tilde = gsl_histogram_alloc (n_bin);
gsl_histogram_set_ranges_uniform (h_tilde, beta_min, beta_max);
gsl_histogram * h_hat = gsl_histogram_alloc (n_bin);
gsl_histogram_set_ranges_uniform (h_hat, beta_min, beta_max);
The histograms are freed once we are done with them:
gsl_histogram_free (h_tilde);
gsl_histogram_free (h_hat);
The histograms are “updated” for each estimated beta
. Once all the beta
have been considered, the histograms are scaled in order to yield a proper density estimation:
for (size_t rep_idx=0; rep_idx<n_rep; rep_idx++) {
gsl_histogram_increment (h_tilde, beta_tilde[rep_idx]);
gsl_histogram_increment (h_hat, beta_hat[rep_idx]);
}
gsl_histogram_scale (h_tilde,(double)n_bin/(n_rep*(beta_max-beta_min)));
gsl_histogram_scale (h_hat,(double) n_bin/(n_rep*(beta_max-beta_min)));
Results are written to the standard output with the histogram containing the (estimated) sampling distribution of the first estimator first and, after two blank lines as required by gnuplot
, the (estimated) sampling distribution of the second estimator:
gsl_histogram_fprintf(stdout,h_tilde,"%g","%g");
fprintf(stdout,"\n\n");
gsl_histogram_fprintf(stdout,h_hat,"%g","%g");
Once the code has been properly “tangled” and stored in a file named beta_samp_dist.c
(in the code
sub-directory) we compile it with gcc (we could also do it with another compiler):
gcc -W -g -o code/beta_samp_dist code/beta_samp_dist.c -lgsl -lgslcblas -lm -std=gnu11
We can now run our computation on 100,000 replicates checking also the execution time with the POSIX time
function (this run-time is measured on a ThinkPad L530 with an Intel i3 CPU at 2.5 GHz):
time ./code/beta_samp_dist --F_inf=100 \
--Delta=900 --beta=1 --t_onset=0 \
--t_0=0.3 --delta_t=2.7 --n_obs=2\
--n_rep=100000 --n_bin=100 > beta_samp_dist.txt
real 0m1,360s user 0m1,360s sys 0m0,000s
We can make a quick valgrind check with:
valgrind ./code/beta_samp_dist --F_inf=100 --Delta=900 --beta=1\
--t_onset=0 --t_0=0.3 --delta_t=2.7 --n_obs=2 --n_rep=100\
--n_bin=100 > dump
==5216== Memcheck, a memory error detector ==5216== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al. ==5216== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info ==5216== Command: ./code/beta_samp_dist --F_inf=100 --Delta=900 --beta=1 --t_onset=0 --t_0=0.3 --delta_t=2.7 --n_obs=2 --n_rep=100 --n_bin=100 ==5216== ==5216== ==5216== HEAP SUMMARY: ==5216== in use at exit: 0 bytes in 0 blocks ==5216== total heap usage: 167 allocs, 167 frees, 18,488 bytes allocated ==5216== ==5216== All heap blocks were freed -- no leaks are possible ==5216== ==5216== For counts of detected and suppressed errors, rerun with: -v ==5216== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
We now use gnuplot as follows:
t1 = 0.3
t2 = 3
F(t) = 100 + 900*exp(-t)
dF(t) = -t*900*exp(-t)
var0 = (dF(t1)**2*F(t1)+dF(t2)**2*F(t2))/(dF(t1)**2+dF(t2)**2)**2
var1 = 1.0/(dF(t1)**2/F(t1)+dF(t2)**2/F(t2))
gogo0(x) = exp(-0.5*(x-1.0)**2/var0)/sqrt(2*pi*var0)
gogo1(x) = exp(-0.5*(x-1.0)**2/var1)/sqrt(2*pi*var1)
set xlabel "β"
set ylabel "Densities"
set key
plot gogo0(x) with lines linecolor rgb "black" \
linewidth 2 title "asymp. β tilde",\
gogo1(x) with lines linecolor rgb "grey" \
linewidth 2 title "asymp. β hat",\
"beta_samp_dist.txt" index 0 using 1:3 with steps \
linewidth 2 linecolor rgb "blue" title "β tilde",\
"" index 1 using 1:3 with steps linewidth 2\
linecolor rgb "orange" title "β hat"
We write now a program that returns the Cumulative Distribution Function (CDF) of a scaled Poisson random variable:
\begin{equation}\label{eq:scaled-Poisson}
Y_n = \frac{X_n-n}{\sqrt{n}}\, ,
\end{equation}
where stdout
a two columns text file where the first column contains the number numbers gnuplot
. The only “tricky” part of the code will be the user supplied parameters reading; we will allow users to pass the series of Poisson distribution parameters they want to explore like –n=5,50,500,5000 that is, several comma separated values can be given.
The skeleton of our program looks like:
<<scaled_poisson-include>>
int main(int argc, char *argv[])
{
<<scaled_poisson-usage>>
<<scaled_poisson-args>>
<<scaled_poisson-print-header>>
for (size_t i=0; i<nb_n; i++) {
// Loop over the Poisson parameters
<<scaled_poisson-get-and-print-CDF>>
}
<<scaled_poisson-free-n_seq>>
return 0;
}
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <getopt.h>
#include <math.h>
#include <gsl/gsl_randist.h>
#include <gsl/gsl_cdf.h>
static char usage[] = \
"usage: %s [--n=int,int,int,...]\n\n"\
" --n <positive integers>: Poisson parameters separated by ','\n\n"
" The program writes to the stdout the CDF of a scaled Poisson random variable\n"
" that is, if X_n is Poisson with parameter n, the first column of the output\n"
" contains integers from 0 to n and the second column contains the corresponding\n"
" value of the CFD of Y_n = (X_n-n)/sqrt(n)\n\n";
Code block <<scaled_poisson-args>>
assigned to variable nb_n
the number of Poisson distributions the user wants to consider, it allocates the heap memory required to store the nb_n
parameters and makes this memory accessible through the pointer n_seq
. The allocated memory must be freed before the program returns.
int *n_seq;
size_t nb_n=0;
{int opt;
static struct option long_options[] = {
{"n",required_argument,NULL,'n'},
{"help",no_argument,NULL,'h'},
{NULL,0,NULL,0}
};
int long_index =0;
while ((opt = getopt_long(argc,argv,
"hn:",
long_options,
&long_index)) != -1) {
switch(opt) {
case 'n':
{
char *start = strdup(optarg); // duplicate optarg content
char *running;
running = start;
char *token = strsep(&running, ",");
while (token != NULL) {
token = strsep (&running, ","); // split d_name at each ","
nb_n++;
}
free(start);
n_seq = malloc(nb_n*sizeof(int));
start = strdup(optarg); // duplicate optarg content again
running = start;
// Get the parameter of each Poisson distribution
for (size_t i=0; i<nb_n; i++) {
token = strsep (&running, ",");
int para = atoi(token);
if (para <= 0) {
fprintf(stderr,"A Poisson parameter should be > 0.\n");
free(start);
free(n_seq);
return -1;
} else {
n_seq[i] = para;
}
}
free(start);
}
break;
case 'h': printf(usage,argv[0]);
return -1;
default : fprintf(stderr,usage,argv[0]);
return -1;
}
}
}
free(n_seq);
printf("# %d Scaled Poisson distributions "
"with parameters:\n# %d",(int) nb_n, n_seq[0]);
if (nb_n > 1) {
for (size_t i=1; i<nb_n; i++)
printf(", %d",n_seq[i]);
}
printf("\n#\n#\n");
int n = n_seq[i];
double sn = sqrt(n);
double inv_sn = 1.0/sn;
unsigned int from = n-4*sn > 0 ? n-4*sn : 0;
unsigned int to = n+5*sn;
printf("# Scaled Poisson CDF with parameter: %d\n",n);
for (unsigned int k=from; k <= to; k++) {
printf("%g %g\n", ((double) k-n)*inv_sn, gsl_cdf_poisson_P(k,n));
}
printf("\n\n");
Once the code has been properly “tangled” and stored in a file named scaled_poisson.c
(in the code
sub-directory) we compile it with gcc (we could also do it with another compiler):
gcc -W -g -o code/scaled_poisson code/scaled_poisson.c -lgsl -lgslcblas -lm -std=gnu11
We test our code with a single parameter (5) and we don’t save the result in a file:
./code/scaled_poisson --n=5
# 1 Scaled Poisson distributions with parameters: # 5 # # # Scaled Poisson CDF with parameter: 5 -2.23607 0.00673795 -1.78885 0.0404277 -1.34164 0.124652 -0.894427 0.265026 -0.447214 0.440493 0 0.615961 0.447214 0.762183 0.894427 0.866628 1.34164 0.931906 1.78885 0.968172 2.23607 0.986305 2.68328 0.994547 3.1305 0.997981 3.57771 0.999302 4.02492 0.999774 4.47214 0.999931 4.91935 0.99998
We can use valgrind on it:
valgrind ./code/scaled_poisson --n=5 > dump
==4495== Memcheck, a memory error detector ==4495== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al. ==4495== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info ==4495== Command: ./code/scaled_poisson --n=5 ==4495== ==4495== ==4495== HEAP SUMMARY: ==4495== in use at exit: 0 bytes in 0 blocks ==4495== total heap usage: 4 allocs, 4 frees, 4,104 bytes allocated ==4495== ==4495== All heap blocks were freed -- no leaks are possible ==4495== ==4495== For counts of detected and suppressed errors, rerun with: -v ==4495== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Now we get the 4 values of interest and store the results into file scaled_poisson_cdf.txt
:
./code/scaled_poisson --n=5,50,500,5000 > scaled_poisson_cdf.txt
We define a general code block <<scaled_poisson_cdf_gnuplot>>
that contains an unspecified variable idx
that we can set (to 0,1,2,3) to generate the plots with the different parameters of the scaled Poisson distributions:
set xlabel "Z"
set ylabel "CDF"
unset key
plot [-3:3][0:1] norm(x) with lines linecolor rgb "red" linewidth 2,\
"scaled_poisson_cdf.txt" index idx using 1:2 with steps\
linecolor rgb "black" linewidth 2
The calibration data are located at the following address: http://xtof.disque.math.cnrs.fr/data/CCD_calibration.hdf5
.
They can be downloaded with:
wget http://xtof.disque.math.cnrs.fr/data/CCD_calibration.hdf5
We get a quick outline of the file content with the h5ls
program:
h5ls -r CCD_calibration.hdf5
/ Group /100ms Group /100ms/stack Dataset {60, 80, 100} /100ms/time Dataset {100} /10ms Group /10ms/stack Dataset {60, 80, 100} /10ms/time Dataset {100} /20ms Group /20ms/stack Dataset {60, 80, 100} /20ms/time Dataset {100} /30ms Group /30ms/stack Dataset {60, 80, 100} /30ms/time Dataset {100} /40ms Group /40ms/stack Dataset {60, 80, 100} /40ms/time Dataset {100} /50ms Group /50ms/stack Dataset {60, 80, 100} /50ms/time Dataset {100} /60ms Group /60ms/stack Dataset {60, 80, 100} /60ms/time Dataset {100} /70ms Group /70ms/stack Dataset {60, 80, 100} /70ms/time Dataset {100} /80ms Group /80ms/stack Dataset {60, 80, 100} /80ms/time Dataset {100} /90ms Group /90ms/stack Dataset {60, 80, 100} /90ms/time Dataset {100}
We see that the file contains 10 Groups (10ms, 20ms,…,100ms) and that each group contains 2 DataSets “stack” and “time”. The “stack” data set is a cube (3D object 60x80x100) containing the data. We can get details by looking at the “README” attribute of the file with (result not shown):
h5dump -N "README" CCD_calibration.hdf5
We now want to write a short C
code that takes a group name (like 10ms
) an image index in the stack and that prints to the stdout
the matrix (the image or exposure) containing the data so that it can be used by gnuplot
.
<<print-exposure-include>>
int main(int argc, char *argv[])
{
<<print-exposure-usage>>
<<print-exposure-args>>
<<print-exposure-open-FILE-and-read-DSET>>
<<print-exposure-print-results>>
<<print-exposure-close-file-free-data>>
return 0;
}
#include <stdio.h>
#include <stdlib.h>
#include <hdf5.h>
#include <hdf5_hl.h>
#include <string.h>
#include <getopt.h>
static char usage[] = \
"usage: %s [--group=string] [--frame=idx] [--file=string]\n\n"\
" --group <character string>: the name of the group (eg, 10ms the default)\n"
" --frame <positive integer>: the frame index (from 0 to 99, default at 0)\n"
" --file <character string>: data file name (default CCD_calibration.hdf5)\n\n"
" The program opens 'file', gets to dataset 'group/stack' and prints to\n"
" the stdout the exposure whose index is given by frame.\n\n";
This code block is probably more sophisticated than necessary given what we want to do with this program. It assigns the frame index to the frame
variable, the complete dataset name to dset
and makes filename
point to a string containing the filename. It moreover checks that the Group name is among the known group names.
size_t frame = 0;
char default_file_name[] = "CCD_calibration.hdf5";
char *filename;
char dset[25];
dset[0] = '/';
dset[1] = '\0';
{int opt;
size_t n_groups = 10;
char *groups[] = {"10ms","20ms","30ms","40ms","50ms",
"60ms","70ms","80ms","90ms","100ms"};
char *groupname;
static struct option long_options[] = {
{"group",optional_argument,NULL,'g'},
{"file",optional_argument,NULL,'f'},
{"frame",optional_argument,NULL,'i'},
{"help",no_argument,NULL,'h'},
{NULL,0,NULL,0}
};
int long_index =0;
int file_name_set = 0, group_name_set = 0;
while ((opt = getopt_long(argc,argv,
"hf:g:i:",
long_options,
&long_index)) != -1) {
switch(opt) {
case 'i':
{
frame = (size_t) atoi(optarg);
if (frame > 99)
{
fprintf(stderr,"frame should between 0 and 99 inclusive.\n");
return -1;
}
}
break;
case 'f':
{
filename = optarg;
file_name_set = 1;
}
break;
case 'g':
{
groupname = optarg;
size_t j;
for (j=0; j<n_groups; j++)
if (strcmp(groupname,groups[j])==0)
break;
if (j == n_groups) {
fprintf(stderr,"Unknown group\n");
return -1;
}
group_name_set = 1;
}
break;
case 'h': printf(usage,argv[0]);
return -1;
default : fprintf(stderr,usage,argv[0]);
return -1;
}
}
if (group_name_set == 0) groupname = groups[0];
if (file_name_set == 0) filename = default_file_name;
strcat(dset,groupname);
strcat(dset,"/stack");
}
In this case we are using the more general way of allocating space to store the data. We use an explicit malloc
with int *data = malloc(nrow*ncol*nsamp*sizeof(int));
that allocates memory in the heap (that we must free later on) instead of the int data[nrow][ncol][nsamp];
that we used in our first HDF5 reading code (that allocates memory in the stack, meaning we do not have to free it ourselves).
// Open FILE
hid_t file_id = H5Fopen (filename, H5F_ACC_RDONLY, H5P_DEFAULT);
// Get dimensions of 3D object contained in DSET
hsize_t dims[3];
H5LTget_dataset_info(file_id,dset,dims,NULL,NULL);
size_t nrow = (size_t) dims[0];
size_t ncol = (size_t) dims[1];
size_t nsamp = (size_t) dims[2];
// Read dset in data
int *data = malloc(nrow*ncol*nsamp*sizeof(int));
H5LTread_dataset_int(file_id,dset,data);
Here we close the HDF5 file and we free the memory space taken up by data
.
free(data);
H5Fclose (file_id);
The only tricky part here is due to the way we allocated the memory space taken by data
, we cannot use data[i][j][frame]
as we would have done previously, we must use data[(i*ncol+j)*nsamp+frame]
instead:
for (size_t i=0; i<nrow; i++) {
for (size_t j=0; j<ncol; j++)
printf("%d ",data[(i*ncol+j)*nsamp+frame]);
printf("\n");
}
We compile the code with:
gcc -W -g -o code/print-exposure code/print-exposure.c -lhdf5 -lhdf5_hl -lm -std=gnu11
We use the code with:
./code/print-exposure > toto
We use valgrind on it with:
valgrind ./code/print-exposure > toto
set palette grey
set cblabel "ADU"
set xrange [0:79]
set yrange [0:59]
set view map
splot [0:79][0:59] "<./code/print-exposure" matrix with image
If we just want to plot a few ADU sequences, we can make use of the options of h5dump
that allow us to print to the stdout
part of an HDF5 file, like here if we want to look at the ADU sequence of row 31, column 41 (starting counting at 0) for the 10ms Group:
h5dump -d "/10ms/stack[31,41,0;;1,1,100;]" -A 0 CCD_calibration.hdf5
HDF5 "CCD_calibration.hdf5" { DATASET "/10ms/stack" { DATATYPE H5T_STD_I32LE DATASPACE SIMPLE { ( 60, 80, 100 ) / ( 60, 80, 100 ) } SUBSET { START ( 31, 41, 0 ); STRIDE ( 1, 1, 1 ); COUNT ( 1, 1, 100 ); BLOCK ( 1, 1, 1 ); DATA { (31,41,0): 419, 423, 422, 411, 412, 412, 407, 423, 424, 427, 419, 409, (31,41,12): 409, 410, 413, 420, 412, 440, 413, 416, 413, 416, 408, 417, (31,41,24): 410, 418, 405, 420, 415, 424, 417, 422, 405, 397, 412, 413, (31,41,36): 409, 408, 412, 421, 420, 416, 400, 425, 408, 405, 408, 419, (31,41,48): 405, 406, 416, 418, 413, 412, 421, 418, 422, 414, 426, 414, (31,41,60): 403, 418, 406, 414, 395, 400, 416, 411, 408, 414, 412, 425, (31,41,72): 415, 418, 414, 415, 401, 409, 421, 412, 415, 417, 424, 426, (31,41,84): 422, 412, 411, 420, 413, 431, 420, 426, 411, 416, 408, 398, (31,41,96): 411, 412, 417, 413 } } } }
Now we are just interested in the data part so we extract everything that comes of a regular expression looking like the following pattern: (number,number,number):
using the sed
program (this command sequence was built interactively from the shell):
h5dump -d "/10ms/stack[31,41,0;;1,1,100;]" -A 0 CCD_calibration.hdf5 | \
grep "([0-9][0-9],[0-9][0-9],\(0\|[0-9][0-9]\)):" | \
sed -e "s/[[:space:]]*([0-9][0-9],[0-9][0-9],\(0\|[0-9][0-9]\)):[[:space:]]*//g" |\
sed -e "s/,$//g" | sed -e "s/,[[:space:]]*/\n/g"
419 423 422 411 412 412 407 423 424 427 419 409 409 410 413 420 412 440 413 416 413 416 408 417 410 418 405 420 415 424 417 422 405 397 412 413 409 408 412 421 420 416 400 425 408 405 408 419 405 406 416 418 413 412 421 418 422 414 426 414 403 418 406 414 395 400 416 411 408 414 412 425 415 418 414 415 401 409 421 412 415 417 424 426 422 412 411 420 413 431 420 426 411 416 408 398 411 412 417 413
We can then write a short shell script taking parameters and dumping the ADU sequence we are interested in:
#!/bin/bash
# Usage: adu_sequence row column
# Dumps the corresponding adu sequence
h5dump -d "/10ms/stack[$1,$2,0;;1,1,100;]" -A 0 CCD_calibration.hdf5 | \
grep "([0-9][0-9],[0-9][0-9],\(0\|[0-9][0-9]\)):" | \
sed -e "s/[[:space:]]*([0-9][0-9],[0-9][0-9],\(0\|[0-9][0-9]\)):[[:space:]]*//g" |\
sed -e "s/,$//g" | sed -e "s/,[[:space:]]*/\n/g"
We have to change the mode of our script to be able to use it with:
chmod u+x code/adu_sequence.sh
We can then use our code with (result not shown):
./code/adu_sequence.sh 31 41
We can now make our figure using the =multiplot= feature of gnuplot
:
unset key
set grid
set multiplot layout 3,1 margins 0.1,0.9,0.1,0.95 spacing 0,0
set xtics 20 format ""
set ylabel "ADU"
set xlabel ""
plot [1:100][390:440] "<./code/adu_sequence.sh 31 41" using 0:1\
with lines linewidth 2 linecolor rgb "blue"
set xtics 20 format ""
set ylabel "ADU"
set xlabel ""
plot [1:100][390:440] "<./code/adu_sequence.sh 31 40" using 0:1\
with lines linewidth 2 linecolor rgb "blue"
set format x "%g"
set xlabel "Frame index"
set ylabel "ADU"
plot [1:100][390:440] "<./code/adu_sequence.sh 31 39" using 0:1\
with lines linewidth 2 linecolor rgb "blue"
unset grid
unset multiplot
We now fit a straight line to a single ADU sequence like the ones that are dumped to the stdout
by our adu_sequence.sh
function. We will write a program doing this job, adu_sequence_fit
. This program will read the data from the stdin
and will print to the sdtout
a gnuplot
compatible dataset with fitting results as comments followed by four columns containing the frame index, the fitted ADU value, its 95% lower bound and its 95% upper bound.
<<adu_sequence_fit-include>>
int main(int argc, char *argv[])
{
<<adu_sequence_fit-usage>>
<<adu_sequence_fit-args>>
<<adu_sequence_fit-read-data>>
<<adu_sequence_fit-fit>>
<<adu_sequence_fit-print-results>>
return 0;
}
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <getopt.h>
#include <gsl/gsl_fit.h>
#include <gsl/gsl_cdf.h>
static char usage[] = \
"usage: %s [--nb_frames=integer] [--conf_level=real]\n\n"\
" --nb_frames <positive integer>: the number of frames (default at 100)\n"
" --conf_level <positive real>: confidence level (between 0 and 1, default 0.95)\n\n"
" The program reads data from the stdin and performs a linear fit.\n"
" It prints to the stdout data compatible with gnuplot with the\n"
" fitting results as comments followed by 4 columns:\n"
" - the frame index\n"
" - the fitted ADU\n"
" - the lower bound of the fitted ADU\n"
" - the upper bound of the fitted ADU.\n\n";
This code block initializes two variables: n
(size_t) that contains the number of ADU in the sequence and cl
(a double) that contains the requested confidence level.
size_t n = 100;
double cl = 0.95;
{int opt;
static struct option long_options[] = {
{"nb_frames",optional_argument,NULL,'n'},
{"conf_level",optional_argument,NULL,'c'},
{"help",no_argument,NULL,'h'},
{NULL,0,NULL,0}
};
int long_index =0;
while ((opt = getopt_long(argc,argv,
"hn:c:",
long_options,
&long_index)) != -1) {
switch(opt) {
case 'n':
{
n = (size_t) atoi(optarg);
}
break;
case 'c':
{
cl = (double) atof(optarg);
if (cl <= 0 | cl >= 1)
{
fprintf(stderr,"A confidence interval should be between 0 and 1.\n");
return -1;
}
}
break;
case 'h': printf(usage,argv[0]);
return -1;
default : fprintf(stderr,usage,argv[0]);
return -1;
}
}
}
double adu[n];
{
char input[256];
size_t i=0;
while (fgets(input,sizeof(input),stdin) != NULL & i < n) {
adu[i] = (double) atoi(input);
i++;
}
if (i != n)
fprintf(stderr,"Fewer ADU than planed.\n");
}
double x[n];
for (size_t i=0; i<n; i++)
x[i] = (double) i;
double c0, c1, cov00, cov01, cov11, sumsq;
gsl_fit_linear (x, 1, adu, 1, n, &c0, &c1,
&cov00, &cov01, &cov11,
&sumsq);
printf ("# best fit: ADU = %g + %g F\n", c0, c1);
printf ("# covariance matrix:\n");
printf ("# [ %g, %g\n# %g, %g]\n",
cov00, cov01, cov01, cov11);
printf ("# Residual sum of squares = %g\n", sumsq);
double slope_se = sqrt(cov11)*gsl_cdf_tdist_Pinv(1.0-(1.0-cl)*0.5,(double)n-3.0);
printf ("# A %g confidence interval for the slope is (%g,%g)\n",
cl,c1-slope_se,c1+slope_se);
printf ("#\n");
printf ("# Frame index Fitted value Lower bound Upper bound\n");
double se_f = gsl_cdf_gaussian_Pinv(1.0-(1-cl)*0.5,1.0);
for (size_t i=0; i<n; i++) {
double aduf, aduf_err;
gsl_fit_linear_est(x[i],c0,c1,cov00, cov01,
cov11,&aduf,&aduf_err);
printf("%g %g %g %g\n",
x[i],aduf,aduf-se_f*aduf_err,
aduf+se_f*aduf_err);
}
We compile the code with:
gcc -W -g -o code/adu_sequence_fit code/adu_sequence_fit.c -lgsl -lgslcblas -lm -std=gnu11
We use it piping the output of our shell function with:
./code/adu_sequence.sh 31 40 | ./code/adu_sequence_fit > middle_adu_seq_fit
head middle_adu_seq_fit
# best fit: ADU = 413.134 + 0.0320432 F # covariance matrix: # [ 2.07911, -0.0313433 # -0.0313433, 0.000633199] # Residual sum of squares = 5170.6 # A 0.95 confidence interval for the slope is (-0.0178928,0.0819792) # # Frame index Fitted value Lower bound Upper bound 0 413.134 410.308 415.96 1 413.166 410.382 415.95
We can check in two steps with valgrind as follows:
./code/adu_sequence.sh 31 40 > middle_adu_seq
valgrind cat middle_adu_seq | ./code/adu_sequence_fit > middle_adu_seq_fit
==5596== Memcheck, a memory error detector ==5596== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al. ==5596== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info ==5596== Command: cat middle_adu_seq ==5596== ==5596== ==5596== HEAP SUMMARY: ==5596== in use at exit: 0 bytes in 0 blocks ==5596== total heap usage: 31 allocs, 31 frees, 138,848 bytes allocated ==5596== ==5596== All heap blocks were freed -- no leaks are possible ==5596== ==5596== For counts of detected and suppressed errors, rerun with: -v ==5596== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
set xlabel "Frame index"
set ylabel "ADU"
set grid
unset key
plot [1:100][395:435] "<./code/adu_sequence.sh 31 40" using 0:1\
with lines linewidth 2 linecolor rgb "blue", \
"<./code/adu_sequence.sh 31 40 | ./code/adu_sequence_fit" using 1:2\
with lines linewidth 2 linecolor rgb "red", \
"<./code/adu_sequence.sh 31 40 | ./code/adu_sequence_fit" using 1:3\
with lines linewidth 2 linecolor rgb "grey", \
"<./code/adu_sequence.sh 31 40 | ./code/adu_sequence_fit" using 1:4\
with lines linewidth 2 linecolor rgb "grey"
We now want to fit a straight line to the ADU sequence of each of the 60x80 pixels and to return the ratio of the slope to its standard error. The latter should be the realization of a random variable having a t distribution with 97 degrees of freedom (we have 100 observations in each sequence and we estimate three parameters: the intercept at the origin, the slope and the variance).
<<adu_drift_test-include>>
int main(int argc, char *argv[])
{
<<adu_drift_test-usage>>
<<adu_drift_test-args>>
<<adu_drift_test-read-data>>
<<adu_drift_test-fit>>
<<adu_drift_test-print-results>>
<<adu_drift_test-close-file>>
return 0;
}
This time, we are dealing with the HDF5 file so we need the corresponding headers:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <getopt.h>
#include <gsl/gsl_fit.h>
#include <hdf5.h>
#include <hdf5_hl.h>
static char usage[] = \
"usage: %s [--group=string] [--file=string]\n\n"\
" --group <character string>: the name of the group (eg, 10ms the default)\n"
" --file <character string>: data file name (default CCD_calibration.hdf5)\n\n"
" The program opens 'file', gets to dataset 'group/stack', fits a straight\n"
" line to the ADU sequence of each pixel and prints to the stdout the slope\n"
" divided by its standard error (something that should follow a t distribution\n"
" with 97 degrees of freedom).\n\n";
Only a minor adaptation of <<print-exposure-args>>
is required here (we do not pass the frame
):
size_t frame = 0;
char default_file_name[] = "CCD_calibration.hdf5";
char *filename;
char dset[25];
dset[0] = '/';
dset[1] = '\0';
{int opt;
size_t n_groups = 10;
char *groups[] = {"10ms","20ms","30ms","40ms","50ms",
"60ms","70ms","80ms","90ms","100ms"};
char *groupname;
static struct option long_options[] = {
{"group",optional_argument,NULL,'g'},
{"file",optional_argument,NULL,'f'},
{"help",no_argument,NULL,'h'},
{NULL,0,NULL,0}
};
int long_index =0;
int file_name_set = 0, group_name_set = 0;
while ((opt = getopt_long(argc,argv,
"hf:g:",
long_options,
&long_index)) != -1) {
switch(opt) {
case 'f':
{
filename = optarg;
file_name_set = 1;
}
break;
case 'g':
{
groupname = optarg;
size_t j;
for (j=0; j<n_groups; j++)
if (strcmp(groupname,groups[j])==0)
break;
if (j == n_groups) {
fprintf(stderr,"Unknown group\n");
return -1;
}
group_name_set = 1;
}
break;
case 'h': printf(usage,argv[0]);
return -1;
default : fprintf(stderr,usage,argv[0]);
return -1;
}
}
if (group_name_set == 0) groupname = groups[0];
if (file_name_set == 0) filename = default_file_name;
strcat(dset,groupname);
strcat(dset,"/stack");
}
We can directly reuse <<print-exposure-open-FILE-and-read-DSET>>
here:
<<print-exposure-open-FILE-and-read-DSET>>
Same as above…
<<print-exposure-close-file-free-data>>
double stat[nrow*ncol];
size_t stat_idx=0;
double x[nsamp];
for (size_t i=0; i<nsamp; i++)
x[i] = (double) i;
double c0, c1, cov00, cov01, cov11, sumsq;
for (size_t i=0; i<nrow; i++) {
for (size_t j=0; j<ncol; j++) {
double adu[nsamp];
for (size_t k=0; k<nsamp; k++)
adu[k] = (double) data[(i*ncol+j)*nsamp+k];
gsl_fit_linear (x, 1, adu, 1, nsamp, &c0,
&c1, &cov00, &cov01, &cov11,
&sumsq);
stat[stat_idx] = c1/sqrt(cov11);
stat_idx++;
}
}
for (size_t i=0; i<nrow*ncol; i++)
printf("%g\n",stat[i]);
We compile the code with:
gcc -W -g -o code/adu_drift_test code/adu_drift_test.c -lgsl -lgslcblas -lhdf5 -lhdf5_hl -lm -std=gnu11
We use adu_drift_test
on the first group with:
./code/adu_drift_test --group=10ms > adu_drift_test_10ms
head adu_drift_test_10ms
0.218779 -1.93477 0.124839 -0.641004 0.976495 -1.53843 -0.366836 1.5394 -2.03652 0.0194629
We can use valgrind on it with:
valgrind ./code/adu_drift_test --group=20ms > adu_drift_test_20ms
==7985== Memcheck, a memory error detector ==7985== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al. ==7985== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info ==7985== Command: ./code/adu_drift_test --group=20ms ==7985== ==7985== ==7985== HEAP SUMMARY: ==7985== in use at exit: 0 bytes in 0 blocks ==7985== total heap usage: 6,560 allocs, 6,560 frees, 13,435,720 bytes allocated ==7985== ==7985== All heap blocks were freed -- no leaks are possible ==7985== ==7985== For counts of detected and suppressed errors, rerun with: -v ==7985== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
We are using a loop from the shell
touch drift_for_all
for grp in {10..100..10}ms ; do
./code/adu_drift_test --group=$grp >> drift_for_all
done
It is handy to have a program able to return an histogram if necessary, we now write one adapted from the gsl-histogram
program (that comes with the gsl
library and whose code is given in the histogram examples). The program reads data from the stdin
and prints the resulting histogram to the sdtout
.
<<mk_histogram-include>>
int main(int argc, char *argv[])
{
<<mk_histogram-usage>>
<<mk_histogram-args>>
<<mk_histogram-read-data>>
<<mk_histogram-make-histo>>
<<mk_histogram-print-results>>
<<mk_histogram-free-memory>>
return 0;
}
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <getopt.h>
#include <math.h>
#include <gsl/gsl_vector.h>
#include <gsl/gsl_histogram.h>
static char usage[] = \
"usage: %s --sample-size=integer [--min=real] [--max=real] ...\n"
" ... [--n_bins=integer] [--unormalize]\n\n"
" --sample-size <positive integer>: the sample size\n"
" --min <double>: the left boundary of the histogram (default: min value of the data)\n"
" --max <double>: the right boundary of the histogram (default: max value of the data)\n"
" --n_bins <integer>: the number of bins (default: max-min/sample size/10)\n"
" --unormalize: if set the histogram is not normalized\n\n"
" The program reads data from the stdin builts the histogram and prints it\n"
" to the stdout.\n\n";
size_t sample_size = 0;
size_t n_bins;
double min,max;
int normalize=1,min_set=0,max_set=0,bins_set=0;
{int opt;
static struct option long_options[] = {
{"sample_size",required_argument,NULL,'s'},
{"min",optional_argument,NULL,'d'},
{"max",optional_argument,NULL,'m'},
{"n_bins",optional_argument,NULL,'b'},
{"unormalize",no_argument,NULL,'u'},
{"help",no_argument,NULL,'h'},
{NULL,0,NULL,0}
};
int long_index =0;
while ((opt = getopt_long(argc,argv,
"hub:s:d:m:",
long_options,
&long_index)) != -1) {
switch(opt) {
case 'u':
{
normalize = 0;
}
break;
case 'b':
{
n_bins = (size_t) atoi(optarg);
bins_set = 1;
}
break;
case 's':
{
sample_size = (size_t) atoi(optarg);
}
break;
case 'd':
{
min = (double) atof(optarg);
min_set = 1;
}
break;
case 'm':
{
max = (double) atof(optarg);
max_set = 1;
}
break;
case 'h': printf(usage,argv[0]);
return -1;
default : fprintf(stderr,usage,argv[0]);
return -1;
}
}
if (min_set & max_set & min>max) {
fprintf(stderr,"min > max! Not allowed.\n");
return -1;
}
}
double data[sample_size];
size_t data_idx=0;
double x;
while (fscanf(stdin, "%lg", &x) == 1 & \
data_idx < sample_size) {
data[data_idx] = x;
data_idx++;
}
if (!min_set | !max_set) {
gsl_vector_view data_v = gsl_vector_view_array (data, sample_size);
double xmin,xmax;
gsl_vector_minmax(&data_v.vector,&xmin,&xmax);
if (!min_set)
min = xmin;
if (!max_set)
max = xmax;
}
if (!bins_set)
n_bins = (size_t) ceil((max-min)*sample_size/10);
gsl_histogram *h = gsl_histogram_alloc (n_bins);
gsl_histogram_set_ranges_uniform (h, min, max);
for (size_t i=0; i<sample_size; i++)
gsl_histogram_increment(h, data[i]);
if (normalize)
gsl_histogram_scale (h,(double)n_bins/(sample_size*(max-min)));
gsl_histogram_fprintf (stdout, h, "%g", "%g");
gsl_histogram_free (h);
We compile the code with:
gcc -W -g -o code/mk_histogram code/mk_histogram.c -lgsl -lgslcblas -lm -std=gnu11
We pass the code through valgrind. We get the only required parameter, the number of lines in the file, through a call to wc
:
file_name=drift_for_all
samp_size=`wc -l $file_name | sed -e "s/ $file_name//g"`
valgrind ./code/mk_histogram --sample_size=$samp_size \
< $file_name > dump
==3372== Memcheck, a memory error detector ==3372== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al. ==3372== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info ==3372== Command: ./code/mk_histogram --sample_size=48000 ==3372== ==3372== ==3372== HEAP SUMMARY: ==3372== in use at exit: 0 bytes in 0 blocks ==3372== total heap usage: 5 allocs, 5 frees, 741,088 bytes allocated ==3372== ==3372== All heap blocks were freed -- no leaks are possible ==3372== ==3372== For counts of detected and suppressed errors, rerun with: -v ==3372== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
We make the figure comparing the histogram of scaled slope coefficients with the theoretical gnuplot
a function returning the t-distribution probability density function. The number of bins is not the default one and was chosen after a few trials:
t_pdf(x,nu) = gamma(.5*(nu+1))*(1+x**2/nu)**(-.5*(nu+1))/sqrt(pi*nu)/gamma(.5*nu)
unset key
set grid
set xlabel "Scaled slope coefficient"
set ylabel "Density"
plot [-5:5][0:0.45] t_pdf(x,97) with lines linecolor rgb "orange" linewidth 3,\
"<./code/mk_histogram --sample_size=48000 --n_bins=200 < drift_for_all"\
using 1:3 with steps linecolor rgb "black" linewidth 2
The next global test we apply looks at correlation between the ADU sequences of pairs of pixels that are nearest neighbors.
<<adu_corr_test-include>>
int main(int argc, char *argv[])
{
<<adu_corr_test-usage>>
<<adu_drift_test-args>>
<<adu_drift_test-read-data>>
<<adu_corr_test-corr-and-print>>
<<adu_drift_test-close-file>>
return 0;
}
We are dealing with the HDF5 file so we need the corresponding headers and the gsl_stats_correlation
signature is in header gsl_statistics_double.h
:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <getopt.h>
#include <gsl/gsl_statistics_double.h>
#include <hdf5.h>
#include <hdf5_hl.h>
static char usage[] = \
"usage: %s [--group=string] [--file=string]\n\n"\
" --group <character string>: the name of the group (eg, 10ms the default)\n"
" --file <character string>: data file name (default CCD_calibration.hdf5)\n\n"
" The program opens 'file', gets to dataset 'group/stack', gets the correlation\n"
" coefficient of every pair of nearest neighbor ADU sequences and prints them to\n"
" the stdout.\n"
" Under the null hypothesis these correlation coefficients follow a Gaussian\n"
" distribution centered on 0 with a variance of 1/100.\n\n";
double corr;
for (size_t i=1; i<(nrow-1); i+=2) {
for (size_t j=1; j<(ncol-1); j+=2) {
double adu_ref[nsamp];
for (size_t idx=0; idx<nsamp; idx++)
adu_ref[idx] = (double) data[(i*ncol+j)*nsamp+idx];
double adu_test[nsamp];
for (size_t idx=0; idx<nsamp; idx++)
adu_test[idx] = (double) data[((i-1)*ncol+j)*nsamp+idx];
// look at pixel one row above
corr = gsl_stats_correlation(adu_ref, 1, adu_test, 1, nsamp);
printf("%g\n",corr);
for (size_t idx=0; idx<nsamp; idx++)
adu_test[idx] = (double) data[((i+1)*ncol+j)*nsamp+idx];
// look at pixel one row below
corr = gsl_stats_correlation(adu_ref, 1, adu_test, 1, nsamp);
printf("%g\n",corr);
for (size_t idx=0; idx<nsamp; idx++)
adu_test[idx] = (double) data[(i*ncol+(j-1))*nsamp+idx];
// look at pixel one column to the left
corr = gsl_stats_correlation(adu_ref, 1, adu_test, 1, nsamp);
printf("%g\n",corr);
for (size_t idx=0; idx<nsamp; idx++)
adu_test[idx] = (double) data[(i*ncol+(j+1))*nsamp+idx];
// look at pixel one column to the right
corr = gsl_stats_correlation(adu_ref, 1, adu_test, 1, nsamp);
printf("%g\n",corr);
}
}
We compile the code with:
gcc -W -g -o code/adu_corr_test code/adu_corr_test.c -lgsl -lgslcblas -lhdf5 -lhdf5_hl -lm -std=gnu11
We check the code with valgrind as follows:
valgrind ./code/adu_corr_test --group=10ms > dump
==4243== Memcheck, a memory error detector ==4243== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al. ==4243== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info ==4243== Command: ./code/adu_corr_test --group=10ms ==4243== ==4243== ==4243== HEAP SUMMARY: ==4243== in use at exit: 0 bytes in 0 blocks ==4243== total heap usage: 6,603 allocs, 6,603 frees, 13,987,160 bytes allocated ==4243== ==4243== All heap blocks were freed -- no leaks are possible ==4243== ==4243== For counts of detected and suppressed errors, rerun with: -v ==4243== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
We are using a loop from the shell
touch corr_for_all
for grp in {10..100..10}ms ; do
./code/adu_corr_test --group=$grp >> corr_for_all
done
We make the figure comparing the histogram of correlation coefficients with the theoretical normal density centered on zero with a variance of 1/100 as shown next. One trick is that we have to define within gnuplot
a function returning the normal probability density function. The number of bins is not the default one and was chosen after a few trials:
gogo(x,var) = exp(-0.5*x**2/var)/sqrt(2*pi*var)
unset key
set grid
set xlabel "Correlation coefficient between nearest neighbor pixels"
set ylabel "Density"
plot [-0.5:0.5][0:4.5] gogo(x,0.01) with lines linecolor rgb "orange" linewidth 3,\
"<./code/mk_histogram --sample_size=45240 --n_bins=200 < corr_for_all"\
using 1:3 with steps linecolor rgb "black" linewidth 2
Now that we are reasonably confident that our calibration data are usable, we write a program that returns the mean ADU and the ADU variance for each pixel and each exposure time. We make our program take as optional parameters the CCD gain and
<<adu_mu_s2-include>>
int main(int argc, char *argv[])
{
<<adu_mu_s2-usage>>
<<adu_mu_s2-args>>
<<adu_mu_s2-open-file>>
<<adu_mu_s2-get-mean-variance-and-print>>
<<adu_mu_s2-close-file-free-data>>
return 0;
}
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <getopt.h>
#include <math.h>
#include <hdf5.h>
#include <hdf5_hl.h>
#include <gsl/gsl_statistics_double.h>
static char usage[] = \
"usage: %s [--file=string] [--gain=real] [--variance=real]\n\n"\
" --file <character string>: data file name (default CCD_calibration.hdf5)\n"
" --gain <positive real>: CCD gain (default 1)\n"
" --variance <positive or null real>: CCD gain (default 0)\n\n"
" The program opens 'file', and for each dataset, for each pixel\n"
" computes the mean ADU and the ADU variance. If 'gain' and 'variance'\n"
" are diffferent from 1 and 0 repectively then variance stabilization is\n"
" performed, that is, the program works with 2*sqrt(ADU/gain+variance)\n"
" instead of working with the ADU directly.\n"
" The program prints its results to the stdout with the mean on the first\n"
" column and the variance on the second.\n\n";
size_t frame = 0;
char default_file_name[] = "CCD_calibration.hdf5";
char *filename;
double gain = 1.0;
double read_out_var = 0.0;
{int opt;
static struct option long_options[] = {
{"file",optional_argument,NULL,'f'},
{"gain",optional_argument,NULL,'g'},
{"variance",optional_argument,NULL,'v'},
{"help",no_argument,NULL,'h'},
{NULL,0,NULL,0}
};
int long_index =0;
int file_name_set = 0;
while ((opt = getopt_long(argc,argv,
"hf:g:v:",
long_options,
&long_index)) != -1) {
switch(opt) {
case 'f':
{
filename = optarg;
file_name_set = 1;
}
break;
case 'g':
{
gain = (double) atof(optarg);
if (gain <= 0) {
fprintf(stderr,"gain must be > 0\n");
return -1;
}
}
break;
case 'v':
{
read_out_var = (double) atof(optarg);
if (read_out_var < 0) {
fprintf(stderr,"variance must be >= 0\n");
return -1;
}
}
break;
case 'h': printf(usage,argv[0]);
return -1;
default : fprintf(stderr,usage,argv[0]);
return -1;
}
}
if (file_name_set == 0) filename = default_file_name;
}
// Open FILE
hid_t file_id = H5Fopen (filename, H5F_ACC_RDONLY, H5P_DEFAULT);
// Get dimensions of 3D object contained in DSET
size_t n_groups = 10;
char *groups[] = {"10ms","20ms","30ms","40ms","50ms",
"60ms","70ms","80ms","90ms","100ms"};
char *groupname = groups[0];
char dset[25];
dset[0] = '/';
dset[1] = '\0';
strcat(dset,groupname);
strcat(dset,"/stack");
hsize_t dims[3];
H5LTget_dataset_info(file_id,dset,dims,NULL,NULL);
size_t nrow = (size_t) dims[0];
size_t ncol = (size_t) dims[1];
size_t nsamp = (size_t) dims[2];
// Read dset in data
int *data = malloc(nrow*ncol*nsamp*sizeof(int));
for (size_t g_idx=0; g_idx<n_groups; g_idx++) {
char *groupname = groups[g_idx];
dset[0] = '/';
dset[1] = '\0';
strcat(dset,groupname);
strcat(dset,"/stack");
H5LTread_dataset_int(file_id,dset,data);
for (size_t i=0; i<nrow; i++) {
for (size_t j=0; j<ncol; j++) {
double y[nsamp];
if (gain == 1.0 & read_out_var == 0.0) {
for (size_t k=0; k<nsamp; k++)
// No variance stabilization
y[k] = (double) data[(i*ncol+j)*nsamp+k];
} else {
for (size_t k=0; k<nsamp; k++)
// Do variance stabilization
y[k] = 2*sqrt(((double) data[(i*ncol+j)*nsamp+k])/gain+read_out_var);
}
double mean = gsl_stats_mean(y, 1, nsamp);
double variance = gsl_stats_variance_m(y, 1, nsamp, mean);
printf("%g %g\n",mean,variance);
}
}
}
free(data);
H5Fclose (file_id);
We compile the code with:
gcc -W -g -o code/adu_mu_s2 code/adu_mu_s2.c -lgsl -lgslcblas -lhdf5 -lhdf5_hl -lm -std=gnu11
We test the memory usage of the program with valgrind:
valgrind ./code/adu_mu_s2 > dump
==6471== Memcheck, a memory error detector ==6471== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al. ==6471== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info ==6471== Command: ./code/adu_mu_s2 ==6471== ==6471== ==6471== HEAP SUMMARY: ==6471== in use at exit: 0 bytes in 0 blocks ==6471== total heap usage: 18,263 allocs, 18,263 frees, 114,336,201 bytes allocated ==6471== ==6471== All heap blocks were freed -- no leaks are possible ==6471== ==6471== For counts of detected and suppressed errors, rerun with: -v ==6471== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
set xlabel "Mean(ADU)"
set ylabel "Variance(ADU)"
set grid
unset key
plot "< ./code/adu_mu_s2" using 1:2 with points \
pointtype 0 linecolor rgb "black"
We write a program reading from the stdin
the output of adu_mu_s2
and doing a weighted linear regression on these data.
<<adu_mu_s2_fit-include>>
int main(int argc, char *argv[])
{
<<adu_mu_s2_fit-usage>>
<<adu_mu_s2_fit-args>>
<<adu_mu_s2_fit-read-data>>
<<adu_mu_s2_fit-fit>>
<<adu_mu_s2_fit-print-results>>
return 0;
}
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <getopt.h>
#include <math.h>
#include <gsl/gsl_math.h>
#include <gsl/gsl_fit.h>
#include <gsl/gsl_cdf.h>
static char usage[] = \
"usage: %s --sample_size=integer\n\n"\
" --sample_size <positive integer>: the number of pixels times the number of exposures\n\n"
" The program reads data from the stdin and performs a weighter linear fit.\n"
" It prints to the stdout the results\n\n";
size_t sample_size;
{int opt;
static struct option long_options[] = {
{"sample_size",required_argument,NULL,'s'},
{"help",no_argument,NULL,'h'},
{NULL,0,NULL,0}
};
int long_index =0;
while ((opt = getopt_long(argc,argv,
"hs:",
long_options,
&long_index)) != -1) {
switch(opt) {
case 's':
{
sample_size = (size_t) atoi(optarg);
}
break;
case 'h': printf(usage,argv[0]);
return -1;
default : fprintf(stderr,usage,argv[0]);
return -1;
}
}
}
double mean[sample_size];
double variance[sample_size];
double w[sample_size];
size_t data_idx=0;
double mu,s2;
while (fscanf(stdin, "%lg %lg", &mu, &s2) == 2 & \
data_idx < sample_size) {
mean[data_idx] = mu;
variance[data_idx] = s2;
w[data_idx] = 0.5*99.0/s2/s2;
data_idx++;
}
double c0, c1, cov00, cov01, cov11, chisq;
gsl_fit_wlinear(mean, 1, w, 1, variance, 1, sample_size,
&c0, &c1, &cov00, &cov01, &cov11,
&chisq);
printf ("Best fit: Variance(ADU) = %g + %g Mean(ADU)\n", c0, c1);
printf ("Covariance matrix:\n");
printf ("[ %g, %g\n %g, %g]\n",
cov00, cov01, cov01, cov11);
printf ("Residual sum of squares = %g\n", chisq);
double gain_se = sqrt(cov11)*gsl_cdf_tdist_Pinv(0.975,(double)sample_size-3.0);
printf ("The estimated gain is: %g\n",c1);
printf ("A 0.95 confidence interval for the gain is:\n (%g,%g)\n",
c1-gain_se,c1+gain_se);
s2 = c0/gsl_pow_2(c1);
printf ("The estimated read-out variance is: %g\n",s2);
double s2_se = sqrt(cov00/gsl_pow_2(c1)+cov11*gsl_pow_2(2*c0/gsl_pow_3(c1)));
s2_se *= gsl_cdf_tdist_Pinv(0.975,(double)sample_size-3.0);
printf ("A 0.95 confidence interval for the read-out variance is:\n (%g,%g)\n",
s2-s2_se,s2+s2_se);
We compile the code with:
gcc -W -g -o code/adu_mu_s2_fit code/adu_mu_s2_fit.c -lgsl -lgslcblas -lhdf5 -lhdf5_hl -lm -std=gnu11
We run it with:
./code/adu_mu_s2 | ./code/adu_mu_s2_fit --sample_size=48000
Best fit: Variance(ADU) = 5.61141 + 0.139838 Mean(ADU) Covariance matrix: [ 0.02125, -1.57125e-05 -1.57125e-05, 2.05361e-08] Residual sum of squares = 49764.5 The estimated gain is: 0.139838 A 0.95 confidence interval for the gain is: (0.139557,0.140119) The estimated read-out variance is: 286.959 A 0.95 confidence interval for the read-out variance is: (284.613,289.305)
We use valgrind with:
./code/adu_mu_s2 > dump
valgrind ./code/adu_mu_s2_fit --sample_size=48000 < dump > toto
==7471== Memcheck, a memory error detector ==7471== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al. ==7471== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info ==7471== Command: ./code/adu_mu_s2_fit --sample_size=48000 ==7471== ==7471== ==7471== HEAP SUMMARY: ==7471== in use at exit: 0 bytes in 0 blocks ==7471== total heap usage: 2 allocs, 2 frees, 8,192 bytes allocated ==7471== ==7471== All heap blocks were freed -- no leaks are possible ==7471== ==7471== For counts of detected and suppressed errors, rerun with: -v ==7471== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
set xlabel "Mean(ADU)"
set ylabel "Variance(ADU)"
set grid
unset key
g = 0.14
s2 = 290
pred(x) = s2*g**2 + g*x
plot "< ./code/adu_mu_s2" using 1:2 with points \
pointtype 0 linecolor rgb "black",\
pred(x) with lines linewidth 2 linecolor rgb "red"
set xlabel "Fitted Value"
set ylabel "Normalized residuals"
set grid
unset key
g = 0.14
s2 = 290
pred(x) = s2*g**2 + g*x
zero(x) = 0
plot "< ./code/adu_mu_s2" using (pred($1)):((pred($1)-$2)*sqrt(99.0*0.5)/$2) \
with points pointtype 0 linecolor rgb "black",\
zero(x) with lines linewidth 2 linecolor rgb "red"
set xlabel "Mean of var. stab. ADU"
set ylabel "Variance of var. stab. ADU"
set grid
unset key
one(x) = 1
plot "< ./code/adu_mu_s2 --gain=0.14 --variance=290" using 1:2 with points \
pointtype 0 linecolor rgb "black",\
one(x) with lines linecolor rgb "red" linewidth 2
Now that we know how to stabilize the variance we can write a program that detects automatically the “Region Of Interest” (ROI), based on the simple idea that a ROI is a set of pixels whose fluorescence intensity is not constant in time (“something happens there”).
<<roi-include>>
int main(int argc, char *argv[])
{
<<roi-usage>>
<<roi-args>>
<<roi-open-file-and-read-dset>>
<<roi-stabilize-variance-and-get-pRSS>>
<<roi-stabilize-print-results>>
<<roi-close-file-free-memory>>
return 0;
}
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <getopt.h>
#include <math.h>
#include <hdf5.h>
#include <hdf5_hl.h>
#include <gsl/gsl_math.h>
#include <gsl/gsl_matrix.h>
#include <gsl/gsl_cdf.h>
#include <gsl/gsl_statistics_double.h>
static char usage[] = \
"usage: %s -f --file=string -d --dset=string -g --CCD_gain=real ...\n"
" ... -v --CCD_variance=real [-t --threshold=real] [-m --return_matrix]\n\n"
" -f --file <character string>: data file name (e.g. Data_POMC.hdf5)\n"
" -d --dset <character string>: data set name (e.g. 'stack') shoud be a 3D object\n"
" -g --CCD_gain <positive real>: CCD gain (e.g. 0.14)\n"
" -v --CCV_variance <positive real>: CCD read-out variance (e.g. 290)\n"
" -m --return_matrix if set a matrix whose values at pixel i,j are log(1-F(rss_i,j))\n"
" where F is the CDF of a chi square distribution with nsamp-1 degrees of freedome\n"
" nsamp being the number of measurements (exposures) made on each pixel. If unset\n"
" (default) the list of pixel in the ROI is returned as (i1,j1),(i2,j2),... \n"
" -t --threshold <negative real>: real number (e.g. -300 default), the level\n"
" under which the log(1-F(rss_i,j)) is considered significant\n\n"
" The program opens 'file' and gets dataset 'dset' (that should be a 3D array) containing\n"
" ADUs. For each pixel i,j at each time index k it stabilizes the variance:\n"
" adu_i,j,k -> z_i,j,k = 2*sqrt(adu_i,j,k/CCD_gain+CCD_variance)\n"
" the variance of the z_i,j,k should then be UNIFORMLY 1. It then computes the mean value\n"
" along time for each pixel:\n"
" z_i,j_m = (z_i,j,1 + ... + z_i,j,K)/K\n"
" and the residual sum of squares statistic:\n"
" rss_i,j = ((z_i,j,1-z_i,j_m)^2+...+(z_i,j,K-z_i,j_m)^2))\n"
" under the null hypothesis (no signal in the pixel) this statistic should have a chi-square\n"
" distribution with K-1 degrees of freedom. A matrix whose element i,j is:\n"
" log(1-F(rss_i,j))\n"
" where F is the CDF of a chi-square distribution with K-1 degrees of freedom is constructed.\n"
" If parameter '--return_matrix' is set (not the default) this matrix is printed to the stdout,\n"
" otherwise the value of the F(rss_i,j) is compared to the Bonferroni corrected threshold set\n"
" by '--threshold' (default 0.99) and if the value is larger, 'i j' is printed to the stdout.\n"
" This is done for each pixel and the printed pairs are separated by '\n'.\n\n";
char *filename;
char dset[25];
dset[0] = '/';
dset[1] = '\0';
double threshold = -300.;
double gain,variance;
int print_matrix=0;
{int opt;
static struct option long_options[] = {
{"file",required_argument,NULL,'f'},
{"dset",required_argument,NULL,'d'},
{"CCD_gain",required_argument,NULL,'g'},
{"CCD_variance",required_argument,NULL,'v'},
{"threshold",optional_argument,NULL,'t'},
{"return_matrix",no_argument,NULL,'m'},
{"help",no_argument,NULL,'h'},
{NULL,0,NULL,0}
};
char *dset_name;
int long_index =0;
while ((opt = getopt_long(argc,argv,
"hmf:d:g:v:t:",
long_options,
&long_index)) != -1) {
switch(opt) {
case 'f':
{
filename = optarg;
}
break;
case 'd':
{
dset_name = optarg;
}
break;
case 'g':
{
gain = (double) atof(optarg);
if (gain <= 0.0) {
fprintf(stderr,"CCD gain must be > 0\n");
return -1;
}
}
break;
case 'v':
{
variance = (double) atof(optarg);
if (variance < 0.0) {
fprintf(stderr,"CCD variance must be >= 0\n");
return -1;
}
}
break;
case 't':
{
threshold = (double) atof(optarg);
if (0.0 <= threshold) {
fprintf(stderr,"threshold should be\n");
return -1;
}
}
break;
case 'm':
{
print_matrix=1;
}
break;
case 'h': printf(usage,argv[0]);
return -1;
default : fprintf(stderr,usage,argv[0]);
return -1;
}
}
strcat(dset,dset_name);
}
// Open FILE
hid_t file_id = H5Fopen (filename, H5F_ACC_RDONLY, H5P_DEFAULT);
// Get dimensions of 3D object contained in DSET
hsize_t dims[3];
H5LTget_dataset_info(file_id,dset,dims,NULL,NULL);
size_t nrow = (size_t) dims[0];
size_t ncol = (size_t) dims[1];
size_t nsamp = (size_t) dims[2];
// Read dset in data
int *data = malloc(nrow*ncol*nsamp*sizeof(int));
H5LTread_dataset_int(file_id,dset,data);
gsl_matrix *pRSS = gsl_matrix_alloc(nrow,ncol);
free(data);
gsl_matrix_free(pRSS);
H5Fclose (file_id);
for (size_t i=0; i<nrow; i++) {
for (size_t j=0; j<ncol; j++) {
double z[nsamp];
for (size_t k=0; k<nsamp; k++) {
double adu = (double) data[(i*ncol+j)*nsamp+k];
z[k] = 2.0*sqrt(adu/gain+variance);
}
double rss = gsl_stats_variance(z,1,nsamp)*((double)nsamp-1.0);
gsl_matrix_set(pRSS,i,j,log(gsl_cdf_chisq_Q(rss, (double)nsamp-1.0)));
}
}
if (print_matrix) {
printf("# Matrix with %d rows and %d columns\n",
(int) nrow, (int) ncol);
for (size_t i=0; i<nrow; i++) {
printf("%g",gsl_matrix_get(pRSS,i,0));
for (size_t j=1; j<ncol; j++)
printf(", %g",gsl_matrix_get(pRSS,i,j));
printf("\n");
}
} else {
size_t n_pixel=0;
for (size_t i=0; i<nrow; i++) {
for (size_t j=1; j<ncol; j++) {
double stat = gsl_matrix_get(pRSS,i,j);
if (stat <= threshold) {
// Pixel part of ROI
if (n_pixel==0)
printf("%d %d", (int) i, (int) j);
else
printf("\n%d %d", (int) i, (int) j);
n_pixel++;
}
}
}
if (n_pixel > 0)
printf("\n");
fprintf(stderr,"%d pixels in the ROI\n", (int) n_pixel);
}
We compile the code with:
gcc -W -g -o code/roi code/roi.c -lgsl -lgslcblas -lhdf5 -lhdf5_hl -lm -std=gnu11
We run it storing the output matrix in rss_stat
with:
./code/roi --file=Data_POMC.hdf5 --dset=stack --CCD_gain=0.14 \
--CCD_variance=290 -m > rss_stat
We can also use valgrind (and the short version of the arguments):
valgrind ./code/roi -f Data_POMC.hdf5 -d stack -g 0.14 \
-v 290 -m > dump
==7239== Memcheck, a memory error detector ==7239== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al. ==7239== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info ==7239== Command: ./code/roi -f Data_POMC.hdf5 -d stack -g 0.14 -v 290 -m ==7239== ==7239== ==7239== HEAP SUMMARY: ==7239== in use at exit: 0 bytes in 0 blocks ==7239== total heap usage: 6,579 allocs, 6,579 frees, 20,302,218 bytes allocated ==7239== ==7239== All heap blocks were freed -- no leaks are possible ==7239== ==7239== For counts of detected and suppressed errors, rerun with: -v ==7239== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
If instead of the matrix we want the indexes of the pixels making the ROI we use:
./code/roi -f Data_POMC.hdf5 -d stack -g 0.14 -v 290 > roi_pxl
cat roi_pxl
12 pixels in the ROI 26 36 26 37 26 38 26 39 27 36 27 37 27 38 27 39 28 36 28 37 28 38 28 39
set palette grey
set multiplot layout 1,2
set xrange [0:79]
set yrange [0:59]
set title "Full field"
plot 'rss_stat' matrix with image
set xrange [33:43]
set yrange [23:33]
set title "Cell body"
plot 'rss_stat' matrix with image
unset multiplot
We reached the last task! Having identfied automatically the pixels of the ROI, we now estimate our model.
- We have been (very) conservative and have kept as our ROI the pixels having an
$log\left(1 - F_{χ_{K-1}^2}(RSS)\right) ≤ -300$ . - We are left with 12 pixels.
- We are going to model the fluorescence intensity of each of these pixels by: \[S_{i,j}(t) = φ_{i,j} \, f(t) + b \; ,\] where
$f(t)$ is a signal time course to all pixels of the ROI,$φ_{i,j}$ is a pixel specific parameter and$b$ is a background fluorescence assumed identical for each pixel. - The time
$t$ is in fact a discrete variable,$t = δ \, k$ ($δ$ = 150 ms) and we are seeking a pointwise estimation:$\{f_1,f_2,\ldots,f_K\}$ ($K$ = 168) where$f_k = f(δ \, k)$ . - We end up with 12 (
$φ_{i,j}$ ) + 168 ($f_k$ ) + 1 ($b$ ) = 181 parameters for 12 x 168 = 2016 measurements. - We need to add a constraint since with our model specification: \[S_{i,j,k} = φ_{i,j} \, f_k + b \; ,\] we can multiply all the
$φ_{i,j}$ by 2 and divide all the$f_k$ by 2 and get the same prediction. - We are going to set
$f_k=1$ for the first 5 time points (the stimulation comes at the 11th) and our pointwise estimation relates to what is usually done with this type of data,$Δ S(t) / S_0$ (where$S_0$ is a baseline average) through: \[Δ S(t) / S_0 = \frac{S(t) - S_0}{S_0} = f(t) - 1 + \mathrm{noise}\, .\] - Notice that no independent background measurement is used.
- With variance stabilization we end up minimizing: \[RSS\left(b,(φ_{i,j}),(f_k)_{k=6,\ldots,168}\right) = ∑_{(i,j) ∈ \mathrm{ROI}} ∑_{k=1}^{168} \left(Z_{ijk}-F_{ijk}\right)^2 \, ,\] where \[Z_{ijk} = 2*\sqrt{ADU_{ijk}/\hat{G}+\hat{σ}_R^2}\] and \[F_{ijk} = 2*\sqrt{φ_{i,j} \, f_k + b + \hat{σ}_R^2}\, .\]
- If our model is correct we should have: \[RSS\left(\hat{b},(\hat{φ}_{i,j}),(\hat{f}_k)_{k=6,\ldots,168}\right) ∼ χ^2_{12 × 168 - 176}\, .\]
- The method also generates confidence intervals for the estimated parameters.
We will be using the gsl
nonlinear least-squares functions, we will have to define a residual returning function that takes as its first argument a (gsl-)vector of parameters. This was not an issue in Sec. Why does a proper noise model matter? since only a single parameter model was considered. Now we will be dealing with a 175 parameters model. We must therefore decide how these parameters are laid out in the parameter vector. We must also decide how the data are organized.
We have P=12 pixels in our ROI and to each of those an ADU sequence with K=168 elements is associated. So we can represent these data as PxK (gsl-)matrix.
Our
We get the initial guess for
We now write program fit_pwc_to_roi
where “pwc” is short for “piecewise constant”.
<<fit_pwc_to_roi-include>>
<<fit_pwc_to_roi-macro>>
<<fit_pwc_to_roi-data-structure>>
<<fit_pwc_to_roi-residual-fct>>
<<fit_pwc_to_roi-callback-fct>>
int main(int argc, char *argv[])
{
<<fit_pwc_to_roi-usage>>
<<fit_pwc_to_roi-args>>
<<fit_pwc_to_roi-print-info>>
<<fit_pwc_to_roi-read-data>>
<<fit_pwc_to_roi-initial-guess>>
<<fit_pwc_to_roi-initialize-data-structure>>
<<fit_pwc_to_roi-initialize-gsl_multifit_nlinear_fdf>>
<<fit_pwc_to_roi-allocate-and-initialize-solver>>
<<fit_pwc_to_roi-inital-rss>>
<<fit_pwc_to_roi-do-nls>>
<<fit_pwc_to_roi-print-info-final>>
<<fit_pwc_to_roi-print-results>>
<<fit_pwc_to_roi-free-memory>>
return 0;
}
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <getopt.h>
#include <math.h>
#include <hdf5.h>
#include <hdf5_hl.h>
#include <gsl/gsl_math.h>
#include <gsl/gsl_matrix.h>
#include <gsl/gsl_vector.h>
#include <gsl/gsl_blas.h>
#include <gsl/gsl_multifit_nlinear.h>
The baseline defined next is the duration (in sample points) during which the signal is assumed constant.
#define baseline 5
To compute the residuals (difference between observations and predictions) we need the data from the ROI. Member stab_pxl
of the following structure will contain a pointer to the (variance stabilized) data organized as a matrix with as many rows as pixels in the ROI and as many columns as sample points (as discussed above). Member variance
of the structure will contain the CCD variance.
struct data {
gsl_matrix * stab_pxl; // matrix of stabilized observations from ROI
double variance; // the CCD variance
};
static char usage[] = \
"usage: %s -f --file=string -d --dset=string -g --CCD_gain=real ...\n"
" ... -v --CCD_variance=real -p --pixels=string \n\n"
" -f --file <character string>: data file name (e.g. Data_POMC.hdf5)\n"
" -d --dset <character string>: data set name (e.g. 'stack') shoud be a 3D object\n"
" -g --CCD_gain <positive real>: CCD gain (e.g. 0.14)\n"
" -v --CCV_variance <positive real>: CCD read-out variance (e.g. 290)\n"
" -p --pixels <character string>: name of a file containing the row and column\n"
" indexes of the pixels belonging to the ROI on two columns\n\n"
" The program opens 'file' and gets dataset 'dset' (that should be a 3D array) containing\n"
" ADUs. For each pixel i,j in the list specified by pixels at each time index k it\n"
" stabilizes the variance:\n"
" adu_i,j,k -> z_i,j,k = 2*sqrt(adu_i,j,k/CCD_gain+CCD_variance)\n"
" the variance of the z_i,j,k should then be UNIFORMLY 1. It then fits by nonlinear\n"
" least-squares a model where the piecewise constant time course of the 'signal'\n"
" of each pixel of the ROI is assumed to be the same. This 'signal' is then\n"
" multiplied by a pixel specific factor (giving a predicted calcium induced)\n"
" fluorescence and a common autofluorescence is added to each pixel.\n\n"
" The program prints to the stderr fit diagnostics. It also prints to the stdout\n"
" the 'signal' values together with the lower and upper limits of a 95% pointwise\n"
" confidence intervals. Two blanck lines\n"
" follow and, one as many columns as there are pixels in the ROI, the model\n"
" predictions for each pixels are printed (a header gives the row and column\n"
" index of each pixel). After two more blank lines the stabilized observations\n"
" are printed in the same format.\n\n";
This code block reads the program parameters. It assigns (after allocating memory if necessary):
- filename
- pointer to a character string with the name of the data file.
- dset
- pointer to a character string with the name of the data set.
- gain, variance
- values of the CCD gain and read-out variance.
- pxl_row, pxl_row
- 1D arrays containing the row and column indexes of the pixels making the ROI.
- nb_pxl
- the number of pixels in the ROI.
char *filename;
char dset[25];
dset[0] = '/';
dset[1] = '\0';
double gain,variance;
size_t *pxl_row,*pxl_col;
size_t nb_pxl=0;
{int opt;
static struct option long_options[] = {
{"file",required_argument,NULL,'f'},
{"dset",required_argument,NULL,'d'},
{"CCD_gain",required_argument,NULL,'g'},
{"CCD_variance",required_argument,NULL,'v'},
{"pixels",optional_argument,NULL,'p'},
{"help",no_argument,NULL,'h'},
{NULL,0,NULL,0}
};
char *dset_name;
int long_index =0;
while ((opt = getopt_long(argc,argv,
"hf:d:g:v:p:",
long_options,
&long_index)) != -1) {
switch(opt) {
case 'f':
{
filename = optarg;
}
break;
case 'd':
{
dset_name = optarg;
}
break;
case 'g':
{
gain = (double) atof(optarg);
if (gain <= 0.0) {
fprintf(stderr,"CCD gain must be > 0\n");
return -1;
}
}
break;
case 'v':
{
variance = (double) atof(optarg);
if (variance < 0.0) {
fprintf(stderr,"CCD variance must be >= 0\n");
return -1;
}
}
break;
case 'p':
{
char *fname = optarg;
char line[BUFSIZ];
FILE *fp;
if (NULL == (fp = fopen(fname,"r"))) {
fprintf(stderr,"Could not open file %s\n",fname);
return -1;
}
while (fgets(line,BUFSIZ,fp))
nb_pxl++;
rewind(fp);
pxl_row = malloc(nb_pxl*sizeof(size_t));
pxl_col = malloc(nb_pxl*sizeof(size_t));
int row_idx,col_idx;
size_t pxl_idx=0;
while (fscanf(fp,"%d %d",&row_idx,&col_idx)==2) {
pxl_row[pxl_idx] = (size_t) row_idx;
pxl_col[pxl_idx] = (size_t) col_idx;
pxl_idx++;
}
fclose(fp);
}
break;
case 'h': printf(usage,argv[0]);
return -1;
default : fprintf(stderr,usage,argv[0]);
return -1;
}
}
strcat(dset,dset_name);
}
As usual this code block frees the allocated memory.
free(pxl_row);
free(pxl_col);
gsl_matrix_free(pxl_data);
gsl_vector_free(f);
gsl_vector_free(phi);
gsl_vector_free(par);
gsl_multifit_nlinear_free (w);
gsl_matrix_free(covar);
This code block prints to the stderr
some general information concerning the fit that will be performed next.
fprintf(stderr,"Fitting now piecewise constant signal within ROI for\n"
"dataset %s from file %s\n",dset,filename);
fprintf(stderr,"The row and column indexes of the pixels making the\n"
"ROI are:\n");
fprintf(stderr," (%d,%d)", (int) pxl_row[0],
(int) pxl_col[0]);
for (size_t pxl_idx=1; pxl_idx<nb_pxl; pxl_idx++)
if (pxl_row[pxl_idx] > pxl_row[pxl_idx-1])
fprintf(stderr,"\n (%d,%d)", (int) pxl_row[pxl_idx],
(int) pxl_col[pxl_idx]);
else
fprintf(stderr," (%d,%d)", (int) pxl_row[pxl_idx],
(int) pxl_col[pxl_idx]);
fprintf(stderr,"\nThe CCD gain and read-out variance are:\n"
" %g and %g\n\n",gain,variance);
This code block reads the data from the HDF5 file, allocates pxl_data
, a gsl_matrix
, and assigns ADU counts of the pixels making the ROI to it.
// Open FILE
hid_t file_id = H5Fopen (filename, H5F_ACC_RDONLY, H5P_DEFAULT);
// Get dimensions of 3D object contained in DSET
hsize_t dims[3];
H5LTget_dataset_info(file_id,dset,dims,NULL,NULL);
size_t nrow = (size_t) dims[0];
size_t ncol = (size_t) dims[1];
size_t nsamp = (size_t) dims[2];
// Read dset in data
int *data = malloc(nrow*ncol*nsamp*sizeof(int));
H5LTread_dataset_int(file_id,dset,data);
gsl_matrix *pxl_data = gsl_matrix_alloc(nb_pxl,nsamp);
for (size_t pxl_idx=0; pxl_idx<nb_pxl; pxl_idx++) {
size_t i = pxl_row[pxl_idx];
size_t j = pxl_col[pxl_idx];
for (size_t k=0; k<nsamp; k++) {
double adu = (double) data[(i*ncol+j)*nsamp+k];
gsl_matrix_set(pxl_data,pxl_idx,k,adu);
}
}
free(data);
H5Fclose (file_id);
Initialization of a data
structure.
struct data d = { .stab_pxl = pxl_data,
.variance = variance};
Initial guesses are obtained here. Notice the use of the =vector_views=, phiphi
and ff
, they are very handy for making the match between “directly meaningful parameters” and their counter part on the parameter vector (par
here). We are also working on a log scale since every model parameter is positive (that’s a way of enforcing it). The two variables that are initialized here and are used later on are:
- n_par
- the number of model parameters.
- par
- a pointer to a
gsl_vector
containing the (log of the) model parameters.
gsl_vector *f = gsl_vector_calloc(nsamp);
gsl_vector *phi = gsl_vector_calloc(nb_pxl);
double b=100;
for (size_t pxl_idx=0; pxl_idx<nb_pxl; pxl_idx++) {
gsl_vector_const_view row = gsl_matrix_const_row(pxl_data,pxl_idx);
gsl_vector_add(f,&row.vector);
}
gsl_vector_scale(f, 1.0/(double) nb_pxl);
double baseline_mean=0;
for (size_t k=0; k<baseline; k++)
baseline_mean += gsl_vector_get(f,k);
baseline_mean /= (double) baseline;
gsl_vector_scale(f, 1.0/baseline_mean);
for (size_t pxl_idx=0; pxl_idx<nb_pxl; pxl_idx++) {
double sum=0;
for (size_t k=0; k<baseline; k++)
sum += gsl_matrix_get(pxl_data,pxl_idx,k);
sum /= (double) baseline;
gsl_vector_set(phi,pxl_idx,sum-b);
}
size_t n_par = 1+nb_pxl+(nsamp-baseline);
gsl_vector *par = gsl_vector_alloc(n_par);
gsl_vector_set(par,0,log(b));
gsl_vector_view phiphi = gsl_vector_subvector(par,1,nb_pxl);
for (size_t pxl_idx=0; pxl_idx<nb_pxl; pxl_idx++)
gsl_vector_set(&phiphi.vector,pxl_idx,log(gsl_vector_get(phi,pxl_idx)));
gsl_vector_view ff = gsl_vector_subvector(par,1+nb_pxl,nsamp-baseline);
for (size_t k=baseline; k<nsamp; k++)
gsl_vector_set(&ff.vector,k-baseline,log(gsl_vector_get(f,k)));
// Stabilize variance of observations
for (size_t pxl_idx=0; pxl_idx<nb_pxl; pxl_idx++) {
for (size_t k=0; k<nsamp; k++) {
// stabilize variance
double z = 2*sqrt(gsl_matrix_get(pxl_data,pxl_idx,k)/gain+variance);
gsl_matrix_set(pxl_data,pxl_idx,k,z);
}
}
Definition of the “workhorse” of this program, the function that returns the vector of residuals (residual
) from the model parameters (log_par
) and the data (data
). Again, vector (and matrix) views are heavily used.
int
residual_fct(const gsl_vector *log_par, void *data,
gsl_vector * residual)
{
gsl_matrix *obs = ((struct data *)data)->stab_pxl;
double variance = ((struct data *)data)->variance;
size_t n_pxl = obs->size1;
size_t n_obs = obs->size2;
size_t log_par_length = log_par->size;
double auto_fluo = exp(gsl_vector_get(log_par,0));
gsl_vector_const_view phi = gsl_vector_const_subvector(log_par, 1, n_pxl);
gsl_vector_const_view f = gsl_vector_const_subvector(log_par, 1+n_pxl,n_obs-baseline);
for (size_t pxl_idx=0; pxl_idx<n_pxl; pxl_idx++) {
gsl_vector_const_view row = gsl_matrix_const_row(obs,pxl_idx);
gsl_vector_view resid = gsl_vector_subvector(residual, pxl_idx*n_obs, n_obs);
double phi_val = exp(gsl_vector_get(&phi.vector,pxl_idx));
for (size_t k=0; k<baseline; k++) {
double z = gsl_vector_get(&row.vector,k);
double sF = 2*sqrt(auto_fluo+phi_val+variance);
gsl_vector_set(&resid.vector,k,z-sF);
}
for (size_t k=baseline; k<n_obs; k++) {
double z = gsl_vector_get(&row.vector,k);
double s = exp(gsl_vector_get(&f.vector,k-baseline));
double sF = 2*sqrt(auto_fluo+phi_val*s+variance);
gsl_vector_set(&resid.vector,k,z-sF);
}
}
return GSL_SUCCESS;
}
A simple callback function definition that prints to the stderr
the iteration number together with the current RSS value.
void
callback(const size_t iter, void *params,
const gsl_multifit_nlinear_workspace *w)
{
gsl_vector *f = gsl_multifit_nlinear_residual(w);
fprintf(stderr, "iter %2zu: RSS = %.4f\n",
iter,
gsl_pow_2(gsl_blas_dnrm2(f)));
}
As its name says, initialization of the structure required by the least-squares solver.
gsl_multifit_nlinear_fdf fdf;
fdf.f = residual_fct;
fdf.df = NULL; /* set to NULL for finite-difference Jacobian */
fdf.fvv = NULL; /* not using geodesic acceleration */
fdf.n = nsamp*nb_pxl;
fdf.p = n_par;
fdf.params = &d;
Now the initialization of the solver itself.
const gsl_multifit_nlinear_type *T = gsl_multifit_nlinear_trust;
gsl_multifit_nlinear_parameters fdf_params = gsl_multifit_nlinear_default_parameters();
/* allocate workspace with default parameters */
gsl_multifit_nlinear_workspace *w = gsl_multifit_nlinear_alloc(T,
&fdf_params,
nsamp*nb_pxl,
n_par);
/* initialize solver with starting point */
gsl_multifit_nlinear_init (par, &fdf, w);
We get the initial RSS value (the one given by the initial guess of the model parameters).
gsl_vector *residual = gsl_multifit_nlinear_residual(w);
double chisq0;
gsl_blas_ddot(residual, residual, &chisq0);
Now we run the solver.
const double xtol = 1e-10;
const double gtol = 1e-10;
const double ftol = 0.0;
/* solve the system with a maximum of 200 iterations */
int info;
int status = gsl_multifit_nlinear_driver(200, xtol, gtol, ftol,
callback, NULL, &info,
w);
/* compute covariance of best fit parameters */
gsl_matrix *J = gsl_multifit_nlinear_jac(w);
gsl_matrix *covar = gsl_matrix_alloc (1+nb_pxl+nsamp-baseline,
1+nb_pxl+nsamp-baseline);
gsl_multifit_nlinear_covar (J, 0.0, covar);
/* compute final cost */
double chisq;
gsl_blas_ddot(residual, residual, &chisq);
We print to the stderr
general information of the fit. In particular the reason for stopping the iterations and the RSS per degree of freedom (should be close to 1 for a good fit).
fprintf(stderr, "\n\nsummary from method '%s/%s'\n",
gsl_multifit_nlinear_name(w),
gsl_multifit_nlinear_trs_name(w));
fprintf(stderr, "number of iterations: %zu\n",
gsl_multifit_nlinear_niter(w));
fprintf(stderr, "function evaluations: %zu\n", fdf.nevalf);
fprintf(stderr, "Jacobian evaluations: %zu\n", fdf.nevaldf);
fprintf(stderr, "reason for stopping: %s\n",
(info == 1) ? "small step size" : "small gradient");
fprintf(stderr, "initial RSS = %f\n", chisq0);
fprintf(stderr, "final RSS = %f\n", chisq);
double dof = nsamp*nb_pxl - n_par;
fprintf(stderr, "RSS/dof = %g\n", chisq / dof);
fprintf (stderr, "status = %s\n\n", gsl_strerror (status));
We print to the stdout
what was announced in our usage
code block.
#define FIT(i) gsl_vector_get(w->x, i)
#define ERR(i) sqrt(gsl_matrix_get(covar,i,i))
printf("# Common background estimate: %g [%g,%g]\n\n",
exp(FIT(0)),exp(FIT(0)-1.96*ERR(0)),
exp(FIT(0)+1.96*ERR(0)));
// Print f estimation as a first data set
for (size_t k=0; k<baseline; k++)
printf("%g %g %g\n",1.0,1.0,1.0);
for (size_t k=baseline; k<nsamp; k++) {
double est = FIT(1+nb_pxl+k-baseline);
double err = ERR(1+nb_pxl+k-baseline);
printf("%g %g %g\n",
exp(est), exp(est-1.96*err),
exp(est+1.96*err));
}
// Print intensity estimations one column per pixel
printf("\n\n# Predicted variance stabilized signals\n");
double auto_fluo = exp(FIT(0));
printf("#");
for (size_t pxl_idx=0; pxl_idx<nb_pxl; pxl_idx++)
printf(" (%d,%d)",(int) pxl_row[pxl_idx],(int) pxl_col[pxl_idx]);
printf("\n");
for (size_t k=0; k<baseline; k++) {
for (size_t pxl_idx=0; pxl_idx<nb_pxl; pxl_idx++)
printf("%g ", 2*sqrt(exp(FIT(pxl_idx+1))+auto_fluo+variance));
printf("\n");
}
for (size_t k=baseline; k<nsamp; k++) {
for (size_t pxl_idx=0; pxl_idx<nb_pxl; pxl_idx++)
printf("%g ", 2*sqrt(exp(FIT(pxl_idx+1))*exp(FIT(1+nb_pxl+k-baseline))+auto_fluo+variance));
printf("\n");
}
// Print observations one column per pixel
printf("\n\n# Observed stabilized signals\n");
printf("#");
for (size_t pxl_idx=0; pxl_idx<nb_pxl; pxl_idx++)
printf(" (%d,%d)",(int) pxl_row[pxl_idx],(int) pxl_col[pxl_idx]);
printf("\n");
for (size_t k=0; k<nsamp; k++) {
for (size_t pxl_idx=0; pxl_idx<nb_pxl; pxl_idx++)
printf("%g ", gsl_matrix_get(pxl_data,pxl_idx,k));
printf("\n");
}
We compile the code with:
gcc -W -g -o code/fit_pwc_to_roi code/fit_pwc_to_roi.c -lgsl -lgslcblas -lhdf5 -lhdf5_hl -lm -std=gnu11
We run it and store the stdout
output to a file called roi_fit
:
./code/fit_pwc_to_roi -f Data_POMC.hdf5 -d stack -g 0.14 -v 290 -p roi_pxl > roi_fit
Fitting now piecewise constant signal within ROI for dataset /stack from file Data_POMC.hdf5 The row and column indexes of the pixels making the ROI are: (26,36) (26,37) (26,38) (26,39) (27,36) (27,37) (27,38) (27,39) (28,36) (28,37) (28,38) (28,39) The CCD gain and read-out variance are: 0.14 and 290 iter 0: RSS = 17256272.3301 iter 1: RSS = 13998117.4429 iter 2: RSS = 7593455.7961 iter 3: RSS = 1663548.7841 iter 4: RSS = 143000.0538 iter 5: RSS = 54906.0470 iter 6: RSS = 40397.0454 iter 7: RSS = 23941.9594 iter 8: RSS = 12454.3227 iter 9: RSS = 6562.5716 iter 10: RSS = 3755.3448 iter 11: RSS = 2717.3812 iter 12: RSS = 2447.4664 iter 13: RSS = 2364.2423 iter 14: RSS = 2281.7851 iter 15: RSS = 2210.0332 iter 16: RSS = 2157.7101 iter 17: RSS = 2116.6896 iter 18: RSS = 2086.1365 iter 19: RSS = 2062.0695 iter 20: RSS = 2043.4441 iter 21: RSS = 2028.4456 iter 22: RSS = 2016.4435 iter 23: RSS = 2006.5716 iter 24: RSS = 1998.4771 iter 25: RSS = 1991.7092 iter 26: RSS = 1986.0608 iter 27: RSS = 1981.2822 iter 28: RSS = 1977.2434 iter 29: RSS = 1973.7948 iter 30: RSS = 1970.8489 iter 31: RSS = 1968.3371 iter 32: RSS = 1966.1716 iter 33: RSS = 1964.3094 iter 34: RSS = 1962.6948 iter 35: RSS = 1961.2984 iter 36: RSS = 1960.0846 iter 37: RSS = 1959.0317 iter 38: RSS = 1958.1156 iter 39: RSS = 1957.3199 iter 40: RSS = 1956.6276 iter 41: RSS = 1956.0261 iter 42: RSS = 1955.5034 iter 43: RSS = 1955.0498 iter 44: RSS = 1954.6563 iter 45: RSS = 1954.3157 iter 46: RSS = 1954.0212 iter 47: RSS = 1953.7672 iter 48: RSS = 1953.5486 iter 49: RSS = 1953.3612 iter 50: RSS = 1953.2009 iter 51: RSS = 1953.0645 iter 52: RSS = 1952.9489 iter 53: RSS = 1952.8516 iter 54: RSS = 1952.7701 iter 55: RSS = 1952.7024 iter 56: RSS = 1952.6468 iter 57: RSS = 1952.6015 iter 58: RSS = 1952.5651 iter 59: RSS = 1952.5363 iter 60: RSS = 1952.5140 iter 61: RSS = 1952.4970 iter 62: RSS = 1952.4846 iter 63: RSS = 1952.4758 iter 64: RSS = 1952.4700 iter 65: RSS = 1952.4663 iter 66: RSS = 1952.4643 iter 67: RSS = 1952.4633 iter 68: RSS = 1952.4630 iter 69: RSS = 1952.4629 iter 70: RSS = 1952.4629 iter 71: RSS = 1952.4629 iter 72: RSS = 1952.4629 iter 73: RSS = 1952.4629 iter 74: RSS = 1952.4629 iter 75: RSS = 1952.4629 iter 76: RSS = 1952.4629 iter 77: RSS = 1952.4629 iter 78: RSS = 1952.4629 iter 79: RSS = 1952.4629 iter 80: RSS = 1952.4629 iter 81: RSS = 1952.4629 iter 82: RSS = 1952.4629 iter 83: RSS = 1952.4629 iter 84: RSS = 1952.4629 iter 85: RSS = 1952.4629 iter 86: RSS = 1952.4629 iter 87: RSS = 1952.4629 iter 88: RSS = 1952.4629 iter 89: RSS = 1952.4629 iter 90: RSS = 1952.4629 iter 91: RSS = 1952.4629 iter 92: RSS = 1952.4629 iter 93: RSS = 1952.4629 iter 94: RSS = 1952.4629 summary from method 'trust-region/levenberg-marquardt' number of iterations: 94 function evaluations: 16668 Jacobian evaluations: 0 reason for stopping: small step size initial RSS = 17256272.330096 final RSS = 1952.462926 RSS/dof = 1.06112 status = success
We see that our RSS (or
valgrind ./code/fit_pwc_to_roi -f Data_POMC.hdf5 -d stack -g 0.14 -v 290 -p roi_pxl > dump
==7270== Memcheck, a memory error detector ==7270== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al. ==7270== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info ==7270== Command: ./code/fit_pwc_to_roi -f Data_POMC.hdf5 -d stack -g 0.14 -v 290 -p roi_pxl ==7270== Fitting now piecewise constant signal within ROI for dataset /stack from file Data_POMC.hdf5 The row and column indexes of the pixels making the ROI are: (26,36) (26,37) (26,38) (26,39) (27,36) (27,37) (27,38) (27,39) (28,36) (28,37) (28,38) (28,39) The CCD gain and read-out variance are: 0.14 and 290 iter 0: RSS = 17256272.3301 iter 1: RSS = 13998117.4429 iter 2: RSS = 7593455.7961 iter 3: RSS = 1663548.7841 iter 4: RSS = 143000.0538 iter 5: RSS = 54906.0470 iter 6: RSS = 40397.0454 iter 7: RSS = 23941.9594 iter 8: RSS = 12454.3227 iter 9: RSS = 6562.5716 iter 10: RSS = 3755.3448 iter 11: RSS = 2717.3812 iter 12: RSS = 2447.4664 iter 13: RSS = 2364.2423 iter 14: RSS = 2281.7851 iter 15: RSS = 2210.0332 iter 16: RSS = 2157.7101 iter 17: RSS = 2116.6896 iter 18: RSS = 2086.1365 iter 19: RSS = 2062.0695 iter 20: RSS = 2043.4441 iter 21: RSS = 2028.4456 iter 22: RSS = 2016.4435 iter 23: RSS = 2006.5716 iter 24: RSS = 1998.4771 iter 25: RSS = 1991.7092 iter 26: RSS = 1986.0608 iter 27: RSS = 1981.2822 iter 28: RSS = 1977.2434 iter 29: RSS = 1973.7948 iter 30: RSS = 1970.8489 iter 31: RSS = 1968.3371 iter 32: RSS = 1966.1716 iter 33: RSS = 1964.3094 iter 34: RSS = 1962.6948 iter 35: RSS = 1961.2984 iter 36: RSS = 1960.0846 iter 37: RSS = 1959.0317 iter 38: RSS = 1958.1156 iter 39: RSS = 1957.3199 iter 40: RSS = 1956.6276 iter 41: RSS = 1956.0261 iter 42: RSS = 1955.5034 iter 43: RSS = 1955.0498 iter 44: RSS = 1954.6563 iter 45: RSS = 1954.3157 iter 46: RSS = 1954.0212 iter 47: RSS = 1953.7672 iter 48: RSS = 1953.5486 iter 49: RSS = 1953.3612 iter 50: RSS = 1953.2009 iter 51: RSS = 1953.0645 iter 52: RSS = 1952.9489 iter 53: RSS = 1952.8516 iter 54: RSS = 1952.7701 iter 55: RSS = 1952.7024 iter 56: RSS = 1952.6468 iter 57: RSS = 1952.6015 iter 58: RSS = 1952.5651 iter 59: RSS = 1952.5363 iter 60: RSS = 1952.5140 iter 61: RSS = 1952.4970 iter 62: RSS = 1952.4846 iter 63: RSS = 1952.4758 iter 64: RSS = 1952.4700 iter 65: RSS = 1952.4663 iter 66: RSS = 1952.4643 iter 67: RSS = 1952.4633 iter 68: RSS = 1952.4630 iter 69: RSS = 1952.4629 iter 70: RSS = 1952.4629 iter 71: RSS = 1952.4629 iter 72: RSS = 1952.4629 iter 73: RSS = 1952.4629 iter 74: RSS = 1952.4629 iter 75: RSS = 1952.4629 iter 76: RSS = 1952.4629 iter 77: RSS = 1952.4629 iter 78: RSS = 1952.4629 iter 79: RSS = 1952.4629 iter 80: RSS = 1952.4629 iter 81: RSS = 1952.4629 iter 82: RSS = 1952.4629 iter 83: RSS = 1952.4629 iter 84: RSS = 1952.4629 iter 85: RSS = 1952.4629 iter 86: RSS = 1952.4629 iter 87: RSS = 1952.4629 iter 88: RSS = 1952.4629 iter 89: RSS = 1952.4629 iter 90: RSS = 1952.4629 iter 91: RSS = 1952.4629 iter 92: RSS = 1952.4629 iter 93: RSS = 1952.4629 iter 94: RSS = 1952.4629 summary from method 'trust-region/levenberg-marquardt' number of iterations: 94 function evaluations: 16668 Jacobian evaluations: 0 reason for stopping: small step size initial RSS = 17256272.330096 final RSS = 1952.462926 RSS/dof = 1.06112 status = success ==7270== ==7270== HEAP SUMMARY: ==7270== in use at exit: 0 bytes in 0 blocks ==7270== total heap usage: 6,684 allocs, 6,684 frees, 29,471,706 bytes allocated ==7270== ==7270== All heap blocks were freed -- no leaks are possible ==7270== ==7270== For counts of detected and suppressed errors, rerun with: -v ==7270== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
We are happy to see that all our allocated memory was freed before returning.
We plot our
set grid
set xlabel "Time (s)"
set ylabel "f estimate"
unset key
plot [0:25][] 'roi_fit' index 0 using ($0*0.15):1 with lines\
linecolor rgb 'red' linewidth 2,\
'' index 0 using ($0*0.15):2 with lines\
linecolor rgb 'grey' linewidth 2,\
'' index 0 using ($0*0.15):3 with lines\
linecolor rgb 'grey' linewidth 2
We now build a figure showing the (variance stabilized) data of each of the 12 pixels of the ROI (in black) together with the fitted values (in red).
set xlabel ""; set ylabel ""
set yrange [135:185]; set xrange [0:25]
set xtics format ""; set ytics format ""
unset key; set grid
set multiplot layout 3,4 margins 0.1,0.9,0.1,0.95 spacing 0,0
do for[k=1:12] {\
plot 'roi_fit' index 1 using ($0*0.15):k with lines\
linecolor rgb 'red' linewidth 2,\
'' index 2 using ($0*0.15):k with lines\
linecolor rgb 'black' linewidth 1}
unset multiplot