Technical notes for the ENP 2017 course

1 Setup

Set variables for LaTeX export:

Redirect stderr output to stdoout so that it gets printed correctly (found on the Kitchin’s blog):

(setq org-babel-default-header-args:sh
      '((:prologue . "exec 2>&1") (:epilogue . ":"))
      )
(setq org-babel-use-quick-and-dirty-noweb-expansion t)

2 Introduction

What follows is a detailed description of the analysis presented during the course. The analysis is carried out with C programs, gnuplot (for the figures) and some shell commands and scripts. The same analysis (or almost the same) is “available” using R on zenodo and using Python on GitHub (the IPython notebook can be visualized directly with nbviewer).

2.1 Why `C`, `gnuplot` and the `shell`?

This document is the exploration of an idea: use the shell (bash or zsh) instead of the Python or R command line for interactive analysis and write the short functions performing the actual analysis in C. The motivation for this exploration comes from two books by Ben Klemens: Modeling With Data and 21st Century C. The main advantages of C compared to the other two languages are:

Its stability (the programs written here are very likely to run unchanged in 20 years from now; what can be sure that this won’t be true with Python).
The development tools that come with it are just spectacular (see the very short and very clear book of Brian Gough An Introduction to GCC to understand what I mean by that).

2.2 Required software and libraries

Since a Bash or a Z shell are going to be used, Windows users will have to install Cygwin, Linux and MacOS users should have the bash shell by default and the zsh shell readily available from their package manager. To dig deeper into the amazing possibilities (and spectacular editorial support) of these tools, check From Bash to Z Shell. Conquering the Command Line by Kiddle, Peek and Stephenson.

The no-shell codes are going to be written in C, meaning that a C compiler together with the “classical” development tools (make, etc) are required. I’m going to use =gcc= here.

The heavy computational work is going to be performed mainly by the gsl (the GNU Scientific Library) that is easily installed through your package manager (from now one, for windows users, the “package manager” refers to the one of Cygwin). The graphs are be generated with gnuplot; for a quick tutorial check http://physicspmb.ukzn.ac.za/index.php/Gnuplot_tutorial, for an easy to navigate set of (sophisticated) recipes check http://www.gnuplotting.org/. The data sets are in HDF5 format and the C library, as well as the command line tools, developed by the HDF5 group are going to be heavily used here.

2.3 A remark on the code presentation

The literate programming approach is used here. This means that the code is broken into “manageable” pieces that are individually explained (when just reading the code is not enough), they are then pasted together to give the code that will actually get compiled. These manageable pieces are called blocks and each block gets a name like: <<name-of-the-block>> upon definition. It is then referred to by this name when used in subsequent codes. See Schulte, Davison, Dye and Dominik (2010) A Multi-Language Computing Environment for Literate Programming and Reproducible Research for further explanations.

3 Getting the data

The data we will start working with can be found at the following address: http://xtof.disque.math.cnrs.fr/data/Data_POMC.hdf5 and are in HDF5 format. We will start by downloading them. We will then rewrite them in a FITS format. Readers might be interest in this format since it is simpler than HDF5 and its C library is easier to use than the one of HDF5. Both formats are (essentially) fully supported by Python libraries (=h5py= and =pyFits=). HDF5 is fully supported in R by =rhdf5= while the support for FITS is not as good (especially for writting) and is provided by =FITSio=

3.1 Downloading the data

The data can be download from the command line using wget as follows:

wget http://xtof.disque.math.cnrs.fr/data/Data_POMC.hdf5

3.2 `Python` code converting `HDF5` to `FITS`

Doing the conversion with Python is rather straightforward but requires, in addition to numpy, to modules: =h5py= and =PyFITS=. Once these two modules have been intalled–the =anaconda= distribution include h5py and astropy that is very similar to PyFITS–write the following in a file called h5_to_fits.py (see the =PyFITS= documentation for details):

#!/usr/bin/env python3
import h5py
import pyfits
import numpy as np

hdf_data = h5py.File("Data_POMC.hdf5",'r')
time_hdf = hdf_data['time'][...]
stack_hdf = hdf_data['stack'][...]
hdf_data.close()

pyfits.writeto('Data_POMC.fits', stack_hdf.swapaxes(0,2).swapaxes(1,2))
pyfits.append('Data_POMC.fits',time_hdf)

hdulist = pyfits.open('Data_POMC.fits',mode='update')
prihdr = hdulist[0].header
prihdr.set('AUTHOR','Andreas Pippow')
prihdr.set('LENGTH',340,'Excitation wavelength (nm)')
prihdr.set('DYE','Fura-2')
prihdr.set('UNITS','ADU')
prihdr.set('REF','Joucla et al (2013) Cell Calcium. 54(2):71-85.')
prihdr['comment'] = 'A stim. (depol. induced calcium entry)\
comes at time 527'
prihdr = hdulist[1].header
prihdr.set('UNITS','second')
prihdr.set('TIME',527,'Stimulation time')
hdulist.flush()
hdulist.close()

Once this code has been saved into a file called h5_to_fits.py, change its file permission to make executable and use it:

chmod u+x h5_to_fits.py
./h5_to_fits.py

After running this short code you will have file Data_POMC.fits in your working directory. You can quickly visualize it with ImageJ.

4 POMC Data visualization

4.1 Data summaries with the `HDF5` tools

We can start by using some of the “tools” that come with the HDF5 library in order to get a quick idea on the data file structure. We can do that with h5ls that lists (by default) the first level of a file:

h5ls Data_POMC.hdf5

stack                    Dataset {60, 80, 168}
time                     Dataset {168}

We see that two “Datasets” are included in the file, one named stack (a 3-dimensional dataset) and the other named time (one dimensional).

The more sophisticated program h5dump gives us more details with:

h5dump -n 1 Data_POMC.hdf5

HDF5 "Data_POMC.hdf5" {
FILE_CONTENTS {
 group      /
 attribute  /README
 dataset    /stack
 attribute  /stack/CCD chip dimensions
 attribute  /stack/Dye
 attribute  /stack/Excitation wavelength
 attribute  /stack/Recordings performed by
 attribute  /stack/Reference
 attribute  /stack/Units
 dataset    /time
 attribute  /time/Stimulation time
 attribute  /time/Units
 }
}

We can visualize the content of the README attribute with (result not shown):

h5dump -a README Data_POMC.hdf5

The content of attribute CCD chip dimensions of Dataset stack is most easily visualized with:

h5dump -N "CCD chip dimensions" Data_POMC.hdf5

HDF5 "Data_POMC.hdf5" {
ATTRIBUTE "CCD chip dimensions" {
   DATATYPE  H5T_STRING {
      STRSIZE H5T_VARIABLE;
      STRPAD H5T_STR_NULLTERM;
      CSET H5T_CSET_UTF8;
      CTYPE H5T_C_S1;
   }
   DATASPACE  SCALAR
   DATA {
   (0): "60 x 80 pixels"
   }
}
}

4.2 `C` code printing to `STDOUT` part of the data set for `gnuplot`

We will regenerate the first figure of the course with =gnuplot= and for that we need to extract the “interesting part” of the data file Data_POMC.hdf5. This will be done with a short C code that opens the file, gets the stack Dataset. By the “interesting part” we mean the specification of ranges of rows and columns of the CCD chip. The display we want to generate is a matrix of ADU time series (with the lowest interesting row of the CCD chip at the bottom and the lowest interesting column at the left). Each element of the matrix will correspond a pixel of the interesting part. The ADU time series will all be display using the same horizontal and vertical scales. Since we will display all these time series on a single plot, we will have to scale them. Our program will output data tailored for gnuplot display in ASCII format to the standard output. The data will be organized in two columns with the scaled time (first column) and scaled ADU (second column). Measurements corresponding to successive pixels are separated by a blank line (this allows us to plot the whole set of time series with a single and simple plot command in gnuplot, see pp 85-86 of the gnuplot manual). The output starts with “general information” on the data set entered as a gnuplot comment, that is, following a “#”. Our presentation of the code follows what is properly called literate programming, that is, the program is broken into logical pieces—code blocks whose names appear like <<code-block-name>>—and only its skeleton is presented first as here for our program read_POMC_stack that should be saved in a file named read_POMC_stack.c.

<<read_POMC_stack-include>>
<<read_POMC_stack-define>>

int main(int argc, char *argv[])
{
  <<read_POMC_stack-read-args>>
  <<read_POMC_stack-open-FILE-and-read-DSET>>
  <<read_POMC_stack-find-min-and-max-in-DSET>>
  <<read_POMC_stack-print-results>>
  <<read_POMC_stack-close-file>>
  return 0;

}

The first code block <<read_POMC_stack-include>> contains the necessary include directives. We need <stdio.h> to call the printing functions, <stdlib.h> for the atoi function and the other two <hdf5.h> and <hdf5_hl.h> to get the hdf5 library functions.

#include <stdio.h>
#include <stdlib.h>
#include <hdf5.h>
#include <hdf5_hl.h>

Code block <<read_POMC_stack-define>> contains expansions definitions. This is where the parameters of the program are defined (to keep the code short we pass only the limits of the interesting part as arguments): the file we are going to work on (FILE), the Dataset of interest within the file (DSET):

#define FILE "Data_POMC.hdf5"
#define DSET "stack"

Code block <<read_POMC_stack-read-args>> reads the four arguments of the program: first_row, last_row, first_col, last_col. The only check that is made is that four argument are given. This is very rudimentary, we should name the argument to get a proper code and make more checks, but this program is not meant for general use.

if (argc != 5) {
  fprintf(stderr,"Expecting four arguments\n");
  return -1;
}
size_t first_row = atoi(argv[1]);
size_t last_row = atoi(argv[2]);
size_t first_col = atoi(argv[3]);
size_t last_col = atoi(argv[4]);

Code block <<read_POMC_stack-open-FILE-and-read-DSET>> opens the FILE, gets information on the dimensions, initializes the number of rows (nrow), the number of columns (ncol) and the number of time points (nsamp) before reading the data, storing the result in the 3D array data:

// Open FILE
hid_t file_id = H5Fopen (FILE, H5F_ACC_RDONLY, H5P_DEFAULT);
// Get dimensions of 3D object contained in DSET
hsize_t dims[3];
H5LTget_dataset_info(file_id,DSET,dims,NULL,NULL);
size_t nrow = (size_t) dims[0];
size_t ncol = (size_t) dims[1];
size_t nsamp = (size_t) dims[2];
// Read DSET in DATA
int data[nrow][ncol][nsamp];
H5LTread_dataset_int(file_id,"/stack",data);

The strategy used in the last code block will work fine for “small” datasets since the line int data[nrow][ncol][nsamp]; automatically allocates memory in the stack. If you run into problems running a similar code you would have to allocate “by hand” memory in the heap with something like:

int *data = malloc(nrow*ncol*nsamp*sizeof(int));

Since such an allocation is done by hand you would have to free the memory yourself before the return statement with:

free(data);

Within the code, statements like: data[i][j][k] should then be replaced by: data[(i*ncol+j)*nsamp+k].

Code block <<read_POMC_stack-find-min-and-max-in-DSET>> looks at each measurement (ADU) of the interesting part and find the smallest and largest values stored in variables adu_min and adu_max:

// Find out the smallest and largest observations
// in the selected part of DSET
int adu_max=0;
int adu_min=10000;
for (size_t i=first_row; i<last_row; i++) {
  for (size_t j=first_col; j<last_col; j++)  {
    for (size_t k=0; k<nsamp; k++) {
      double adu = data[i][j][k];
      if (adu < adu_min) adu_min=adu;
      if (adu > adu_max) adu_max=adu;
    }
  }
}

Code block <<read_POMC_stack-print-results>> prints some general information on the dataset and on the interesting part of it (as gnuplot comments) before printing the scaled time and adu:

// Print some info to STDOUT
printf("# Data set stack from file: %s\n",FILE);
printf("# Data set dimensions: (%d,%d,%d)\n",nrow,ncol,nsamp);
printf("# Using rows from %d (inclusive) to %d (exclusive)\n",first_row,last_row);
printf("# Using columns from %d (inclusive) to %d (exclusive)\n",first_col,last_col);
printf("# Minimal ADU in this range: %d; maximal value: %d\n",adu_min,adu_max);
double adu_delta = (double) (adu_max-adu_min);
// Write the DATA in a 2 columns format with time in the first and normalized
// ADU in the second
for (size_t i=first_row; i<last_row; i++) {
  double y_min = i-first_row;
  for (size_t j=first_col; j<last_col; j++)  {
    for (size_t k=0; k<nsamp; k++) {
      double adu = (data[i][j][k]-adu_min)/adu_delta+y_min;
      printf("%g %g\n",((double) k/nsamp+j-first_col),adu);
    }
    printf("\n");
  }
}

Code block <<read_POMC_stack-close-file>> close the hdf5 file:

H5Fclose (file_id);

Once the code has been properly “tangled” and stored in a file named read_POMC_stack.c (in the code sub-directory) we compile it with gcc (we could also do it with another compiler):

gcc -W -g -o code/read_POMC_stack code/read_POMC_stack.c -lhdf5 -lhdf5_hl -lm -std=gnu11

We then call our code and redirect the output to a file called stack.txt. Row 23 is the first row of interest while row 34 is the last of interest. Column 33 is the first of interest while column 43 is the last one (don’t forget we start counting at 0 and we are using Python convention):

./code/read_POMC_stack 23 35 33 44 > stack.txt

We can look at the first lines of stack.txt with the head program:

head stack.txt

# Data set stack from file: Data_POMC.hdf5
# Data set dimensions: (60,80,168)
# Using rows from 23 (inclusive) to 35 (exclusive)
# Using columns from 33 (inclusive) to 44 (exclusive)
# Minimal ADU in this range: 261; maximal value: 1118
0 0.0536756
0.00595238 0.0793466
0.0119048 0.0548425
0.0178571 0.0700117
0.0238095 0.0723454

Our figure is then simply generated with the following gnuplot commands:

unset key 
unset tics
set xlabel ""
set ylabel ""
plot 'stack.txt' using 1:2 with lines\
     linecolor rgb "black"

We get the second figure that shows the ADU time series of pixel (27,39) by first dumping the data to a file:

./code/read_POMC_stack 27 28 39 40 > sgl_pixel.txt

Then, within gnuplot:

set xlabel "Time (s)" 
set ylabel "ADU count"
set grid
set tics
unset key
plot 'sgl_pixel.txt' using (525+25*$1):(776+$2*(937-776)) \
     with lines linewidth 2 linecolor rgb "blue"

4.3 `C` code doing variance stabilization

We are going to use the =CFITSIO= library that should therefore be installed in order to compile the following code.

/** @file var_stab_fits.c
 *  @brief Stabilizes the variance given a CCD gain and read-out  
 *         variance of data contained in a FITS file.
 *
 *  The code is an adaptation of imarith from the set of  
 *  exemples of the CFITSIO library.
 *  @author C Pouzat <christophe.pouzat@parisdescartes.fr>
 */

#include <string.h>
#include <stdio.h>
#include <math.h>
#include "fitsio.h"

int main(int argc, char *argv[])
{
  if (argc != 5) { 
    printf("Usage: var_stab_fits infile gain s2 outfile \n");
    printf("\n");
    printf("Stablizes at 1 the variance of infile that is:\n"
	   "each pixel value v is replaced by 2*sqrt(v/g+s2)\n"
	   "where g is the gain and s2 the read-out variance\n"
	   "of the CCD chip.\n");
    printf("The variance stabilized version is" 
	   "stored in outfile.\n\n");
    return(0);
  }
  
  fitsfile *infptr; /* FITS file pointers */
  int status = 0;  /* CFITSIO status value MUST be initialized to zero! */
  fits_open_file(&infptr, argv[1], READONLY, &status); /* open input images */                     
  if (status) {
    /* print error message */    
    fits_report_error(stderr, status); 
    return(status);
  } else {
    printf("File %s opened.\n",argv[1]);
  }
  int innaxis;
  /* read dimensions */
  fits_get_img_dim(infptr, &innaxis, &status);  
  long innaxes[3] = {1,1,1};
  fits_get_img_size(infptr, 3, innaxes, &status);
  
  if (status) {
    /* print error message */
    fits_report_error(stderr, status); 
    return(status);
  }
  int check=1;
  if (innaxis > 3) {
    fprintf(stderr,"Error: images with > 3" 
	    "dimensions are not supported\n");
    check = 0;
  }
  
  double gain = atof(argv[2]);
  if (gain <= 0) {
    fprintf(stderr,"The CCD gain must be > 0\n");
    check = 0;
  }
  printf("The CCD chip gain is: %g\n",gain);
  double s2 = atof(argv[3]);
  if (s2 < 0) {
    fprintf(stderr,"The CCD read-out variance must be >= 0\n");
    check = 0;
  }
  printf("The CCD chip read-out variance is: %g\n",s2);
  
  fitsfile *outfptr;  /* FITS file pointers */
  /* create the new empty output file if the above checks are OK */
  if (check && !fits_create_file(&outfptr, argv[4], &status) )
  {
    printf("File %s opened.\n",argv[4]);
    /* copy all the header keywords from first image to new output file */
    fits_copy_header(infptr, outfptr, &status);
	  
    size_t npixels = innaxes[0];  /* no. of pixels to read in each row */
    /* mem for 1 row */
    double *inpix = malloc(npixels * sizeof(double)); 
    if (inpix == NULL) {
      fprintf(stderr,"Memory allocation error\n");
      return(1);
    }
    
    /* loop over all planes of the cube (2D images have 1 plane) */
    long firstpix[3] = {1,1,1};
    for (firstpix[2] = 1; firstpix[2] <= innaxes[2]; firstpix[2]++) {
      /* loop over all rows of the plane */
      for (firstpix[1] = 1; firstpix[1] <= innaxes[1]; firstpix[1]++) {
	/* Read both images as doubles, regardless of 
	   actual datatype.  Give starting pixel 
	   coordinate and no. of pixels to read. This 
	   version does not support undefined pixels 
	   in the image. */
	if (fits_read_pix(infptr, TDOUBLE, firstpix,
			  npixels, NULL, inpix, NULL, &status))
	  break;   /* jump out of loop on error */
	
	for(size_t ii=0; ii < npixels; ii++)
	  inpix[ii] = 2*sqrt(inpix[ii]/gain+s2);
	
	/* write new values to output image */
	fits_write_pix(outfptr, TDOUBLE, firstpix, npixels,
		       inpix, &status); 
      }
    }    /* end of loop over planes */

    fits_close_file(outfptr, &status);
    printf("File %s closed.\n",argv[4]);
    free(inpix);
  }

  fits_close_file(infptr, &status);
  printf("File %s closed.\n",argv[1]);
  /* print any error message */
  if (status) fits_report_error(stderr, status); 
  return(status);
}

If this code has been written in a file named var_stab_fits.c the compilation is done with:

gcc -Wall -g -o var_stab_fits -lcfitsio -lm var_stab_fits.c

And the function is called with:

./var_stab_fits Data_POMC.fits 0.14 290 Data_POMC_stab.fits

File Data_POMC.fits opened.
The CCD chip gain is: 0.14
The CCD chip read-out variance is: 290
File Data_POMC_stab.fits opened.
File Data_POMC_stab.fits closed.
File Data_POMC.fits closed.

5 Why does a proper noise model matter?

5.1 Figure `data = deterministic part + noise`

5.1.1 `C` code requirements

We write a short C code that takes:

F_inf: an asymptotic value whose default is 800.
Delta: a jump size whose default is 150.
beta: an inverse time constant whose default is 0.1.
t_onset: an onset time whose default is 5.
t_0: the time at which the observation starts (default of 0).
delta_t: the interval between successive observations (default of 0.15).
n_obs: the number of observations (default of 168).

The program returns a three columns text file containing:

first column: the observation times.
second column: the corresponding deterministic part.
third column: the observations (deterministic part + noise).

Here the deterministic part is given by: \begin{equation}\label{eq:deterministic-mono-exp} \mathtt{F}(t) = \mathtt{F\_\mathtt{inf}} + H(t-\mathtt{t\_\mathtt{onset}}) \mathtt{Delta} exp \left(-\mathtt{beta} (t-\mathtt{t\_\mathtt{onset}})\right) \, , \end{equation} where $H(\;)$ stands for the Heaviside step function, a function that is null for negative values of its argument and 1 otherwise. Since we are working in discrete time, t takes only a finite set of values given by t = t_0 + i x delta_t and we have: \begin{equation}\label{eq:deterministic-mono-exp-discrete} \mathtt{F}_i = \mathtt{F\_\mathtt{inf}} + H(\mathtt{t\_0}+i \, \mathtt{delta\_t}-\mathtt{t\_\mathtt{onset}}) \mathtt{Delta} exp \left(-\mathtt{beta} (\mathtt{t\_0}+i \, \mathtt{delta\_t}-\mathtt{t\_\mathtt{onset}})\right) \quad \mathrm{for} \quad i=0,1,\ldots,\mathtt{n\_\mathtt{obs}}-1 \, . \end{equation} The observations are generated by drawing a random number from a Poisson distribution whose parameter is the deterministic part at the time considered: \begin{equation}\label{eq:obs-mono-exp-discrete} \mathtt{Y}_i ∼ \mathtt{Pois}(\mathtt{F}_i) \, . \end{equation}

Compared with our previous program, read_POMC_stack, we have to do more efforts at the argument reading stage and we have to do more “interesting” computations since we have to draw random numbers, compute exponential, etc. All this technical part will be carried out by calling functions from the GSL (GNU Scientific Library). Its Reference manual is clearly written and contains many examples that are the best way to learn how to use the library’s functions.

5.1.2 Program `sim_mono_exp`

We now write a program sim_mono_exp (that we store in a file called sim_mono_exp.c in the code sub-directory) fulfilling the requirements of the previous section. The code skeleton is:

<<sim_mono_exp-include>>
int main(int argc, char *argv[])
{
  <<sim_mono_exp-usage>>
  <<sim_mono_exp-read-args>>
  <<sim_mono_exp-setup-rng>>
  <<sim_mono_exp-simulate-and-print-results>>
  <<sim_mono_exp-free-rng>>
  return 0;
}

Every program that does some (half) decent command line argument reading, like sim_mono_exp, will have a usage output printed to the standard output (stdout) with a short description of what the program arguments are and how to use them. This string is here defined in code block <<sim_mono_exp-usage>>.

Code block <<sim_mono_exp-include>> contains the include directives, getopt is required for reading the command line arguments, math for the exp function and two gsl directives are required for the random number generation functions.

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <getopt.h>
#include <math.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>

Code block <<sim_mono_exp-usage>> declares string usage (making use of the string literal concatenation). That is what gets printed to the stdout when something “wrong” is entered by the user:

static char usage[] = \
  "usage: %s [--F_inf=real] [--Delta=real] [--beta=real] [--t_onset=real] ...\n" \
  "          ... [--t_0=real] [--delta_t=real] [--n_obs=int]\n\n"\
  "  --F_inf <positive real>: asymptotic value (default 800)\n"\
  "  --Delta <positive real>: signal jump amplitude (default 150)\n"\
  "  --beta <positive real>: inverse time constant (default 0.1)\n"\
  "  --t_onset <real>: the stimulus onset time (default 5)\n"	   \
  "  --t_0 <real>: observations start time (default 0)\n"\
  "  --delta_t <positive real>: sampling period (default 0.15)\n"\
  "  --n_obs <positive integer>: number of observations (default 168)\n\n";

Code block <<sim_mono_exp-read-args>> reads the command line arguments, checks that they have acceptable values and initializes the following variables: F_inf, Delta, beta, t_onset, t_0, delta_t, n_obs. Here the =getop_long= function is heavily used. This is the longest code block of the program…

double F_inf = 800;
double Delta = 150;
double beta = 0.1;
double t_onset = 5;
double t_0 = 0;
double delta_t = 0.15;
size_t n_obs = 168;
{int opt;
  static struct option long_options[] = {
    {"F_inf",optional_argument,NULL,'f'},
    {"Delta",optional_argument,NULL,'d'},
    {"beta",optional_argument,NULL,'b'},
    {"t_onset",optional_argument,NULL,'o'},
    {"t_0",optional_argument,NULL,'s'},
    {"delta_t",optional_argument,NULL,'t'},
    {"n_obs",optional_argument,NULL,'n'},
    {"help",no_argument,NULL,'h'},
    {NULL,0,NULL,0}
  };
  int long_index =0;
  while ((opt = getopt_long(argc,argv,
			    "hf:d:b:o:s:t:n",
			    long_options,	
			    &long_index)) != -1) {
    switch(opt) {
    case 'f':
    {
      F_inf = (double) atof(optarg);
      if (F_inf <= 0)
      {
	fprintf(stderr,"The asymptotic value should be > 0.\n");
	return -1;
      }
    }
    break;
    case 'd':
    {
      Delta = (double) atof(optarg);
      if (Delta <= 0)
      {
	fprintf(stderr,"Delta should be > 0.\n");
	return -1;
      }
    }
    break;
    case 'b':
    {
      beta = (double) atof(optarg);
      if (beta <= 0)
      {
	fprintf(stderr,"beta should be > 0.\n");
	return -1;
      } 
    }
    break;
    case 'o':
    {
      t_onset = (double) atof(optarg); 
    }
    break;
    case 's':
    {
      t_0 = (double) atof(optarg); 
    }
    break;
    case 't':
    {
      delta_t = (double) atof(optarg);
      if (delta_t <= 0)
      {
	fprintf(stderr,"delta_t should be > 0.\n");
	return -1;
      } 
    }
    break;
    case 'n':
    {
      n_obs = (size_t) atoi(optarg);
      if (n_obs <= 0)
      {
	fprintf(stderr,"n_obs should be > 0.\n");
	return -1;
      }  
    }
    break;  
    case 'h': printf(usage,argv[0]);
      return -1;
    default : fprintf(stderr,usage,argv[0]);
      return -1;
    }
  }
}

Code block <<sim_mono_exp-setup-rng>> sets up the a random number generator named r:

gsl_rng * r;
gsl_rng_env_setup();
r = gsl_rng_alloc(gsl_rng_default);

Code block <<sim_mono_exp-free-rng>> frees the memory occupied by the rng r:

gsl_rng_free(r);

Code block <<sim_mono_exp-simulate-and-print-results>> computes the deterministic part (variable ideal in the code) and generates the observations by calling gsl_ran_poisson before writing the results to the stdout:

for (size_t i=0; i<n_obs; i++) {
  double t = t_0 + i*delta_t;
  double ideal = F_inf;
  if (t >= t_onset) ideal += Delta*exp(-beta*(t-t_onset));
  double obs = gsl_ran_poisson(r, ideal);
  printf("%g %g %g\n",t,ideal,obs);
}

Once the code has been properly “tangled” and stored in a file named sim_mono_exp.c (in the code sub-directory) we compile it with gcc (we could also do it with another compiler):

gcc -W -g -o code/sim_mono_exp code/sim_mono_exp.c -lgsl -lgslcblas -lm -std=gnu11

5.1.3 `sim_mono_exp` usage

We can now check that our new program prints some hint when a wrong parameter value is given at the command line:

./code/sim_mono_exp --F_inf=-10

The asymptotic value should be > 0.

We can also get help with:

./code/sim_mono_exp --help

usage: ./code/sim_mono_exp [--F_inf=real] [--Delta=real] [--beta=real] [--t_onset=real] ...
          ... [--t_0=real] [--delta_t=real] [--n_obs=int]

  --F_inf <positive real>: asymptotic value (default 800)
  --Delta <positive real>: signal jump amplitude (default 150)
  --beta <positive real>: inverse time constant (default 0.1)
  --t_onset <real>: the stimulus onset time (default 5)
  --t_0 <real>: observations start time (default 0)
  --delta_t <positive real>: sampling period (default 0.15)
  --n_obs <positive integer>: number of observations (default 168)

We get a simulation using parameters parameter values leading results similar to the previous courses with:

./code/sim_mono_exp --F_inf=100 --Delta=900 --beta=1 --n_obs=51 \
		    --t_onset=0 --t_0=0 --delta_t=0.1 > sim_mono.txt

By default, the random number generator used by the GSL and therefore by our sim_mono_exp code is the Mersenne-Twister of Matsumoto and Nishimura (version of 2002) and the default seed is zero (if you have no clue about random number generators, check the chapter “Random Number Generation” by Pierre L’Ecuyer in the Handbook of Computational Statistics). We can change both of these features, setting the seed at 123 and using the “taus2” generator of L’Ecuyer by setting environment variables like this:

GSL_RNG_SEED=123 GSL_RNG_TYPE=taus2 ./code/sim_mono_exp --F_inf=100 \
	    --Delta=900 --beta=1 --n_obs=51 --t_onset=0 --t_0=0 \
	    --delta_t=0.1 > sim_mono2.txt

5.1.4 Making the figure

We get a plot of the simulated data together with the deterministic curve and the two selected points with the command sequence that follows. We are making use of the substitution of system commands in backquotes capability of gnuplot (see the section with that name on p. 42 of the User manual). In order to show our two chosen points–whose time and amplitude coordinates are located on lines 37 and 102 of the file we just generated, sim_mono.txt–we extract the first column on line 37 to get the time of the first point with t1 = "`sed -n '37,37p' sim_mono.txt | awk '{print $1}'`" (the =sed= command extracts the line and pipes the result to the =awk= command that extracts the first column) and the third column of the same line to get the amplitude (we repeat this two commands sequences to get the coordinates of the second point). The second data set that gets plotted is again obtained by piping the result of the execution of a shell command: "<sed -n '37,37p' sim_mono.txt". Since gnuplot has limited capabilities for manipulating data files, we are using general POSIX programs like sed and awk to do it.

set grid
set xlabel "Time (s)"
set ylabel "Observations"
unset key
t1 = "`sed -n '4,4p' sim_mono.txt | awk '{print $1}'`"
y1 = "`sed -n '4,4p' sim_mono.txt | awk '{print $3}'`"
t2 = "`sed -n '31,31p' sim_mono.txt | awk '{print $1}'`"
y2 = "`sed -n '31,31p' sim_mono.txt | awk '{print $3}'`"
set arrow 1 from t1,0 to t1,y1 nohead dashtype '.' linecolor rgb "black" linewidth 1
set label 1 "t1" at t1+0.1,25
set arrow 2 from 0,y1 to t1,y1 nohead dashtype '.' linecolor rgb "black" linewidth 1
set label 2 "y1" at 0.1,y1-25
set arrow 3 from t2,0 to t2,y2 nohead dashtype '.' linecolor rgb "black" linewidth 1
set label 3 "t2" at t2+0.1,25
set arrow 4 from 0,y2 to t2,y2 nohead dashtype '.' linecolor rgb "black" linewidth 1
set label 4 "y2" at 0.1,y2+25
plot [0:5][0:1100] 'sim_mono.txt' using 1:3 with points pointtype 6 linecolor rgb "black",\
     "" using 1:2 with lines linecolor rgb "red" linewidth 2,\
     "<sed -n '4,4p' sim_mono.txt" using 1:3 with points pointtype 7 linecolor rgb "blue",\
     "<sed -n '31,31p' sim_mono.txt" using 1:3 with points pointtype 7 linecolor rgb "blue"

5.2 Simple case simulation

We explore here in a simple setting (only one model parameter to estimate) the consequences at the inference level of a wrong noise model. So we start from the “simple” mono-exponential decay of the previous section, assume that only the inverse time constant, beta, is unknown and that only two observations, like the two points of the previous figure, are available. We then consider two estimators, first the classical residual sum of square (RSS) minimizer: \begin{equation}\label{eq:classical-RSS} ˜{β} ≡ arg min_{β} ∑_{i=1}^2\left(y_i -\mathtt{F\_\mathtt{inf}} - H(t_i-\mathtt{t\_\mathtt{onset}})\, \mathtt{Delta}\, exp \left(-β (t_i-\mathtt{t\_\mathtt{onset}})\right) \right)^2 \end{equation} and the minimizer of a residual sum computed from the square root of the observations and the predictions: \begin{equation}\label{eq:sqrt-RSS} \hat{β} ≡ arg min_{β} ∑_{i=1}^2\left(\sqrt{y_i} -\sqrt{\mathtt{F\_\mathtt{inf}} + H(t_i-\mathtt{t\_\mathtt{onset}})\, \mathtt{Delta}\, exp \left(-β (t_i-\mathtt{t\_\mathtt{onset}})\right)} \right)^2 \, . \end{equation}

5.2.1 What do we want to do?

We want to study the sampling distribution of the two estimators. To that end we are going to simulate a large number of data sets $\{(t_1,y1),(t_2,y_2)\}$ according to the model specified by Eq. \ref{eq:deterministic-mono-exp-discrete} and \ref{eq:obs-mono-exp-discrete}. The two estimators will be obtained from each simulated data set and the empirical distribution of each estimator will be approximated by constructing an histogram from the set of estimators. In other words, we are going to do what is technically called a parametric bootstrap.

5.2.2 `C` implementation for a single case

Minimizing a residual sum of squares can be done with linear or, depending on the problem, non-linear regression routines. The predictors in the two cases above (Eq. \ref{eq:classical-RSS} and \ref{eq:sqrt-RSS}) are non-linear functions of the parameter of interest ($β$). We are going to use here the routines described in the Nonlinear Least-Squares Fitting chapter of the gsl manual. We will moreover start with a single case (the one we just simulated) before embarking in the full scale simulation study. We can then just adapt the first example of the gsl manual to suit our needs.

5.2.2.1 A note on the nonlinear least-squares routines

The functions that the non-linear regression routines of the gsl minimize must return a vector of residuals not the sum of squared residuals. They must take three formal parameters:

x: a (pointer to a gsl) vector of model parameters (in the case of Eq. \ref{eq:classical-RSS} and \ref{eq:sqrt-RSS}, this vector contains a single element, the parameter $β$).
params: a pointer to an adequate structure (in general) containing everything fixed (or known) and required to compute the residual (in the case of Eq. \ref{eq:classical-RSS} and \ref{eq:sqrt-RSS} it should contain $t_1,t_2,y_1,y_2,\mathtt{F\_\mathtt{inf}},\mathtt{t\_\mathtt{onset}},\mathtt{Delta}$).
f: a (pointer to a gsl) vector that will contain the residuals.

In addition to the required function returning the residuals, we can provide a function returning the “Jacobian” matrix with has many rows as there are observations and as many column as there are parameters to estimate. Element $(i,j)$ of this matrix should contain the derivative of the $i$th residual with respect to $j$th model parameter. We will define such functions in the sequel. If they are not provided, a numerical estimate is obtained by the gsl routines using finite differences.

5.2.2.2 Program skeleton

We will therefore construct our new program following the lines of the previous one =sim_mono_exp=, essentially changing the end part of if in order to estimate beta instead of writing the simulated data to the standard output. We therefore define a program expl_beta_est (that we store in a file called expl_beta_est.c) whose somewhat “heavy” skeleton is:

<<sim_mono_exp-include>>
<<expl_beta_est-extra-include>>
<<expl_beta_est-print-result-macro>>
<<expl_beta_est-data-structure>>
<<expl_beta_est-tilde_f>>
<<expl_beta_est-tilde_df>>
<<expl_beta_est-hat_f>>
<<expl_beta_est-hat_df>>
<<expl_beta_est-callback>>
int main(int argc, char *argv[])
{
  <<sim_mono_exp-usage>>
  <<sim_mono_exp-read-args>>
  <<sim_mono_exp-setup-rng>>
  <<expl_beta_est-simulate>>
  <<expl_beta_est-set-tolerance>>

  <<expl_beta_est-fdf-tilde>>
  <<expl_beta_est-setup-tilde-solver>>
  <<expl_beta_est-inital-cost-tilde>>
  <<expl_beta_est-get-beta-tilde>>
  PRINT(tilde)

  <<expl_beta_est-fdf-hat>>
  <<expl_beta_est-setup-hat-solver>>
  <<expl_beta_est-inital-cost-hat>>
  <<expl_beta_est-get-beta-hat>>
  PRINT(hat)

  <<sim_mono_exp-free-rng>>
  <<expl_beta_est-free-solver>>
  <<expl_beta_est-free-covar>>
  return 0;
}

5.2.2.3 Headers and macros

We first recycle with <<sim_mono_exp-include>> the header of our previous program and we extend it with <<expl_beta_est-extra-include>> in order to have the signatures of the non-linear fitting routines as well as gsl_vector and gsl_matrix types and functions definitions:

#include <gsl/gsl_matrix.h>
#include <gsl/gsl_vector.h>
#include <gsl/gsl_blas.h>
#include <gsl/gsl_multifit_nlinear.h>

We define next a PRINT macro in code block <<expl_beta_est-print-result-macro>>. This is to save code writing since it allows us to write “code that generates code” since text substitution of macros is performed by the preprocessor. Calling following macro with what set to tilde will result in a third line having w_tilde where there is w_##what in the macro definition.

#define PRINT(what) {							\
  fprintf(stderr, "summary from method '%s/%s'\n",			\
	  gsl_multifit_nlinear_name(w_##what),				\
          gsl_multifit_nlinear_trs_name(w_##what));			\
  fprintf(stderr, "number of iterations: %zu\n",			\
          gsl_multifit_nlinear_niter(w_##what));			\
  fprintf(stderr, "function evaluations: %zu\n", fdf_##what.nevalf);	\
  fprintf(stderr, "Jacobian evaluations: %zu\n", fdf_##what.nevaldf);	\
  fprintf(stderr, "reason for stopping: %s\n",				\
          (info_##what == 1) ? "small step size" : "small gradient");	\
  fprintf(stderr, "initial |f(x)| = %f\n", sqrt(chisq0_##what));	\
  fprintf(stderr, "final   |f(x)| = %f\n", sqrt(chisq_##what));		\
  fprintf (stderr, "beta = %.5f +/- %.5f\n",				\
	   gsl_vector_get(w_##what->x, 0),				\
	   1.96*sqrt(gsl_matrix_get(covar_##what,0,0)));		\
  fprintf (stderr, "status = %s\n", gsl_strerror (status_##what));	\
}

5.2.2.4 A structure holding “what is fixed”

Code block <<expl_beta_est-data-structure>> contains the definition of a structure, names data that will hold what is fixed and required to compute the residuals.

struct data {
  size_t n; // number of observations
  double * t; // pointer to an array (of size n) holding t_1 and t_2
  double * y; // pointer to an array (of size n) holding y_1 and y_2
  double F_inf,Delta,t_onset; // the known model parameters
};

5.2.2.5 Residual returning function definition for the first estimator

Code block <<expl_beta_est-tilde_f>> contains the definition of the function returning the residuals corresponding to Eq. \ref{eq:classical-RSS}.

int
tilde_f (const gsl_vector * x, void *data, 
	 gsl_vector * f)
{
  size_t n = ((struct data *)data)->n;
  double *y = ((struct data *)data)->y;
  double *t = ((struct data *)data)->t;
  double F_inf = ((struct data *)data)->F_inf;
  double Delta = ((struct data *)data)->Delta;
  double t_onset = ((struct data *)data)->t_onset;
  double beta = gsl_vector_get (x, 0);

  for (size_t i = 0; i < n; i++) {
    /* Model Yi = F_inf + H(t-t_onset) * Delta * exp(-beta * t) */
    double Yi = F_inf;
    double delta_t = t[i]-t_onset;
    if (delta_t >= 0.) 
      Yi += Delta * exp (-beta * delta_t);
    gsl_vector_set (f, i, Yi - y[i]);
  }

  return GSL_SUCCESS;
}

5.2.2.6 Jacobian returning function definition for the first estimator

Code block <<expl_beta_est-tilde_df>> contains the definition of the function returning the Jacobian corresponding to Eq. \ref{eq:classical-RSS}.

int
tilde_df (const gsl_vector * x, void *data, 
	  gsl_matrix * J)
{
  size_t n = ((struct data *)data)->n;
  double *y = ((struct data *)data)->y;
  double *t = ((struct data *)data)->t;
  double F_inf = ((struct data *)data)->F_inf;
  double Delta = ((struct data *)data)->Delta;
  double t_onset = ((struct data *)data)->t_onset;
  double beta = gsl_vector_get (x, 0);

  for (size_t i = 0; i < n; i++) {
    /* Jacobian matrix J(i,j) = dfi / dxj, */
    /* where fi = (Yi - yi)/sigma[i],      */
    /*       Yi = F_inf + H(t-t_onset) * Delta * exp(-beta * t)  */
    /* and xj is the model parameter beta */
    double delta_t = t[i] - t_onset;
    if (delta_t >= 0) {
      double e = exp(-beta * delta_t);
      gsl_matrix_set (J, i, 0, -delta_t * Delta * e);
    } else {
      gsl_matrix_set (J, i, 0, 0);
    }
  }
  return GSL_SUCCESS;
}

5.2.2.7 Residual returning function definition for the second estimator

Code block <<expl_beta_est-hat_f>> contains the definition of the function returning the residuals corresponding to Eq. \ref{eq:sqrt-RSS}.

int
hat_f (const gsl_vector * x, void *data, 
       gsl_vector * f)
{
  size_t n = ((struct data *)data)->n;
  double *y = ((struct data *)data)->y;
  double *t = ((struct data *)data)->t;
  double F_inf = ((struct data *)data)->F_inf;
  double Delta = ((struct data *)data)->Delta;
  double t_onset = ((struct data *)data)->t_onset;
  double beta = gsl_vector_get (x, 0);

  for (size_t i = 0; i < n; i++) {
    /* Model Yi = sqrt(F_inf + H(t-t_onset) * Delta * exp(-beta * t)) */
    double Yi = F_inf;
    double delta_t = t[i]-t_onset;
    if (delta_t >= 0) 
      Yi += Delta * exp (-beta * delta_t);
    gsl_vector_set (f, i, sqrt(Yi) - sqrt(y[i]));
  }

  return GSL_SUCCESS;
}

5.2.2.8 Jacobian returning function definition for the second estimator

Code block <<expl_beta_est-hat_df>> contains the definition of the function returning the Jacobian corresponding to Eq. \ref{eq:sqrt-RSS}.

int
hat_df (const gsl_vector * x, void *data, 
	gsl_matrix * J)
{
  size_t n = ((struct data *)data)->n;
  double *y = ((struct data *)data)->y;
  double *t = ((struct data *)data)->t;
  double F_inf = ((struct data *)data)->F_inf;
  double Delta = ((struct data *)data)->Delta;
  double t_onset = ((struct data *)data)->t_onset;
  double beta = gsl_vector_get (x, 0);

  for (size_t i = 0; i < n; i++) {
    /* Jacobian matrix J(i,j) = dfi / dxj, */
    /* where fi = (Yi - yi)/sigma[i],      */
    /*       Yi = F_inf + H(t-t_onset) * Delta * exp(-beta * t)  */
    /* and xj is the model parameter beta */
    double delta_t = t[i] - t_onset;
    if (delta_t >= 0) {
      double e = exp(-beta * delta_t);
      double Yi = F_inf+Delta*e;
	gsl_matrix_set (J, i, 0, -0.5 * delta_t * Delta * e / sqrt(Yi));
    } else {
      gsl_matrix_set (J, i, 0, 0);
    }
  }
  return GSL_SUCCESS;
}

5.2.2.9 “callback” function definition

We optionally pass to gsl routine doing the actual optimization =gsl_multifit_nlinear_driver= a “callback” function that is called after each iteration of the solver and that can be use to print information on the progress of the procedure. This is what code black <<expl_beta_est-callback>> does here:

void
callback(const size_t iter, void *params,
         const gsl_multifit_nlinear_workspace *w)
{
  gsl_vector *f = gsl_multifit_nlinear_residual(w);
  gsl_vector *x = gsl_multifit_nlinear_position(w);
  
fprintf(stderr, "iter %2zu: beta = %.4f, |f(x)| = %.4f\n",
          iter,
          gsl_vector_get(x, 0),
          gsl_blas_dnrm2(f));
}

5.2.2.10 Data simulation and “fixed parameters” initialization

Code block <<expl_beta_est-simulate>> simulates the data after declaring vectors y and t to hold them. A data structure, called d, is then declared and initialized by designated initialization. A vector (of length 1) holding the initial guess (set to the value used to simulate the data, that is the true value) is declared and initialized and a gsl_vector_view associated to it is declared and initialized (this is required for using the fitting routines):

double y[n_obs], t[n_obs];
for (size_t i=0; i<n_obs; i++) {
  t[i] = t_0 + i*delta_t;
  double ideal = F_inf;
  if (t[i] >= t_onset) ideal += Delta*exp(-beta*(t[i]-t_onset));
  y[i] = gsl_ran_poisson(r, ideal);
}
struct data d = { .n = n_obs, .t = t, .y = y,
		  .F_inf = F_inf, .Delta = Delta,
		  .t_onset = t_onset};
double x_init[1] = { beta }; /* starting values */
gsl_vector_view x = gsl_vector_view_array (x_init, 1);

5.2.2.11 Definitions of convergence / stopping conditions

Code block <<expl_beta_est-set-tolerance>> contains the declaration and initialization of the tolerances used to stop the iterative fitting procedure. Check the manual to see what they mean (you should really check that).

const double xtol = 1e-8;
const double gtol = 1e-8;
const double ftol = 0.0;

5.2.2.12 `<<expl_beta_est-setup-tilde-solver>>`: solver memory allocation and initialization for the first estimator

const gsl_multifit_nlinear_type *T_tilde = gsl_multifit_nlinear_trust;
gsl_multifit_nlinear_parameters fdf_params_tilde =
  gsl_multifit_nlinear_default_parameters();
/* allocate workspace with default parameters */
gsl_multifit_nlinear_workspace *w_tilde = gsl_multifit_nlinear_alloc (T_tilde,
								      &fdf_params_tilde,
								      n_obs, 1);
/* initialize solver with starting point and weights */
gsl_multifit_nlinear_init (&x.vector, &fdf_tilde, w_tilde);

5.2.2.13 `<<expl_beta_est-setup-hat-solver>>`: solver memory allocation and initialization for the second estimator

const gsl_multifit_nlinear_type *T_hat = gsl_multifit_nlinear_trust;
gsl_multifit_nlinear_parameters fdf_params_hat =
  gsl_multifit_nlinear_default_parameters();
/* allocate workspace with default parameters */
gsl_multifit_nlinear_workspace *w_hat = gsl_multifit_nlinear_alloc (T_hat,
								    &fdf_params_hat,
								    n_obs, 1);
/* initialize solver with starting point and weights */
gsl_multifit_nlinear_init (&x.vector, &fdf_hat, w_hat);

5.2.2.14 `<<expl_beta_est-free-solver>>`: free solvers’ space

gsl_multifit_nlinear_free (w_tilde);
gsl_multifit_nlinear_free (w_hat);

5.2.2.15 `<<expl_beta_est-fdf-tilde>>`: function to minimize for the first estimator

gsl_multifit_nlinear_fdf fdf_tilde;
fdf_tilde.f = tilde_f;
fdf_tilde.df = tilde_df;   /* set to NULL for finite-difference Jacobian */
fdf_tilde.fvv = NULL;     /* not using geodesic acceleration */
fdf_tilde.n = n_obs;
fdf_tilde.p = 1;
fdf_tilde.params = &d;

5.2.2.16 `<<expl_beta_est-fdf-hat>>`: function to minimize for the second estimator

gsl_multifit_nlinear_fdf fdf_hat;
fdf_hat.f = hat_f;
fdf_hat.df = hat_df;   /* set to NULL for finite-difference Jacobian */
fdf_hat.fvv = NULL;     /* not using geodesic acceleration */
fdf_hat.n = n_obs;
fdf_hat.p = 1;
fdf_hat.params = &d;

5.2.2.17 `<<expl_beta_est-inital-cost-tilde>>`

We get the initial residual and the initial RSS value for the first estimator:

gsl_vector *f_tilde = gsl_multifit_nlinear_residual(w_tilde);
double chisq0_tilde;
gsl_blas_ddot(f_tilde, f_tilde, &chisq0_tilde);

5.2.2.18 `<<expl_beta_est-inital-cost-hat>>`

We get the initial residual and the initial RSS value for the second estimator:

gsl_vector *f_hat = gsl_multifit_nlinear_residual(w_hat);
double chisq0_hat;
gsl_blas_ddot(f_hat, f_hat, &chisq0_hat);

5.2.2.19 `<<expl_beta_est-get-beta-tilde>>`

We perform the least squares optimization, get the standard error and the final RSS value for the first estimator:

/* solve the system with a maximum of 20 iterations */
int info_tilde;
int status_tilde = gsl_multifit_nlinear_driver(20, xtol, gtol, ftol,
					       callback, NULL, &info_tilde,
					       w_tilde);

  /* compute covariance of best fit parameters */
gsl_matrix *J_tilde = gsl_multifit_nlinear_jac(w_tilde);
gsl_matrix *covar_tilde = gsl_matrix_alloc (1, 1);
gsl_multifit_nlinear_covar (J_tilde, 0.0, covar_tilde);

/* compute final cost */
double chisq_tilde;
gsl_blas_ddot(f_tilde, f_tilde, &chisq_tilde);

5.2.2.20 `<<expl_beta_est-get-beta-hat>>`

We perform the least squares optimization, get the standard error and the final RSS value for the second estimator:

/* solve the system with a maximum of 20 iterations */
int info_hat;
int status_hat = gsl_multifit_nlinear_driver(20, xtol, gtol, ftol,
					       callback, NULL, &info_hat,
					       w_hat);

  /* compute covariance of best fit parameters */
gsl_matrix *J_hat = gsl_multifit_nlinear_jac(w_hat);
gsl_matrix *covar_hat = gsl_matrix_alloc (1, 1);
gsl_multifit_nlinear_covar (J_hat, 0.0, covar_hat);

/* compute final cost */
double chisq_hat;
gsl_blas_ddot(f_hat, f_hat, &chisq_hat);

5.2.2.21 `<<expl_beta_est-free-covar>>`: frees space taken by covariance matrices

gsl_matrix_free (covar_tilde);
gsl_matrix_free (covar_hat);

5.2.2.22 Code compilation

Once the code has been properly “tangled” and stored in a file named expl_beta_est.c (in the code sub-directory) we compile it with gcc (we could also do it with another compiler):

gcc -W -g -o code/expl_beta_est code/expl_beta_est.c -lgsl -lgslcblas -lm -std=gnu11

5.2.2.23 Code use

We can run the code using a data set made of two observations made at the same times as the two points of the previous figure with:

./code/expl_beta_est --F_inf=100 --Delta=900 --beta=1 --n_obs=2 \
		    --t_onset=0 --t_0=0.3 --delta_t=2.5

iter  0: beta = 1.0000, |f(x)| = 10.9021
iter  1: beta = 0.9957, |f(x)| = 10.8452
iter  2: beta = 0.9955, |f(x)| = 10.8451
iter  3: beta = 0.9955, |f(x)| = 10.8451
iter  4: beta = 0.9955, |f(x)| = 10.8451
iter  5: beta = 0.9955, |f(x)| = 10.8451
iter  6: beta = 0.9955, |f(x)| = 10.8451
summary from method 'trust-region/levenberg-marquardt'
number of iterations: 6
function evaluations: 10
Jacobian evaluations: 7
reason for stopping: small step size
initial |f(x)| = 10.902140
final   |f(x)| = 10.845086
beta = 0.99547 +/- 0.00774
status = success
iter  0: beta = 1.0000, |f(x)| = 0.3816
iter  1: beta = 0.9630, |f(x)| = 0.2707
iter  2: beta = 0.9630, |f(x)| = 0.2707
iter  3: beta = 0.9629, |f(x)| = 0.2707
iter  4: beta = 0.9629, |f(x)| = 0.2707
iter  5: beta = 0.9629, |f(x)| = 0.2707
summary from method 'trust-region/levenberg-marquardt'
number of iterations: 5
function evaluations: 6
Jacobian evaluations: 6
reason for stopping: small step size
initial |f(x)| = 0.381617
final   |f(x)| = 0.270744
beta = 0.96295 +/- 0.25700
status = success

5.2.2.24 Using valgrind

Since our code makes several allocations (in the heap) it is a could idea to check that we did not forget to free of this allocated memory space when we quit the program. This is easily done with valgrind as illustrated next:

valgrind ./code/expl_beta_est --F_inf=100 --Delta=900 --beta=1 --n_obs=2 \
		    --t_onset=0 --t_0=0.3 --delta_t=2.5

==4656== Memcheck, a memory error detector
==4656== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==4656== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==4656== Command: ./code/expl_beta_est --F_inf=100 --Delta=900 --beta=1 --n_obs=2 --t_onset=0 --t_0=0.3 --delta_t=2.5
==4656== 
iter  0: beta = 1.0000, |f(x)| = 10.9021
iter  1: beta = 0.9957, |f(x)| = 10.8452
iter  2: beta = 0.9955, |f(x)| = 10.8451
iter  3: beta = 0.9955, |f(x)| = 10.8451
iter  4: beta = 0.9955, |f(x)| = 10.8451
iter  5: beta = 0.9955, |f(x)| = 10.8451
iter  6: beta = 0.9955, |f(x)| = 10.8451
summary from method 'trust-region/levenberg-marquardt'
number of iterations: 6
function evaluations: 10
Jacobian evaluations: 7
reason for stopping: small step size
initial |f(x)| = 10.902140
final   |f(x)| = 10.845086
beta = 0.99547 +/- 0.00774
status = success
iter  0: beta = 1.0000, |f(x)| = 0.3816
iter  1: beta = 0.9630, |f(x)| = 0.2707
iter  2: beta = 0.9630, |f(x)| = 0.2707
iter  3: beta = 0.9629, |f(x)| = 0.2707
iter  4: beta = 0.9629, |f(x)| = 0.2707
iter  5: beta = 0.9629, |f(x)| = 0.2707
summary from method 'trust-region/levenberg-marquardt'
number of iterations: 5
function evaluations: 6
Jacobian evaluations: 6
reason for stopping: small step size
initial |f(x)| = 0.381617
final   |f(x)| = 0.270744
beta = 0.96295 +/- 0.25700
status = success
==4656== 
==4656== HEAP SUMMARY:
==4656==     in use at exit: 0 bytes in 0 blocks
==4656==   total heap usage: 186 allocs, 186 frees, 10,136 bytes allocated
==4656== 
==4656== All heap blocks were freed -- no leaks are possible
==4656== 
==4656== For counts of detected and suppressed errors, rerun with: -v
==4656== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

5.2.3 `C` implementation of a systematic simulation

We can now code a systematic simulation where a sample is generated nrep times according to the model specified by Eq. \ref{eq:deterministic-mono-exp-discrete} and \ref{eq:obs-mono-exp-discrete}, the two estimators $˜{β}$ (Eq. \ref{eq:classical-RSS}) and $\hat{β}$ (Eq. \ref{eq:sqrt-RSS}) are obtained for each simulated dataset and a histogram of the $˜{β}$ and the $\hat{β}$ are built to estimate the sampling distribution of each estimator.

5.2.3.1 `<<beta_samp_dist>>` code skeleton

The first major difference with the previous code is the loop (appearing explicitly) through which the different bootstrap replicates are generated and fitted. Array have to be allocated and later freed for storing the estimates $β$. The last part of the code between code block <<beta_samp_dist-beta-range>> and <<beta_samp_dist-write-hist>> generates the histograms.

<<beta_samp_dist-include>>
<<expl_beta_est-data-structure>>
<<expl_beta_est-tilde_f>>
<<expl_beta_est-tilde_df>>
<<expl_beta_est-hat_f>>
<<expl_beta_est-hat_df>>
int main(int argc, char *argv[])
{
    <<beta_samp_dist-usage>>
    <<beta_samp_dist-args>>
    <<sim_mono_exp-setup-rng>>
    <<expl_beta_est-set-tolerance>>
    <<beta_samp_dist-fdf-tilde>>
    <<beta_samp_dist-setup-tilde-solver>>
    <<beta_samp_dist-fdf-hat>>
    <<beta_samp_dist-setup-hat-solver>>
    <<beta_samp_dist-allocate-beta-vectors>>
    for (size_t rep_idx=0; rep_idx<n_rep; rep_idx++) {
      <<expl_beta_est-simulate>>
      <<beta_samp_dist-get-beta-tilde>>
      <<beta_samp_dist-get-beta-hat>>
      <<beta_samp_dist-store-results>>
    }
    <<beta_samp_dist-beta-range>>
    <<beta_samp_dist-allocate-hist>>
    <<beta_samp_dist-fill-hist>>
    <<beta_samp_dist-write-hist>>
    <<beta_samp_dist-free-beta-vectors>>
    <<beta_samp_dist-free-hist>>
    <<sim_mono_exp-free-rng>>
    <<expl_beta_est-free-solver>>
    return 0;
}

5.2.3.2 `<<beta_samp_dist-include>>` header

We need to had #include <gsl/gsl_histogram.h> compared to our last header so that we get access to the gsl functions and structures dealing with histograms:

<<sim_mono_exp-include>>
<<expl_beta_est-extra-include>>
#include <gsl/gsl_histogram.h>

5.2.3.3 `<<beta_samp_dist-usage>>`

Our new code takes three additional formal parameters: n_rep, n_bin and max_iter:

static char usage[] = \
  "usage: %s [--F_inf=real] [--Delta=real] [--beta=real] [--t_onset=real] ...\n" \
  "          ... [--t_0=real] [--delta_t=real] [--n_obs=int] [--n_rep=int] ...\n"\
  "          ... [--n_bin=int] [--max_iter=int]\n\n"\
  "  --F_inf <positive real>: asymptotic value (default 800)\n"\
  "  --Delta <positive real>: signal jump amplitude (default 150)\n"\
  "  --beta <positive real>: inverse time constant (default 0.1)\n"\
  "  --t_onset <real>: the stimulus onset time (default 5)\n"	   \
  "  --t_0 <real>: observations start time (default 0)\n"\
  "  --delta_t <positive real>: sampling period (default 0.15)\n"\
  "  --n_obs <positive integer>: number of observations (default 168)\n"
  "  --n_rep <positive integer>: number of Monte Carlo replicates (default 10000)\n"
  "  --n_bin <positive integer>: number of bins for the histograms (default 50)\n"
  "  --max_iter <positive integer>: maximal number of optimization iterations (default 500)\n\n"
  " The program write to the stdout two gsl histograms containing the (estimated) sampling\n"
  " distribution of the two beta estimators separated by two blank lines\n\n";

5.2.3.4 `<<beta_samp_dist-args>>`

The formal parameters reading part is modified in order to deal with the three new parameters:

double F_inf = 800;
double Delta = 150;
double beta = 0.1;
double t_onset = 5;
double t_0 = 0;
double delta_t = 0.15;
size_t n_obs = 168;
size_t n_rep = 10000;
size_t n_bin = 50;
size_t max_iter = 500;
{int opt;
  static struct option long_options[] = {
    {"F_inf",optional_argument,NULL,'f'},
    {"Delta",optional_argument,NULL,'d'},
    {"beta",optional_argument,NULL,'b'},
    {"t_onset",optional_argument,NULL,'o'},
    {"t_0",optional_argument,NULL,'s'},
    {"delta_t",optional_argument,NULL,'t'},
    {"n_obs",optional_argument,NULL,'n'},
    {"n_rep",optional_argument,NULL,'r'},
    {"n_bin",optional_argument,NULL,'e'},
    {"max_iter",optional_argument,NULL,'m'},
    {"help",no_argument,NULL,'h'},
    {NULL,0,NULL,0}
  };
  int long_index =0;
  while ((opt = getopt_long(argc,argv,
			    "hf:d:b:o:s:t:n:r:e:m:",
			    long_options,	
			    &long_index)) != -1) {
    switch(opt) {
    case 'f':
    {
      F_inf = (double) atof(optarg);
      if (F_inf <= 0)
      {
	fprintf(stderr,"The asymptotic value should be > 0.\n");
	return -1;
      }
    }
    break;
    case 'd':
    {
      Delta = (double) atof(optarg);
      if (Delta <= 0)
      {
	fprintf(stderr,"Delta should be > 0.\n");
	return -1;
      }
    }
    break;
    case 'b':
    {
      beta = (double) atof(optarg);
      if (beta <= 0)
      {
	fprintf(stderr,"beta should be > 0.\n");
	return -1;
      } 
    }
    break;
    case 'o':
    {
      t_onset = (double) atof(optarg); 
    }
    break;
    case 's':
    {
      t_0 = (double) atof(optarg); 
    }
    break;
    case 't':
    {
      delta_t = (double) atof(optarg);
      if (delta_t <= 0)
      {
	fprintf(stderr,"delta_t should be > 0.\n");
	return -1;
      } 
    }
    break;
    case 'n':
    {
      n_obs = (size_t) atoi(optarg);
      if (n_obs <= 0)
      {
	fprintf(stderr,"n_obs should be > 0.\n");
	return -1;
      }  
    }
    break;
    case 'r':
    {
      n_rep = (size_t) atoi(optarg);
      if (n_rep <= 0)
      {
	fprintf(stderr,"n_rep should be > 0.\n");
	return -1;
      }  
    }
    break;
    case 'e':
    {
      n_bin = (size_t) atoi(optarg);
      if (n_bin <= 0)
      {
	fprintf(stderr,"n_bin should be > 0.\n");
	return -1;
      }  
    }
    break;
    case 'm':
    {
      max_iter = (size_t) atoi(optarg);
      if (max_iter <= 0)
      {
	fprintf(stderr,"max_iter should be > 0.\n");
	return -1;
      }  
    }
    break;
    case 'h': printf(usage,argv[0]);
      return -1;
    default : fprintf(stderr,usage,argv[0]);
      return -1;
    }
  }
}

5.2.3.5 `<<beta_samp_dist-fdf-tilde>>`

The code block is almost identical to <<expl_beta_est-fdf-tilde>>, the only difference is the last assignment of the latter, fdf_tilde.params = &d;, that occurs now within the loop since we want to fit new simulated data at each iteration.

gsl_multifit_nlinear_fdf fdf_tilde;
fdf_tilde.f = tilde_f;
fdf_tilde.df = tilde_df;   /* set to NULL for finite-difference Jacobian */
fdf_tilde.fvv = NULL;     /* not using geodesic acceleration */
fdf_tilde.n = n_obs;
fdf_tilde.p = 1;

5.2.3.6 `<<beta_samp_dist-fdf-hat>>`

The corresponding comment to the one of the previous code block applies here also.

gsl_multifit_nlinear_fdf fdf_hat;
fdf_hat.f = hat_f;
fdf_hat.df = hat_df;   /* set to NULL for finite-difference Jacobian */
fdf_hat.fvv = NULL;     /* not using geodesic acceleration */
fdf_hat.n = n_obs;
fdf_hat.p = 1;

5.2.3.7 `<<beta_samp_dist-setup-tilde-solver>>`

Again, a code block almost identical to <<expl_beta_est-setup-tilde-solver>>. The last statement has been removed since it must be renewed at each iteration:

const gsl_multifit_nlinear_type *T_tilde = gsl_multifit_nlinear_trust;
gsl_multifit_nlinear_parameters fdf_params_tilde =
  gsl_multifit_nlinear_default_parameters();
/* allocate workspace with default parameters */
gsl_multifit_nlinear_workspace *w_tilde = gsl_multifit_nlinear_alloc (T_tilde,
								      &fdf_params_tilde,
								      n_obs, 1);

5.2.3.8 `<<beta_samp_dist-setup-hat-solver>>`

See previous point.

const gsl_multifit_nlinear_type *T_hat = gsl_multifit_nlinear_trust;
gsl_multifit_nlinear_parameters fdf_params_hat =
  gsl_multifit_nlinear_default_parameters();
/* allocate workspace with default parameters */
gsl_multifit_nlinear_workspace *w_hat = gsl_multifit_nlinear_alloc (T_hat,
								      &fdf_params_hat,
								      n_obs, 1);

5.2.3.9 `<<beta_samp_dist-allocate-beta-vectors>>`

We just allocate heap space to store the $β$ estimates obtained at each iteration:

double *beta_tilde = malloc(n_rep*sizeof(double));
double *beta_hat = malloc(n_rep*sizeof(double));

5.2.3.10 `<<beta_samp_dist-free-beta-vectors>>`

We free the heap memory taken by our beta arrays:

free(beta_tilde);
free(beta_hat);

5.2.3.11 `<<beta_samp_dist-get-beta-tilde>>`

This is the “workhorse” of the program for the first estimator. The new data are “assigned” to the fdf_tilde structure, the w_tilde solver is reinitialized and the optimization is performed. This time the maximal number of iterations is a user controlled parameter and no callback function is called (we are going to generate $10^5$ replicates!):

fdf_tilde.params = &d;
/* initialize solver with starting point and weights */
gsl_multifit_nlinear_init (&x.vector, &fdf_tilde, w_tilde);
/* solve the system with a maximum of max_iter iterations */
int info_tilde;
int status_tilde = gsl_multifit_nlinear_driver(max_iter, xtol, gtol, ftol,
					       NULL, NULL, &info_tilde,
					       w_tilde);

5.2.3.12 `<<beta_samp_dist-get-beta-hat>>`

See the previous point.

fdf_hat.params = &d;
/* initialize solver with starting point and weights */
gsl_multifit_nlinear_init (&x.vector, &fdf_hat, w_hat);
/* solve the system with a maximum of max_iter iterations */
int info_hat;
int status_hat = gsl_multifit_nlinear_driver(max_iter, xtol, gtol, ftol,
					     NULL, NULL, &info_hat,
					     w_hat);

5.2.3.13 `<<beta_samp_dist-store-results>>`

For each replicate (that is, each iteration of the loop), we check that convergence has been achieved and we store the estimated beta at the proper place:

if (info_tilde == 0) {
  fprintf(stderr,"On replicate %d the beta_tilde estimator did not converge.\n",rep_idx);
  beta_tilde[rep_idx] = NAN;
} else {
  beta_tilde[rep_idx] = gsl_vector_get(w_tilde->x, 0);
}
if (info_hat == 0) {
  fprintf(stderr,"On replicate %d the beta_hat estimator did not converge.\n",rep_idx);
  beta_hat[rep_idx] = NAN;
} else {
  beta_hat[rep_idx] = gsl_vector_get(w_hat->x, 0);
}

5.2.3.14 `<<beta_samp_dist-beta-range>>`

We get the range of the estimated beta in order to define properly the limits of our histograms:

gsl_vector_view beta_tilde_v = gsl_vector_view_array (beta_tilde, n_rep);
double beta_max, beta_min;
gsl_vector_minmax(&beta_tilde_v.vector,&beta_min,&beta_max);
gsl_vector_view beta_hat_v = gsl_vector_view_array (beta_hat, n_rep);
if (gsl_vector_max(&beta_hat_v.vector) > beta_max)
  beta_max = gsl_vector_max(&beta_hat_v.vector);
if (gsl_vector_min(&beta_hat_v.vector) < beta_min)
  beta_min = gsl_vector_min(&beta_hat_v.vector);

5.2.3.15 `<<beta_samp_dist-allocate-hist>>`

We allocate two histograms covering the same range with the same number of bins (controlled by the user):

gsl_histogram * h_tilde = gsl_histogram_alloc (n_bin);
gsl_histogram_set_ranges_uniform (h_tilde, beta_min, beta_max);
gsl_histogram * h_hat = gsl_histogram_alloc (n_bin);
gsl_histogram_set_ranges_uniform (h_hat, beta_min, beta_max);

5.2.3.16 `<<beta_samp_dist-free-hist>>`

The histograms are freed once we are done with them:

gsl_histogram_free (h_tilde);
gsl_histogram_free (h_hat);

5.2.3.17 `<<beta_samp_dist-fill-hist>>`

The histograms are “updated” for each estimated beta. Once all the beta have been considered, the histograms are scaled in order to yield a proper density estimation:

for (size_t rep_idx=0; rep_idx<n_rep; rep_idx++) {
  gsl_histogram_increment (h_tilde, beta_tilde[rep_idx]);
  gsl_histogram_increment (h_hat, beta_hat[rep_idx]);
}
gsl_histogram_scale (h_tilde,(double)n_bin/(n_rep*(beta_max-beta_min)));
gsl_histogram_scale (h_hat,(double) n_bin/(n_rep*(beta_max-beta_min)));

5.2.3.18 `<<beta_samp_dist-write-hist>>`

Results are written to the standard output with the histogram containing the (estimated) sampling distribution of the first estimator first and, after two blank lines as required by gnuplot, the (estimated) sampling distribution of the second estimator:

gsl_histogram_fprintf(stdout,h_tilde,"%g","%g");
fprintf(stdout,"\n\n");
gsl_histogram_fprintf(stdout,h_hat,"%g","%g");

5.2.3.19 Code compilation

Once the code has been properly “tangled” and stored in a file named beta_samp_dist.c (in the code sub-directory) we compile it with gcc (we could also do it with another compiler):

gcc -W -g -o code/beta_samp_dist code/beta_samp_dist.c -lgsl -lgslcblas -lm -std=gnu11

5.2.3.20 Code use

We can now run our computation on 100,000 replicates checking also the execution time with the POSIX time function (this run-time is measured on a ThinkPad L530 with an Intel i3 CPU at 2.5 GHz):

time ./code/beta_samp_dist  --F_inf=100 \
     --Delta=900 --beta=1 --t_onset=0 \
     --t_0=0.3 --delta_t=2.7 --n_obs=2\
     --n_rep=100000 --n_bin=100 > beta_samp_dist.txt

real	0m1,360s
user	0m1,360s
sys	0m0,000s

We can make a quick valgrind check with:

valgrind ./code/beta_samp_dist  --F_inf=100 --Delta=900 --beta=1\
	 --t_onset=0 --t_0=0.3 --delta_t=2.7 --n_obs=2 --n_rep=100\
	 --n_bin=100 > dump

==5216== Memcheck, a memory error detector
==5216== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==5216== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==5216== Command: ./code/beta_samp_dist --F_inf=100 --Delta=900 --beta=1 --t_onset=0 --t_0=0.3 --delta_t=2.7 --n_obs=2 --n_rep=100 --n_bin=100
==5216== 
==5216== 
==5216== HEAP SUMMARY:
==5216==     in use at exit: 0 bytes in 0 blocks
==5216==   total heap usage: 167 allocs, 167 frees, 18,488 bytes allocated
==5216== 
==5216== All heap blocks were freed -- no leaks are possible
==5216== 
==5216== For counts of detected and suppressed errors, rerun with: -v
==5216== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

5.2.3.21 Figure construction

We now use gnuplot as follows:

t1 = 0.3
t2 = 3
F(t) = 100 + 900*exp(-t)
dF(t) = -t*900*exp(-t)
var0 = (dF(t1)**2*F(t1)+dF(t2)**2*F(t2))/(dF(t1)**2+dF(t2)**2)**2
var1 = 1.0/(dF(t1)**2/F(t1)+dF(t2)**2/F(t2))
gogo0(x) = exp(-0.5*(x-1.0)**2/var0)/sqrt(2*pi*var0)
gogo1(x) = exp(-0.5*(x-1.0)**2/var1)/sqrt(2*pi*var1)
set xlabel "β"
set ylabel "Densities"
set key
plot gogo0(x) with lines linecolor rgb "black" \
     linewidth 2 title "asymp. β tilde",\
     gogo1(x) with lines linecolor rgb "grey" \
     linewidth 2 title "asymp. β hat",\
     "beta_samp_dist.txt" index 0 using 1:3 with steps \
     linewidth 2 linecolor rgb "blue" title "β tilde",\
     "" index 1 using 1:3 with steps linewidth 2\
     linecolor rgb "orange" title "β hat"

6 Convergence in law of a Poisson random variable towards a normal one

6.1 Scaled Poisson CDF

We write now a program that returns the Cumulative Distribution Function (CDF) of a scaled Poisson random variable: \begin{equation}\label{eq:scaled-Poisson} Y_n = \frac{X_n-n}{\sqrt{n}}\, , \end{equation} where $X_n$ is a Poisson random variable with parameter $n$. We will write a program that return the CDF of $Y_n$ for user defined values of $n$ and that returns to the stdout a two columns text file where the first column contains the number numbers $0,1,\ldots,n$ and the second column contains the corresponding CDF. When several values of $n$ are requested, the different datasets will be separated by two blank lines as requested by gnuplot. The only “tricky” part of the code will be the user supplied parameters reading; we will allow users to pass the series of Poisson distribution parameters they want to explore like –n=5,50,500,5000 that is, several comma separated values can be given.

6.1.0.1 `<<scaled_poisson>>` skeleton

The skeleton of our program looks like:

<<scaled_poisson-include>>
int main(int argc, char *argv[])
{
  <<scaled_poisson-usage>>
  <<scaled_poisson-args>>
  <<scaled_poisson-print-header>>
  for (size_t i=0; i<nb_n; i++) {
    // Loop over the Poisson parameters
    <<scaled_poisson-get-and-print-CDF>>  
  }
  <<scaled_poisson-free-n_seq>>
  return 0;
}

6.1.0.2 `<<scaled_poisson-include>>`

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <getopt.h>
#include <math.h>
#include <gsl/gsl_randist.h>
#include <gsl/gsl_cdf.h>

6.1.0.3 `<<scaled_poisson-usage>>`

static char usage[] = \
  "usage: %s [--n=int,int,int,...]\n\n"\
  "  --n <positive integers>: Poisson parameters separated by ','\n\n"
  " The program writes to the stdout the CDF of a scaled Poisson random variable\n"
  " that is, if X_n is Poisson with parameter n, the first column of the output\n"
  " contains integers from 0 to n and the second column contains the corresponding\n"
  " value of the CFD of Y_n = (X_n-n)/sqrt(n)\n\n";

6.1.0.4 `<<scaled_poisson-args>>`

Code block <<scaled_poisson-args>> assigned to variable nb_n the number of Poisson distributions the user wants to consider, it allocates the heap memory required to store the nb_n parameters and makes this memory accessible through the pointer n_seq. The allocated memory must be freed before the program returns.

int *n_seq;
size_t nb_n=0;
{int opt;
  static struct option long_options[] = {
    {"n",required_argument,NULL,'n'},
    {"help",no_argument,NULL,'h'},
    {NULL,0,NULL,0}
  };
  int long_index =0;
  while ((opt = getopt_long(argc,argv,
                            "hn:",
                            long_options,       
                            &long_index)) != -1) {
    switch(opt) {
    case 'n':
    {
      char *start = strdup(optarg); // duplicate optarg content
      char *running;
      running = start;
      char *token  = strsep(&running, ",");
      while (token != NULL) {
	token = strsep (&running, ","); // split d_name at each ","
	nb_n++;
      }
      free(start);
      n_seq = malloc(nb_n*sizeof(int));
      start = strdup(optarg); // duplicate optarg content again
      running = start;
      // Get the parameter of each Poisson distribution
      for (size_t i=0; i<nb_n; i++) {
	token = strsep (&running, ",");
	int para = atoi(token);
	if (para <= 0) {
	  fprintf(stderr,"A Poisson parameter should be > 0.\n");
	  free(start);
	  free(n_seq);
	  return -1;
	} else {
	  n_seq[i] = para;
	}
      }
      free(start);
    }
    break;
    case 'h': printf(usage,argv[0]);
      return -1;
    default : fprintf(stderr,usage,argv[0]);
      return -1;
    }
  }
}

6.1.0.5 `<<scaled_poisson-free-n_seq>>`

free(n_seq);

6.1.0.6 `<<scaled_poisson-print-header>>`

printf("# %d Scaled Poisson distributions "
       "with parameters:\n#  %d",(int) nb_n, n_seq[0]);
if (nb_n > 1) {
  for (size_t i=1; i<nb_n; i++)
    printf(", %d",n_seq[i]);
}
printf("\n#\n#\n");

6.1.0.7 `<<scaled_poisson-get-and-print-CDF>>`

int n = n_seq[i];
double sn = sqrt(n);
double inv_sn = 1.0/sn;
unsigned int from = n-4*sn > 0 ? n-4*sn : 0;
unsigned int to = n+5*sn;
printf("# Scaled Poisson CDF with parameter: %d\n",n);
for (unsigned int k=from; k <= to; k++) {
  printf("%g %g\n", ((double) k-n)*inv_sn, gsl_cdf_poisson_P(k,n)); 
}
printf("\n\n");

6.1.0.8 Code compilation

Once the code has been properly “tangled” and stored in a file named scaled_poisson.c (in the code sub-directory) we compile it with gcc (we could also do it with another compiler):

gcc -W -g -o code/scaled_poisson code/scaled_poisson.c -lgsl -lgslcblas -lm -std=gnu11

6.1.0.9 Code use

We test our code with a single parameter (5) and we don’t save the result in a file:

./code/scaled_poisson --n=5

# 1 Scaled Poisson distributions with parameters:
#  5
#
#
# Scaled Poisson CDF with parameter: 5
-2.23607 0.00673795
-1.78885 0.0404277
-1.34164 0.124652
-0.894427 0.265026
-0.447214 0.440493
0 0.615961
0.447214 0.762183
0.894427 0.866628
1.34164 0.931906
1.78885 0.968172
2.23607 0.986305
2.68328 0.994547
3.1305 0.997981
3.57771 0.999302
4.02492 0.999774
4.47214 0.999931
4.91935 0.99998

We can use valgrind on it:

valgrind ./code/scaled_poisson --n=5 > dump

==4495== Memcheck, a memory error detector
==4495== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==4495== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==4495== Command: ./code/scaled_poisson --n=5
==4495== 
==4495== 
==4495== HEAP SUMMARY:
==4495==     in use at exit: 0 bytes in 0 blocks
==4495==   total heap usage: 4 allocs, 4 frees, 4,104 bytes allocated
==4495== 
==4495== All heap blocks were freed -- no leaks are possible
==4495== 
==4495== For counts of detected and suppressed errors, rerun with: -v
==4495== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Now we get the 4 values of interest and store the results into file scaled_poisson_cdf.txt:

./code/scaled_poisson --n=5,50,500,5000 > scaled_poisson_cdf.txt

6.1.0.10 Figures construction

We define a general code block <<scaled_poisson_cdf_gnuplot>> that contains an unspecified variable idx that we can set (to 0,1,2,3) to generate the plots with the different parameters of the scaled Poisson distributions:

set xlabel "Z"
set ylabel "CDF"
unset key
plot [-3:3][0:1] norm(x) with lines linecolor rgb "red" linewidth 2,\
     "scaled_poisson_cdf.txt" index idx using 1:2 with steps\
     linecolor rgb "black" linewidth 2

7 Calibration data analysis

7.1 Getting the data

The calibration data are located at the following address: http://xtof.disque.math.cnrs.fr/data/CCD_calibration.hdf5.

7.1.1 Donwlading the calibration data

They can be downloaded with:

wget http://xtof.disque.math.cnrs.fr/data/CCD_calibration.hdf5

7.1.2 Overview of the file content

We get a quick outline of the file content with the h5ls program:

h5ls -r CCD_calibration.hdf5

/                        Group
/100ms                   Group
/100ms/stack             Dataset {60, 80, 100}
/100ms/time              Dataset {100}
/10ms                    Group
/10ms/stack              Dataset {60, 80, 100}
/10ms/time               Dataset {100}
/20ms                    Group
/20ms/stack              Dataset {60, 80, 100}
/20ms/time               Dataset {100}
/30ms                    Group
/30ms/stack              Dataset {60, 80, 100}
/30ms/time               Dataset {100}
/40ms                    Group
/40ms/stack              Dataset {60, 80, 100}
/40ms/time               Dataset {100}
/50ms                    Group
/50ms/stack              Dataset {60, 80, 100}
/50ms/time               Dataset {100}
/60ms                    Group
/60ms/stack              Dataset {60, 80, 100}
/60ms/time               Dataset {100}
/70ms                    Group
/70ms/stack              Dataset {60, 80, 100}
/70ms/time               Dataset {100}
/80ms                    Group
/80ms/stack              Dataset {60, 80, 100}
/80ms/time               Dataset {100}
/90ms                    Group
/90ms/stack              Dataset {60, 80, 100}
/90ms/time               Dataset {100}

We see that the file contains 10 Groups (10ms, 20ms,…,100ms) and that each group contains 2 DataSets “stack” and “time”. The “stack” data set is a cube (3D object 60x80x100) containing the data. We can get details by looking at the “README” attribute of the file with (result not shown):

h5dump -N "README" CCD_calibration.hdf5

7.2 Dumping and plotting an image

We now want to write a short C code that takes a group name (like 10ms) an image index in the stack and that prints to the stdout the matrix (the image or exposure) containing the data so that it can be used by gnuplot.

7.2.0.1 `print-exposure` skeleton

<<print-exposure-include>>
int main(int argc, char *argv[])
{
  <<print-exposure-usage>>
  <<print-exposure-args>>
  <<print-exposure-open-FILE-and-read-DSET>>
  <<print-exposure-print-results>>
  <<print-exposure-close-file-free-data>> 
  return 0;
}

7.2.0.2 `<<print-exposure-include>>`

#include <stdio.h>
#include <stdlib.h>
#include <hdf5.h>
#include <hdf5_hl.h>
#include <string.h>
#include <getopt.h>

7.2.0.3 `<<print-exposure-usage>>`

static char usage[] = \
  "usage: %s [--group=string] [--frame=idx] [--file=string]\n\n"\
  "  --group <character string>: the name of the group (eg, 10ms the default)\n"
  "  --frame <positive integer>: the frame index (from 0 to 99, default at 0)\n"
  "  --file <character string>: data file name (default CCD_calibration.hdf5)\n\n"
  " The program opens 'file', gets to dataset 'group/stack' and prints to\n"
  " the stdout the exposure whose index is given by frame.\n\n";

7.2.0.4 `<<print-exposure-args>>`

This code block is probably more sophisticated than necessary given what we want to do with this program. It assigns the frame index to the frame variable, the complete dataset name to dset and makes filename point to a string containing the filename. It moreover checks that the Group name is among the known group names.

size_t frame = 0;
char default_file_name[] = "CCD_calibration.hdf5";
char *filename;
char dset[25];
dset[0] = '/';
dset[1] = '\0';
{int opt;
  size_t n_groups = 10;
  char *groups[] = {"10ms","20ms","30ms","40ms","50ms",
		    "60ms","70ms","80ms","90ms","100ms"};
  char *groupname;
  static struct option long_options[] = {
    {"group",optional_argument,NULL,'g'},
    {"file",optional_argument,NULL,'f'},
    {"frame",optional_argument,NULL,'i'},
    {"help",no_argument,NULL,'h'},
    {NULL,0,NULL,0}
  };
  int long_index =0;
  int file_name_set = 0, group_name_set = 0;
  while ((opt = getopt_long(argc,argv,
                            "hf:g:i:",
                            long_options,       
                            &long_index)) != -1) {
    switch(opt) {
    case 'i':
    {
      frame = (size_t) atoi(optarg);
      if (frame > 99)
      {
	fprintf(stderr,"frame should between 0 and 99 inclusive.\n");
	return -1;
      }  
    }
    break;
    case 'f':
    {
      filename = optarg;
      file_name_set = 1;
    }
    break;
    case 'g':
    {
      groupname = optarg;
      size_t j;
      for (j=0; j<n_groups; j++)
	if (strcmp(groupname,groups[j])==0)
	  break;
      if (j == n_groups) {
	fprintf(stderr,"Unknown group\n");
	return -1;
      }
      group_name_set = 1;
    }
    break;
    case 'h': printf(usage,argv[0]);
      return -1;
    default : fprintf(stderr,usage,argv[0]);
      return -1;
    }
  }
  if (group_name_set == 0) groupname =  groups[0];
  if (file_name_set == 0) filename =  default_file_name;
  strcat(dset,groupname);
  strcat(dset,"/stack");
}

7.2.0.5 `<<print-exposure-open-FILE-and-read-DSET>>`

In this case we are using the more general way of allocating space to store the data. We use an explicit malloc with int *data = malloc(nrow*ncol*nsamp*sizeof(int)); that allocates memory in the heap (that we must free later on) instead of the int data[nrow][ncol][nsamp]; that we used in our first HDF5 reading code (that allocates memory in the stack, meaning we do not have to free it ourselves).

// Open FILE
hid_t file_id = H5Fopen (filename, H5F_ACC_RDONLY, H5P_DEFAULT);
// Get dimensions of 3D object contained in DSET
hsize_t dims[3];
H5LTget_dataset_info(file_id,dset,dims,NULL,NULL);
size_t nrow = (size_t) dims[0];
size_t ncol = (size_t) dims[1];
size_t nsamp = (size_t) dims[2];
// Read dset in data
int *data = malloc(nrow*ncol*nsamp*sizeof(int));
H5LTread_dataset_int(file_id,dset,data);

7.2.0.6 `<<print-exposure-close-file-free-data>>`

Here we close the HDF5 file and we free the memory space taken up by data.

free(data);
H5Fclose (file_id);

7.2.0.7 `<<print-exposure-print-results>>`

The only tricky part here is due to the way we allocated the memory space taken by data, we cannot use data[i][j][frame] as we would have done previously, we must use data[(i*ncol+j)*nsamp+frame] instead:

for (size_t i=0; i<nrow; i++) {
  for (size_t j=0; j<ncol; j++)
    printf("%d ",data[(i*ncol+j)*nsamp+frame]);
  printf("\n");
}

7.2.0.8 Compile `print-exposure`

We compile the code with:

gcc -W -g -o code/print-exposure code/print-exposure.c -lhdf5 -lhdf5_hl -lm -std=gnu11

7.2.0.9 Use and test

We use the code with:

./code/print-exposure > toto

We use valgrind on it with:

valgrind ./code/print-exposure > toto

7.2.0.10 Do the figure

set palette grey
set cblabel "ADU"

set xrange [0:79]
set yrange [0:59]
set view map
splot [0:79][0:59] "<./code/print-exposure" matrix with image

7.3 Dumping and plotting ADU sequences

7.3.1 Definition and use of a shell function to dump the data

If we just want to plot a few ADU sequences, we can make use of the options of h5dump that allow us to print to the stdout part of an HDF5 file, like here if we want to look at the ADU sequence of row 31, column 41 (starting counting at 0) for the 10ms Group:

h5dump -d "/10ms/stack[31,41,0;;1,1,100;]" -A 0 CCD_calibration.hdf5

HDF5 "CCD_calibration.hdf5" {
DATASET "/10ms/stack" {
   DATATYPE  H5T_STD_I32LE
   DATASPACE  SIMPLE { ( 60, 80, 100 ) / ( 60, 80, 100 ) }
   SUBSET {
      START ( 31, 41, 0 );
      STRIDE ( 1, 1, 1 );
      COUNT ( 1, 1, 100 );
      BLOCK ( 1, 1, 1 );
      DATA {
      (31,41,0): 419, 423, 422, 411, 412, 412, 407, 423, 424, 427, 419, 409,
      (31,41,12): 409, 410, 413, 420, 412, 440, 413, 416, 413, 416, 408, 417,
      (31,41,24): 410, 418, 405, 420, 415, 424, 417, 422, 405, 397, 412, 413,
      (31,41,36): 409, 408, 412, 421, 420, 416, 400, 425, 408, 405, 408, 419,
      (31,41,48): 405, 406, 416, 418, 413, 412, 421, 418, 422, 414, 426, 414,
      (31,41,60): 403, 418, 406, 414, 395, 400, 416, 411, 408, 414, 412, 425,
      (31,41,72): 415, 418, 414, 415, 401, 409, 421, 412, 415, 417, 424, 426,
      (31,41,84): 422, 412, 411, 420, 413, 431, 420, 426, 411, 416, 408, 398,
      (31,41,96): 411, 412, 417, 413
      }
   }
}
}

Now we are just interested in the data part so we extract everything that comes of a regular expression looking like the following pattern: (number,number,number): using the sed program (this command sequence was built interactively from the shell):

h5dump -d "/10ms/stack[31,41,0;;1,1,100;]" -A 0 CCD_calibration.hdf5 | \
    grep "([0-9][0-9],[0-9][0-9],\(0\|[0-9][0-9]\)):" | \
    sed -e "s/[[:space:]]*([0-9][0-9],[0-9][0-9],\(0\|[0-9][0-9]\)):[[:space:]]*//g" |\
    sed -e "s/,$//g" | sed -e "s/,[[:space:]]*/\n/g"

We can then write a short shell script taking parameters and dumping the ADU sequence we are interested in:

#!/bin/bash
# Usage:     adu_sequence row column
# Dumps the corresponding adu sequence 

h5dump -d "/10ms/stack[$1,$2,0;;1,1,100;]" -A 0 CCD_calibration.hdf5 | \
    grep "([0-9][0-9],[0-9][0-9],\(0\|[0-9][0-9]\)):" | \
    sed -e "s/[[:space:]]*([0-9][0-9],[0-9][0-9],\(0\|[0-9][0-9]\)):[[:space:]]*//g" |\
    sed -e "s/,$//g" | sed -e "s/,[[:space:]]*/\n/g"

We have to change the mode of our script to be able to use it with:

chmod u+x code/adu_sequence.sh

We can then use our code with (result not shown):

./code/adu_sequence.sh 31 41

7.3.2 Making the figure

We can now make our figure using the =multiplot= feature of gnuplot:

unset key
set grid
set multiplot layout 3,1 margins 0.1,0.9,0.1,0.95 spacing 0,0
set xtics 20 format ""
set ylabel "ADU"
set xlabel ""
plot [1:100][390:440] "<./code/adu_sequence.sh 31 41" using 0:1\
     with lines linewidth 2 linecolor rgb "blue"
set xtics 20 format ""
set ylabel "ADU"
set xlabel ""
plot [1:100][390:440] "<./code/adu_sequence.sh 31 40" using 0:1\
     with lines linewidth 2 linecolor rgb "blue"
set format x "%g"
set xlabel "Frame index"
set ylabel "ADU"
plot [1:100][390:440] "<./code/adu_sequence.sh 31 39" using 0:1\
     with lines linewidth 2 linecolor rgb "blue"
unset grid
unset multiplot

7.4 Linear fit of a single sequence

We now fit a straight line to a single ADU sequence like the ones that are dumped to the stdout by our adu_sequence.sh function. We will write a program doing this job, adu_sequence_fit. This program will read the data from the stdin and will print to the sdtout a gnuplot compatible dataset with fitting results as comments followed by four columns containing the frame index, the fitted ADU value, its 95% lower bound and its 95% upper bound.

7.4.0.1 `adu_sequence_fit` skeleton

<<adu_sequence_fit-include>>
int main(int argc, char *argv[])
{
  <<adu_sequence_fit-usage>>
  <<adu_sequence_fit-args>>
  <<adu_sequence_fit-read-data>>
  <<adu_sequence_fit-fit>>
  <<adu_sequence_fit-print-results>>  
  return 0;
}

7.4.0.2 `<<adu_sequence_fit-include>>`

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <getopt.h>
#include <gsl/gsl_fit.h>
#include <gsl/gsl_cdf.h>

7.4.0.3 `<<adu_sequence_fit-usage>>`

static char usage[] = \
  "usage: %s [--nb_frames=integer] [--conf_level=real]\n\n"\
  "  --nb_frames <positive integer>: the number of frames (default at 100)\n"
  "  --conf_level <positive real>: confidence level (between 0 and 1, default 0.95)\n\n"
  " The program reads data from the stdin and performs a linear fit.\n"
  " It prints to the stdout data compatible with gnuplot with the\n"
  " fitting results as comments followed by 4 columns:\n"
  "   - the frame index\n"
  "   - the fitted ADU\n"
  "   - the lower bound of the fitted ADU\n"
  "   - the upper bound of the fitted ADU.\n\n";

7.4.0.4 `<<adu_sequence_fit-args>>`

This code block initializes two variables: n (size_t) that contains the number of ADU in the sequence and cl (a double) that contains the requested confidence level.

size_t n = 100;
double cl = 0.95;
{int opt;
  static struct option long_options[] = {
    {"nb_frames",optional_argument,NULL,'n'},
    {"conf_level",optional_argument,NULL,'c'},
    {"help",no_argument,NULL,'h'},
    {NULL,0,NULL,0}
  };
  int long_index =0;
  while ((opt = getopt_long(argc,argv,
                            "hn:c:",
                            long_options,       
                            &long_index)) != -1) {
    switch(opt) {
    case 'n':
    {
      n = (size_t) atoi(optarg);
    }
    break;
    case 'c':
    {
      cl = (double) atof(optarg);
      if (cl <= 0 | cl >= 1)
      {
        fprintf(stderr,"A confidence interval should be between 0 and 1.\n");
        return -1;
      }
    }
    break;
    case 'h': printf(usage,argv[0]);
      return -1;
    default : fprintf(stderr,usage,argv[0]);
      return -1;
    }
  }
}

7.4.0.5 `<<adu_sequence_fit-read-data>>`

double adu[n];
{
  char input[256];
  size_t i=0;
  while (fgets(input,sizeof(input),stdin) != NULL & i < n) {
    adu[i] = (double) atoi(input);
    i++;
  }
  if (i != n)
    fprintf(stderr,"Fewer ADU than planed.\n");
}

7.4.0.6 `<<adu_sequence_fit-fit>>`

double x[n];
for (size_t i=0; i<n; i++)
  x[i] = (double) i;
double c0, c1, cov00, cov01, cov11, sumsq;
gsl_fit_linear (x, 1, adu, 1, n, &c0, &c1,
		&cov00, &cov01, &cov11, 
		&sumsq);

7.4.0.7 `<<adu_sequence_fit-print-results>>`

printf ("# best fit: ADU = %g + %g F\n", c0, c1);
printf ("# covariance matrix:\n");
printf ("# [ %g, %g\n#   %g, %g]\n", 
	cov00, cov01, cov01, cov11);
printf ("# Residual sum of squares = %g\n", sumsq);
double slope_se = sqrt(cov11)*gsl_cdf_tdist_Pinv(1.0-(1.0-cl)*0.5,(double)n-3.0);
printf ("# A %g confidence interval for the slope is (%g,%g)\n",
	cl,c1-slope_se,c1+slope_se);
printf ("#\n");
printf ("# Frame index  Fitted value  Lower bound  Upper bound\n"); 
double se_f = gsl_cdf_gaussian_Pinv(1.0-(1-cl)*0.5,1.0);
for (size_t i=0; i<n; i++) {
  double aduf, aduf_err;
  gsl_fit_linear_est(x[i],c0,c1,cov00, cov01,
		     cov11,&aduf,&aduf_err);
  printf("%g %g %g %g\n",
	 x[i],aduf,aduf-se_f*aduf_err,
	 aduf+se_f*aduf_err);
}

7.4.0.8 Compile `adu_sequence_fit`

We compile the code with:

gcc -W -g -o code/adu_sequence_fit code/adu_sequence_fit.c -lgsl -lgslcblas -lm -std=gnu11

7.4.0.9 Use and test `adu_sequence_fit`

We use it piping the output of our shell function with:

./code/adu_sequence.sh 31 40 | ./code/adu_sequence_fit > middle_adu_seq_fit
head middle_adu_seq_fit

# best fit: ADU = 413.134 + 0.0320432 F
# covariance matrix:
# [ 2.07911, -0.0313433
#   -0.0313433, 0.000633199]
# Residual sum of squares = 5170.6
# A 0.95 confidence interval for the slope is (-0.0178928,0.0819792)
#
# Frame index  Fitted value  Lower bound  Upper bound
0 413.134 410.308 415.96
1 413.166 410.382 415.95

We can check in two steps with valgrind as follows:

./code/adu_sequence.sh 31 40 > middle_adu_seq
valgrind cat middle_adu_seq | ./code/adu_sequence_fit > middle_adu_seq_fit

==5596== Memcheck, a memory error detector
==5596== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==5596== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==5596== Command: cat middle_adu_seq
==5596== 
==5596== 
==5596== HEAP SUMMARY:
==5596==     in use at exit: 0 bytes in 0 blocks
==5596==   total heap usage: 31 allocs, 31 frees, 138,848 bytes allocated
==5596== 
==5596== All heap blocks were freed -- no leaks are possible
==5596== 
==5596== For counts of detected and suppressed errors, rerun with: -v
==5596== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

7.4.1 Making the figure

set xlabel "Frame index"
set ylabel "ADU"
set grid
unset key
plot [1:100][395:435] "<./code/adu_sequence.sh 31 40" using 0:1\
     with lines linewidth 2 linecolor rgb "blue", \
     "<./code/adu_sequence.sh 31 40 | ./code/adu_sequence_fit" using 1:2\
     with lines linewidth 2 linecolor rgb "red", \
     "<./code/adu_sequence.sh 31 40 | ./code/adu_sequence_fit" using 1:3\
     with lines linewidth 2 linecolor rgb "grey", \
     "<./code/adu_sequence.sh 31 40 | ./code/adu_sequence_fit" using 1:4\
     with lines linewidth 2 linecolor rgb "grey"

7.5 Linear fit of all sequences in a given group

We now want to fit a straight line to the ADU sequence of each of the 60x80 pixels and to return the ratio of the slope to its standard error. The latter should be the realization of a random variable having a t distribution with 97 degrees of freedom (we have 100 observations in each sequence and we estimate three parameters: the intercept at the origin, the slope and the variance).

7.5.1 Program `adu_drift_test` definition

7.5.1.1 `adu_drift_test` skeleton

<<adu_drift_test-include>>
int main(int argc, char *argv[])
{
  <<adu_drift_test-usage>>
  <<adu_drift_test-args>>
  <<adu_drift_test-read-data>>
  <<adu_drift_test-fit>>
  <<adu_drift_test-print-results>>
  <<adu_drift_test-close-file>>
  return 0;
}

7.5.1.2 `<<adu_drift_test-include>>`

This time, we are dealing with the HDF5 file so we need the corresponding headers:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <getopt.h>
#include <gsl/gsl_fit.h>
#include <hdf5.h>
#include <hdf5_hl.h>

7.5.1.3 `<<adu_drift_test-usage>>`

static char usage[] = \
  "usage: %s [--group=string] [--file=string]\n\n"\
  "  --group <character string>: the name of the group (eg, 10ms the default)\n"
  "  --file <character string>: data file name (default CCD_calibration.hdf5)\n\n"
  " The program opens 'file', gets to dataset 'group/stack', fits a straight\n"
  " line to the ADU sequence of each pixel and prints to the stdout the slope\n"
  " divided by its standard error (something that should follow a t distribution\n"
  " with 97 degrees of freedom).\n\n";

7.5.1.4 `<<adu_drift_test-args>>`

Only a minor adaptation of <<print-exposure-args>> is required here (we do not pass the frame):

size_t frame = 0;
char default_file_name[] = "CCD_calibration.hdf5";
char *filename;
char dset[25];
dset[0] = '/';
dset[1] = '\0';
{int opt;
  size_t n_groups = 10;
  char *groups[] = {"10ms","20ms","30ms","40ms","50ms",
                    "60ms","70ms","80ms","90ms","100ms"};
  char *groupname;
  static struct option long_options[] = {
    {"group",optional_argument,NULL,'g'},
    {"file",optional_argument,NULL,'f'},
    {"help",no_argument,NULL,'h'},
    {NULL,0,NULL,0}
  };
  int long_index =0;
  int file_name_set = 0, group_name_set = 0;
  while ((opt = getopt_long(argc,argv,
                            "hf:g:",
                            long_options,       
                            &long_index)) != -1) {
    switch(opt) {
    case 'f':
    {
      filename = optarg;
      file_name_set = 1;
    }
    break;
    case 'g':
    {
      groupname = optarg;
      size_t j;
      for (j=0; j<n_groups; j++)
        if (strcmp(groupname,groups[j])==0)
          break;
      if (j == n_groups) {
        fprintf(stderr,"Unknown group\n");
        return -1;
      }
      group_name_set = 1;
    }
    break;
    case 'h': printf(usage,argv[0]);
      return -1;
    default : fprintf(stderr,usage,argv[0]);
      return -1;
    }
  }
  if (group_name_set == 0) groupname =  groups[0];
  if (file_name_set == 0) filename =  default_file_name;
  strcat(dset,groupname);
  strcat(dset,"/stack");
}

7.5.1.5 `<<adu_drift_test-read-data>>`

We can directly reuse <<print-exposure-open-FILE-and-read-DSET>> here:

<<print-exposure-open-FILE-and-read-DSET>>

7.5.1.6 `<<adu_drift_test-close-file>>`

Same as above…

<<print-exposure-close-file-free-data>>

7.5.1.7 `<<adu_drift_test-fit>>`

double stat[nrow*ncol];
size_t stat_idx=0;
double x[nsamp];
for (size_t i=0; i<nsamp; i++)
  x[i] = (double) i;
double c0, c1, cov00, cov01, cov11, sumsq;
for (size_t i=0; i<nrow; i++) {
  for (size_t j=0; j<ncol; j++) {
    double adu[nsamp];
    for (size_t k=0; k<nsamp; k++)
      adu[k] = (double) data[(i*ncol+j)*nsamp+k];
    gsl_fit_linear (x, 1, adu, 1, nsamp, &c0,
		    &c1, &cov00, &cov01, &cov11, 
		    &sumsq);
    stat[stat_idx] = c1/sqrt(cov11);
    stat_idx++;
  }
}

7.5.1.8 `<<adu_drift_test-print-results>>`

for (size_t i=0; i<nrow*ncol; i++) 
  printf("%g\n",stat[i]);

7.5.1.9 Compile `adu_drift_test`

We compile the code with:

gcc -W -g -o code/adu_drift_test code/adu_drift_test.c -lgsl -lgslcblas -lhdf5 -lhdf5_hl -lm -std=gnu11

7.5.1.10 Use, test, etc of `adu_drift_test`

We use adu_drift_test on the first group with:

./code/adu_drift_test --group=10ms > adu_drift_test_10ms
head adu_drift_test_10ms

We can use valgrind on it with:

valgrind ./code/adu_drift_test --group=20ms > adu_drift_test_20ms

==7985== Memcheck, a memory error detector
==7985== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==7985== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==7985== Command: ./code/adu_drift_test --group=20ms
==7985== 
==7985== 
==7985== HEAP SUMMARY:
==7985==     in use at exit: 0 bytes in 0 blocks
==7985==   total heap usage: 6,560 allocs, 6,560 frees, 13,435,720 bytes allocated
==7985== 
==7985== All heap blocks were freed -- no leaks are possible
==7985== 
==7985== For counts of detected and suppressed errors, rerun with: -v
==7985== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

7.5.1.11 Computation of the drift statistic for all the groups

We are using a loop from the shell

touch drift_for_all
for grp in {10..100..10}ms ; do
  ./code/adu_drift_test --group=$grp >> drift_for_all
done

7.5.2 Histogram building program

It is handy to have a program able to return an histogram if necessary, we now write one adapted from the gsl-histogram program (that comes with the gsl library and whose code is given in the histogram examples). The program reads data from the stdin and prints the resulting histogram to the sdtout.

7.5.2.1 `mk_histogram` skeleton

<<mk_histogram-include>>
int main(int argc, char *argv[])
{
  <<mk_histogram-usage>>
  <<mk_histogram-args>>
  <<mk_histogram-read-data>>
  <<mk_histogram-make-histo>>
  <<mk_histogram-print-results>>
  <<mk_histogram-free-memory>>
  return 0;
}

7.5.2.2 `<<mk_histogram-include>>`

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <getopt.h>
#include <math.h>
#include <gsl/gsl_vector.h>
#include <gsl/gsl_histogram.h>

7.5.2.3 `<<mk_histogram-usage>>`

static char usage[] = \
  "usage: %s --sample-size=integer [--min=real] [--max=real] ...\n"
  "          ... [--n_bins=integer] [--unormalize]\n\n"
  "  --sample-size <positive integer>: the sample size\n"									
  "  --min <double>: the left boundary of the histogram (default: min value of the data)\n"
  "  --max <double>: the right boundary of the histogram (default: max value of the data)\n"
  "  --n_bins <integer>: the number of bins (default: max-min/sample size/10)\n"
  "  --unormalize: if set the histogram is not normalized\n\n"
  " The program reads data from the stdin builts the histogram and prints it\n"
  " to the stdout.\n\n";

7.5.2.4 `<<mk_histogram-args>>`

size_t sample_size = 0;
size_t n_bins;
double min,max;
int normalize=1,min_set=0,max_set=0,bins_set=0;
{int opt;
  static struct option long_options[] = {
    {"sample_size",required_argument,NULL,'s'},
    {"min",optional_argument,NULL,'d'},
    {"max",optional_argument,NULL,'m'},
    {"n_bins",optional_argument,NULL,'b'},
    {"unormalize",no_argument,NULL,'u'},
    {"help",no_argument,NULL,'h'},
    {NULL,0,NULL,0}
  };
  int long_index =0;
  while ((opt = getopt_long(argc,argv,
                            "hub:s:d:m:",
                            long_options,       
                            &long_index)) != -1) {
    switch(opt) {
    case 'u':
    {
      normalize = 0;
    }
    break;
    case 'b':
    {
      n_bins = (size_t) atoi(optarg);
      bins_set = 1;
    }
    break;
    case 's':
    {
      sample_size = (size_t) atoi(optarg);
    }
    break;
    case 'd':
    {
      min = (double) atof(optarg);
      min_set = 1;
    }
    break;
    case 'm':
    {
      max = (double) atof(optarg);
      max_set = 1;
    }
    break;
    case 'h': printf(usage,argv[0]);
      return -1;
    default : fprintf(stderr,usage,argv[0]);
      return -1;
    }
  }
  if (min_set & max_set & min>max) {
    fprintf(stderr,"min > max! Not allowed.\n");
    return -1;
  }
}

7.5.2.5 `<<mk_histogram-read-data>>`

double data[sample_size];
size_t data_idx=0;
double x;
while (fscanf(stdin, "%lg", &x) == 1 & \
       data_idx < sample_size) {
  data[data_idx] = x;
  data_idx++;
}

7.5.2.6 `<<mk_histogram-make-histo>>`

if (!min_set | !max_set) {
  gsl_vector_view data_v = gsl_vector_view_array (data, sample_size);
  double xmin,xmax;
  gsl_vector_minmax(&data_v.vector,&xmin,&xmax);
  if (!min_set)
    min = xmin;
  if (!max_set)
    max = xmax;
}
if (!bins_set)
  n_bins = (size_t) ceil((max-min)*sample_size/10);
gsl_histogram *h = gsl_histogram_alloc (n_bins);
gsl_histogram_set_ranges_uniform (h, min, max);
for (size_t i=0; i<sample_size; i++)
  gsl_histogram_increment(h, data[i]);
if (normalize)
  gsl_histogram_scale (h,(double)n_bins/(sample_size*(max-min)));

7.5.2.7 `<<mk_histogram-print-results>>`

gsl_histogram_fprintf (stdout, h, "%g", "%g");

7.5.2.8 `<<mk_histogram-free-memory>>`

gsl_histogram_free (h);

7.5.2.9 Compile `mk_histogram`

We compile the code with:

gcc -W -g -o code/mk_histogram code/mk_histogram.c -lgsl -lgslcblas -lm -std=gnu11

7.5.2.10 Check `mk_histogram`

We pass the code through valgrind. We get the only required parameter, the number of lines in the file, through a call to wc:

file_name=drift_for_all
samp_size=`wc -l $file_name | sed -e "s/ $file_name//g"`
valgrind ./code/mk_histogram --sample_size=$samp_size \
	 < $file_name > dump

==3372== Memcheck, a memory error detector
==3372== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==3372== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==3372== Command: ./code/mk_histogram --sample_size=48000
==3372== 
==3372== 
==3372== HEAP SUMMARY:
==3372==     in use at exit: 0 bytes in 0 blocks
==3372==   total heap usage: 5 allocs, 5 frees, 741,088 bytes allocated
==3372== 
==3372== All heap blocks were freed -- no leaks are possible
==3372== 
==3372== For counts of detected and suppressed errors, rerun with: -v
==3372== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

7.5.3 Making the figure

We make the figure comparing the histogram of scaled slope coefficients with the theoretical $t_{97}$ density as shown next. One trick is that we have to define within gnuplot a function returning the t-distribution probability density function. The number of bins is not the default one and was chosen after a few trials:

t_pdf(x,nu) = gamma(.5*(nu+1))*(1+x**2/nu)**(-.5*(nu+1))/sqrt(pi*nu)/gamma(.5*nu)
unset key
set grid
set xlabel "Scaled slope coefficient"
set ylabel "Density"
plot [-5:5][0:0.45] t_pdf(x,97) with lines linecolor rgb "orange" linewidth 3,\
     "<./code/mk_histogram --sample_size=48000 --n_bins=200 < drift_for_all"\
     using 1:3 with steps linecolor rgb "black" linewidth 2

7.6 Correlation coefficients between sequences of nearest neighbor pixels

The next global test we apply looks at correlation between the ADU sequences of pairs of pixels that are nearest neighbors.

7.6.1 Program `adu_corr_test` definition

7.6.1.1 `<<adu_corr_test>>` skeleton

<<adu_corr_test-include>>
int main(int argc, char *argv[])
{
  <<adu_corr_test-usage>>
  <<adu_drift_test-args>>
  <<adu_drift_test-read-data>>
  <<adu_corr_test-corr-and-print>>
  <<adu_drift_test-close-file>>
  return 0;
}

7.6.1.2 `<<adu_corr_test-include>>`

We are dealing with the HDF5 file so we need the corresponding headers and the gsl_stats_correlation signature is in header gsl_statistics_double.h:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <getopt.h>
#include <gsl/gsl_statistics_double.h>
#include <hdf5.h>
#include <hdf5_hl.h>

7.6.1.3 `<<adu_corr_test-usage>>`

static char usage[] = \
  "usage: %s [--group=string] [--file=string]\n\n"\
  "  --group <character string>: the name of the group (eg, 10ms the default)\n"
  "  --file <character string>: data file name (default CCD_calibration.hdf5)\n\n"
  " The program opens 'file', gets to dataset 'group/stack', gets the correlation\n"
  " coefficient of every pair of nearest neighbor ADU sequences and prints them to\n"
  " the stdout.\n"
  " Under the null hypothesis these correlation coefficients follow a Gaussian\n"
  " distribution centered on 0 with a variance of 1/100.\n\n";

7.6.1.4 `<<adu_corr_test-corr-and-print>>`

double corr;
for (size_t i=1; i<(nrow-1); i+=2) {
  for (size_t j=1; j<(ncol-1); j+=2) {
    double adu_ref[nsamp];
    for (size_t idx=0; idx<nsamp; idx++)
      adu_ref[idx] = (double) data[(i*ncol+j)*nsamp+idx];
    double adu_test[nsamp];
    for (size_t idx=0; idx<nsamp; idx++) 
      adu_test[idx] = (double) data[((i-1)*ncol+j)*nsamp+idx];
    // look at pixel one row above
    corr = gsl_stats_correlation(adu_ref, 1, adu_test, 1, nsamp);
    printf("%g\n",corr);
    for (size_t idx=0; idx<nsamp; idx++)
      adu_test[idx] = (double) data[((i+1)*ncol+j)*nsamp+idx];
    // look at pixel one row below
    corr = gsl_stats_correlation(adu_ref, 1, adu_test, 1, nsamp);
    printf("%g\n",corr);
    for (size_t idx=0; idx<nsamp; idx++)
      adu_test[idx] = (double) data[(i*ncol+(j-1))*nsamp+idx];
    // look at pixel one column to the left
    corr = gsl_stats_correlation(adu_ref, 1, adu_test, 1, nsamp);
    printf("%g\n",corr);
    for (size_t idx=0; idx<nsamp; idx++)
      adu_test[idx] = (double) data[(i*ncol+(j+1))*nsamp+idx];
    // look at pixel one column to the right
    corr = gsl_stats_correlation(adu_ref, 1, adu_test, 1, nsamp);    
    printf("%g\n",corr);
  }
}

7.6.1.5 Compile and check `adu_corr_test`

We compile the code with:

gcc -W -g -o code/adu_corr_test code/adu_corr_test.c -lgsl -lgslcblas -lhdf5 -lhdf5_hl -lm -std=gnu11

We check the code with valgrind as follows:

valgrind ./code/adu_corr_test --group=10ms > dump

==4243== Memcheck, a memory error detector
==4243== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==4243== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==4243== Command: ./code/adu_corr_test --group=10ms
==4243== 
==4243== 
==4243== HEAP SUMMARY:
==4243==     in use at exit: 0 bytes in 0 blocks
==4243==   total heap usage: 6,603 allocs, 6,603 frees, 13,987,160 bytes allocated
==4243== 
==4243== All heap blocks were freed -- no leaks are possible
==4243== 
==4243== For counts of detected and suppressed errors, rerun with: -v
==4243== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

7.6.1.6 Computation of the correlation statistic for all the groups

We are using a loop from the shell

touch corr_for_all
for grp in {10..100..10}ms ; do
  ./code/adu_corr_test --group=$grp >> corr_for_all
done

7.6.2 Making the figure

We make the figure comparing the histogram of correlation coefficients with the theoretical normal density centered on zero with a variance of 1/100 as shown next. One trick is that we have to define within gnuplot a function returning the normal probability density function. The number of bins is not the default one and was chosen after a few trials:

gogo(x,var) = exp(-0.5*x**2/var)/sqrt(2*pi*var)
unset key
set grid
set xlabel "Correlation coefficient between nearest neighbor pixels"
set ylabel "Density"
plot [-0.5:0.5][0:4.5] gogo(x,0.01) with lines linecolor rgb "orange" linewidth 3,\
     "<./code/mk_histogram --sample_size=45240 --n_bins=200 < corr_for_all"\
     using 1:3 with steps linecolor rgb "black" linewidth 2

7.7 Getting the CCD parameters: G and $σ^2_{ro}$

7.7.1 Obtaining the data for the regression analysis

Now that we are reasonably confident that our calibration data are usable, we write a program that returns the mean ADU and the ADU variance for each pixel and each exposure time. We make our program take as optional parameters the CCD gain and $σ^2_{ro}$. If they are set then variance stabilization is performed.

7.7.1.1 Program `adu_mu_s2` skeleton

<<adu_mu_s2-include>>
int main(int argc, char *argv[])
{
  <<adu_mu_s2-usage>>
  <<adu_mu_s2-args>>
  <<adu_mu_s2-open-file>>
  <<adu_mu_s2-get-mean-variance-and-print>>
  <<adu_mu_s2-close-file-free-data>>  
  return 0;
}

7.7.1.2 `<<adu_mu_s2-include>>`

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <getopt.h>
#include <math.h>
#include <hdf5.h>
#include <hdf5_hl.h>
#include <gsl/gsl_statistics_double.h>

7.7.1.3 `<<adu_mu_s2-usage>>`

static char usage[] = \
  "usage: %s [--file=string] [--gain=real] [--variance=real]\n\n"\
  "  --file <character string>: data file name (default CCD_calibration.hdf5)\n"
  "  --gain <positive real>: CCD gain (default 1)\n"
  "  --variance <positive or null real>: CCD gain (default 0)\n\n"
  " The program opens 'file', and for each dataset, for each pixel\n"
  " computes the mean ADU and the ADU variance. If 'gain' and 'variance'\n"
  " are diffferent from 1 and 0 repectively then variance stabilization is\n"
  " performed, that is, the program works with 2*sqrt(ADU/gain+variance)\n"
  " instead of working with the ADU directly.\n"
  " The program prints its results to the stdout with the mean on the first\n"
  " column and the variance on the second.\n\n";

7.7.1.4 `<<adu_mu_s2-args>>`

size_t frame = 0;
char default_file_name[] = "CCD_calibration.hdf5";
char *filename;
double gain = 1.0;
double read_out_var = 0.0;
{int opt;
  static struct option long_options[] = {
    {"file",optional_argument,NULL,'f'},
    {"gain",optional_argument,NULL,'g'},
    {"variance",optional_argument,NULL,'v'},
    {"help",no_argument,NULL,'h'},
    {NULL,0,NULL,0}
  };
  int long_index =0;
  int file_name_set = 0;
  while ((opt = getopt_long(argc,argv,
                            "hf:g:v:",
                            long_options,       
                            &long_index)) != -1) {
    switch(opt) {
    case 'f':
    {
      filename = optarg;
      file_name_set = 1;
    }
    break;
    case 'g':
    {
      gain = (double) atof(optarg);
      if (gain <= 0) {
        fprintf(stderr,"gain must be > 0\n");
        return -1;
      }
    }
    break;
    case 'v':
    {
      read_out_var = (double) atof(optarg);
      if (read_out_var < 0) {
        fprintf(stderr,"variance must be >= 0\n");
        return -1;
      }
    }
    break;
    case 'h': printf(usage,argv[0]);
      return -1;
    default : fprintf(stderr,usage,argv[0]);
      return -1;
    }
  }
  if (file_name_set == 0) filename =  default_file_name;
}

7.7.1.5 `<<adu_mu_s2-open-file>>`

// Open FILE
hid_t file_id = H5Fopen (filename, H5F_ACC_RDONLY, H5P_DEFAULT);
// Get dimensions of 3D object contained in DSET
size_t n_groups = 10;
char *groups[] = {"10ms","20ms","30ms","40ms","50ms",
		  "60ms","70ms","80ms","90ms","100ms"};
char *groupname = groups[0];
char dset[25];
dset[0] = '/';
dset[1] = '\0';
strcat(dset,groupname);
strcat(dset,"/stack");
hsize_t dims[3];
H5LTget_dataset_info(file_id,dset,dims,NULL,NULL);
size_t nrow = (size_t) dims[0];
size_t ncol = (size_t) dims[1];
size_t nsamp = (size_t) dims[2];
// Read dset in data
int *data = malloc(nrow*ncol*nsamp*sizeof(int));

7.7.1.6 `<<adu_mu_s2-get-mean-variance-and-print>>`

for (size_t g_idx=0; g_idx<n_groups; g_idx++) {
  char *groupname = groups[g_idx];
  dset[0] = '/';
  dset[1] = '\0';
  strcat(dset,groupname);
  strcat(dset,"/stack");
  H5LTread_dataset_int(file_id,dset,data);
  for (size_t i=0; i<nrow; i++) {
    for (size_t j=0; j<ncol; j++) {
      double y[nsamp];
      if (gain == 1.0 & read_out_var == 0.0) {
	for (size_t k=0; k<nsamp; k++)
	  // No variance stabilization
	  y[k] = (double) data[(i*ncol+j)*nsamp+k];
      } else {
	for (size_t k=0; k<nsamp; k++)
	  // Do variance stabilization
	  y[k] = 2*sqrt(((double) data[(i*ncol+j)*nsamp+k])/gain+read_out_var);
      }
      double mean = gsl_stats_mean(y, 1, nsamp);
      double variance = gsl_stats_variance_m(y, 1, nsamp, mean);
      printf("%g %g\n",mean,variance);
    }
  }
}

7.7.1.7 `<<adu_mu_s2-close-file-free-data>>`

free(data);
H5Fclose (file_id);

7.7.1.8 Compile and test `adu_mu_s2`

We compile the code with:

gcc -W -g -o code/adu_mu_s2 code/adu_mu_s2.c -lgsl -lgslcblas -lhdf5 -lhdf5_hl -lm -std=gnu11

We test the memory usage of the program with valgrind:

valgrind  ./code/adu_mu_s2 > dump

==6471== Memcheck, a memory error detector
==6471== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==6471== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==6471== Command: ./code/adu_mu_s2
==6471== 
==6471== 
==6471== HEAP SUMMARY:
==6471==     in use at exit: 0 bytes in 0 blocks
==6471==   total heap usage: 18,263 allocs, 18,263 frees, 114,336,201 bytes allocated
==6471== 
==6471== All heap blocks were freed -- no leaks are possible
==6471== 
==6471== For counts of detected and suppressed errors, rerun with: -v
==6471== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

7.7.2 Making the first figure

set xlabel "Mean(ADU)"
set ylabel "Variance(ADU)"
set grid
unset key
plot "< ./code/adu_mu_s2" using 1:2 with points \
     pointtype 0 linecolor rgb "black"

7.7.3 Doing the weighted linear regression

We write a program reading from the stdin the output of adu_mu_s2 and doing a weighted linear regression on these data.

7.7.3.1 `adu_mu_s2_fit` skeleton

<<adu_mu_s2_fit-include>>
int main(int argc, char *argv[])
{
  <<adu_mu_s2_fit-usage>>
  <<adu_mu_s2_fit-args>>
  <<adu_mu_s2_fit-read-data>>
  <<adu_mu_s2_fit-fit>>
  <<adu_mu_s2_fit-print-results>>
  return 0;
}

7.7.3.2 `<<adu_mu_s2_fit-include>>`

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <getopt.h>
#include <math.h>
#include <gsl/gsl_math.h>
#include <gsl/gsl_fit.h>
#include <gsl/gsl_cdf.h>

7.7.3.3 `<<adu_mu_s2_fit-usage>>`

static char usage[] = \
  "usage: %s --sample_size=integer\n\n"\
  "  --sample_size <positive integer>: the number of pixels times the number of exposures\n\n"
  " The program reads data from the stdin and performs a weighter linear fit.\n"
  " It prints to the stdout the results\n\n";

7.7.3.4 `<<adu_mu_s2_fit-args>>`

size_t sample_size;
{int opt;
  static struct option long_options[] = {
    {"sample_size",required_argument,NULL,'s'},
    {"help",no_argument,NULL,'h'},
    {NULL,0,NULL,0}
  };
  int long_index =0;
  while ((opt = getopt_long(argc,argv,
                            "hs:",
                            long_options,       
                            &long_index)) != -1) {
    switch(opt) {
    case 's':
    {
      sample_size = (size_t) atoi(optarg);
    }
    break;
    case 'h': printf(usage,argv[0]);
      return -1;
    default : fprintf(stderr,usage,argv[0]);
      return -1;
    }
  }
}

7.7.3.5 `<<adu_mu_s2_fit-read-data>>`

double mean[sample_size];
double variance[sample_size];
double w[sample_size];
size_t data_idx=0;
double mu,s2;
while (fscanf(stdin, "%lg %lg", &mu, &s2) == 2 &	\
       data_idx < sample_size) {
  mean[data_idx] = mu;
  variance[data_idx] = s2;
  w[data_idx] = 0.5*99.0/s2/s2;
  data_idx++;
}

7.7.3.6 `<<adu_mu_s2_fit-fit>>`

double c0, c1, cov00, cov01, cov11, chisq;

gsl_fit_wlinear(mean, 1, w, 1, variance, 1, sample_size, 
		&c0, &c1, &cov00, &cov01, &cov11, 
		&chisq);

7.7.3.7 `<<adu_mu_s2_fit-print-results>>`

printf ("Best fit: Variance(ADU) = %g + %g Mean(ADU)\n", c0, c1);
printf ("Covariance matrix:\n");
printf ("[ %g, %g\n  %g, %g]\n", 
        cov00, cov01, cov01, cov11);
printf ("Residual sum of squares = %g\n", chisq);
double gain_se = sqrt(cov11)*gsl_cdf_tdist_Pinv(0.975,(double)sample_size-3.0);
printf ("The estimated gain is: %g\n",c1);
printf ("A 0.95 confidence interval for the gain is:\n (%g,%g)\n",
        c1-gain_se,c1+gain_se);
s2 = c0/gsl_pow_2(c1);
printf ("The estimated read-out variance is: %g\n",s2);
double s2_se = sqrt(cov00/gsl_pow_2(c1)+cov11*gsl_pow_2(2*c0/gsl_pow_3(c1)));
s2_se *= gsl_cdf_tdist_Pinv(0.975,(double)sample_size-3.0);
printf ("A 0.95 confidence interval for the read-out variance is:\n (%g,%g)\n",
        s2-s2_se,s2+s2_se);

7.7.3.8 Compile, run and test `adu_mu_s2_fit`

We compile the code with:

gcc -W -g -o code/adu_mu_s2_fit code/adu_mu_s2_fit.c -lgsl -lgslcblas -lhdf5 -lhdf5_hl -lm -std=gnu11

We run it with:

./code/adu_mu_s2 | ./code/adu_mu_s2_fit --sample_size=48000

Best fit: Variance(ADU) = 5.61141 + 0.139838 Mean(ADU)
Covariance matrix:
[ 0.02125, -1.57125e-05
  -1.57125e-05, 2.05361e-08]
Residual sum of squares = 49764.5
The estimated gain is: 0.139838
A 0.95 confidence interval for the gain is:
 (0.139557,0.140119)
The estimated read-out variance is: 286.959
A 0.95 confidence interval for the read-out variance is:
 (284.613,289.305)

We use valgrind with:

./code/adu_mu_s2 > dump 
valgrind ./code/adu_mu_s2_fit --sample_size=48000 < dump > toto

==7471== Memcheck, a memory error detector
==7471== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==7471== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==7471== Command: ./code/adu_mu_s2_fit --sample_size=48000
==7471== 
==7471== 
==7471== HEAP SUMMARY:
==7471==     in use at exit: 0 bytes in 0 blocks
==7471==   total heap usage: 2 allocs, 2 frees, 8,192 bytes allocated
==7471== 
==7471== All heap blocks were freed -- no leaks are possible
==7471== 
==7471== For counts of detected and suppressed errors, rerun with: -v
==7471== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

7.7.4 Doing the fit figures

set xlabel "Mean(ADU)"
set ylabel "Variance(ADU)"
set grid
unset key
g = 0.14
s2 = 290
pred(x) = s2*g**2 + g*x
plot "< ./code/adu_mu_s2" using 1:2 with points \
     pointtype 0 linecolor rgb "black",\
     pred(x) with lines linewidth 2 linecolor rgb "red"

set xlabel "Fitted Value"
set ylabel "Normalized residuals"
set grid
unset key
g = 0.14
s2 = 290
pred(x) = s2*g**2 + g*x
zero(x) = 0
plot "< ./code/adu_mu_s2" using (pred($1)):((pred($1)-$2)*sqrt(99.0*0.5)/$2) \
     with points pointtype 0 linecolor rgb "black",\
     zero(x) with lines linewidth 2 linecolor rgb "red"

7.8 Variance stabilization

7.8.1 Doing the figure

set xlabel "Mean of var. stab. ADU"
set ylabel "Variance of var. stab. ADU"
set grid
unset key
one(x) = 1
plot "< ./code/adu_mu_s2 --gain=0.14 --variance=290" using 1:2 with points \
     pointtype 0 linecolor rgb "black",\
     one(x) with lines linecolor rgb "red" linewidth 2

8 Automatic ROI detection

Now that we know how to stabilize the variance we can write a program that detects automatically the “Region Of Interest” (ROI), based on the simple idea that a ROI is a set of pixels whose fluorescence intensity is not constant in time (“something happens there”).

8.1 Program `roi`

8.1.0.1 `roi` skeleton

<<roi-include>>
int main(int argc, char *argv[])
{
  <<roi-usage>>
  <<roi-args>>
  <<roi-open-file-and-read-dset>>
  <<roi-stabilize-variance-and-get-pRSS>>
  <<roi-stabilize-print-results>>
  <<roi-close-file-free-memory>>  
  return 0;
}

8.1.0.2 `<<roi-include>>`

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <getopt.h>
#include <math.h>
#include <hdf5.h>
#include <hdf5_hl.h>
#include <gsl/gsl_math.h>
#include <gsl/gsl_matrix.h>
#include <gsl/gsl_cdf.h>
#include <gsl/gsl_statistics_double.h>

8.1.0.3 `<<roi-usage>>`

static char usage[] = \
  "usage: %s -f --file=string -d --dset=string -g --CCD_gain=real ...\n"
  "          ... -v --CCD_variance=real [-t --threshold=real] [-m --return_matrix]\n\n"
  "  -f --file <character string>: data file name (e.g. Data_POMC.hdf5)\n"
  "  -d --dset <character string>: data set name (e.g. 'stack') shoud be a 3D object\n"
  "  -g --CCD_gain <positive real>: CCD gain (e.g. 0.14)\n"
  "  -v --CCV_variance <positive real>: CCD read-out variance (e.g. 290)\n"
  "  -m --return_matrix if set a matrix whose values at pixel i,j are log(1-F(rss_i,j))\n"
  "     where F is the CDF of a chi square distribution with nsamp-1 degrees of freedome\n"
  "     nsamp being the number of measurements (exposures) made on each pixel. If unset\n"
  "     (default) the list of pixel in the ROI is returned as (i1,j1),(i2,j2),... \n"
  "  -t --threshold <negative real>: real number (e.g. -300 default), the level\n"
  "     under which the  log(1-F(rss_i,j)) is considered significant\n\n"
  " The program opens 'file' and gets dataset 'dset' (that should be a 3D array) containing\n"
  " ADUs. For each pixel i,j at each time index k it stabilizes the variance:\n"
  "   adu_i,j,k -> z_i,j,k = 2*sqrt(adu_i,j,k/CCD_gain+CCD_variance)\n"
  " the variance of the z_i,j,k should then be UNIFORMLY 1. It then computes the mean value\n"
  " along time for each pixel:\n"
  "   z_i,j_m = (z_i,j,1 + ... + z_i,j,K)/K\n"
  " and the residual sum of squares statistic:\n"
  "   rss_i,j = ((z_i,j,1-z_i,j_m)^2+...+(z_i,j,K-z_i,j_m)^2))\n"
  " under the null hypothesis (no signal in the pixel) this statistic should have a chi-square\n"
  " distribution with K-1 degrees of freedom. A matrix whose element i,j is:\n"
  "   log(1-F(rss_i,j))\n"
  " where F is the CDF of a chi-square distribution with K-1 degrees of freedom is constructed.\n"
  " If parameter '--return_matrix' is set (not the default) this matrix is printed to the stdout,\n"
  " otherwise the value of the F(rss_i,j) is compared to the Bonferroni corrected threshold set\n"
  " by '--threshold' (default 0.99) and if the value is larger, 'i j' is printed to the stdout.\n"
  " This is done for each pixel and the printed pairs are separated by '\n'.\n\n";

8.1.0.4 `<<roi-args>>`

char *filename;
char dset[25];
dset[0] = '/';
dset[1] = '\0';
double threshold = -300.;
double gain,variance;
int print_matrix=0;
{int opt;
  static struct option long_options[] = {
    {"file",required_argument,NULL,'f'},
    {"dset",required_argument,NULL,'d'},
    {"CCD_gain",required_argument,NULL,'g'},
    {"CCD_variance",required_argument,NULL,'v'},
    {"threshold",optional_argument,NULL,'t'},
    {"return_matrix",no_argument,NULL,'m'},
    {"help",no_argument,NULL,'h'},
    {NULL,0,NULL,0}
  };
  char *dset_name;
  int long_index =0;
  while ((opt = getopt_long(argc,argv,
                            "hmf:d:g:v:t:",
                            long_options,       
                            &long_index)) != -1) {
    switch(opt) {
    case 'f':
    {
      filename = optarg;
    }
    break;
    case 'd':
    {
      dset_name = optarg;
    }
    break;
    case 'g':
    {
      gain = (double) atof(optarg);
      if (gain <= 0.0) {
        fprintf(stderr,"CCD gain must be > 0\n");
        return -1;
      }
    }
    break;
    case 'v':
    {
      variance = (double) atof(optarg);
      if (variance < 0.0) {
        fprintf(stderr,"CCD variance must be >= 0\n");
        return -1;
      }
    }
    break;
    case 't':
    {
      threshold = (double) atof(optarg);
      if (0.0 <= threshold) {
        fprintf(stderr,"threshold should be\n");
        return -1;
      }
    }
    break;
    case 'm':
    {
      print_matrix=1;
    }
    break;
    case 'h': printf(usage,argv[0]);
      return -1;
    default : fprintf(stderr,usage,argv[0]);
      return -1;
    }
  }
  strcat(dset,dset_name);
}

8.1.0.5 `<<roi-open-file-and-read-dset>>`

// Open FILE
hid_t file_id = H5Fopen (filename, H5F_ACC_RDONLY, H5P_DEFAULT);
// Get dimensions of 3D object contained in DSET
hsize_t dims[3];
H5LTget_dataset_info(file_id,dset,dims,NULL,NULL);
size_t nrow = (size_t) dims[0];
size_t ncol = (size_t) dims[1];
size_t nsamp = (size_t) dims[2];
// Read dset in data
int *data = malloc(nrow*ncol*nsamp*sizeof(int));
H5LTread_dataset_int(file_id,dset,data);
gsl_matrix *pRSS = gsl_matrix_alloc(nrow,ncol);

8.1.0.6 `<<roi-close-file-free-memory>>`

free(data);
gsl_matrix_free(pRSS);
H5Fclose (file_id);

8.1.0.7 `<<roi-stabilize-variance-and-get-pRSS>>`

for (size_t i=0; i<nrow; i++) {
  for (size_t j=0; j<ncol; j++) {
    double z[nsamp];
    for (size_t k=0; k<nsamp; k++) {
      double adu = (double) data[(i*ncol+j)*nsamp+k];
      z[k] = 2.0*sqrt(adu/gain+variance);
    }
    double rss = gsl_stats_variance(z,1,nsamp)*((double)nsamp-1.0); 
    gsl_matrix_set(pRSS,i,j,log(gsl_cdf_chisq_Q(rss, (double)nsamp-1.0)));
  }
}

8.1.0.8 `<<roi-stabilize-print-results>>`

if (print_matrix) {
  printf("# Matrix with %d rows and %d columns\n",
	 (int) nrow, (int) ncol);
  for (size_t i=0; i<nrow; i++) {
    printf("%g",gsl_matrix_get(pRSS,i,0));
    for (size_t j=1; j<ncol; j++)
      printf(", %g",gsl_matrix_get(pRSS,i,j));
    printf("\n");
  }
} else {
  size_t n_pixel=0;
  for (size_t i=0; i<nrow; i++) {
    for (size_t j=1; j<ncol; j++) {
      double stat = gsl_matrix_get(pRSS,i,j);
      if (stat <= threshold) {
	// Pixel part of ROI
	if (n_pixel==0)
	  printf("%d %d", (int) i, (int) j);
	else
	  printf("\n%d %d", (int) i, (int) j);
	n_pixel++;
      }
    }
  }
  if (n_pixel > 0)
    printf("\n");
  fprintf(stderr,"%d pixels in the ROI\n", (int) n_pixel);
}

8.1.0.9 Compile, run and test `roi`

We compile the code with:

gcc -W -g -o code/roi code/roi.c -lgsl -lgslcblas -lhdf5 -lhdf5_hl -lm -std=gnu11

We run it storing the output matrix in rss_stat with:

./code/roi --file=Data_POMC.hdf5 --dset=stack --CCD_gain=0.14 \
	   --CCD_variance=290 -m > rss_stat

We can also use valgrind (and the short version of the arguments):

valgrind ./code/roi -f Data_POMC.hdf5 -d stack -g 0.14 \
	   -v 290 -m > dump

==7239== Memcheck, a memory error detector
==7239== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==7239== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==7239== Command: ./code/roi -f Data_POMC.hdf5 -d stack -g 0.14 -v 290 -m
==7239== 
==7239== 
==7239== HEAP SUMMARY:
==7239==     in use at exit: 0 bytes in 0 blocks
==7239==   total heap usage: 6,579 allocs, 6,579 frees, 20,302,218 bytes allocated
==7239== 
==7239== All heap blocks were freed -- no leaks are possible
==7239== 
==7239== For counts of detected and suppressed errors, rerun with: -v
==7239== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

If instead of the matrix we want the indexes of the pixels making the ROI we use:

./code/roi -f Data_POMC.hdf5 -d stack -g 0.14 -v 290 > roi_pxl
cat roi_pxl

12 pixels in the ROI
26 36
26 37
26 38
26 39
27 36
27 37
27 38
27 39
28 36
28 37
28 38
28 39

8.2 Making the figure

set palette grey
set multiplot layout 1,2
set xrange [0:79]
set yrange [0:59]
set title "Full field"
plot 'rss_stat' matrix with image
set xrange [33:43]
set yrange [23:33]
set title "Cell body"
plot 'rss_stat' matrix with image
unset multiplot

9 Signal estimation

We reached the last task! Having identfied automatically the pixels of the ROI, we now estimate our model.

9.1 The model: a piecewise constant function

We have been (very) conservative and have kept as our ROI the pixels having an $log\left(1 - F_{χ_{K-1}^2}(RSS)\right) ≤ -300$.
We are left with 12 pixels.
We are going to model the fluorescence intensity of each of these pixels by: \[S_{i,j}(t) = φ_{i,j} \, f(t) + b \; ,\] where $f(t)$ is a signal time course to all pixels of the ROI, $φ_{i,j}$ is a pixel specific parameter and $b$ is a background fluorescence assumed identical for each pixel.
The time $t$ is in fact a discrete variable, $t = δ \, k$ ($δ$ = 150 ms) and we are seeking a pointwise estimation: $\{f_1,f_2,\ldots,f_K\}$ ($K$ = 168) where $f_k = f(δ \, k)$.
We end up with 12 ($φ_{i,j}$) + 168 ($f_k$) + 1 ($b$) = 181 parameters for 12 x 168 = 2016 measurements.
We need to add a constraint since with our model specification: \[S_{i,j,k} = φ_{i,j} \, f_k + b \; ,\] we can multiply all the $φ_{i,j}$ by 2 and divide all the $f_k$ by 2 and get the same prediction.
We are going to set $f_k=1$ for the first 5 time points (the stimulation comes at the 11th) and our pointwise estimation relates to what is usually done with this type of data, $Δ S(t) / S_0$ (where $S_0$ is a baseline average) through: \[Δ S(t) / S_0 = \frac{S(t) - S_0}{S_0} = f(t) - 1 + \mathrm{noise}\, .\]
Notice that no independent background measurement is used.
With variance stabilization we end up minimizing: \[RSS\left(b,(φ_{i,j}),(f_k)_{k=6,\ldots,168}\right) = ∑_{(i,j) ∈ \mathrm{ROI}} ∑_{k=1}^{168} \left(Z_{ijk}-F_{ijk}\right)^2 \, ,\] where \[Z_{ijk} = 2*\sqrt{ADU_{ijk}/\hat{G}+\hat{σ}_R^2}\] and \[F_{ijk} = 2*\sqrt{φ_{i,j} \, f_k + b + \hat{σ}_R^2}\, .\]
If our model is correct we should have: \[RSS\left(\hat{b},(\hat{φ}_{i,j}),(\hat{f}_k)_{k=6,\ldots,168}\right) ∼ χ^2_{12 × 168 - 176}\, .\]
The method also generates confidence intervals for the estimated parameters.

9.2 Parameters and data organization

We will be using the gsl nonlinear least-squares functions, we will have to define a residual returning function that takes as its first argument a (gsl-)vector of parameters. This was not an issue in Sec. Why does a proper noise model matter? since only a single parameter model was considered. Now we will be dealing with a 175 parameters model. We must therefore decide how these parameters are laid out in the parameter vector. We must also decide how the data are organized.

9.2.1 Data layout

We have P=12 pixels in our ROI and to each of those an ADU sequence with K=168 elements is associated. So we can represent these data as PxK (gsl-)matrix.

9.2.2 Going from the parameters to the model prediction

Our $(f_k)_{k=6,\ldots,168}$ sub-vector (a sub-vector of the parameter vector) can be seen as the long end of K=168 (gsl-)vector whose first five elements are one. The prediction can then be obtained by multiplying the column vector (Px1) of the $φ_{i,j}$ by the row vector (1x168) we just discussed before adding $b$ to each element of the resulting matrix.

9.2.3 Getting initial guesses

We get the initial guess for $(f_k)_{k=6,\ldots,168}$ by taking the mean across all the pixels of the ROI for each value of the time index k and we divide these mean values by the mean of the first five elements. We set arbitrarily b to 100 and we get initial guesses for the $φ_{i,j}$ from the mean value of the first five observations of each pixel to which we subtract b.

9.3 The program

We now write program fit_pwc_to_roi where “pwc” is short for “piecewise constant”.

9.3.1 Code definition

9.3.1.1 `fit_pwc_to_roi` skeleton

<<fit_pwc_to_roi-include>>
<<fit_pwc_to_roi-macro>>
<<fit_pwc_to_roi-data-structure>>
<<fit_pwc_to_roi-residual-fct>>
<<fit_pwc_to_roi-callback-fct>>
int main(int argc, char *argv[])
{
  <<fit_pwc_to_roi-usage>>
  <<fit_pwc_to_roi-args>>
  <<fit_pwc_to_roi-print-info>>
  <<fit_pwc_to_roi-read-data>>
  <<fit_pwc_to_roi-initial-guess>>
  <<fit_pwc_to_roi-initialize-data-structure>>
  <<fit_pwc_to_roi-initialize-gsl_multifit_nlinear_fdf>>
  <<fit_pwc_to_roi-allocate-and-initialize-solver>>
  <<fit_pwc_to_roi-inital-rss>>
  <<fit_pwc_to_roi-do-nls>>
  <<fit_pwc_to_roi-print-info-final>>
  <<fit_pwc_to_roi-print-results>>
  <<fit_pwc_to_roi-free-memory>>  
  return 0;
}

9.3.1.2 `<<fit_pwc_to_roi-include>>`

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <getopt.h>
#include <math.h>
#include <hdf5.h>
#include <hdf5_hl.h>
#include <gsl/gsl_math.h>
#include <gsl/gsl_matrix.h>
#include <gsl/gsl_vector.h>
#include <gsl/gsl_blas.h>
#include <gsl/gsl_multifit_nlinear.h>

9.3.1.3 `<<fit_pwc_to_roi-macro>>`

The baseline defined next is the duration (in sample points) during which the signal is assumed constant.

#define baseline 5

9.3.1.4 `<<fit_pwc_to_roi-data-structure>>`

To compute the residuals (difference between observations and predictions) we need the data from the ROI. Member stab_pxl of the following structure will contain a pointer to the (variance stabilized) data organized as a matrix with as many rows as pixels in the ROI and as many columns as sample points (as discussed above). Member variance of the structure will contain the CCD variance.

struct data {
  gsl_matrix * stab_pxl; // matrix of stabilized observations from ROI
  double variance; // the CCD variance
};

9.3.1.5 `<<fit_pwc_to_roi-usage>>`

static char usage[] = \
  "usage: %s -f --file=string -d --dset=string -g --CCD_gain=real ...\n"
  "          ... -v --CCD_variance=real -p --pixels=string \n\n"
  "  -f --file <character string>: data file name (e.g. Data_POMC.hdf5)\n"
  "  -d --dset <character string>: data set name (e.g. 'stack') shoud be a 3D object\n"
  "  -g --CCD_gain <positive real>: CCD gain (e.g. 0.14)\n"
  "  -v --CCV_variance <positive real>: CCD read-out variance (e.g. 290)\n"
  "  -p --pixels <character string>: name of a file containing the row and column\n"
  "       indexes of the pixels belonging to the ROI on two columns\n\n"
  " The program opens 'file' and gets dataset 'dset' (that should be a 3D array) containing\n"
  " ADUs. For each pixel i,j in the list specified by pixels at each time index k it\n"
  " stabilizes the variance:\n"
  "   adu_i,j,k -> z_i,j,k = 2*sqrt(adu_i,j,k/CCD_gain+CCD_variance)\n"
  " the variance of the z_i,j,k should then be UNIFORMLY 1. It then fits by nonlinear\n"
  " least-squares a model where the piecewise constant time course of the 'signal'\n"
  " of each pixel of the ROI is assumed to be the same. This 'signal' is then\n"
  " multiplied by a pixel specific factor (giving a predicted calcium induced)\n"
  " fluorescence and a common autofluorescence is added to each pixel.\n\n"
  " The program prints to the stderr fit diagnostics. It also prints to the stdout\n"
  " the 'signal' values together with the lower and upper limits of a 95% pointwise\n"
  " confidence intervals. Two blanck lines\n"
  " follow and, one as many columns as there are pixels in the ROI, the model\n"
  " predictions for each pixels are printed (a header gives the row and column\n"
  " index of each pixel). After two more blank lines the stabilized observations\n"
  " are printed in the same format.\n\n";

9.3.1.6 `<<fit_pwc_to_roi-args>>`

This code block reads the program parameters. It assigns (after allocating memory if necessary):

filename: pointer to a character string with the name of the data file.
dset: pointer to a character string with the name of the data set.
gain, variance: values of the CCD gain and read-out variance.
pxl_row, pxl_row: 1D arrays containing the row and column indexes of the pixels making the ROI.
nb_pxl: the number of pixels in the ROI.

char *filename;
char dset[25];
dset[0] = '/';
dset[1] = '\0';
double gain,variance;
size_t *pxl_row,*pxl_col;
size_t nb_pxl=0;
{int opt;
  static struct option long_options[] = {
    {"file",required_argument,NULL,'f'},
    {"dset",required_argument,NULL,'d'},
    {"CCD_gain",required_argument,NULL,'g'},
    {"CCD_variance",required_argument,NULL,'v'},
    {"pixels",optional_argument,NULL,'p'},
    {"help",no_argument,NULL,'h'},
    {NULL,0,NULL,0}
  };
  char *dset_name;
  int long_index =0;
  while ((opt = getopt_long(argc,argv,
                            "hf:d:g:v:p:",
                            long_options,       
                            &long_index)) != -1) {
    switch(opt) {
    case 'f':
    {
      filename = optarg;
    }
    break;
    case 'd':
    {
      dset_name = optarg;
    }
    break;
    case 'g':
    {
      gain = (double) atof(optarg);
      if (gain <= 0.0) {
        fprintf(stderr,"CCD gain must be > 0\n");
        return -1;
      }
    }
    break;
    case 'v':
    {
      variance = (double) atof(optarg);
      if (variance < 0.0) {
        fprintf(stderr,"CCD variance must be >= 0\n");
        return -1;
      }
    }
    break;
    case 'p':
    {
      char *fname = optarg;
      char line[BUFSIZ];
      FILE *fp;
      if (NULL == (fp = fopen(fname,"r"))) {
	fprintf(stderr,"Could not open file %s\n",fname);
	return -1;
      }
      while (fgets(line,BUFSIZ,fp))
	nb_pxl++;
      rewind(fp);
      pxl_row = malloc(nb_pxl*sizeof(size_t));
      pxl_col = malloc(nb_pxl*sizeof(size_t));
      int row_idx,col_idx;
      size_t pxl_idx=0;
      while (fscanf(fp,"%d %d",&row_idx,&col_idx)==2) {
	pxl_row[pxl_idx] = (size_t) row_idx;
	pxl_col[pxl_idx] = (size_t) col_idx;
	pxl_idx++;
      }
      fclose(fp);
    }
    break;
    case 'h': printf(usage,argv[0]);
      return -1;
    default : fprintf(stderr,usage,argv[0]);
      return -1;
    }
  }
  strcat(dset,dset_name);
}

9.3.1.7 `<<fit_pwc_to_roi-free-memory>>`

As usual this code block frees the allocated memory.

free(pxl_row);
free(pxl_col);
gsl_matrix_free(pxl_data);
gsl_vector_free(f);
gsl_vector_free(phi);
gsl_vector_free(par);
gsl_multifit_nlinear_free (w);
gsl_matrix_free(covar);

9.3.1.8 `<<fit_pwc_to_roi-print-info>>`

This code block prints to the stderr some general information concerning the fit that will be performed next.

fprintf(stderr,"Fitting now piecewise constant signal within ROI for\n"
	"dataset %s from file %s\n",dset,filename);
fprintf(stderr,"The row and column indexes of the pixels making the\n"
	"ROI are:\n");
fprintf(stderr,"  (%d,%d)", (int) pxl_row[0],
	  (int) pxl_col[0]);
for (size_t pxl_idx=1; pxl_idx<nb_pxl; pxl_idx++)
  if (pxl_row[pxl_idx] > pxl_row[pxl_idx-1])
    fprintf(stderr,"\n  (%d,%d)", (int) pxl_row[pxl_idx],
	    (int) pxl_col[pxl_idx]);
  else
    fprintf(stderr," (%d,%d)", (int) pxl_row[pxl_idx],
	    (int) pxl_col[pxl_idx]);
fprintf(stderr,"\nThe CCD gain and read-out variance are:\n"
	"  %g and %g\n\n",gain,variance);

9.3.1.9 `<<fit_pwc_to_roi-read-data>>`

This code block reads the data from the HDF5 file, allocates pxl_data, a gsl_matrix, and assigns ADU counts of the pixels making the ROI to it.

// Open FILE
hid_t file_id = H5Fopen (filename, H5F_ACC_RDONLY, H5P_DEFAULT);
// Get dimensions of 3D object contained in DSET
hsize_t dims[3];
H5LTget_dataset_info(file_id,dset,dims,NULL,NULL);
size_t nrow = (size_t) dims[0];
size_t ncol = (size_t) dims[1];
size_t nsamp = (size_t) dims[2];
// Read dset in data
int *data = malloc(nrow*ncol*nsamp*sizeof(int));
H5LTread_dataset_int(file_id,dset,data);
gsl_matrix *pxl_data = gsl_matrix_alloc(nb_pxl,nsamp);
for (size_t pxl_idx=0; pxl_idx<nb_pxl; pxl_idx++) {
  size_t i = pxl_row[pxl_idx];
  size_t j = pxl_col[pxl_idx];
  for (size_t k=0; k<nsamp; k++) {
    double adu = (double) data[(i*ncol+j)*nsamp+k];
    gsl_matrix_set(pxl_data,pxl_idx,k,adu);
  }
}
free(data);
H5Fclose (file_id);

9.3.1.10 `<<fit_pwc_to_roi-initialize-data-structure>>`

Initialization of a data structure.

struct data d = { .stab_pxl = pxl_data,
                  .variance = variance};

9.3.1.11 `<<fit_pwc_to_roi-initial-guess>>`

Initial guesses are obtained here. Notice the use of the =vector_views=, phiphi and ff, they are very handy for making the match between “directly meaningful parameters” and their counter part on the parameter vector (par here). We are also working on a log scale since every model parameter is positive (that’s a way of enforcing it). The two variables that are initialized here and are used later on are:

n_par: the number of model parameters.
par: a pointer to a gsl_vector containing the (log of the) model parameters.

gsl_vector *f = gsl_vector_calloc(nsamp);
gsl_vector *phi = gsl_vector_calloc(nb_pxl);
double b=100;
for (size_t pxl_idx=0; pxl_idx<nb_pxl; pxl_idx++) {
  gsl_vector_const_view row = gsl_matrix_const_row(pxl_data,pxl_idx);
  gsl_vector_add(f,&row.vector);
}
gsl_vector_scale(f, 1.0/(double) nb_pxl);
double baseline_mean=0;
for (size_t k=0; k<baseline; k++)
  baseline_mean += gsl_vector_get(f,k);
baseline_mean /= (double) baseline;
gsl_vector_scale(f, 1.0/baseline_mean);
for (size_t pxl_idx=0; pxl_idx<nb_pxl; pxl_idx++) {
  double sum=0;
  for (size_t k=0; k<baseline; k++)
    sum += gsl_matrix_get(pxl_data,pxl_idx,k);
  sum /= (double) baseline;
  gsl_vector_set(phi,pxl_idx,sum-b);
}
size_t n_par = 1+nb_pxl+(nsamp-baseline);
gsl_vector *par = gsl_vector_alloc(n_par);
gsl_vector_set(par,0,log(b));
gsl_vector_view phiphi = gsl_vector_subvector(par,1,nb_pxl);
for (size_t pxl_idx=0; pxl_idx<nb_pxl; pxl_idx++)
  gsl_vector_set(&phiphi.vector,pxl_idx,log(gsl_vector_get(phi,pxl_idx)));
gsl_vector_view ff = gsl_vector_subvector(par,1+nb_pxl,nsamp-baseline);
for (size_t k=baseline; k<nsamp; k++)
  gsl_vector_set(&ff.vector,k-baseline,log(gsl_vector_get(f,k)));
// Stabilize variance of observations
for (size_t pxl_idx=0; pxl_idx<nb_pxl; pxl_idx++) {
  for (size_t k=0; k<nsamp; k++) {
    // stabilize variance
    double z = 2*sqrt(gsl_matrix_get(pxl_data,pxl_idx,k)/gain+variance);
    gsl_matrix_set(pxl_data,pxl_idx,k,z);
  }
}

9.3.1.12 `<<fit_pwc_to_roi-residual-fct>>`

Definition of the “workhorse” of this program, the function that returns the vector of residuals (residual) from the model parameters (log_par) and the data (data). Again, vector (and matrix) views are heavily used.

int
residual_fct(const gsl_vector *log_par, void *data, 
	     gsl_vector * residual)
{
  gsl_matrix *obs = ((struct data *)data)->stab_pxl;
  double variance = ((struct data *)data)->variance;
  size_t n_pxl = obs->size1;
  size_t n_obs = obs->size2;
  size_t log_par_length = log_par->size;
  double auto_fluo = exp(gsl_vector_get(log_par,0));
  gsl_vector_const_view phi = gsl_vector_const_subvector(log_par, 1, n_pxl);
  gsl_vector_const_view f = gsl_vector_const_subvector(log_par, 1+n_pxl,n_obs-baseline);
  for (size_t pxl_idx=0; pxl_idx<n_pxl; pxl_idx++) {
    gsl_vector_const_view row = gsl_matrix_const_row(obs,pxl_idx);
    gsl_vector_view resid = gsl_vector_subvector(residual, pxl_idx*n_obs, n_obs);
    double phi_val = exp(gsl_vector_get(&phi.vector,pxl_idx));
    for (size_t k=0; k<baseline; k++) {
      double z = gsl_vector_get(&row.vector,k);
      double sF = 2*sqrt(auto_fluo+phi_val+variance);
      gsl_vector_set(&resid.vector,k,z-sF);
    }
    for (size_t k=baseline; k<n_obs; k++) {
      double z = gsl_vector_get(&row.vector,k);
      double s = exp(gsl_vector_get(&f.vector,k-baseline));
      double sF = 2*sqrt(auto_fluo+phi_val*s+variance);
      gsl_vector_set(&resid.vector,k,z-sF);
    }
  }
  return GSL_SUCCESS;
}

9.3.1.13 `<<fit_pwc_to_roi-callback-fct>>`

A simple callback function definition that prints to the stderr the iteration number together with the current RSS value.

void
callback(const size_t iter, void *params,
         const gsl_multifit_nlinear_workspace *w)
{
  gsl_vector *f = gsl_multifit_nlinear_residual(w);
  
  fprintf(stderr, "iter %2zu: RSS = %.4f\n",
          iter,
          gsl_pow_2(gsl_blas_dnrm2(f)));
}

9.3.1.14 `<<fit_pwc_to_roi-initialize-gsl_multifit_nlinear_fdf>>`

As its name says, initialization of the structure required by the least-squares solver.

gsl_multifit_nlinear_fdf fdf;
fdf.f = residual_fct;
fdf.df = NULL;   /* set to NULL for finite-difference Jacobian */
fdf.fvv = NULL;     /* not using geodesic acceleration */
fdf.n = nsamp*nb_pxl;
fdf.p = n_par;
fdf.params = &d;

9.3.1.15 `<<fit_pwc_to_roi-allocate-and-initialize-solver>>`

Now the initialization of the solver itself.

const gsl_multifit_nlinear_type *T = gsl_multifit_nlinear_trust;
gsl_multifit_nlinear_parameters fdf_params = gsl_multifit_nlinear_default_parameters();
/* allocate workspace with default parameters */
gsl_multifit_nlinear_workspace *w = gsl_multifit_nlinear_alloc(T,
							       &fdf_params,
							       nsamp*nb_pxl,
							       n_par);
/* initialize solver with starting point */
gsl_multifit_nlinear_init (par, &fdf, w);

9.3.1.16 `<<fit_pwc_to_roi-inital-rss>>`

We get the initial RSS value (the one given by the initial guess of the model parameters).

gsl_vector *residual = gsl_multifit_nlinear_residual(w);
double chisq0;
gsl_blas_ddot(residual, residual, &chisq0);

9.3.1.17 `<<fit_pwc_to_roi-do-nls>>`

Now we run the solver.

const double xtol = 1e-10;
const double gtol = 1e-10;
const double ftol = 0.0;
/* solve the system with a maximum of 200 iterations */
int info;
int status = gsl_multifit_nlinear_driver(200, xtol, gtol, ftol,
					 callback, NULL, &info,
					 w);

  /* compute covariance of best fit parameters */
gsl_matrix *J = gsl_multifit_nlinear_jac(w);
gsl_matrix *covar = gsl_matrix_alloc (1+nb_pxl+nsamp-baseline,
				      1+nb_pxl+nsamp-baseline);
gsl_multifit_nlinear_covar (J, 0.0, covar);

/* compute final cost */
double chisq;
gsl_blas_ddot(residual, residual, &chisq);

9.3.1.18 `<<fit_pwc_to_roi-print-info-final>>`

We print to the stderr general information of the fit. In particular the reason for stopping the iterations and the RSS per degree of freedom (should be close to 1 for a good fit).

fprintf(stderr, "\n\nsummary from method '%s/%s'\n",
	gsl_multifit_nlinear_name(w),
	gsl_multifit_nlinear_trs_name(w));
fprintf(stderr, "number of iterations: %zu\n",
	gsl_multifit_nlinear_niter(w));
fprintf(stderr, "function evaluations: %zu\n", fdf.nevalf);
fprintf(stderr, "Jacobian evaluations: %zu\n", fdf.nevaldf);
fprintf(stderr, "reason for stopping: %s\n",
	(info == 1) ? "small step size" : "small gradient");
fprintf(stderr, "initial RSS = %f\n", chisq0);
fprintf(stderr, "final   RSS = %f\n", chisq);
double dof = nsamp*nb_pxl - n_par;
fprintf(stderr, "RSS/dof = %g\n", chisq / dof);
fprintf (stderr, "status = %s\n\n", gsl_strerror (status));

9.3.1.19 `<<fit_pwc_to_roi-print-results>>`

We print to the stdout what was announced in our usage code block.

#define FIT(i) gsl_vector_get(w->x, i)
#define ERR(i) sqrt(gsl_matrix_get(covar,i,i))
printf("# Common background estimate: %g [%g,%g]\n\n",
       exp(FIT(0)),exp(FIT(0)-1.96*ERR(0)),
       exp(FIT(0)+1.96*ERR(0)));
// Print f estimation as a first data set
for (size_t k=0; k<baseline; k++)
  printf("%g %g %g\n",1.0,1.0,1.0);
for (size_t k=baseline; k<nsamp; k++) {
  double est = FIT(1+nb_pxl+k-baseline);
  double err = ERR(1+nb_pxl+k-baseline);
  printf("%g %g %g\n",
	 exp(est), exp(est-1.96*err),
	 exp(est+1.96*err));
}
// Print intensity estimations one column per pixel
printf("\n\n# Predicted variance stabilized signals\n");
double auto_fluo = exp(FIT(0));
printf("#");
for (size_t pxl_idx=0; pxl_idx<nb_pxl; pxl_idx++)
  printf(" (%d,%d)",(int) pxl_row[pxl_idx],(int) pxl_col[pxl_idx]);
printf("\n");
for (size_t k=0; k<baseline; k++) {
  for (size_t pxl_idx=0; pxl_idx<nb_pxl; pxl_idx++)
    printf("%g ", 2*sqrt(exp(FIT(pxl_idx+1))+auto_fluo+variance));
  printf("\n");
}
for (size_t k=baseline; k<nsamp; k++) {
  for (size_t pxl_idx=0; pxl_idx<nb_pxl; pxl_idx++)
    printf("%g ", 2*sqrt(exp(FIT(pxl_idx+1))*exp(FIT(1+nb_pxl+k-baseline))+auto_fluo+variance));
  printf("\n");
}
// Print observations one column per pixel
printf("\n\n# Observed stabilized signals\n");
printf("#");
for (size_t pxl_idx=0; pxl_idx<nb_pxl; pxl_idx++)
  printf(" (%d,%d)",(int) pxl_row[pxl_idx],(int) pxl_col[pxl_idx]);
printf("\n");
for (size_t k=0; k<nsamp; k++) {
  for (size_t pxl_idx=0; pxl_idx<nb_pxl; pxl_idx++)
    printf("%g ", gsl_matrix_get(pxl_data,pxl_idx,k));
  printf("\n");
}

9.3.1.20 Compile, run and test `roi`

We compile the code with:

gcc -W -g -o code/fit_pwc_to_roi code/fit_pwc_to_roi.c -lgsl -lgslcblas -lhdf5 -lhdf5_hl -lm -std=gnu11

We run it and store the stdout output to a file called roi_fit:

./code/fit_pwc_to_roi -f Data_POMC.hdf5 -d stack -g 0.14 -v 290 -p roi_pxl > roi_fit

Fitting now piecewise constant signal within ROI for
dataset /stack from file Data_POMC.hdf5
The row and column indexes of the pixels making the
ROI are:
  (26,36) (26,37) (26,38) (26,39)
  (27,36) (27,37) (27,38) (27,39)
  (28,36) (28,37) (28,38) (28,39)
The CCD gain and read-out variance are:
  0.14 and 290

iter  0: RSS = 17256272.3301
iter  1: RSS = 13998117.4429
iter  2: RSS = 7593455.7961
iter  3: RSS = 1663548.7841
iter  4: RSS = 143000.0538
iter  5: RSS = 54906.0470
iter  6: RSS = 40397.0454
iter  7: RSS = 23941.9594
iter  8: RSS = 12454.3227
iter  9: RSS = 6562.5716
iter 10: RSS = 3755.3448
iter 11: RSS = 2717.3812
iter 12: RSS = 2447.4664
iter 13: RSS = 2364.2423
iter 14: RSS = 2281.7851
iter 15: RSS = 2210.0332
iter 16: RSS = 2157.7101
iter 17: RSS = 2116.6896
iter 18: RSS = 2086.1365
iter 19: RSS = 2062.0695
iter 20: RSS = 2043.4441
iter 21: RSS = 2028.4456
iter 22: RSS = 2016.4435
iter 23: RSS = 2006.5716
iter 24: RSS = 1998.4771
iter 25: RSS = 1991.7092
iter 26: RSS = 1986.0608
iter 27: RSS = 1981.2822
iter 28: RSS = 1977.2434
iter 29: RSS = 1973.7948
iter 30: RSS = 1970.8489
iter 31: RSS = 1968.3371
iter 32: RSS = 1966.1716
iter 33: RSS = 1964.3094
iter 34: RSS = 1962.6948
iter 35: RSS = 1961.2984
iter 36: RSS = 1960.0846
iter 37: RSS = 1959.0317
iter 38: RSS = 1958.1156
iter 39: RSS = 1957.3199
iter 40: RSS = 1956.6276
iter 41: RSS = 1956.0261
iter 42: RSS = 1955.5034
iter 43: RSS = 1955.0498
iter 44: RSS = 1954.6563
iter 45: RSS = 1954.3157
iter 46: RSS = 1954.0212
iter 47: RSS = 1953.7672
iter 48: RSS = 1953.5486
iter 49: RSS = 1953.3612
iter 50: RSS = 1953.2009
iter 51: RSS = 1953.0645
iter 52: RSS = 1952.9489
iter 53: RSS = 1952.8516
iter 54: RSS = 1952.7701
iter 55: RSS = 1952.7024
iter 56: RSS = 1952.6468
iter 57: RSS = 1952.6015
iter 58: RSS = 1952.5651
iter 59: RSS = 1952.5363
iter 60: RSS = 1952.5140
iter 61: RSS = 1952.4970
iter 62: RSS = 1952.4846
iter 63: RSS = 1952.4758
iter 64: RSS = 1952.4700
iter 65: RSS = 1952.4663
iter 66: RSS = 1952.4643
iter 67: RSS = 1952.4633
iter 68: RSS = 1952.4630
iter 69: RSS = 1952.4629
iter 70: RSS = 1952.4629
iter 71: RSS = 1952.4629
iter 72: RSS = 1952.4629
iter 73: RSS = 1952.4629
iter 74: RSS = 1952.4629
iter 75: RSS = 1952.4629
iter 76: RSS = 1952.4629
iter 77: RSS = 1952.4629
iter 78: RSS = 1952.4629
iter 79: RSS = 1952.4629
iter 80: RSS = 1952.4629
iter 81: RSS = 1952.4629
iter 82: RSS = 1952.4629
iter 83: RSS = 1952.4629
iter 84: RSS = 1952.4629
iter 85: RSS = 1952.4629
iter 86: RSS = 1952.4629
iter 87: RSS = 1952.4629
iter 88: RSS = 1952.4629
iter 89: RSS = 1952.4629
iter 90: RSS = 1952.4629
iter 91: RSS = 1952.4629
iter 92: RSS = 1952.4629
iter 93: RSS = 1952.4629
iter 94: RSS = 1952.4629


summary from method 'trust-region/levenberg-marquardt'
number of iterations: 94
function evaluations: 16668
Jacobian evaluations: 0
reason for stopping: small step size
initial RSS = 17256272.330096
final   RSS = 1952.462926
RSS/dof = 1.06112
status = success

We see that our RSS (or $χ^2$) per degree of freedom is good! We can check our code with valgrind in the usual way:

valgrind ./code/fit_pwc_to_roi -f Data_POMC.hdf5 -d stack -g 0.14 -v 290 -p roi_pxl > dump

==7270== Memcheck, a memory error detector
==7270== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==7270== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==7270== Command: ./code/fit_pwc_to_roi -f Data_POMC.hdf5 -d stack -g 0.14 -v 290 -p roi_pxl
==7270== 
Fitting now piecewise constant signal within ROI for
dataset /stack from file Data_POMC.hdf5
The row and column indexes of the pixels making the
ROI are:
  (26,36) (26,37) (26,38) (26,39)
  (27,36) (27,37) (27,38) (27,39)
  (28,36) (28,37) (28,38) (28,39)
The CCD gain and read-out variance are:
  0.14 and 290

iter  0: RSS = 17256272.3301
iter  1: RSS = 13998117.4429
iter  2: RSS = 7593455.7961
iter  3: RSS = 1663548.7841
iter  4: RSS = 143000.0538
iter  5: RSS = 54906.0470
iter  6: RSS = 40397.0454
iter  7: RSS = 23941.9594
iter  8: RSS = 12454.3227
iter  9: RSS = 6562.5716
iter 10: RSS = 3755.3448
iter 11: RSS = 2717.3812
iter 12: RSS = 2447.4664
iter 13: RSS = 2364.2423
iter 14: RSS = 2281.7851
iter 15: RSS = 2210.0332
iter 16: RSS = 2157.7101
iter 17: RSS = 2116.6896
iter 18: RSS = 2086.1365
iter 19: RSS = 2062.0695
iter 20: RSS = 2043.4441
iter 21: RSS = 2028.4456
iter 22: RSS = 2016.4435
iter 23: RSS = 2006.5716
iter 24: RSS = 1998.4771
iter 25: RSS = 1991.7092
iter 26: RSS = 1986.0608
iter 27: RSS = 1981.2822
iter 28: RSS = 1977.2434
iter 29: RSS = 1973.7948
iter 30: RSS = 1970.8489
iter 31: RSS = 1968.3371
iter 32: RSS = 1966.1716
iter 33: RSS = 1964.3094
iter 34: RSS = 1962.6948
iter 35: RSS = 1961.2984
iter 36: RSS = 1960.0846
iter 37: RSS = 1959.0317
iter 38: RSS = 1958.1156
iter 39: RSS = 1957.3199
iter 40: RSS = 1956.6276
iter 41: RSS = 1956.0261
iter 42: RSS = 1955.5034
iter 43: RSS = 1955.0498
iter 44: RSS = 1954.6563
iter 45: RSS = 1954.3157
iter 46: RSS = 1954.0212
iter 47: RSS = 1953.7672
iter 48: RSS = 1953.5486
iter 49: RSS = 1953.3612
iter 50: RSS = 1953.2009
iter 51: RSS = 1953.0645
iter 52: RSS = 1952.9489
iter 53: RSS = 1952.8516
iter 54: RSS = 1952.7701
iter 55: RSS = 1952.7024
iter 56: RSS = 1952.6468
iter 57: RSS = 1952.6015
iter 58: RSS = 1952.5651
iter 59: RSS = 1952.5363
iter 60: RSS = 1952.5140
iter 61: RSS = 1952.4970
iter 62: RSS = 1952.4846
iter 63: RSS = 1952.4758
iter 64: RSS = 1952.4700
iter 65: RSS = 1952.4663
iter 66: RSS = 1952.4643
iter 67: RSS = 1952.4633
iter 68: RSS = 1952.4630
iter 69: RSS = 1952.4629
iter 70: RSS = 1952.4629
iter 71: RSS = 1952.4629
iter 72: RSS = 1952.4629
iter 73: RSS = 1952.4629
iter 74: RSS = 1952.4629
iter 75: RSS = 1952.4629
iter 76: RSS = 1952.4629
iter 77: RSS = 1952.4629
iter 78: RSS = 1952.4629
iter 79: RSS = 1952.4629
iter 80: RSS = 1952.4629
iter 81: RSS = 1952.4629
iter 82: RSS = 1952.4629
iter 83: RSS = 1952.4629
iter 84: RSS = 1952.4629
iter 85: RSS = 1952.4629
iter 86: RSS = 1952.4629
iter 87: RSS = 1952.4629
iter 88: RSS = 1952.4629
iter 89: RSS = 1952.4629
iter 90: RSS = 1952.4629
iter 91: RSS = 1952.4629
iter 92: RSS = 1952.4629
iter 93: RSS = 1952.4629
iter 94: RSS = 1952.4629


summary from method 'trust-region/levenberg-marquardt'
number of iterations: 94
function evaluations: 16668
Jacobian evaluations: 0
reason for stopping: small step size
initial RSS = 17256272.330096
final   RSS = 1952.462926
RSS/dof = 1.06112
status = success

==7270== 
==7270== HEAP SUMMARY:
==7270==     in use at exit: 0 bytes in 0 blocks
==7270==   total heap usage: 6,684 allocs, 6,684 frees, 29,471,706 bytes allocated
==7270== 
==7270== All heap blocks were freed -- no leaks are possible
==7270== 
==7270== For counts of detected and suppressed errors, rerun with: -v
==7270== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

We are happy to see that all our allocated memory was freed before returning.

9.4 Doing the figures

We plot our $\hat{f}_k$ estimation together with pointwise 95% confidence intervals using the following code:

set grid
set xlabel "Time (s)"
set ylabel "f estimate"
unset key
plot [0:25][] 'roi_fit' index 0 using ($0*0.15):1 with lines\
     linecolor rgb 'red' linewidth 2,\
     '' index 0 using ($0*0.15):2 with lines\
     linecolor rgb 'grey' linewidth 2,\
     '' index 0 using ($0*0.15):3 with lines\
     linecolor rgb 'grey' linewidth 2

We now build a figure showing the (variance stabilized) data of each of the 12 pixels of the ROI (in black) together with the fitted values (in red).

set xlabel ""; set ylabel ""
set yrange [135:185]; set xrange [0:25]
set xtics format ""; set ytics format ""
unset key; set grid
set multiplot layout 3,4 margins 0.1,0.9,0.1,0.95 spacing 0,0
do for[k=1:12] {\
   plot 'roi_fit' index 1 using ($0*0.15):k with lines\
   linecolor rgb 'red' linewidth 2,\
   '' index 2 using ($0*0.15):k with lines\
   linecolor rgb 'black' linewidth 1}
unset multiplot

Files

ENP2017_technical_notes.org

Latest commit

History

ENP2017_technical_notes.org

File metadata and controls

Technical notes for the ENP 2017 course

1 Setup

2 Introduction

2.1 Why C, gnuplot and the shell?

2.2 Required software and libraries

2.3 A remark on the code presentation

3 Getting the data

3.1 Downloading the data

3.2 Python code converting HDF5 to FITS

4 POMC Data visualization

4.1 Data summaries with the HDF5 tools

4.2 C code printing to STDOUT part of the data set for gnuplot

4.3 C code doing variance stabilization

5 Why does a proper noise model matter?

5.1 Figure data = deterministic part + noise

5.1.1 C code requirements

5.1.2 Program sim_mono_exp

5.1.3 sim_mono_exp usage

5.1.4 Making the figure

5.2 Simple case simulation

5.2.1 What do we want to do?

5.2.2 C implementation for a single case

5.2.2.1 A note on the nonlinear least-squares routines

5.2.2.2 Program skeleton

5.2.2.3 Headers and macros

5.2.2.4 A structure holding “what is fixed”

5.2.2.5 Residual returning function definition for the first estimator

5.2.2.6 Jacobian returning function definition for the first estimator

5.2.2.7 Residual returning function definition for the second estimator

5.2.2.8 Jacobian returning function definition for the second estimator

5.2.2.9 “callback” function definition

5.2.2.10 Data simulation and “fixed parameters” initialization

5.2.2.11 Definitions of convergence / stopping conditions

5.2.2.12 <<expl_beta_est-setup-tilde-solver>>: solver memory allocation and initialization for the first estimator

5.2.2.13 <<expl_beta_est-setup-hat-solver>>: solver memory allocation and initialization for the second estimator

5.2.2.14 <<expl_beta_est-free-solver>>: free solvers’ space

5.2.2.15 <<expl_beta_est-fdf-tilde>>: function to minimize for the first estimator

5.2.2.16 <<expl_beta_est-fdf-hat>>: function to minimize for the second estimator

5.2.2.17 <<expl_beta_est-inital-cost-tilde>>

5.2.2.18 <<expl_beta_est-inital-cost-hat>>

5.2.2.19 <<expl_beta_est-get-beta-tilde>>

5.2.2.20 <<expl_beta_est-get-beta-hat>>

5.2.2.21 <<expl_beta_est-free-covar>>: frees space taken by covariance matrices

5.2.2.22 Code compilation

5.2.2.23 Code use

5.2.2.24 Using valgrind

5.2.3 C implementation of a systematic simulation

5.2.3.1 <<beta_samp_dist>> code skeleton

5.2.3.2 <<beta_samp_dist-include>> header

5.2.3.3 <<beta_samp_dist-usage>>

5.2.3.4 <<beta_samp_dist-args>>

5.2.3.5 <<beta_samp_dist-fdf-tilde>>

5.2.3.6 <<beta_samp_dist-fdf-hat>>

5.2.3.7 <<beta_samp_dist-setup-tilde-solver>>

5.2.3.8 <<beta_samp_dist-setup-hat-solver>>

5.2.3.9 <<beta_samp_dist-allocate-beta-vectors>>

5.2.3.10 <<beta_samp_dist-free-beta-vectors>>

5.2.3.11 <<beta_samp_dist-get-beta-tilde>>

5.2.3.12 <<beta_samp_dist-get-beta-hat>>

5.2.3.13 <<beta_samp_dist-store-results>>

5.2.3.14 <<beta_samp_dist-beta-range>>

5.2.3.15 <<beta_samp_dist-allocate-hist>>

5.2.3.16 <<beta_samp_dist-free-hist>>

5.2.3.17 <<beta_samp_dist-fill-hist>>

5.2.3.18 <<beta_samp_dist-write-hist>>

5.2.3.19 Code compilation

5.2.3.20 Code use

5.2.3.21 Figure construction

6 Convergence in law of a Poisson random variable towards a normal one

6.1 Scaled Poisson CDF

6.1.0.1 <<scaled_poisson>> skeleton

6.1.0.2 <<scaled_poisson-include>>

6.1.0.3 <<scaled_poisson-usage>>

6.1.0.4 <<scaled_poisson-args>>

2.1 Why `C`, `gnuplot` and the `shell`?

3.2 `Python` code converting `HDF5` to `FITS`

4.1 Data summaries with the `HDF5` tools

4.2 `C` code printing to `STDOUT` part of the data set for `gnuplot`

4.3 `C` code doing variance stabilization

5.1 Figure `data = deterministic part + noise`

5.1.1 `C` code requirements

5.1.2 Program `sim_mono_exp`

5.1.3 `sim_mono_exp` usage

5.2.2 `C` implementation for a single case

5.2.2.12 `<<expl_beta_est-setup-tilde-solver>>`: solver memory allocation and initialization for the first estimator

5.2.2.13 `<<expl_beta_est-setup-hat-solver>>`: solver memory allocation and initialization for the second estimator

5.2.2.14 `<<expl_beta_est-free-solver>>`: free solvers’ space

5.2.2.15 `<<expl_beta_est-fdf-tilde>>`: function to minimize for the first estimator

5.2.2.16 `<<expl_beta_est-fdf-hat>>`: function to minimize for the second estimator

5.2.2.17 `<<expl_beta_est-inital-cost-tilde>>`

5.2.2.18 `<<expl_beta_est-inital-cost-hat>>`

5.2.2.19 `<<expl_beta_est-get-beta-tilde>>`

5.2.2.20 `<<expl_beta_est-get-beta-hat>>`

5.2.2.21 `<<expl_beta_est-free-covar>>`: frees space taken by covariance matrices

5.2.3 `C` implementation of a systematic simulation

5.2.3.1 `<<beta_samp_dist>>` code skeleton

5.2.3.2 `<<beta_samp_dist-include>>` header

5.2.3.3 `<<beta_samp_dist-usage>>`

5.2.3.4 `<<beta_samp_dist-args>>`

5.2.3.5 `<<beta_samp_dist-fdf-tilde>>`

5.2.3.6 `<<beta_samp_dist-fdf-hat>>`

5.2.3.7 `<<beta_samp_dist-setup-tilde-solver>>`

5.2.3.8 `<<beta_samp_dist-setup-hat-solver>>`

5.2.3.9 `<<beta_samp_dist-allocate-beta-vectors>>`

5.2.3.10 `<<beta_samp_dist-free-beta-vectors>>`

5.2.3.11 `<<beta_samp_dist-get-beta-tilde>>`

5.2.3.12 `<<beta_samp_dist-get-beta-hat>>`

5.2.3.13 `<<beta_samp_dist-store-results>>`

5.2.3.14 `<<beta_samp_dist-beta-range>>`

5.2.3.15 `<<beta_samp_dist-allocate-hist>>`

5.2.3.16 `<<beta_samp_dist-free-hist>>`

5.2.3.17 `<<beta_samp_dist-fill-hist>>`

5.2.3.18 `<<beta_samp_dist-write-hist>>`

6.1.0.1 `<<scaled_poisson>>` skeleton

6.1.0.2 `<<scaled_poisson-include>>`

6.1.0.3 `<<scaled_poisson-usage>>`

6.1.0.4 `<<scaled_poisson-args>>`

6.1.0.5 `<<scaled_poisson-free-n_seq>>`

6.1.0.6 `<<scaled_poisson-print-header>>`

6.1.0.7 `<<scaled_poisson-get-and-print-CDF>>`

7.2.0.1 `print-exposure` skeleton

7.2.0.2 `<<print-exposure-include>>`

7.2.0.3 `<<print-exposure-usage>>`

7.2.0.4 `<<print-exposure-args>>`

7.2.0.5 `<<print-exposure-open-FILE-and-read-DSET>>`

7.2.0.6 `<<print-exposure-close-file-free-data>>`

7.2.0.7 `<<print-exposure-print-results>>`

7.2.0.8 Compile `print-exposure`

7.4.0.1 `adu_sequence_fit` skeleton

7.4.0.2 `<<adu_sequence_fit-include>>`

7.4.0.3 `<<adu_sequence_fit-usage>>`

7.4.0.4 `<<adu_sequence_fit-args>>`

7.4.0.5 `<<adu_sequence_fit-read-data>>`

7.4.0.6 `<<adu_sequence_fit-fit>>`

7.4.0.7 `<<adu_sequence_fit-print-results>>`

7.4.0.8 Compile `adu_sequence_fit`

7.4.0.9 Use and test `adu_sequence_fit`

7.5.1 Program `adu_drift_test` definition

7.5.1.1 `adu_drift_test` skeleton

7.5.1.2 `<<adu_drift_test-include>>`

7.5.1.3 `<<adu_drift_test-usage>>`

7.5.1.4 `<<adu_drift_test-args>>`

7.5.1.5 `<<adu_drift_test-read-data>>`

7.5.1.6 `<<adu_drift_test-close-file>>`

7.5.1.7 `<<adu_drift_test-fit>>`

7.5.1.8 `<<adu_drift_test-print-results>>`

7.5.1.9 Compile `adu_drift_test`

7.5.1.10 Use, test, etc of `adu_drift_test`

7.5.2.1 `mk_histogram` skeleton

7.5.2.2 `<<mk_histogram-include>>`

7.5.2.3 `<<mk_histogram-usage>>`

7.5.2.4 `<<mk_histogram-args>>`

7.5.2.5 `<<mk_histogram-read-data>>`

7.5.2.6 `<<mk_histogram-make-histo>>`