HKyleZhang/Thesis_Figure_and_Supplementary

License: CC BY-NC-ND 4.0

Catalogue of the figures and supplementary material of my PhD thesis


Link to the thesis: https://www.lu.se/lup/publication/649d00c5-d827-4f5b-95b9-58bb379e811b

Please cite:

Zhang H. 2023. Genomic studies of sex differences: On mutations, recombination, and sexual antagonism in songbirds. Lund: Lund University (Media-Tryck). 226 p.


Table of Contents


List of figures

Kappa

Figure Link Description
1 Link Sexual dimorphism quantified as the male-to-female difference in plumage colouration scores among selected species in Sylvioidea, Muscicapoidea, and Passeroidea superfamilies. Illustrations show males and females of some monochromatic (e.g., great reed warbler, A. arundinaceus) and dichromatic (e.g., Spanish sparrow, P. hispaniolensis) species. Bird illustrations are from the Handbook of the Birds of the World (del Hoyo and Elliott, 2006).
2 Link Illustration of different stages in the evolution of sexual antagonism. (A) A sexually monomorphic trait with different fitness optima in males and females, resulting in sexually antagonistic selection that is quantified by the opposing slopes of the relationships between relative fitness and phenotype across individual females and males. (B) Sexual dimorphism evolves as male and female phenotypic distributions move toward their respective fitness optima in response to sexually antagonistic selection. (C) A hypothetical endpoint in which the phenotypic distributions of males and females match their fitness optima, and sexual antagonism is resolved at the loci for this trait. This figure was adapted from Cox (2017).
3 Link (A) The number of DNMs of paternal origin is plotted against the father’s age (in years). The blue line shows the linear fit (estimate of the slope=0.31, p=5.15 × 10−4) and the grey band represents the 95% confidence interval. (B) The number of DNMs of maternal origin is plotted against the mother’s age (in years), the blue line shows the linear fit (estimate of slope=0.12, p=0.02), and the grey band represents the 95% confidence interval. The figure was adapted from Wong et al. (2016).
4 Link Examples of sex differences in recombination landscapes. Local recombination rates are greater in males at chromosome ends, while local rates are often greater in females near the centromere (shown by the green bars). (A) Recombination along human chromosome 7 (Broman et al., 1998). (B) Recombination along domestic dog chromosome 19 (Wong et al., 2010). (C) Distribution of crossovers as function of relative distance from centromere across long arms of all Gasterosteus stickleback chromosomes (Sardell et al., 2018). The figure was adapted from Sardell and Kirkpatrick (2020).
5 Link Distribution range of great reed warbler in summer (orange) and winter (blue). The map and bird illustration were from the Handbook of the Birds of the World (del Hoyo and Elliott, 2006).
6 Link Phylogeny of the 13 selected species.
7 Link The three-generation pedigree of great reed warblers (Acrocephalus arundinaceus). Shown are generation (F0, F1, and F2), individual code (e.g., H7-38) and sex (square: male; circle: female).
8 Link Two scenarios of the occurrence and fate of germline mutations in a three-generation pedigree. (A) In the first scenario, germline mutations occur in one individual (e.g., H7-38) of the F0 generation, are detected in one individual (e.g., H3-00) of the F1 generation, and are then present in several individuals (e.g., 254, 256, 257) of the F2 generation. (B) In the second scenario, germline mutations occur in one individual (e.g., H3-00) of the F1 generation and are detected in one individual (e.g., 257) of the F2 generation.
9 Link The workflow of using RecView. Solid lines indicate the basic workflow while dashed lines indicate the optional workflow. RecView requires an input genotype file which can be generated by using make_012gt() on the output file from VCFtools, or using make_012gt_from_vcf() on the VCF file. RecView further requires an input scaffold file containing the order and orientation of the scaffolds. These two input files are used together with the built-in dictionary of grandparent-of-origin (GoO) to produce the GoO figure showing the GoO inferences of alleles along the scaffolds, and a figure showing the informative alleles density. RecView can further locate putative recombination positions with the proportional difference or cumulative continuity score algorithms and output result figures and tables. The result figures and tables can be saved, including an intermediate table containing the GoO inferences at each SNP.
10 Link Examples of the occurrence and inheritance of crossovers in individuals of the F1 generation. The locations of recombination are visualised as the boundaries between two chromosome regions with different colours in individuals of the F2 generation. Different colours indicate different grandparents-of-origin (i.e., individuals of the F0 generation).
11 Link Estimates of tRC across 51 neo-sex chromosome genes from (A) the dS approach, (B) the MLCT approach, (C) the ELW approach, and (D) the BEAST approach. Genes were ordered according to the physical positions on great reed warbler’s neo-Z, and included 22 genes from the ancestral part and 29 genes on the added part (Sigeman et al., 2021). Colours correspond to clusters of a K-means clustering analysis (k = 2), with brown indicating Cluster 1 and blue Cluster 2. The colour gradient (heatmap) in the ELW approach indicates the ELW values for each hypothetical topology. The two horizontal lines mark the origin of Neognathae (green) and Sylvioidea (red).
12 Link Parent-of-origin for the DNMs.
13 Link The GUI of RecView with the setting panel (red square) for uploading input files (yellow square) and setting options and the output panel (green square) where results can be accessed by selecting different tabs (orange square).
14 Link The grandparent-of-origin of informative alleles at all SNPs along chromosome 1 in great reed warbler offspring ID-256. Dots are plotted with noise on the y-axis to alleviate the degree of overlap. Colouration indicates different scaffolds on chromosome 1 in the great reed warbler genome assembly (Sigeman et al., 2021).
15 Link CO rates and locations between sexes. (A) Recombination rates were not significantly different between males and females. (B) Physical distance and (C) proportional distance of COs from the telomeric end of chromosome arms were significantly different, with more COs near the telomeric end in males than females.
16 Link Pairwise linkage disequilibrium squared correlations (r2) between the SNPs within 100 kb. The scale at the bottom (black) indicates the physical distance (bp) between the SNPs. The scale at the top (red) indicates the distribution of the data points within the 100 kb. The scale on the right (red) indicates the distribution of the data points between 0.00 – 1.00 of r2.
17 Link FST values (A) and weighted mean FST values in a 10-kb window (B) on different chromosomes. Solid red line delimits the top 10 and dashed red line delimits the top 100 SNPs or windows with highest mean FST. “Unplaced” groups the data points on the scaffolds that were not assigned to chromosomes.
18 Link Significance values of single SNPs along the different chromosomes from a GWAS analysis with sex. Solid red line delimits the top 10 most significant SNPs and dashed red line delimits the top 100 most significant SNPs. “Unplaced” groups the data points on the scaffolds that were not assigned to chromosomes.
19 Link Comparison of FST and significance of association with sex from GWAS for the top 100 SNPs in the respective analyses, showing the SNPs exclusively identified in the FST-based approach (yellow), the SNPs exclusively identified in the GWAS approach (brown), and the SNPs identified in both approaches (red).

Paper I

Paper I has been published. Link to the publication: https://doi.org/10.1111/jeb.14068

Please cite:

Zhang, H., Sigeman, H. and Hansson, B., 2022. Assessment of phylogenetic approaches to study the timing of recombination cessation on sex chromosomes. Journal of evolutionary biology, 35(12), pp.1721-1733.

Figure Link Description
1 Link (a) Reference species topology used in the ELW and BEAST analyses. (b) Species included in the analyses of how lowering the number of species affected the tRC estimates in the BEAST analysis.
2 Link Examples of hypothetical topologies of Z and W gametologs under different recombination cessation scenarios when only including the A. arundinaceus W gametolog (a–c; the single-W dataset) or when including W gametologs of all six Sylvioidea species (d–f; the multi-W dataset). (a) Hypothetical topology no. 2 indicating recombination cessation before A. arundinaceus and A. stentoreus diverged, but after the split of these two species and A. palustris; (b) hypothetical topology no. 4 indicating recombination cessation before the speciation of I. opaca and after L. luscinioides; (c) hypothetical topology no. 9 indicating recombination cessation earlier than the formation of Sylvioidea, before the speciation of L. coronata and after M. undulatus; (d) hypothetical topology no. 1 indicating continuation of recombination after speciation of each Sylvioidea species; (e) hypothetical topology no. 5 indicating recombination cessation before speciation of L. luscinioides and after P. biarmicus; (f) hypothetical topology no. 11 indicating recombination cessation before G. gallus and after D. novaehollandiae speciation.
3 Link Estimates of tRC across 51 neo-sex chromosome genes from (a) the dS approach, (b) the MLCT approach, (c) the ELW approach, and (d) the BEAST approach. Genes were ordered according to the physical positions on A. arundinaceus neo-Z, and included 22 genes from the ancestral part and 29 genes on the added part (Sigeman et al., 2021). Colors correspond to clusters of a K-means clustering analysis (k = 2), with brown indicating Cluster 1 and blue Cluster 2. The color gradient (heatmap) in the ELW approach indicates the ELW values for each hypothetical topology. The two horizontal lines mark the origin of Neognathae (green) and Sylvioidea (red).
4 Link Relationship between estimates of tRC of the BEAST and ELW approaches plotted on a convertible scale corresponding to the posterior median and 95% HPD for the BEAST approach and the highest ELW-value topology and topology range of 95% accumulated ELW values for the ELW approach. Genes are colored according to their physical position on the ancestral (turquoise) or added (pink) part of the neo-sex chromosome of A. arundinaceus. The tRC estimates of the ELW and BEAST approaches were significantly correlated (Spearman's rank test: ρ = 0.952; ***: p < 0.001). The scale gives the conversion of the tRC estimates and the corresponding phylogenetic positions on the reference species topology. The timings for the origin of Neognathae (green) and Sylvioidea (red) are marked with dashed lines.
5 Link ELW values for each hypothetical topology when using the single-W dataset (red heatmap) and the multi-W dataset (blue heatmap), respectively, for genes on (a) the ancestral part and (b) the added part of the neo-sex chromosome.
6 Link (a) Posterior medians and 95% HPDs, and (b) the width of 95% HPD, for BEAST analyses of the 1-, 3-, 6-, and 12-outgroup datasets, respectively. Colors indicate the genes' location on the ancestral (turquoise) or added (pink) part of the neo-sex chromosome. In (a), Spearman's rank correlation coefficients and their significance levels are given (***: p < 0.001). In (b), overlaid on the box plots (grey), each gene is connected with lines between datasets, and significance values from post hoc analyses with Wilcoxon signed-rank tests after correction for multiple comparisons are given as follows: n.s.: not significant; **: p < 0.01; ***: p < 0.001.

Paper II

Figure Link Description
1 Link Mutation rates (µ) of 13 mammals, one fish and one bird estimated with pedigree-based approaches. Details are given in Table S1.
2 Link The three-generation pedigree of great reed warblers (Acrocephalus arundinaceus). Shown are generation (F0, F1, and F2), individual code (e.g., H7-38) and sex (square: male; circle: female).
3 Link Histogram of the number of F2 offspring inheriting each DNM detected in F1 parents (n = 18 DNMs). Note that a germline mutation detected in an F1 parent is expected to be transmitted with a probability of 0.5 (i.e., it is expected to occur on average in 3 of the 6 offspring), whereas a somatic mutation is not expected to be transmitted to any offspring.
4 Link (A) Relationship between the number of DNMs and the size of the chromosomes. (B) Physical chromosomal positions, and (C) relative chromosomal positions, of the DNMs. (D) Mutation types for DNMs on CpG sites (red) or non-CpG sites (grey) show normalised frequencies with the abundance of each trinucleotide type (3-mer) in the genome. Note that 3-mers without DNMs are not shown.
5 Link Parent-of-origin for the DNMs.
6 Link Demographic history of the great reed warbler of western (Swedish, blue) and eastern (Turkish, red) populations between 1 × 10^4 and 1 × 10^6 years BP, showing that both populations experienced a drastic reduction in Ne from 400,000 years before present (BP), followed by a steady period with similar Ne for the populations from 200,000 years BP, over the start of the LGP (approximately 100,000 years BP), to approximately 50,000 years BP. After approximately 50,000 years BP, the Ne trajectories of the two populations start to deviate with the eastern population showing a steady increase, while the western population remained at lower levels. Around 20,000 years BP, the Ne trajectory of the western population showed an increase in size and then a decrease after approximately 15,000 years BP. (A,B) Thick lines represent MSMC run on 4-individual set-up for each population, while (A) thin lines with symbols represent the runs for each individual, and (B) thin lines without symbols represent the runs of 100 bootstrapped datasets. All Ne trajectories were scaled to real time using a generation time of 2 years and the corrected autosomal mutation rate estimate from this study (7.16 × 10^-9 mutations per site per generation). Shaded area indicates the LGP.
S1 Link IGV screenshot showing 50 bp surrounding the DNM at 55937724 bp of Contig 1 in offspring 253. The DNM can be traced back to H5-17 with another upstream SNP.
S2 Link IGV screenshot showing 50 bp surrounding the DNM at 36535859 bp of Contig 0 in offspring 255. No additional SNP is available to help infer the parent-of-origin.
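
The transmission expectation stated in the caption of Figure 3 (probability 0.5 per offspring, six offspring) can be worked through with a binomial distribution. The following is a minimal sketch, assuming that each of the six F2 offspring inherits a germline DNM independently; the function name `transmission_pmf` is hypothetical and not part of the thesis code.

```python
from math import comb

def transmission_pmf(n_offspring=6, p=0.5):
    """Binomial probabilities that k of n_offspring inherit a germline DNM.

    Assumes independent transmission to each offspring with probability p.
    """
    return [comb(n_offspring, k) * p**k * (1 - p) ** (n_offspring - k)
            for k in range(n_offspring + 1)]

pmf = transmission_pmf()
# Expected number of offspring carrying the DNM (n * p).
expected = sum(k * prob for k, prob in enumerate(pmf))
# Probability that a true germline DNM is transmitted to none of the offspring.
p_none = pmf[0]
print(expected, p_none)
```

Under these assumptions the expectation is 3 of 6 offspring, and a germline DNM is missed in all six offspring with probability 1/64, which is why non-transmission alone does not prove that a mutation is somatic.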

Paper III

Paper III has been deposited on bioRxiv. Link to the paper: https://doi.org/10.1101/2022.12.21.521365

Please cite:

Zhang, H. and Hansson, B., 2022. RecView: an interactive R application for viewing and locating recombination positions using pedigree data. bioRxiv, pp.2022-12.

Figure Link Description
1 Link The pedigree dataset required for analysing recombination locations with RecView. Grandparents are labelled A, B, C and D, and parents AB and CD. The analysis is conducted independently for each offspring.
2 Link The paternal grandparent-of-origin inference for 200 SNPs. These data are hypothetical but were selected to indicate how, e.g., sequencing errors may affect patterns. Data points are assigned with noise on the y-axis to avoid overlapping.
3 Link Illustrative demonstration of the proportional difference algorithm to automatically locate recombination positions. (I) The absolute difference of the proportion of the grandpaternal allele A for downstream (S1) and upstream (S2) windows (
4 Link Illustrative demonstration of the cumulative continuity score (CCS) algorithm to automatically locate putative recombination positions. The CCS is reset to zero whenever the next grandparent-of-origin (GoO) inference is different. Positive and negative CCSs indicate the continuity of GoO A and B, respectively. With a threshold of CCS = 30, there is one putative recombination position, between positions 100 and 101.
5 Link The workflow of using RecView. Solid lines indicate the basic workflow while dashed lines indicate the optional workflow. RecView requires an input genotype file which can be generated by using make_012gt() on the output file from VCFtools, or using make_012gt_from_vcf() on the VCF file. RecView further requires an input scaffold file containing the order and orientation of the scaffolds. These two input files are used together with the built-in dictionary of grandparent-of-origin (GoO) to produce the GoO figure showing the GoO inferences of alleles along the scaffolds, and a figure showing the informative alleles density. RecView can further locate putative recombination positions with the proportional difference or cumulative continuity score algorithms and output result figures and tables. The result figures and tables can be saved, including an intermediate table containing the GoO inferences at each SNP.
6 Link The GUI of RecView with the setting panel (red square) for uploading input files (yellow square) and setting options and the output panel (green square) where results can be accessed by selecting different tabs (orange square).
7 Link The grandparental-of-origin of informative alleles at all SNPs along chromosome 1 in great reed warbler offspring ID-256 for (a) the full dataset and (b) the downsampled dataset. Each dot represents an allele at a specific SNP for the paternal or maternal chromosomes. Dots are plotted with noise on the y-axis to alleviate the degree of overlap. Colouration indicates different scaffolds on chromosome 1 in the great reed warbler genome assembly (Sigeman et al., 2021).
8 Link Visualization of the result from the proportional difference algorithm to locate putative recombination positions. Shown are the absolute differences of the proportion of the grandpaternal alleles A and C for downstream (S1) and upstream (S2) windows along chromosome 1 in offspring ID-256 in the full dataset (a) and downsampled dataset (b).
9 Link The local density of informative SNPs along chromosome 1 in offspring ID-256 in the full dataset (a) and downsampled dataset (b). The red diamonds show the putative recombination positions from the proportional difference algorithm.
10 Link Visualization of the result from the cumulative continuity score (CCS) algorithm to locate putative recombination positions. Shown are the CCS for the paternal and maternal chromosomes along chromosome 1 in offspring ID-256 in the full dataset (a) and the downsampled dataset (b). Increasing slopes indicate continuous alleles with inferred GoO from the paternal grandfather (upper panel) or maternal grandfather (lower panel), while decreasing slopes indicate the origin of the paternal grandmother (upper panel) or maternal grandmother (lower panel).
11 Link The local density of informative SNPs along chromosome 1 in offspring ID-256 in the full dataset (a) and downsampled dataset (b). The red diamonds show the putative recombination positions from the cumulative continuity score algorithm.
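
The cumulative continuity score described in the caption of Figure 4 can be illustrated with a short sketch. This is not RecView's implementation (RecView is an R application); it is a Python illustration under stated assumptions: GoO inferences are given as a list of "A"/"B" calls, the score restarts from zero at every change of GoO, and a putative recombination position is placed between two consecutive runs of different GoO that both reach the threshold. The function names `ccs_track` and `putative_breakpoints` are hypothetical.

```python
def ccs_track(goo):
    """Return the CCS at each SNP: +1 per consecutive 'A', -1 per consecutive 'B'.

    The score is reset to zero whenever the GoO call differs from the previous one.
    """
    scores, score, prev = [], 0, None
    for call in goo:
        if call != prev:
            score = 0  # continuity is broken, so start counting again
        score += 1 if call == "A" else -1
        scores.append(score)
        prev = call
    return scores

def putative_breakpoints(goo, threshold):
    """Return 1-based (left, right) SNP positions flanking putative recombinations.

    Assumption: only runs whose |CCS| reaches the threshold are kept, and a
    breakpoint is reported between consecutive kept runs of different GoO.
    """
    scores = ccs_track(goo)
    runs, start = [], 0
    for i in range(1, len(goo) + 1):
        if i == len(goo) or goo[i] != goo[start]:
            if abs(scores[i - 1]) >= threshold:
                runs.append((start, i - 1, goo[start]))
            start = i
    return [(end1 + 1, start2 + 1)
            for (_, end1, goo1), (start2, _, goo2) in zip(runs, runs[1:])
            if goo1 != goo2]

# Toy data: 100 calls of GoO A followed by 100 calls of GoO B.
clean = ["A"] * 100 + ["B"] * 100
print(putative_breakpoints(clean, threshold=30))
```

With this toy input and a threshold of 30, the sketch reports one putative position between SNPs 100 and 101, in line with the example in the caption. Short runs caused by isolated erroneous calls never reach the threshold and are therefore ignored.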

Paper IV

Figure Link Description
1 Link The pedigree of great reed warblers with their project-specific individual naming based on aluminium and colour leg-rings for the grandparents (F0) and parents (F1), and the last three digits of their aluminium ring number for the offspring.
2 Link (A) The distribution of crossovers (COs) of each segregating autosomal arm. The total number of segregating autosomal arms evaluated was 408 (34 paternal and 34 maternal chromosome arms within each of six offspring). (B-D) The distribution of total number of COs in the pedigree of each autosomal arm (n = 34) considering COs of both paternal and maternal origin (B), only paternal origin (C) and only maternal origin (D).
3 Link The association between the autosomal CO events and gene features. (A) Number of COs in intergenic and genic regions, in exons and introns, and in UTR and CDS (green bars). The expected numbers based on the size of these gene features within 6 Mb of the end of the chromosome arms are also given (grey bars). (B) The distribution of the distance to the closest gene for autosomal CO events.
4 Link (A) The distribution of male, female and sex-averaged recombination rates on autosomal arms. (B) The association between male (green rectangles), female (yellow circles) or the sex-averaged (grey diamonds) recombination rates and the size of autosomal arms.
5 Link The bias of COs towards the telomeric ends of chromosomes. (A) The location of CO positions of paternal and maternal origins on each chromosome arm. (B) The physical distance of CO positions from the telomeric end of autosomal arms. (C) The proportional distance of CO positions from the telomeric end of autosomal arms. Grey bars indicate the sizes of the autosomal arms. The colouration and shape of points indicate paternal (green squares) and maternal (yellow circles) CO events.

Paper V

Figure Link Description
1 Link Distribution of allele balance (i.e., alternative vs. reference allele read count ratio in heterozygous samples; A, B), log2 fold female-to-male coverage difference (C, D), and proportion of properly paired reads supporting the alternate allele (E, F) for the 0.1% SNPs with the highest FST value (A, C, E) and the remaining 99.9% SNPs (B, D, F).
2 Link Venn diagram of the number and percentage of top 0.1% SNPs with highest FST retained after applying the filters of ≥0.35 allele balance, and/or ≤0.25 log fold of absolute difference in female-to-male coverage, and/or ≥0.5 proportion of properly paired reads supporting the alternate allele. After applying all three filters, 11,353 (74.4%) of the original 15,967 SNPs were retained.
3 Link (A) Percentage of query sequences with matched regions on at least one W-linked scaffold, and (B) the distribution of query sequences with matched regions on different numbers of W-linked scaffolds.
4 Link Linkage disequilibrium (r2) between pairs of SNPs within 100 kb. The bottom scale at the x-axis (black) shows the physical distance (bp) between the SNPs, and the top scale (red) the distribution of the data points.
5 Link FST of single SNPs (A) and mean FST in 10 kb window (B) along the genome. The solid red line delimits the top 10, and the dashed red line the top 100, highest FST values. “Unplaced” denotes the scaffolds that were not assigned to chromosomes.
6 Link Significance value of each SNP from a GWAS on sex plotted on their chromosomal location. The solid and dashed red lines delimit the top 10 and 100 most significant SNPs, respectively. “Unplaced” denotes the scaffolds that were not assigned to chromosomes.
7 Link Comparison of the top 100 SNPs from the FST analysis and the GWAS with sex, with the SNPs exclusively identified in the FST-based approach (yellow), the SNPs exclusively identified in the GWAS approach (brown), and the SNPs identified by both approaches (red).
8 Link (A) Corrected wing lengths in 51 female (yellow) and 49 male (green) great reed warblers. Wing length was corrected for the age of the bird and the ringer who measured the wing length. Dashed lines represent mean corrected wing lengths in each sex. (B) Wing length before and after correction in females (yellow) and males (green). Vertical and horizontal dashed lines represent mean wing lengths before and after correction, respectively, in females (yellow) and males (green).
9 Link Significance value of each SNP from a GWAS on corrected wing length along their chromosomes. The solid red line delimits the top 10, and the dashed red line the top 100, most significant SNPs. “Unplaced” denotes the scaffolds that were not assigned to chromosomes.
10 Link Corrected wing length (median) of individuals being homozygous for the reference allele (Hom.REF), heterozygous (Het), or homozygous for the alternate allele (Hom.ALT), for the top 10 most significant SNPs (A), the top 100 most significant SNPs (B), and all SNPs (C).
S1 Link Q-Q plot for the GWAS on sex.
S2 Link Q-Q plot for the GWAS on corrected wing length.
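
The three SNP filters named in the caption of Figure 2 can be sketched as a simple predicate. This is an illustration only: the thresholds are taken from the caption, while the record field names, the assumption that the coverage difference is expressed as a log2 fold value, and the toy records are hypothetical and not taken from the thesis data.

```python
# Thresholds from the caption of Figure 2.
MIN_ALLELE_BALANCE = 0.35
MAX_ABS_LOG2_COVERAGE_DIFF = 0.25  # assumes a log2 female-to-male fold difference
MIN_PROPER_PAIR_ALT = 0.5

def passes_all_filters(snp):
    """Return True if a SNP record (dict with hypothetical keys) passes all three filters."""
    return (
        snp["allele_balance"] >= MIN_ALLELE_BALANCE
        and abs(snp["log2_f_to_m_coverage"]) <= MAX_ABS_LOG2_COVERAGE_DIFF
        and snp["prop_proper_pair_alt"] >= MIN_PROPER_PAIR_ALT
    )

# Toy records with made-up values, for illustration only.
snps = [
    {"id": "snp1", "allele_balance": 0.48, "log2_f_to_m_coverage": 0.05, "prop_proper_pair_alt": 0.9},
    {"id": "snp2", "allele_balance": 0.20, "log2_f_to_m_coverage": 0.02, "prop_proper_pair_alt": 0.8},
    {"id": "snp3", "allele_balance": 0.50, "log2_f_to_m_coverage": -0.90, "prop_proper_pair_alt": 0.7},
]
retained = [snp["id"] for snp in snps if passes_all_filters(snp)]
print(retained)
```

In this toy set, snp2 fails the allele balance filter and snp3 fails the coverage filter, so only snp1 is retained.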
