Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FindAllMarkers and FindMarkers pulls fold-change data from wrong slot #6976

Closed
jbeanphd opened this issue Feb 23, 2023 · 1 comment
Closed
Labels
bug Something isn't working

Comments

@jbeanphd
Copy link

'FindAllMarkers' and 'FindMarkers' appear to be pulling fold change data from the wrong slot.

# insert reproducible example here
> library(dplyr)

Attaching package:dplyrThe following objects are masked frompackage:stats:

    filter, lag

The following objects are masked frompackage:base:

    intersect, setdiff, setequal, union

> library(Seurat)
> pbmc_small <- pbmc_small
> View(pbmc_small)
> pbmc_small <- SCTransform(pbmc_small)
Calculating cell attributes from input UMI matrix: log_umi
Variance stabilizing transformation of count matrix of size 220 by 80
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 220 genes, 80 cells
  |======================================================================================================================================================| 100%
Second step: Get residuals using fitted parameters for 220 genes
  |======================================================================================================================================================| 100%
Computing corrected count matrix for 220 genes
  |======================================================================================================================================================| 100%
Calculating gene attributes
Wall clock passed: Time difference of 0.5991514 secs
Determine variable features
Place corrected count matrix in counts slot
Centering data matrix
  |======================================================================================================================================================| 100%
Set default assay to SCT
> pbmc_small <- FindVariableFeatures(pbmc_small)
> pbmc_small <- RunPCA(pbmc_small, features = pbmc_small@assays$SCT@var.features)
Warning in irlba(A = t(x = object), nv = npcs, ...) :
  You're computing too large a percentage of total singular values, use a standard svd instead.
PC_ 1 
Positive:  NKG7, CCL5, CST7, GZMA, CTSW, PRF1, GNLY, IL32, FGFBP2, GZMM 
	   CD7, CD3D, LCK, GZMB, RARRES3, CD247, XBP1, LAMP1, CD3E, GZMH 
	   SPON2, KLRD1, ALOX5AP, CCL4, EIF4A2, C12orf75, GYPC, KLRG1, LYAR, AKR1C3 
Negative:  CST3, LYZ, LST1, S100A9, IFITM3, TYMP, HLA-DRA, AIF1, HLA-DPA1, SERPINA1 
	   S100A8, HLA-DPB1, CFD, FCN1, GRN, HLA-DMA, HLA-DRB5, CTSS, LGALS2, HLA-DRB1 
	   IFI30, LGALS3, CFP, FCGRT, BID, HLA-DQB1, S100A11, CD68, HLA-DQA1, SAT1 
PC_ 2 
Positive:  PPBP, SDPR, CLU, GNG11, PF4, GPX1, SPARC, CD9, HIST1H2AC, GP9 
	   TUBB1, NRGN, NCOA4, TPM4, CA2, NGFRAP1, ITGA2B, TREML1, TMEM40, PGRMC1 
	   RUFY1, ODC1, CCL5, MYL9, FERMT3, PTCRA, ACRBP, PARVB, TSC22D1, TALDO1 
Negative:  NKG7, IFITM2, CST7, GZMA, FCGR3A, IL32, CTSW, EIF4A2, SSR2, FGFBP2 
	   CD7, GNLY, CD3D, TYROBP, SRSF7, GZMM, PRF1, HSP90AA1, LCK, LGALS1 
	   CXCR4, HNRNPA3, XBP1, GZMB, LTB, S100A11, FCER1G, HNRNPF, CD247, YWHAB 
PC_ 3 
Positive:  HLA-DRA, HLA-DQB1, LTB, HLA-DRB1, HLA-DPB1, MS4A1, CD79A, HLA-DPA1, HLA-DQA1, HLA-DRB5 
	   HLA-DMB, CD79B, CXCR4, TCL1A, SP100, LY86, NT5C, SNHG7, HLA-DQA2, HLA-DMA 
	   HVCN1, TMEM123, FCER2, LINC00926, PPAPDC1B, FCRLA, BANK1, EAF2, TNFAIP8, RP11-693J15.5 
Negative:  TYROBP, FCER1G, FCGR3A, LGALS1, LST1, CFD, CD68, AIF1, S100A9, IFITM3 
	   S100A8, SERPINA1, FCN1, GZMB, CST3, SAT1, RHOC, PRF1, S100A11, GNLY 
	   TYMP, PSAP, CARD16, FGFBP2, CTSS, GZMA, NKG7, RGS2, SMCO4, LILRA3 
PC_ 4 
Positive:  FGFBP2, GZMB, GNLY, PRF1, NKG7, CST7, HLA-DPB1, HLA-DRB1, HLA-DPA1, CTSW 
	   HLA-DQB1, HLA-DRA, HLA-DQA1, CCL4, GZMA, AKR1C3, GZMH, HLA-DMA, IGFBP7, TTC38 
	   HLA-DMB, KLRD1, HLA-DRB5, C12orf75, SPON2, IL2RB, PCMT1, CLIC3, ALOX5AP, GSTP1 
Negative:  CD3D, LTB, LDHB, IL7R, CD3E, TMEM123, EIF4A2, NOSIP, THYN1, IFITM2 
	   TAF7, TAGAP, COTL1, MPHOSPH6, IL32, ACAP1, SRSF7, PIK3IP1, CD2, MAL 
	   DNAJB1, EPC1, SAFB2, AIF1, GYPC, CXCR4, SAT1, ASNSD1, CTSS, CCR7 
PC_ 5 
Positive:  LYZ, LGALS2, S100A9, FCER1A, CLEC10A, GSTP1, MS4A6A, S100A8, HLA-DMA, IL1B 
	   RGS1, LGALS3, CD14, ASGR1, CD3D, GRN, FCGRT, LDHB, TYMP, CST3 
	   SRSF7, GPX1, CFP, IL32, IL7R, CD1C, MPEG1, FUOM, RNF130, LCK 
Negative:  FCGR3A, CD79B, RHOC, RP11-290F20.3, TNFRSF1B, MS4A1, HVCN1, LILRA3, SAT1, AIF1 
	   IFITM2, MS4A7, CD79A, SERPINA1, CD68, LST1, CTSS, FCER1G, FAM96A, TCL1A 
	   IFITM3, ADAR, EPC1, WARS, C5AR1, RGS2, HCK, LTB, PSAP, LINC00926 
> pbmc_small <- FindNeighbors(pbmc_small, dims = 1:30)
Computing nearest neighbor graph
Computing SNN
> pbmc_small <- FindClusters(pbmc_small)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 80
Number of edges: 1876

Running Louvain algorithm...
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.5397
Number of communities: 4
Elapsed time: 0 seconds
> FindAllMarkers(pbmc_small, pseudocount.use = 0, assay = 'SCT', slot = 'data') |> head()
Calculating cluster 0
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=00s  
Calculating cluster 1
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=00s  
Calculating cluster 2
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=00s  
Calculating cluster 3
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=00s  
            p_val avg_log2FC pct.1 pct.2    p_val_adj cluster gene
LCK  1.652211e-11   5.946419 0.733  0.02 3.634864e-09       0  LCK
CD3D 5.627207e-11   7.451211 0.700  0.02 1.237985e-08       0 CD3D
CD7  6.484134e-11   6.380822 0.700  0.02 1.426510e-08       0  CD7
CST7 7.186854e-10   5.886713 0.667  0.04 1.581108e-07       0 CST7
IL32 7.878490e-10   6.544321 0.667  0.04 1.733268e-07       0 IL32
GZMM 9.576371e-10        Inf 0.600  0.00 2.106802e-07       0 GZMM


# 'FoldChange' gives a different result than is reported by 'FindAllMarkers' when both use 'assay = "SCT"' and 'slot = "data"'

> FoldChange(pbmc_small, pseudocount.use = 0, ident.1 = 0, ident.2 = c(1:3), features = 'LCK', assay = 'SCT', slot = 'data')
    avg_log2FC pct.1 pct.2
LCK   5.627499 0.733  0.02


#specifying 'slot = "counts"' returns the same fold change reported by 'FindAllMarkers' that was set to 'slot = "data"'

> FoldChange(pbmc_small, pseudocount.use = 0, ident.1 = 0, ident.2 = c(1:3), features = 'LCK', assay = 'SCT', slot = 'counts')
    avg_log2FC pct.1 pct.2
LCK   5.946419 0.733  0.02


#the same is true for 'FindMarkers'

> FindMarkers(pbmc_small, pseudocount.use = 0, ident.1 = 0, ident.2 = 1, features = 'LCK', assay = 'SCT', slot = 'data')
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=00s  
           p_val avg_log2FC pct.1 pct.2    p_val_adj
LCK 3.088212e-07   5.003002 0.733 0.038 6.794065e-05


# 'FoldChange' gives a different result than is reported by  'FindMarkers'

> FoldChange(pbmc_small, pseudocount.use = 0, ident.1 = 0, ident.2 = 1, features = 'LCK', assay = 'SCT', slot = 'data')
    avg_log2FC pct.1 pct.2
LCK   4.684082 0.733 0.038


#again specifying 'slot = "counts"' returns the same result that 'FindMarkers' reports when 'slot = "data"'

> FoldChange(pbmc_small, pseudocount.use = 0, ident.1 = 0, ident.2 = 1, features = 'LCK', assay = 'SCT', slot = 'counts')
    avg_log2FC pct.1 pct.2
LCK   5.003002 0.733 0.038


#pulling the values out manually reveals 'FoldChange' is reporting the correct values while 'FindAllMarkers' and 'FindMarkers'
#are pulling fold change data from the wrong slot.
 
> pbmc_small@assays$SCT@data['LCK', (row.names(filter([email protected], seurat_clusters == 0)))] |> mean()
[1] 0.6853326
> pbmc_small@assays$SCT@data['LCK', (row.names(filter([email protected], seurat_clusters == 1)))] |> mean()
[1] 0.02665951
> pbmc_small@assays$SCT@data['LCK', (row.names(filter([email protected], seurat_clusters != 0)))] |> mean()
[1] 0.01386294
> log2(0.6853326/0.02665951)
[1] 4.684082
> log2(0.6853326/0.01386294)
[1] 5.627499
> pbmc_small@assays$SCT@counts['LCK', (row.names(filter([email protected], seurat_clusters == 0)))] |> mean()
[1] 1.233333
> pbmc_small@assays$SCT@counts['LCK', (row.names(filter([email protected], seurat_clusters == 1)))] |> mean()
[1] 0.03846154
> pbmc_small@assays$SCT@counts['LCK', (row.names(filter([email protected], seurat_clusters != 0)))] |> mean()
[1] 0.02
> log2(1.233333/0.03846154)
[1] 5.003002
> log2(1.233333/0.02)
[1] 5.946419



> sessionInfo()
R version 4.2.2 Patched (2022-11-10 r83330)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Seurat_4.3.0       dplyr_1.1.0        SeuratObject_4.1.3 sp_1.6-0          

loaded via a namespace (and not attached):
  [1] nlme_3.1-162           matrixStats_0.63.0     spatstat.sparse_3.0-0  RcppAnnoy_0.0.20       RColorBrewer_1.1-3     httr_1.4.4            
  [7] backports_1.4.1        sctransform_0.3.5      tools_4.2.2            utf8_1.2.3             R6_2.5.1               irlba_2.3.5.1         
 [13] KernSmooth_2.23-20     uwot_0.1.14            lazyeval_0.2.2         colorspace_2.1-0       withr_2.5.0            tidyselect_1.2.0      
 [19] gridExtra_2.3          compiler_4.2.2         progressr_0.13.0       cli_3.6.0              spatstat.explore_3.0-6 plotly_4.10.1         
 [25] scales_1.2.1           lmtest_0.9-40          spatstat.data_3.0-0    ggridges_0.5.4         pbapply_1.7-0          readODS_1.8.0         
 [31] goftest_1.2-3          stringr_1.5.0          digest_0.6.30          spatstat.utils_3.0-1   pkgconfig_2.0.3        htmltools_0.5.4       
 [37] parallelly_1.34.0      limma_3.52.4           fastmap_1.1.0          htmlwidgets_1.6.1      rlang_1.0.6            rstudioapi_0.14       
 [43] shiny_1.7.4            generics_0.1.3         zoo_1.8-11             jsonlite_1.8.3         spatstat.random_3.1-3  ica_1.0-3             
 [49] car_3.1-1              magrittr_2.0.3         patchwork_1.1.2        Matrix_1.5-3           Rcpp_1.0.10            munsell_0.5.0         
 [55] fansi_1.0.4            abind_1.4-5            reticulate_1.28        lifecycle_1.0.3        stringi_1.7.12         carData_3.0-5         
 [61] MASS_7.3-58.2          Rtsne_0.16             plyr_1.8.8             grid_4.2.2             parallel_4.2.2         listenv_0.8.0         
 [67] promises_1.2.0.1       ggrepel_0.9.3          deldir_1.0-6           miniUI_0.1.1.1         lattice_0.20-45        cowplot_1.1.1         
 [73] splines_4.2.2          tensor_1.5             pillar_1.8.1           igraph_1.3.5           spatstat.geom_3.0-6    future.apply_1.10.0   
 [79] reshape2_1.4.4         codetools_0.2-19       leiden_0.4.3           gprofiler2_0.2.1       glue_1.6.2             data.table_1.14.6     
 [85] png_0.1-8              vctrs_0.5.2            httpuv_1.6.9           polyclip_1.10-4        gtable_0.3.1           RANN_2.6.1            
 [91] purrr_1.0.1            tidyr_1.3.0            scattermore_0.8        future_1.31.0          ggplot2_3.4.1          mime_0.12             
 [97] broom_1.0.3            xtable_1.8-4           rstatix_0.7.2          later_1.3.0            survival_3.5-3         viridisLite_0.4.1     
[103] tibble_3.1.8           cluster_2.1.4          globals_0.16.2         fitdistrplus_1.1-8     ellipsis_0.3.2         ROCR_1.0-11  
@saketkc
Copy link
Collaborator

saketkc commented Jul 7, 2023

Thanks for the bug report and reproducible example @jbeanphd! This is now fixed in the develop branc e08e108

You can check out the latest updates by installing the develop branch:

remotes::install_github("satijalab/seurat", ref="develop")

If you are still facing issues, please open a new issue.

@saketkc saketkc closed this as completed Jul 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants