VlnPlot removes violins below the threshold from the graphical output #5756

Closed
vkavaka opened this issue Mar 18, 2022 · 10 comments


vkavaka commented Mar 18, 2022

Dear Seurat team,

While exploring some genes that look quite specific on VlnPlots, we noticed (by comparing against ridge plots) that in some cases violins are removed from the graphical output if they fall below a threshold. Here is an example of the violin plot produced by the standard VlnPlot function:
Screenshot 2022-03-18 at 11 52 41
Here is the output when plotting the same gene with ggplot2's geom_violin. As you can see, the violins in groups 1 and 4 look the same, but those in groups 2 and 3 now appear.
Screenshot 2022-03-18 at 11 52 06
Why does VlnPlot cut off the two groups in the middle? What do you think about this potentially misleading visualization?

@vkavaka vkavaka added the bug Something isn't working label Mar 18, 2022
Collaborator

yuhanH commented Mar 18, 2022

Hi @vkavaka
Could you post a reproducible example for this VlnPlot issue? You may use pbmc_small, any dataset in SeuratData, or any public data. Thanks.

@yuhanH yuhanH self-assigned this Mar 18, 2022
Author

vkavaka commented Mar 20, 2022

Dear @yuhanH, thank you for your prompt reply. We created the reproducible example using the pbmc3k dataset. Here is the code:

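# Load packages; obtaining pbmc3k via SeuratData::LoadData() is an assumption (run InstallData("pbmc3k") first if needed)
library(Seurat)
library(SeuratData)
library(ggplot2)
pbmc <- LoadData("pbmc3k")
# Standard QC, normalization, scaling, PCA, and clustering
pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")
pbmc <- subset(pbmc, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5)
pbmc <- NormalizeData(pbmc)
pbmc <- FindVariableFeatures(pbmc, selection.method = "vst", nfeatures = 2000)
all.genes <- rownames(pbmc)
pbmc <- ScaleData(pbmc, features = all.genes)
pbmc <- RunPCA(pbmc, features = VariableFeatures(object = pbmc))
pbmc <- FindNeighbors(pbmc, dims = 1:10)
pbmc <- FindClusters(pbmc, resolution = 0.5)
# Violin plot with Seurat (points hidden)
VlnPlot(pbmc, "NKG7", pt.size = 0)
# The same normalized values plotted directly with ggplot2
vln_df <- data.frame(NKG7 = pbmc[["RNA"]]@data["NKG7", ], cluster = pbmc$seurat_clusters)
ggplot(vln_df, aes(x = cluster, y = NKG7)) + geom_violin(aes(fill = cluster), trim = TRUE, scale = "width")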

Here is the Violin using the VlnPlot:
Screenshot 2022-03-20 at 11 43 28
Same with the ggplot2 (as you can see the violins below the cutoff start to appear):
Screenshot 2022-03-20 at 11 43 33

Session info:
R version 4.1.2 (2021-11-01); ggplot2_3.3.5, SeuratData_0.2.1, SeuratObject_4.0.4, Seurat_4.0.6

Author

vkavaka commented Mar 21, 2022

@yuhanH as a possible reason: we suspect that the noise built into the VlnPlot function is what removes the violins from the graphical output. We would be very happy to hear your opinion on this.

Author

vkavaka commented Mar 24, 2022

Dear @yuhanH, do you have any updates on this? In our opinion, the issue is quite important and could lead to misinterpretation of results that merely look specific.

Collaborator

yuhanH commented Mar 24, 2022

hi @vkavaka
Thanks for showing this reproducible example. I agree with you that the change of the violin plots is related to the noise.

# Expression values per cell, with cluster labels
vln_df <- data.frame(NKG7 = pbmc[["RNA"]]@data["NKG7", ], cluster = pbmc$seurat_clusters)
# Add small Gaussian noise (standard normal draws divided by 100000)
noise <- rnorm(n = length(x = vln_df$NKG7)) / 100000
vln_df$NKG7.noise <- vln_df$NKG7 + noise
# Violins without noise
ggplot(vln_df, aes(x = cluster, y = NKG7)) + geom_violin(aes(fill = cluster), trim = TRUE, scale = "width")
# Violins with noise
ggplot(vln_df, aes(x = cluster, y = NKG7.noise)) + geom_violin(aes(fill = cluster), trim = TRUE, scale = "width")

image

You can also see that the noise is very small and introduces only a tiny amount of variation into the data.
image
I am not sure why it affects the violin shapes so strongly. It seems to be an issue related to geom_violin.
But I also agree that it may lead to misinterpretation of results that look specific. This suggests that it is better to keep showing the data points in the violin plot.
image
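
One possible explanation for the shape change described above, offered as a sketch and not as a confirmed diagnosis from the Seurat team: geom_violin computes its density with the default bandwidth rule "nrd0" (bw.nrd0 in base R), which uses the smaller of the standard deviation and IQR/1.34, and falls back to the standard deviation only when the IQR is exactly zero. In a group where most cells are exactly zero, the IQR is zero and a wide bandwidth is used. After adding noise, the IQR becomes tiny but non-zero, so the bandwidth may collapse and the violin shrinks to a thin spike near zero. The values below are simulated (hypothetical), not taken from pbmc3k.

```r
# Simulated, hypothetical expression values (NOT the pbmc3k data):
# 90% of cells at exactly zero, 10% with positive expression.
set.seed(1)
expr <- c(rep(0, 900), runif(100, min = 0.5, max = 4))

# Noise on the same scale as in the snippet above.
noise <- rnorm(n = length(expr)) / 100000
expr_noise <- expr + noise

# Without noise the IQR is exactly 0, so bw.nrd0() falls back to the
# standard deviation. With noise the IQR is tiny but non-zero, so it is used.
IQR(expr)
IQR(expr_noise)

# Default kernel bandwidths for the two versions of the data.
bw_raw <- bw.nrd0(expr)
bw_noise <- bw.nrd0(expr_noise)

# Ratio of the two bandwidths; a large ratio means the noisy data are
# smoothed far less, concentrating the density in a narrow peak at zero.
bw_raw / bw_noise
```

If this is indeed the mechanism, it would also be consistent with the observation that only low-expression clusters are affected: groups with a non-zero IQR get nearly the same bandwidth with or without the noise.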

@yuhanH yuhanH added the more-information-needed We need more information before this can be addressed label Mar 24, 2022
Author

vkavaka commented Mar 24, 2022

@yuhanH thank you for your reply and suggestion. Would you still consider keeping the noise in the VlnPlot function? The only clusters affected seem to be the ones with lower expression; the higher ones are completely unchanged.

We are not sure that showing the cell points is the best way to overcome this bias, especially when the object contains a lot of cells. We noticed that below a certain value, the pt.size argument of VlnPlot no longer makes the dots any smaller. Any ideas on how to overcome this limitation and print the dots even smaller?

@no-response no-response bot removed the more-information-needed We need more information before this can be addressed label Mar 24, 2022
Collaborator

yuhanH commented Mar 24, 2022

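Right. When the number of cells is large, you may consider changing the alpha value for the points.
For example:

p0 <- VlnPlot(pbmc, "NKG7")
p1 <- VlnPlot(pbmc, "NKG7")
# Layer 2 is expected to be the jittered points; make them mostly transparent
p1$layers[[2]]$aes_params$alpha <- 0.1
# Show the default and the transparent version side by side
p0 + p1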

We will add this alpha value parameter into VlnPlot soon.
image

Author

vkavaka commented Mar 24, 2022

@yuhanH thank you for the hint about the alpha values. And what do you think about the noise? I understand that the developers would not have added it if it were not necessary. But as you can see in this example, it can affect the data visualization. Is there an explanation for why the noise should be kept?

@yuhanH yuhanH added enhancement New feature or request and removed bug Something isn't working labels Apr 22, 2022
Collaborator

yuhanH commented Apr 22, 2022

Hi @vkavaka
Without the noise, the violin drawn for low expression values in the original data appears to fit the dots in the plot less well.
For now, we are retaining this noise. However, we remain open to reconsidering and possibly removing it if clear biases emerge as a consequence of it.

Collaborator

yuhanH commented Jul 6, 2023

hi

@yuhanH yuhanH closed this as completed Jul 6, 2023