Using counts matrix with known labels as reference #35

Open
wmacnair opened this issue Jun 30, 2022 · 1 comment

Comments

@wmacnair

wmacnair commented Jun 30, 2022

Hi

I would love to try out Symphony for label transfer, but with a set of labels defined by a method other than Harmony. For example, I might start from an sce object whose colData includes columns such as sample_id and cluster. This seems like it could be quite a common use case, for example when a dataset is published with just the counts matrix and a list of cluster annotations.

Is this currently possible? My guess is that it could work in theory, with some tweaks to the code. You would need to implement a new function, called something like createHarmonyObjWithDefinedClusters, that computes the variable genes, PCA loadings, etc., but skips the clustering step and instead uses the predefined clusters to estimate the mixture model components. The Symphony steps would then work as normal.
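
To make the proposed step concrete, here is a minimal Python/NumPy sketch (not Symphony's R API) of what "use the predefined clusters to estimate the mixture model components" could mean. It assumes a reference embedding `Z` (cells × PCs) and one predefined label per cell, and replaces Harmony's learned soft assignment matrix with a one-hot matrix built from the labels. It then computes per-cluster centroids, cell counts and summed embeddings, which is roughly the kind of per-cluster summary Symphony keeps for its reference as we understand it. The helper name `reference_from_labels` is hypothetical.

```python
import numpy as np

def reference_from_labels(Z, labels):
    """Hypothetical helper (not part of Symphony): summarise a reference
    embedding using predefined cluster labels instead of Harmony's soft clusters.

    Z      : (n_cells, d) reference embedding, e.g. corrected PCs
    labels : length-n_cells sequence of predefined cluster labels
    Returns (classes, Y, Nr, C):
      classes : sorted unique labels, one per cluster k
      Y       : (k, d) unit-length centroids of the L2-normalised cells
      Nr      : (k,) number of cells per cluster
      C       : (k, d) per-cluster sum of the embedding, i.e. R @ Z
    """
    labels = np.asarray(labels)
    n_cells = Z.shape[0]
    classes, idx = np.unique(labels, return_inverse=True)
    # Hard (one-hot) assignment matrix: R[k, i] = 1 if cell i has label k.
    # Harmony would instead learn a soft R whose columns sum to 1.
    R = np.zeros((len(classes), n_cells))
    R[idx, np.arange(n_cells)] = 1.0
    Nr = R.sum(axis=1)  # cells per cluster
    C = R @ Z           # summed embedding per cluster
    # Centroids in cosine space: average the unit-length cells, then rescale.
    Z_cos = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    Y = R @ Z_cos
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return classes, Y, Nr, C
```

With one-hot assignments, `C / Nr[:, None]` is simply the mean embedding of each predefined label.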

Perhaps you could comment on whether this would actually work, or whether I've misunderstood something fundamental...! ;)

I've seen a few other issues that seem related (#9, #15, #17, #32, I think), so a generic solution could be valuable.

Thanks for your efforts :)
Will

@joycekang
Collaborator

Hi Will,

Apologies for the delay in getting back to you!

Symphony does not use the cluster labels themselves when mapping (it maps cells into a continuous PC space), so once you've mapped the query into reference coordinates, you can use any predefined labels for the label transfer step. I would check that the predefined labels still separate well visually in the Harmony embedding, which suggests that the embedding captures the transcriptional variation the original study relied on when assigning labels.
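
As an illustration of why the labels only matter after mapping, here is a minimal Python/NumPy sketch of k-nearest-neighbour label transfer in the shared embedding. It is not Symphony's own implementation; it assumes the reference and mapped query cells are already in the same coordinates (`ref_Z`, `query_Z`), that `ref_labels` are the predefined annotations, and that Euclidean distance with a simple majority vote is adequate. The function name `knn_transfer` and the default `k=5` are hypothetical choices.

```python
import numpy as np
from collections import Counter

def knn_transfer(ref_Z, ref_labels, query_Z, k=5):
    """Hypothetical helper: give each query cell the majority label among its
    k nearest reference cells (Euclidean distance in the shared embedding).
    Also returns the fraction of neighbours that agree, as a rough confidence."""
    ref_labels = np.asarray(ref_labels)
    # Pairwise squared Euclidean distances, shape (n_query, n_ref)
    d2 = ((query_Z[:, None, :] - ref_Z[None, :, :]) ** 2).sum(axis=2)
    nn = np.argsort(d2, axis=1)[:, :k]  # indices of the k nearest reference cells
    preds, conf = [], []
    for row in nn:
        label, count = Counter(ref_labels[row]).most_common(1)[0]
        preds.append(label)
        conf.append(count / k)
    return np.array(preds), np.array(conf)
```

Because the vote only reads `ref_labels`, any annotation column (for example the published cluster labels) can be swapped in without rebuilding the reference.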

One common misconception is that the Harmony/Symphony soft clusters map 1:1 to the clusters defined by a graph-based clustering algorithm (e.g. Leiden or Louvain, which are commonly used); in general they do not. In a typical Harmony workflow, the soft clusters are used only in the mixture modeling step for integration, and once the harmonized embedding is defined, graph-based clustering can be run separately.
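
To show what "soft" means here, below is a minimal Python/NumPy sketch of a Harmony-style soft cluster assignment, omitting Harmony's diversity penalty. It assumes cells `Z` and centroids `Y` are compared by cosine distance, 2(1 − y·z) after L2 normalisation, and that membership is proportional to exp(−distance / sigma); `sigma = 0.1` and the name `soft_assign` are illustrative assumptions rather than Harmony's exact code.

```python
import numpy as np

def soft_assign(Z, Y, sigma=0.1):
    """Hypothetical helper: soft cluster membership in the style of Harmony
    (diversity penalty omitted).

    Z: (n_cells, d) embedding; Y: (k, d) centroids.
    Returns R with shape (k, n_cells); each column sums to 1.
    """
    Zn = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    dist = 2.0 * (1.0 - Yn @ Zn.T)                # cosine distance, (k, n_cells)
    logits = -dist / sigma
    logits -= logits.max(axis=0, keepdims=True)  # numerical stability
    R = np.exp(logits)
    return R / R.sum(axis=0, keepdims=True)
```

A cell lying between two centroids ends up split roughly evenly across both clusters, whereas a graph-based clustering would force it into exactly one. Taking `R.argmax(axis=0)` collapses R to one label per cell, but the mixture model works with the full soft R.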

Hope that helps!
