
data leakage problem in your model #9

Open
hurleyLi opened this issue Aug 17, 2019 · 1 comment


hurleyLi commented Aug 17, 2019

The design of your adjacency matrix adj_mats_orig and the way you split the train/test set cause a serious data leakage problem in training: your validation and test sets are created independently for gene_adj and gene_adj.transpose(copy=True), so the edges in the validation / test set of gene_adj are actually included in the training set of gene_adj.transpose(copy=True).
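To make the concern concrete, here is a minimal sketch in plain Python. It does not use the repository's code: the toy edge lists stand in for the nonzero entries of gene_adj and gene_adj.transpose(copy=True), and split_edges / find_leaks are hypothetical helpers written only for this illustration. When each direction is split independently, a held-out edge (u, v) can still be present as (v, u) in the other direction's training set.

```python
import random

def split_edges(edges, test_frac, seed):
    """Shuffle a list of directed (u, v) edges and split off a test fraction."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

def find_leaks(test_edges, other_train_edges):
    """Return held-out edges (u, v) whose mirror (v, u) is in the other direction's training set."""
    other = set(other_train_edges)
    return [(u, v) for (u, v) in test_edges if (v, u) in other]

# Hypothetical toy gene-gene graph: undirected edges on 20 nodes, stored once as (u, v) with u < v.
rng = random.Random(0)
gene_edges = sorted({tuple(sorted(rng.sample(range(20), 2))) for _ in range(60)})
gene_edges_T = [(v, u) for (u, v) in gene_edges]  # same edges, transposed direction

# Independent splits per direction, i.e. the pattern described in the issue.
train_fwd, test_fwd = split_edges(gene_edges, 0.1, seed=1)
train_rev, test_rev = split_edges(gene_edges_T, 0.1, seed=2)

leaked = find_leaks(test_fwd, train_rev)
print(f"{len(leaked)} of {len(test_fwd)} held-out edges appear in training via the transpose")
```

With different shuffles per direction, most held-out edges typically end up in the other direction's training set, which is exactly the leakage described above.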

The same problem applies to the train / validation split between gene_drug_adj and drug_gene_adj. The validation edges from gene_drug_adj are actually used for training in drug_gene_adj, and vice versa.
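One way to avoid this, sketched under the assumption that a relation and its transpose should share a single split: split each pair once, then derive the reverse-direction splits by mirroring. split_then_mirror and the toy (gene, drug) edges are hypothetical, not the repository's API.

```python
import random

def split_then_mirror(edges, val_frac, test_frac, seed=0):
    """Split (gene, drug) edges once, then mirror each split into (drug, gene) order.

    Because both directions share one split, a held-out pair never shows up
    in either direction's training set.
    """
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_frac)
    n_test = int(len(shuffled) * test_frac)
    gene_drug = {
        "val": shuffled[:n_val],
        "test": shuffled[n_val:n_val + n_test],
        "train": shuffled[n_val + n_test:],
    }
    # Reverse direction: same pairs, same split membership, indices swapped.
    drug_gene = {name: [(d, g) for (g, d) in pairs] for name, pairs in gene_drug.items()}
    return gene_drug, drug_gene

# Hypothetical toy bipartite edges: 10 genes, 5 drugs.
edges = [(g, d) for g in range(10) for d in range(5) if (g + d) % 3 == 0]
gene_drug, drug_gene = split_then_mirror(edges, val_frac=0.1, test_frac=0.1)

train_dg = set(drug_gene["train"])
held_out = gene_drug["val"] + gene_drug["test"]
print(sum((d, g) in train_dg for (g, d) in held_out), "leaked pairs")  # prints "0 leaked pairs"
```

The same idea would apply to gene_adj if each undirected edge is first canonicalized as (min(u, v), max(u, v)) before splitting and then mirrored into both directions.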

Could you please clarify?
Thanks!

Originally posted by @hurleyLi in #7 (comment)


Fakak commented Jan 12, 2021

Hello @hurleyLi, I had the same concern at first, but now I think this is not a big problem, because what we want to predict are the edges between drug nodes, which means the p-p (protein-protein) and p-d (protein-drug) edges don't matter.
