policy_tree() can't scale to my data size but multi_causal_forest() can scale, can I just use argmax of multi-action treatment effect estimation as a good policy ? #46

Open
JunhaoWang opened this issue Jul 23, 2020 · 2 comments
Labels
question Further information is requested

Comments

@JunhaoWang

policy_tree() can't scale to my data size (100,000 observations, 200-dimensional state/covariates, 20 actions), but multi_causal_forest() can. Can I just use the argmax of the multi-action treatment effect estimates as a policy, instead of searching exhaustively over tree functions that map states to actions?

@erikcs
Member

erikcs commented Jul 24, 2020

There is a note on scaling in the online documentation here:

https://grf-labs.github.io/policytree/articles/policytree.html#gauging-the-runtime-of-tree-search

As you can see, the cardinality of the Xj's is important, and you can speed things up by increasing split.step (in effect rounding the Xj's).
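To make the cardinality point concrete, here is a rough stdlib-only Python sketch (not the package's implementation). It counts the candidate split points for one dense feature when every distinct value is tried, when only every 10th is tried, and when the feature is first rounded to two decimals. The helper `candidate_splits` and the simulated feature are assumptions made purely for illustration, and treating `split.step` as "every step-th distinct value" is a simplification.

```python
import random


def candidate_splits(values, step=1):
    """Hypothetical helper: keep every `step`-th distinct sorted value
    as a candidate split point (a simplification of split.step)."""
    distinct = sorted(set(values))
    return distinct[::step]


random.seed(0)
x = [random.random() for _ in range(1000)]  # one dense continuous feature

n_exact = len(candidate_splits(x))                            # every distinct value
n_step10 = len(candidate_splits(x, step=10))                  # every 10th value
n_rounded = len(candidate_splits([round(v, 2) for v in x]))   # after round(x, 2)

print(n_exact, n_step10, n_rounded)
```

A feature on [0, 1] rounded to two decimals can take at most 101 distinct values no matter how large n is, which is why rounding helps more and more as n grows.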

But n = 100k and p = 200 will not finish in a reasonable amount of time. You can try to reduce the dimensionality by using only, say, the 20 variables with the highest split frequencies across the 20 causal forests.
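The selection step could look roughly like the following Python sketch. It assumes you already have one vector of split frequencies per forest (in grf these would presumably come from `variable_importance()`); the helper `top_k_features` and the toy numbers are made up for illustration.

```python
def top_k_features(importances, k):
    """Hypothetical helper. `importances` holds one list per forest,
    each of length p. Returns the indices of the k features with the
    highest mean importance across forests, in ascending index order."""
    n_forests = len(importances)
    p = len(importances[0])
    mean_imp = [sum(f[j] for f in importances) / n_forests for j in range(p)]
    ranked = sorted(range(p), key=lambda j: mean_imp[j], reverse=True)
    return sorted(ranked[:k])


# Toy input: 2 forests, 4 features (made-up split frequencies).
toy = [
    [0.50, 0.10, 0.30, 0.10],
    [0.40, 0.05, 0.45, 0.10],
]
selected = top_k_features(toy, k=2)
print(selected)  # [0, 2]
```

Averaging across forests is only one possible aggregation; taking the maximum per feature would favor variables that matter strongly for a single action.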

The argmax policy is discussed in Section 5.1 (California Gain example) of https://arxiv.org/pdf/1702.02896.pdf, where it is referred to as the plug-in policy. It may be fine, depending on your purpose (whether or not you need interpretable predictions).
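For reference, the plug-in (argmax) policy amounts to something like the sketch below, written in plain Python. It assumes an n-by-K table of estimated rewards per action, with column 0 as the baseline action fixed at zero; the function name and the numbers are illustrative only and do not come from the package.

```python
def plug_in_policy(tau_hat):
    """Hypothetical helper. For each row of estimated per-action
    rewards, return the index of the action with the largest estimate
    (ties go to the lowest index)."""
    return [max(range(len(row)), key=row.__getitem__) for row in tau_hat]


# Toy estimates: 3 units, 3 actions (column 0 is the baseline).
tau_hat = [
    [0.0, 0.2, -0.1],
    [0.0, -0.3, -0.2],
    [0.0, 0.1, 0.4],
]
assignments = plug_in_policy(tau_hat)
print(assignments)  # [1, 0, 2]
```

This rule scales linearly in n and K, but unlike a shallow tree it yields no compact description of who gets which action.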

@erikcs erikcs added the question Further information is requested label Jul 24, 2020
@erikcs
Member

erikcs commented Sep 3, 2020

For practical reference, here is a short table of empirical run times for policy_tree (version 1.0).

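| depth | n | (continuous) features | actions | split.step | time |
|---|---|---|---|---|---|
| 2 | 1000 | 30 | 20 | 1 | 1.5 min |
| 2 | 1000 | 30 | 20 | 10 | 7 sec |
| 2 | 10 000 | 30 | 20 | 1 | 3 hrs |
| 2 | 10 000 | 30 | 20 | 10 | 14 min |
| 2 | 10 000 | 30 | 20 | 1, but round(X, 2) | 8 min |
| 2 | 100 000 | 30 | 20 | 10 | 50 hrs |
| 2 | 100 000 | 30 | 20 | 1, but round(X, 2) | 6.3 hrs |
| 2 | 100 000 | 60 | 20 | 1, but round(X, 2) | 25 hrs |
| 2 | 100 000 | 30 | 3 | 10 | 7.4 hrs |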
