Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BREAKING CHANGES] Argument Configuration with Hydra and Dataclasses #246

Merged
merged 32 commits into from
Sep 16, 2020

Conversation

JackUrb
Copy link
Contributor

@JackUrb JackUrb commented Sep 14, 2020

Overview

This change is moving Mephisto over from using argparser (and related helpers) to configure its core operations to using Hydra. Hydra will allow for configurable, committable files that allow sharing exact experimental runs with ease. It'll also allow easier access to deep Mephisto configuration options directly from the command line, or through those configuration scripts.

There's no clean way to introduce these changes into Mephisto other than one breaking branch, and as such this PR contains all of the components for migrating Mephisto over. The commits in order roughly follow the Implementation checklist. Most are direct refactors that choose to retain existing behavior where possible, but introduce Hydra to facilitate a complete and total removal of ARG_STRING, Argparse, and all related helpers.

Implementation

  • Directly translates existing argparse options to dataclasses for use in hydra
  • Update all classes to reference their related argument-providing dataclasses
  • Update operator flow to work from a Hydra config, rather than the ARG_LIST
  • Update each of the examples in turn to have Hydra-based scripts rather than argparse based scripts.
  • Updates task READMEs to show the correct hydra usage.
  • Updates all testing to use Hydra code.
  • Delete all ARG_STRING, argparse, and related code (add_args_to_group, much of argparse_parser, etc).
  • Creates guide for transitioning from pre-hydra to post hydra

Testing

Tests:

pytest test
====================================================================================== test session starts ======================================================================================
platform darwin -- Python 3.8.3, pytest-5.3.2, py-1.8.1, pluggy-0.13.1
rootdir: /Users/jju/mephisto
plugins: hydra-core-1.0.0, requests-mock-1.7.0
collected 105 items                                                                                                                                                                             

test/test_mephisto.py .                                                                                                                                                                   [  0%]
test/core/test_database.py sssssssssssssssssssssss.......................                                                                                                                 [ 44%]
test/core/test_operator.py ....                                                                                                                                                           [ 48%]
test/core/test_supervisor.py ......                                                                                                                                                       [ 54%]
test/core/test_task_launcher.py ....                                                                                                                                                      [ 58%]
test/providers/mturk/test_mturk.py .                                                                                                                                                      [ 59%]
test/providers/mturk_sandbox/test_mturk_provider.py ssss.....                                                                                                                             [ 67%]
test/server/architects/test_heroku_architect.py sss...                                                                                                                                    [ 73%]
test/server/architects/test_local_architect.py sss...                                                                                                                                     [ 79%]
test/server/architects/test_mock_architect.py sss...                                                                                                                                      [ 84%]
test/server/blueprints/test_mock_blueprint.py ssssssss........                                                                                                                            [100%]

=============================================================== 61 passed, 44 skipped, 10 warnings in 134.81s (0:02:14) ===============================================================

mephisto register <> test

>>> mephisto register mock
dest        type    default         help                                        choices    required
----------  ------  --------------  ------------------------------------------  ---------  ----------
name        str     MOCK_REQUESTER  Name for the requester in the Mephisto DB.             True
force_fail  bool    False           Trigger a failed registration                          False
>>> mephisto register mock name=TEST_MOCK_2 force_fail=True
Forced failure test exception was set
>>> mephisto register mock name=TEST_MOCK_2
Registered successfully.
>>> mephisto requesters
  requester_id  provider_type    requester_name    registered
--------------  ---------------  ----------------  ------------
             1  mock             MOCK_REQUESTER    False
             2  mock             TEST_MOCK_2       False

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 14, 2020
@JackUrb JackUrb changed the title [WIP] [BREAKING CHANGES] Argument Configuration with Hydra and Dataclasses [BREAKING CHANGES] Argument Configuration with Hydra and Dataclasses Sep 15, 2020
@JackUrb JackUrb marked this pull request as ready for review September 15, 2020 15:00
Copy link
Contributor

@pringshia pringshia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't wait until this gets shipped!

Most of the comments here are mostly be just being annoying with some nits.

docs/quickstart.md Show resolved Hide resolved
mephisto/core/operator.py Outdated Show resolved Hide resolved
'Operator.parse_and_launch_run_wrapper has been deprecated in favor '
'of using Hydra for argument configuration. See the docs TODO in order '
'to upgrade.'
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yay for developer friendly UX

mephisto/data_model/blueprint.py Show resolved Hide resolved
mephisto/utils/scripts.py Outdated Show resolved Hide resolved
mephisto/utils/scripts.py Show resolved Hide resolved
mephisto/utils/scripts.py Outdated Show resolved Hide resolved
examples/parlai_chat_task_demo/parlai_test_script.py Outdated Show resolved Hide resolved
mephisto/core/hydra_config.py Outdated Show resolved Hide resolved
examples/static_react_task/run_task.py Outdated Show resolved Hide resolved
@JackUrb JackUrb merged commit b6161ae into master Sep 16, 2020
@JackUrb JackUrb deleted the dataclasses-hydra branch September 16, 2020 19:52
This was referenced Sep 16, 2020
@omry
Copy link

omry commented Sep 16, 2020

Awesome!

Copy link

@omry omry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome.

Comment on lines +68 to +69
access-key-id=$ACCESS_KEY\
secret-access-key=$SECRET_KEY
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can access environment variables from the config:

key: ${env:ACCESS_KEY}

See OmegaConf docs.

Comment on lines +57 to +61
- `mephisto.task.task_title` - The title of the task showed to workers in their queue
- `mephisto.task.task_description` - The short description for the task that may be showed inline while they're searching for tasks
- `mephisto.task.task_tags` - Any searchable tags that would make it easy to find your task
- `mephisto.task.task_reward` - The reward paid out in dollars for completing your task
- `mephisto.task.task_name` - This task name will be used to consolidate your data across runs. Generally we suggest using `descriptive-string-pilot-#` when piloting and `descriptive-task-string` for your main collection. More details on this are in the (TODO) Mephisto argument instructions.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can generate the config node with --cfg job --package mephisto.
I often use those as documentation snippets, try it out to see the output.
Example here.

Unfortunately there is no support for help strings yet.

Copy link
Contributor Author

@JackUrb JackUrb Sep 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've managed pulling out help strings from the dataclass metadata in #249. Might be worth considering pulling help from metadata in Hydra? (of course this relies on being able to get the dataclass itself rather than the config).

@@ -0,0 +1,13 @@
#@package _global_
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also use #@ package mephisto here to drop one nesting level.

used to demonstrate the functionalities around using Mephisto
for ParlAI tasks."
task_reward: 0.3
task_tags: "dynamic,chat,testing"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use yaml lists:

task_tags: [dynamic,chat,testing]

or:

task_tags:
  - dynamic
  - chat
  - testing

Comment on lines +20 to +27
defaults = [
{"mephisto.blueprint": BLUEPRINT_TYPE},
{"mephisto.architect": 'local'},
{"mephisto.provider": 'mock'},
"conf/base",
{"conf": "example"},
]

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have a scriptconfig.yaml file in your search path you can put the defaults in it in yaml format (which is cleaner). it will get used even if your TestScriptConfig does not have the defaults declared.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only difficulty here is that BLUEPRINT_TYPE is only loaded up at runtime, as we import it from another class.

@dataclass
class TestScriptConfig(RunScriptConfig):
defaults: List[Any] = field(default_factory=lambda: defaults)
task_dir: str = TASK_DIRECTORY
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider using OmegaConf resolvers for things like that.

Copy link
Contributor Author

@JackUrb JackUrb Sep 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considered, but what is the use case for using a resolve being more advantageous? Sure, in both cases I can use ${task_dir:} but if I actually want to access cfg.task_dir I'll still need task_dir: ${task_dir:} somewhere right?

examples/static_react_task/README.md Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants