Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consolidating AgentState metadata #814

Merged
merged 2 commits into from
Jul 7, 2022
Merged

Consolidating AgentState metadata #814

merged 2 commits into from
Jul 7, 2022

Conversation

JackUrb
Copy link
Contributor

@JackUrb JackUrb commented Jul 6, 2022

Overview

To unblock @Etesam913 on #791, this PR standardizes a flow for reading and writing metadata from the top-level AgentState class

After much (internal) deliberation on what should go in the main AgentState class and what should still be fragmented out into the subclasses, I decided that only metadata would be considered a first-class feature of AgentState's (rather than all data types) but that the save+load flow would be standardized and updated towards the direction of #567. Happy to revisit this in discussions, but I couldn't find any other clean way to allow new agent types to be flexible with how they handle their data.

Implementation

New functionality:

  • Some AgentState abstract methods have been moved behind wrappers such that I can put the save+load lifecycle functions inside
  • AgentState.save_metadata and AgentState.load_metadata have been created as ways to handle IO for the metadata, while direct edits can be made to the AgentState.metadata object (a dataclass of type _AgentStateMetadata).

Towards future design, this allows the update_metadata method to exist inside of the base AgentState class rather than needing implementations throughout the similar classes.

@JackUrb JackUrb requested a review from pringshia July 6, 2022 22:12
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jul 6, 2022
@JackUrb JackUrb requested a review from Etesam913 July 6, 2022 22:13
Copy link
Contributor

@pringshia pringshia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting this together!

Have a few minor questions but overall looks good.

# Backwards compatibility for times
if "start_time" in state:
self.metadata.task_start = state["start_time"]
self.metadata.task_end = state["end_time"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for this migration consideration

Class to track the first-class feature fields of info about an AgentState.

AgentState subclasses may choose to track additional metadata, but should
attach to the agent state subclass directly.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is meant by "attach" here? is that some Python specific terminology?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I just mean use their own attributes, this is leftover commentary from when I intended to standardize more. I'll update


task_start: Optional[float] = None
task_end: Optional[float] = None
# TODO other metadata fields can be initialized
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was imagining that adding the tips plugin would be as simple as doing yarn add mephisto-worker-addons and using it without having to modify the back-end code much. Would this require end users to update the python code to use different subclasses for AgentStateMetadata?

Also what if they wanted to compose different plugins? For example, let's say in the future we introduce a plugin for gamification or leaderboards. It would be nice if adding those plugins just worked with the metadata storage, without having to define and compose the dataclasses.

Not sure if this is the right solution, but just wanted to articulate the use case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid concern, but I'm wary of projecting too far here. Additional plugins could piggyback on metadata by adding directly to this data class on functionality we feel is standard or general enough. Anything additional or in development should probably just store on an AgentState+Blueprint if the functionality is too complex to simply be in this metadata dataclass.

if "times" in self.state:
assert isinstance(self.state["times"], dict)
self.metadata.task_start = self.state["times"]["task_start"]
self.metadata.task_end = self.state["times"]["task_end"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tbh I'm happy keeping the times within the "times" top level field, as library managed metadata... and the new metadata field for plugins and extensions to use. it might make some of the compatibility stuff here easier to manage though i see you've already handled it. I don't have a strong opinion on this though

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the cost of standardization here for these simple fields is overall worth it. I likely should have done this before.

Copy link
Contributor

@Etesam913 Etesam913 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I don't see any problems and it ran as expected.

One thing that would make sense to think about (I could do this when I merge into add-tips-example):
There isn't a way to update the metadata generally. As of now this isn't a problem because the metadata only has a task_start and task_end field. There is no reason to update these outside of init and submit.

However, if there is a feature like Tips or something else, it might not make too much sense to create an update method for each new field in metadata.

Something like below can be created in agent_state.py:

def update_metadata(self, property_name: str, property_value):
    if property_name not in self.metadata:
        """
        put some message here to tell user that the metadata field does not exist 
        (could help user know that they have to add a field to the _AgentStateMetadata) 
        """
        return

    else:
        self.metadata[property_name] = property_value

@JackUrb
Copy link
Contributor Author

JackUrb commented Jul 7, 2022

@Etesam913 noted in the PR description:

Towards future design, this allows the update_metadata method to exist inside of the base AgentState class rather than needing implementations throughout the similar classes.

Definitely would make sense to create an update_metadata method that is able to handle more complex alterations when provided with a metadata update packet's data.

Rebasing #791 on this should let you clean up the code a bit while keeping relatively the same design.

@pringshia
Copy link
Contributor

pringshia commented Jul 7, 2022

There isn't a way to update the metadata generally

Agreed on the general edit part, that's sort of what I was sort of getting at with my comment here.

Also, given (from the OP):

while direct edits can be made to the AgentState.metadata object

I think the idea is that edits are to be made directly to the metadata object and then persisted via a call to AgentState.save_metadata?

@JackUrb
Copy link
Contributor Author

JackUrb commented Jul 7, 2022

Correct - the AgentState class semantics rely on explicit save_data and load_data calls for persisting vs overwriting from disk, and so I had the metadata attribute follow these semantics as well.

I only didn't implement update_metadata as I was leaving it to be introduced in #791

@codecov-commenter
Copy link

Codecov Report

Merging #814 (8b22ec1) into main (c776529) will increase coverage by 0.55%.
The diff coverage is 65.92%.

@@            Coverage Diff             @@
##             main     #814      +/-   ##
==========================================
+ Coverage   64.37%   64.92%   +0.55%     
==========================================
  Files         107      107              
  Lines        9178     9195      +17     
==========================================
+ Hits         5908     5970      +62     
+ Misses       3270     3225      -45     
Impacted Files Coverage Δ
mephisto/data_model/agent.py 80.25% <0.00%> (+4.45%) ⬆️
...s/remote_procedure/remote_procedure_agent_state.py 41.53% <28.57%> (+5.39%) ⬆️
...eprints/abstract/static_task/static_agent_state.py 37.20% <38.46%> (+4.85%) ⬆️
.../blueprints/parlai_chat/parlai_chat_agent_state.py 30.35% <41.66%> (+2.18%) ⬆️
mephisto/abstractions/databases/local_database.py 89.83% <60.00%> (-1.67%) ⬇️
mephisto/abstractions/database.py 86.18% <68.75%> (-0.81%) ⬇️
...ephisto/abstractions/_subcomponents/agent_state.py 84.11% <89.36%> (+6.17%) ⬆️
...o/abstractions/blueprints/mock/mock_agent_state.py 92.30% <100.00%> (+8.43%) ⬆️
mephisto/operations/client_io_handler.py 85.47% <100.00%> (ø)
mephisto/data_model/unit.py 77.59% <0.00%> (-0.55%) ⬇️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c776529...8b22ec1. Read the comment docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants