Adding tutorial stubs to docusaurus (#613)

* Adding doc stubs
* Contribution guide
* Adding documentation stubs for many sections
* Some documentation
* Working through first tutorial using defaults
* Fixing up mephisto wut
* Finishing first tutorial finally :)
* Fixing after webpack updates, work on tutorial 2
* Double ii fix
facebookresearch · Dec 6, 2021 · 75ef231
1 parent 1fb092f
Showing 57 changed files with 4,446 additions and 8,464 deletions.
diff --git a/.gitignore b/.gitignore
@@ -8,13 +8,19 @@ tmp/*
 **/node_modules/*
 mephisto/server/**/package-lock.json
 mephisto/server/blueprints/**/build/*
-examples/**/build/*
+
 **/*.log
 **/build/*
 **/_generated/*
 **/outputs/*
 .coverage
 
+# Examples
+examples/simple_static_task/hydra_configs/conf/*
+!examples/simple_static_task/hydra_configs/conf/example.yaml
+!examples/simple_static_task/hydra_configs/conf/onboarding_example.yaml
+examples/**/build/*
+
 # PyCharm
 .idea
 

diff --git a/docs/web/docs/guides/how-tos/_category_.yml b/docs/web/docs/guides/how-tos/_category_.yml
diff --git a/docs/web/docs/guides/how_to_contribute/_category_.yml b/docs/web/docs/guides/how_to_contribute/_category_.yml
@@ -0,0 +1,3 @@
+label: "How to contribute"
+collapsed: true
+position: 4
diff --git a/docs/web/docs/guides/how_to_contribute/backend_development.md b/docs/web/docs/guides/how_to_contribute/backend_development.md
@@ -0,0 +1,22 @@
+---
+sidebar_position: 3
+---
+
+# Backend: dev setup
+
+We use [pre-commit](https://pre-commit.com/) to enforce code style across the codebase (using `black` for Python and `prettier` for JavaScript).
+
+To set up your local codebase to auto-lint and avoid lint failures on your PRs, set up pre-commit in your local repo as follows:
+
+1. `pip install pre-commit`
+2. `pre-commit install` to install git hooks
+3. `pre-commit run --all-files` (optional - run ad-hoc against all files)
+
+
+## Local development mode
+
+If you've installed Mephisto via `pip install mephisto` in the past, then to get Python to use your local version of the package instead, navigate to your `Mephisto` folder and run:
+```bash
+pip install -e .
+```
+This installs the package in editable mode, so your local changes are used whenever you run Mephisto.
diff --git a/docs/web/docs/contributors/development.md → ...how_to_contribute/frontend_development.md b/docs/web/docs/contributors/development.md → ...how_to_contribute/frontend_development.md
@@ -1,4 +1,8 @@
-# Front-end: Dev setup
+---
+sidebar_position: 2
+---
+
+# Frontend: dev setup
 
 We use [pre-commit](https://pre-commit.com/) to enforce code style across the codebase (using `black` for Python and `prettier` for JavaScript).
 

diff --git a/docs/web/docs/guides/how_to_contribute/getting_started.md b/docs/web/docs/guides/how_to_contribute/getting_started.md
@@ -0,0 +1,17 @@
+---
+sidebar_position: 1
+---
+
+# Contributing to Mephisto
+
+Mephisto is built to be developed on, but how to develop *for* Mephisto is sometimes less clear. We aim to provide guides for contributing to abstractions, underlying infrastructure, and developer experience, but often the best resource is to open an issue on our GitHub directly.
+
+## Understanding Mephisto
+
+One of the most important parts of contributing is understanding where the project stands and where we're going. Be sure to check out the relevant pages on our architecture and upcoming plans to get up to speed. You can also consult our references when diving into specific components.
+
+If documentation for a specific component is lacking here, you can go a step further and look at the blame for the file: we strive to document our decisions in all of our pull requests, so the relevant PRs likely contain some insight.
+
+## I want to help, but don't know where to start
+
+If you want to contribute something, there are often GitHub issues marked [help wanted](https://github.com/facebookresearch/Mephisto/issues?q=is%3Aissue+is%3Aopen+label%3A%22help+wanted%22). We also label some as [good first issues](https://github.com/facebookresearch/Mephisto/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22), and these often include in-depth descriptions of how to get started.
diff --git a/docs/web/docs/guides/how_to_use/_category_.yml b/docs/web/docs/guides/how_to_use/_category_.yml
@@ -0,0 +1,3 @@
+label: "In-depth use"
+collapsed: true
+position: 3
diff --git a/docs/web/docs/guides/how_to_use/efficiency_organization/_category_.yml b/docs/web/docs/guides/how_to_use/efficiency_organization/_category_.yml
@@ -0,0 +1,3 @@
+label: "Efficiency and Organization"
+collapsed: true
+position: 4
diff --git a/docs/web/docs/guides/how-tos/config_faq.md → ...use/efficiency_organization/config_faq.md b/docs/web/docs/guides/how-tos/config_faq.md → ...use/efficiency_organization/config_faq.md
diff --git a/docs/web/docs/guides/how-tos/docker.md → ..._to_use/efficiency_organization/docker.md b/docs/web/docs/guides/how-tos/docker.md → ..._to_use/efficiency_organization/docker.md
@@ -1,4 +1,7 @@
 # Using Docker
+
+Some users prefer to run Mephisto in a fully contained environment. Docker is one way to do this.
+
 ```bash
 # Build the docker image and tag with name 'mephisto'
 $ docker build -t mephisto . 

diff --git a/docs/web/docs/guides/how_to_use/efficiency_organization/reusing_configs.md b/docs/web/docs/guides/how_to_use/efficiency_organization/reusing_configs.md
@@ -0,0 +1,7 @@
+---
+sidebar_position: 1
+---
+
+# Use the same configs across tasks
+
+TODO - talk about setting up hydra profiles
diff --git a/docs/web/docs/guides/how_to_use/efficiency_organization/task_organization.md b/docs/web/docs/guides/how_to_use/efficiency_organization/task_organization.md
@@ -0,0 +1,7 @@
+---
+sidebar_position: 2
+---
+
+# Organize tasks and qualifications
+
+TODO - extend on the workflow tutorial
diff --git a/docs/web/docs/guides/how_to_use/task_creation/_category_.yml b/docs/web/docs/guides/how_to_use/task_creation/_category_.yml
@@ -0,0 +1,3 @@
+label: "Creating a task"
+collapsed: false
+position: 2
diff --git a/...web/docs/guides/how-tos/front_end_faqs.md → ...use/task_creation/developing_frontends.md b/...web/docs/guides/how-tos/front_end_faqs.md → ...use/task_creation/developing_frontends.md
@@ -1,28 +1,19 @@
-# Front-end: FAQs
+---
+sidebar_position: 1
+---
 
-### How do I add UI error handling for my tasks?
+# Developing and debugging frontends
 
-Currently, we have beta functionality for error handling. We provide a few ways of getting a signal into how your tasks are faring:
-
-1. Proactively alerting crowd workers when an error occurs and encouraging them to contact you if this happens
-2. Auto-logging errors for React-based tasks
-2. Exposing error logging infrastructure for more advanced custom front-end use cases
-
-To opt into #1 above, you need to define a global variable as such:
-```js
-window._MEPHISTO_CONFIG_ = {
-    /* required: */
-    ADD_ERROR_HANDLING: true,
-    /* optional: */
-    ERROR_REPORT_TO_EMAIL: "[email protected]"
-}
-```
+## Adding UI error handling to tasks
 
-This will show a prompt as such if an uncaught error is detected:
+Error handling is currently in beta. We provide a few ways to get a signal on how your tasks are faring:
 
-![](/faq_ui_error_message.png)
+1. Auto-logging errors for React-based tasks
+2. Proactively alerting crowd workers when an error occurs and encouraging them to contact you if this happens
+3. Exposing error logging infrastructure for more advanced custom front-end use cases
 
-For #2 above, auto-logging can be enabled for React apps by importing the `<ErrorBoundary />` component and wiring it up as such:
+### Automatic frontend logging
+For #1 above, you can enable auto-logging in React apps by importing the `<ErrorBoundary />` component and wiring it up as follows:
 
 ```jsx
 import { ErrorBoundary } from "mephisto-task";
@@ -35,9 +26,24 @@ return (
   </ErrorBoundary>
 );
 ```
-
 This will automatically send an error packet to the backend Mephisto server when an error occurs.
 
+### Alerting crowd workers of issues
+To opt into #2 above, define the global `window._MEPHISTO_CONFIG_` object as follows:
+```js
+window._MEPHISTO_CONFIG_ = {
+    /* required: */
+    ADD_ERROR_HANDLING: true,
+    /* optional: */
+    ERROR_REPORT_TO_EMAIL: "[email protected]"
+}
+```
+
+If an uncaught error is detected, this shows a prompt like the following:
+
+![](/faq_ui_error_message.png)
+
+### Advanced usage
+`handleFatalError` can also be called from any custom logic; for example, when handling errors for AJAX requests, which fall outside the scope of React error boundaries:
 
 ```jsx

diff --git a/docs/web/docs/guides/how_to_use/task_creation/hosting_assets.md b/docs/web/docs/guides/how_to_use/task_creation/hosting_assets.md
@@ -0,0 +1,76 @@
+---
+sidebar_position: 2
+---
+
+# Hosting task assets
+
+There are generally two models for hosting a task's assets, each with distinct tradeoffs: uploading the files to the routing server, or storing them locally with Mephisto and sending the data to the frontend on connection. The former is generally the easier solution.
+
+
+### Uploading files
+
+**Pros:**
+- Really simple to implement with `StaticBlueprint`-based tasks.
+- Reduces bandwidth load on the main server running Mephisto, as the routing server serves the data.
+
+**Cons:**
+- Requires more storage space on the routing server. In some `Architect`s (like the `EC2Architect`), this may increase costs. In others (like the `HerokuArchitect`), server size may be capped, so your assets may not fit at all.
+- Data is stored on an external server and can be directly addressed, which exposes your source data to crawling while the server is up.
+- Requires a manual setup for `Blueprint`s that don't extend `StaticBlueprint`.
+
+Uploading files directly means taking a folder and uploading its contents to the statically accessible part of the routing server. With `StaticBlueprint`s, you do this by setting `mephisto.blueprint.extra_source_dir`. For instance:
+
+```yaml
+# my_conf.yaml
+mephisto:
+  blueprint:
+    extra_source_dir: my_path/
+...
+```
+
+This makes all of the files in `my_path/` accessible from the frontend. For example, if the file `my_path/TestImg.png` is on the local machine running Mephisto, your frontend can access it at `<server>/TestImg.png`:
+```jsx
+function LoadedImage({ source }) {
+  // `source` is a path relative to the server root, e.g. "TestImg.png"
+  return (
+    <div>
+      <img src={source} />
+    </div>
+  );
+}
+```
+This component renders the image when used as `<LoadedImage source={'TestImg.png'} />`. As a result, you can include the names of the files each task should reference in its `task_data` and use them in the frontend.
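A minimal sketch of building that per-task `task_data`, assuming each file in the folder you pass as `extra_source_dir` should become its own task. The helper name `build_filename_task_data` and the `img_name` key are hypothetical illustrations, not part of Mephisto's API.

```python
import os
from typing import Dict, List


def build_filename_task_data(source_dir: str) -> List[Dict[str, str]]:
    """Hypothetical helper: one task entry per file in `source_dir`.

    Only the file name is stored, since the file itself is uploaded via
    `extra_source_dir` and served from the routing server.
    """
    task_data = []
    # Sort so the task order is deterministic across runs
    for filename in sorted(os.listdir(source_dir)):
        # Skip subdirectories; only regular files are referenced here
        if os.path.isfile(os.path.join(source_dir, filename)):
            task_data.append({"img_name": filename})
    return task_data
```

On the frontend, each entry's `img_name` could then be passed along as `<LoadedImage source={img_name} />`.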
+
+### Local storage of files
+
+**Pros:**
+- All data stays on the machine running Mephisto, so it is never directly addressable from an external server.
+- Works with any `Architect` and `Blueprint`.
+
+**Cons:**
+- Increases the size of saved data, as the base64 encodings of the files are included in the final saved data.
+- Increases bandwidth load on the main Mephisto server, which becomes responsible for sending potentially large files.
+
+This approach sends the file's contents to the frontend, encoded as a data URI, and renders them directly. You'd likely do this while assembling your `task_data` array. For instance, if you're working with images:
+```python
+import base64
+import os
+
+def get_task_data(img_dir: str):
+    imgs = {}
+    for filename in os.listdir(img_dir):
+        with open(os.path.join(img_dir, filename), 'rb') as bin_image:
+            # b64encode returns bytes, so decode to str before building the data URI
+            encoded = base64.b64encode(bin_image.read()).decode('utf-8')
+            imgs[filename] = "data:image/jpeg;base64," + encoded
+
+    return [{'img_name': k, 'img_data': v} for k, v in imgs.items()]
+
+Then on the frontend you can access the `img_data` and use it in a component directly. For instance:
+
+```jsx
+function PassedImage({ img_data }) {
+  // `img_data` is a data URI string built on the Python side
+  return (
+    <div>
+      <img src={img_data} />
+    </div>
+  );
+}
+
+If data size is a concern, you could modify the `AgentState` to drop `img_data` (or other data-heavy keys) and keep only the filenames in the final save.
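A minimal sketch of that filename-retaining cleanup, written as a plain function over a task-data dict. Where you call it from depends on how your `AgentState` saves data; the function name `strip_heavy_keys` and its default key list are hypothetical.

```python
from typing import Any, Dict, Iterable


def strip_heavy_keys(
    data: Dict[str, Any], heavy_keys: Iterable[str] = ("img_data",)
) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of `data` without data-heavy keys.

    Lightweight keys such as `img_name` are kept, so the saved record
    still identifies which file each task used.
    """
    heavy = set(heavy_keys)
    return {key: value for key, value in data.items() if key not in heavy}
```

For example, `strip_heavy_keys({"img_name": "a.jpg", "img_data": "data:image/jpeg;base64,..."})` returns `{"img_name": "a.jpg"}`. Returning a copy rather than mutating in place keeps the full data available to anything else still using it during the task.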
diff --git a/docs/web/docs/guides/how_to_use/worker_quality/_category_.yml b/docs/web/docs/guides/how_to_use/worker_quality/_category_.yml
@@ -0,0 +1,3 @@
+label: "Worker quality control"
+collapsed: false
+position: 3
diff --git a/...des/how-tos/common_qualification_flows.md → ...ker_quality/common_qualification_flows.md b/...des/how-tos/common_qualification_flows.md → ...ker_quality/common_qualification_flows.md
@@ -1,11 +1,14 @@
-# Qualifications
-Qualification control is a powerful component of Mephisto, allowing you to filter out workers with both manual and automatic controls. Within this are typical allowlists and blocklists, setting up value-based qualifications, making automatic qualifications for onboarding, and also utilizing the qualifications that various crowdsourcing providers have to offer. This document seeks to describe some common use cases for qualifications, and how we currently go about using them.
+---
+sidebar_position: 1
+---
 
+# Using qualifications to improve worker quality
+Qualification control is a powerful component of Mephisto, allowing you to filter out workers with both manual and automatic controls. It covers typical allowlists and blocklists, value-based qualifications, automatic qualifications for onboarding, and the qualifications that various crowdsourcing providers offer. This document describes some common use cases for qualifications and how we currently use them.
 
-# Blocking qualifications
+### Blocking qualifications
 When you set a `block_qualification` during a launch, calling `Worker.grant_qualification(<block_qualification>)` will prevent that worker from working on any tasks that you have set the same `block_qualification` for. You can use this to set up blocklists for specific tasks, or for groups of tasks.
 
-# Onboarding qualifications
+### Onboarding qualifications
 Mephisto has an automatic setup for assigning workers qualifications based on the tasks they've worked on, so you can specify a qualification that a worker is granted the first time they take on a particular task. This qualification is configured as `onboarding_qualification`, and is compatible with any blueprint that has onboarding tasks.
 
 When a worker accepts your task for the first time, they have neither the passing nor the failing version of the onboarding qualification, so they are put into a test version of the task that determines whether they are qualified. Only those who qualify on this first attempt can continue working on that task.
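The onboarding flow above amounts to a three-way decision based on which version of the onboarding qualification a worker holds. A minimal sketch of that decision in plain Python; the `OnboardingRoute` enum, the `route_worker` function, and representing granted qualifications as a set of names are hypothetical illustrations, not Mephisto's internals.

```python
from enum import Enum
from typing import Set


class OnboardingRoute(Enum):
    ONBOARDING = "onboarding"  # first visit: take the qualifying test
    TASK = "task"  # passed onboarding: may work on the real task
    BLOCKED = "blocked"  # failed onboarding: may not continue


def route_worker(
    granted: Set[str], passing_qual: str, failing_qual: str
) -> OnboardingRoute:
    """Hypothetical sketch of onboarding gating.

    `granted` holds the names of the qualifications the worker currently has.
    """
    # Check the failing version first, so an inconsistent state is blocked
    if failing_qual in granted:
        return OnboardingRoute.BLOCKED
    if passing_qual in granted:
        return OnboardingRoute.TASK
    # Neither version is present, so this is the worker's first attempt
    return OnboardingRoute.ONBOARDING
```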
@@ -40,7 +43,7 @@ shared_state.qualifications = [
 ]
 ```
 
-# Allowlists and Blocklists
+### Allowlists and Blocklists
 Similarly to how the standard `block_qualification` works, it's possible to add additional qualifications to `Worker`s by granting workers qualifications and making their existence exclusive or inclusive. This is accomplished by adding the qualifications to your `SharedTaskState`:
 ```python
 from mephisto.data_model.qualification import QUAL_NOT_EXIST, make_qualification_dict
@@ -69,7 +72,7 @@ shared_state.qualifications = [
 ]
 ```
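Conceptually, an allowlist requires that a worker *has* a qualification, while a blocklist (as with `QUAL_NOT_EXIST`) requires that they *don't*. A minimal, self-contained sketch of that check; the `Requirement` type and the `"exists"` / `"not_exist"` strings are hypothetical stand-ins, not Mephisto's comparator constants.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class Requirement:
    qualification_name: str
    comparator: str  # hypothetical values: "exists" or "not_exist"


def worker_is_eligible(granted: Set[str], requirements: Iterable[Requirement]) -> bool:
    """Return True only if the worker satisfies every requirement."""
    for req in requirements:
        has_qual = req.qualification_name in granted
        if req.comparator == "exists":
            if not has_qual:
                return False  # allowlist: worker must hold the qualification
        elif req.comparator == "not_exist":
            if has_qual:
                return False  # blocklist: worker must not hold it
        else:
            raise ValueError(f"Unsupported comparator: {req.comparator}")
    return True
```

Requirements are combined with AND, so a worker on the allowlist who also holds a blocking qualification is still excluded.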
 
-# Adding custom qualifications to SharedTaskState
+### Adding custom qualifications to SharedTaskState
 You should be able to specify a qualification in Mephisto using the following:
 ```python
 from mephisto.operations.utils import find_or_create_qualification
@@ -90,12 +93,12 @@ where `QUAL_COMPARATOR` is any of the comparators available [here](https://githu
 
 You can directly grant that qualification to Mephisto `Worker`s using `Worker.grant_qualification("QUALIFICATION_NAME", qualification_value)`.
 
-# What if I want to block a worker that hasn't connected before?
+### What if I want to block a worker that hasn't connected before?
 For this, use the interface that the `CrowdProvider` exposes to grant qualifications directly. An example can be found in `abstractions.providers.mturk.utils.script_utils`.
 
 Note that while you can grant these qualifications to a worker that isn't tracked by Mephisto, Mephisto cannot help with bookkeeping for qualifications granted this way.
 
-# What if I want to use qualifications only set by a provider?
+### What if I want to use qualifications only set by a provider?
 For the special case of provider-specific qualifications, `SharedTaskState` has fields for `<provider>_specific_qualifications` wherein you can put qualifications in the expected format for that crowd provider. For instance, you can do the following for using an [MTurk-specific qualification](https://docs.aws.amazon.com/AWSMechTurk/latest/AWSMturkAPI/ApiReference_QualificationRequirementDataStructureArticle.html#ApiReference_QualificationType-IDs) on a task:
 ```python
 shared_state = #... initialize a SharedTaskState for your run

diff --git a/docs/web/docs/guides/how_to_use/worker_quality/other_methods.md b/docs/web/docs/guides/how_to_use/worker_quality/other_methods.md
@@ -0,0 +1,9 @@
+---
+sidebar_position: 5
+---
+
+# Other methods for quality control
+
+TODO discuss usage of pre-qualifications for MTurk, worker-agreement, multi-tiered worker qualifications, and review-tasks-as-tasks as methods
+
+TODO note that while these aren't yet codified, it would be great to see as a contribution.
diff --git a/docs/web/docs/guides/how_to_use/worker_quality/using_golds.md b/docs/web/docs/guides/how_to_use/worker_quality/using_golds.md
@@ -0,0 +1,7 @@
+---
+sidebar_position: 4
+---
+
+# Check against standards with Gold Labels
+
+TODO - guide on how to use gold labels to prevent slipping
diff --git a/docs/web/docs/guides/how_to_use/worker_quality/using_onboarding.md b/docs/web/docs/guides/how_to_use/worker_quality/using_onboarding.md
@@ -0,0 +1,7 @@
+---
+sidebar_position: 2
+---
+
+# Teach potential workers with Onboarding
+
+TODO - guide on how to use onboarding to ensure that workers are understanding their task
diff --git a/docs/web/docs/guides/how_to_use/worker_quality/using_validation.md b/docs/web/docs/guides/how_to_use/worker_quality/using_validation.md
@@ -0,0 +1,7 @@
+---
+sidebar_position: 3
+---
+
+# Check worker quality with Validation
+
+TODO - guide on how to use validation to ensure that the work is good automatically