Adding tutorial stubs to docusaurus (#613)
* Adding doc stubs

* Contribution guide

* Adding documentation stubs for many sections

* Some documentation

* Working through first tutorial using defaults

* Fixing up mephisto wut

* Finishing first tutorial finally :)

* Fixing after webpack updates, work on tutorial 2

* Double ii fix
JackUrb committed Dec 6, 2021
1 parent 1fb092f commit 75ef231
Showing 57 changed files with 4,446 additions and 8,464 deletions.
8 changes: 7 additions & 1 deletion .gitignore
@@ -8,13 +8,19 @@ tmp/*
**/node_modules/*
mephisto/server/**/package-lock.json
mephisto/server/blueprints/**/build/*
examples/**/build/*

**/*.log
**/build/*
**/_generated/*
**/outputs/*
.coverage

# Examples
examples/simple_static_task/hydra_configs/conf/*
!examples/simple_static_task/hydra_configs/conf/example.yaml
!examples/simple_static_task/hydra_configs/conf/onboarding_example.yaml
examples/**/build/*

# PyCharm
.idea

2 changes: 0 additions & 2 deletions docs/web/docs/guides/how-tos/_category_.yml

This file was deleted.

3 changes: 3 additions & 0 deletions docs/web/docs/guides/how_to_contribute/_category_.yml
@@ -0,0 +1,3 @@
label: "How to contribute"
collapsed: true
position: 4
22 changes: 22 additions & 0 deletions docs/web/docs/guides/how_to_contribute/backend_development.md
@@ -0,0 +1,22 @@
---
sidebar_position: 3
---

# Backend: dev setup

We use [pre-commit](https://pre-commit.com/) to enforce a consistent code style across the code base (using `black` for Python and `prettier` for JavaScript).

To have your local codebase auto-lint and avoid lint test failures on your PRs, set up pre-commit for your local repo as follows:

1. `pip install pre-commit`
2. `pre-commit install` to install git hooks
3. `pre-commit run --all-files` (optional - run ad-hoc against all files)


## Local development mode

If you've previously installed Mephisto via `pip install mephisto`, navigate to your `Mephisto` folder and run the following so that Python uses your local version of the package:
```bash
pip install -e .
```
This will ensure that your local changes are used in the running version of Mephisto.
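
One way to confirm that Python is importing your local checkout, rather than a previously installed copy, is to ask the import system where the package lives. This is a minimal sketch; the helper name `package_location` is hypothetical and not part of Mephisto:

```python
import importlib.util
from typing import Optional


def package_location(name: str) -> Optional[str]:
    """Return the file the import system would load for `name`, or None if it is not installed."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # After `pip install -e .`, this path should point inside your local Mephisto folder.
    print(package_location("mephisto"))
```

If the printed path points into `site-packages` instead of your checkout, the editable install has most likely not taken effect in the active environment.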
@@ -1,4 +1,8 @@
# Front-end: Dev setup
---
sidebar_position: 2
---

# Frontend: dev setup

We use [pre-commit](https://pre-commit.com/) to enforce a consistent code style across the code base (using `black` for Python and `prettier` for JavaScript).

17 changes: 17 additions & 0 deletions docs/web/docs/guides/how_to_contribute/getting_started.md
@@ -0,0 +1,17 @@
---
sidebar_position: 1
---

# Contributing to Mephisto

Mephisto is built to be developed on, but how to develop for Mephisto itself can sometimes be unclear. We aim to provide guides on contributing to abstractions, underlying infrastructure, and developer experience, but often the best resource will be opening an issue on our GitHub directly.

## Understanding Mephisto

One of the most important parts of contributing is understanding where the project stands and where we're going. Be sure to check out the relevant pages on our architecture and upcoming plans to get up to speed. You can also check our references when diving into specific components.

If documentation for a specific component is lacking here, you can take a step further to investigate the blame for the file - we strive to keep documentation for our decisions in all of our pull requests, and some insight is likely present within.

## I want to help, but don't know where to start

If you want to contribute, there are often GitHub issues marked with [help wanted](https://github.com/facebookresearch/Mephisto/issues?q=is%3Aissue+is%3Aopen+label%3A%22help+wanted%22). We also mark some as [good first issues](https://github.com/facebookresearch/Mephisto/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22), and these often have in-depth descriptions of how to get started.
3 changes: 3 additions & 0 deletions docs/web/docs/guides/how_to_use/_category_.yml
@@ -0,0 +1,3 @@
label: "In-depth use"
collapsed: true
position: 3
@@ -0,0 +1,3 @@
label: "Efficiency and Organization"
collapsed: true
position: 4
File renamed without changes.
@@ -1,4 +1,7 @@
# Using Docker

Some users prefer to keep Mephisto entirely contained. Docker is one way to do this.

```bash
# Build the docker image and tag with name 'mephisto'
$ docker build -t mephisto .
Expand Down
@@ -0,0 +1,7 @@
---
sidebar_position: 1
---

# Use the same configs across tasks

TODO - talk about setting up hydra profiles
@@ -0,0 +1,7 @@
---
sidebar_position: 2
---

# Organize tasks and qualifications

TODO - extend on the workflow tutorial
3 changes: 3 additions & 0 deletions docs/web/docs/guides/how_to_use/task_creation/_category_.yml
@@ -0,0 +1,3 @@
label: "Creating a task"
collapsed: false
position: 2
@@ -1,28 +1,19 @@
# Front-end: FAQs
---
sidebar_position: 1
---

### How do I add UI error handling for my tasks?
# Developing and debugging frontends

Currently, we have beta functionality for error handling. We provide a few ways of getting a signal into how your tasks are faring:

1. Proactively alerting crowd workers when an error occurs and encouraging them to contact you if this happens
2. Auto-logging errors for React-based tasks
2. Exposing error logging infrastructure for more advanced custom front-end use cases

To opt into #1 above, you need to define a global variable as such:
```js
window._MEPHISTO_CONFIG_ = {
    /* required: */
    ADD_ERROR_HANDLING: true,
    /* optional: */
    ERROR_REPORT_TO_EMAIL: "[email protected]"
}
```
## Adding UI error handling to tasks

This will show a prompt as such if an uncaught error is detected:
Error handling is currently in beta. We provide a few ways to get signal on how your tasks are faring:

![](/faq_ui_error_message.png)
1. Auto-logging errors for React-based tasks
2. Proactively alerting crowd workers when an error occurs and encouraging them to contact you if this happens
3. Exposing error logging infrastructure for more advanced custom front-end use cases

For #2 above, auto-logging can be enabled for React apps by importing the `<ErrorBoundary />` component and wiring it up as such:
### Automatic frontend logging
For #1 above, auto-logging can be enabled for React apps by importing the `<ErrorBoundary />` component and wiring it up as follows:

```jsx
import { ErrorBoundary } from "mephisto-task";
@@ -35,9 +26,24 @@ return (
</ErrorBoundary>
);
```

This will automatically send an error packet to the backend Mephisto server when an error occurs.

### Alerting crowd-workers of issues
To opt into #2 above, you need to define a global variable as follows:
```js
window._MEPHISTO_CONFIG_ = {
    /* required: */
    ADD_ERROR_HANDLING: true,
    /* optional: */
    ERROR_REPORT_TO_EMAIL: "[email protected]"
}
```

If an uncaught error is detected, this will show a prompt like the following:

![](/faq_ui_error_message.png)

### Advanced Usage
`handleFatalError` can also be used in any custom logic code you wish - for example, in handling errors for AJAX requests which live outside of the scope of React Error Boundaries:

```jsx
Expand Down
76 changes: 76 additions & 0 deletions docs/web/docs/guides/how_to_use/task_creation/hosting_assets.md
@@ -0,0 +1,76 @@
---
sidebar_position: 2
---

# Hosting task assets

There are generally two models for hosting assets related to a task, with distinct tradeoffs: uploading the files to the routing server, or storing the files locally on the machine running Mephisto and sharing the data on connection. The former is generally the easier solution.


### Uploading files

**Pros:**
- Really simple to implement with `StaticBlueprint`-based tasks.
- Reduces bandwidth concerns from the main server running Mephisto, as the data is managed on the routing server.

**Cons:**
- Requires more storage space on the routing server. In some `Architect`s (like the `EC2Architect`), this may increase costs. In others (like the `HerokuArchitect`), it may not even be possible to exceed a maximum server size.
- Data is stored on an external server, and can be directly addressed. This exposes your source data to crawling while the server is up.
- Requires manual setup implementation on `Blueprint`s that don't extend the `StaticBlueprint`.

The method of uploading files directly involves taking a folder and uploading its contents to the statically accessible part of the routing server. With `StaticBlueprint`s, this is done by providing an argument for `mephisto.blueprint.extra_source_dir`. For instance:

```yaml
# my_conf.yaml
mephisto:
  blueprint:
    extra_source_dir: my_path/
    ...
```

This would make all of the files in `my_path/` accessible from the frontend. As such, if the file `my_path/TestImg.png` is on the local machine running Mephisto, you could access `<server>/TestImg.png` from your frontend. For instance:
```js
function LoadedImage({source}) {
  ...
  return <div>
    <img src={source}/>
  </div>
}
```
This component would be able to render with `<LoadedImage source={'TestImg.png'} />`. This means you can pass data for each task for the files you want it to reference in `task_data` and use these in the frontend.
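
As a rough sketch of that idea, the task data can be built by listing the uploaded folder and passing only file names to each task. The helper name `build_task_data` and the `img_name` key are assumptions made for illustration, not Mephisto requirements:

```python
import os
from typing import Dict, List


def build_task_data(source_dir: str) -> List[Dict[str, str]]:
    """Create one task entry per file in `source_dir`, referencing each file by name only."""
    return [
        {"img_name": filename}
        for filename in sorted(os.listdir(source_dir))
        # Skip subdirectories; only plain files are turned into task entries.
        if os.path.isfile(os.path.join(source_dir, filename))
    ]
```

The frontend could then pass each `img_name` as the `source` of a component like `LoadedImage`.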

### Local storage of files

**Pros:**
- All data is stored locally, and cannot be directly compromised.
- Works with any `Architect`s and `Blueprint`s.

**Cons:**
- Increases the size of saved data, as base64 encodings of files will be included in the final files.
- Reduces the bandwidth available to the task, as the Mephisto server is responsible for sending potentially large files.

This process involves sending the binary contents of the object to the frontend and rendering it directly. You'd likely do this while assembling a `task_data` array. For instance, if you're working with images:
```python
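import base64
import os

def get_task_data(img_dir: str):
    # Map each filename to a base64 data URI that an <img> tag can render directly
    imgs = {}
    for filename in os.listdir(img_dir):
        with open(os.path.join(img_dir, filename), 'rb') as bin_image:
            # b64encode returns bytes, so decode before joining with the str prefix
            encoded = base64.b64encode(bin_image.read()).decode('utf-8')
            imgs[filename] = "data:image/jpeg;base64," + encoded

    return [{'img_name': k, 'img_data': v} for k, v in imgs.items()]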
```

Then on the frontend you can access the `img_data` and use it in a component directly. For instance:

```js
function PassedImage({img_data}) {
  ...
  return <div>
    <img src={img_data}/>
  </div>
}
```

If data issues are a concern, one could modify the `AgentState` to delete the `img_data` (or other data-heavy keys) and retain filenames on the final save.
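
The pruning step itself could look something like the sketch below. It is independent of how `AgentState` is actually implemented; the function name `strip_heavy_keys` and the default key list are hypothetical:

```python
from typing import Any, Dict, Iterable


def strip_heavy_keys(
    task_data: Dict[str, Any], heavy_keys: Iterable[str] = ("img_data",)
) -> Dict[str, Any]:
    """Return a copy of `task_data` without the data-heavy keys, keeping e.g. `img_name`."""
    drop = set(heavy_keys)
    return {key: value for key, value in task_data.items() if key not in drop}
```

Returning a copy rather than mutating the input keeps the in-memory data intact for the running task while a lighter version is written to disk.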
3 changes: 3 additions & 0 deletions docs/web/docs/guides/how_to_use/worker_quality/_category_.yml
@@ -0,0 +1,3 @@
label: "Worker quality control"
collapsed: false
position: 3
@@ -1,11 +1,14 @@
# Qualifications
Qualification control is a powerful component of Mephisto, allowing you to filter out workers with both manual and automatic controls. Within this are typical allowlists and blocklists, setting up value-based qualifications, making automatic qualifications for onboarding, and also utilizing the qualifications that various crowdsourcing providers have to offer. This document seeks to describe some common use cases for qualifications, and how we currently go about using them.
---
sidebar_position: 1
---

# Using qualifications to improve worker quality
Qualification control is a powerful component of Mephisto, allowing you to filter out workers with both manual and automatic controls. Within this are typical allowlists and blocklists, setting up value-based qualifications, making automatic qualifications for onboarding, and also utilizing the qualifications that various crowdsourcing providers have to offer. This document seeks to describe some common use cases for qualifications, and how we currently go about using them.

# Blocking qualifications
### Blocking qualifications
When you set a `block_qualification` during a launch, calling `Worker.grant_qualification(<block_qualification>)` will prevent that worker from working on any tasks that you have set the same `block_qualification` for. You can use this to set up blocklists for specific tasks, or for groups of tasks.
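
Conceptually, the check works like the toy model below. This is only an illustration of the semantics, not Mephisto's implementation; `ToyWorker` and `can_work_on` are made-up names:

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ToyWorker:
    """Stand-in for a worker that only tracks the names of granted qualifications."""

    granted: Set[str] = field(default_factory=set)

    def grant_qualification(self, name: str) -> None:
        self.granted.add(name)


def can_work_on(worker: ToyWorker, block_qualification: str) -> bool:
    """A worker is eligible only while they do not hold the task's block qualification."""
    return block_qualification not in worker.granted
```

Because eligibility depends only on the qualification name, reusing one `block_qualification` across several tasks blocks a worker from the whole group at once.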

# Onboarding qualifications
### Onboarding qualifications
Mephisto has an automatic setup for assigning workers qualifications for particular tasks that they've worked on, such that it's possible to specify a qualification that a worker can be granted on the first time they take out a particular task. This qualification is given the name `onboarding_qualification`, and is compatible with any blueprints that have onboarding tasks.

When a worker accepts your task for the first time, they will have neither the passing nor the failing version of the onboarding qualification, and will be put into a test version of the task that determines whether they are qualified. Only those who qualify on this first attempt will be able to continue working on that task.
@@ -40,7 +43,7 @@ shared_state.qualifications = [
]
```

# Allowlists and Blocklists
### Allowlists and Blocklists
Similarly to how the standard `block_qualification` works, it's possible to add additional qualifications to `Worker`s by granting workers qualifications and making their existence exclusive or inclusive. This is accomplished by adding the qualifications to your `SharedTaskState`:
```python
from mephisto.data_model.qualification import QUAL_NOT_EXIST, make_qualification_dict
@@ -69,7 +72,7 @@ shared_state.qualifications = [
]
```

# Adding custom qualifications to SharedTaskState
### Adding custom qualifications to SharedTaskState
You should be able to specify a qualification in Mephisto using the following:
```python
from mephisto.operations.utils import find_or_create_qualification
@@ -90,12 +93,12 @@ where `QUAL_COMPARATOR` is any of the comparators available [here](https://githu

You can directly grant that qualification to Mephisto `Worker`s using `Worker.grant_qualification("QUALIFICATION_NAME", qualification_value)`.

# What if I want to block a worker that hasn't connected before?
### What if I want to block a worker that hasn't connected before?
For this you'll want to use the interface that a `CrowdProvider` has set up to do the granting process directly. An example for this can be found in `abstractions.providers.mturk.utils.script_utils`.

Note that while you're able to grant these qualifications to a worker that isn't tracked by Mephisto, Mephisto will not be able to help with bookkeeping for qualifications granted in this manner.

# What if I want to use qualifications only set by a provider?
### What if I want to use qualifications only set by a provider?
For the special case of provider-specific qualifications, `SharedTaskState` has fields for `<provider>_specific_qualifications` wherein you can put qualifications in the expected format for that crowd provider. For instance, you can do the following for using an [MTurk-specific qualification](https://docs.aws.amazon.com/AWSMechTurk/latest/AWSMturkAPI/ApiReference_QualificationRequirementDataStructureArticle.html#ApiReference_QualificationType-IDs) on a task:
```python
shared_state = #... initialize a SharedTaskState for your run
Expand Down
@@ -0,0 +1,9 @@
---
sidebar_position: 5
---

# Other methods for quality control

TODO discuss usage of pre-qualifications for MTurk, worker-agreement, multi-tiered worker qualifications, and review-tasks-as-tasks as methods

TODO note that while these aren't yet codified, it would be great to see as a contribution.
7 changes: 7 additions & 0 deletions docs/web/docs/guides/how_to_use/worker_quality/using_golds.md
@@ -0,0 +1,7 @@
---
sidebar_position: 4
---

# Check against standards with Gold Labels

TODO - guide on how to use gold labels to prevent slipping
@@ -0,0 +1,7 @@
---
sidebar_position: 2
---

# Teach potential workers with Onboarding

TODO - guide on how to use onboarding to ensure that workers are understanding their task
@@ -0,0 +1,7 @@
---
sidebar_position: 3
---

# Check worker quality with Validation

TODO - guide on how to use validation to ensure that the work is good automatically