Skip to content

Commit

Permalink
Fix some typos in README.
Browse files Browse the repository at this point in the history
  • Loading branch information
raj-sinha committed Jul 12, 2024
1 parent 66587d0 commit 0fcfeda
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -66,7 +66,7 @@ The metric reported by the pipeline is model [AUC](https://developers.google.com

<span style="color:red;background-color:lightgrey">input_bigquery_table_path (string)</span>: A BigQuery table path in the format 'project.dataset.table'. If this is the only BigQuery path provided, this will be used in conjunction with the test_dataset_holdout_fraction parameter to create a train/test split. Use exactly one of BigQuery locations or GCS locations.

<span style="color:red;background-color:lightgrey">input_gcs_uri (string)</span>: Cloud Storage location to store the input data. If this is the only Cloud Storage location provided, this will be used in conjunction with test_dataset_holdout_fraction parameter to create a train/test split. Use exactly one of BigQuery locations or GCS locations.
<span style="color:red;background-color:lightgrey">data_input_gcs_uri (string)</span>: Cloud Storage location to store the input data. If this is the only Cloud Storage location provided, this will be used in conjunction with test_dataset_holdout_fraction parameter to create a train/test split. Use exactly one of BigQuery locations or GCS locations.

<span style="color:red;background-color:lightgrey">output_gcs_uri (string)</span>: Cloud Storage location to store the supervised model assets. The location should be in the form gs://bucketname/foldername. A timestamp will be added to the end of the folder so that multiple runs of this won't overwrite previous runs. Supervised model assets are always stored to GCS.

Expand All @@ -91,7 +91,7 @@ one class classifier ensemble to label a point as negative. The higher this valu

<span style="color:yellow;background-color:lightgrey">test_dataset_holdout_fraction</span>: This setting is used if <span style="color:yellow;background-color:lightgrey">test_bigquery_table_path</span> is not provided. Float between 0 and 1 representing the fraction of samples to hold out as a test set. Default is 0.2, meaning 20% of the data is used for training. In the PU setting, this means that 20% of the positive labels and 100% of the negative data (Since we do not use any ground truth negative data for the supervised mode training) will be used for creating the test sets. For the PNU setting, it is just 20% of positive and negative samples, sampled uniformly at random, all other data would be used for training.

<span style="color:yellow;background-color:lightgrey">test_gcs_uri</span>: Cloud Storage location to store the CSV data to be used for evaluating the supervised model. Note that the positive and negative label values must also be the same in this testing set. It is okay to have your test labels in that form, or use 1 for positive and 0 for negative. Use exactly one of BigQuery locations or GCS locations.
<span style="color:yellow;background-color:lightgrey">data_test_gcs_uri</span>: Cloud Storage location to store the CSV data to be used for evaluating the supervised model. Note that the positive and negative label values must also be the same in this testing set. It is okay to have your test labels in that form, or use 1 for positive and 0 for negative. Use exactly one of BigQuery locations or GCS locations.

<span style="color:yellow;background-color:lightgrey">upload_only</span>: Use this setting in conjunction with `output_bigquery_table_path` or `data_output_gcs_uri`. When `True`, the algorithm will just upload the pseudo labeled data to the specified table, and will skip training a supervised model. When set to `False`, the algorithm will also train a supervised model and upload it to a GCS location. Default is `False`.

Expand Down

0 comments on commit 0fcfeda

Please sign in to comment.