Commit

Revert "Merge in eyra updates"
HanzhangRen committed Jul 26, 2024
1 parent ee5a799 commit b675bf4
Showing 7 changed files with 8 additions and 30 deletions.
7 changes: 0 additions & 7 deletions .github/workflows/checks.yaml
@@ -41,13 +41,6 @@ jobs:
- name: Run prediction
run: docker run --rm -v "$(pwd)/.:/data" eyra-rank:latest /data/PreFer_fake_data.csv /data/PreFer_fake_background_data.csv --output /data/predictions.csv

- name: Check if file exists
run: |
if [ ! -f "predictions.csv" ]; then
echo "Predictions file not found. Please check the logs to see what went wrong."
exit 1
fi
- name: Build Docker scoring image
uses: docker/build-push-action@v4
with:
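Note: the step removed above failed the workflow early when `docker run` did not produce a `predictions.csv`; with this revert, a missing predictions file will presumably only surface later, when the scoring image is run.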
6 changes: 2 additions & 4 deletions README.md
@@ -46,10 +46,8 @@ Submit your method via the "Submit Method" task on the Next platform by providing

ℹ️ If the check fails go to [FAQ](https://github.com/eyra/fertility-prediction-challenge/wiki/PreFer-Challenge-Wiki#frequently-asked-questions). You might need to add dependencies as described [here](https://github.com/eyra/fertility-prediction-challenge/wiki/PreFer-Challenge-Wiki#how-to-add-or-edit-dependencies-librariespackages).

4. On the main page of your repository, above the file list, click "Commits" to view a list of commits. Do NOT click "N commits ahead of". See example below:
![](https://github.com/eyra/fertility-prediction-challenge/blob/master/images/screenshot_commits.PNG)

5. Go to the commit that you want to submit and right click on "view commit details", then click "Copy Link Address", see example below:
4. On the main page of your repository, above the file list, click commits to view a list of commits, as described [here](https://docs.github.com/en/pull-requests/committing-changes-to-your-project/creating-and-editing-commits/about-commits#about-commit-branches-and-tag-labels)
5. Go to the commit that you want to submit and right click on view commit details, then click "Copy Link Address", see example below:

![](https://github.com/eyra/fertility-prediction-challenge/blob/master/images/Copy%20link%20to%20commit.png)

Binary file modified images/Copy link to commit.png
Binary file removed images/screenshot_commits.PNG
Binary file not shown.
Binary file modified model.rds
Binary file not shown.
2 changes: 1 addition & 1 deletion python.Dockerfile
@@ -11,4 +11,4 @@ COPY *.py /app
COPY *.joblib /app

ENTRYPOINT ["conda", "run", "-n", "eyra-rank", "python", "/app/run.py"]
CMD []
CMD ["predict", "/data/fake_data.csv"]
23 changes: 5 additions & 18 deletions score.py
@@ -13,17 +13,6 @@
The predictions need to be in a separate file with two columns (nomem_encr, prediction).
Update from April 30:
Starting from the second intermediate leaderboard, we use this updated `score.py` script.
When calculating recall, we now take into account not only the cases when a predicted value was available (i.e., not missing) but all cases in the holdout set.
Specifically, in the updated script, we divide the number of true positives by the total number of positive cases in the ground truth data
(i.e., the number of people who actually had a new child), rather than by the sum of true positives and false negatives.
This change only matters if there are missing values in predictions.
We made this change to avoid a situation where a model makes very accurate predictions for only a small number of cases
(where the remaining cases were not predicted because of missing values on predictor variables),
yet gets the same result as a model that makes similar accurate predictions but for all cases.
Commented lines of code were part of our original scoring function.
"""

import sys
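To illustrate the deleted docstring's point, here is a minimal sketch (not part of the commit; the toy data are invented for illustration) of how the two recall definitions diverge when some predictions are missing:

```python
import pandas as pd

# Toy holdout set: 4 actual positives, one of which has no prediction (NaN).
merged_df = pd.DataFrame({
    "nomem_encr": [1, 2, 3, 4, 5],
    "new_child": [1, 1, 1, 1, 0],
    "prediction": [1, 1, 0, None, 0],
})

true_positives = len(
    merged_df[(merged_df["prediction"] == 1) & (merged_df["new_child"] == 1)]
)  # 2
false_negatives = len(
    merged_df[(merged_df["prediction"] == 0) & (merged_df["new_child"] == 1)]
)  # 1 -- the NaN prediction is neither a true positive nor a false negative
n_all_positive_instances = len(merged_df[merged_df["new_child"] == 1])  # 4

print(true_positives / (true_positives + false_negatives))  # 2/3: ignores the missing case
print(true_positives / n_all_positive_instances)            # 2/4: counts it against recall
```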
@@ -66,28 +55,26 @@ def score(prediction_path, ground_truth_path, output):
merged_df
)

# Calculate true positives and false positives
# Calculate true positives, false positives, and false negatives
true_positives = len(
merged_df[(merged_df["prediction"] == 1) & (merged_df["new_child"] == 1)]
)
false_positives = len(
merged_df[(merged_df["prediction"] == 1) & (merged_df["new_child"] == 0)]
)

# Calculate the actual number of positive instances (N of people who actually had a new child) for calculating recall
n_all_positive_instances = len(merged_df[merged_df["new_child"] == 1])
false_negatives = len(
merged_df[(merged_df["prediction"] == 0) & (merged_df["new_child"] == 1)]
)

# Calculate precision, recall, and F1 score
try:
precision = true_positives / (true_positives + false_positives)
except ZeroDivisionError:
precision = 0

try:
recall = true_positives / n_all_positive_instances
recall = true_positives / (true_positives + false_negatives)
except ZeroDivisionError:
recall = 0

try:
f1_score = 2 * (precision * recall) / (precision + recall)
except ZeroDivisionError:
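As a quick sanity check on the restored definitions: with hypothetical counts of 3 true positives, 1 false positive, and 2 false negatives, precision = 3/4 = 0.75, recall = 3/5 = 0.60, and F1 = 2 × (0.75 × 0.60) / (0.75 + 0.60) ≈ 0.667.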
