Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new functions to support downloading from AWS s3 #218

Merged
merged 17 commits into from
Jun 20, 2024

Conversation

fang19911030
Copy link
Collaborator

Add new python functions which can be used to download files from AWS s3 to the utils.py

@fang19911030 fang19911030 changed the title New inference pengcheng Add new functions to support downloading from AWS s3 May 13, 2024
@kjsato
Copy link
Contributor

kjsato commented May 13, 2024

@fang19911030 I got errors when invoking pytest test_utils.py such as:

============================================== test session starts ===============================================
platform darwin -- Python 3.9.13, pytest-7.4.0, pluggy-1.0.0
rootdir: /Users/koji/Codes/flepimop/flepiMoP/flepimop/gempyor_pkg/tests/utils
plugins: anyio-3.5.0, cov-4.1.0
collected 17 items

test_utils.py ......F....FFFFFF [100%]

==================================================== FAILURES ====================================================

============================================ short test summary info =============================================
FAILED test_utils.py::test_print_disk_diagnosis_success - AttributeError: module 'gempyor.utils' has no attribute 'print_disk_diagnosis'
FAILED test_utils.py::test_create_resume_out_filename - AttributeError: module 'gempyor.utils' has no attribute 'create_resume_out_filename'
FAILED test_utils.py::test_create_resume_input_filename - AttributeError: module 'gempyor.utils' has no attribute 'create_resume_input_filename'
FAILED test_utils.py::test_get_parquet_types_resume_discard_seeding_true_flepi_block_index_1 - AttributeError: module 'gempyor.utils' has no attribute 'get_parquet_types'
FAILED test_utils.py::test_get_parquet_types_resume_discard_seeding_false_flepi_block_index_1 - AttributeError: module 'gempyor.utils' has no attribute 'get_parquet_types'
FAILED test_utils.py::test_get_parquet_types_flepi_block_index_2 - AttributeError: module 'gempyor.utils' has no attribute 'get_parquet_types'
FAILED test_utils.py::test_create_resume_file_names_map - AttributeError: module 'gempyor.utils' has no attribute 'create_resume_file_names_map'
========================================== 7 failed, 10 passed in 3.31s ==========================================

Are any prerequisites required?

@kjsato
Copy link
Contributor

kjsato commented May 14, 2024

after updated conda env and gempyor:

============================================== test session starts ===============================================
platform darwin -- Python 3.11.4, pytest-7.4.0, pluggy-1.0.0
rootdir: /Users/koji/Codes/flepimop/flepiMoP/flepimop/gempyor_pkg/tests/utils
collected 17 items

test_utils.py .............FFF. [100%]

==================================================== FAILURES ====================================================
============================================ short test summary info =============================================
FAILED test_utils.py::test_get_parquet_types_resume_discard_seeding_true_flepi_block_index_1 - AttributeError: module 'gempyor.utils' has no attribute 'get_parquet_types'
FAILED test_utils.py::test_get_parquet_types_resume_discard_seeding_false_flepi_block_index_1 - AttributeError: module 'gempyor.utils' has no attribute 'get_parquet_types'
FAILED test_utils.py::test_get_parquet_types_flepi_block_index_2 - AttributeError: module 'gempyor.utils' has no attribute 'get_parquet_types'
==================================== 3 failed, 14 passed, 9 warnings in 3.83s ====================================

@fang19911030
Copy link
Collaborator Author

Hi @kjsato , I corrected the wrong function name used in unit tests. These errors are solved.

@kjsato
Copy link
Contributor

kjsato commented May 14, 2024

Hi @kjsato , I corrected the wrong function name used in unit tests. These errors are solved.

I confirmed, thx

Copy link
Contributor

@kjsato kjsato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about changing it like L364, for readability in utils.py?

@kjsato kjsato self-requested a review May 14, 2024 23:17
kjsato
kjsato previously approved these changes May 14, 2024
Copy link
Collaborator

@jcblemai jcblemai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, this is very good, I had some comments along the way we can discuss.

The end goal is to have a command line to resume like there is currently.gempyor-simulate and these functions should be adapted to be used through this entry point, that will read environment variable or command line argument through click.

def create_resume_out_filename(filetype: str, liketype: str) -> str:
run_id = os.environ.get("FLEPI_RUN_INDEX")
prefix = f"{os.environ.get('FLEPI_PREFIX')}/{os.environ.get('FLEPI_RUN_INDEX')}"
inference_filepath_suffix = f"{liketype}/intermidate"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

intermediate

flepimop/gempyor_pkg/src/gempyor/utils.py Outdated Show resolved Hide resolved
)


def get_parquet_types_for_resume() -> List[str]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very good, but these functions should take arguments and not read environment variables, because we want to use them from gempyor (and have less and fewer environment variable), and because if we need an environment variable version we can add a wrapper around it.

(think of a script like simulate.py, that is e.g called resume_from.py : this script parse the environment variable from the safety of the click module and call these functions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another note: seeding is a csv, so I think this function should be renamed "get_filetype_for_resume" (this 4 letter id is called filetype throughout the code)-

flepimop/gempyor_pkg/src/gempyor/utils.py Show resolved Hide resolved
@jcblemai
Copy link
Collaborator

(again, sorry for the delay reviewing, as you know round 18 has been really hard)

@fang19911030
Copy link
Collaborator Author

No worry, I will do change based the feedbacks

Base automatically changed from new_inference to main June 7, 2024 13:43
@jcblemai jcblemai dismissed kjsato’s stale review June 7, 2024 13:43

The base branch was changed.

@jcblemai
Copy link
Collaborator

This looks good to me, thank you @fang19911030 for the fixes

@kjsato
Copy link
Contributor

kjsato commented Jun 19, 2024

@fang19911030 Thanks, I checked the tests could be passed wothout any problem!

@jcblemai jcblemai merged commit 9e0ad95 into main Jun 20, 2024
1 check passed
@jcblemai jcblemai deleted the new_inference_pengcheng branch June 20, 2024 09:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants