Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add csv-demand parser #995

Merged
merged 52 commits into from
Jun 17, 2024
Merged

Conversation

ekatef
Copy link
Collaborator

@ekatef ekatef commented Mar 21, 2024

#987

The aim of the PR is to provide an option to use csv files as demand inputs.

Checklist

  • I consent to the release of this PR's code under the AGPLv3 license and non-code contributions under CC0-1.0 and CC-BY-4.0.
  • I tested my contribution locally and it seems to work fine.
  • Code and workflow changes are sufficiently documented.
  • Newly introduced dependencies are added to envs/environment.yaml and doc/requirements.txt.
  • Changes in configuration options are added in all of config.default.yaml and config.tutorial.yaml.
  • Add a test config or line additions to test/ (note tests are changing the config.tutorial.yaml)
  • Changes in configuration options are also documented in doc/configtables/*.csv and line references are adjusted in doc/configuration.rst and doc/tutorial.rst.
  • A note for the release notes doc/release_notes.rst is amended in the format of previous release notes, including reference to the requested PR.

@ekatef ekatef marked this pull request as draft March 21, 2024 16:45
@ekatef
Copy link
Collaborator Author

ekatef commented Mar 22, 2024

CI currently fails during building a test configuration, when get_load_paths_gegis is called from Snakemake. It looks like there may be some difference in naming conventions.

@ekatef
Copy link
Collaborator Author

ekatef commented Mar 30, 2024

The tutorial workflow in CI fails due to an empty power plants matching located in the requested region. Not sure in which way it is connected with the changes introduced by the PR, and can't reproduce it locally yet.

@ekatef
Copy link
Collaborator Author

ekatef commented Apr 7, 2024

@davide-f could you please have a look? CI is failing due an some troubles in finding power plants for the region, which I can't reproduce locally... Any ideas be very welcome on what may go wrong!

Copy link
Member

@davide-f davide-f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great @ekatef :D

scripts/build_demand_profiles.py Outdated Show resolved Hide resolved
scripts/build_demand_profiles.py Outdated Show resolved Hide resolved
scripts/build_demand_profiles.py Outdated Show resolved Hide resolved
scripts/build_demand_profiles.py Show resolved Hide resolved
load_path = os.path.join(load_dir, continent + ext)
if os.path.exists(load_path):
load_paths.append(load_path)
break
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, move it inside the if, otherwise the .csv never gets executed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! Thanks :)

@ekatef ekatef force-pushed the add_demand_parser branch 2 times, most recently from a159418 to d9276c6 Compare May 7, 2024 22:38
@ekatef
Copy link
Collaborator Author

ekatef commented May 7, 2024

The changes in this PR should be aligned with #1017.

Copy link
Member

@davide-f davide-f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great @ekatef :D
The functionality is absolutely there!

Need to check why the CI is not working though

As minor comments:

  • plase revise the release_notes
  • it may be good to add something in the documentation to clarify on how to feed in the custom data and leverage on this amazing contribution. I understand this is time demanding. not sure if you have the time to craft something very very basic to be later improved or we leave it completely for a next PR.

What do you think?

Snakefile Outdated Show resolved Hide resolved
@ekatef
Copy link
Collaborator Author

ekatef commented May 29, 2024

Great @ekatef :D The functionality is absolutely there!

Need to check why the CI is not working though

As minor comments:

  • plase revise the release_notes
  • it may be good to add something in the documentation to clarify on how to feed in the custom data and leverage on this amazing contribution. I understand this is time demanding. not sure if you have the time to craft something very very basic to be later improved or we leave it completely for a next PR.

What do you think?

Thanks a lot for reviewing @davide-f! 😄 I can confirm that the functionality has been tested successfully :) 🎉

The reason for CI failing has been that Snakemake is parsing all the inputs before starting to do anything. So, it didn't help much to place get_load_paths_gegis() in the inputs of build_demand_profiles, as Snakemake still executed get_load_paths_gegis() before loading the databundle when running the tutorial case. That leads to failure along the workflow, if a new modification of get_load_paths_gegis is used as it captures only the paths which really exist of the drive.

To fix this, I had to put back composing of load_data_paths, but add a condition to treat the paths differently, depending on the actual presence of the load data. Not sure it's the most elegant approach but seems to be an effective one.

Done for a release note :D Have also drafted a documentation on using custom load profiles. Could you please check if this addition looks clear enough to be used?

PS Can't help but say that it's amazing to have a documentation build as a part of CI 💚 Thanks a lot to you and @GbotemiB for adding this feature!

Copy link
Member

@davide-f davide-f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, the CI is running :D so closer to merge :)

Snakefile Outdated
Comment on lines 46 to 50
if os.path.exists(os.path.join("data", config["load_options"]["ssp"])):
load_data_paths = get_load_paths_gegis("data", config, check_existence=True)
else:
# demand profiles are not available yet -> no opportunity to use custom load files
load_data_paths = get_load_paths_gegis("data", config, check_existence=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmmm I think the check_existence can lead to misleading effects due to when that is evaluated.
I think we should not have this option and keep the previous behavior maybe?

The reason is that the file may not exist when the function is triggered.
For example, with a fresh run, the file does not exists at the beginning because it is moved here from retrieve_databundle.

Maybe the check on whether the file is missing should be moved when the files are actually read, but if the file is missing, the workflow breaks so it may not be needed.
Except the check on whether all selected countries are found, I expected that check to be there already but may be good to crosscheck if you wanted to address that here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for double-checking @davide-f 🙂 This file-existence issue is quite a tricky part, in fact.

The point is snakemake parses all the input paths before building DAG and starting to execute anything. Which means that load_data_paths is evaluated before any of the rule would run, including retrieve_databundle. So, it's possible that load_data_paths is evaluated when demand files are not loaded yet. It has been perfectly alright before, when we were not interested if the demand files exist. But the demand parser must check which exactly inputs present, and breaks if there is no any inputs yet.

If we would restore the original get_load_paths_gegis removing the check-existence condition, that will lead to the error in a fresh run. That has been a reason of CI being unhappy previously, which I have looked to fix. Not sure the current implementation is the most elegant one, but it works and hopefully doesn't introduce breaking changes into the workflow.

Agree that it can also work if we would check if the data folder exist directly in the load_data_paths (an example of a similar solution). Have implemented and tested the improvement and happy to have your opinion on that 🙂

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhhh right, good point! I catched the issue but not completely the side effects, great explanation.
I'm not a great fan of that long if case; I got another idea; adding a review comment and let's see what's your idea.
Already this is good :)

Comment on lines 121 to 137
for continent in region_load:
for ext in [".nc", ".csv"]:
load_path = os.path.join(str(load_dir), str(continent) + str(ext))
if os.path.exists(load_path):
load_paths.append(load_path)
break

avail_regions = [
os.path.split(os.path.abspath(pth))[1].split(".nc")[0] for pth in load_paths
]

logger.info(f" An assumed load folder {load_dir}, load path is {load_paths}.")

if len(load_paths) == 0:
logger.warning(
f"No demand data file for {set(region_load).difference(avail_regions)}. An assumed load folder {load_dir}."
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we drop the exists of scenario path and we do:

Suggested change
for continent in region_load:
for ext in [".nc", ".csv"]:
load_path = os.path.join(str(load_dir), str(continent) + str(ext))
if os.path.exists(load_path):
load_paths.append(load_path)
break
avail_regions = [
os.path.split(os.path.abspath(pth))[1].split(".nc")[0] for pth in load_paths
]
logger.info(f" An assumed load folder {load_dir}, load path is {load_paths}.")
if len(load_paths) == 0:
logger.warning(
f"No demand data file for {set(region_load).difference(avail_regions)}. An assumed load folder {load_dir}."
)
regions_found = []
file_names = []
for continent in region_load:
sel_ext = ".nc"
for ext in [".nc", ".csv"]:
load_path = os.path.join(str(load_dir), str(continent) + str(ext))
if os.path.exists(load_path):
sel_ext = ext
regions_found.append(continent)
break
file_name = str(continent) + str(sel_ext)
load_path = os.path.join(str(load_dir), file_name)
load_paths.append(load_path)
file_names.append(file_name)
logger.info(
f"Demand data folder: {load_dir}, load path is {load_paths}.\n" +
f"Expected files: " + "; ".join(file_names)
)

If the file does not exist, this solution should backup to the .nc case.
Not sure about how much info to disclose;
I'm unsure it is good to advice about whether demand data files are existing or not: for the standard user that may be confusing because of the ordering about when snakemake makes this evaluation.

What do you think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love the solution! 😄 Implemented. Thanks a lot for that.

Agree that the no-demand warning feels redundant. Moreover, it's also not needed: in case no demand data provided and retrive_databundle: false, Snakemake just throws an error even before running anything. So, we can just remove this checking part.

@ekatef
Copy link
Collaborator Author

ekatef commented Jun 8, 2024

Cool, the CI is running :D so closer to merge :)

@davide-f thanks :D Have implemented your suggestions, and the changes look now quite consistent with the original version which feels really great 🙃 Thanks a lot for reviewing!

Both local and CI testing are successful now.

Could you please check if everything looks well in the revised version?

@ekatef ekatef linked an issue Jun 12, 2024 that may be closed by this pull request
@davide-f davide-f merged commit 2c33699 into pypsa-meets-earth:main Jun 17, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support input demand data with csv format
2 participants