Add optional parameter frequency_dimension across wave.resource #336

Merged: 6 commits merged into MHKiT-Software:develop on Aug 19, 2024

Conversation

@akeeste (Contributor) commented on Jun 27, 2024

This PR makes the xarray handling in wave.resource more consistent by adding the optional parameter frequency_dimension to functions such as significant_wave_height, which then pass the parameter through to frequency_moment.

This is an immediate speed fix for #331, since it allows xarray data to be passed to all relevant wave.resource functions.
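To illustrate the pattern (a minimal sketch, not MHKiT's actual implementation; the function bodies, defaults, and example spectrum below are assumptions), the optional frequency_dimension argument added to the higher-level statistics is simply forwarded to frequency_moment:

```python
import numpy as np
import xarray as xr


def frequency_moment(S: xr.DataArray, N: int, frequency_dimension: str = "") -> xr.DataArray:
    """N-th spectral moment. If frequency_dimension is not given, fall back
    to the first dimension of the spectrum (simplified stand-in logic)."""
    dim = frequency_dimension or S.dims[0]
    f = S[dim]
    return (S * f**N).integrate(coord=dim)


def significant_wave_height(S: xr.DataArray, frequency_dimension: str = "") -> xr.DataArray:
    """Hm0 = 4 * sqrt(m0); frequency_dimension is passed straight through."""
    m0 = frequency_moment(S, 0, frequency_dimension=frequency_dimension)
    return 4 * np.sqrt(m0)


# Example: a spectrum whose frequency dimension is named "freq" rather than a default name
f = np.linspace(0.01, 1.0, 200)
S = xr.DataArray(np.exp(-200 * (f - 0.1) ** 2), dims=["freq"], coords={"freq": f})
print(float(significant_wave_height(S, frequency_dimension="freq")))
```

Keeping the dimension lookup in one place (frequency_moment) means callers only need to supply frequency_dimension when their data uses a non-default dimension name; every higher-level statistic just forwards it.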

@ssolson (Contributor) commented on Jun 27, 2024

@akeeste in case you are catching up: we need to get #335 merged before anything will pass tests, due to the numpy 2.0.0 upgrade. (Waiting on the wind hindcast right now, which is everyone's favorite 5-hour test.)

@ssolson (Contributor) commented on Aug 16, 2024

@akeeste I wrote a script to check whether this was improving the testing time, and it does not appear to be making much of a difference. Is the purpose of this PR to address #331, or just to make the code more consistent?

| Job Name | run_10406645270 (minutes) | run_10373389775 (minutes) | Time Difference (minutes) |
| --- | --- | --- | --- |
| (ADCP_Delft3D_TRTS_example.ipynb, 1200) | 18.850000 | 18.550000 | -0.300000 |
| (Delft3D_example.ipynb, 180) | 2.216667 | 2.116667 | -0.100000 |
| (PacWave_resource_characterization_example.ipynb) | 12.783333 | 12.233333 | -0.550000 |
| (SWAN_example.ipynb, 180) | 2.150000 | 2.266667 | 0.116667 |
| (WPTO_hindcast_example.ipynb, 180) | 1.950000 | 1.866667 | -0.083333 |
| (adcp_example.ipynb, 240) | 2.083333 | 3.083333 | 1.000000 |
| (adv_example.ipynb, 180) | 2.083333 | 1.933333 | -0.150000 |
| (cdip_example.ipynb, 180) | 2.416667 | 3.000000 | 0.583333 |
| (directional_waves.ipynb, 180) | 1.983333 | 2.316667 | 0.333333 |
| (environmental_contours_example.ipynb, 360) | 10.616667 | 10.733333 | 0.116667 |
| (extreme_response_MLER_example.ipynb, 360) | 2.166667 | 2.350000 | 0.183333 |
| (extreme_response_contour_example.ipynb, 360) | 4.833333 | 4.500000 | -0.333333 |
| (extreme_response_full_sea_state_example.ipynb) | 4.933333 | 5.550000 | 0.616667 |
| (loads_example.ipynb, 180) | 1.900000 | 2.000000 | 0.100000 |
| (metocean_example.ipynb, 180) | 2.000000 | 2.050000 | 0.050000 |
| (mooring_example.ipynb, 240) | 2.600000 | 2.683333 | 0.083333 |
| (power_example.ipynb, 180) | 2.116667 | 2.333333 | 0.216667 |
| (qc_example.ipynb, 180) | 1.883333 | 2.116667 | 0.233333 |
| (river_example.ipynb, 180) | 1.883333 | 1.850000 | -0.033333 |
| (short_term_extremes_example.ipynb, 180) | 1.916667 | 1.916667 | 0.000000 |
| (tidal_example.ipynb, 180) | 2.000000 | 1.883333 | -0.116667 |
| (tidal_performance_example.ipynb, 180) | 1.883333 | 2.150000 | 0.266667 |
| (upcrossing_example.ipynb, 180) | 1.900000 | 1.916667 | 0.016667 |
| (wave_example.ipynb, 180) | 1.900000 | 1.900000 | 0.000000 |
| (wecsim_example.ipynb, 180) | 2.166667 | 1.966667 | -0.200000 |
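The comparison above was generated with the script below (a GitHub personal access token can be supplied to authenticate the API requests):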
import requests
import pandas as pd


# GitHub personal access token (string); leave empty to send unauthenticated requests
token = "<YOUR_GITHUB_PAT>"

# Headers for authorization
headers = {"Authorization": f"token {token}"} if token else {}

urls = [
    "https://api.github.com/repos/MHKiT-Software/MHKiT-Python/actions/runs/10406645270/jobs?per_page=50",
    "https://api.github.com/repos/MHKiT-Software/MHKiT-Python/actions/runs/10373389775/jobs?per_page=50",
]

# Function to fetch jobs details from a workflow run
def fetch_jobs_data(url):
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # Raise an error for bad status codes
    data = response.json()
    return data['jobs']

# Function to calculate the runtime of a job
def calculate_runtime(job):
    start_time = pd.to_datetime(job['started_at'])
    end_time = pd.to_datetime(job['completed_at'])
    runtime = end_time - start_time
    return runtime.total_seconds() / 60  # Convert to minutes

# Initialize an empty list to hold the job data
all_jobs_data = []

# Fetch and process job data for each workflow run
for url in urls:
    jobs = fetch_jobs_data(url)
    
    # Process job data and append it to the list
    for job in jobs:
        all_jobs_data.append({
            "workflow_run_id": job['run_id'],
            "job_name": job['name'],
            "status": job['status'],
            "conclusion": job['conclusion'],
            "runtime_minutes": calculate_runtime(job)
        })

# Convert the list of job data to a DataFrame
df = pd.DataFrame(all_jobs_data)

# Inspect the DataFrame
print(df.head())

# Example of additional filtering or processing:
# Filter only jobs that start with "test-notebooks"
df_filtered = df[df['job_name'].str.startswith("test-notebooks")].copy()  # copy to avoid SettingWithCopyWarning

# Optionally, clean up job names
df_filtered['job_name'] = df_filtered['job_name'].str.replace("test-notebooks (examples/", "(", regex=False)


# Pivot the data to compare runtimes side-by-side
pivot_df = df_filtered.pivot(index="job_name", columns="workflow_run_id", values="runtime_minutes").reset_index()
# Rename the run-id columns by value rather than by position (pivot sorts the columns)
pivot_df = pivot_df.rename(columns={10406645270: 'run_10406645270', 10373389775: 'run_10373389775'})

# Calculate the difference between the runtimes
pivot_df['time_difference (minutes)'] = pivot_df['run_10373389775'] - pivot_df['run_10406645270']

# Display the data to the user
print(pivot_df)

@akeeste (Contributor, Author) commented on Aug 19, 2024

@ssolson This PR doesn't yet address the underlying xarray slowdown, but it enables a workaround. The wave functions can currently be sped up by passing in xarray data instead of pandas. However, the parameter frequency_dimension had not been added to all wave functions, which limited xarray input to those functions and their internal calls (typically calls to frequency_moment).

Most of the examples still use pandas, so this workaround is not exercised there. Upcoming PRs (#348 and after) will address the underlying slowdown.
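As a usage sketch of that workaround (the spectrum values and the dimension name are made up, and the exact set of functions accepting frequency_dimension follows this PR's description rather than a released API):

```python
import numpy as np
import xarray as xr
from mhkit.wave import resource

# Illustrative omnidirectional spectrum with a frequency dimension named "frequency"
f = np.linspace(0.01, 1.0, 500)
S = xr.DataArray(
    np.exp(-200 * (f - 0.1) ** 2),
    dims=["frequency"],
    coords={"frequency": f},
    name="S",
)

# Passing xarray data directly (instead of a pandas DataFrame) avoids the slow
# conversions; frequency_dimension names the dimension holding the frequencies
# and is forwarded internally to frequency_moment.
Hm0 = resource.significant_wave_height(S, frequency_dimension="frequency")
m0 = resource.frequency_moment(S, 0, frequency_dimension="frequency")
```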

@ssolson (Contributor) commented on Aug 19, 2024

From our discussion: this PR addresses the slowness by allowing a workaround, but does not solve it. An additional PR will follow for the full issue.

@ssolson merged commit efcda0a into MHKiT-Software:develop on Aug 19, 2024; 39 checks passed.
@akeeste deleted the xarray_wave_resource branch on August 27, 2024.
Development

Successfully merging this pull request may close these issues:

Xarray Strategy causing significant MHKiT computation times (MHKiT slow)