
Add kmeans clustering function #286

Merged
merged 1 commit into from
Feb 8, 2021

@oyvindeide (Contributor) commented Jan 11, 2021

Closes #241
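This PR adds k-means as a clustering option alongside hierarchical clustering (both `"hierarchical"` and `"kmeans"` appear in the review suggestion below). As background only, here is a minimal pure-Python sketch of Lloyd's k-means. The function name `kmeans`, the first-k-points initialization and the stop-when-assignments-repeat rule are assumptions for illustration. None of them is taken from the PR's code.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> List[int]:
    """Cluster points with Lloyd's algorithm and return one label per point.

    Hypothetical helper for illustration: centroids start at the first k
    points, so results are deterministic but sensitive to input order.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centroids = [tuple(p) for p in points[:k]]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels


print(kmeans([(0.0,), (0.1,), (10.0,), (10.2,)], k=2))  # [0, 0, 1, 1]
```

Each iteration costs O(n·k) distance computations. Agglomerative hierarchical clustering, by contrast, typically works from all O(n²) pairwise distances, which is one general reason k-means is often preferred for many observations. Library implementations such as scikit-learn's `KMeans` (which defaults to k-means++ initialization) are more robust than this sketch.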

```diff
 @pytest.mark.parametrize("method", ["spearman_correlation", "auto_scale"])
 @pytest.mark.parametrize(
     "num_polynomials",
     tuple(range(1, 5)) + (20, 100),
 )
-def test_misfit_preprocessor_n_polynomials(num_polynomials, method):
+def test_misfit_preprocessor_n_polynomials(
```
@eivindjahren (Contributor) commented Jan 15, 2021

I think in this case, you get a cleaner implementation with hypothesis than with pytest parametrization.
Just a suggestion:

```python
from hypothesis import given, assume
import hypothesis.strategies as st

clustering_functions = st.sampled_from(["hierarchical", "kmeans"])
methods = st.sampled_from(["spearman_correlation", "auto_scale"])


@given(st.integers(min_value=1, max_value=100), methods, clustering_functions)
def test_misfit_preprocessor_n_polynomials(
    num_polynomials, method, clustering_function
):
    if (
        clustering_function == "kmeans"
        and method == "spearman_correlation"
    ):
        assume(num_polynomials in [4, 5])  # or more
```
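For comparison with the original `parametrize` grid, here is a plain-Python sketch (no pytest or Hypothesis needed) that enumerates both parameter spaces. `valid_case` is a hypothetical helper that mirrors the `assume` condition from the suggestion. `range(1, 101)` mirrors `st.integers(min_value=1, max_value=100)`.

```python
from itertools import product

methods = ["spearman_correlation", "auto_scale"]
clustering_functions = ["hierarchical", "kmeans"]

# Original grid: stacked @pytest.mark.parametrize decorators run every
# combination of the listed values (6 polynomial counts x 2 methods).
original_cases = list(product(tuple(range(1, 5)) + (20, 100), methods))


def valid_case(num_polynomials, method, clustering_function):
    """Hypothetical filter mirroring the `assume` call in the suggestion."""
    if clustering_function == "kmeans" and method == "spearman_correlation":
        return num_polynomials in (4, 5)
    return True


# Explicit grid covering the same space as the Hypothesis suggestion,
# with the excluded kmeans/spearman_correlation cases filtered out.
explicit_cases = [
    case
    for case in product(range(1, 101), methods, clustering_functions)
    if valid_case(*case)
]

print(len(original_cases), len(explicit_cases))  # 12 302
```

The explicit list could be passed to `@pytest.mark.parametrize("num_polynomials,method,clustering_function", explicit_cases)`, but that means 302 test runs. Hypothesis instead draws a bounded number of examples per run (100 by default, via `max_examples`) and discards filtered draws through `assume`. That keeps the test short while still sampling the full range.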

@lars-petter-hauge (Contributor)

Had some minor comments; otherwise, I think the implementation looks good!

@lars-petter-hauge (Contributor) left a comment

LGTM! Nice job!

Successfully merging this pull request may close these issues.

[MPP] Running slow with large number of observations
3 participants