Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Ensembling multiple trajectories with PDELibrary and WeakPDELibrary #148

Closed
znicolaou opened this issue Jan 10, 2022 · 13 comments
Closed
Assignees
Labels
bug Something isn't working

Comments

@znicolaou
Copy link
Collaborator

The PDELibrary and WeakPDELibrary libraries need to reshape the data appropriately to take derivatives and integrals along specific axes when calculating the library matrix. When using multiple_trajectories=True in the SINDy.fit function, we need to inform the library that multiple trajectories are present so the reshaping can be done correctly. I implemented a fix by adding a self.num_trajectories variable to the libraries, which is set in the SINDy.fit function when multiple_trajectories=True. This works when ensemble=False in the SINDy.fit function, but I have not yet implemented the ensembling case. The subset selection should probably be done independently on each trajectory in the multiple_trajectories=True case.

@znicolaou znicolaou added the bug Something isn't working label Jan 10, 2022
@znicolaou znicolaou self-assigned this Jan 10, 2022
@akaptano
Copy link
Collaborator

Dang, I thought I had covered this. Could you post an example that produces an error? Thanks for looking into this!

@znicolaou
Copy link
Collaborator Author

No problem! I was playing with some experimental data, but I'll put together a simpler example that we can work off of and post it here in a bit. I think it won't take much to get the ensembling working--there just needs to be some additional things done in the case of multiple_trajectories in there somewhere.

@znicolaou
Copy link
Collaborator Author

Here's a short example for testing.

integrator_keywords = {}
integrator_keywords['rtol'] = 1e-12
integrator_keywords['method'] = 'LSODA'
integrator_keywords['atol'] = 1e-12

def diffuse (t, u, dx, nx):
    u = np.reshape(u, nx)
    du = ps.differentiation.SpectralDerivative(d=2, axis=0)._differentiate(u, dx)
    return np.reshape(u * du, nx)

N = 200
h0 = 1.0
L = 5
T = 1

t = np.linspace(0, T, N)
x = np.arange(0, N) * L / N

ep = 0.5 * h0
y0 = np.reshape(h0 + ep * np.cos(4 * np.pi / L * x) 
                + ep * np.cos(2 * np.pi / L * x), N)
y1 = np.reshape(h0 + ep * np.cos(4 * np.pi / L * x) 
                - ep * np.cos(2 * np.pi / L * x), N)
dx = x[1] - x[0]

sol1 = solve_ivp(diffuse, (t[0], t[-1]), 
                y0=y0, t_eval=t, args=(dx, N),
                **integrator_keywords)
sol2 = solve_ivp(diffuse, (t[0], t[-1]), 
                y0=y1, t_eval=t, args=(dx, N),
                **integrator_keywords)

u = [np.reshape(sol1.y, (N, N, 1)),np.reshape(sol2.y, (N, N, 1))]


library_functions = [lambda x: x]
library_function_names = [lambda x: x]

pde_lib = ps.PDELibrary(
    library_functions=library_functions,
    function_names=library_function_names,
    derivative_order=2,
    spatial_grid=x,
    is_uniform=True,
)

X, T = np.meshgrid(x, t, indexing='ij')
XT = np.transpose([X, T], [1, 2, 0])

weak_lib = ps.WeakPDELibrary(
    library_functions=library_functions,
    function_names=library_function_names,
    derivative_order=2,
    spatiotemporal_grid=XT,
    K=100,
    is_uniform=False,
    num_pts_per_domain=30,
)

np.random.seed(100)

#works fine
optimizer = ps.STLSQ(threshold=0.1, alpha=1e-5, normalize_columns=False)
model = ps.SINDy(feature_library=pde_lib, optimizer=optimizer, feature_names=['u'])
model.fit(u, multiple_trajectories=True, t=t, ensemble=False)
model.print()

#works fine
optimizer = ps.STLSQ(threshold=0.1, alpha=1e-5, normalize_columns=False)
model = ps.SINDy(feature_library=weak_lib, optimizer=optimizer, feature_names=['u'])
model.fit(u, multiple_trajectories=True, t=t, ensemble=False)
model.print()

#Runs poorly
optimizer = ps.STLSQ(threshold=0.1, alpha=1e-5, normalize_columns=False)
model = ps.SINDy(feature_library=pde_lib, optimizer=optimizer, feature_names=['u'])
model.fit(u, multiple_trajectories=True, t=t, ensemble=True, n_subset=len(t))
model.print()

#ValueError: cannot reshape array of size 80000 into shape (200,200,1)
optimizer = ps.STLSQ(threshold=0.1, alpha=1e-5, normalize_columns=False)
model = ps.SINDy(feature_library=weak_lib, optimizer=optimizer, feature_names=['u'])
model.fit(u, multiple_trajectories=True, t=t, ensemble=True, n_subset=len(t))
model.print()

#AttributeError: 'list' object has no attribute 'shape'
optimizer = ps.STLSQ(threshold=0.1, alpha=1e-5, normalize_columns=False)
model = ps.SINDy(feature_library=pde_lib, optimizer=optimizer, feature_names=['u'])
model.fit(u, multiple_trajectories=True, t=t, ensemble=True)
model.print()

#AttributeError: 'list' object has no attribute 'shape'
optimizer = ps.STLSQ(threshold=0.1, alpha=1e-5, normalize_columns=False)
model = ps.SINDy(feature_library=weak_lib, optimizer=optimizer, feature_names=['u'])
model.fit(u, multiple_trajectories=True, t=t, ensemble=True)
model.print()

The fit works for multiple_trajectories=True, ensemble=False but not for multiple_trajectories=True, ensemble=True. I think the fix should not be too complicated--around line 440 of pysindy.py, there should be some if multiple_trajectories statements that loop over trajectories and do the ensembling on each trajectory.

@akaptano
Copy link
Collaborator

Cool, I'm happy to take a look at this tomorrow!

@akaptano
Copy link
Collaborator

Did you already make changes to the code? Even the first model throws an error for me

@znicolaou
Copy link
Collaborator Author

Oh yeah, sorry that was totally unclear. I did push a couple changes to master yesterday to implement the multiple_trajectories=True, ensemble=False cases. The old code throws the errors because it previously didn't distinguish the spatiotemporal data for multiple trajectories and ran into reshaping errors when trying to take derivatives/integrals. So I added variables self.num_trajectories to the WeakPDELibrary and PDELibrary classes to store that information when pysindy.fit is invoked, and then looped over each trajectory in the transform functions.

@akaptano
Copy link
Collaborator

Got this working... I'll push the changes in a moment but my guess is there will be some weird edge cases left to find at some point. Added your example script to the tests in test_pysindy.py.

@akaptano
Copy link
Collaborator

Some notes on the changes:

  1. The weak formulation is really tricky. Basically, in order to get ensembling + multiple trajectories using the weak form, you would need to duplicate the temporal_grid several times and make this the new, longer temporal_grid. This doesn't work because the grid is required to be strictly ascending for the interpolation (also it makes the full spatiotemporal grid much larger). Instead, I settled on sub-sampling each of the trajectories on the normal temporal_grid, and currently each of the trajectories gets sub-sampled in the same way. In other words, if time indices [1, 10, 3] are chosen, trajectories 1, 2, 3, ... all only use these three indices and are stitched together.
  2. It seems like there is a better way to do the num_trajectories thing in the PDE libraries, but I'm not sure what it is.

@znicolaou
Copy link
Collaborator Author

Nice--thanks for looking so quickly! I agree that we may need to test a bit to discover other potential issues, and there are probably cleaner solutions for both PDELibrary and WeakPDELibrary.

In the PDELibrary case, part of the challenge is that it's possible to deduce num_trajectories*num_time = n_sample // np.product(self.grid_dims), but not num_trajectories and num_time independently as far as I can tell. I'll think about whether there's another option too.

I'll keep testing on this issue as I use the package, so we may as well keep it open for now and post updates as we see them.

@akaptano
Copy link
Collaborator

akaptano commented Jan 11, 2022

Well it compiled fine on my computer but this is a bizarre (and new?) error while running the linting tests online, although looks like it has to do with some git backend. I'll take a look at this in a bit

@akaptano
Copy link
Collaborator

apparently today is the day where GitHub finally updates its security protocols https://github.blog/2021-09-01-improving-git-protocol-security-github/#when-are-these-changes-effective. I'll see if I can fix this for the repo.

@znicolaou
Copy link
Collaborator Author

Ah, confusing. We'll figure it out...

@akaptano
Copy link
Collaborator

Okay fixed that and think I fixed this bug so closing this for now. Feel free to open again when we invariably find some more edge cases. Hopefully this is good enough for your coarse grained models for now!

jpcurbelo pushed a commit to jpcurbelo/pysindy_fork that referenced this issue May 9, 2024
* deleted the prediction_interval_plots since it was redundant
* minor typos
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants