problem with pickle #193

Open
PolyachenkoYA opened this issue Mar 28, 2021 · 4 comments

@PolyachenkoYA
Contributor

PolyachenkoYA commented Mar 28, 2021

The problem: the entries of .names are duplicated after an XVG object goes through pickle (the same happens with cPickle and dill).

How to reproduce (test.txt):

import gromacs.formats as gmx
import os
import pickle

def load_xvg(filepath, failsafe=2):
    xvg_file = gmx.XVG()
    xvg_file.read(filepath)
    return xvg_file

def test_fnc(xvg_file):
    print('names: ', xvg_file.names)
    print('shape: ', xvg_file.array.shape)
    print('names after: ', xvg_file.names)    
    print('shape after: ', xvg_file.array.shape)
    
xvg_filepath = 'test.txt'
pickle_filepath = 'test.pkl'

xvg_file = load_xvg(xvg_filepath)
print('names out: ', xvg_file.names)
print('shape out: ', xvg_file.array.shape)
test_fnc(xvg_file)

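# Use context managers so the pickle file is closed before it is re-read and removed.
with open(pickle_filepath, 'wb') as f:
    pickle.dump(xvg_file, f)
with open(pickle_filepath, 'rb') as f:
    xvg_file = pickle.load(f)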
print('\n2 out: ', xvg_file.names)
test_fnc(xvg_file)

os.remove(pickle_filepath)

gives

names out:  ['Temperature']
shape out:  (2, 2)
names:  ['Temperature']
shape:  (2, 2)
names after:  ['Temperature']
shape after:  (2, 2)

2 out:  ['Temperature']
names:  ['Temperature']
shape:  (2, 2)
names after:  ['Temperature', 'Temperature']
shape after:  (2, 2)

So something goes wrong at either the dumping or the loading stage. The same happens with the cPickle and dill libraries.

@orbeckst
Member

orbeckst commented Jun 8, 2021

I think the following happens after unpickling:

  1. x.names shows the content that it had when pickled.
  2. Accessing x.array (as in x.array.shape) triggers x.parse() (because by default savedata=False), so the array is re-read from disk. On the lines
    if line.startswith("@ s") and "subtitle" not in line:
        name = line.split("legend ")[-1].replace('"','').strip()
        self.names.append(name)
    the column names are appended to x.names again, so each column name now appears twice.
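
The mechanism described above can be reproduced without GromacsWrapper. The following is a minimal sketch under stated assumptions: `LazyXVG`, its `__getstate__`, and `roundtrip_names` are hypothetical stand-ins written for illustration, not the library's actual code. Only the three parsing lines quoted above are taken from the real parser.

```python
import os
import pickle
import tempfile


class LazyXVG:
    """Toy stand-in for a lazily parsed XVG reader (hypothetical, not GromacsWrapper code)."""

    def __init__(self, filename):
        self.filename = filename
        self.names = []
        self._array = None  # filled on first access to .array

    def parse(self):
        rows = []
        with open(self.filename) as fh:
            for line in fh:
                if line.startswith("@ s") and "subtitle" not in line:
                    # Append-only, as in the snippet quoted above: names are never reset.
                    name = line.split("legend ")[-1].replace('"', "").strip()
                    self.names.append(name)
                elif line.strip() and not line.startswith(("#", "@")):
                    rows.append([float(v) for v in line.split()])
        self._array = rows

    @property
    def array(self):
        if self._array is None:  # lazy: parse only when the data is missing
            self.parse()
        return self._array

    def __getstate__(self):
        # Assumption: the pickle keeps names but drops the data (like savedata=False).
        state = self.__dict__.copy()
        state["_array"] = None
        return state


def roundtrip_names():
    """Parse once, pickle, unpickle, access .array again, and return the names."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy.xvg")
        with open(path, "w") as fh:
            fh.write('@ s0 legend "Temperature"\n0.0 300.0\n1.0 301.0\n')
        x = LazyXVG(path)
        _ = x.array  # first parse: names == ['Temperature']
        y = pickle.loads(pickle.dumps(x))
        before = list(y.names)  # still the pickled names
        _ = y.array  # data was dropped, so this parses the file again
        return before, y.names
```

In this toy model, `roundtrip_names()` returns the names as they were right after unpickling and as they are after the second parse; the second list contains the column name twice.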

If you try with

def load_xvg(filepath, failsafe=2):
    xvg_file = gmx.XVG(savedata=True)
    xvg_file.read(filepath)
    return xvg_file

then does the same problem still appear? I would think not, because in this case the array itself is included in the pickle file (which might make it big) and the re-read from disk is not triggered.
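
The trade-off can be illustrated with a small model. This is only a sketch: `ToyXVG` and `compare_savedata` are hypothetical names, and the way the toy class honours `savedata` in `__getstate__` is an assumption about the behaviour described here, not a copy of the GromacsWrapper implementation.

```python
import os
import pickle
import tempfile


class ToyXVG:
    """Hypothetical stand-in; it only mimics the savedata switch discussed above."""

    def __init__(self, filename, savedata=False):
        self.filename = filename
        self.savedata = savedata
        self.names = []
        self._rows = None

    @property
    def array(self):
        if self._rows is None:  # parse lazily when no data is cached
            self._parse()
        return self._rows

    def _parse(self):
        self._rows = []
        with open(self.filename) as fh:
            for line in fh:
                if line.startswith("@ s") and "subtitle" not in line:
                    self.names.append(line.split("legend ")[-1].replace('"', "").strip())
                elif line.strip() and line[0] not in "#@":
                    self._rows.append([float(v) for v in line.split()])

    def __getstate__(self):
        state = dict(self.__dict__)
        if not self.savedata:
            state["_rows"] = None  # assumption: data is left out of the pickle
        return state


def compare_savedata(n_rows=1000):
    """Return {savedata: (names after round trip, pickle size in bytes)}."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy.xvg")
        with open(path, "w") as fh:
            fh.write('@ s0 legend "Temperature"\n')
            fh.writelines(f"{i}.0 300.0\n" for i in range(n_rows))
        for savedata in (False, True):
            x = ToyXVG(path, savedata=savedata)
            _ = x.array  # initial parse
            blob = pickle.dumps(x)
            y = pickle.loads(blob)
            _ = y.array  # re-parses only if the data was dropped
            results[savedata] = (y.names, len(blob))
    return results
```

In this model the `savedata=True` round trip keeps a single column name at the cost of a larger pickle, while the `savedata=False` round trip stays small but ends up with the name duplicated.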

@orbeckst
Member

orbeckst commented Jun 8, 2021

@PolyachenkoYA I am sorry that I haven't replied to this issue sooner. Please feel free to ping me with @orbeckst when you open another issue or PR: create the issue, then add a comment with "can you please have a look @orbeckst" ... and then I at least get a notification. Otherwise it's too difficult to keep track of every open source project that I have code in. Thank you!! ❤️

@jandom
Collaborator

jandom commented Nov 11, 2023

There hasn't been any activity here – can we close this one out @orbeckst?

@orbeckst
Member

I don't really know if that's something that needs to be changed or just documented.
