Output_parameters turn out to be all-zero inexplicably when n_fits is set up to large enough #31

Closed
caowencai opened this issue Nov 25, 2017 · 16 comments

Comments

@caowencai

caowencai commented Nov 25, 2017

To avoid a stack overflow, I allocate all my Gpufit arrays on the heap, like this:

float *initial_parameters = new float [n_fits * n_model_parameters] ();

I then found that n_fits is limited: when n_fits is large enough (10,000,000 in my case), the output_parameters returned by Gpufit are all zero, whereas everything works as expected with n_fits set to 1,000,000. I cannot figure out why. The PC has enough memory, by the way.

@superchromix
Collaborator

Please post a sample program which reproduces the error.

@caowencai
Author

caowencai commented Nov 27, 2017

I finally figured it out.
Reason:
The available GPU memory is set too low, so it is easy to hit the error "maximum user info size exceeded". That error should have been reported explicitly by throw std::runtime_error("maximum user info size exceeded").
Solution:
Increase the factor in void Info::get_gpu_properties() (info.cu, line 14), which originally reads available_gpu_memory_ = std::size_t(double(free_bytes) * 0.1);
For example, use available_gpu_memory_ = std::size_t(double(free_bytes) * 0.5);
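
The arithmetic behind this can be sketched as follows. This is a minimal illustration and not the actual Gpufit source: the function names are hypothetical, the form of the check is an assumption based on the error message quoted above, and the 2 GiB of free GPU memory used in the note below is an invented figure. The user_info size uses 10,000,000 fits with 8 float x values each, as described later in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: the share of free GPU memory the library allows itself.
// `fraction` plays the role of the 0.1 factor quoted from info.cu.
std::size_t available_gpu_memory(std::size_t free_bytes, double fraction)
{
    return static_cast<std::size_t>(static_cast<double>(free_bytes) * fraction);
}

// Bytes needed for per-fit x coordinates stored as float in user_info.
std::size_t user_info_bytes(std::size_t n_fits, std::size_t n_points)
{
    return n_fits * n_points * sizeof(float);
}

// Assumed form of the check: user_info is not split into chunks,
// so it has to fit into the memory budget as a whole.
void check_user_info_size(std::size_t user_info_size, std::size_t available_bytes)
{
    if (user_info_size > available_bytes)
        throw std::runtime_error("maximum user info size exceeded");
}
```

With 2 GiB free, a factor of 0.1 leaves a budget of roughly 215 MB, which is below the 320 MB of user_info, so the check throws; with 0.5 or 0.7 the same user_info fits.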

@jkfindeisen
Collaborator

@adrianjp88 Should this happen? I just thought then it would use smaller chunks of fits, so as long as at least the data of one fit fits into the available GPU memory it should run fine, shouldn't it?

@caowencai caowencai changed the title Output_parameters turn out to be all-zero inexplicably when n_fits is set up to large enough Output_parameters turn out to be all-zero inexplicably when n_fits is set up to large enough[solved] Nov 27, 2017
@caowencai caowencai changed the title Output_parameters turn out to be all-zero inexplicably when n_fits is set up to large enough[solved] Output_parameters turn out to be all-zero inexplicably when n_fits is set up to large enough Nov 27, 2017
@superchromix
Collaborator

superchromix commented Nov 27, 2017

@gittry We cannot understand your issue without a complete example code which reproduces the problem.

@superchromix superchromix removed the bug label Nov 27, 2017
caowencai pushed a commit to caowencai/Gpufit that referenced this issue Nov 28, 2017
Discussion at issue gpufit#31
Output_parameters turn out to be all-zero inexplicably when n_fits is
set up to large enough.
In this example, it is bound up with available_gpu_memory_ =
std::size_t(double(free_bytes) * 0.1) in line 14, info.cu.al model
@caowencai
Author

caowencai commented Nov 28, 2017

The example is in pull request #33.
The user_info data is at https://drive.google.com/open?id=1M4TnXf3TQex3LFeEkXArlTB5GYvDzIKY (488 MB); it is too large to include in the pull request.

In info.cu, line 14, available_gpu_memory_ = std::size_t(double(free_bytes) * 0.1);
Changing 0.1 to 0.7 solved my problem.

@mscipio

mscipio commented Nov 28, 2017

Maybe it's not relevant, but am I right that you are using user_info to pass your data to the kernel, and data to pass the time vector (the same for all your measurements)?
I ask because inside the kernel you still use user_info as the independent variable (in the if-else block at the beginning) and data as what you want to fit ...

@mscipio

mscipio commented Nov 28, 2017

I have a question that is somehow related to this issue ... let me know if you prefer me to open a new issue.

I succeeded in implementing my compartmental model as per issues #27 and #30, and now I am experimenting with an increasing number of parallel fits. What I discovered is that if I use an n_fits that is greater than my max_chunk_size, I get the following error when the library tries to allocate GPU memory for the second chunk:

terminate called after throwing an instance of 'std::runtime_error'
  what():  invalid argument
Aborted

That is thrown by void GPUData::init at:

write(
        parameters_,
        &initial_parameters[chunk_index_*info_.max_chunk_size_*info_.n_parameters_],
        chunk_size_ * info_.n_parameters_);

Quite surprisingly, the exact same command run a couple of lines above to write "data_" to the GPU memory completes just fine.
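
For reference, the indexing in that write call can be sketched like this. It is a simplified illustration and not the library's code: the struct and function names are hypothetical, and only the offset expression chunk_index * max_chunk_size * n_parameters is taken from the snippet above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Chunk
{
    std::size_t parameter_offset; // index of the chunk's first float in initial_parameters
    std::size_t n_fits;           // number of fits in this chunk (the role of chunk_size_)
};

// Split n_fits into consecutive chunks of at most max_chunk_size fits each.
std::vector<Chunk> plan_chunks(std::size_t n_fits, std::size_t max_chunk_size, std::size_t n_parameters)
{
    std::vector<Chunk> chunks;
    if (max_chunk_size == 0)
        return chunks; // nothing can be scheduled

    for (std::size_t chunk_index = 0; chunk_index * max_chunk_size < n_fits; ++chunk_index)
    {
        std::size_t const first_fit = chunk_index * max_chunk_size;
        // The last chunk may hold fewer fits than max_chunk_size.
        std::size_t const chunk_size = std::min(max_chunk_size, n_fits - first_fit);
        chunks.push_back({ first_fit * n_parameters, chunk_size });
    }
    return chunks;
}
```

Each chunk then copies n_fits * n_parameters floats starting at parameter_offset. A copy that starts or ends outside the host array would be one possible source of an "invalid argument" error, though that is only a guess here.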

I checked and this error happens with both my new model and with original models you implemented.
Can it be an issue with my GPU? I would find it hard to believe but I spent the entire morning trying to find a solution without success, so far ...

@superchromix
Collaborator

@mscipio
Please post an example code which reproduces the error.

@mscipio

mscipio commented Nov 28, 2017

That's not going to be easy because I am working on Linux, so I made some changes to the code to make it compile, and it's no longer compatible with the version in this repo.

If you say that in your version you don't have an issue of this kind, I will try (T.T) to trace back all the differences hoping to find MY mistake along the way.

You don't need example code from me to test this with your code: just pick one of the examples (like Linear_Regression_Example.cpp) and increase n_fits a lot, so that the problem doesn't fit on your GPU in one chunk.

EDIT:
I just saw that you merged your version with @jkfindeisen's, so maybe I really should revert to a compatible version to be in line with you. Does the current version compile under Linux?

@superchromix
Collaborator

@mscipio It is not clear to me what error you are reporting. In the manuscript we have tested Gpufit with up to 10^8 fits per function call. This is significantly larger than the maximum number of fits that can be processed simultaneously on the GPU.

You are making modifications to the core of the Gpufit code, and changes there could easily introduce bugs. Why do you need to change GPUData::init?

@mscipio

mscipio commented Nov 28, 2017

@superchromix
Yes, you are obviously right. Sorry for bothering you.

The changes I made (I wasn't trying to modify GPUData::init, anyway) were meant to debug my new kernel (a C++-only implementation was not enough). I have now cloned a fresh copy of your current version of the library and will continue working with that. I just checked and I don't have that issue with Linear_Regression_Example.cpp, so I guess something I did caused the error.

I will check my new model against the current branch as soon as possible and then open a pull request if you are interested in having it.
Thanks.

@superchromix
Collaborator

@gittry
It looks like you are using the user_info incorrectly. Your experimental data should not be passed to Gpufit through the user_info parameter. The user_info should be used to store independent variables.

@caowencai
Author

caowencai commented Nov 29, 2017

@superchromix
I followed the documentation, which says:

custom x positions for the data points of every fit, stored in user_info

The experimental data are just the unique x-coordinate values for each fit, stored as float.
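
A rough sketch of how such per-fit x values could be laid out in user_info and read back. The helper name and the three-way branching on the buffer size are assumptions made for illustration; they are modelled on the if-else block mentioned earlier in this thread and are not copied from the Gpufit sources.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical lookup of the x coordinate for one data point of one fit.
// user_info is treated as a flat float array whose size selects the layout.
float x_coordinate(char const * user_info, std::size_t user_info_size,
                   std::size_t n_points, std::size_t fit_index, std::size_t point_index)
{
    float const * x = reinterpret_cast<float const *>(user_info);
    std::size_t const n_values = user_info_size / sizeof(float);

    if (x == nullptr || n_values == 0)
        return static_cast<float>(point_index);    // no user_info: x = 0, 1, 2, ...
    if (n_values == n_points)
        return x[point_index];                     // one x vector shared by all fits
    return x[fit_index * n_points + point_index];  // a separate x vector for every fit
}
```

In the per-fit layout the buffer holds n_fits * n_points floats, so user_info_size grows linearly with n_fits, which is why it matters for the memory limit discussed here.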

How, then, do you explain that I get correct results when n_fits or available_gpu_memory_ is set appropriately, but all-zero output otherwise?

@superchromix
Collaborator

@gittry
What data are you fitting? In your code, I see some values loaded from a file, and some that are set manually (hard coded) in the program. What is being loaded from the file? What are you storing in user_info?

@caowencai
Author

caowencai commented Dec 4, 2017

@superchromix
The loaded data serve as the x-coordinates, which are stored in user_info; the data parameter in the code, which is set manually, serves as the y-coordinates. Both are sampled data: each sample has 8 points, and 4 parameters are fitted for the curve y = ae^(bx)+ce^(dx), where a, b, c and d are the parameters to fit.
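
To make the model concrete, here is a small host-side sketch that evaluates y = ae^(bx)+ce^(dx) and its partial derivatives with respect to a, b, c and d for a single x value. The type and function names are hypothetical, and this is plain C++ rather than the CUDA model function from the pull request.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct ModelEvaluation
{
    float value;                      // y at the given x
    std::array<float, 4> derivatives; // dy/da, dy/db, dy/dc, dy/dd
};

// p = {a, b, c, d}
ModelEvaluation evaluate_double_exponential(std::array<float, 4> const & p, float x)
{
    float const e1 = std::exp(p[1] * x); // e^(bx)
    float const e2 = std::exp(p[3] * x); // e^(dx)

    ModelEvaluation result;
    result.value = p[0] * e1 + p[2] * e2;
    result.derivatives[0] = e1;            // dy/da
    result.derivatives[1] = p[0] * x * e1; // dy/db
    result.derivatives[2] = e2;            // dy/dc
    result.derivatives[3] = p[2] * x * e2; // dy/dd
    return result;
}
```

At x = 0 the value reduces to a + c, and the derivatives with respect to b and d vanish, which is a quick sanity check for an implementation of this model.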

For my issue, I changed the original setting
available_gpu_memory_ = std::size_t(double(free_bytes) * 0.1) in info.cu, line 14,
from 0.1 to 0.7, which solved my problem.
Otherwise, all output_parameters are zero.
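
The all-zero output is consistent with a fit call that fails before it writes anything into a zero-initialized buffer (new float[n]() value-initializes every element to 0). A minimal sketch of that situation, using a hypothetical stand-in routine rather than the real Gpufit call:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a fit routine; this is NOT the Gpufit API.
void run_fit_or_throw(bool enough_memory, std::vector<float> & output_parameters)
{
    if (!enough_memory)
        throw std::runtime_error("maximum user info size exceeded");
    for (float & p : output_parameters)
        p = 1.0f; // pretend fit results
}

struct FitStatus
{
    bool ok;
    std::string error; // empty on success
};

// C-style wrapper: converts the exception into a status the caller has to check.
FitStatus fit(bool enough_memory, std::vector<float> & output_parameters)
{
    try
    {
        run_fit_or_throw(enough_memory, output_parameters);
        return { true, "" };
    }
    catch (std::exception const & e)
    {
        return { false, e.what() };
    }
}
```

If the status is ignored, a failed call is indistinguishable from a run that produced zeros. With Gpufit's C interface the analogous step, as far as I know, is to compare the return value of gpufit() against ReturnState::OK and print gpufit_get_last_error() on failure.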

@superchromix
Collaborator

We have updated the GPU memory management in the latest versions of Gpufit to allow for larger user_info sizes. This should address the issue.
