Output_parameters turn out to be all-zero inexplicably when n_fits is set up to large enough #31

caowencai · 2017-11-25T13:00:19Z

To avoid a stack overflow, all my parameter arrays for Gpufit are allocated on the heap and zero-initialized, like this:

float *initial_parameters = new float [n_fits * n_model_parameters] ();

I then found that n_fits is limited: when n_fits is large enough (10,000,000 in my case), the output_parameters returned by Gpufit are all zero, whereas everything works as expected with n_fits set to 1,000,000. It is hard to figure out why. By the way, the PC's memory is fine.
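As an aside on the allocation above, the same heap-allocated, zero-initialized buffer can be obtained without a raw `new[]`. This is only a minimal sketch: the helper name `make_parameter_buffer` is hypothetical and not part of Gpufit, and it assumes the buffer is later handed over as a plain `float *` via `data()`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Gpufit): returns a heap-allocated buffer
// of n_fits * n_model_parameters floats, all value-initialized to 0.0f.
std::vector<float> make_parameter_buffer(std::size_t n_fits,
                                         std::size_t n_model_parameters)
{
    return std::vector<float>(n_fits * n_model_parameters);
}
```

Calling `make_parameter_buffer(n_fits, n_model_parameters).data()` on a named vector yields the `float *` that a C-style interface expects, and the memory is released automatically when the vector goes out of scope.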

superchromix · 2017-11-26T14:15:51Z

Please post a sample program which reproduces the error.

caowencai · 2017-11-27T11:19:28Z

Finally figured it out.
Reason:
The available GPU memory is set too low, so it is easy to hit the error "maximum user info size exceeded". That error should have been reported explicitly by throw std::runtime_error("maximum user info size exceeded").
Solution:
Increase the limit to available_gpu_memory_ = std::size_t(double(free_bytes) * 0.5).
The original setting, in void Info::get_gpu_properties() at info.cu, line 14, is available_gpu_memory_ = std::size_t(double(free_bytes) * 0.1);

jkfindeisen · 2017-11-27T11:27:55Z

@adrianjp88 Should this happen? I thought it would then just use smaller chunks of fits, so as long as the data of at least one fit fits into the available GPU memory it should run fine, shouldn't it?

superchromix · 2017-11-27T11:59:15Z

@gittry We cannot understand your issue without a complete example code which reproduces the problem.

Discussion at issue gpufit#31 (Output_parameters turn out to be all-zero inexplicably when n_fits is set up to large enough). In this example, the problem is tied to available_gpu_memory_ = std::size_t(double(free_bytes) * 0.1) at line 14 of info.cu.

caowencai · 2017-11-28T07:46:36Z

The example has been submitted as pull request #33.
The user_info data is at https://drive.google.com/open?id=1M4TnXf3TQex3LFeEkXArlTB5GYvDzIKY. It is 488 MB, which is too large to include in the pull request.

In line 14, info.cu, available_gpu_memory_ =std::size_t(double(free_bytes) * 0.1)
Changing 0.1 to 0.7 solved my problem.
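To make the effect of that factor concrete, here is a rough host-side sketch. The first function only repeats the expression quoted from info.cu with the factor made a parameter. The second function is an assumption: the thread gives the error message but not the condition that triggers it, so the comparison below is a guess. Both function names are hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>

// Same arithmetic as the line quoted from info.cu, with the factor
// (0.1 originally, 0.5 or 0.7 in the workaround) passed in.
std::size_t available_gpu_memory(std::size_t free_bytes, double fraction)
{
    return std::size_t(double(free_bytes) * fraction);
}

// Assumed check, not copied from Gpufit: reject a user_info block that is
// larger than the memory budget computed above.
void check_user_info_size(std::size_t user_info_size, std::size_t available)
{
    if (user_info_size > available)
        throw std::runtime_error("maximum user info size exceeded");
}
```

Under these assumptions, and taking a hypothetical card with 4 GiB free, a factor of 0.1 leaves a budget of about 429 MB, which is less than the 488 MB of user_info, while 0.7 leaves about 3 GB. The free-memory figure is made up for illustration; the actual value on the reporter's GPU is not given in the thread.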

mscipio · 2017-11-28T08:35:34Z

Maybe it's not relevant, but am I wrong when I say that you are using user_info to pass the kernel your data, and data to pass the time vector (the same for all your measurements)?
Because then inside the kernel you keep using user_info as independent variable (in the if-else block at the beginning ) and data as what you want to fit ...

mscipio · 2017-11-28T11:26:48Z

I have a question that is somehow related to this issue ... let me know if you prefer me to open a new issue.

I succeeded in implementing my compartmental model as per issues #27 and #30, and I am now experimenting with an increasing number of parallel fits. What I discovered is that if I use an n_fits greater than my max_chunk_size, I get the following error when the library tries to allocate GPU memory for the second chunk:

terminate called after throwing an instance of 'std::runtime_error'
  what():  invalid argument
Aborted

That is thrown by void GPUData::init at:

write(
        parameters_,
        &initial_parameters[chunk_index_*info_.max_chunk_size_*info_.n_parameters_],
        chunk_size_ * info_.n_parameters_);

Quite surprisingly, the exact same command run a couple of lines above to write "data_" to the GPU memory completes just fine.

I checked and this error happens with both my new model and with original models you implemented.
Can it be an issue with my GPU? I would find it hard to believe but I spent the entire morning trying to find a solution without success, so far ...
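For reference, the indexing in the write call quoted above can be reproduced on the host to see which slice of initial_parameters each chunk should read. This is a sketch under assumptions, not Gpufit's implementation: the struct `Chunk` and the function `parameter_chunks` are hypothetical, and the last chunk is assumed to be simply the remainder of the fits.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical description of one chunk's slice of initial_parameters.
struct Chunk
{
    std::size_t first_element; // chunk_index * max_chunk_size * n_parameters
    std::size_t n_elements;    // chunk_size * n_parameters
};

// Splits n_fits into chunks of at most max_chunk_size fits and returns the
// element offset and element count for each chunk.
std::vector<Chunk> parameter_chunks(std::size_t n_fits,
                                    std::size_t max_chunk_size,
                                    std::size_t n_parameters)
{
    std::vector<Chunk> chunks;
    if (max_chunk_size == 0)
        return chunks; // nothing can be processed with an empty chunk size

    for (std::size_t first_fit = 0; first_fit < n_fits; first_fit += max_chunk_size)
    {
        // The last chunk may hold fewer fits than max_chunk_size.
        std::size_t const chunk_size = std::min(max_chunk_size, n_fits - first_fit);
        chunks.push_back({first_fit * n_parameters, chunk_size * n_parameters});
    }
    return chunks;
}
```

Printing these offsets next to the size of the host array is a quick way to check that the second chunk's source pointer plus its element count stays inside the buffer that was actually allocated.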

superchromix · 2017-11-28T11:44:23Z

@mscipio
Please post an example code which reproduces the error.

mscipio · 2017-11-28T11:59:55Z

That's not going to be easy because I am working on Linux, so I did some changes to the code to make it compile and it's no longer compatible with the version in this repo.

If you say that in your version you don't have an issue of this kind, I will try (T.T) to trace back all the differences hoping to find MY mistake along the way.

You don't need example code from me to test it with your code: just pick one of the examples (like Linear_Regression_Example.cpp) and increase n_fits A LOT, so that the problem doesn't fit on your GPU in one chunk.

EDIT:
I just saw that you merged your version with @jkfindeisen's, so maybe I really should revert to a compatible version to be in line with you. Does the current version compile under Linux?

superchromix · 2017-11-28T12:37:32Z

@mscipio It is not clear to me what error you are reporting. In the manuscript we have tested Gpufit with up to 10^8 fits per function call. This is significantly larger than the maximum number of fits that can be processed simultaneously on the GPU.

You are making modifications to the core of the Gpufit code, and introducing changes there could easily lead to bugs. Why do you need to make changes to GPUData::init?

mscipio · 2017-11-28T12:43:17Z

@superchromix
Yes, you are right obviously, I am sorry for bothering.

The changes I made (I wasn't trying to modify GPUData::init, anyway) were meant to debug my new kernel (a C++ implementation alone was not enough). I have now cloned your current version of the library again and will continue working on that. I just checked and I don't have that issue with Linear_Regression_Example.cpp, so I guess the error was caused by something I did.

I will check my new model in this current branch asap and eventually open a pull request if you are interested in having it.
Thanks.

superchromix · 2017-11-28T13:02:54Z

@gittry
It looks like you are using the user_info incorrectly. Your experimental data should not be passed to Gpufit through the user_info parameter. The user_info should be used to store independent variables.

caowencai · 2017-11-29T00:54:28Z

@superchromix
I followed the documentation, which says:

custom x positions for the data points of every fit, stored in user_info

The data I load is just the unique X coordinate values for each fit, stored as floats.

Then how do you explain that I get correct results when n_fits or available_gpu_memory_ is set within a suitable range, but all-zero output otherwise?

superchromix · 2017-12-01T01:51:01Z

@gittry
What data are you fitting? In your code, I see some values loaded from a file, and some that are set manually (hard coded) in the program. What is being loaded from the file? What are you storing in user_info?

caowencai · 2017-12-04T07:48:32Z

@superchromix
The loaded data serves as the X coordinates, which are stored in user_info; the data parameter in the code, which is set manually, serves as the Y coordinates. Both are sampled data: each sample has 8 points, and 4 parameters are fitted to the curve y = ae^(bx)+ce^(dx), where a, b, c and d are the parameters to fit.

In my case, the original setting is
available_gpu_memory_ =std::size_t(double(free_bytes) * 0.1) in line 14, info.cu
Changing 0.1 to 0.7 solved my problem.
Otherwise, all output_parameters come out as zero.
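For readers following along, the model y = ae^(bx)+ce^(dx) and the partial derivatives a least-squares fit needs can be written as a small host-side function. This is a plain C++ sketch for one data point, not the CUDA model function from the pull request; the names `ModelPoint` and `double_exponential` are hypothetical, and x stands for one of the per-fit X coordinates that the thread stores in user_info.

```cpp
#include <array>
#include <cmath>

// Hypothetical result type: model value plus dy/da, dy/db, dy/dc, dy/dd.
struct ModelPoint
{
    float value;
    std::array<float, 4> derivatives;
};

// Evaluates y = a*exp(b*x) + c*exp(d*x) and its partial derivatives
// with respect to a, b, c and d at a single x position.
ModelPoint double_exponential(float x, float a, float b, float c, float d)
{
    float const eb = std::exp(b * x);
    float const ed = std::exp(d * x);
    return {a * eb + c * ed, {eb, a * x * eb, ed, c * x * ed}};
}
```

With 8 points per fit, as described above, the function would be called once per point with that point's x value; comparing its output with the GPU result for a single fit is one way to check whether all-zero parameters come from the model or from the data transfer.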

superchromix · 2018-03-29T11:31:51Z

We have updated the GPU memory management in the latest versions of Gpufit to allow for larger user_info sizes. This should address this issue.

jkfindeisen added the bug label Nov 27, 2017

caowencai changed the title ~~Output_parameters turn out to be all-zero inexplicably when n_fits is set up to large enough~~ Output_parameters turn out to be all-zero inexplicably when n_fits is set up to large enough[solved] Nov 27, 2017

caowencai changed the title ~~Output_parameters turn out to be all-zero inexplicably when n_fits is set up to large enough[solved]~~ Output_parameters turn out to be all-zero inexplicably when n_fits is set up to large enough Nov 27, 2017

superchromix removed the bug label Nov 27, 2017

caowencai mentioned this issue Nov 28, 2017

Example for issue #31 #33

Closed

superchromix closed this as completed Mar 29, 2018
