Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

If a node marked as lightweight throws an exception, deadlock can occur. This should be documented or corrected #420

Closed
aggieNick02 opened this issue May 27, 2021 · 7 comments
Labels

Comments

@aggieNick02
Copy link

aggieNick02 commented May 27, 2021

I had a very simple multifunction node that just translated incoming messages to outgoing messages that I had marked as lightweight. If the incoming message was non-sensical, it would throw an exception. When this happened, the graph would sometimes deadlock with the previous node trying to queue an incoming message to the lightweight node.

In hindsight, I can understand why an exception from a lightweight node might be problematic, but if it isn't something oneTBB wants to fix, it should be documented with warnings similar to the caution boxes at

https://spec.oneapi.com/versions/0.5.0/oneTBB/flow_graph/lightweight_policy.html

@aleksei-fedotov
Copy link
Contributor

I have a suspicion what might go wrong. Can you tell please what value of concurrency do you specify for such a multifunction node?

@aleksei-fedotov
Copy link
Contributor

Please note that when exception is thrown the graph would not process any new message until its wait_for_all() method is called. Is this the behavior you observe?

@aggieNick02
Copy link
Author

So the multifunction node was serial. The behavior observed was that the wait_for_all method never returned or threw an exception (it should throw an exception). Simply removing the lightweight mark made the behavior correct. I'll post a short reproduction shortly.

@aggieNick02
Copy link
Author

Here's the reproduction. To make things simple, I just have two identical input nodes that both continually output 1 to an indexer node, which passes its output to a serial multifunction node marked lightweight. The multifunction node sleeps for 100ms just to ensure both input nodes have passed output to the indexer node, and then throws an exception. The flow graph hangs with the indexer node trying to pass output to the multifunction node forever, and wait_for_all never returns or throws.

Removing the lightweight designation causes behavior to become correct. wait_for_all throws as it should.

#include <iostream>
#include "tbb/flow_graph.h"
#include <mutex>
#include <tuple>

class InputNodeBody
{
public:
    int operator() (tbb::flow_control& fc)
    {
        return 1;
    }
};

using IndexerNodeType = tbb::flow::indexer_node<int,int>;
using MultifunctionNodeType = tbb::flow::multifunction_node<IndexerNodeType::output_type, std::tuple<int>, tbb::flow::lightweight>;

class MultifunctionNodeBody
{
public:
    void operator()(MultifunctionNodeType::input_type in, MultifunctionNodeType::output_ports_type p)
    {
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
        throw std::exception();
    }
};

int main()
{
    tbb::flow::graph g;
    tbb::flow::input_node input1(g, InputNodeBody());
    tbb::flow::input_node input2(g, InputNodeBody());
    IndexerNodeType indexer(g);
    tbb::flow::make_edge(input1, indexer);
    tbb::flow::make_edge(input2, tbb::flow::input_port<1>(indexer));
    MultifunctionNodeType multi(g, tbb::flow::serial, MultifunctionNodeBody());
    tbb::flow::make_edge(indexer, multi);
    input1.activate();
    input2.activate();
    try
    {
        g.wait_for_all();
    }
    catch (const std::exception&)
    {
        std::cout << "Exception from wait_for_all";
    }
}

@aleksei-fedotov
Copy link
Contributor

Thank you for the code, we were able to reproduce the described behavior on our side! Indeed, the implementation of the indexer_node is incorrect in this regard. We will work on the issue resolution.

@aggieNick02
Copy link
Author

Great to hear, thanks! Just let me know if you need anything else.

@isaevil
Copy link
Contributor

isaevil commented Sep 30, 2022

Looks like the issue was fixed by #623 but issue wasn't closed. Closing it.

@isaevil isaevil closed this as completed Sep 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants