
Some more work which might be relevant #2

Open
EndingCredits opened this issue Jan 26, 2018 · 0 comments
EndingCredits commented Jan 26, 2018

Hi,

In your paper, one of the important points you make is that your method is O(n) whereas Relation nets are O(n^2).

In fact, if we write both architectures in a (very) simplified way (ignoring certain complications such as the softmax), we can see that FMNs are really a slight variation on relation nets, one that avoids this problem:

Relation net:
r(X) = sum_i sum_j f_1(f_2(x_i), f_3(x_j))
for some f_1, f_2, f_3.
(In the paper's case, f_2 and f_3 are the identity, and f_1 is an MLP on the concatenation of its inputs, additionally parameterized by p.)
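
To make the cost concrete, here's a minimal numpy sketch of that pairwise form. The concrete f_1, f_2, f_3 here are just toy placeholders (tanh, identity), not the actual networks from either paper:

```python
import numpy as np

# Toy stand-ins: in the relation net paper f_2 and f_3 are the identity
# and f_1 is an MLP over the concatenated pair; tanh is a placeholder.
f2 = lambda x: x
f3 = lambda x: x
f1 = lambda a, b: np.tanh(a + b)

def relation_net(X):
    # f_1 is evaluated once per ordered pair (i, j): O(n^2) in set size.
    return sum(f1(f2(xi), f3(xj)) for xi in X for xj in X)

X = np.random.randn(5, 8)   # a set of n = 5 elements, dim 8
r = relation_net(X)         # a single vector representing the set
```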

FMN (2-stage):
r(X) = sum_i g_1(g_2(x_i), sum_j g_3(x_j))
for some g_1, g_2, g_3; i.e. we move the sum inside the brackets so it can be computed once and re-used for all i.
(In this case g_3 is an attention mechanism with parameter p, g_1 is another attention mechanism, and g_2 is the identity.)
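
The same toy setup makes the saving visible: the inner sum is computed once and shared across all i (again with placeholder g's, not the actual attention mechanisms):

```python
import numpy as np

# Placeholders again; in the FMN case g_3 and g_1 are attention
# mechanisms and g_2 is the identity.
g2 = lambda x: x
g3 = lambda x: np.tanh(x)
g1 = lambda a, pooled: np.tanh(a + pooled)

def fmn_2stage(X):
    pooled = sum(g3(xj) for xj in X)             # inner sum, computed once
    return sum(g1(g2(xi), pooled) for xi in X)   # re-used for every i: O(n)

X = np.random.randn(5, 8)
r = fmn_2stage(X)
```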

Note: in both cases we can turn r(X) from a representation of the whole set into a transformation of the individual elements, simply by not taking the outer sum. This allows us to stack multiple layers on top of one another, as you do with your n-stage FMN.
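
In code, dropping the outer sum just means returning one vector per element, after which the layer composes with itself (continuing the previous snippet, same placeholder g's):

```python
def fmn_layer(X):
    # No outer sum: one transformed vector per element, so the output
    # is again a set of n elements and layers can be stacked.
    pooled = sum(g3(xj) for xj in X)
    return np.stack([g1(g2(xi), pooled) for xi in X])

H = fmn_layer(fmn_layer(X))   # a 2-stage stack, still O(n) per layer
```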

While this similarity might seem superficial, since very different functions are used for f_1, f_2, f_3 and g_1, g_2, g_3, it turns out that many other architectures fall into this same pattern. In particular, the self-attention used in https://arxiv.org/abs/1706.03762 is a better comparison to your architecture, as it also uses attention mechanisms.
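
For reference, here's a bare-bones single-head numpy version of that scaled dot-product self-attention (the weight matrices are just random stand-ins for learned parameters). Note that the softmax normalisation over j is exactly the kind of complication ignored in the simplified forms above, and it's also what makes the full thing O(n^2) to evaluate:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # Single-head scaled dot-product attention (arxiv.org/abs/1706.03762).
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Softmax over j, with the usual max-subtraction for stability.
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V   # one output vector per element, like the stacked form

d = 8
X = np.random.randn(5, d)
Wq, Wk, Wv = (np.random.randn(d, d) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
```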

There have also been a few pieces of work that fall into this second category, in particular https://arxiv.org/abs/1703.06114 . I've also written a bit about these variants (as well as relation nets) here. It's possible some of the notes in these might be useful, and it may be interesting to compare your architecture to theirs on their problems (or try theirs on yours), since ultimately they solve the same problem with quite different architectures.
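
That paper's sum-decomposition reduces to the same template with the pairwise interaction removed entirely; a minimal sketch, again with toy placeholders for the two learned networks:

```python
import numpy as np

# Toy phi and rho; in the paper both are learned networks.
phi = lambda x: np.tanh(x)
rho = lambda s: np.tanh(s)

def sum_decomposition(X):
    # Permutation-invariant form rho(sum_i phi(x_i)): O(n), with no
    # per-pair (or even per-element) term outside the pooled sum.
    return rho(sum(phi(xi) for xi in X))

r = sum_decomposition(np.random.randn(5, 8))
```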

I hope that all made sense. I have some block diagrams of the various architectures that might make things clearer, although they don't show much at the conceptual level.

Best wishes,

Will
