Hello, thanks for publishing your fantastic work! I was wondering: can I use your encoding as the positional encoding step in a Vision Transformer? The task is segmentation of 3D medical images. Or is the algorithm designed specifically for image reconstruction?
Thanks for your interest. It is a general positional encoding method. It is the most "ideal" one, so the computation is vast. In this paper we use the Kronecker property to speed it up; however, this only works when the encoding is followed by a linear layer, which holds in a transformer if you do some math. I think the problem is the output dimension: here, for image reconstruction, it is only 3, but for a transformer it is much larger, so it may be slow or run out of memory (it also depends on the image size). Moreover, from my understanding, positional encoding is not that important in vision transformers (I guess?). But if your method needs very good positional information, our method would be a good choice to try.
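For readers wondering why the speedup "only works when followed by a linear layer": a linear map applied to a Kronecker-factored encoding can be contracted axis by axis, so the full per-position product never has to be materialized. Below is a minimal two-axis sketch of that identity (function name, shapes, and numpy usage are illustrative assumptions, not the paper's actual code):

```python
import numpy as np

def linear_of_kron(W, a, b):
    """Compute W @ np.kron(a, b) without materializing the full
    Kronecker product. Since kron(a, b)[i*m + j] = a[i] * b[j],
    reshaping W lets us contract each axis factor separately.
    Illustrative sketch only, not the paper's implementation."""
    d = W.shape[0]
    n, m = a.shape[0], b.shape[0]
    Wr = W.reshape(d, n, m)
    # sum_{i,j} W[d, i*m + j] * a[i] * b[j]
    return np.einsum('dnm,n,m->d', Wr, a, b)

rng = np.random.default_rng(0)
a = rng.standard_normal(8)   # per-axis encoding, axis 1
b = rng.standard_normal(6)   # per-axis encoding, axis 2
W = rng.standard_normal((4, 8 * 6))  # the linear layer's weights

fast = linear_of_kron(W, a, b)
naive = W @ np.kron(a, b)    # reference: materialize the full product
assert np.allclose(fast, naive)
```

For a 3D volume the same trick extends to three factors, which is where the memory saving matters: the factored contraction never forms the length-`n*m*k` encoding vector per position.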
Thanks! Yes, it is not critical for the Vision Transformer; only some improvements are observed in general. However, I am working on 3D medical images, where in principle relative location is critical. OK, thanks, I will experiment with it!