Hello, thanks for publishing your fantastic work! I was wondering: can I use your encoding as the positional encoding step in a Vision Transformer? The task is segmentation of 3D medical images. Or is the algorithm designed specifically for image reconstruction?
Thanks for your interest. It is a general positional encoding method. It is the most "ideal" one, so the computation is vast. In this paper we use the Kronecker property to speed it up; however, this only works when the encoding is followed by a linear layer, which holds in a transformer if you do some math. I think the problem is the output dimension: here, for image reconstruction, it is only 3, but for a transformer it is much larger, so it may be slow or run out of memory (it also depends on the image size). Moreover, from my understanding, positional encoding is not that important in vision transformers (I guess?). But if your method needs very good positional information, our method would be a good choice to try.
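For readers wondering why the speedup "only works when followed by a linear layer": a linear map applied to a Kronecker-factored encoding can be contracted axis by axis, so the full per-position product never has to be materialized. Below is a minimal two-axis sketch of that identity (function name, shapes, and numpy usage are illustrative assumptions, not the paper's actual code):

```python
import numpy as np

def linear_of_kron(W, a, b):
    """Compute W @ np.kron(a, b) without materializing the full
    Kronecker product. Since kron(a, b)[i*m + j] = a[i] * b[j],
    reshaping W lets us contract each axis factor separately.
    Illustrative sketch only, not the paper's implementation."""
    d = W.shape[0]
    n, m = a.shape[0], b.shape[0]
    Wr = W.reshape(d, n, m)
    # sum_{i,j} W[d, i*m + j] * a[i] * b[j]
    return np.einsum('dnm,n,m->d', Wr, a, b)

rng = np.random.default_rng(0)
a = rng.standard_normal(8)   # per-axis encoding, axis 1
b = rng.standard_normal(6)   # per-axis encoding, axis 2
W = rng.standard_normal((4, 8 * 6))  # the linear layer's weights

fast = linear_of_kron(W, a, b)
naive = W @ np.kron(a, b)    # reference: materialize the full product
assert np.allclose(fast, naive)
```

For a 3D volume the same trick extends to three factors, which is where the memory saving matters: the factored contraction never forms the length-`n*m*k` encoding vector per position.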
Thanks! Yes, it is not critical for the Vision Transformer; only some improvements are observed in general. However, I am working on 3D medical images, where in principle relative location is critical. OK, thanks, I will experiment with it!