
Loss Functions #9

Open
daquilnp opened this issue Dec 10, 2019 · 25 comments

@daquilnp

Hey again,
I had a few questions about the loss functions you used for the Localization net during training.

  • In the out-of-image loss calculation you add/subtract 1.5 to the bbox coordinates instead of 1 (as in your paper). Why do you do this?

  • Also, why do you use corner coordinates for the loss calculations?

  • Was the DirectionLoss used in your paper?

@Bartzi
Owner

Bartzi commented Dec 11, 2019

Good questions 😉

  • You are right; we set this value to 1.5 during some of our experiments to allow the network to predict values that lie a little outside of the image. This did not change much, though, so using 1 or 1.5 does not really matter (see the sketch after this list).

  • Using the corner coordinates saves us some computation time but gives us the same result.

  • Yes, I think we used DirectionLoss. It is not necessary to achieve the results, but it keeps the network from maneuvering into a bad state where it predicts regions of interest that show a mirrored character.
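
For illustration, a minimal sketch of what such an out-of-image penalty can look like. This is not the repository's exact implementation; it assumes corner coordinates normalized to [-1, 1] as in a spatial transformer grid, with `bound=1.5` being the relaxed value discussed above:

```python
import numpy as np

# Illustrative out-of-image penalty: corners may drift a little
# outside the image, and only the overshoot beyond the (relaxed)
# bound is penalized.
def out_of_image_loss(corners, bound=1.5):
    # corners: array of shape (N, 2) with normalized x/y positions
    overshoot = np.maximum(np.abs(corners) - bound, 0.0)
    return overshoot.sum()
```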

Does that answer your questions?

@daquilnp
Author

Yes, that answers everything, thank you! :) I assumed using corner coordinates was to save computation time, but I wanted to make sure. Also, what accuracy did you get on the SynthText validation set?

@Bartzi
Owner

Bartzi commented Dec 11, 2019

Happy I could answer your questions!
We got about 91% validation accuracy on the SynthText validation set.

@daquilnp
Author

Awesome. Thank you again :) I'll try to aim for a similar accuracy, although I also cannot get the SynthAdd dataset (the authors of the dataset have not been monitoring their issues :S)

@daquilnp daquilnp reopened this Dec 11, 2019
@daquilnp
Author

daquilnp commented Dec 11, 2019

Follow-up question: when you say 91%, do you mean the percentage of correct characters or the percentage of correct words? And is it case sensitive?

@Bartzi
Owner

Bartzi commented Dec 12, 2019

91% is the case-insensitive word accuracy; I should have said that right away 😅
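
In case it helps: word accuracy here means the fraction of images whose entire predicted word matches the ground truth. A minimal sketch of the case-insensitive variant:

```python
# Minimal sketch: case-insensitive word accuracy, i.e. the fraction
# of samples whose whole predicted word matches its label when case
# is ignored.
def word_accuracy(predictions, labels):
    correct = sum(p.lower() == l.lower() for p, l in zip(predictions, labels))
    return correct / len(labels)
```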

@daquilnp
Author

daquilnp commented Dec 19, 2019

Hello @Bartzi,
I'm currently looking at the output of the Chainer localization net with the pretrained model.

  • I've noticed that the bounding boxes find characters in images from right to left. Is that what is supposed to happen?

  • I've also noticed there's a lot of overlap between the characters. Do you remove the duplicates in some way?

@Bartzi
Owner

Bartzi commented Dec 20, 2019

The prediction of the characters from right to left is one of the interesting things the model does on its own. It learns by itself which reading direction to use, so right to left is perfectly acceptable.
I also think that this is a better choice for the network, since it essentially is a sequence-to-sequence model and operates like a stack.

Yes, there is a lot of overlap, and this is also intended. There is no need to remove the duplicates; this is what the transformer is for. The encoder takes all features from the ROIs and hands them to the decoder, which then predicts the characters without overlap.

@daquilnp
Author

OK, that makes sense. I just wanted to make sure I was running it correctly.

As for the overlapping, I am aware that the transformer's decoder is meant to remove the duplicates. However, I was testing the pretrained recognition model on this image from the SynthText validation dataset:

[image: a crop showing the word "Xref:"]

And the result from the decoder was: :::::::fffeeeerrrXXXXXX

@Bartzi
Owner

Bartzi commented Dec 20, 2019

Interesting... do you have some code that I could have a look at?

@daquilnp
Author

daquilnp commented Dec 20, 2019

OK, very strange. I cleaned up my code to send it to you, and when I ran it, I got the correct result. I might have introduced an error in my original implementation and fixed it during the clean-up. It looks like everything works as expected. I am getting the result Xref: :)

@Bartzi
Owner

Bartzi commented Dec 20, 2019

ah, good 😉

@daquilnp
Author

For future reference: the issue arises if you mix up num_chars and num_words. Intuitively, num_chars should be 23 and num_words should be 1, but for some reason they were reversed in my npz.
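
A quick way to check which way around the values are stored (a sketch; it assumes the metadata sits under the `num_chars` and `num_words` keys of the ground-truth `.npz`, as described above):

```python
import numpy as np

# Inspect the metadata of a ground-truth .npz file; in my case the
# two values were the reverse of what I expected.
data = np.load("my_dataset.npz", allow_pickle=True)  # path to your own file
print("num_chars:", data["num_chars"])
print("num_words:", data["num_words"])
```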

@Bartzi
Owner

Bartzi commented Dec 23, 2019

Yeah, that's right! It is interesting, though, that the model still provides a good prediction if you set those two numbers wrongly.

@borisgribkov

@Bartzi First of all, thanks for your code! Regarding num_chars and num_words in the *.npz files: I checked synthadd.npz and mjsynth.npz, and in both cases num_chars = 1 and num_words = 23. Intuitively, it should be swapped; is this correct? I tried swapping them but got an error in a Reshape layer. Thank you!

@Bartzi
Owner

Bartzi commented May 28, 2021

Yes, this is actually intended 😅
Our original work came from the idea that we want to extract one box per word, with multiple characters per box. However, we thought: what if we only have a single word but want to localize individual characters?
The simplest solution is to redefine the way you look at it. Now we want to find a maximum of 23 "words" (each character is defined to be a single word) with one character each.

This is the way you have to think about it.
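
A toy illustration of this reinterpretation (hypothetical code, assuming labels are padded to the 23-character maximum):

```python
import numpy as np

# One 23-character "word" becomes 23 one-character "words".
label = list("Xref:".ljust(23))   # pad the label to the 23-char maximum
num_words, num_chars = 23, 1
relabeled = np.array(label).reshape(num_words, num_chars)
print(relabeled.shape)            # (23, 1): 23 "words" of 1 character each
```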

@borisgribkov

I see, it's clear now! Thank you!

@borisgribkov

borisgribkov commented Jun 2, 2021

Dear @Bartzi, sorry to disturb you, another question. According to your paper, the localization network tries to find and "crop" individual characters, for example the FOOTBALL word in Fig. 1. In my case I see different behavior: the localization network seems to crop regions containing sets of characters, and moreover these regions overlap significantly. Please see the example below. As far as I understand there is no restriction against that and the whole system can work like this, but I'm a bit confused by the different behavior. Thank you!

[image: predicted regions covering groups of characters with heavy overlap]

PS: training converged with 96% accuracy, so my model works fine!

@Bartzi
Owner

Bartzi commented Jun 2, 2021

Hmm, it seems to me that the localization network never felt the need to converge to localizing individual characters, because the task for the recognition network was too simple.
You could try a very simple trick: start a new training run, but instead of randomly initializing all parameters, load the pre-trained weights of the localizer. This way the localizer is encouraged to improve again, because the freshly initialized recognition network behaves badly.

We did this in previous work and it worked very well in such cases.
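
A minimal sketch of this warm start in Chainer; the `Localizer` class below is just a stand-in for the repository's actual localization network, the point is the serializer call:

```python
import chainer
import chainer.links as L

class Localizer(chainer.Chain):  # stand-in for the real localization net
    def __init__(self):
        super().__init__()
        with self.init_scope():
            self.fc = L.Linear(None, 6)  # e.g. predicts one affine matrix

localizer = Localizer()
# load the weights saved from the earlier (converged) training run;
# the recognition network keeps its random initialization
chainer.serializers.load_npz("localizer_snapshot.npz", localizer)
```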

@Bartzi
Owner

Bartzi commented Jun 2, 2021

You could also try to lower the learning rate of the recognition network, which encourages the localization network to try harder and make it easier for the recognition network.
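
One way to express this, assuming the two sub-networks are separate chains with their own optimizers (the `Linear` links below are hypothetical stand-ins):

```python
import chainer
import chainer.links as L

localizer = L.Linear(None, 6)    # stand-in for the localization net
recognizer = L.Linear(None, 52)  # stand-in for the recognition net

# The recognizer gets a smaller learning rate than the localizer,
# pushing the localizer to do more of the work.
loc_optimizer = chainer.optimizers.Adam(alpha=1e-4)
loc_optimizer.setup(localizer)

rec_optimizer = chainer.optimizers.Adam(alpha=1e-5)  # 10x smaller
rec_optimizer.setup(recognizer)
```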

@borisgribkov

Thank you! Using the pre-trained weights looks very promising, I will try it! Also, I was thinking about the image above too; you are right, the recognition task is very simple. It is a license plate recognition sample, so there is no curved or otherwise complicated text at all. Basically there is no need to apply an array of affine matrices, one for the whole image is enough; maybe this is the reason.

@Bartzi
Owner

Bartzi commented Jun 2, 2021

Yes, it might not be necessary to use the affine matrices. You could also just train the recognition network on patches extracted with a regular sliding window. That is basically our model without the localization network, where you provide the input to the recognition network yourself using a simple, regular sliding window approach (see the sketch below).
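
A sketch of that sliding-window alternative (illustrative only; the patch width and stride are made-up values):

```python
import numpy as np

def sliding_window_patches(image, patch_width, stride):
    # cut fixed-size, overlapping crops that can be fed to the
    # recognition network directly, bypassing the localization net
    height, width = image.shape[:2]
    for x in range(0, width - patch_width + 1, stride):
        yield image[:, x:x + patch_width]

image = np.zeros((64, 256, 3), dtype=np.uint8)  # dummy input image
patches = list(sliding_window_patches(image, patch_width=64, stride=32))
```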

@borisgribkov

Thank you!

@borisgribkov

Hi @Bartzi, thank you for the good advice, using the pre-trained localizer weights helps a lot!

[image: predicted regions now covering individual characters]

And the final accuracy is about 2% better.

@Bartzi
Owner

Bartzi commented Jun 4, 2021

Nice, that's good to hear. And the image looks the way it is supposed to 👍
