OCR from Recent Files view takes forever #235

aslmx · 2019-11-04T17:56:17Z

Version: 4.4.16

Bug report

Expected Behavior

OCR conversion takes the same time and works the same way, no matter where it is started from.

Current Behavior

When using the OCR tool from the "..." menu on files in the Recent Files view, it takes forever to complete (>3 min; I doubt it would ever finish, tbh).

When using the "..." menu on a file and clicking "View in folder", the folder with the respective file opens.

When then using the OCR tool from the "..." Menu on the same file it completes within ~20 seconds.

Your Environment

OCR version used: 4.4.16
Browser Name and version: Firefox 70
Operating System and version (desktop or mobile): Linux Mint 19.3 64Bit
nextcloud version: (see admin page or version.php) 15.0.11

janis91 · 2020-01-21T13:17:24Z

Cannot reproduce with the newest version (4.6.0), which comes with the integration of #244.
If your issue still exists with that version, please consider adding screenshots so that the behavior is easier to reproduce.

aslmx · 2020-01-22T06:58:25Z

So...

I upgraded to a recent Nextcloud yesterday already (17.something), and I think I also upgraded to OCR 4.4.something in the process.
With this setup I tried again this morning with a new test file: one page, a PNG from the OpenNoteScanner app.

From Recent view: gave up after 2 minutes
From file folder view: conversion took ~40 seconds

Then i upgraded to most recent OCR App version, 4.6.1.
This time I started the stopwatch ...

From recent file view: gave up after 3 minutes
From folder view: still running.. 5 minutes now.

So for me it seems broken completely now :-|

I'll probably have some time at the weekend to give this some further testing if it helps you.
Tbh I have created a Thunar-Custom-Action based workflow with local installation of tesseract and the NC app was barely used in the last weeks. But still I'd be happy if this is fixed :)

While I wrote this, the stopwatch passed the 7-minute mark... still running oO...

janis91 · 2020-01-22T07:26:18Z

Ok. Wow. For me (local development and test setup AND production setup on the server) it works just perfectly. I timed it, and on my production setup (also Firefox 70) it took 13 sec for a PNG and 28 sec for a single-page PDF.
So at the moment I cannot reproduce it.

janis91 · 2020-02-08T23:09:42Z

Can you test it with the newest version and give feedback? For me it works.

aslmx · 2020-02-13T13:49:37Z

So, I updated to NC 17.0.3 today and in the process also upgraded to OCR 6.0.3.

I tried to repro the issue from the "Recent files" view and it was directly reproducible, or so I thought.

After a few seconds an error popup/toast was shown saying that there was some Tesseract issue. Apparently I was not fast enough to copy the error message.
Then I used the "..." menu to go to the file's folder.
From the folder view, OCR processing now feels much faster. It took less than a minute.
So that was successful.

Then I tried again to reproduce from the recent files menu, to get the exact error message for this issue.

However, I then got an error message that the target file already existed. Of course, I had forgotten to delete the file I had created. I then renamed the existing file and retried...

It worked now...

I'll keep an eye on this. I consider it working now. If i see something weird which is reproducible I will either log a new issue or add info to this one if it is okay...

aslmx · 2020-02-13T18:19:49Z

So...

I came home today to find a bill in the snail mail. So I used my photo scan app to scan it; the Nextcloud Android app is set up to automatically sync such documents to NC.
It was 2 pages!

Tried my luck:
Opened recent file view and tried to OCR the first page -> No luck
Error Message:

OCR: OCR processing failed: An unexpected error occured during Tesseract processing

So i used the "..." menu to go to the file in the folder and used "..." menu to OCR the file from there.

Worked great again...

Then I thought "well, maybe it is always the first time that it fails".

And this seems to be key to the problem here.

Went back to the "recent file" view, used the "..." Menu to OCR the second image.

Worked like a charm...

So now i guess the defect here is

OCR from recent file view fails when it is the first time OCR is used in a NC WebUI Session

@janis91

do you know if there are logs that would help further debug this?
do you prefer to reopen this issue, or shall i open a new one?

janis91 · 2020-02-14T08:05:30Z

The logs are in the browser console. It would help to have them here for that problem.

aslmx · 2020-02-15T11:19:12Z

Hi, just tried again.

This is from my console, hope it helps find the problem:

12:14:36.105 Content Security Policy: Directive ‘child-src’ has been deprecated. Please use directive ‘worker-src’ to control workers, or directive ‘frame-src’ to control frames respectively. 2
12:15:08.248 Error in pixReadMem: Unknown format: no pix returned tesseract-core.wasm.js:8:2976896
12:15:08.249 Error in pixGetSpp: pix not defined tesseract-core.wasm.js:8:2976896
12:15:08.249 Error in pixGetDimensions: pix not defined tesseract-core.wasm.js:8:2976896
12:15:08.249 Error in pixGetColormap: pix not defined tesseract-core.wasm.js:8:2976896
12:15:08.250 Error in pixCopy: pixs not defined tesseract-core.wasm.js:8:2976896
12:15:08.250 Error in pixGetDepth: pix not defined tesseract-core.wasm.js:8:2976896
12:15:08.250 Error in pixGetWpl: pix not defined tesseract-core.wasm.js:8:2976896
12:15:08.250 Error in pixGetYRes: pix not defined tesseract-core.wasm.js:8:2976896
12:15:08.292 An error occured in OCR. Error: "An unexpected error occured during Tesseract processing."
    t OcrError.ts:8
    t app.js:1
    process TesseractService.ts:39
    c tslib.es6.js:99
    s tslib.es6.js:80
    a tslib.es6.js:70
 abort(21). Build with -s ASSERTIONS=1 for more info. ModalContent.vue:76:16
12:15:08.250 Error in pixClone: pixs not defined tesseract-core.wasm.js:8:2976896
12:15:08.252 Error: abort(21). Build with -s ASSERTIONS=1 for more info. createWorker.js:141:14
12:15:08.250 Please call SetImage before attempting recognition. tesseract-core.wasm.js:8:2976896
12:15:08.251 21 tesseract-core.wasm.js:8:3121823
12:15:08.251 21 tesseract-core.wasm.js:8:3121833

janis91 · 2020-02-18T19:52:47Z

Just so I get it right: you upload an image, go straight to the "Recent files" view, then try to process the file, and it logs that error? Because it seems that Tesseract does not find any image in the input.
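
The `pixReadMem: Unknown format` line in the log above fits that reading: the bytes handed to Tesseract were apparently not a recognizable image. As a minimal sketch (not the OCR app's actual code), a check like the following could tell a real PNG from, say, the body of an error page before OCR starts. The name `looksLikePng` is hypothetical; the eight bytes are the standard PNG file signature.

```javascript
// Standard PNG file signature: 0x89 'P' 'N' 'G' CR LF 0x1A LF.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Hypothetical helper: true if the byte array starts with the PNG signature.
function looksLikePng(bytes) {
  if (bytes.length < PNG_SIGNATURE.length) {
    return false;
  }
  return PNG_SIGNATURE.every((value, index) => bytes[index] === value);
}

// A PNG header followed by arbitrary data passes the check.
const pngBytes = new Uint8Array([...PNG_SIGNATURE, 0x00, 0x00, 0x00, 0x0d]);
// The body of an HTML error page (for example a 404 response) does not.
const htmlBytes = new TextEncoder().encode("<!DOCTYPE html><html>Not Found</html>");

console.log(looksLikePng(pngBytes));  // true
console.log(looksLikePng(htmlBytes)); // false
```

A check of this kind would only turn the Tesseract abort into a clearer error message; it would not explain why the wrong bytes arrive in the first place.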

aslmx · 2020-02-21T07:35:56Z

Hi @janis91 yes, that is exactly the way to reproduce it.

Upload file
Go to recent files view
OCR from "..." menu

janis91 · 2020-02-23T12:22:35Z

Well, actually, I really cannot reproduce it. To me it looks like the actual file is not handed to Tesseract for the OCR job, but for me everything works well on the newest version. I'm not sure whether this could be a timing issue. The other option is the following:

In your file browser, upload the document that fails into a new, clean folder.
Open your browser console (F12) and paste the following into it. It registers another file action for the file context menu. Then go to your Recent files view and start the "Test" action from the "..." menu of the file. Go back to your console and copy the log output into this issue, so that I can investigate further :-)

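// Registers a temporary "Test" entry in the "..." menu of image files.
OCA.Files.fileActions.registerAction({
    // Log the attributes the Files app holds for the clicked file.
    actionHandler: (_something, context) => {
        console.log(context.fileInfoModel.attributes);
    },
    altText: 'Test',
    displayName: 'Test',
    iconClass: 'icon-help',
    mime: 'image',
    name: 'Test',
    order: 100,
    permissions: OC.PERMISSION_UPDATE
});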

aslmx · 2020-02-24T09:11:14Z

Thx for taking the time to investigate this.

So I tried to repro this.

First I created a new folder via the web UI. Then I pasted your snippet into the console and had to allow the paste (never did this before, interesting concept). After that I uploaded a file via drag'n'drop to that folder in the browser.
Then I changed to the Recent files view and tried the "Test" menu entry from there. Output below. However, when I then tried to OCR the file, it worked right away.
This reminded me that OCR seems to be fine once I have opened the actual folder in the web UI.


10:03:01.400 {…}
    etag: "453811b7a5c3130a8f3d7eacecc1cba9"
    hasPreview: true
    id: 146008
    isEncrypted: false
    mimetype: "image/png"
    mtime: 1582534964000
    name: "AAAAA_DOC-20200213-185936.png"
    path: "/NewCleanFolder"
    permissions: 27
    shareOwner: undefined
    shareOwnerId: undefined
    sharePermissions: undefined
    size: 164624
    type: "file"
    <prototype>: Object { … }
debugger eval code:3:21

Edit:
I just saw that there seems to be way more info in the Object { … }. Apparently there is no way to copy everything without having to expand all the tree items manually, is there?

/edit
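
One way around expanding the tree by hand, as a sketch: serialize the object to text first, then log or copy that string. `JSON.stringify` drops properties whose value is `undefined` (such as `shareOwner` above), so the hypothetical helper `dumpAttributes` below maps them to the string "undefined" to keep them visible. The sample object only repeats a few values from the output above.

```javascript
// Hypothetical helper: pretty-print an object as JSON text, keeping
// properties whose value is undefined visible in the output.
function dumpAttributes(attributes) {
  return JSON.stringify(
    attributes,
    (_key, value) => (value === undefined ? "undefined" : value),
    2
  );
}

// A few of the attributes logged above, for illustration only.
const sample = {
  id: 146008,
  mimetype: "image/png",
  path: "/NewCleanFolder",
  shareOwner: undefined,
};

console.log(dumpAttributes(sample));
```

Inside the "Test" action this would be `console.log(dumpAttributes(context.fileInfoModel.attributes))`. Note that `JSON.stringify` throws on circular references, so this only works for plain data such as the attributes shown here.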

Then I thought, let's do a more real-life scenario. When it fails for me, I have usually uploaded the file via the Android client (instant upload).

So I deleted the new folder, created a new folder on my laptop, and had it synced up. I refreshed Nextcloud and went to the Recent files view. I pasted your snippet into the console and ran the "Test" action on the file from the Recent files view. Output below. Then I tried to OCR that file and it failed.


10:05:00.202 {…}
    etag: "46e800f43bb463d3caedf7924038a429"
    hasPreview: true
    id: 146014
    isEncrypted: false
    mimetype: "image/png"
    mtime: 1582534964000
    name: "AAAAA_DOC-20200213-185936.png"
    path: "/NewCleanFolder_FromDesktop"
    permissions: 27
    shareOwner: undefined
    shareOwnerId: undefined
    sharePermissions: undefined
    size: 164624
    type: "file"
    <prototype>: Object { … }
debugger eval code:3:21

I do not really see any difference, do you?
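
Comparing the two dumps mechanically, as a small sketch: only `etag`, `id`, and `path` differ, which is what one would expect for two separate uploads and does not by itself explain the failure. `differingKeys` is a hypothetical helper, not part of Nextcloud; the values are copied from the two outputs above, with the `undefined` share fields left out.

```javascript
// Attributes from the first console output (upload via the web UI).
const uploadedViaWebUi = {
  etag: "453811b7a5c3130a8f3d7eacecc1cba9",
  hasPreview: true,
  id: 146008,
  isEncrypted: false,
  mimetype: "image/png",
  mtime: 1582534964000,
  name: "AAAAA_DOC-20200213-185936.png",
  path: "/NewCleanFolder",
  permissions: 27,
  size: 164624,
  type: "file",
};

// Attributes from the second console output (folder synced from the laptop).
const syncedFromDesktop = {
  ...uploadedViaWebUi,
  etag: "46e800f43bb463d3caedf7924038a429",
  id: 146014,
  path: "/NewCleanFolder_FromDesktop",
};

// Hypothetical helper: list the keys whose values differ between two flat objects.
function differingKeys(a, b) {
  const keys = new Set([...Object.keys(a), ...Object.keys(b)]);
  return [...keys].filter((key) => a[key] !== b[key]).sort();
}

console.log(differingKeys(uploadedViaWebUi, syncedFromDesktop));
// [ 'etag', 'id', 'path' ]
```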

By the way, i think this was the console output when it failed:


10:05:32.372 An error occured in OCR. Error: "An unexpected error occured during Tesseract processing."
    t OcrError.ts:8
    t app.js:1
    process TesseractService.ts:39
    c tslib.es6.js:99
    s tslib.es6.js:80
    a tslib.es6.js:70
 abort(21). Build with -s ASSERTIONS=1 for more info. ModalContent.vue:76:16

I'll do another run if you want to add further code to your snippet that reveals more details.
My guess is that it has something to do with the file having been shown once in its actual folder -> then it works.

janis91 · 2020-02-29T10:50:41Z

My guess is that it has something to do with the file having been shown once in its actual folder -> then it works.

I think so, too.

Could you maybe try this:

// Same temporary "Test" action, now also logging and fetching the download URL.
OCA.Files.fileActions.registerAction({
    actionHandler: (_something, context) => {
        const file = context.fileInfoModel.attributes;
        console.log(file);
        // URL that the Files app generates for this file name.
        const downloadUrl = OCA.Files.App.fileList.getDownloadUrl(file.name);
        console.log(downloadUrl);
        // Log the HTTP response (or the network error) for that URL.
        fetch(downloadUrl).then((resp) => {
            console.log(resp);
        }).catch(console.log);
    },
    altText: 'Test',
    displayName: 'Test',
    iconClass: 'icon-help',
    mime: 'image',
    name: 'Test',
    order: 100,
    permissions: OC.PERMISSION_UPDATE
});

And maybe you could also try to copy the downloadUrl that will be logged and try to download the file yourself, by pasting it into a new tab's url / search bar and hitting enter. Is the file downloaded? Because I think there might be a problem with the download url being generated.

aslmx · 2020-03-02T07:40:05Z

And maybe you could also try to copy the downloadUrl that will be logged and try to download the file yourself, by pasting it into a new tab's url / search bar and hitting enter. Is the file downloaded? Because I think there might be a problem with the download url being generated.

So

I pasted your snippet into the console (while on the NC web UI front page)
Uploaded a scan
Opened the "Recent files" view
Clicked "..." -> Test on the newest scan/file

It says

08:32:39.480 /remote.php/webdav/DOC-20200302-083210.png

When i navigate to

https://my-nextcloud-domain.tld/remote.php/webdav/DOC-20200302-083210.png

I get a 404.

Then I went to "..." -> view in folder, used the "test" context menu and got this link in the console

remote.php/webdav/SofortUpload/OpenNoteScanner/2020/03/DOC-20200302-083210.png

When I paste this relative path after my FQDN, I am offered to save the file.

So yes, I would guess that the generated path is not correct, hence OCR can't find the file and will not work.

Still funny... because when I then go back to the Recent files view, the correct path is printed to the console. So this is in line with my earlier observation.

janis91 · 2020-03-05T07:16:12Z

So yes, I would guess that the generated path is not correct, hence OCR can't find the file and will not work.

I think so, too. I will try that in a more advanced scenario on my testing machine and hope that I can reproduce it myself. If so, that might be a bug in the OCA.Files.App.fileList.getDownloadUrl function, because I just use what's given by the Nextcloud API in the browser. I will let you know when I have reproduced it.
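
To illustrate the suspected failure mode, here is a minimal model, not Nextcloud's implementation. It assumes that the download URL is built from a directory plus the file name, and that the directory falls back to the file list's current directory when none is passed; that fallback is an assumption which this thread does not confirm. `buildDownloadUrl` and its parameters are hypothetical names.

```javascript
// Hypothetical model of building a WebDAV download URL from a directory and a file name.
// Assumption: if no directory is given, the file list's current directory is used.
function buildDownloadUrl(fileName, dir, currentDir) {
  const base = "/remote.php/webdav";
  const folder = (dir || currentDir).replace(/\/+$/, ""); // drop trailing slashes
  return `${base}${folder}/${encodeURIComponent(fileName)}`;
}

const fileName = "DOC-20200302-083210.png";
const actualFolder = "/SofortUpload/OpenNoteScanner/2020/03";

// Assumed: in a virtual view such as "Recent", the current directory is "/".
console.log(buildDownloadUrl(fileName, undefined, "/"));
// /remote.php/webdav/DOC-20200302-083210.png

// Passing the file's own folder explicitly yields the path that downloaded fine.
console.log(buildDownloadUrl(fileName, actualFolder, "/"));
// /remote.php/webdav/SofortUpload/OpenNoteScanner/2020/03/DOC-20200302-083210.png
```

If this model holds, passing the file's `path` attribute (as logged by the "Test" action) as the directory would remove the dependency on whichever folder was opened last. It would also fit the observation that the Recent files view produces the correct URL once the file's folder has been opened.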

aslmx · 2020-03-05T09:48:13Z

Great! Thanks! Let me know if you need anything from me; I'm happy to help resolve this!

janis91 added the bug label Jan 20, 2020

janis91 self-assigned this Jan 20, 2020

janis91 closed this as completed Jan 21, 2020

janis91 reopened this Jan 22, 2020

janis91 closed this as completed Feb 8, 2020

janis91 reopened this Feb 14, 2020
