OCR from Recent Files view takes forever #235

Open
aslmx opened this issue Nov 4, 2019 · 16 comments
aslmx commented Nov 4, 2019

Version: 4.4.16

Bug report

Expected Behavior

OCR conversion takes the same time and works the same, no matter from where it is used

Current Behavior

When using OCR Tool from the "..." Menu on files in the Recent File view, it takes forever (>3min; i doubt it would ever finish tbh) to complete.

When using the "..." menu on a file and clicking "View in folder", the folder with the respective file opens.

When then using the OCR tool from the "..." Menu on the same file it completes within ~20 seconds.

Your Environment

  • OCR version used: 4.4.16
  • Browser Name and version: Firefox 70
  • Operating System and version (desktop or mobile): Linux Mint 19.3 64Bit
  • nextcloud version: (see admin page or version.php) 15.0.11
@janis91 janis91 added the bug label Jan 20, 2020
@janis91 janis91 self-assigned this Jan 20, 2020
Owner

janis91 commented Jan 21, 2020

I cannot reproduce this with the newest version (4.6.0), which comes with the integration of #244.
If your issue still exists with that version, please consider adding screenshots so that the behavior is easier to reproduce.

@janis91 janis91 closed this as completed Jan 21, 2020
Author

aslmx commented Jan 22, 2020

So...

i had upgraded to recent NextCloud yesterday already (17.something) and i think i also upgraded to OCR 4.4.something in the process.
With this setup i had tried this morning with a new test file. One page, PNG from OpenNoteScanner App.

From Recent view: gave up after 2 minutes
From file folder view: conversion took ~40 seconds

Then i upgraded to most recent OCR App version, 4.6.1.
This time I started the stopwatch ...

From recent file view: gave up after 3 minutes
From folder view: still running.. 5 minutes now.

So for me it seems broken completely now :-|

I'll probably have some time at the weekend to give this some further testing if it helps you.
Tbh I have created a Thunar-Custom-Action based workflow with local installation of tesseract and the NC app was barely used in the last weeks. But still I'd be happy if this is fixed :)

While I wrote this, the stopwatch passed the 7 minute mark... still running oO...

Owner

janis91 commented Jan 22, 2020

Ok. Wow. For me (local development and test setup AND production setup on server) it works just perfectly. I timed it on my production setup (also Firefox 70): it took 13 sec for a PNG and 28 sec for a single-page PDF.
So at the moment I cannot reproduce it.

@janis91 janis91 reopened this Jan 22, 2020
Owner

janis91 commented Feb 8, 2020

Can you test it with the newest version and give feedback? For me it works.

@janis91 janis91 closed this as completed Feb 8, 2020
Author

aslmx commented Feb 13, 2020

so, updated to NC 17.0.3 today and in the process also made the upgrade to OCR 6.0.3.

I tried to repro the issue from the "Recent files" view and it was directly reproducible - so i thought.

After a few seconds an error popup/toast was shown that there was some tesseract issue. I was not fast enough to copy the error message apparently.
Then i used the "..." menu to go to the files folder.
It feels that from the folder view, the OCR Processing is now much faster. It took less than a minute.
So that was successful.

Then I tried again to reproduce from the recent files menu, to get the exact error message for this issue.

However, then I got an error message that the target file existed. Of course, I had forgotten to delete the file I created. I then renamed the existing file and retried...

It worked now...

I'll keep an eye on this. I consider it working now. If i see something weird which is reproducible I will either log a new issue or add info to this one if it is okay...

Author

aslmx commented Feb 13, 2020

So...

I came home today to find a bill in the snail mail. So i used my photo-scan-app to scan it and NextCloud for Android App is set up to automatically sync this document to NC.
It was 2 pages!

Tried my luck:
Opened recent file view and tried to OCR the first page -> No luck
Error Message:

OCR: OCR processing failed: An unexpected error occured during Tesseract processing

So i used the "..." menu to go to the file in the folder and used "..." menu to OCR the file from there.

Worked great again...

Then I thought "well, maybe it is always the first time it fails".

And this seems to be key to the problem here.

Went back to the "recent file" view, used the "..." Menu to OCR the second image.

Worked like a charm...

So now i guess the defect here is

OCR from recent file view fails when it is the first time OCR is used in a NC WebUI Session

@janis91

  1. do you know if there are logs that would help further debug this?
  2. do you prefer to reopen this issue, or shall i open a new one?

@janis91 janis91 reopened this Feb 14, 2020
Owner

janis91 commented Feb 14, 2020

The logs are in the browser console. It would help to have them here for this problem.

Author

aslmx commented Feb 15, 2020

Hi, just tried again.

This is from my console, hope it helps find the problem:

12:14:36.105 Content Security Policy: Directive ‘child-src’ has been deprecated. Please use directive ‘worker-src’ to control workers, or directive ‘frame-src’ to control frames respectively. 2
12:15:08.248 Error in pixReadMem: Unknown format: no pix returned tesseract-core.wasm.js:8:2976896
12:15:08.249 Error in pixGetSpp: pix not defined tesseract-core.wasm.js:8:2976896
12:15:08.249 Error in pixGetDimensions: pix not defined tesseract-core.wasm.js:8:2976896
12:15:08.249 Error in pixGetColormap: pix not defined tesseract-core.wasm.js:8:2976896
12:15:08.250 Error in pixCopy: pixs not defined tesseract-core.wasm.js:8:2976896
12:15:08.250 Error in pixGetDepth: pix not defined tesseract-core.wasm.js:8:2976896
12:15:08.250 Error in pixGetWpl: pix not defined tesseract-core.wasm.js:8:2976896
12:15:08.250 Error in pixGetYRes: pix not defined tesseract-core.wasm.js:8:2976896
12:15:08.292 An error occured in OCR. Error: "An unexpected error occured during Tesseract processing."
    t OcrError.ts:8
    t app.js:1
    process TesseractService.ts:39
    c tslib.es6.js:99
    s tslib.es6.js:80
    a tslib.es6.js:70
 abort(21). Build with -s ASSERTIONS=1 for more info. ModalContent.vue:76:16
12:15:08.250 Error in pixClone: pixs not defined tesseract-core.wasm.js:8:2976896
12:15:08.252 Error: abort(21). Build with -s ASSERTIONS=1 for more info. createWorker.js:141:14
12:15:08.250 Please call SetImage before attempting recognition. tesseract-core.wasm.js:8:2976896
12:15:08.251 21 tesseract-core.wasm.js:8:3121823
12:15:08.251 21 tesseract-core.wasm.js:8:3121833

Owner

janis91 commented Feb 18, 2020

Just so that I get it right: you upload an image and go right away to the "recent files" view, then you try to process the file and it logs that error? I ask because it seems that Tesseract does not find any image in the input.
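
The `pixReadMem: Unknown format` line in the log above suggests that the bytes handed to Tesseract were not a decodable image, for example the body of an error page from a failed download. A minimal sketch of a sanity check that could be run on the downloaded bytes, assuming they are available as a `Uint8Array`. The names `startsWith` and `looksLikeImage` are hypothetical helpers for illustration and are not part of the OCR app or tesseract.js:

```javascript
// Hypothetical helpers (not part of the OCR app or tesseract.js).
// Magic bytes that every PNG and JPEG file starts with.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
const JPEG_SIGNATURE = [0xff, 0xd8, 0xff];

// True if `bytes` begins with every value of `signature`, in order.
function startsWith(bytes, signature) {
  if (bytes.length < signature.length) return false;
  return signature.every((value, index) => bytes[index] === value);
}

// True if the buffer starts like a PNG or JPEG file.
function looksLikeImage(bytes) {
  return startsWith(bytes, PNG_SIGNATURE) || startsWith(bytes, JPEG_SIGNATURE);
}
```

If such a check returned `false` for the content fetched in the recent files view, that would point at the download step rather than at Tesseract itself.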

Author

aslmx commented Feb 21, 2020

Hi @janis91 yes, that is exactly the way to reproduce it.

  1. Upload file
  2. Go to recent files view
  3. OCR from "..." menu

Owner

janis91 commented Feb 23, 2020

Well, I really cannot reproduce it. It looks as if the actual file is not handed to Tesseract for the OCR job, but for me everything works well on the newest version. I'm not sure whether this could be a timing issue. The other option is the following:

In your file browser, upload the document that fails into a new, clean folder.
Open your browser console (F12) and paste the following snippet into it; it registers another file action for the file context menu. Then go to your recent files view and start the "Test" action from the "..." menu of the file. Go to your console once again and copy the log output to this issue, so that I can investigate further :-)

// Registers an extra "Test" entry in the "..." menu of image files.
OCA.Files.fileActions.registerAction({
  // Log the attributes of the file the action was triggered on.
  actionHandler: (_something, context) => {
    console.log(context.fileInfoModel.attributes);
  },
  altText: 'Test',
  displayName: 'Test',
  iconClass: 'icon-help',
  mime: 'image',
  name: 'Test',
  order: 100,
  permissions: OC.PERMISSION_UPDATE
});

Author

aslmx commented Feb 24, 2020

Thx for your time investigating this.

So I tried to Repro this.

First I created a new folder via the WebUi. Then I pasted your snippet into console and had to allow the paste (never did this, interesting concept). After that i uploaded a file via drag'n'drop to that folder in the browser
Then changed to recent files view and tried the "Test" Menu from there. Output below. However, then I tried to OCR the file and it worked right away.
This reminded me, that OCR seems to be fine, when I open the actual folder once in the WebUi.


10:03:01.400
{…}
  etag: "453811b7a5c3130a8f3d7eacecc1cba9"
  hasPreview: true
  id: 146008
  isEncrypted: false
  mimetype: "image/png"
  mtime: 1582534964000
  name: "AAAAA_DOC-20200213-185936.png"
  path: "/NewCleanFolder"
  permissions: 27
  shareOwner: undefined
  shareOwnerId: undefined
  sharePermissions: undefined
  size: 164624
  type: "file"
  <prototype>: Object { … }
debugger eval code:3:21

**Edit: **
I just saw there seems to be way more info in the Object{....}. Apparently there is no way to copy everything without having to expand all the tree items manually, is there?

/edit
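
On the question in the edit above: one common way to get the whole object as text is to serialize it instead of logging the live object. A minimal sketch, assuming the attributes are plain JSON-compatible values; `dumpAttributes` is a hypothetical helper name. `JSON.stringify` normally drops keys whose value is `undefined`, so the replacer below turns them into the string "undefined" to keep them visible:

```javascript
// Hypothetical helper: serialize an object to indented JSON text,
// keeping keys whose value is undefined (JSON.stringify would omit them).
function dumpAttributes(attributes) {
  return JSON.stringify(
    attributes,
    (_key, value) => (value === undefined ? "undefined" : value),
    2
  );
}
```

Inside the "Test" action this could be used as `console.log(dumpAttributes(context.fileInfoModel.attributes))`, which prints a string that can be copied in one go.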

Then I thought, let's do a more real-life scenario. When it fails for me, I usually had uploaded the file via the Android client (instant upload).

So i deleted the new folder. Created a new folder on my laptop, had it be synced up. Refreshed Nextcloud, went to recent files view. Pasted your snippet in the console, and ran the "test" Function on the file from recent file view. Output below. Then i tried to OCR that file and it failed.


10:05:00.202
{…}
  etag: "46e800f43bb463d3caedf7924038a429"
  hasPreview: true
  id: 146014
  isEncrypted: false
  mimetype: "image/png"
  mtime: 1582534964000
  name: "AAAAA_DOC-20200213-185936.png"
  path: "/NewCleanFolder_FromDesktop"
  permissions: 27
  shareOwner: undefined
  shareOwnerId: undefined
  sharePermissions: undefined
  size: 164624
  type: "file"
  <prototype>: Object { … }
debugger eval code:3:21

Apparently i do not see any difference, do you?

By the way, i think this was the console output when it failed:


10:05:32.372 An error occured in OCR. Error: "An unexpected error occured during Tesseract processing."
    t OcrError.ts:8
    t app.js:1
    process TesseractService.ts:39
    c tslib.es6.js:99
    s tslib.es6.js:80
    a tslib.es6.js:70
 abort(21). Build with -s ASSERTIONS=1 for more info. ModalContent.vue:76:16

I'll have another run, if you need to add further code to your snippet that reveals more details.
My guess is, it has something to do with the file being once shown in its actual folder -> then it works.

Owner

janis91 commented Feb 29, 2020

> My guess is, it has something to do with the file being once shown in its actual folder -> then it works.

I think so, too.

Could you maybe try this:

// Registers a "Test" action that logs the file attributes, the download URL
// generated for the file, and the response of fetching that URL.
OCA.Files.fileActions.registerAction({
  actionHandler: (_something, context) => {
    const file = context.fileInfoModel.attributes;
    console.log(file);
    // Only the file name is passed here, no directory.
    const downloadUrl = OCA.Files.App.fileList.getDownloadUrl(file.name);
    console.log(downloadUrl);
    fetch(downloadUrl).then((resp) => {
      console.log(resp);
    }).catch(console.log);
  },
  altText: 'Test',
  displayName: 'Test',
  iconClass: 'icon-help',
  mime: 'image',
  name: 'Test',
  order: 100,
  permissions: OC.PERMISSION_UPDATE
});

And maybe you could also try to copy the downloadUrl that will be logged and try to download the file yourself, by pasting it into a new tab's url / search bar and hitting enter. Is the file downloaded? Because I think there might be a problem with the download url being generated.
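
To make a wrong download URL obvious at a glance, the logged response could be reduced to a one-line summary. This is only a sketch under assumptions: `describeDownload` is a hypothetical helper, and the expectation that a successful download reports an `image/...` content type is an assumption based on the action being registered for `mime: 'image'`. It relies only on the standard `ok`, `status` and `headers.get()` members of a fetch `Response`:

```javascript
// Hypothetical helper: summarise a fetch() Response in one line.
// Assumption: a correct download of an image reports an image/* content type.
function describeDownload(response) {
  const contentType = response.headers.get("content-type") || "unknown";
  if (!response.ok) {
    return "download failed: HTTP " + response.status + " (" + contentType + ")";
  }
  if (!contentType.startsWith("image/")) {
    return "unexpected content type: " + contentType;
  }
  return "ok: " + contentType;
}
```

In the snippet above, `console.log(resp)` could then be accompanied by `console.log(describeDownload(resp))`.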

Author

aslmx commented Mar 2, 2020

> And maybe you could also try to copy the downloadUrl that will be logged and try to download the file yourself, by pasting it into a new tab's url / search bar and hitting enter. Is the file downloaded? Because I think there might be a problem with the download url being generated.

So

  1. i pasted your snippet (while on NC web Ui front page) into console
  2. Uploaded a scan
  3. opened "recent files" view
  4. Clicked "..." -> Test on the newest scan/file

It says

08:32:39.480 /remote.php/webdav/DOC-20200302-083210.png

When i navigate to

https://my-nextcloud-domain.tld/remote.php/webdav/DOC-20200302-083210.png

I get a 404.

Then I went to "..." -> view in folder, used the "test" context menu and got this link in the console

remote.php/webdav/SofortUpload/OpenNoteScanner/2020/03/DOC-20200302-083210.png

When i paste this relative path behind my FQDN, I am offered to save the file.

So yes, I would guess that the path generated is not correct, hence OCR can't find the file and will not work.

Still funny... because when I then go back to recent files view, the correct path is printed into the console. So this is in line with my observation before.

Owner

janis91 commented Mar 5, 2020

> So yes, I would guess that the path generated is not correct, hence OCR can't find the file and will not work.

I think so, too. I will try that in a more advanced scenario on my testing machine and hope that I can reproduce it myself. If so, that might be a bug in the OCA.Files.App.fileList.getDownloadUrl function, because I just use what's given by the Nextcloud API in the browser. I will let you know when I have reproduced it.
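
For comparison, the two URLs reported above differ only in the folder part, and the logged file attributes do carry the folder in `path`. A minimal sketch of building the WebDAV path from both attributes, independent of the view the action is triggered from. `buildWebdavPath` is a hypothetical helper for illustration, not a Nextcloud API, and the `/remote.php/webdav/` prefix is taken from the URLs logged in this thread:

```javascript
// Hypothetical helper (not a Nextcloud API): build the WebDAV path of a file
// from its folder (`attributes.path`) and its name (`attributes.name`).
function buildWebdavPath(dir, name) {
  const segments = (dir + "/" + name)
    .split("/")
    .filter((segment) => segment.length > 0) // drop empty parts from "//"
    .map(encodeURIComponent); // escape spaces and special characters
  return "/remote.php/webdav/" + segments.join("/");
}
```

With the attributes from the "Test" action this would be called as `buildWebdavPath(file.path, file.name)`. Whether the OCR app should build the URL this way or pass the folder to the existing Nextcloud function is left open here.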

Author

aslmx commented Mar 5, 2020

great! Thanks! Let me know when you need anything from me, i'm happy to help to resolve this!
