Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(fonts): make fonts async #7

Merged
merged 3 commits into from
Apr 30, 2018
Merged

feat(fonts): make fonts async #7

merged 3 commits into from
Apr 30, 2018

Conversation

itsachen
Copy link
Contributor

@itsachen itsachen commented Apr 25, 2018

What

  • Converts fonts method to perform work asynchronously, away from the main Node event loop.
    • fonts now takes a callback function as a second argument, that it will call with error, fonts on completion
  • Adds stricter validation on inputs

Why

We want to be use this package to log PDF fonts on API requests. Previously fonts was performing the work in the main Node event loop, thereby blocking other execution from happening on the loop. This PR performs the majority of the work in a separate worker loop, with the goal of improving performance so that it can be used in API without negative side effects.

Performance

I ran some tests, the results can be viewed here. A quick summary of the results:

  • The async version of the function finishes 5 times faster than the sync version when run against 500 pre-downloaded PDFs. This is the case with varying levels of concurrency, ranging from 5 to infinity. This may be the case because the async version performs the work in a separate thread so the workload is parallelized whereas the sync version performs the work serially in the main event loop.
  • When evaluated against 20,000 non-HTML letter PDFs from March 2018 the time to evaluate one PDF are as follow:
    • avg: 10.9 ms
    • median: 9.1 ms
    • st dev: 67.8 ms
    • min: 0.2 ms
    • max: 7670 ms*
  • When evaluated against 20,000 non-HTML postcards PDFs from the same time period:
    • avg: 3.5 ms
    • median: 3.2 ms
    • st dev: 2.2 ms
    • min: 0.3 ms
    • max: 95 ms

Notes

  • * If you see the distribution of letter processing times you'll see the majority of times are between 0-40ms, but there are some resources that took over 1s. When those resources are isolated and fonts is re-run on them, they do not take the same amount of time. My theory is that these times might have to do with the concurrency at which I'm running the tests on my laptop, as well as the resource limitations of my machine, and isn't representative of how it would perform in API.

Resources

Copy link
Contributor

@robinjoseph08 robinjoseph08 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mostly just questions! we can talk about them irl in our meeting though

lib/pdffonts.js Outdated
module.exports = require('bindings')('pdffonts');
const Bluebird = require('bluebird');

const pdffonts = require('bindings')('pdffonts');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: thoughts on capitalizing? PDFFonts? PdfFonts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PDFFonts works for me

lib/pdffonts.js Outdated
const pdffonts = require('bindings')('pdffonts');

module.exports = {
fonts: Bluebird.promisify(pdffonts.fonts)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason not to use Bluebird.promisifyAll?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason promisifyAll didn't work...it throws Error: expected 2 arguments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh. w e i r d.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out we can use promisifyAll but I was just forgetting the prefix. Will investigate further.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After further investigation it seems like the prefix is required. I'd prefer that the method name is fonts, so I'll leave as is.

});
});

it('handles Type 3 fonts', () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious what a type 3 font is and why we're explicitly testing for it

Copy link
Contributor Author

@itsachen itsachen Apr 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't fully understand but its basically another type of font, characterized by the fact that "glyphs [are] defined using the full PostScript language" (from wiki). This enables them to do fancy stuff like shading, color or patterns in fonts. It does not support hinting which is a feature of certain font types that enables a font to adjust how it is rendered at small scales for readability.

For what it means to us - Type 3 fonts aren't that common in the wild. Enfocus says they're "primarily decorative ornaments fonts with a lot of tonal vari­ations". Type 3 fonts can not have a name, they can just be a collection of user-defined vector graphics bundled into a "font". I've seen this happen in PDFs generated from latex.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotcha. is there a reason we specifically have this test? was it initially not being included and this is to prevent regression?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry, I didn't mention why. Type 3 fonts can have a null name, so I needed to add this check to prevent a null pointer error. This test covers that code path for coverage.

}

NAN_MODULE_INIT(Init) {
Nan::Set(target, Nan::New("fonts").ToLocalChecked(), Nan::GetFunction(Nan::New<FunctionTemplate>(Fonts)).ToLocalChecked());
// Set globalParams once. globalParams is a global variable and is referenced
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we just setting this once instead of during the function call? is this more efficient?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

globalParams has already been declared once globally, so it only needs to be set once in order for it to be used. If we set it in the function call multiple invocations of the method could potentially re-set the value. I initially had it set and destroyed in the function call but I ran into intermittent segfaults - I believe it is because multiple threads were running and setting/destroying the same shared memory. I explain more IRL, @dckrooney helped me debug this.

src/pdffonts.cc Outdated
return;
// Validate that arguments are correct
if (!info[0]->IsString()) {
return Nan::ThrowError(Nan::New("expected arg 0: string filename").ToLocalChecked());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optional: usually its a slightly worse ux when an async function can possibly throw as well. cause then, depending on how you use it, youd have to wrap it in a try/catch to handle a possible error and also handle the possible rejection. since this is intending on being private, i dont think its a huge deal, but just thought id mention it!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talked IRL - since the sync errors are handled by Bluebird the public facing functionality is that errors are thrown asynchronously. I'm considering just leaving that for now.

src/pdffonts.cc Outdated
void HandleErrorCallback() {
Nan::HandleScope scope;
v8::Local<v8::Value> argv[] = {
Nan::New(this->ErrorMessage()).ToLocalChecked(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what part of this makes it an Error object?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! I don't think it is returned as an Error object, it is a string. I would prefer an Error - I'll investigate how to do that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait...checking again I think it is returned in JS as an Error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the default behavior is pretty much what I've written, so I'll take this out:
https://github.com/nodejs/nan/blob/77d0046d4a8378a0cd31d8110130471b44c99d8d/nan.h#L1869-L1876

fontArray
};

callback->Call(2, argv);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the 2 arg here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first argument is argc (I'm assuming the argument count) and the second is argv, the argument vector: https://github.com/nodejs/nan/blob/master/doc/callback.md

src/pdffonts.cc Outdated
callback->Call(2, argv);
}

void HandleErrorCallback() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will this automatically get called over HandleOKCallback if this->ErrorMessage() is set?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talked IRL - I believe this is the case: https://github.com/nodejs/nan/blob/master/nan.h#L1804-L1813

src/pdffonts.cc Outdated

const char *fontName;
if (font->getName() == NULL) {
fontName = "null";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we want the string "null" over a null object?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah good point - theres probably a more appropriate value. Maybe a null object or an empty string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to make it the empty string

src/pdffonts.cc Outdated

info.GetReturnValue().Set(fontArray);
// Start the async worker
Nan::AsyncQueueWorker(new FontsWorker(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(keep in mind that i have no idea what im talking about) since we use new wont that put this on the heap so wed have to clean it up later? or does nan do that for us? or am i thinking about this wrong?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my long-winded attempt at explaining V8 handle scopes - I believe that any V8/nan-related objects created are properly GC'd by V8/nan.

Copy link

@dckrooney dckrooney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@itsachen itsachen assigned robinjoseph08 and unassigned dckrooney Apr 28, 2018
Copy link
Contributor

@robinjoseph08 robinjoseph08 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! thanks again for the irl reivew! it was super helpful 👍

@itsachen itsachen merged commit 044a18d into master Apr 30, 2018
@itsachen itsachen deleted the make-async branch April 30, 2018 20:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging this pull request may close these issues.

3 participants