use locale independ way to convert string to double #51

Dushistov · 2019-09-20T00:37:04Z

Related to couchbase/couchbase-lite-core#849 , should fix issue.
Plus this change should speedup json parsing, because of according to dropbox/json11#38 (comment) when I replace in dropbox json library call strtod with google/double-conversation I see speedup in benchmark from 51ms to 40ms.

Though there is isdigit in vendor/jsonsl/jsonsl.c . isdigit locale dependent function also.
But this is related to other repository. I should note that I expect that replacing isidigit should also speedup things, because of most compilers can not inline isdigit, so check if (ch >= '0' && ch <= '9') is faster.

borrrden · 2019-09-20T04:04:11Z

I hesitate to pull in an entire new dependency just for this one line. If JSON requires the dot to be used, then stringstreams could be used with an imbued locale in order to ignore the global one. What do you think of that approach?

Dushistov · 2019-09-20T08:22:03Z

@borrrden

then stringstreams
What do you think of that approach?

I heard that it 3-4 times slower in compare with strtod.
And looks like this parsing code is used a lot via c4db_encodeJSON,
which as I understand may be invoked every time of document saving.
So this why I choose one of the most faster string to double library around.

Dushistov · 2019-09-20T11:08:47Z

And I find out that this is not final fix,
I got "17,42342" in json after document has been retrieved from database.
I suspect that this is because of:

            case kFloatTag: {
                if (_byte[0] & 0x8)
                    sprintf(str, "%.16g", asDouble());
                else
                    sprintf(str, "%.6g", asFloat());
                break;
            }

yeah, sprintf depend on locale also.

This prevents invalid JSON output if locale define different then "." for float/double conversation

setlocale used in similart way as Qt/QApplication does

Dushistov · 2019-09-20T18:41:56Z

@borrrden

I fix also sprintf case.

Not that after reading code I can say that I did not only fix locale issue,
but potentially also fix information loss.

The code before this patch used %.6g and %.16g, that works for most cases,
but not for all.
For example:

char buf[32];
float origin_f = 1.0f / 3.0f;
sprintf(s, "%.6g", x);
float new_f = strtod(buf, nullptr);

after this new_f != origin_f, so during save to database, load from database you can
loose bits of information in your number, for this particular case you need %.8g.
Google's ToShortest function take care about this case. So after that patch store/load from database should not cause information lost.

borrrden · 2019-09-21T03:11:47Z

c4db_encodeJson is actually not used anywhere in the library, so no need to optimize it (it is a convenience function that Couchbase Lite does not need, but may be useful to others and we use it in our tests too). I have been doing some experimentation and I found a locale specific sprintf and strtod and even used it before I realized that it's only available on certain platforms (BSD and Windows....). I'm leaning toward an implementation that can somehow push and pop the locale setting though, rather than a full on 3rd party library. The loss of information from float to string is an acceptable one and our advice has been "if you need that level of precision then use strings instead of numbers."

borrrden · 2019-09-21T08:50:57Z

Let me know what you think of #53

Dushistov · 2019-09-21T10:29:21Z

c4db_encodeJson is actually not used anywhere in the library,

I find out about this function from reading something like native_c4database.cc from couchbase-lite-java , and I used it a lot in in my Rust wrapper around couchbase-lite-core. Though have no idea that this is not frequently used function, but I don't see alternative for saving JSON directly.

The loss of information from float to string is an acceptable one and our advice has been "if you need > that level of precision then use strings instead of numbers."

But why? According to google developers 17 characters enough to encode any number (see kBase10MaximalLength constant in double-conversation code).
Why loose any information, if it is possible to not loose anything?

And it is really inconvenient, you see JSON encoding/decoding function is generated from language types, obviously I can mark any field with float/double to use special function that encode/decode them to stirng, but it is a lot of manual work, and what if I forget to mark field as such?
The more robust solution will be patch fleece every time and do not care about such issue in the main application.

Plus I want to share this data with somebody who use other language (frontend), it will be inconvenient to document every field like - it is really number, but I save it as string,
it is really, really strange.

Dushistov · 2019-09-21T10:52:23Z

@borrrden

Let me know what you think of #53

Access to thread-local variables in shared-library is not cheap.
And you call it every sprintf/strtod call (I mean uselocale). It may potentially slow down all program.

borrrden · 2019-09-21T23:12:37Z

Well there is no alternative for saving JSON directly because that's not what we do in Couchbase Lite. We turn domain objects (C# classes, etc) into Fleece by using FLEncoder. Thus we now have hundreds of lines of 3rd party code added for something that doesn't help the Couchbase Lite distributed product. We based the 6 and 16 values off of information about how many accurate bytes there are in each type (their precision).

I don't remember hearing about the Rust wrapper before, that's pretty neat!

@snej What are your thoughts on this?

snej · 2019-09-23T19:20:58Z

there is no alternative for saving JSON directly because that's not what we do in Couchbase Lite.

The replicator encodes documents into JSON when pushing. So JSON-generation issues are definitely important to Couchbase Lite.

snej · 2019-09-23T19:31:01Z

[multiply edited]

What about using strtod_l and sprintf_l, which explicitly take a locale_t parameter?

The (macOS) man page for newlocale says:

In order to create a C locale_t value, use newlocale(LC_ALL_MASK, NULL, NULL).

We can just put that in a static variable for use in Fleece.

snej · 2019-09-23T19:48:14Z

Found this in a post on the lua mailing list:

strtod_l is available on linux (at least both glibc and musl) if _GNU_SOURCE is defined

_strtod_l is available in MSVC since VS2005

strtod_l is available on OSX since Darwin 8.

strtod_l is available on FreeBSD since 9.1

I'm working on a fix using that.

snej · 2019-09-23T21:52:20Z

after this new_f != origin_f, so during save to database, load from database you can
loose bits of information in your number

Comparing floating point numbers for equality before and after some transformation is always a problem, because of roundoff. I've had to explain to testers many times that they can't write assertions that compare floats for equality.

This is especially bad with 32-bit floats. I don't think anyone who cares about accuracy is going to choose those over 64-bit, when you get fewer than 7 significant [decimal] digits.

I came up with the %.6g and %.16g formats after some experimentation, to get the best accuracy without getting a lot of badly-rounded strings for short decimal values. For example, 0.9f formatted to 8 figures is 0.89999998. We got several bug reports about this before I tweaked the formatting! I just did some tests, formatting all multiples of 0.001f from 0 to 1, and 7 figures seems to work well; I'm not sure why I went down to 6 originally, but I must have found some cases that formatted badly.

Dushistov · 2019-09-23T22:22:20Z

@snej

Comparing floating point numbers for equality before and after some transformation is always a problem,

In this particular case I suppose that comparing is the only and right way to test equality.

You see one thing that you calculate something, obviously float and double give you some round error. Another thing is serialization / deserialization.

For example you have JSON document with two properties A and B,
A has float point number type.
You change B, save and reload JSON document from database.
Repeat it several times. What if after several such transformations the bits representation of float/double was changed in the way that error would be accumulated?

If you store and load the same bits representation you can be sure that you have no problem at all. I am not sure that after several cycles for save/load .6g and .16g the error will be not accumulated for some specific numbers.

borrrden · 2019-09-25T00:38:31Z

Closing in favor of #56 (Thanks for getting us to address the issue though!)

snej · 2019-09-25T16:35:22Z

Though there is isdigit in vendor/jsonsl/jsonsl.c . isdigit locale dependent function also.

Actually it isn't; I just checked the C reference docs: "isdigit and isxdigit are the only standard narrow character classification functions that are not affected by the currently installed C locale."

Although you may be referring to this: "...although some implementations (e.g. Microsoft in 1252 codepage) may classify additional single-byte characters as digits."

I checked <ctype.h> in the macOS/iOS SDKs, and the implementation isn't locale-dependent but it looks less efficient than it could be — it's using table lookup.

snej · 2019-09-26T17:14:56Z

I expect that replacing isdigit [in jsonsl] should also speedup things

I tried this yesterday [on macOS] and the speedup was very small. I guess Clang's (Apple's?) isdigit is inlined.

use locale independ way to convert string to double

c0dc198

Dushistov added 3 commits September 20, 2019 17:11

use locale independent way for JSON generation

a70b17c

This prevents invalid JSON output if locale define different then "." for float/double conversation

Test code with different locales

226a8a3

setlocale used in similart way as Qt/QApplication does

fix ObjCTests.mm test

4bc6060

This was referenced Sep 23, 2019

Errors parsing & writing JSON floats, in some locales #54

Closed

Fixed issues with reading/writing JSON floats #56

Merged

borrrden closed this Sep 25, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

use locale independ way to convert string to double #51

use locale independ way to convert string to double #51

Dushistov commented Sep 20, 2019 •

edited

Loading

borrrden commented Sep 20, 2019

Dushistov commented Sep 20, 2019 •

edited

Loading

Dushistov commented Sep 20, 2019 •

edited

Loading

Dushistov commented Sep 20, 2019 •

edited

Loading

borrrden commented Sep 21, 2019 •

edited

Loading

borrrden commented Sep 21, 2019

Dushistov commented Sep 21, 2019

Dushistov commented Sep 21, 2019

borrrden commented Sep 21, 2019

snej commented Sep 23, 2019

snej commented Sep 23, 2019 •

edited

Loading

snej commented Sep 23, 2019

snej commented Sep 23, 2019

Dushistov commented Sep 23, 2019 •

edited

Loading

borrrden commented Sep 25, 2019

snej commented Sep 25, 2019

snej commented Sep 26, 2019

use locale independ way to convert string to double #51

use locale independ way to convert string to double #51

Conversation

Dushistov commented Sep 20, 2019 • edited Loading

borrrden commented Sep 20, 2019

Dushistov commented Sep 20, 2019 • edited Loading

Dushistov commented Sep 20, 2019 • edited Loading

Dushistov commented Sep 20, 2019 • edited Loading

borrrden commented Sep 21, 2019 • edited Loading

borrrden commented Sep 21, 2019

Dushistov commented Sep 21, 2019

Dushistov commented Sep 21, 2019

borrrden commented Sep 21, 2019

snej commented Sep 23, 2019

snej commented Sep 23, 2019 • edited Loading

snej commented Sep 23, 2019

snej commented Sep 23, 2019

Dushistov commented Sep 23, 2019 • edited Loading

borrrden commented Sep 25, 2019

snej commented Sep 25, 2019

snej commented Sep 26, 2019

Dushistov commented Sep 20, 2019 •

edited

Loading

Dushistov commented Sep 20, 2019 •

edited

Loading

Dushistov commented Sep 20, 2019 •

edited

Loading

Dushistov commented Sep 20, 2019 •

edited

Loading

borrrden commented Sep 21, 2019 •

edited

Loading

snej commented Sep 23, 2019 •

edited

Loading

Dushistov commented Sep 23, 2019 •

edited

Loading