Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect calling convention in UnmanagedCallersOnly #506

Closed
filipnavara opened this issue Aug 18, 2022 · 20 comments · Fixed by #507
Closed

Incorrect calling convention in UnmanagedCallersOnly #506

filipnavara opened this issue Aug 18, 2022 · 20 comments · Fixed by #507

Comments

@filipnavara
Copy link
Contributor

We are using the sqlite3_config_log API. Apparently somewhere between 2.0.7 and 2.1.0 you switched to use UnmanagedCallersOnly. Unfortunately, this means the decorated callback now defaults to the platform calling convention which is stdcall on Windows. The native library (e_sqlite3) expects cdecl though. This mismatch causes access violation once the registered callbacks returns back to the native code.

@ericsink
Copy link
Owner

Er, my bad.

Looks like I can just do something like:

[UnmanagedCallersOnly(CallConvs = new[] { typeof(CallConvCdecl) })]

@filipnavara
Copy link
Contributor Author

I expect that to be the solution but I didn't get to try it yet.

@ericsink
Copy link
Owner

FWIW, version 2.1.1-pre20220822172036 contains this fix. If there are no problems with it, I will publish it as a non-pre release next week. Reopening this issue until I ship a non-pre-release containing its fix.

@ericsink ericsink closed this as completed Sep 6, 2022
@ericsink
Copy link
Owner

AFAICT, the fix made in #507 was intended to address a problem that is Windows-specific, but it has caused iOS platforms to fail completely during AOT, probably due to a bug in the Mono AOT compiler. That bug has been logged at dotnet/runtime#75700.

What probably needs to happen here is to make the necessary fix to calling convention be Windows-specific.

In the meantime, the fix is a case where the cure is worse than the disease. I plan to revert, so I am reopening this bug.

@ericsink ericsink reopened this Sep 16, 2022
@filipnavara
Copy link
Contributor Author

filipnavara commented Sep 16, 2022

That's quite unfortunate since that will make it unusable for us on Windows again :-/ I will try to see if I can expedite the Mono fix.

@ericsink
Copy link
Owner

@filipnavara Aren't you on iOS as well? I mean, the change fixes Windows, but it completely breaks iOS. I sort of expected you to be affected on both sides. If the change somehow did not break iOS builds for you, then I'm misunderstanding something.

I'll try today to rework your fix to take place on Windows only. The T4 stuff already generates several variants of things, so it's not too hard to isolate what you need.

@filipnavara
Copy link
Contributor Author

It's complicated...

We are on legacy Xamarin.iOS which uses different flavor of SQLitePCL.raw that doesn't suffer from the issue.

For a moment we were running some forked builds but I really want to avoid that if possible. First, we could not switch to official builds because we needed the macOS fix for ARM64 varargs calling convention. Once that shipped we could not upgrade because Windows was blocked by this issue (builds but crashes at runtime). Now we are basically on 2.1.1 with our fork abandoned.

We do also have internal builds on net6.0-ios but they are not up to date with the X.iOS branches due to miscellaneous issues so we would not hit the SQLitePCL.raw breakage yet.

@ericsink
Copy link
Owner

Ah. That makes sense. Yes, quite "complicated".

I'm working now on reworking your fix to avoid the net6.0-ios problem.

@filipnavara
Copy link
Contributor Author

Thanks a lot. I really appreciate it.

@ericsink ericsink reopened this Sep 16, 2022
@ericsink
Copy link
Owner

ericsink commented Sep 16, 2022

@filipnavara Are you using the winsqlite3 provider?

(I'm pretty sure you're not.)

@ericsink ericsink mentioned this issue Sep 16, 2022
@ericsink
Copy link
Owner

2.1.2-pre20220916165053 is available on nuget and contains the revised version of the fix for this issue.

@ericsink
Copy link
Owner

@filipnavara One other question about this issue:

There is a test case for sqlite3_config_log:

public void test_log()
{
// Skip failing test on macOS system provided libsqlite3.dylib
if (RuntimeInformation.IsOSPlatform(OSPlatform.OSX) && raw.GetNativeLibraryName() == "sqlite3")
return;
var msgs = new List<string>();
var rc = raw.sqlite3_config_log(
(v, errcode, msg) => msgs.Add(msg),
null
);
Assert.Equal(0, rc);
GC.Collect();
const string VAL = "hello!";
raw.sqlite3_log(0, VAL);
Assert.Single(msgs);
Assert.Equal(VAL, msgs[0]);
strdelegate_log no_cb = null;
rc = raw.sqlite3_config_log(no_cb, null);
Assert.Equal(0, rc);
}

but for some reason, it didn't fail back when I made the switch to use UnmanagedCallersOnly. It should have caught the failure you have been seeing, but it did not.

I have verified that this test is being included in test runs, and that it passes on Windows.

Any idea how to modify that test (or add another one) to make a verification for this issue?

@filipnavara
Copy link
Contributor Author

filipnavara commented Sep 16, 2022

Are you using the winsqlite3 provider?

We use the e_sqlite3 one.

Hmm, I'll have to check about the test. I remember it called the callback and trashed the stack after. It would be interesting to see what happens if sqlite3_log is called twice.

@ericsink
Copy link
Owner

FWIW, I tried modifying the test to call sqlite3_log multiples times, but I still couldn't get it to fail (with the CallConv stuff disabled).

I committed the changes to that test at b166926, but I messed up the commit message and referenced the wrong bug number instead of this one.

ericsink referenced this issue Sep 16, 2022
…mplicated to serve as a repro for #516.  no luck.  calling sqlite3_log() multiple times still seems to work, even when disabling the CallingConvention fix.
@filipnavara
Copy link
Contributor Author

One more thought, do the tests run only as 64-bit target or does it test win-x86 too? We observed it on 32-bit process which actually has different calling conventions.

@ericsink
Copy link
Owner

Ah, 32-bit. That's probably the difference. The test suites have not been running with win-x86.

And in fact, they don't pass. In fact, there are multiple problems, and even though your calling convention fix is apparently sufficient for your situation, it does not seem to be enough to get all my tests to pass.

So, I've apparently got some work to do for 32-bit.

When I release 2.1.2, I'll probably close this issue and open new one(s) for 32-bit problems.

@filipnavara
Copy link
Contributor Author

In retrospective that makes sense 😅 x86 is the weird one with three major calling conventions, x64 and arm64 both use only one (which differs per platform but not per method).

@ericsink
Copy link
Owner

@filipnavara So... for your Windows builds, you're using a target framework of net6.0-windows (or similar), rather than just net6.0, right?

When I tweaked your calling convention fix to be effective only on Windows, I used some existing stuff, and so that version of the provider only gets used for a net6.0-windows TFM. So, above, when I said I tried running those fixes on win-x86, and found that lots of things were broken, I was building with net6.0, so the calling convention fixes were not actually in play. But, if I run the test suite targeting net6.0-windows, everything passes.

I might need to tweak the nuspec to make that version of the provider assembly match net6.0 when the RID is win-x86.

@filipnavara
Copy link
Contributor Author

So... for your Windows builds, you're using a target framework of net6.0-windows (or similar), rather than just net6.0, right?

Correct. net6.0-windows TFM and win-x86 RID for the executable project.

ericsink added a commit that referenced this issue Sep 18, 2022
…th rid win-x86, to verify the fixes for calling convention of function pointers in #506
@ericsink
Copy link
Owner

Fixed in 2.1.2, which is now on nuget. Also logged #518 as a leftover.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants