Read/Write Big/LittleEndian support for Guid #86798

Closed
tannergooding opened this issue May 26, 2023 · 60 comments · Fixed by #87993
Labels: api-approved (API was approved in API review, it can be implemented), area-System.Runtime

Comments

@tannergooding
Member

Rationale

System.Guid represents .NET's support for Globally Unique Identifiers or GUIDs (sometimes also referred to as Universally Unique Identifiers or UUIDs).

This type represents a 128-bit value in the general format xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx, where each x represents a hexadecimal digit, the 4 bits of M represent the version, and the 4 bits of N represent the variant. This is sometimes referred to as the 8-4-4-4-12 format. While the string representation is "well-defined", the underlying order of the bytes has a few different representations, and several variants of the general RFC 4122 format may require a specific ordering or may even restrict specific bytes to a particular meaning.

.NET's System.Guid follows a field layout that best matches variant 2, which is identical to variant 1 apart from endianness. In particular, variant 1 is "big endian" and variant 2 is "little endian". Variants 1 and 2 are used by the current UUID specification and are by far the most prominent, while variant 0 is largely considered deprecated. Apart from endianness, these variants are differentiated by minor bit pattern requirements.
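To make the byte-order difference concrete, here is a minimal C# sketch showing what the existing, endianness-unspecified ToByteArray() produces. The GUID value is an arbitrary example chosen so that its string form spells out the bytes 00 through ff in order:

```csharp
using System;

// Arbitrary example value whose string form lists the bytes 00..ff in order.
Guid g = Guid.Parse("00112233-4455-6677-8899-aabbccddeeff");

// ToByteArray() uses the existing .NET layout: the first three fields
// (4, 2 and 2 bytes) are written little-endian, the last 8 bytes in string order.
byte[] bytes = g.ToByteArray();
Console.WriteLine(BitConverter.ToString(bytes));
// 33-22-11-00-55-44-77-66-88-99-AA-BB-CC-DD-EE-FF

// A variant 1 ("big endian") consumer would instead expect string order:
// 00-11-22-33-44-55-66-77-88-99-AA-BB-CC-DD-EE-FF
```

Only the first 8 bytes move; the trailing 8 bytes are identical in both layouts, which is why the mismatch is easy to miss when eyeballing values.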

Given this is largely an endianness difference and is otherwise just a minor difference in the bits used for M and N, we would prefer to not introduce a new type just to handle this and would instead prefer to introduce explicit APIs and overloads on Guid that help identify and handle these differences.

Proposed APIs

namespace System
{
    public partial struct Guid
    {
        // public Guid(byte[] value);
        // public Guid(ReadOnlySpan<byte> value);
        public Guid(ReadOnlySpan<byte> value, bool isBigEndian);

        // public byte[] ToByteArray();
        public byte[] ToByteArray(bool isBigEndian);

        // public bool TryWriteBytes(Span<byte> destination);
        // public bool TryWriteBytes(Span<byte> destination, out int bytesWritten); -- new in .NET 8
        public bool TryWriteBytes(Span<byte> destination, bool isBigEndian, out int bytesWritten);
    }
}

namespace System.Buffers.Binary
{
    public static partial class BinaryPrimitives
    {
        public static Guid ReadGuidBigEndian(ReadOnlySpan<byte> source);
        public static Guid ReadGuidLittleEndian(ReadOnlySpan<byte> source);

        public static bool TryReadGuidBigEndian(ReadOnlySpan<byte> source, out Guid value);
        public static bool TryReadGuidLittleEndian(ReadOnlySpan<byte> source, out Guid value);

        public static bool TryWriteGuidBigEndian(Span<byte> destination, Guid value);
        public static bool TryWriteGuidLittleEndian(Span<byte> destination, Guid value);

        public static void WriteGuidBigEndian(Span<byte> destination, Guid value);
        public static void WriteGuidLittleEndian(Span<byte> destination, Guid value);
    }
}
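As a rough illustration of what the proposed write helpers would do, here is a sketch built only from APIs that already exist (Guid.TryWriteBytes and the BinaryPrimitives integer helpers). The local functions reuse the proposed names and take a writable Span<byte> destination, but they are hypothetical stand-ins, not the shipped implementation:

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical stand-in for the proposed BinaryPrimitives.TryWriteGuidBigEndian,
// built only from existing APIs.
static bool TryWriteGuidBigEndian(Span<byte> destination, Guid value)
{
    // Guid.TryWriteBytes emits the existing layout (Data1/Data2/Data3 little-endian)
    // and returns false if destination is shorter than 16 bytes.
    if (!value.TryWriteBytes(destination))
        return false;

    // Swap the three integer fields in place; bytes 8..15 are already in order.
    BinaryPrimitives.WriteInt32BigEndian(destination, BinaryPrimitives.ReadInt32LittleEndian(destination));
    BinaryPrimitives.WriteInt16BigEndian(destination.Slice(4), BinaryPrimitives.ReadInt16LittleEndian(destination.Slice(4)));
    BinaryPrimitives.WriteInt16BigEndian(destination.Slice(6), BinaryPrimitives.ReadInt16LittleEndian(destination.Slice(6)));
    return true;
}

// The little-endian counterpart is simply today's Guid.TryWriteBytes.
static bool TryWriteGuidLittleEndian(Span<byte> destination, Guid value) =>
    value.TryWriteBytes(destination);

Guid id = Guid.Parse("00112233-4455-6677-8899-aabbccddeeff");
Span<byte> buffer = stackalloc byte[16];

Console.WriteLine(TryWriteGuidBigEndian(buffer, id));    // True
Console.WriteLine(Convert.ToHexString(buffer));          // 00112233445566778899AABBCCDDEEFF

Console.WriteLine(TryWriteGuidLittleEndian(buffer, id)); // True
Console.WriteLine(Convert.ToHexString(buffer));          // 33221100554477668899AABBCCDDEEFF
```

The in-place swap is what users otherwise have to write by hand today; a built-in version would let callers write straight into their target buffer without an intermediate array.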

Drawbacks

As discussed on #86084, there is a general concern that users may not be aware that these other overloads exist -or- may not be aware that the difference between variant 1 and variant 2 is endianness and that .NET defaults to variant 2.

However, the same general considerations exist when exposing a new type such as System.Uuid. There are additional considerations on top of that: it further bifurcates the type system, doesn't easily allow polyfilling the support downlevel without shipping a new OOB package, and may further confuse users due to the frequent interchange of the GUID and UUID terminology.

After discussion with a few other API review team members, the general consensus was that shipping a new type is undesirable and we should prefer fixing this via new APIs/overloads and potentially looking into additional ways to surface the difference to users (such as analyzers, API documentation, etc).

Additional Considerations

Given the above, we may want to consider how to help point users towards their desired APIs given the overloads on Guid that do not require specifying endianness.

We can certainly update the documentation, but an analyzer seems like a desirable addition that can help point devs towards specifying the endianness explicitly. Obsoleting the existing overloads was also proposed, but may be undesirable since the current behavior isn't "wrong"; it just may be the undesired behavior in some scenarios.

We may also want to consider whether a static Guid NewGuid() overload that allows conforming to Version 4, Variant 1 is desired. The docs only indicate that it produces a version 4 GUID by calling into the underlying system APIs. They do not indicate whether it produces Variant 1, Variant 2, or truly random bits for N.

@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label May 26, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label May 26, 2023
@tannergooding tannergooding added area-System.Runtime api-ready-for-review API is ready for review, it is NOT ready for implementation and removed untriaged New issue has not been triaged by the area owner needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels May 26, 2023
@tannergooding tannergooding added this to the 8.0.0 milestone May 26, 2023

@danmoseley
Member

I don't think I see a reasoning for why these are worth adding other than they are in use. When is this format needed? Is there any relationship with execution on big endian architecture, or not particularly?

@tannergooding
Member Author

When is this format needed?

The current UUID spec has 2 commonly used formats: variant 1 and variant 2. COM, and therefore most of Windows, uses variant 2. Many other domains use variant 1 instead, and many of them were called out on the other thread linked above.

The difference between the two, in terms of layout, is variant 1 is big endian and variant 2 is little endian.

Thus, these functions are needed for users to be able to correctly interact with such systems and to support the full UUID spec.

Is there any relationship with execution on big endian architecture, or not particularly?

The relationship is to how a sequence of raw bytes is interpreted. While machines may operate in big or little endian mode, endianness comes up in many contexts. Networking is a large one where bytes are almost exclusively sent in big endian format (so much so that a common description is "network order").
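A tiny illustration of that point, using plain BinaryPrimitives calls (the value 0x01020304 is arbitrary): the same integer produces different byte sequences depending on the chosen order, and reading bytes back with the wrong order silently yields a different number rather than an error:

```csharp
using System;
using System.Buffers.Binary;

Span<byte> network = stackalloc byte[4];
Span<byte> little = stackalloc byte[4];

BinaryPrimitives.WriteInt32BigEndian(network, 0x01020304);   // "network order"
BinaryPrimitives.WriteInt32LittleEndian(little, 0x01020304); // native order on x86/x64

Console.WriteLine(Convert.ToHexString(network)); // 01020304
Console.WriteLine(Convert.ToHexString(little));  // 04030201

// Reading network-order bytes as little-endian gives a different value.
int misread = BinaryPrimitives.ReadInt32LittleEndian(network);
Console.WriteLine(misread.ToString("X8"));       // 04030201
```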

@Kirill-Maurin

It seems that the history of the worst feature of the .NET BCL (DateTime.Kind) has not taught anyone anything

@tannergooding
Member Author

tannergooding commented May 26, 2023

This is not a Kind, it is an endianness concern.

The proposed Uuid from #86084 would have the same consideration because the official UUID spec defines and supports both variant 1 and variant 2. Thus, you could not expose a type called Uuid and have it support only one of them.

You could have some UuidVariant1, but that still comes with the same general considerations and problems. It still introduces additional confusion to end users on which to use and when and surfaces what is effectively a serialization concern into the exposed type system.

This ultimately comes down to:

  1. The official UUID spec does not itself have a de-facto layout.* It defines and supports both variant 1 and variant 2.
  2. The difference between variant 1 and variant 2 comes in two parts. The primary difference being the endianness of the layout. The other is that in creation of the guid, there may be a specific pattern required for the 4-bit N specifier to differentiate which variant it is, but not all systems follow that.
  3. Given the above, any new System.Uuid type would itself need to support the exact new API surface being proposed for Guid in this issue (#86798) such that it could be used for either variant 1 or variant 2 scenarios
  4. Given the above, we are down to a scenario where users are requesting a new type that only differs in behavior in how new Uuid(byte[]) and byte[] ToByteArray() behave. The difference is that one uses Read/WriteInt32BigEndian and the other uses Read/WriteInt32LittleEndian
  5. Introducing a new type simply to handle a minor behavioral difference on reading/writing raw byte sequences is generally undesirable. Not only is this not how we handle any other built-in type, but it introduces the risk of confusing users as to which type should be used and when.
  6. It introduces interchange and back-compat problems, particularly for existing APIs that are already using Guid because it's been around for 20 years and has been the thing to use for both variant 1 and variant 2 types. Such APIs now have to decide to support one, the other, or both and must determine how to interop between other systems that are already taking one, the other, or both.
  7. The general consideration of which to take in managed code doesn't matter. The only time it does matter is when you are converting to or from a raw byte sequence, such as for serialization purposes.

Edit: The spec does largely describe itself as following variant 1 and describes that layout as "network order", with most of the callouts to variants 0/2 noted as backwards-compatible and variant 3 reserved. But that does not preclude the need to work with the other variants/versions, nor the general descriptions/support that exist in the spec covering them.

@DaZombieKiller
Contributor

It seems that the history of the worst feature of the .NET BCL (DateTime.Kind) has not taught anyone anything

That isn't quite comparable, DateTime.Kind represents information about an instance. These APIs are about the binary representation of a GUID at a serialization boundary only. The internal byte order of a System.Guid isn't something that can differ per instance.

@tannergooding
Member Author

This is not a Kind, it is an endianness concern.

We likewise do not expose the UUID versions or other information in the type system, nor would we.

@aloraman

Several nitpicks:

  1. Guids (Uuids) in the wild do support RFC-4122, but in practice they are just containers for 16 bytes worth of data, i.e., any xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx-like value can be used, despite not conforming to requirements for variant/version flag allowed values.
  2. Internal structure of Guid (Uuid) is not just an implementation detail, because it is externally observable. And I'm not talking about difference between byte array and string construction - there's another observable behavior - sorting. System.Guid implements IComparable and, therefore, has a natural order. And that order is the same in .NET application and in SqlServer, but doesn't match the order in every other place. Major PITA when trying to use it to implement bounding-box optimizations for containment checks.
  3. All the "It's not the .NET way" reasoning is rather nonsensical. When I look at the proposed API, all I can think about is that it has the same eerie feel as support for unsigned arithmetic operations in Java. Thankfully, we have System.UInt32, not a bunch of XXXUnsigned methods in System.Int32. And Int32/UInt32 don't even have a minor difference in bit patterns - they are the same! Well, that was 20+ years ago, yes... But the same goes for System.DateOnly/System.TimeOnly, which are a new thing. And when DateOnly/TimeOnly were designed, it was stated that new types are preferable to a maze of obscurely named methods on System.DateTime.

So, IMHO, a separate Uuid type is preferable to the proposed API. I don't think System.Guid should be obsoleted; rather, both Guid and Uuid should be available, so one or the other can be used in a specific scenario. If only we could come up with better naming. GUID sounds so much nicer than UUID.

@DaZombieKiller
Contributor

System.Guid implements IComparable and, therefore, has a natural order.

I'm not sure how this is relevant to byte order (endianness). Can you clarify? The proposal is not to change System.Guid's layout, that will remain the same. This is about offering new APIs to use at the serialization boundary.

System.Uuid would be completely identical to System.Guid with the exception of the serialization methods, which is where some of the concerns regarding its usefulness come from. It has minimal benefit over just using System.Guid for anything but serialization (and that benefit only applies if you expect to serialize/deserialize in big endian.)

Internal structure of Guid (Uuid) is not just an implementation detail, because it is externally observable.

That said, this part is still accurate. System.Guid currently has the same layout as the Win32 GUID structure, which allows it to be used in blittable interop. That does not affect this proposal because it's not about changing that layout, but it is part of why System.Guid's layout will most likely not change.

And when DateOnly/TimeOnly were designed, it was stated that new types are preferrable over a maze of obscurely named methods on System.DateTime.

As mentioned earlier, the problems with DateTime aren't quite comparable to this situation. DateTime's issues stem from it representing more than one kind of data, while this is about serialization. System.Guid can still only represent one kind of data: a GUID/UUID.

@tannergooding
Member Author

Internal structure of Guid (Uuid) is not just an implementation detail, because it is externally observable

This has nothing to do with the internal structure, but rather has everything to do with the implementation of IComparable. We could choose to treat that as UInt128 or 2x UInt64 to be "more efficient" if we really wanted to.

We could choose to implement Guid as a byte[] or as 16x doubles each containing a byte between 0 and 255. Those are implementation details that don't matter to the consumer of the API and how it operates. We don't do these other approaches because they aren't efficient and limit the broader usability of the types.

Things like IComparable just have to be consistent and the behavior of IComparable on a Guid that is created using new Guid(byte[], isBigEndian: true) would be identical to the proposed Uuid type.

And Int32/UInt32 don't have a minor difference in bit patterns - they are the same!

Yes, but they have a strong semantic difference that appears as part of comparisons, addition, subtraction, multiplication, division, remainder, conversions, min, max, and almost every other operation you can think of.

Guid vs Uuid have a minor semantic difference in that when reading from or writing to a raw byte sequence, you swap the endianness. We correspondingly do not have UInt32LittleEndian and UInt32BigEndian.

But the same goes for System.DateOnly/System.TimeOnly, which are a new thing.

This also comes about from perf advantages and significantly reduced complexity that frequently comes up in common usages of the types.

What you've effectively asked is that the .NET BCL expose:

public readonly struct Guid { }
public readonly struct Uuid { }

You could give these any number of names:

public readonly struct UuidLittleEndian { }
public readonly struct UuidBigEndian { }

public readonly struct UuidVariant1 { }
public readonly struct UuidVariant2 { }

public readonly struct UuidMachineOrder { }
public readonly struct UuidNetworkOrder { }

etc

The Uuid spec also covers that it encodes "version" information (the 4 M bits) in addition to the "variant" information (the 4 N bits). This does not mean we would or should also expose UuidVersion5 just to handle that semantic. We would likewise not want to add or enforce validation that Uuid or Guid only allow values in their respective variants.
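To illustrate where those bits live, here is a small sketch that reads the version (M) and variant (N) bits out of an existing Guid. It uses this thread's numbering (variant 1 = RFC 4122 "10xx", variant 2 = Microsoft "110x"); GetVersion and GetVariant are hypothetical helper names. Because ToByteArray() writes the third group little-endian, the version nibble sits in byte 7, not byte 6:

```csharp
using System;

// Version: the "M" nibble. Data3 is stored little-endian by ToByteArray(),
// so the high byte of the string's third group ("Mxxx") is bytes[7].
static int GetVersion(Guid g) => g.ToByteArray()[7] >> 4;

// Variant: the leading bits of the "N" nibble. Bytes 8..15 are not
// byte-swapped, so this byte is in the same position in both layouts.
static int GetVariant(Guid g)
{
    byte n = g.ToByteArray()[8];
    if ((n & 0x80) == 0) return 0; // 0xxx: NCS backward compatibility
    if ((n & 0x40) == 0) return 1; // 10xx: RFC 4122 ("variant 1")
    if ((n & 0x20) == 0) return 2; // 110x: Microsoft ("variant 2")
    return 3;                      // 111x: reserved
}

Guid sample = Guid.Parse("01234567-89ab-4def-8123-456789abcdef");
Console.WriteLine($"version {GetVersion(sample)}, variant {GetVariant(sample)}"); // version 4, variant 1
```

This is purely descriptive; it reads the bits without validating or enforcing them, in line with not restricting Guid to any particular variant.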

This is not how .NET exposes types in the BCL today, and it's not something that we want to do moving forward either. We want to grow and expand existing types to support new scenarios instead.

If we were designing this today, without any prior concerns, we'd probably name it System.Uuid and it would have the exact same methods as Guid including those covered in this proposal. We would then similarly reject a proposal to expose some System.Guid or System.MsGuid type.

@jkotas
Member

jkotas commented May 26, 2023

Do you have examples in our own libraries or in the code out there that would use these APIs?

We do provide efficient allocation-free access to underlying Guid bytes. If needed, anybody can write their own binary serializer/deserializer that fits the given format in just a few lines. I do not think we need to be providing all possible variants of binary serializers/deserializers for BCL types. For example, we do not provide helpers for different variants of RLE-encoding of integers either.

#85891 is a very similar proposal; it is also asking for specific binary serializer/deserializer helpers.

@tannergooding
Member Author

Do you have examples in our own libraries or in the code out there that would use these APIs?

This was opened because we got an issue (linked in the top proposal) that had a massive influx of upvotes/support.

Our default behavior for new Guid/ToByteArray/TryWriteBytes is efficient, but can also be error prone and less efficient for users. Users working with variant 1 UUIDs then need additional ReadInt32LittleEndian/WriteInt32BigEndian, Slice, and ReadInt16LittleEndian/WriteInt16BigEndian calls to fix up the span prior to new Guid() or after ToByteArray/TryWriteBytes. They then need to pass the original span back through to their target API. This can often require additional copying or work beyond the proposed APIs above. This is different from, say, DateTime or TimeSpan, where the way to round-trip is to use the same APIs over the Ticks property, so users can already do it efficiently and with no loss of perf.

These APIs cover the core need, efficiently, and help make it visible that the existing APIs may not do exactly what the user may be expecting.

@tannergooding
Member Author

It certainly would not be an end-of-the-world scenario if these weren't exposed. But given the number of upvotes on the original issue asking for a System.Uuid type and the additional clarity this can bring, I do think it's worthwhile.

-- There is notably still a decent number of users from the System.Uuid proposal that don't want this new functionality, they only want System.Uuid, so the exact number of satisfied users won't quite be the same; but exposing Uuid is not something that would have passed API review at all and it would have resolved down to this proposal or do nothing anyways.

@tannergooding
Member Author

tannergooding commented May 26, 2023

The user implemented code would effectively be:

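using System;
using static System.Buffers.Binary.BinaryPrimitives;

public static byte[] ToBigEndianByteArray(Guid guid)
{
    // Existing layout: Data1/Data2/Data3 little-endian, bytes 8..15 in string order.
    byte[] tmp = guid.ToByteArray();

    // Data1 (bytes 0..3): rewrite as big-endian.
    Span<byte> span = tmp;
    WriteInt32BigEndian(span, ReadInt32LittleEndian(span));

    // Data2 (bytes 4..5).
    Span<byte> tmp2 = span.Slice(4);
    WriteInt16BigEndian(tmp2, ReadInt16LittleEndian(tmp2));

    // Data3 (bytes 6..7).
    tmp2 = tmp2.Slice(2);
    WriteInt16BigEndian(tmp2, ReadInt16LittleEndian(tmp2));

    return tmp;
}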

and similar for the inverse direction
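A sketch of that inverse direction in the same style. FromBigEndianByteArray is a hypothetical helper name, and the sketch assumes the input holds exactly 16 bytes in big-endian (string) order:

```csharp
using System;
using static System.Buffers.Binary.BinaryPrimitives;

// Hypothetical inverse of ToBigEndianByteArray: interprets 16 bytes in big-endian
// (variant 1, string) order and returns the Guid whose ToString() matches them.
static Guid FromBigEndianByteArray(byte[] bytes)
{
    if (bytes is null || bytes.Length != 16)
        throw new ArgumentException("Expected exactly 16 bytes.", nameof(bytes));

    // Copy into a scratch buffer so the caller's array is left untouched.
    Span<byte> tmp = stackalloc byte[16];
    bytes.AsSpan().CopyTo(tmp);

    // Swap Data1 (4 bytes), Data2 and Data3 (2 bytes each) back to little-endian;
    // bytes 8..15 are already in the order Guid expects.
    WriteInt32LittleEndian(tmp, ReadInt32BigEndian(tmp));
    WriteInt16LittleEndian(tmp.Slice(4), ReadInt16BigEndian(tmp.Slice(4)));
    WriteInt16LittleEndian(tmp.Slice(6), ReadInt16BigEndian(tmp.Slice(6)));

    return new Guid(tmp);
}

byte[] wire = Convert.FromHexString("00112233445566778899AABBCCDDEEFF");
Guid parsed = FromBigEndianByteArray(wire);
Console.WriteLine(parsed); // 00112233-4455-6677-8899-aabbccddeeff
```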

@aloraman

I'm not sure how this is relevant to byte order (endianness). Can you clarify?

It affects the implementation of sorting/serialization (see https://github.com/dotnet/runtime/pull/81650/files ) and corresponding performance cost.

System.Uuid would be completely identical to System.Guid with the exception of the serialization methods, which is where some of the concerns regarding its usefulness come from.

Well, maybe the author of #86084 just wants consistency between string and byte serialization, but I want more. That is, if System.Uuid is to be added, let's make it as feature-rich as it is in other programming languages, e.g., let it be a network-ordered set of 16 octets, with vectorizable initialization from string and byte array, and have it be transparently convertible to UInt128, so it will be easy to do arithmetic, sorting, and so on. It is such a widely used primitive that it's shameful there's no similar primitive available out of the box in .NET. Let System.Guid continue to be GUID - that's fine, but let's not pretend it doesn't have a myriad of quirks and inconsistencies.

That isn't quite comparable, DateTime.Kind represents information about an instance.

But the trouble is just the same. It takes just a single element in a large chain of interacting code fragments with multiple serialization boundaries mishandling or losing the DateTimeKind data to completely break the process. This results in garbage data and countless hours spent on bugfixing.

This is about offering new APIs to use at the serialization boundary.

But introducing these new APIs won't solve the problem (the aforementioned problem being the inconsistency between variant-1 and variant-2 compatible treatments of binary and string serialization). You can always implement such an API yourself (and people do implement similar APIs). But the 'ordinary' enterprise programmer rarely interacts with the serialization boundary directly - it's handled by 1st- and 3rd-party libraries for serialization, model binding, and object-relational mapping. The problem comes from inconsistencies in handling Guid/Uuid data within these libraries, which are not observable without digging through their internals. Maybe some libraries will replace their implementations with these APIs, but even if they do, the problem will persist. Solving the problem will force them either to break backward compatibility, which good libraries rarely do, or to pass BE/LE switches up to the API surface, which, honestly, looks ugly and Java-esque. A separate System.Uuid type would, however, allow a parallel construction process: treat System.Guid in a backwards-compatible way, and treat System.Uuid in a variant-1 native way.

This was opened because we got an issue (linked in the top proposal) that had a massive influx of upvotes/support.

There's not much of a surprise here. The issue touches on a major sore subject for modern enterprise programming. Back in the day you had a full MSFT stack, where GUID behaved consistently in the OS, the DB (SqlServer), and the codebase. Nowadays, with cloud and heterogeneous applications, inconsistencies with other programming stacks are a constant source of PITA. A programmer can understand the difference between Uuid V1 vs V2, but it doesn't stop the stream of questions/bugs regularly raised by QA, PM/PO, analysts, and power users: "Why is the order different in-app and in-db?", "Why is the order of digits different in Swagger and DbExplorer?". The difference between uuid and uniqueidentifier is roughly the third largest problem with database primitives when migrating between SqlServer and PostgreSql (the first two being the lack of DateTimeOffset and the different treatment of text-like types).

@kolosovpetro

I have experienced the following case using GUIDs.
When someone calls Guid.NewGuid and passes the result not as a GUID but as a parameter in a request, forming it

  1. through a call to ToByteArray
  2. through a call to ToByteArray that called Convert.FromHex
  3. through a call to ToString

instead of using a dedicated, correct generation algorithm, a problem appears because the sequence of bits produced by the ToString() method differs from the one produced by ToByteArray(), etc.

Generally, GUID serves its purpose fine and there is no need to touch or modify it. As I see it, the new API methods will just make Guid's API more confusing. Also, the motivation for such an API extension is not clear at first glance.

Instead, a UUID type could be added to the BCL along with MSDN documentation that explicitly states the problem the UUID type solves. It is much simpler to just use out-of-the-box ToString(), ToByteArray(), etc. methods that produce the same sequence of bits in both the byte array and string representations of the 128-bit number.

@Szer

Szer commented May 27, 2023

5 cents regarding my recent experience with UUIDs on other platforms.

I needed UUID v7 support (the sortable one). On the JVM it just works with java.util.UUID: if I parse some API response where a UUID is serialized as a string and then store it in PostgreSQL, I know that the sortability property of UUID v7 won't be lost at the application <-> DB boundary because of byte order.

With System.Guid and the proposed API I need to go into every single driver source code (ADO.Net, NpgSQL, DataStax, etc) to make sure that driver serializes my Guid correctly, otherwise, my data will be corrupted. Different byte order will make the sortable column non-sortable and the index magically becomes non-clustered.

Moreover, the UUID spec draft for the new UUID versions mentions network order:
https://www.ietf.org/archive/id/draft-peabody-dispatch-new-uuid-format-04.html#section-6.7-4

UUIDs created by this specification are crafted with big-endian byte order (network byte order) in mind. 
If Little-endian style is required a custom UUID format SHOULD be created using UUIDv8.

How can UUID v7 be implemented correctly with the existing System.Guid and the newly proposed API?
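One way to approach this with only existing APIs, sketched under stated assumptions: build the 16 bytes in big-endian order following the UUIDv7 layout (48-bit Unix millisecond timestamp, 4-bit version, 2-bit variant, remaining 74 bits random), then convert to Guid by swapping the first three fields so that ToString() shows the bytes in the same order. CreateVersion7Sketch and GuidFromBigEndian are hypothetical helper names, and the sketch ignores any monotonicity handling for IDs created within the same millisecond. Whether a given database driver then writes those bytes in the order you expect remains the separate concern raised above:

```csharp
using System;
using System.Buffers.Binary;
using System.Security.Cryptography;

// Hypothetical helper: reinterprets 16 big-endian (string-order) bytes as a Guid
// by swapping Data1/Data2/Data3 in place to the little-endian layout Guid expects.
static Guid GuidFromBigEndian(Span<byte> bytes)
{
    BinaryPrimitives.WriteInt32LittleEndian(bytes, BinaryPrimitives.ReadInt32BigEndian(bytes));
    BinaryPrimitives.WriteInt16LittleEndian(bytes.Slice(4), BinaryPrimitives.ReadInt16BigEndian(bytes.Slice(4)));
    BinaryPrimitives.WriteInt16LittleEndian(bytes.Slice(6), BinaryPrimitives.ReadInt16BigEndian(bytes.Slice(6)));
    return new Guid(bytes);
}

// Hypothetical UUIDv7-style generator: 48-bit Unix ms timestamp, version 7,
// variant 10xx, remaining 74 bits taken from `random` (at least 10 bytes).
static Guid CreateVersion7Sketch(long unixMs, ReadOnlySpan<byte> random)
{
    if (random.Length < 10)
        throw new ArgumentException("Need at least 10 random bytes.", nameof(random));

    Span<byte> bytes = stackalloc byte[16];

    // Bytes 0..5: timestamp, most significant byte first (network order).
    for (int i = 0; i < 6; i++)
        bytes[i] = (byte)(unixMs >> (8 * (5 - i)));

    // Bytes 6..15: random, then overwrite the version and variant bits.
    random.Slice(0, 10).CopyTo(bytes.Slice(6));
    bytes[6] = (byte)((bytes[6] & 0x0F) | 0x70); // version 7 in the "M" nibble
    bytes[8] = (byte)((bytes[8] & 0x3F) | 0x80); // variant 10xx in the "N" nibble

    return GuidFromBigEndian(bytes);
}

Span<byte> rnd = stackalloc byte[10];
RandomNumberGenerator.Fill(rnd);
Guid id = CreateVersion7Sketch(DateTimeOffset.UtcNow.ToUnixTimeMilliseconds(), rnd);

// The timestamp occupies the first 12 hex digits of the string form,
// so string order follows creation time (at millisecond granularity).
Console.WriteLine(id);
```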

@Szer

Szer commented May 27, 2023

The business case for UUIDv7 - cursor-based pagination which requires sorting.

Why UUID in the first place? I need a random, unique, opaque identifier.
Cursor-based pagination requires sortability as well, which v4 doesn't have as a property. Workaround - introduce additional cursor tokens/columns/ids for such pagination, but why?

Snowflake is dead, UUID v7 is the industry standard now.
So I would really like to hear how this proposal will help us build industry-grade applications with the new UUID versions in mind.

@tannergooding
Member Author

It affects the implementation of sorting/serialization (see https://github.com/dotnet/runtime/pull/81650/files ) and corresponding performance cost.

Endianness has a relatively minor impact on the implementation in that it determines which fields are compared first vs last and in a couple edge cases involving construction, it determines if any byte swapping is required.

That is, the difference between a big endian layout and little endian layout is in the worst case scenario no more than 1 additional instruction on any hardware which RyuJIT supports that has shipped in the last 17 years. We have slightly less efficient implementations in a few cases, but there's nothing stopping us from optimizing those if it is considered a core need.

For sorting, Guid today already implements itself as such that 00000001-0000-0000-0000-000000000000 is less than 00000002-0000-0000-0000-000000000000 and so on. That is, from the printed string you could functionally remove the dashes and treat it as a single UInt128 integer literal. Uuid would function identically as the desire is to not compare based on the field layout, but rather on the type as a whole. Regardless of the layout, the Guid is functionally a 128-bit unsigned integer and it compares itself from most significant byte to least significant byte.
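A minimal sketch of that ordering claim, assuming .NET 7 or later (for `UInt128`). `GuidOrdering` is a hypothetical helper, not a BCL type: it reads the 32 hex digits of the `"N"` format as one big-endian 128-bit integer and checks that `Guid.CompareTo` orders values the same way.

```csharp
using System;
using System.Globalization;

static class GuidOrdering
{
    // Hypothetical helper: treat the printed digits (dashes removed) as a single UInt128 literal.
    public static UInt128 AsUInt128(Guid g) =>
        UInt128.Parse(g.ToString("N"), NumberStyles.HexNumber, CultureInfo.InvariantCulture);

    // True when Guid.CompareTo and the UInt128 comparison agree on the sign of the result.
    public static bool OrderingAgrees(Guid x, Guid y) =>
        Math.Sign(x.CompareTo(y)) == Math.Sign(AsUInt128(x).CompareTo(AsUInt128(y)));
}
```

For example, `ffffffff-0000-0000-0000-000000000000` compares greater than `00000000-ffff-0000-0000-000000000000` under both orderings, because the leading group is compared as an unsigned value regardless of how the fields are laid out in memory.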

e.g., let it be a network-ordered set of 16 octets

This has no impact on the actual vectorizability of the code and in fact is the least efficient way to set up the bytes. The "ideal" layout performance-wise is 2x uint64 fields that are stored in the same endianness format as the native machine (typically little endian). This is true across all possible operations that the type could support except for serialization to or from a network-order byte sequence, where it requires 1 additional instruction. Inversely, storing as a forced big-endian layout requires 1 additional instruction as part of every other operation being done, including comparisons.

with vectorizable initialization from string and byte array,

We already can do this on Guid and are doing it in some of the locations today. We could optimize even more of the scenarios, but it hasn't bubbled up as a core need thus far. User input can help direct that to happen.

have it to be transparently convertible to UInt128

This is a non-starter, we do not transparently convert between non-equivalent types like that as it breaks type safety, introduces a range of ambiguities and risks of breaking change, etc. It goes against many of the core Framework Design Guidelines.

but let's not pretend it doesn't have a myriad of quirks and inconsistencies.

There are no real quirks or inconsistencies in Guid. It represents a Uuid and can represent any 128-bit sequence of data. It supports all the core operations one would expect a Uuid to support, it behaves correctly for equality, comparison, string formatting, hashing, and every other API we expose.

The one piece of functionality it is missing is the ability to easily serialize it as a big-endian sequence of bytes (that is, serialize it in a format equivalent to a variant 1 UUID). This proposal adds that one piece of functionality.

As has been stated many times, some proposed System.Uuid would be literally identical to System.Guid except the constructor would use ReadInt32BigEndian rather than ReadInt32LittleEndian and the ToByteArray method would use WriteInt32BigEndian rather than WriteInt32LittleEndian.
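To make that concrete, here is a sketch of the kind of hand-rolled helpers existing code tends to carry today. `GuidNetworkOrder` and its methods are hypothetical names, not BCL APIs. Only the first three fields (4, 2, and 2 bytes) are swapped; the last eight bytes are identical in both layouts. On .NET 8 and later, the approved overloads `new Guid(ReadOnlySpan<byte>, bool bigEndian)` and `ToByteArray(bool bigEndian)` should cover the same ground.

```csharp
using System;
using System.Buffers.Binary;

static class GuidNetworkOrder
{
    // Hypothetical: read 16 bytes in network (big-endian) order.
    public static Guid ReadBigEndian(ReadOnlySpan<byte> src)
    {
        if (src.Length < 16) throw new ArgumentException("Expected 16 bytes.", nameof(src));
        return new Guid(
            BinaryPrimitives.ReadInt32BigEndian(src),           // field a
            BinaryPrimitives.ReadInt16BigEndian(src.Slice(4)),  // field b
            BinaryPrimitives.ReadInt16BigEndian(src.Slice(6)),  // field c
            src[8], src[9], src[10], src[11], src[12], src[13], src[14], src[15]);
    }

    // Hypothetical: write 16 bytes in network (big-endian) order.
    public static byte[] ToBigEndianBytes(Guid value)
    {
        byte[] dst = value.ToByteArray(); // little-endian layout for a, b, c
        Array.Reverse(dst, 0, 4);         // field a
        Array.Reverse(dst, 4, 2);         // field b
        Array.Reverse(dst, 6, 2);         // field c
        return dst;                       // bytes 8..15 are already in order
    }
}
```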

There are some minor adjustments that could be made to field layout, but nothing that is actually meaningful to the implementation or to the user-observable semantics of the type in safe code and nothing that would meaningfully impact the performance or usability of the type.

It takes just a single element in a large chain of interacting code fragments with multiple serialization boundaries to mishandle

This is really a non-argument. The same "concern" exists for serializing any primitive type to the network (char, decimal, double, short, int, long, nint, float, ushort, uint, ulong, nuint, etc). The same "concern" exists every time someone uses BinaryReader to interact with an ELF/PE file, or when reading a UTF16-BE file on disk. The same "concern" would exist for interacting between code that takes Uuid and code that takes Guid, particularly for the 20 years of code that is already taking Guid and then manually doing the endianness fixups so that it serializes in network order (big endian).

It takes just a single element in a large chain of interacting code fragments with multiple serialization boundaries to mishandle

There is no data to lose here. The serialization process already includes all bytes, the only thing that a user can mess up is that they use isBigEndian: true on one side and isBigEndian: false on the other. The same issue would exist for someone using Uuid on one side and Guid on the other (which will happen). The same issue would exist when working with another language and picking the wrong one of their types/methods.

This is something that is trivially caught and handled by a single test for the APIs that perform serialization. The Guid 00010203-0405-0607-0809-0A0B0C0D0E0F is sufficient for validating that endianness remains correct/expected as you simply need to validate that input.ToString() == output.ToString(). If it does, then the bytes were successfully serialized using the same endianness on both sides regardless of what the underlying field layout happens to be.
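A sketch of that single test, assuming .NET 5 or later (for `Convert.ToHexString`). `GuidEndiannessTest` is a hypothetical helper that takes any serializer/deserializer pair and checks that the probe survives the round trip with an unchanged string.

```csharp
using System;

static class GuidEndiannessTest
{
    // Every byte is distinct, so any byte swap changes ToString().
    public static readonly Guid Probe = Guid.Parse("00010203-0405-0607-0809-0a0b0c0d0e0f");

    public static bool RoundTrips(Func<Guid, byte[]> serialize, Func<byte[], Guid> deserialize) =>
        deserialize(serialize(Probe)).ToString() == Probe.ToString();
}
```

Pairing `g => g.ToByteArray()` with `b => new Guid(b)` returns true. Pairing `g => g.ToByteArray()` with `b => Guid.Parse(Convert.ToHexString(b))`, which treats the bytes as if they were in printed (big-endian) order, returns false: the result prints as `03020100-0504-0706-0809-0a0b0c0d0e0f`.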

But introduction of these new APIs won't solve the problem. (The aforementioned problem being inconsistency between variant-1 and variant-2 compatible treatments of binary and string serialization).

There is no inconsistency with regards to string serialization. String serialization is deterministic regardless of field layout, regardless of endianness, etc.

There is currently today user error in the use of the binary serialization APIs because some users don't pick up that ToByteArray() and new Guid(byte[]) are exclusively expecting the bytes to be in a little-endian format.

This is functionally no different from a developer using BinaryReader.ReadInt32 on the byte sequence of a network packet and now complaining that value.ToString() is "inconsistent". It isn't, the developer used the wrong API and didn't account for the fact that BinaryReader reads using LittleEndian but network packets are transmitted as BigEndian.

The fix is for the user to use the correct BinaryPrimitives API and to explicitly read the Int32 as big-endian, so that the value read is correct on the little-endian machine they are running on.
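A minimal sketch of the `Int32` analogy: the same four bytes of network data produce different values depending on which endianness the reader assumes. `NetworkInt32` is a hypothetical wrapper; `BinaryReader` always reads little-endian, while `BinaryPrimitives.ReadInt32BigEndian` states the byte order explicitly.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;

static class NetworkInt32
{
    // The value 256 as it would appear on the wire (big-endian).
    public static readonly byte[] Packet = { 0x00, 0x00, 0x01, 0x00 };

    // Wrong API for network data: BinaryReader assumes little-endian, yielding 65536.
    public static int ReadWithBinaryReader(byte[] data)
    {
        using var reader = new BinaryReader(new MemoryStream(data));
        return reader.ReadInt32();
    }

    // Correct API: explicitly big-endian, yielding 256.
    public static int ReadNetworkOrder(byte[] data) => BinaryPrimitives.ReadInt32BigEndian(data);
}
```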

But 'ordinary' enterprise programmer rarely interacts with serialization boundary directly - it's handled by 1st and 3rd party libraries for serialization/model binding/ object-relational mapping

Exposing Uuid is directly surfacing that serialization difference in the type system where instead it should in fact be hidden and only handled by the serialization boundaries by correctly passing isBigEndian based on the requirements of the specification they are interoperating with.

If the system requires a variant 1 UUID, they use isBigEndian: true. If the system requires a variant 2 UUID, they use isBigEndian: false.

Ultimately, which is used doesn't really matter provided that both the producer and consumer agree on which is being used. In the majority case this isn't something to surface to the user. In the case you have a general-purpose serialization library, then it is something to surface, much as it would be for Int32.

treat System.Guid in backwards compatible way, treat System.Uuid in variant-1 native way.

System.Guid is already being used, successfully, for both variant 1 and variant 2 scenarios. Users who are utilizing it for variant 1 currently have to roll their own equivalents to the methods proposed above.

Exposing a new System.Uuid type doesn't fix the general problem. It compounds the existing problem and there will be developers who take System.Uuid and utilize it for variant 2 as well. It will likely make the scenario even worse in practice due to UUID being the more cross platform/modern terminology. Developers will have to rationalize the subtle differences between these two types that use historically interchanged terms. They will have to rationalize that there is 20 years worth of code already utilizing System.Guid for both variant 1 and variant 2 scenarios. They will have to rationalize the interop and exchange between these two types. They will have to rationalize that the interchange of these types directly exposes a serialization concern to the type system. They will have to rationalize that converting between the types may introduce additional bugs.

A programmer can understand the difference between Uuid V1 vs V2, but it doesn't stop the stream of questions/bugs, regularly risen by QA, PM/PO, Analysts and Power Users. "Why the order is different in-app and in-db?"

The formatted string should never differ between the two systems. If it does, you have a bug. The formatted string would never differ between equivalent Guid and Uuid.

The byte sequence of a Uuid written using little-endian representation would be identical to the byte sequence of a Guid written using little-endian representation. The same is true for Uuid vs Guid written using big-endian representation.

The bugs occur because developers do not correctly account for the fact that they are reading a big-endian (network-order, variant 1, etc) ordered byte sequence. The exact same, but inverse, scenario would exist if we always serialized as big-endian. That is, there would be developers that pass in something that is in little-endian (variant 2) order and then be confused that the byte order as visualized by ToString is different from what they expected.

Part of this comes from us not having an API that allows them to trivially work with big-endian data. That is what this proposal covers exposing.

The other part comes from the existing APIs not clearly surfacing that there is potentially an endianness concern users should be aware of. That endianness concern will not be addressed by System.Uuid, it will likely only be compounded given the reasons above, particularly that UUID variant 2 is itself an endianness swapped byte sequence to UUID variant 1. It may be partially addressed by having the new overload but would likely only be truly addressed by obsoleting the current APIs and requiring that users always pass in the desired endianness instead.

A subtle but potential alternative would be to call these Guid.CreateFromVariant1(byte[]), Guid.CreateFromVariant2(byte[]), Guid.ToVariant1ByteArray(), and Guid.ToVariant2ByteArray(). That is, however, inconsistent with how we expose other APIs where the real concern is endianness and does raise potential user considerations around strictness and whether or not such APIs might validate it is a conforming variant 1/2 byte sequence.

@tannergooding
Member Author

As I see new API methods will just confuse GUID's API

Instead, the UUID type may be added to BCL along with MSDN documentation that explicitly states the problem UUID type solves.

This is not how we typically view such things in API review.

We do generally consider the concerns around additional overloads causing potential confusion. However, new overloads to existing APIs are most frequently considered significantly less confusing than a new but similar type.

We do also factor in the chance for user confusion. In the case of DateTime vs. DateOnly and TimeOnly, the names make it fairly clear that one is a combination and the others are "only" the respective part.

In the case of Guid vs Uuid, they are frequently and historically interchanged terms. We then have to account for the 20 years of history in which Guid has been used for both variant 1 and variant 2 purposes. We then also have to account for the fact that Uuid is itself not a term (generally or even in a domain-specific context) that exclusively means variant 1. The official spec explicitly covers variant 0, variant 1, variant 2, and variant 3 UUIDs; the latest version of that spec largely covers variant 2, but it is not exclusive to it, and thus Uuid itself is equally as ambiguous as Guid and will take all the current concerns and then compound on them.

On the other hand, exposing ToByteArray(bool isBigEndian) and .ctor(byte[], bool isBigEndian) methods as proposed is not a new thing. We have successfully done this on a great number of types. We have successfully surfaced these endianness concerns for the various primitive types, we have surfaced them as part of the new generic math feature, we have surfaced them on types such as BigInteger and more.

So while some users are surfacing the potential concern about ambiguity, it does not at all match the concrete experience we have from other types that have already done this exact thing to great success. And it's worth noting that, according to API usage metrics (https://apisof.net/), such APIs are actually used by developers in a non-trivial number of projects, so it's not as if we exposed them and simply no one uses them.

is much simpler just to fetch out of box methods ToString(), ToByteArray() etc. to produce similar sequence of bits in both, byte array and string representation of 16-bit number.

That is not how any other type provided by the BCL works and is not how types work in most other languages/ecosystems either.

Numeric strings are functionally always displayed in big endian format. On the other hand, most machines natively operate in little-endian format (there are relatively few exceptions such as IBM System z9) and thus the raw byte sequence in memory is swapped compared to the byte sequence displayed by string formatting for almost every type.
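A short illustration of that mismatch between printed order and memory order. `ByteOrderDemo` is a hypothetical wrapper; the `BitConverter` result depends on the machine, while `Guid.ToByteArray()` is documented to reverse the first three groups on every machine.

```csharp
using System;

static class ByteOrderDemo
{
    const int Value = 0x01020304;

    // Formatting prints the most significant digit first: "01020304".
    public static string Formatted() => Value.ToString("X8");

    // Raw memory order: 04 03 02 01 when BitConverter.IsLittleEndian is true.
    public static byte[] InMemory() => BitConverter.GetBytes(Value);

    // Guid.ToByteArray() emits the first group as 04 03 02 01 and the next two
    // groups as 06 05 and 08 07; bytes 8..15 follow the printed order.
    public static byte[] GuidBytes() =>
        Guid.Parse("01020304-0506-0708-090a-0b0c0d0e0f10").ToByteArray();
}
```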

@tannergooding
Member Author

How can UUID v7 be implemented correctly with the existing System.Guid and the newly proposed API?

You pass isBigEndian: true. Network Order == Big Endian
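A minimal sketch of holding a UUIDv7 in a `System.Guid`, assuming the draft layout (48-bit `unix_ts_ms`, 4-bit version, 12-bit `rand_a`, 2-bit variant `0b10`, 62-bit `rand_b`). `UuidV7Sketch.Create` is a hypothetical helper, not a BCL API. Because the `Guid(int, short, short, byte, ...)` constructor takes fields in printed order, the timestamp lands at the front of `ToString()`, and `CompareTo` orders by it. At a serialization boundary the value would then be written big-endian (for example `ToByteArray(bigEndian: true)` on .NET 8+, or a hand-rolled equivalent).

```csharp
using System;
using System.Security.Cryptography;

static class UuidV7Sketch
{
    // Hypothetical helper: builds a version 7, variant 0b10 value from a Unix timestamp in ms.
    public static Guid Create(long unixTimeMilliseconds)
    {
        if ((ulong)unixTimeMilliseconds >> 48 != 0)
            throw new ArgumentOutOfRangeException(nameof(unixTimeMilliseconds)); // must fit in 48 bits

        Span<byte> rnd = stackalloc byte[10];
        RandomNumberGenerator.Fill(rnd);

        int a = unchecked((int)(unixTimeMilliseconds >> 16));       // top 32 bits of unix_ts_ms
        short b = unchecked((short)(unixTimeMilliseconds & 0xFFFF)); // low 16 bits of unix_ts_ms
        short c = (short)(0x7000 | (((rnd[0] << 8) | rnd[1]) & 0x0FFF)); // ver = 7, 12 bits rand_a
        byte d = (byte)(0x80 | (rnd[2] & 0x3F));                     // var = 0b10, 6 bits rand_b
        // Remaining 56 bits of rand_b.
        return new Guid(a, b, c, d, rnd[3], rnd[4], rnd[5], rnd[6], rnd[7], rnd[8], rnd[9]);
    }
}
```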

As I've repeatedly detailed above, ToString() always displays data in big endian format with the most significant byte printed first. This is irrespective of the underlying field layout.

Likewise, System.Guid is always compared starting from the most significant byte, and thus 00000001-0000-0000-0000-000000000000 is less than 00000002-0000-0000-0000-000000000000, which is less than 10000000-0000-0000-0000-000000000000.

The behavior of Guid vs the alternatively proposed Uuid would be identical for ToString, for Equals, for CompareTo, and every other API except for ToByteArray and new Uuid(byte[]) where the subtle difference is that Guid defaults to isBigEndian: false and Uuid defaults to isBigEndian: true.

We are not going to expose an entirely new type just to minorly differentiate a serialization/deserialization only concern. We would not ship a UInt32BigEndian just to guarantee a big endian field layout just because UInt32 is itself typically little-endian on most hardware.

So I would really like to hear how this proposal will help us build industry-grade applications with the new UUID versions in mind.

All the different variations of UUID are 16-byte/128-bit integers. They minorly differ in terms of how they should be serialized (that is what order the bytes should be emitted) and in some cases what values are expected for particular nibbles in the byte sequence to identify version and variant.

Today, Guid is already capable of supporting any different 16-byte sequence. The one real issue is that we don't have any APIs that make it trivial to serialize/deserialize the one alternative byte sequence that currently exists. That is, today we only make it easy to serialize/deserialize little-endian ordered data.

The new APIs make it easy to also serialize/deserialize big-endian ordered data (network order) and thus make it much simpler for developers to correctly handle UUIDv7/UUID variant 2/etc values at the serialization/deserialization boundary. They likewise make it trivial to continue working with the 20 years of existing types/APIs, many of which already use System.Guid for the same purpose.

The new APIs completely remove the ambiguity that a second type would introduce and the general interchange problems that would arise. They completely remove the consideration of whether existing APIs taking System.Guid for UUIDv7/variant 2/etc values would need to deprecate or obsolete their APIs.

Code that is already working and already doing the right thing continues to work and do the right thing. They can potentially simplify their own existing wrappers that are fixing up the endianness to simply use these new APIs. New or existing code that finds the byte order is not what they expected can now trivially use the new APIs to do the right thing.

@aneteanetes

First of all: I urge you not to compare a data type that is unambiguous in many languages with an exaggerated representation of a particular case. By making such comparisons, you divert our conversation from the point.

As for unambiguous behavior: "except" is the opposite of "identical behavior." If you expose a new type, you can accurately describe its behavior. If you add flags, there will be additional cognitive load and ambiguity. And it's not just serialization. It's about compatibility with data stores (not just RDBMS), interoperability with other languages, and making the language easier to pick up for people with experience using a uuid type.

I just can't explain to a colleague (python) what a GUID is and how it differs from a uuid.

It seems to me that the current proposal can only distract us from creating applications with new versions of uuid, precisely because we will encounter type assignment ambiguities. Conversely, if we have two types guid and uuid, we can accurately separate them, understand the purpose of each separately and develop them in parallel, taking into account the needs of these particular types. Such an API does not make it easier for developers to process values, but only expands the field for decision-making, forcing each time to think about the ambiguity of design decisions 20 years ago.

I'm an average developer, and I have no idea why guid supports any other byte sequence. If we talk about serialization, then there are a lot of ambiguities for me too; the complete absence of an API creates a situation where I have to think about things that I cannot know in advance. And by the way, as an average developer, I believe that the new API adds a few methods that could be shipped in a regular NuGet package; I don't see the point of adding them to the BCL like this.

If we talk about a new type as an alternative to this API (which is not correct), then creating a new type does not replace the old one in any way, but only adds new features. No one insists on marking GUID as obsolete; that would be wrong, and besides, there are areas where GUID cannot be replaced. But the type is already overloaded with internal details, and the need to add new flags and methods only speaks to the need for a "new", unambiguous data type, which all other languages have.

I would like to emphasize that the current interfaces work with the GUID type and do not require a transition to a new type. Scenarios have been identified where the "correct" byte sequence needs to be used (and it's good that they are varied rather than specific; that way we can see a whole group of needs, not just one): serialization, integration and disambiguation with other languages, and the use of different types for different databases (guid for enterprise databases, uuid for small projects).

In general, in any design, new flags and methods only add to the ambiguity. A type whose constructor must be passed multiple flags does not become unambiguous with the addition of another flag. Besides separating numeric types by bit width to indicate their size, we also have numeric types that are stored in completely different ways!

P.S.
I cannot but agree with the point about working code. The code really works, and really does the right thing. The proposed API (which would fit perfectly into a NuGet package) will help solve the serialization problem, which has already been solved one way or another; but the real problem is not serialization, it is the type. A more elegant and simple solution seems to be replacing the type with a new one, rather than adding flags, methods, and NuGet packages.

@tannergooding
Member Author

The difference is not only in the methods that take bytes as input, but also in the methods that output bytes.

Yes, sorry. This is a place I forgot to reiterate both sides of serialization/deserialization.

And this is what ToByteArray, TryWriteBytes, and the constructor that takes bytes force you to do.

Yes. APIs that deal with raw byte sequences require you to think about endianness. Not thinking about endianness when dealing with raw byte sequences will only lead to bugs.

When the output artifact of a library's functionality is 16 bytes, it is important to know exactly how they were created, whether simply by calling ToByteArray or using a construction like this.

Yes, you must understand the endianness when dealing with raw byte sequences. The same would be true for Uuid.

In that case, developers should be aware that they cannot simply call the constructor that takes bytes or use ToByteArray/TryWriteBytes methods. They must prepare the byte sequence beforehand because the code in these libraries uses the string representation as the source value.

Yes, they must be aware that ToByteArray and Guid(byte[]) currently require the bytes to be little-endian. The overloads give them the option of saying the bytes are instead in big-endian format, removing the need for them to write or maintain their own custom logic.

And this is something that the System.Guid API does not do.

It is explicitly something that System.Guid does. The APIs that deal with raw byte sequences are explicitly documented to be little-endian today.

From ToByteArray: https://learn.microsoft.com/en-us/dotnet/api/system.guid.tobytearray?view=net-7.0

You can use the byte array returned by this method to round-trip a Guid value by calling the Guid(Byte[]) constructor.

Note that the order of bytes in the returned byte array is different from the string representation of a Guid value. The order of the beginning four-byte group and the next two two-byte groups is reversed, whereas the order of the last two-byte group and the closing six-byte group is the same. The example provides an illustration.

The wording could be improved in some cases for the constructors.

The new APIs then allow developers to specify which format their raw byte sequence is in.

And this falls into the realm of explicit differences between two types, rather than remaining in the implicit realm of which serialization API was called.

There is no more difference between Guid and Uuid than there would be between a UInt128BigEndian and UInt128LittleEndian. There is only one way to represent a given value for the type.

Fixing technicalities using processes is not an efficient approach.

There is a fundamental requirement for developers doing binary serialization to use the same interpretation on both sides of the serialization/deserialization boundary. Having different types does not solve this problem.

@jeffhandley jeffhandley added api-needs-work API needs work before it is approved, it is NOT ready for implementation api-suggestion Early API idea and discussion, it is NOT ready for implementation and removed api-ready-for-review API is ready for review, it is NOT ready for implementation api-needs-work API needs work before it is approved, it is NOT ready for implementation labels Jun 1, 2023
@jeffhandley
Member

With the ongoing conversation here, I've removed the api-ready-for-review label and put this back into api-suggestion (after accidentally clicking api-needs-work at first).

@PJB3005
Contributor

PJB3005 commented Jun 1, 2023

In that case, developers should be aware that they cannot simply call the constructor that takes bytes or use ToByteArray/TryWriteBytes methods. They must prepare the byte sequence beforehand because the code in these libraries uses the string representation as the source value.

If I am understanding correctly, this whole use case seems to rely on the developer getting a byte[16] representing a GUID/UUID from somewhere (such as a binary protocol or file format[1]), and then wanting to consume that as a Guid in .NET. I would expect developers working with such things to be aware of what endianness is. If they are not, I do not think any of the proposals given so far would ever save them.

The use case of "writing it to a database" is certainly one that has been brought up a lot in this discussion; however, there is nothing special about it. I have already explained in my previous comment how there is (in most cases) only one valid thing for your database layer to do once it encounters a GUID. Judging by the code links posted, all of them do exactly that. You would have the exact same issues with a use case as simple as converting a binary file format to a textual format. It just happens that databases sometimes have switches that let two wrongs make a right here. The bug happened the moment the developer passed the wrong endianness to new Guid(), and no combination of connection string madness is a valid way to fix it.

(This entire post can be inverted to go from reading a binary GUID to writing one)

Footnotes

  1. If you're getting a GUID from anywhere else, like NewGuid() or Parse(), you wouldn't be running into this.

@stephentoub
Member

@tannergooding has done a good job of representing the viewpoints of those of us who maintain the .NET APIs. I understand not everyone is happy with those viewpoints, but at this point no additional arguments are being presented and the discussion has run its course. I'll mark this as api-ready-for-review again so that we can discuss the specifics of this proposal again at the next API review meeting opportunity and decide whether to add the proposed APIs or a subset of them or none at all. Thanks to all who shared their perspectives on the matter.

@stephentoub stephentoub added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Jun 2, 2023
@aloraman

aloraman commented Jun 3, 2023

no additional arguments are being presented

Well, this discussion is now mostly an exchange of essays; with time zone differences, jobs, and sleep, it takes a lot of time to produce another argument, so it has mostly ground to a halt. Though

discussion has run its course

is probably correct. For every post here, there's one weighed and thought-out reply by @tannergooding, but there are also ten giggles and insults from the crowd on Discord. This is no longer maintainable. So, I'll address the latest arguments and will eagerly await this issue being closed.


Bool parameters exist and are used in a plethora of places

At the cost of readability. Consider the aforementioned BigInteger:

var bytes = bigInt.ToByteArray(true, false); // is it signed little-endian? unsigned big-endian?

It's no surprise that the top search results for "bool method parameter" are generally negative, e.g., "What is wrong with boolean parameters?", "Is it wrong to use a boolean parameter to determine behavior", "Clean code: The curse of a boolean parameter", "Do Boolean Parameters Make You a Bad Programmer?"
Off the top of my head, there are three general scenarios for using boolean parameters where readability doesn't suffer:

  • They are used within the same file as the method definition, i.e. Dispose(true) is fine as long as Dispose(bool disposing) or one of its overrides is present
  • The meaning is obvious from the method name, e.g. SetIsActive(true)
  • The parameter name is present, e.g. guid.ToByteArray(isBigEndian: true)

The proposed overloads are to be used outside the Guid.cs file, so it's not the first case. The second case is largely absent in C# because property syntax is available. For the third case, C# has named-argument syntax, but you can't actually enforce its usage (though creating an analyzer for such scenarios is a good idea). Inlay hints with parameter names alleviate the issue, though they are present only in the IDE.
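For reference, the existing `BigInteger.ToByteArray(bool isUnsigned = false, bool isBigEndian = false)` overload shows the trade-off: the positional and named calls below are the same call, but only the named one documents itself at the call site. `NamedArgs` is a hypothetical wrapper for illustration.

```csharp
using System;
using System.Numerics;

static class NamedArgs
{
    static readonly BigInteger Value = new BigInteger(0x0102);

    // Positional: the reader must remember the (isUnsigned, isBigEndian) order.
    public static byte[] Positional() => Value.ToByteArray(true, true);

    // Named: identical call, intent visible. Produces 01 02.
    public static byte[] Named() => Value.ToByteArray(isUnsigned: true, isBigEndian: true);

    // Little-endian counterpart. Produces 02 01.
    public static byte[] NamedLittle() => Value.ToByteArray(isUnsigned: true, isBigEndian: false);
}
```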


RFC-4122 explicitly defines itself as a sequence of named fields
Per the RFC, the field layout is the same between variant 1 (0b10x) and variant 2 (0b110)

Even the original RFC 4122 document from 2005 claims to describe only variant 1, and it specifically requires big-endianness by stating in the same section:

The fields are encoded as 16 octets, with the sizes and order of the
fields defined above, and with each field encoded with the Most
Significant Byte first (known as network byte order)

But newer drafts define a different structure for v7 and v8, and even redefine the field structure for versions 2-5. Compare the v1 structure

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           time_low                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           time_mid            |  ver  |       time_high       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |var|         clock_seq         |             node              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              node                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

with v4

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           random_a                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          random_a             |  ver  |       random_b        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |var|                       random_c                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           random_c                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

or v7

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           unix_ts_ms                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          unix_ts_ms           |  ver  |       rand_a          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |var|                        rand_b                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            rand_b                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Also, System.Guid is not just a Variant 2 implementation. Guid.NewGuid() produces values that claim to be Variant 1, which can be easily checked with the following code snippet:

Console.WriteLine(Convert.ToString((Guid.NewGuid().ToByteArray()[8] & 0xF0) >> 4,2))

It produces the 10xx pattern of Variant 1, not the 110x pattern of Variant 2.
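The same check extends to the version nibble, which is a small example of where the little-endian byte layout matters: octet 6 in network order (the version) sits at index 7 of `ToByteArray()`/`TryWriteBytes()` output because field c is byte-swapped, while octet 8 (the variant) is not swapped. `UuidFields` is a hypothetical helper, sketched under those assumptions.

```csharp
using System;

static class UuidFields
{
    // Version: high nibble of network-order octet 6, found at index 7 of the little-endian layout.
    public static int Version(Guid g)
    {
        Span<byte> le = stackalloc byte[16];
        g.TryWriteBytes(le);
        return le[7] >> 4;
    }

    // Top two bits of octet 8 (not swapped): 0b10 for the 10x variant.
    public static int VariantTopTwoBits(Guid g) => g.ToByteArray()[8] >> 6;
}
```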

This issue notwithstanding, I hope Runtime and BCL will be prepared for eventual popularization of UUID v6/v7/v8


As for the "Why don't you write your own library" argument: it was addressed already, actually. And these libraries already exist. The problem is with integrating these libraries with other libraries. In the .NET community there's a general preconception against taking a hard dependency on another third-party library (fueled not only by the recent drama with IdentityServer or ImageSharp, but by the whole 20 years of history with open source in .NET circles). A simple, blittable (big-endian; machines are generally little-endian, yes, but the network generally is not, and at least middle-endian is dead and forgotten) primitive type in the BCL would help a lot in easing cross-integration and operation between these libraries. System.Uuid is the easiest one to implement.
If it is not to be, there are alternatives, but they are much farther away, beyond the horizon. That is, support for configurable-size value-type fixed buffers:

public struct Binary<int n>
{
     byte[n] _buffer;
}

with the support from EntityFrameworkCore in mapping these types (which would promote support for these types in other libraries) would, I hope, alleviate all the issues current UUID libraries have.

@vanbukin
Contributor

vanbukin commented Jun 6, 2023

Before this is finalized, I would like to highlight a few points.
Developers have been running into the non-obvious behavior of this API for a very long time. Here are a few examples:

Creating your own primitive to solve the issues described here is not difficult. Moreover, I have been maintaining one such package myself since 2019.
As correctly noted above, the problem is integrating it with everything else.
As a developer, I am disappointed that I have to solve such issues and the related problems myself. Primitives are work the BCL should take on. In other popular technology stacks this issue does not exist at all, simply because their roots do not go deep into Windows and its APIs.

@Ilchert

Ilchert commented Jun 20, 2023

I did a quick search on GitHub for Guid.ToByteArray usage and found three common usages:

  1. Just as a random byte generator/identifier. In this case it does not matter what ToByteArray returns.
  2. .NET <-> .NET serialization (several serialize/deserialize pairs). In this case consistency between ToByteArray and new Guid(byte[]) is enough.
  3. Cross-platform serialization (several serialize/deserialize pairs). In this case the authors have already built workarounds around Guid.

Also, some libraries (e.g. Npgsql) use their own serialization mechanism for Guid.

I would suggest not changing the Guid API, so that cases 1 and 2 are unaffected, and adding only the proposed BinaryPrimitives APIs. These would provide a standardized API for Guid <-> big/little-endian conversion. That API could be consumed in case 3 or could replace custom serialization mechanisms. Also, BinaryPrimitives is distributed via the System.Memory NuGet package, so users could consume the new APIs without additional preprocessor conditions.
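For case 3, the conversion described above can already be written with the existing BinaryPrimitives methods. This is a minimal sketch that assumes .NET Core 2.1+ for Guid.TryWriteBytes and .NET 5+ for Convert.ToHexString in the demo. WriteGuidBigEndian and ReadGuidBigEndian are hypothetical helper names, not BCL APIs.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper: writes a Guid in big-endian (RFC 4122 / network) order.
static void WriteGuidBigEndian(Guid value, Span<byte> destination)
{
    if (destination.Length < 16)
        throw new ArgumentException("Destination must be at least 16 bytes.", nameof(destination));

    Span<byte> le = stackalloc byte[16];
    value.TryWriteBytes(le); // Data1..Data3 little-endian, bytes 8..15 as-is

    // Swap only Data1 (4 bytes), Data2 (2) and Data3 (2); copy the rest unchanged.
    BinaryPrimitives.WriteInt32BigEndian(destination, BinaryPrimitives.ReadInt32LittleEndian(le));
    BinaryPrimitives.WriteInt16BigEndian(destination.Slice(4), BinaryPrimitives.ReadInt16LittleEndian(le.Slice(4)));
    BinaryPrimitives.WriteInt16BigEndian(destination.Slice(6), BinaryPrimitives.ReadInt16LittleEndian(le.Slice(6)));
    le.Slice(8).CopyTo(destination.Slice(8));
}

// Hypothetical helper: reads a Guid from big-endian bytes.
static Guid ReadGuidBigEndian(ReadOnlySpan<byte> source)
{
    if (source.Length < 16)
        throw new ArgumentException("Source must be at least 16 bytes.", nameof(source));

    return new Guid(
        BinaryPrimitives.ReadInt32BigEndian(source),
        BinaryPrimitives.ReadInt16BigEndian(source.Slice(4)),
        BinaryPrimitives.ReadInt16BigEndian(source.Slice(6)),
        source[8], source[9], source[10], source[11],
        source[12], source[13], source[14], source[15]);
}

var guid = Guid.Parse("00112233-4455-6677-8899-aabbccddeeff");
Span<byte> buffer = stackalloc byte[16];
WriteGuidBigEndian(guid, buffer);
Console.WriteLine(Convert.ToHexString(buffer));        // 00112233445566778899AABBCCDDEEFF
Console.WriteLine(ReadGuidBigEndian(buffer) == guid);  // True
```

Only Data1, Data2 and Data3 need their byte order swapped. The trailing eight bytes are identical in both layouts.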

@bartonjs
Member

bartonjs commented Jun 22, 2023

Video

  • The BinaryPrimitives alternatives don't seem to be necessary given the new members on Guid, so they were cut.
  • We removed the "is" prefix from the isBigEndian parameters, so they are now named bigEndian.
  • We discussed adding Guid..ctor(byte[], bool) and think it's OK to leave it off.

namespace System
{
    public partial struct Guid
    {
        // public Guid(byte[] value);
        // public Guid(ReadOnlySpan<byte> value);
        public Guid(ReadOnlySpan<byte> value, bool bigEndian);

        // public byte[] ToByteArray();
        public byte[] ToByteArray(bool bigEndian);

        // public bool TryWriteBytes(Span<byte> destination);
        // public bool TryWriteBytes(Span<byte> destination, out int bytesWritten); -- new in .NET 8
        public bool TryWriteBytes(Span<byte> destination, bool bigEndian, out int bytesWritten);
    }
}
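This is a usage sketch of the approved members, assuming .NET 8 or later, where these overloads shipped. The GUID value is arbitrary.

```csharp
using System;

var guid = Guid.Parse("00112233-4455-6677-8899-aabbccddeeff");

byte[] little = guid.ToByteArray(bigEndian: false); // same layout as ToByteArray()
byte[] big = guid.ToByteArray(bigEndian: true);     // RFC 4122 / network order

Console.WriteLine(Convert.ToHexString(little)); // 33221100554477668899AABBCCDDEEFF
Console.WriteLine(Convert.ToHexString(big));    // 00112233445566778899AABBCCDDEEFF

// Round-trip through the span-based constructor with the matching flag.
Console.WriteLine(new Guid(big, bigEndian: true) == guid); // True

// Allocation-free write into caller-provided storage.
Span<byte> buffer = stackalloc byte[16];
bool ok = guid.TryWriteBytes(buffer, bigEndian: true, out int written);
Console.WriteLine($"{ok} {written}"); // True 16
```

Passing bigEndian: false reproduces the existing ToByteArray() layout, so existing .NET-to-.NET round-trips keep working unchanged.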

@bartonjs bartonjs added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Jun 22, 2023
@AlexRadch
Contributor

Can you assign this issue to me?

@AlexRadch
Contributor

The member "public bool TryWriteBytes(Span<byte> destination, out int bytesWritten); -- new in .NET 8" does not exist yet.
Should it be added as well?

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jun 24, 2023
@huoyaoyuan
Member

Should this supersede #53354?

@AlexRadch
Contributor

Should this supersede #53354?

Yes. #53354 can be closed.

@AlexRadch
Contributor

Issue #30940 can be closed as well.
