Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Create method Guid.NewSequentialGuid() #22721

Closed
balivo opened this issue Jul 11, 2017 · 20 comments
Closed

Proposal: Create method Guid.NewSequentialGuid() #22721

balivo opened this issue Jul 11, 2017 · 20 comments
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Runtime
Milestone

Comments

@balivo
Copy link

balivo commented Jul 11, 2017

Make a new method to create a sequential Guid.

@stephentoub
Copy link
Member

What is a "sequential Guid"? Do you mean one that uses the machine's MAC address rather than being entirely random?

@Clockwork-Muse
Copy link
Contributor

Note too that "sequential ids" of any sort have all kinds of problems, mostly dealing with recovery/concurrency scenarios. What sort of problem are you trying to solve here?

@balivo
Copy link
Author

balivo commented Jul 11, 2017

@stephentoub I use Guid (Prefix) + Bytes from DateTime (Ticks) to create a new guid.

@Clockwork-Muse currently I use to minimize index fragmentation when using Guids at database table PKs.

My class:

public static class IdentityGenerator
    {
        /// <summary>
        /// Create new sequential Guid
        /// </summary>
        /// <returns>Guid</returns>
        public static Guid NewSequentialGuid()
        {
            byte[] uid = Guid.NewGuid().ToByteArray();
            byte[] binDate = BitConverter.GetBytes(DateTime.UtcNow.Ticks);

            byte[] secuentialGuid = new byte[uid.Length];

            secuentialGuid[0] = uid[0];
            secuentialGuid[1] = uid[1];
            secuentialGuid[2] = uid[2];
            secuentialGuid[3] = uid[3];
            secuentialGuid[4] = uid[4];
            secuentialGuid[5] = uid[5];
            secuentialGuid[6] = uid[6];
            // set the first part of the 8th byte to '1100' so     
            // later we'll be able to validate it was generated by us   

            secuentialGuid[7] = (byte)(0xc0 | (0xf & uid[7]));

            // the last 8 bytes are sequential,    
            // it minimizes index fragmentation   
            // to a degree as long as there are not a large    
            // number of Sequential-Guids generated per millisecond
            secuentialGuid[9] = binDate[0];
            secuentialGuid[8] = binDate[1];
            secuentialGuid[15] = binDate[2];
            secuentialGuid[14] = binDate[3];
            secuentialGuid[13] = binDate[4];
            secuentialGuid[12] = binDate[5];
            secuentialGuid[11] = binDate[6];
            secuentialGuid[10] = binDate[7];

            return new Guid(secuentialGuid);
        }
    }

So, I call IdentityGenerator.NewSequentialGuid() to create a new Guid using "sequential" logic... Works fine, but I'm thinking about "NewSequentialGuid" as static member at "root" Guid struct...

EDIT: @karelz fixed code block formatting - ```c# instead of just ` (which is for inline code)

@jnm2
Copy link
Contributor

jnm2 commented Jul 11, 2017

How do you guarantee uniqueness? You have to assume multiple threads, multiple processes, and multiple machines all running this logic at the same time.

@Clockwork-Muse
Copy link
Contributor

Side note: SQL Server at least can generate unique, sequential ids for you. They even mention using them for index issues.
This wouldn't help with your "we generated it" prefix, though. And I don't know what RDBMS vendor you're using, either.

@JonHanna
Copy link
Contributor

How do you guarantee uniqueness?

MAC addresses and having the sequence system-wide is how the Windows UuidCreateSequential function does it (and how UuidCreate used to do it before the leaking of MAC addresses when not specifically requested was deemed a security risk).

@svick
Copy link
Contributor

svick commented Jul 11, 2017

I believe "sequential GUIDs" are designed to solve a very specific problem: you want to use GUIDs as a clustered primary key in a database. As such, I think this method might belong in code dealing with databases, but not in the general-purpose Guid type. Especially if the implementation does not use any of the standard GUID versions and does not guarantee that the generated GUIDs are globally unique, as I believe the one you showed does.

@balivo
Copy link
Author

balivo commented Jul 12, 2017

I agree with @svick it's too especial use and not general-purpose... Tks!

@JonHanna
Copy link
Contributor

Another reason is performance, as (in Windows anyway) the sequential algorithm is faster, especially for large batches.

@balivo
Copy link
Author

balivo commented Jul 12, 2017

How I could validate the generation of unique sequential Guids? I made some tests and everything works good... @jnm2

@jnm2
Copy link
Contributor

jnm2 commented Jul 13, 2017

@balivo Even across processes and machines?

@Clockwork-Muse
Copy link
Contributor

@balivo -
Validate them how? For uniqueness? For sequential-ness? Both would require you to keep a (presumably large) database of the ones you've seen, timestamped with the finest granularity you can manage (which may not be enough if your code is fast enough).

Note that attempting to validate the individual values is not only problematic, but doomed to failure. It's like trying to validate that alloc will give you a specific memory address.

@balivo
Copy link
Author

balivo commented Jul 13, 2017

@Clockwork-Muse uniqueness validation... The sequential-ness its ok for me...
I use this logic for last 9 years (approximately) and no conflicts between created Guids (I make some validations to verify this question, just for "duble"-safety)... I have a logic for validation of questions above (multi-thread and multi-process testing), I kept the applications running 30 days and no conflicts...

And I just use NewSequentialGuid at specific scope. In mobile apps, I create a new Order (SFA, example) and make an API request to save at database.

So @jnm2 ask about uniqueness, I don't have a sufficient powerful environment to make 100% test before Universe rest in peace (LoL)... I can upload my validation logic and we can make the test happens! ;)

@svick
Copy link
Contributor

svick commented Jul 13, 2017

@balivo I think for global uniqueness, you need theory, not tests. According to Wikipedia:

the number of random version 4 UUIDs which need to be generated in order to have a 50% probability of at least one collision is 2.71 quintillion

Version 4 GUIDs have 122 random bits. If I use the same formula, but with 60 random bits, which is what your IdentityGenerator has, then it says that to get 50 % probability of collision, you need 1.26 billion GUIDs. But to collide, all those GUIDs would have to be generated with the same value for DateTime.UtcNow.Ticks. Pessimistically, the period during which that's true can be 15 ms (I believe that's the value for older OSes).

That sounds good, until you realize this would mean 50 % probability of collision during every 15 ms interval. To get 50 % probability of collision during a larger interval, say, 10 years, the probability of collision during each interval would have to be much smaller. If I get my math right, it means the probability of collision for each 15 ms interval can be only 2.20 × 10-12. For that probability, you can have only 8720 GUIDs in each 15 ms interval, or 581 GUIDs per millisecond.

TLDR: If my calculations are correct, if you generate more than about 500 sequential GUIDs per millisecond, world-wide, then over a 10 year period, you will get a 50 % chance of at least one collision. That might be more than okay for your specific use-case, but I don't think it can be called "globally unique".

@balivo
Copy link
Author

balivo commented Jul 14, 2017

@svick tks...

DateTime.UtcNow have a 10ms precision...
https://msdn.microsoft.com/en-us/library/system.datetime.utcnow(v=vs.110).aspx

DateTime.Now is 15ms precision...
https://msdn.microsoft.com/en-us/library/system.datetime.now(v=vs.110).aspx

Sounds good at specific scope... But not for general-purpose.

Tks!

@Clockwork-Muse
Copy link
Contributor

@balivo - Not on Windows on corefx, it's not (I'm unsure where Linux has gone).

@JonHanna
Copy link
Contributor

@svick

Version 4 GUIDs have 122 random bits.

I would have assumed from the name that the proposal was to have Version 1 GUIDs, so if I'd wanted them in Windows-only code I'd use:

[DllImport("rpcrt4.dll", SetLastError=true)]
static extern int UuidCreateSequential(out Guid guid);

I understand that linux's uuid program lets one select version 1, and perhaps getting a batch-per-call could be used to reduce overheads, but I've never dealt with UUIDs in Linux myself except when a database was doing the generation work for me, so I'm not sure.

@mariuszkochanowski
Copy link

mariuszkochanowski commented Jul 16, 2017

Sequential guids compatible with Sql server implementation:
http://developmenttips.blogspot.com/2008/03/generate-sequential-guids-for-sql.html

public class SqlServerSequentialGuidGenerator
   {
       //        With UIDs As(
       //Select ID = 3, UID = cast('01000000-0000-0000-0000-000000000000' as uniqueidentifier)
       //Union Select ID = 2, UID = cast('00010000-0000-0000-0000-000000000000' as uniqueidentifier)
       //Union Select ID = 1, UID = cast('00000100-0000-0000-0000-000000000000' as uniqueidentifier)
       //Union Select ID = 0, UID = cast('00000001-0000-0000-0000-000000000000' as uniqueidentifier)
       //Union Select ID = 5, UID = cast('00000000-0100-0000-0000-000000000000' as uniqueidentifier)
       //Union Select ID = 4, UID = cast('00000000-0001-0000-0000-000000000000' as uniqueidentifier)
       //Union Select ID = 7, UID = cast('00000000-0000-0100-0000-000000000000' as uniqueidentifier)
       //Union Select ID = 6, UID = cast('00000000-0000-0001-0000-000000000000' as uniqueidentifier)
       //Union Select ID = 8, UID = cast('00000000-0000-0000-0100-000000000000' as uniqueidentifier)
       //Union Select ID = 9, UID = cast('00000000-0000-0000-0001-000000000000' as uniqueidentifier)
       //Union Select ID = 10, UID = cast('00000000-0000-0000-0000-010000000000' as uniqueidentifier)
       //Union Select ID = 11, UID = cast('00000000-0000-0000-0000-000100000000' as uniqueidentifier)
       //Union Select ID = 12, UID = cast('00000000-0000-0000-0000-000001000000' as uniqueidentifier)
       //Union Select ID = 13, UID = cast('00000000-0000-0000-0000-000000010000' as uniqueidentifier)
       //Union Select ID = 14, UID = cast('00000000-0000-0000-0000-000000000100' as uniqueidentifier)
       //Union Select ID = 15, UID = cast('00000000-0000-0000-0000-000000000001' as uniqueidentifier)
       //)
       //Select* From UIDs Order By UID
       private static readonly int[] SqlOrderMap = { 3, 2, 1, 0, 5, 4, 7, 6, 9, 8, 15, 14, 13, 12, 11, 10 };


       private readonly byte[] _currentGuidArray;
       private Guid _currentGuid;

       public Guid CurrentGuid
       {
           get { return _currentGuid; }
       }

       public SqlServerSequentialGuidGenerator()
           :this(Guid.NewGuid())
       {
       }

       public SqlServerSequentialGuidGenerator(Guid initialGuid)
       {
           _currentGuid = initialGuid;
           _currentGuidArray = initialGuid.ToByteArray();
       }

       public  Guid Generate()
       {
           byte[] guidArray = _currentGuidArray;
           var orderMap = SqlOrderMap;
           for (int mapIndex = 0; mapIndex < 16; mapIndex++)
           {
               int bytesIndex = orderMap[mapIndex];
               guidArray[bytesIndex]++;
               if (guidArray[bytesIndex] != 0)
               {
                   break; // No need to increment more significant bytes
               }
           }
           _currentGuid =  new Guid(guidArray);
           return _currentGuid;
       }
   }

@Clockwork-Muse
Copy link
Contributor

Um, if you're trying to use sequential GUIDs on SQL Server, I'd probably just ask the server to create them. Which would solve the uniqueness/concurrency problem (ie, running the given code from multiple machines is going to result in collisions, which might be costly).

Note that generating sequential ids is useful to avoid index fragmentation, and not to provide some ordering of data. Row ids should be considered the same as memory addresses in a regular program, and the actual value considered undefined. Never rely on the id for ordering information, especially if the client is/might be allowed to generate it. Always use an explicit, appropriate value column instead (ie, a created_at timestamp); the row id can be used for tiebreakers, if necessary.

Also, we most likely can't use that code, because the author hasn't released the copyright/licensed it to us.

@stephentoub
Copy link
Member

Thank you for the suggestion, but we don't plan to do anything to expose this. Such functionality could easily be implemented however is desired in a library outside of Guid itself.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 5.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 21, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Runtime
Projects
None yet
Development

No branches or pull requests

8 participants