Generating sortable Stripe-like IDs with Segment's KSUIDs
Learn how Clerk generates resource IDs with inspiration from Stripe and Segment.
On the shoulders of giants, as they say...
Early in Stripe's lifetime, they launched a new style of resource IDs to a great deal of fanfare. The IDs are prefixed with an abbreviation of the object they represent. For example:
ch_ represents a Charge object
cus_ represents a Customer object
Patrick Collison offered the motivation for these prefixes on Quora:
At Clerk, we're big fans of these prefixes and knew early on that we wanted to offer their convenience to our own customers.
That was the easy part of generating our IDs. We spent a lot more time deciding what to put after the prefix.
For the uninitiated, ID generation is a surprisingly wide subject area with many important considerations.
For a long time, it was commonplace for developers to rely on sequential IDs generated by the database. There are artifacts of this practice all over the web - but these days, they tend to be avoided because:
- They can easily be guessed. This isn't inherently a problem, but many, many security vulnerabilities are made worse when attackers can guess resource IDs. On a more light-hearted note, a "bug" involving guessable IDs led to the creation of Reddit self-posts.
- They can reveal a lot of information about application usage. Again, this isn't inherently a problem, but it can lead to unintended side effects. One example came recently when Stack Overflow sold to Prosus, and there was discussion on Hacker News about the low user IDs some users had, indicating they were early users.
- They're impossible to generate in distributed systems. For modern applications, this should be the primary concern. If a system has servers in two different regions, it's impossible for them to generate sequential IDs independently, without coordination - but coordination would likely undermine the point of distributing the system in the first place.
To mitigate these issues, randomness has been introduced into ID generation. Enough randomness ensures that IDs cannot be guessed and that collisions are avoided in distributed systems.
But purely random IDs - like UUIDs - also eliminate a trait of sequential IDs that developers love: they're sortable.
Finding a middle ground between sortable and unique IDs was a primary motivation behind two newer ID generators:
- Twitter's Snowflake: "To generate the roughly-sorted 64 bit ids in an uncoordinated manner, we settled on a composition of: timestamp, worker number and sequence number."
- Segment's KSUID: "It borrows core ideas from the ubiquitous UUID standard, adding time-based ordering and more friendly representation formats."
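To make the Snowflake-style composition of timestamp, worker number and sequence number concrete, here is a minimal Python sketch that packs the three fields into one integer. The bit widths (41, 10 and 12) follow the commonly documented Snowflake layout. The epoch value and the function names compose_id and decompose_id are assumptions made for illustration, not part of any real library.

```python
# Sketch of a Snowflake-style ID: timestamp | worker | sequence.
# Bit widths follow the commonly documented layout (41 + 10 + 12 = 63 bits).
TIMESTAMP_BITS = 41
WORKER_BITS = 10
SEQUENCE_BITS = 12

# Hypothetical custom epoch in milliseconds (an arbitrary assumption).
EPOCH_MS = 1_600_000_000_000


def compose_id(timestamp_ms: int, worker: int, sequence: int) -> int:
    """Pack the three fields into one roughly time-sorted integer."""
    elapsed = timestamp_ms - EPOCH_MS
    if not 0 <= elapsed < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp out of range")
    if not 0 <= worker < (1 << WORKER_BITS):
        raise ValueError("worker number out of range")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence number out of range")
    # The timestamp occupies the high bits, so later IDs compare greater.
    return (
        (elapsed << (WORKER_BITS + SEQUENCE_BITS))
        | (worker << SEQUENCE_BITS)
        | sequence
    )


def decompose_id(value: int) -> tuple[int, int, int]:
    """Recover (timestamp_ms, worker, sequence) from a packed ID."""
    sequence = value & ((1 << SEQUENCE_BITS) - 1)
    worker = (value >> SEQUENCE_BITS) & ((1 << WORKER_BITS) - 1)
    elapsed = value >> (WORKER_BITS + SEQUENCE_BITS)
    return elapsed + EPOCH_MS, worker, sequence
```

Because the worker number sits in the middle of the ID, every generating process needs a unique worker number assigned ahead of time. That is the piece of coordination this layout relies on.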
Clerk's ID generator
After considering both Snowflake and KSUID, we decided to use KSUID for our primary ID generator. The key factor in our decision was that Twitter's Snowflake included a worker number, which we did not need.
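As a rough illustration of why KSUID needs no worker number, the sketch below builds a KSUID-shaped value in Python: a 4-byte timestamp (seconds since the KSUID epoch of 1400000000) followed by 16 random bytes, base62-encoded to 27 characters. This is a simplified sketch and not Segment's implementation or Clerk's actual code. The helper names (base62_encode, new_ksuid, prefixed_id) and the "user" prefix are hypothetical.

```python
import os
import time

KSUID_EPOCH = 1_400_000_000  # KSUID timestamps count seconds from this offset
BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
ENCODED_LENGTH = 27  # 20 bytes always fit in 27 base62 characters


def base62_encode(raw: bytes) -> str:
    """Encode bytes as fixed-width base62, left-padded with '0'."""
    n = int.from_bytes(raw, "big")
    chars = []
    while n > 0:
        n, rem = divmod(n, 62)
        chars.append(BASE62[rem])
    return "".join(reversed(chars)).rjust(ENCODED_LENGTH, "0")


def new_ksuid(now=None, payload=None) -> str:
    """Build a KSUID-shaped string: 4-byte timestamp + 16 random bytes."""
    seconds = int(time.time() if now is None else now) - KSUID_EPOCH
    if payload is None:
        payload = os.urandom(16)  # randomness replaces the worker number
    raw = seconds.to_bytes(4, "big") + payload
    return base62_encode(raw)


def prefixed_id(prefix: str, now=None) -> str:
    """Attach a Stripe-like prefix; the prefix value is up to the caller."""
    return f"{prefix}_{new_ksuid(now)}"


# Hypothetical usage: "user" is an illustrative prefix only.
example = prefixed_id("user")
```

Since the timestamp occupies the most significant bytes and the base62 alphabet is in ASCII order, sorting these strings lexicographically also sorts them by creation time, to one-second resolution. IDs created within the same second fall in random order.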
Combined with Stripe-like prefixes, our IDs look like this: