The era of serial number IDs is over!

ID is important!

ID is so important!

It is worth saying twice: IDs play a crucial role in uniquely identifying records in a database.

An ID must be unique in the system, and there are many strategies to generate a unique ID, but not all of them are good for the developer and the system.

Auto-increment IDs

Auto-increment IDs (or sequential IDs) are numeric IDs that are automatically generated by the database when a new record is inserted.

Pros:

  • Easy to read and understand.
  • Can be generated directly in the database.
  • Quick and easy to implement in the database schema.
  • Ensures data integrity by providing a unique ID for each record.
  • Efficient for search queries when used in indexes.

Cons:

  • May reveal sensitive information about the database’s size and growth, because the IDs are predictable.
  • When multiple databases are merged, auto-increment IDs can create conflicts.
  • May not be suitable for applications where users need to create their own IDs.
  • Can cause issues for horizontal scaling systems if different nodes generate the same ID, leading to data inconsistencies.
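
The merge and scaling conflicts in the list above can be reproduced with two independent counters. This is only a minimal in-memory sketch: `makeCounter` and the two "databases" are hypothetical stand-ins for separate auto-increment sequences, not a real database API.

```javascript
// Each "database" keeps its own auto-increment sequence starting at 1.
function makeCounter() {
  let next = 1;
  return () => next++;
}

const nextIdA = makeCounter();
const nextIdB = makeCounter();

const dbA = [{ id: nextIdA(), name: "alice" }, { id: nextIdA(), name: "bob" }];
const dbB = [{ id: nextIdB(), name: "carol" }, { id: nextIdB(), name: "dave" }];

// Merging by ID: rows from dbB overwrite rows from dbA that share an ID.
const merged = new Map();
for (const row of [...dbA, ...dbB]) {
  merged.set(row.id, row);
}

console.log(merged.size); // 2, not 4: alice and bob were overwritten
```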

When using REST APIs, it’s essential to consider security risks associated with predictable auto-increment IDs.

If you use an API endpoint that fetches data using numeric IDs, such as /users/:id, an attacker could easily crawl through your data by incrementing the ID number in the endpoint URL.

For example, an attacker could use /users/1, /users/2, and so on, to access sensitive information.
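
To make the difference concrete, here is a minimal in-memory sketch with no real server involved. The `usersBySerialId` and `usersByUuid` maps and the `countHits` helper are hypothetical names invented for this illustration; they stand in for whatever lookup sits behind /users/:id.

```javascript
// Browsers and Node.js 19+ expose `crypto` globally; older Node.js needs the fallback.
const cryptoApi = globalThis.crypto ?? require("node:crypto").webcrypto;

// Hypothetical data stores standing in for a /users/:id lookup.
const usersBySerialId = new Map();
const usersByUuid = new Map();
for (let i = 1; i <= 100; i++) {
  usersBySerialId.set(String(i), { name: `user${i}` });
  usersByUuid.set(cryptoApi.randomUUID(), { name: `user${i}` });
}

// Count how many guessed IDs resolve to a real record.
function countHits(store, guesses) {
  return guesses.filter((id) => store.has(id)).length;
}

// With serial IDs the guesser simply counts upward from 1.
const serialGuesses = Array.from({ length: 100 }, (_, i) => String(i + 1));
// With UUID keys the guesser can only pick values at random.
const uuidGuesses = Array.from({ length: 100 }, () => cryptoApi.randomUUID());

console.log(countHits(usersBySerialId, serialGuesses)); // 100: every guess is a real user
console.log(countHits(usersByUuid, uuidGuesses)); // 0 in practice
```

Random IDs only make guessing impractical. The endpoint should still check that the caller is allowed to see the record.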

Well folks, if you want to keep your database on lockdown, you might want to think twice before using those easy-breezy auto-increment IDs.

Those little number sequences might make it simple to fetch your data, but they also make it a cakewalk for any hacker to crawl through it like a kid in a ball pit.

So, unless you want to invite every cybercriminal to your virtual house party, I suggest you skip the predictability and go for something more random.

UUID – GUID

UUIDs (Universally Unique Identifiers), also called GUIDs (Globally Unique Identifiers), are often a better option than auto-increment IDs in software development because they are unique for all practical purposes without any coordination and are much harder to guess.

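UUIDs originated in the 1980s in Apollo's Network Computing System and were standardized by the Internet Engineering Task Force (IETF) in RFC 4122 in 2005. Since then, they have been widely adopted in many different applications and systems.

The most commonly used version is UUIDv4, which is generated from random numbers: 122 of its 128 bits are random, and the remaining 6 encode the version and variant. (Version 1, by contrast, combines the current time with a node identifier.)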
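
A version 4 UUID carries 122 random bits plus fixed version and variant fields. The following is a minimal sketch that reads those two fields back from a generated UUID; `describeUuid` is a hypothetical helper name, not part of any library.

```javascript
// Browsers and Node.js 19+ expose `crypto` globally; older Node.js needs the fallback.
const cryptoApi = globalThis.crypto ?? require("node:crypto").webcrypto;

// Text layout: xxxxxxxx-xxxx-Vxxx-Nxxx-xxxxxxxxxxxx
// V is the version digit; the top bits of N encode the variant.
function describeUuid(uuid) {
  const version = parseInt(uuid[14], 16);
  const variantNibble = parseInt(uuid[19], 16);
  // RFC 4122 variant: the two highest bits of N are binary 10 (N is 8, 9, a, or b).
  const isRfc4122Variant = (variantNibble & 0b1100) === 0b1000;
  return { version, isRfc4122Variant };
}

console.log(describeUuid(cryptoApi.randomUUID())); // { version: 4, isRfc4122Variant: true }
```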

Pros:

  • Practically unique (collisions are possible in theory but extremely unlikely)
  • No central coordination needed
  • Suitable for distributed systems
  • Hard to guess (UUIDv4), so IDs do not reveal record counts or creation order

Cons:

  • Longer than other ID types (128 bits, 36 characters as text)
  • Difficult to remember or work with manually
  • Potentially slower than auto-increment IDs, since random values are larger and scatter inserts across the index
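
The size cost is easy to quantify: a UUID is 36 characters as text but only 16 bytes in binary form. A minimal sketch of the conversion, where `uuidToBytes` is a hypothetical helper and the UUID is just an example value:

```javascript
// Convert the 36-character text form into its 16 raw bytes.
function uuidToBytes(uuid) {
  const hex = uuid.replace(/-/g, "");
  const bytes = new Uint8Array(16);
  for (let i = 0; i < 16; i++) {
    bytes[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return bytes;
}

const exampleUuid = "123e4567-e89b-12d3-a456-426614174000"; // example value
console.log(exampleUuid.length); // 36 characters as text
console.log(uuidToBytes(exampleUuid).length); // 16 bytes in binary form
```

Storing the binary form instead of the text form is one way to reduce the overhead, if your database offers a suitable column type.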

Generating a new UUID is simple with the Web Crypto API:

// Available in modern browsers in secure contexts (HTTPS or localhost).
const uuid = crypto.randomUUID();
console.log(uuid);

You can run this code directly in your browser console to get a random UUID.

A code search on GitHub returns more than 62 million results related to UUID!

UUIDs are reliable and unique, but they can be long and hard to remember.

NanoID offers a user-friendly alternative that’s smaller and more readable. It’s not always a replacement for UUID, but it can be used when a shorter, more easily digestible ID is needed. Let’s explore NanoID and how it compares to UUID.

NanoID

NanoID is a tiny, secure, and friendly unique string ID generator library developed by Andrey Sitnik.

It was first released in August 2017 and is now available on GitHub at https://github.com/ai/nanoid with over 20k stars (2023)!

If you want a unique ID generator that is similar to the one used by YouTube, you may want to consider using NanoID.

Here are some advantages of NanoID:

  • Small. 130 bytes (minified and gzipped). No dependencies.
  • Safe. It uses a hardware random generator. Can be used in clusters.
  • Short IDs. It uses a larger alphabet than UUID (A-Za-z0-9_-), so the ID size is reduced from 36 to 21 symbols.
  • Portable. Nano ID was ported to 20 programming languages, including Golang and C#…
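
The idea behind the short IDs can be sketched in a few lines. This is an illustration of the approach under stated assumptions, not the library's actual code: `sketchId` and `ALPHABET` are hypothetical names, and the symbol order here is arbitrary.

```javascript
// Browsers and Node.js 19+ expose `crypto` globally; older Node.js needs the fallback.
const cryptoApi = globalThis.crypto ?? require("node:crypto").webcrypto;

// 64 URL-safe symbols, so each symbol carries exactly 6 random bits.
const ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_-";

function sketchId(size = 21) {
  const bytes = cryptoApi.getRandomValues(new Uint8Array(size));
  let id = "";
  for (const byte of bytes) {
    // 256 is a multiple of 64, so masking to 6 bits introduces no bias.
    id += ALPHABET[byte & 63];
  }
  return id;
}

console.log(sketchId()); // 21 symbols
console.log(sketchId(10)); // shorter ID, higher collision risk
```

With the real library, the equivalent call is `nanoid()` after `import { nanoid } from 'nanoid'`.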

Comparison with UUID

Nano ID is quite comparable to UUID v4 (random-based). It has a similar number of random bits in the ID (126 in Nano ID and 122 in UUID), so it has a similar collision probability:

For there to be a one in a billion chance of duplication, 103 trillion version 4 IDs must be generated.
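
The figure above follows from the birthday-bound approximation n ≈ sqrt(2 · 2^bits · p), which holds for small probabilities p. A rough check of the arithmetic, where `idsForCollisionChance` is a hypothetical helper name:

```javascript
// Birthday-bound approximation: n ≈ sqrt(2 * 2^bits * p) for small p.
function idsForCollisionChance(randomBits, probability) {
  return Math.sqrt(2 * 2 ** randomBits * probability);
}

const uuidV4Count = idsForCollisionChance(122, 1e-9);
const nanoIdCount = idsForCollisionChance(126, 1e-9);

console.log(uuidV4Count.toExponential(2)); // 1.03e+14, about 103 trillion
console.log(nanoIdCount.toExponential(2)); // 4.12e+14, about 4 times more
```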

There are two main differences between Nano ID and UUID v4:

  • Nano ID uses a bigger alphabet, so a similar number of random bits are packed in just 21 symbols instead of 36.
  • Nano ID code is about 3 times smaller than the uuid/v4 package: 130 bytes instead of 423.

With NanoID, you can easily customize the length of the ID to make it more readable and user-friendly, which is helpful in many cases. Keep in mind that shorter IDs carry fewer random bits, so collisions become more likely.
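
When choosing a custom length, it helps to know how many random bits are left. A small sketch, assuming a uniformly random ID over an alphabet of the given size; `randomBits` is a hypothetical helper name:

```javascript
// Random bits in an ID = length * log2(alphabet size).
function randomBits(length, alphabetSize = 64) {
  return length * Math.log2(alphabetSize);
}

for (const length of [8, 12, 16, 21]) {
  console.log(length, randomBits(length));
}
// 8 -> 48 bits, 12 -> 72 bits, 16 -> 96 bits, 21 -> 126 bits
```

A 21-symbol ID over the 64-symbol alphabet gives the 126 bits mentioned earlier; an 8-symbol ID keeps only 48.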

And what if you still want a number-like ID? Take a look at Twitter Snowflake!

Twitter Snowflake ID

Twitter’s Snowflake ID is another popular distributed ID generation algorithm.

It was designed to generate unique 64-bit IDs at high scale, with each ID composed of a timestamp, a worker ID, and a sequence number.

The timestamp is accurate to the millisecond and the worker ID is assigned to the machine generating the ID. The sequence number is incremented for each ID generated within the same millisecond and worker ID combination.

This ensures that each ID is globally unique and sortable by time.

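In Twitter's original layout, the 64 bits consist of 1 unused sign bit, a 41-bit timestamp, a 10-bit worker (machine) ID, and a 12-bit sequence number.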
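
The timestamp, worker ID, and sequence number can be packed into one 64-bit integer with shifts. Below is a minimal single-process sketch, not Twitter's implementation. It assumes a 41/10/12-bit split and the custom epoch 1288834974657 (both commonly cited for the original Snowflake), and `createSnowflakeGenerator` is a hypothetical name. `BigInt` is used because JavaScript numbers cannot hold 64-bit integers exactly.

```javascript
// Assumed layout: 41-bit timestamp | 10-bit worker ID | 12-bit sequence.
const EPOCH = 1288834974657n; // assumed custom epoch, in milliseconds
const WORKER_BITS = 10n;
const SEQUENCE_BITS = 12n;
const MAX_SEQUENCE = (1n << SEQUENCE_BITS) - 1n; // 4095

function createSnowflakeGenerator(workerId, now = () => Date.now()) {
  const worker = BigInt(workerId);
  if (worker < 0n || worker >= 1n << WORKER_BITS) {
    throw new RangeError("workerId must fit in 10 bits (0..1023)");
  }
  let lastMs = -1n;
  let sequence = 0n;

  return function nextId() {
    let ms = BigInt(now());
    if (ms < lastMs) {
      throw new Error("clock moved backwards");
    }
    if (ms === lastMs) {
      // Same millisecond: bump the sequence number.
      sequence = (sequence + 1n) & MAX_SEQUENCE;
      if (sequence === 0n) {
        // Sequence exhausted for this millisecond: wait for the next one.
        while (ms <= lastMs) ms = BigInt(now());
      }
    } else {
      sequence = 0n;
    }
    lastMs = ms;
    return ((ms - EPOCH) << (WORKER_BITS + SEQUENCE_BITS)) | (worker << SEQUENCE_BITS) | sequence;
  };
}

const nextId = createSnowflakeGenerator(1);
console.log(nextId().toString());
console.log(nextId().toString()); // always larger than the previous ID
```

The `now` parameter exists only so the clock can be replaced in tests. A production generator would also need a strategy for assigning worker IDs and for handling clock adjustments.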

The advantages of Snowflake ID include:

  • High scalability and distributed generation across multiple machines or clusters.
  • Sorting by time is easy because the timestamp is included in the ID.
  • The structure of Snowflake ID is simple and can be easily understood and implemented.
  • The generation of IDs is fast and can support high concurrency.
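
Because the timestamp occupies the highest bits, comparing two IDs numerically compares their creation times, and the time can be read back out of an ID. A minimal sketch, assuming a 41/10/12-bit split and the custom epoch 1288834974657; `snowflakeToDate` is a hypothetical helper name.

```javascript
const EPOCH = 1288834974657n; // assumed custom epoch, in milliseconds
const TIMESTAMP_SHIFT = 22n; // 10 worker bits + 12 sequence bits

// Recover the creation time from an ID that uses the assumed layout.
function snowflakeToDate(id) {
  const ms = (BigInt(id) >> TIMESTAMP_SHIFT) + EPOCH;
  return new Date(Number(ms));
}

// Build two IDs one second apart (worker 1, sequence 0) to show the ordering.
const t0 = 1700000000000n; // example Unix time in milliseconds
const earlier = ((t0 - EPOCH) << TIMESTAMP_SHIFT) | (1n << 12n);
const later = ((t0 + 1000n - EPOCH) << TIMESTAMP_SHIFT) | (1n << 12n);

console.log(earlier < later); // true: numeric order follows creation order
console.log(snowflakeToDate(earlier).toISOString()); // 2023-11-14T22:13:20.000Z
```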

However, Snowflake ID also has some limitations:

  • The maximum number of worker IDs is limited, which can constrain the scalability of the system.
  • The timestamp in Snowflake ID is based on the local machine’s clock, which can lead to issues if clocks are not synchronized across machines.
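  • 64-bit integers can be awkward in some programming languages or systems. For example, JavaScript's Number type cannot represent every 64-bit integer exactly, so IDs are often handled as strings or BigInt values.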
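
These limits can be made concrete with a little arithmetic. The numbers below assume a 41-bit timestamp, a 10-bit worker ID, and a 12-bit sequence; other Snowflake variants split the bits differently.

```javascript
// Limits implied by the assumed 41/10/12-bit split.
const maxWorkers = 2 ** 10; // 1024 distinct worker IDs
const idsPerMsPerWorker = 2 ** 12; // 4096 IDs per millisecond per worker
const yearsOfTimestamps = 2 ** 41 / (1000 * 60 * 60 * 24 * 365.25);

console.log(maxWorkers, idsPerMsPerWorker, yearsOfTimestamps.toFixed(1)); // 1024 4096 69.7

// 64-bit IDs can exceed Number.MAX_SAFE_INTEGER (2^53 - 1),
// so converting them to a plain number silently changes the value.
const bigId = 1234567890123456789n; // example value
console.log(BigInt(Number(bigId)) === bigId); // false: precision was lost
```

This is why APIs that serve JavaScript clients often return such IDs as strings.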

For those interested in learning more about Snowflake ID, Twitter open-sourced its original implementation, which is written in Scala. The archived repository is available on GitHub: https://github.com/twitter-archive/snowflake.

You can also find a popular Go implementation of Snowflake, a repo with over 2k stars, here: https://github.com/bwmarrin/snowflake

Which ID should I choose?

In conclusion, it’s clear that using simple numerical IDs is not a good idea, as they can be easily guessed and pose security risks.

For most use cases, UUID is a solid choice, providing a unique identifier that can be generated easily in a variety of programming languages.

However, NanoID and Snowflake offer some advantages over UUID, such as being more user-friendly and offering more customization options.

Ultimately, the choice of ID will depend on the specific requirements of your project and the tradeoffs between security, usability, and complexity. Whatever your choice may be, it’s important to prioritize the unique identification of your data to ensure the integrity and security of your systems.