How I’m generating a really unique client ID for every user
Creating a unique number for a user that is publicly shareable without compromising sensitive information can be challenging.
The term client_id may have already given away what I’m working on, but if that’s not the case, it’s Google Analytics.
In the context of Predifix, a condo management app that’s a side project of mine, the Head of Marketing (also known as my wife) wanted to track a few things in the app, like when a user signs up or when a subscription is created, updated, or canceled. All of these happen on the server, so the JavaScript implementation of Google Analytics wouldn’t help here.
So I created a GoogleAnalyticsService class in Ruby that leverages the Google Analytics Measurement Protocol API, which we can use to send events to GA straight from the server. Perfect for what I needed, since there is no client-side JavaScript running when, for instance, a subscription gets canceled automatically: it’s a webhook that calls this function.
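For readers who haven’t used the Measurement Protocol, here is a minimal sketch of what a server-side request can look like, using only Ruby’s standard library. It is not the author’s GoogleAnalyticsService: the class name `MeasurementProtocolSketch` is hypothetical, and the measurement ID and API secret are placeholders you would take from your GA4 property. The endpoint, the `measurement_id`/`api_secret` query parameters, and the `client_id` plus `events` JSON body follow the GA4 Measurement Protocol.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical helper (not the author's GoogleAnalyticsService) showing the
# shape of a GA4 Measurement Protocol request.
class MeasurementProtocolSketch
  ENDPOINT = "https://www.google-analytics.com/mp/collect"

  def initialize(measurement_id:, api_secret:)
    @measurement_id = measurement_id
    @api_secret = api_secret
  end

  # Credentials travel as query parameters on the collect endpoint.
  def uri
    query = URI.encode_www_form(measurement_id: @measurement_id, api_secret: @api_secret)
    URI("#{ENDPOINT}?#{query}")
  end

  # JSON body: the client_id that ties events together, plus a list of events.
  def payload(client_id:, event_name:, params: {})
    { client_id: client_id, events: [{ name: event_name, params: params }] }.to_json
  end

  # Sends the event. GA answers with a 2xx status even for malformed events,
  # so validate payloads against /debug/mp/collect while developing.
  def send_event(client_id:, event_name:, params: {})
    body = payload(client_id: client_id, event_name: event_name, params: params)
    Net::HTTP.post(uri, body, "Content-Type" => "application/json")
  end
end
```

A webhook handler could then call something like `send_event(client_id: user_client_id, event_name: "sign_up")`; `sign_up` is one of GA4’s recommended event names, while subscription events would be custom names of your choosing.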
The algorithm
Context aside, the approach I ended up with is designed so that no two users end up with the same client_id attached to them. It does this by packing three pieces of data into one big number: a random number, the user’s ID, and a timestamp.
Oh, now is a good time to mention that the client_id Google expects must follow a very specific format, such as 123456.0123456789: digits only, with a dot after the first 6 digits, for a total of 16 digits.
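If you want to guard against malformed IDs before sending anything, a simple regular expression captures the shape described above. This is a sketch of that 6-digits, dot, 10-digits shape only; it is not an official validation rule published by Google, and the names `CLIENT_ID_FORMAT` and `valid_client_id?` are hypothetical.

```ruby
# Exactly 6 digits, a dot, then 10 digits (16 digits in total).
CLIENT_ID_FORMAT = /\A\d{6}\.\d{10}\z/

def valid_client_id?(client_id)
  CLIENT_ID_FORMAT.match?(client_id)
end

valid_client_id?("123456.0123456789") # => true
valid_client_id?("12345.0123456789")  # => false (only 5 digits before the dot)
```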
So I ended up using a 6-digit random number, a 6-digit user ID, and a 10-digit timestamp. If you do the math, you’ll notice that this adds up to 22 digits, not the 16 we’re looking for.
The missing piece is that the 6-digit random number and the 6-digit user ID are added together, and the sum is then forced back into a 6-digit number by taking it modulo 1,000,000 (that is, keeping only its last 6 digits).
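To make the “forced back into 6 digits” step concrete, here is the arithmetic with a sum that overflows into 7 digits. The two input values are picked purely for illustration.

```ruby
random_part  = 999_990
user_id_part = 64

sum      = random_part + user_id_part # 1_000_054, which has 7 digits
combined = sum % 1_000_000            # 54: the modulo keeps the last 6 digits
padded   = format("%06d", combined)   # "000054": zero-padded back to 6 characters
```

The zero-padding matters as much as the modulo: without it, a small sum like 54 would produce a prefix shorter than 6 digits.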
The code
```ruby
def generate_unique_id(user_id)
  # Get a 10-digit Unix timestamp (seconds)
  timestamp = Time.now.to_i
  # Generate a random 6-digit number (zero-padded)
  random_part = format("%06d", rand(0..999999))
  # Keep the last 6 digits of the user_id
  user_id_part = format("%06d", user_id % 1_000_000)
  # Add both parts and wrap the sum back into 6 digits
  combined_part = (random_part.to_i + user_id_part.to_i) % 1_000_000
  # Return a value in the format 123456.0123456789
  "#{format("%06d", combined_part)}.#{timestamp}"
end
```
Let’s go over an example to see how this thing actually works.
The timestamp is 1755953349.
The random_part is 651270.
The user_id is 64, so the user_id_part would be 000064.
Lastly, the combined_part, where we mix the random_part with the user_id_part, would be 651334 (651270 + 64).
So the returned value would be 651334.1755953349.
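Because `generate_unique_id` calls `rand` and `Time.now` internally, you can’t reproduce a specific result with it directly. The variant below is a sketch that keeps the same arithmetic but makes the random number and the clock injectable; the method name `generate_unique_id_with` and its keyword arguments are additions for illustration, not part of the original code. With timestamp 1755953349, random_part 651270 and user_id 64, it yields 651270 + 64 = 651334 as the prefix.

```ruby
# Same logic as generate_unique_id, but the random part and the clock can be
# passed in. Defaults are evaluated on every call, so normal use stays random.
def generate_unique_id_with(user_id, random_part: rand(1_000_000), now: Time.now)
  timestamp    = now.to_i                               # 10-digit Unix timestamp
  user_id_part = user_id % 1_000_000                    # last 6 digits of the ID
  combined     = (random_part + user_id_part) % 1_000_000
  "#{format("%06d", combined)}.#{timestamp}"
end

generate_unique_id_with(64, random_part: 651_270, now: Time.at(1_755_953_349))
# => "651334.1755953349"
```

Injecting the clock and the random source like this is a common way to make time- and randomness-dependent code testable without stubbing globals.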
There you have it! Your unique client_id!
Edge cases
Since we’re using the user_id (I didn’t mention this before, but it comes from the database and it’s incremental), we can be sure that no two users will have the same user_id.
Although that’s true, since we’re reducing the user_id_part to a 6-digit number, as soon as user IDs reach 1,000,000 we will start to have “duplicates”. Not really, but let’s go over it.
If the user_id is something like 1200321, the user_id_part will end up as 200321, which is exactly the user_id_part of the existing user with user_id 200321. Well, that’s one thing.
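Here is that wrap-around in code, followed by what the random part does to it. The helper name `user_id_part` mirrors the variable in the original method, and the two random numbers are made up for illustration.

```ruby
def user_id_part(user_id)
  user_id % 1_000_000 # keep only the last 6 digits
end

user_id_part(200_321)   # => 200321
user_id_part(1_200_321) # => 200321, same part as the older user

# With different random numbers, the combined 6-digit prefixes no longer match.
prefix_a = (user_id_part(200_321) + 123_456) % 1_000_000   # => 323777
prefix_b = (user_id_part(1_200_321) + 654_321) % 1_000_000 # => 854642
```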
Another relevant aspect is that we’re mixing it up with a 6-digit random number, so two users who share the same user_id_part only end up with the same 6-digit prefix if they also draw the same random number. So the chances of ending up with an actual duplicate client_id are slim.
Besides, we’re also adding a timestamp to the mix, so a duplicate is only possible between two client IDs generated within the same second. That’s not much of a concern unless you’re expecting hundreds of thousands of users to sign up to your app in the exact same second. And even then, your server is unlikely to process all of those requests within that same second.
Again, very slim chances of getting the exact same client_id, but it’s there.
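To put a number on “very slim”, here is a birthday-problem estimate for IDs generated within the same second, which is the only case where the timestamps match. It assumes each 6-digit prefix is uniform over 0..999999 and independent of the others; that holds when the random part is uniform, since adding a fixed user_id_part only shifts it. The method name is hypothetical.

```ruby
# Probability that at least two of n client IDs generated in the same second
# share the same 6-digit prefix (and therefore the whole client_id).
def same_second_collision_probability(n, space: 1_000_000)
  # Chance that all n prefixes are distinct: (1 - 0/space)(1 - 1/space)...
  no_collision = (0...n).reduce(1.0) { |acc, i| acc * (1.0 - i.fdiv(space)) }
  1.0 - no_collision
end

same_second_collision_probability(2) # about one in a million
```

For a couple of signups in the same second the risk is about one in a million, but it grows quickly with volume: at a thousand IDs in the same second it is already roughly 39%, which is why the timestamp, rather than the 6-digit prefix alone, does most of the work.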
Considerations
In this case, I got away with generating a client_id only when the user signs up. But if for some reason you need to generate one as soon as the user lands on your app or website, you won’t be able to rely on an incremental number coming from your database, the user_id.
Before clarifying this need with the marketing department, I was planning for that scenario, and I would probably have ended up relying on the user’s IP address as a somewhat unique number.
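As an alternative to the IP address for the pre-signup case, one option is to drop the user-derived part entirely and take all 6 digits from a random source. The sketch below swaps in Ruby’s `SecureRandom` for that; it is not what the author implemented, and the method name `generate_anonymous_client_id` is hypothetical. It keeps the same 6-digits-plus-timestamp shape.

```ruby
require "securerandom"

# Pre-signup variant: no user_id exists yet, so the 6-digit part comes from
# SecureRandom alone instead of being derived from user data.
def generate_anonymous_client_id(now: Time.now)
  random_part = SecureRandom.random_number(1_000_000) # integer in 0..999_999
  "#{format("%06d", random_part)}.#{now.to_i}"
end
```

Without a unique user_id in the mix, uniqueness rests entirely on the random part and the timestamp, so the same-second collision estimate applies to every pair of visitors.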
And in this case, the timestamp would also serve a second purpose. When a user signs up, I know when that happened through the created_at column in the users table. But if you don’t have a user to begin with, because they haven’t signed up yet, the timestamp in their client_id would also tell you when they first came into contact with your app or website. Might be useful for your marketing needs!
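Reading that first-contact time back out is a one-liner. This sketch assumes the client_id follows the 6-digits-dot-timestamp shape used in this post; the method name `first_seen_at` is hypothetical.

```ruby
# Everything after the dot is the Unix timestamp taken when the ID was made.
def first_seen_at(client_id)
  _prefix, timestamp = client_id.split(".", 2)
  Time.at(Integer(timestamp, 10)).utc # base 10 so leading zeros aren't read as octal
end

first_seen_at("651334.1755953349") # => 2025-08-23 12:49:09 UTC
```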
One last point: if the client_id is generated before a user signs up, you will likely need to rely on a cookie of some sort, as the user needs to carry that ID throughout their journey until they sign up. That also means you will need to inject it into the signup form somehow, which may or may not be complex depending on which domains you’re using. And obviously you will need to decide what to do with that cookie once the user signs up. At that point the client_id will be accessible through the user record in the database, so you effectively don’t need it on the client side anymore.
Should you delete it, or just leave it there? That’s up to you.
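The lookup order described above can be sketched framework-free: prefer the value stored on the user record, fall back to the pre-signup cookie, and only generate a new ID when neither exists. Cookies are modeled as a plain Hash here; in a real app this would be your framework’s cookie jar, and the cookie name `ga_client_id` and method name `resolve_client_id` are assumptions for illustration.

```ruby
COOKIE_NAME = "ga_client_id" # hypothetical cookie name

# The block is only called when a brand-new client_id is actually needed.
def resolve_client_id(user_client_id, cookies, &generate)
  # Signed-up user: the value stored on the user record wins.
  return user_client_id if user_client_id

  # Visitor: reuse the cookie if present, otherwise create and store one.
  cookies[COOKIE_NAME] ||= generate.call
end
```

At signup you would copy the cookie’s value onto the new user record; whether you then delete the cookie is the open question above.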
A last point after the last point, regarding the client side: if you also have a client-side implementation of Google Analytics through JavaScript, chances are you will also need the client_id in the browser. The way I’m doing this is by attaching it to the session object and then exposing it in a <body> attribute, so that I can fetch it before initializing Google Analytics through JavaScript.
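On the server side, exposing the ID can be as simple as rendering it into a data attribute, HTML-escaped. This is a sketch using the standard library’s `ERB::Util.html_escape`; the attribute name `data-ga-client-id` and the method name are assumptions, not necessarily what the author uses.

```ruby
require "erb"

# Render the <body> tag with the client_id in a data attribute, escaped so a
# malformed value can't break out of the attribute.
def body_tag_with_client_id(client_id)
  %(<body data-ga-client-id="#{ERB::Util.html_escape(client_id)}">)
end

body_tag_with_client_id("651334.1755953349")
# => "<body data-ga-client-id=\"651334.1755953349\">"
```

In the browser, that attribute is readable as `document.body.dataset.gaClientId`, and the value can then be passed as the `client_id` field when configuring gtag.js.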
Bottom line
That’s it, now you know how to generate an actually unique ID for a user without revealing sensitive data, such as the user_id from the database record or, even worse, personally identifiable information.
Gotta love these small projects inside a big project!