For the last year or so, it's been a constant game of cat and mouse with spam signups. Every website likely deals with this to some degree, but due to our product offering, it's been more of a problem.
Our product is data as a service with a generous free tier. You can sign up and try our API for free with a limited number of requests that refreshes each month. If you like what you see and you need more requests, you can upgrade to a paid tier.
Why it's a problem
People will create multiple accounts and pool the credentials to avoid upgrading to a paid tier. This can range from an individual manually creating accounts to automated bots creating thousands of accounts within a short period of time.
This is problematic not only due to the financial loss of people essentially stealing our data, but also the increased load on the system. The extra load increases our costs and can potentially impact paying customers.
The steps we took
Like any startup, finding product/market fit is the number one concern when getting started. So a lot of optimizations are generally put off until they actually become a problem - scaling problems are good problems to have.
In our case, we initially did not have any controls or limits around signups, as it wasn't a problem. We also didn't require email verification after signing up. Everyone hates having to verify their email, and we wanted to eliminate friction as much as possible.
That changed after launching the latest version of our product. Our data quality went up significantly, and we started marketing it more. Within a few weeks of the launch, we received an alert one night that hundreds of accounts were being created.
We realized this was a problem and that we needed to do something.
1. IP based rate limiting
We were hesitant to add an email verification step, so we decided to start with IP rate limiting. We used the rate-limiter-flexible package, which made it easy to set up. This limited both the number of requests you could make in a given time period and the number of accounts a person could ever create.
While this did slow down signups at first, spammers adapted and began to change their IPs, which is relatively easy with a proxy service.
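To make the two limits concrete, here is a minimal sketch of a fixed-window counter keyed by IP address. It is a hand-rolled illustration, not the rate-limiter-flexible API, and the class name, function name, and the limits themselves (5 requests per minute, 2 accounts per IP) are assumptions chosen for the example.

```typescript
// Hand-rolled fixed-window limiter for illustration only.
// This is NOT the rate-limiter-flexible API; all names here are hypothetical.
interface WindowEntry {
  count: number;
  windowStart: number; // timestamp (ms) when the current window opened
}

class FixedWindowLimiter {
  private readonly entries = new Map<string, WindowEntry>();
  private readonly maxPoints: number; // assumed to be at least 1
  private readonly windowMs: number;

  constructor(maxPoints: number, windowMs: number) {
    this.maxPoints = maxPoints;
    this.windowMs = windowMs;
  }

  // Returns true if the action is allowed for this key, false if over the limit.
  tryConsume(key: string, now: number = Date.now()): boolean {
    const entry = this.entries.get(key);
    if (entry === undefined || now - entry.windowStart >= this.windowMs) {
      // First hit for this key, or the previous window has expired.
      this.entries.set(key, { count: 1, windowStart: now });
      return true;
    }
    if (entry.count >= this.maxPoints) {
      return false;
    }
    entry.count += 1;
    return true;
  }
}

// Assumed limits: 5 requests per minute per IP, and 2 accounts per IP ever.
// An infinite window never resets, which gives a lifetime cap.
const requestsPerIp = new FixedWindowLimiter(5, 60 * 1000);
const accountsPerIp = new FixedWindowLimiter(2, Number.POSITIVE_INFINITY);

function canCreateAccount(ip: string, now: number = Date.now()): boolean {
  // Count the request first, then check the lifetime account cap.
  return requestsPerIp.tryConsume(ip, now) && accountsPerIp.tryConsume(ip, now);
}
```

Because every counter is keyed by IP, a client that rotates addresses through a proxy starts from a fresh counter each time, which is the weakness described above.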
2. reCAPTCHA
Next up was reCAPTCHA, specifically v3, which doesn't require the user to do anything (no clicking on bicycles or cars in a set of pictures). It uses an ML model to predict whether the user is likely a bot and provides a simple score for us to set a threshold on.
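The threshold check itself is small. The sketch below assumes the server has already sent the client token to Google's verification endpoint and parsed the JSON reply; the interface lists only a subset of the reply's fields, and the helper name, expected action, and minimum score are hypothetical values for illustration.

```typescript
// Subset of the fields in a reCAPTCHA v3 verification response.
interface RecaptchaV3Result {
  success: boolean; // whether the token itself was valid
  score?: number;   // 0.0 (very likely a bot) to 1.0 (very likely a human)
  action?: string;  // the action name the page attached to the token
}

// Hypothetical helper: decides whether a signup may proceed.
function passesRecaptcha(
  result: RecaptchaV3Result,
  expectedAction: string,
  minScore: number,
): boolean {
  if (!result.success) {
    return false;
  }
  // Reject tokens that were issued for a different action on the site.
  if (result.action !== expectedAction) {
    return false;
  }
  // A missing score is treated as a failure rather than a pass.
  return typeof result.score === "number" && result.score >= minScore;
}
```

A call such as `passesRecaptcha(reply, "signup", 0.5)` would then gate account creation. A stricter threshold blocks more bots but also risks turning away real users with low scores.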
It was relatively easy to integrate and immediately stopped the mass spam signups. We thought we might have solved it, but about a month later they started up again.
We could tell they were spam, as they were obscure usernames and/or obviously fake domains with examples such as:
It became clear that people had figured out how to bypass reCAPTCHA and programmatically pass as a real person. At this point it was obvious that we'd have to add email verification.
3. Email verification
We added email verification and required all new accounts to first verify their email. This again immediately stopped the mass signups.
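For readers unfamiliar with the mechanics, the following is a minimal in-memory sketch of a single-use, expiring verification token flow. It is an assumption about how such a step can work, not a description of our implementation: the class and method names are hypothetical, the token is supplied by the caller (it should come from a cryptographically secure random source), and a real service would persist this state in a database.

```typescript
interface PendingVerification {
  email: string;
  expiresAt: number; // timestamp (ms) after which the token is rejected
}

// Hypothetical in-memory store for illustration only.
class EmailVerifier {
  private readonly pending = new Map<string, PendingVerification>();
  private readonly verified = new Set<string>();
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Record a token that will be emailed to the user as part of a link.
  issue(email: string, token: string, now: number = Date.now()): void {
    this.pending.set(token, {
      email: email.toLowerCase(),
      expiresAt: now + this.ttlMs,
    });
  }

  // Called when the user clicks the link. Tokens are single-use.
  confirm(token: string, now: number = Date.now()): boolean {
    const entry = this.pending.get(token);
    if (entry === undefined) {
      return false;
    }
    this.pending.delete(token);
    if (now > entry.expiresAt) {
      return false;
    }
    this.verified.add(entry.email);
    return true;
  }

  isVerified(email: string): boolean {
    return this.verified.has(email.toLowerCase());
  }
}
```

The account stays locked until `isVerified` returns true, so a bot needs a working inbox for every account it creates.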
The peace lasted much longer this time, and it seemed that we had finally solved it. Yet again though, spam signups slowly started to increase.
We noticed because we usually get a steady stream of signups from people using their work email. Then one week the stream switched to predominantly free email providers such as Gmail and Hotmail, or disposable emails with obscure usernames such as firstname.lastname@example.org and email@example.com.
4. Work email requirement
We've seen some websites mandate signing up with your "work" email, which basically tries to block any free or disposable emails.
We really wanted to avoid going there, as we've had the personal frustration of not wanting to provide a work email to try out a service. Separately, we've also noticed that many customers from large enterprises tend to use a gmail address to initially hide what company they are from.
After weighing our options though, it seemed that there was no other technically viable way to stop these spam signups. So we decided to implement a ban and to link to this article, which explains our reasons.
As to how we're detecting free and disposable emails, we're currently using the freemail package. It's not perfect, but it covers the majority of the known providers out there.
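The underlying check is a lookup of the address's domain against lists of known providers. The sketch below shows the idea with two tiny hard-coded lists; it does not use the freemail package, the function and type names are hypothetical, and a real list is far larger than the handful of domains shown here.

```typescript
// Tiny illustrative lists; real lists contain thousands of domains.
const FREE_DOMAINS = new Set<string>([
  "gmail.com",
  "hotmail.com",
  "outlook.com",
  "yahoo.com",
]);
const DISPOSABLE_DOMAINS = new Set<string>(["mailinator.com"]);

type EmailKind = "work" | "free" | "disposable" | "invalid";

// Extracts the lowercased domain, or null if the address has no usable one.
function emailDomain(email: string): string | null {
  const trimmed = email.trim();
  const at = trimmed.lastIndexOf("@");
  if (at <= 0 || at === trimmed.length - 1) {
    return null;
  }
  return trimmed.slice(at + 1).toLowerCase();
}

// Anything not on either list is treated as a "work" address.
function classifyEmail(email: string): EmailKind {
  const domain = emailDomain(email);
  if (domain === null) {
    return "invalid";
  }
  if (DISPOSABLE_DOMAINS.has(domain)) {
    return "disposable";
  }
  if (FREE_DOMAINS.has(domain)) {
    return "free";
  }
  return "work";
}
```

Note that "work" here only means "not on a known list", which is why a list-based check can never be complete.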
5. Domain blacklist
While blocking free and disposable email addresses cut down on the latest wave of spam, we knew that there would eventually be spammers creating multiple accounts via a domain not on the free / disposable list. To handle this, we created a domain blacklist.
We'd like to automate this list at some point, but we're still debating how best to approach that. So for now, having our team handle it manually will suffice.
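A manually maintained list can be as simple as a set of domains checked at signup. The sketch below is one possible shape for that check, with hypothetical names and a placeholder domain; matching parent domains as well is an assumption added here so that a spammer cannot sidestep an entry by signing up from a subdomain.

```typescript
// Placeholder entry; a real list would be maintained by the team.
const BLOCKED_DOMAINS: ReadonlySet<string> = new Set<string>([
  "spam-domain.example",
]);

// Extracts the lowercased domain, or null if the address has no usable one.
function domainOf(email: string): string | null {
  const trimmed = email.trim();
  const at = trimmed.lastIndexOf("@");
  if (at <= 0 || at === trimmed.length - 1) {
    return null;
  }
  return trimmed.slice(at + 1).toLowerCase();
}

// True when the email's domain, or any parent domain, is on the list.
function isBlockedSignup(email: string, blocked: ReadonlySet<string>): boolean {
  const domain = domainOf(email);
  if (domain === null) {
    return false; // malformed addresses are rejected elsewhere
  }
  const labels = domain.split(".");
  // Check "a.b.c", then "b.c"; the bare top-level label is never checked.
  for (let i = 0; i < labels.length - 1; i++) {
    if (blocked.has(labels.slice(i).join("."))) {
      return true;
    }
  }
  return false;
}
```

With this shape, adding a domain to the set is the only manual step, and the same function could later read from an automated source without changing the signup code.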
If you're coming from our signup page and reading this, we apologize. We did not want to go to this level and require people to use "work" emails, but the spammers have given us no other viable option.
Verifying that signups are real people is a hard problem. While the current measures are not ideal, they help us block fake accounts and keep our service affordable.