Enriching 1M+ domains with company data

A customer recently came to us with a request to enrich more than 1M domains.

He was on the marketing content team at his company and was working on a project to build a monthly report that analyzed the top 1M websites. Automating it was key, but he had limited technical resources to build it out.

He had already spoken with Clearbit, but the pricing wasn't feasible. So he was looking for an alternative that could reliably handle the large request volume.

BigPicture Bulk

We have customers who send large volumes over a given month, but 1M requests all at once was new territory. He wanted to build his report without it taking all month.

We tackled this problem in a few ways:

  • Make our rate limits customizable so he can hit the API faster
  • Make our auto scaling more intelligent to automatically handle the increased load
  • Provide him with a framework to reliably call the API without having to loop in his dev team

To help run bulk enrichment requests like this, we've open-sourced the framework: BigPicture Bulk. It can reliably enrich domains at scale and handles the lower-level details, such as:

  • Keeping memory usage low by streaming the CSV file
  • Handling the BigPicture API's rate limits
  • Handling errors and retrying failed requests

Usage

After downloading the project, all you need is your CSV file and BigPicture API key.

Set up a simple CSV of your domains, one per line, like this:

uber.com  
microsoft.com  
stripe.com  

Then call the script via:

API_KEY=YOUR_API_KEY INPUT_FILE="domains.csv" node run.js  

NOTE: Replace YOUR_API_KEY and domains.csv with your own API key and file name.
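For readers less familiar with this calling style: `API_KEY` and `INPUT_FILE` are passed as environment variables, which a Node.js script reads from `process.env`. The sketch below shows how a script could validate them up front. `readConfig` is a hypothetical helper for illustration, not code from the project's `run.js`.

```javascript
// Hypothetical helper: reads and validates the two settings used in the
// command above. Accepts an env object so it can be exercised without
// touching the real environment.
function readConfig(env = process.env) {
  const apiKey = env.API_KEY;
  const inputFile = env.INPUT_FILE;
  if (!apiKey) throw new Error('API_KEY is required');
  if (!inputFile) throw new Error('INPUT_FILE is required');
  return { apiKey, inputFile };
}

// Example: const { apiKey, inputFile } = readConfig();
```

Failing fast on a missing value avoids starting a long run that would only produce errors.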

Result

Your output will be a CSV file with one record per domain. Each record carries fields like these (shown here as JSON for readability):

{
    "domain": "stripe.com",
    "logo": "http://logo.bigpicture.io/logo/stripe.com",
    "name": "Stripe",
    "tags": [
        "Finance",
        "FinTech",
        "Mobile Payments",
        "SaaS"
    ],
    "type": "private",
    "category": {
        "sector": "Technology",
        "industry": "Software & Computer Services",
        "subIndustry": "Internet",
        "industryGroup": "Technology"
    },
    "metrics": {
        "raised": 395000000,
        "employees": 3001,
        "alexaUsRank": 624,
        "alexaGlobalRank": 1339,
        "employeesRange": "1K-5K",
        "...": "..."  
    },
    "...": "..."
}

Depending on how big your file is, it could take some time to run. While our default rate limit is 600 requests a minute, we can increase your limit as needed.
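As a rough guide to "some time", the lower bound on run time is simply the number of requests divided by the rate limit. The sketch below works that out. `estimateRuntime` is a hypothetical helper, and the 6,000 requests a minute figure is an illustrative raised limit, not a published one.

```javascript
// Lower bound on run time: ignores network latency, retries, and failures.
function estimateRuntime(totalRequests, requestsPerMinute) {
  const minutes = totalRequests / requestsPerMinute;
  return { minutes, hours: minutes / 60 };
}

const atDefault = estimateRuntime(1000000, 600);
console.log(`Default limit: about ${atDefault.hours.toFixed(1)} hours`); // about 27.8 hours

// Illustrative raised limit (an assumption, not a published figure).
const atRaised = estimateRuntime(1000000, 6000);
console.log(`Raised limit: about ${atRaised.hours.toFixed(1)} hours`); // about 2.8 hours
```

So 1M domains at the default limit takes a little over a day of continuous running, which is why a higher limit matters for a job of this size.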

Check it out on GitHub and let us know what you think!