IP to Company: How it works

How does your API work? Where do you get your data?

These are some of the top questions people have when evaluating our IP to Company API. It’s a somewhat technical topic that delves into the core of how the internet works, as mapping IP ownership is a dynamic, imperfect science.

IP ownership

At a high level, there are typically 3 tiers for matching an IP to a company:

  1. ASN
  2. WHOIS
  3. Fuzzy match

1. ASN

An autonomous system (AS) is a collection of connected IP ranges that are managed, controlled, and supervised by a single entity or organization. Each AS is identified by an autonomous system number (ASN), which is used to route traffic on the Internet. The Internet Assigned Numbers Authority (IANA) allocates blocks of ASNs to the regional Internet registries (RIRs), which in turn assign them to organizations.

An AS may lease out or assign sections of its network to smaller organizations, which may do the same, and so on.

Lookup examples:

72.14.192.0

  • ASN: 15169
  • Owned by: “GOOGLE”
  • Network range: 72.14.192.0 - 72.14.255.255

161.181.252.0

  • ASN: 46564
  • Owned by: “Nordstrom, Inc.”
  • Network range: 161.181.252.0 - 161.181.255.255
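
To make the network ranges above concrete, here is a minimal sketch using Python's standard `ipaddress` module. It collapses a start-end range into CIDR notation and checks whether an address falls inside a range. The helper names `range_to_cidrs` and `in_range` are made up for this illustration and are not part of our API.

```python
import ipaddress
from typing import List


def range_to_cidrs(start: str, end: str) -> List[str]:
    """Collapse an inclusive start-end address range into CIDR blocks."""
    first = ipaddress.ip_address(start)
    last = ipaddress.ip_address(end)
    return [str(net) for net in ipaddress.summarize_address_range(first, last)]


def in_range(ip: str, start: str, end: str) -> bool:
    """Return True if ip lies inside the inclusive start-end range."""
    addr = ipaddress.ip_address(ip)
    return ipaddress.ip_address(start) <= addr <= ipaddress.ip_address(end)


# The two ranges from the lookup examples above
print(range_to_cidrs("72.14.192.0", "72.14.255.255"))      # ['72.14.192.0/18']
print(range_to_cidrs("161.181.252.0", "161.181.255.255"))  # ['161.181.252.0/22']
print(in_range("161.181.253.17", "161.181.252.0", "161.181.255.255"))  # True
```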

2. WHOIS

WHOIS (pronounced as the phrase "who is") refers to the publicly available databases that store the registered users or assignees of an Internet resource such as a domain name, an IP address block, or an AS.

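The registration data for IP address blocks and ASNs is maintained by the regional Internet registries (RIRs). It can be queried with the traditional WHOIS protocol or with its JSON-based successor, the Registration Data Access Protocol (RDAP), which is the format shown in the examples below.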

WHOIS lookup examples

Below is an example for 161.181.252.0 (Nordstrom, Inc.).

{
 "handle": "NET-161-181-0-0-1",
 "startAddress": "161.181.0.0",
 "endAddress": "161.181.255.255",
 "ipVersion": "v4",
 "name": "NORDSTROM",
 "parentHandle": "NET-161-0-0-0-0",
 "entities": [
   {
     "handle": "NORDST",
     "vcardArray": [
       "vcard",
       [
         [
           "fn",
           {},
           "text",
           "Nordstrom, Inc."
         ],
         [
           "adr",
           {
             "label": "1600 7th Ave\nSeattle\nWA\n98191\nUnited States"
           },
           "text"
         ],
         [
           "kind",
           {},
           "text",
           "org"
         ]
       ]
     ]
   }
 ]
}

In the above case, the data structure is somewhat verbose, but the data we care about (the name and address) is easy to parse out.
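
As a rough illustration of that parsing step, the sketch below walks the `entities` and `vcardArray` fields of a record shaped like the one above and pulls out the name (`fn`) and the address label (`adr`). The function name `extract_orgs` is hypothetical, and the sample record is a trimmed copy of the Nordstrom example. It is not a description of our production parser.

```python
from typing import Any, Dict, List


def extract_orgs(record: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return handle, name ("fn") and address label ("adr") for each entity."""
    orgs = []
    for entity in record.get("entities", []):
        vcard = entity.get("vcardArray", [])
        # A vcardArray looks like ["vcard", [property, property, ...]].
        properties = vcard[1] if len(vcard) > 1 else []
        org = {"handle": entity.get("handle"), "name": None, "address": None}
        for prop in properties:
            if len(prop) < 2:
                continue
            key, params = prop[0], prop[1]
            if key == "fn" and len(prop) > 3:
                org["name"] = prop[3]  # ["fn", {}, "text", "<name>"]
            elif key == "adr":
                org["address"] = params.get("label")  # address sits in the label parameter
        orgs.append(org)
    return orgs


# Trimmed copy of the Nordstrom record shown above
record = {
    "handle": "NET-161-181-0-0-1",
    "name": "NORDSTROM",
    "entities": [
        {
            "handle": "NORDST",
            "vcardArray": [
                "vcard",
                [
                    ["fn", {}, "text", "Nordstrom, Inc."],
                    ["adr", {"label": "1600 7th Ave\nSeattle\nWA\n98191\nUnited States"}, "text"],
                    ["kind", {}, "text", "org"],
                ],
            ],
        }
    ],
}

for org in extract_orgs(record):
    print(org["name"], "|", (org["address"] or "").replace("\n", ", "))
```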

Now compare that to this record for 65.68.166.240.

{
 "handle": "NET-65-68-166-240-1",
 "startAddress": "65.68.166.240",
 "endAddress": "65.68.166.247",
 "name": "SBCIS-101615-121138",
 "parentHandle": "NET-65-64-0-0-1",
 "entities": [
   {
     "handle": "C00191867",
     "vcardArray": [
       "vcard",
       [
         [
           "fn",
           {},
           "text",
           "0530051fastenal Company"
         ],
         [
           "adr",
           {
             "label": "Private Residence\nRichardson\nTX\n75082\nUnited States"
           },
           "text"
         ],
         [
           "kind",
           {},
           "text",
           "org"
         ]
       ]
     ]
   }
 ]
}

In this case, the data isn’t as "clean". The company name has a seemingly random series of digits in front of it (“0530051fastenal Company”), and the address is vague (“Private Residence”).

Our system takes this data, cleans and normalizes it, and crawls the web to find the matching company. The final output from the API looks like this:

{
 "ip": "65.68.166.240",
 "fuzzy": false,
 "geo": {
   "city": "Gruver",
   "postal": "79040",
   "stateCode": "TX",
   "state": "Texas",
   "countryCode": "US",
   "country": "United States",
   "timeZone": "America/Chicago",
   "...": "..."
 },
 "type": "business",
 "company": {
   "url": "https://www.fastenal.com/",
   "logo": "http://logo.bigpicture.io/logo/fastenal.com",
   "legalName": "Fastenal Company",
   "category": {
     "sector": "Industrials",
     "industry": "Support Services",
     "subIndustry": "Industrial Suppliers",
     "industryGroup": "Industrial Goods & Services"
   },
   "tags": [
     "Construction",
     "Customer Service",
     "Wholesale"
   ],
   "type": "public",
   "domain": "fastenal.com",
   "ticker": "FAST",
   "metrics": {
     "employees": 18094,
     "marketCap": 30929086464,
     "annualRevenue": 5697299968,
     "employeesRange": "10K+",
     "alexaGlobalRank": 49904,
     "estimatedAnnualRevenue": "$1B-$10B",
     "...": "..."
   },
   "geo": {
     "streetNumber": "2001",
     "streetName": "Theurer Boulevard",
     "city": "Winona",
     "state": "Minnesota",
     "country": "United States of America",
     "...": "..."
   }
 }
}

This is a summarized version of the full data available. The rest of the data can be found in the docs here.
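
Going back to the cleaning step for the raw WHOIS name: a first normalization pass could look something like the sketch below. This is only an assumption about what a simple cleanup might involve (dropping the leading digits, tidying whitespace and capitalization). The function name `clean_org_name` is hypothetical, and the real pipeline does considerably more, including the web crawl that finds the matching company.

```python
import re


def clean_org_name(raw: str) -> str:
    """Strip a leading numeric code and tidy spacing and capitalization."""
    name = re.sub(r"^\d+", "", raw.strip())   # drop leading digits such as "0530051"
    name = re.sub(r"\s+", " ", name).strip()  # collapse runs of whitespace
    # Capitalize the first letter of each word, leaving the rest untouched
    return " ".join(word[:1].upper() + word[1:] for word in name.split(" "))


print(clean_org_name("0530051fastenal Company"))  # Fastenal Company
print(clean_org_name("Nordstrom, Inc."))          # Nordstrom, Inc.
```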

3. Fuzzy match

Fuzzy matches are sourced by inspecting various streams of signals from transactional business data. Wherever professionals are emailing, filling out forms, or consuming content, our system maps links between IP and domain.

Whether these actions happen in the office on a corporate network IP or from a professional working from home, the activity helps power BigPicture’s IP data.

Using these streams of data, our algorithms will then predict which companies own a particular IP network.

For a given IP, you can tell if the company is a fuzzy match or not via the "fuzzy" attribute in the response.

{
 "ip": "...",
 "fuzzy": true,
 "type": "business",
 "company": {
   "...": "..."
 },
 "...": "..."
}
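
On the client side, checking the attribute is a one-liner. The sketch below parses a response shaped like the one above and labels where the match came from. The helper name `match_source` and the label "registry match" (our shorthand here for a match based on ASN or WHOIS data) are made up for this example.

```python
import json


def match_source(response: dict) -> str:
    """Label a response as a fuzzy match, a registry-based match, or no match."""
    if response.get("company") is None:
        return "no company"
    return "fuzzy match" if response.get("fuzzy") else "registry match"


# Placeholder payload in the same shape as the response above
payload = '{"ip": "...", "fuzzy": true, "type": "business", "company": {"...": "..."}}'
print(match_source(json.loads(payload)))  # fuzzy match
```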

How do we decide what company to match to?

Larger organizations, such as Bank of America, are straightforward. At every tier of IP ownership, the ASN, WHOIS, and fuzzy match activity all point to bankofamerica.com as the owner of the IP network.

For smaller companies that buy internet service from an internet service provider (ISP), the records for the ASN, WHOIS, and fuzzy match could all point to different companies. This makes it harder to predict the final owner, meaning the company actively using the IP range for its office.

For simplicity, we currently return a single company in our API response: our best prediction of the company behind, or actively using, the IP address. Our systems are driven by a mix of automated, ML-driven processes and manual fixes by our team, with the fuzzy match activity taking the highest priority.
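
As a deliberately simplified picture of that prioritization, the sketch below picks one company from per-tier candidates using a fixed order. Only "fuzzy first" comes from the description above. The relative order of WHOIS and ASN, the function name `pick_company`, and the example domains are assumptions for illustration, and the actual system relies on ML-driven processes and manual fixes rather than a static rule.

```python
from typing import Dict, Optional, Tuple

# Highest priority first. Fuzzy-first is stated in the article; placing
# WHOIS ahead of ASN is an assumption made for this sketch.
PRIORITY = ("fuzzy", "whois", "asn")


def pick_company(candidates: Dict[str, Optional[str]]) -> Optional[Tuple[str, str]]:
    """Return (tier, domain) for the highest-priority tier that has a candidate."""
    for tier in PRIORITY:
        domain = candidates.get(tier)
        if domain:
            return tier, domain
    return None


# A small company behind an ISP: every tier points somewhere different
candidates = {"asn": "isp.example", "whois": "reseller.example", "fuzzy": "smallco.example"}
print(pick_company(candidates))  # ('fuzzy', 'smallco.example')
```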

IP classification

After matching to a company, the last step is to classify the IP / company type.

A large share of internet traffic comes from automated bots, which typically run on hosting providers such as AWS or Microsoft Azure. Bots aside, the vast majority of the traffic you’ll see on your website comes through ISPs, typically from an unknown person or organization using the ISP for service.

In both cases, this data isn’t particularly useful if you’re interested in knowing the real company behind an IP. Traffic from AWS doesn’t mean someone from Amazon is on your website, nor does traffic from AT&T or Xfinity.

For every IP, our process predicts the likelihood of the IP being an ISP, hosting provider, or business. The system is powered by an ML model that continually improves as we feed it more data. If it detects that an IP is an ISP or hosting provider, the API currently does not return a result for the company.

The IP classification can be found under the type attribute in the API response:

{
 "ip": "...",
 "fuzzy": false,
 "type": "isp",
 "company": null,
 "...": "..."
}
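
In practice this means a consumer of the API can filter on `type` and skip anything without company data. Here is a minimal sketch, assuming a list of already-parsed responses. The helper name `identified_companies` and the sample IPs (taken from documentation-reserved ranges) are hypothetical.

```python
from typing import Any, Dict, List


def identified_companies(responses: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the company object of every response classified as a business."""
    return [
        r["company"]
        for r in responses
        if r.get("type") == "business" and r.get("company")
    ]


# Made-up visitor lookups in the same shape as the API responses above
responses = [
    {"ip": "203.0.113.10", "fuzzy": False, "type": "isp", "company": None},
    {"ip": "198.51.100.7", "fuzzy": True, "type": "business", "company": {"domain": "fastenal.com"}},
]
print([c["domain"] for c in identified_companies(responses)])  # ['fastenal.com']
```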

The future

The current version of the API was designed for simplicity and a marketing / sales use case. We're open to any feedback, as the goal is to provide the most value for our customers.

If you’re interested in trying it out, you can create a free account here. If you have any questions, feedback, or would like to speak with someone on our team, feel free to contact us here.