FAQ | Specifications | Typosquatting Data Feed | WhoisXML API

FAQ

FAQ version: 3.0 dated 25 August, 2020.

Data definition

The "Typosquatting data feed" captures groups of at least three DNS domains such that:

  • each domain in the group appeared on the same day in the DNS zone file
  • the domain names within the group are similar to each other.

Appearance on the same day in the zone file normally means that the domains were registered on the same day, close to the given date. By "similar to each other" we mean that the names are close according to a suitably chosen, algorithmically computable mathematical measure.
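To make the notion of a group of similar names more concrete, the following minimal sketch groups domain names by edit distance. It is an illustration only: this FAQ does not state which similarity measure the feed uses, so the Levenshtein distance, the threshold max_distance, and the function names levenshtein and similar_groups are all assumptions introduced here. The minimum group size of 3 follows the definition above.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similar_groups(domains, max_distance=2, min_size=3):
    """Group names linked by small edit distances (assumed measure).

    Names are joined into one group when a chain of pairs, each within
    max_distance, connects them. Groups smaller than min_size are dropped.
    """
    domains = sorted(set(domains))
    parent = {d: d for d in domains}

    def find(d):
        # Union-find lookup with path halving.
        while parent[d] != d:
            parent[d] = parent[parent[d]]
            d = parent[d]
        return d

    for a, b in combinations(domains, 2):
        if levenshtein(a, b) <= max_distance:
            parent[find(a)] = find(b)

    groups = {}
    for d in domains:
        groups.setdefault(find(d), []).append(d)
    return sorted(g for g in groups.values() if len(g) >= min_size)


# Hypothetical names that appeared on the same day.
new_today = ["examp1e.com", "exampel.com", "example.com",
             "unrelated.org", "other.net"]
print(similar_groups(new_today))
```

The pairwise comparison is quadratic in the number of names, so it is only suitable for small illustrative inputs, not for a full zone file.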

The data generator searches for such groups among the domains in the top-level domains covered by the following daily data feeds:

This set covers almost all generic top-level domains and a number of country-code top-level domains.

Reasons for a domain to appear listed in the feed

Important disclaimer: a domain listed in this feed is not necessarily malicious or related to any dangerous activity. Nevertheless, it is a fact that if a domain is listed:

  • There exist at least two additional domains with names similar to that of the given domain, so there is an increased risk of reaching a different domain by mistyping or misunderstanding the domain name.
  • The domain has been registered together with similarly named domains on the same day.

The possible reasons for a malicious domain to appear in this feed include:

  • Being registered in a burst for phishing purposes, typically to resemble the name of a brand.
  • Being registered in bulk in a group of machine-generated domain names to be used by malware, e.g. as a potential command-and-control server.
  • Being registered for typosquatting pages related to other known pages or brands (for ad money collection, or sometimes for phishing).

The possible reasons for a benign domain appearing in the feed include:

  • Resulting from brand protection by the brand owner of a similar domain name.
  • Being an element of a set of algorithmically generated domain names used e.g. for load balancing or assigning domain names to entities in bulk.

Data format

Recommended applications

The data feeds hold domains which, though possibly benign, are prone to typosquatting or phishing attacks and malware-related activity. It is therefore recommended to double-check them before using them for any purpose (e.g. opening them in a web browser) to maintain cybersecurity.

The listing can also be studied, manually or algorithmically, for various purposes:

  • Proactive IT security investigations to reveal a future typosquatting or phishing attack. Correlating the listed domains with malware blacklists can extend a list of known compromised domains. Checking WHOIS and other technical data of the domains is also recommended.
  • Legal investigations related to past cybersecurity incidents.
  • Brand protection to reveal or prevent misuse of brand names and domain names.
  • Research of domain name registration activity trends, etc.
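The correlation with malware blacklists mentioned in the first item could be sketched as follows. This is a minimal sketch under stated assumptions: the groups are given as plain in-memory lists of domain names, which is not the feed's file format, and the function name expand_suspects and all sample domains are hypothetical. The idea is that when one member of a group is already blacklisted, the other members of that group become candidates for closer inspection. They are candidates only, since a listed domain is not necessarily malicious.

```python
def expand_suspects(groups, blocklist):
    """Return domains that share a group with a blacklisted domain.

    groups:    iterable of groups, each an iterable of domain names
               (hypothetical in-memory form, not the feed's file format)
    blocklist: iterable of domain names already known to be malicious
    """
    blocked = {d.lower() for d in blocklist}
    suspects = set()
    for group in groups:
        members = {d.lower() for d in group}
        if members & blocked:
            # At least one member is blacklisted: flag the rest.
            suspects |= members - blocked
    return sorted(suspects)


# Hypothetical sample data.
groups = [
    ["examp1e.com", "exampel.com", "exarnple.com"],
    ["shop-a.net", "shop-b.net", "shop-c.net"],
]
blocklist = {"exampel.com"}
print(expand_suspects(groups, blocklist))
```

The resulting candidates can then be checked against WHOIS and other technical data, as recommended above.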

Consult our blogs at the product webpage for further details.