How It Works

Reading this chapter is optional.

There are several approaches to detecting and filtering unwanted visitors in web traffic. In this chapter we will touch upon the basics of all of the major techniques of automatic filtering and explain what makes Adspect innovative and unique on the market.

Blacklisting

This is the most primitive, naïve, and widespread approach. It normally involves comparing a narrow set of features of a visitor (IP address, HTTP request headers, etc.) against a pre-collected blacklist. A match signals that the visitor should not be allowed further. While popular, this approach suffers from two major flaws:

  1. Blacklists are never exhaustive and are thus trivial to circumvent, e.g. by cycling through a very long list of available IP addresses during each campaign review, as specialized proxy services often facilitate. One cannot blacklist everything; there will always be wide gaps that allow malicious parties to get through. Entire companies make their business from maintaining vast pools of clean residential IP addresses that are available for a fee. Maintaining up-to-date blacklists of these proxy IP addresses is infeasible.
  2. Blacklists may be too broad, yielding false positives. This is especially bad with IPv4 address blacklists. The rather narrow 32-bit IPv4 address space has been exhausted, prompting Internet service providers and carriers to employ NAT (network address translation) to aggregate entire networks of subscribers behind a single shared IP address. This means that blacklisting, say, a single shared residential IP address in a large metropolitan area on suspicion of hosting a proxy (yes, there are ways to maintain proxies behind NATs) blocks thousands of legitimate potential visitors and leads to a high bounce rate.
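
As a rough illustration of the blacklisting approach and both of its flaws, here is a minimal Python sketch. It is not Adspect's implementation: the BLACKLIST contents and the is_blacklisted helper are hypothetical, and the addresses come from ranges reserved for documentation.

```python
import ipaddress

# Hypothetical blacklist: one whole /24 network and one single address.
# Both are taken from documentation-reserved ranges, not from real data.
BLACKLIST = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
]


def is_blacklisted(ip: str) -> bool:
    """Return True if the address falls inside any blacklisted network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLACKLIST)


# Flaw 1: a proxy that rotates to a neighbouring, unlisted address gets through.
print(is_blacklisted("198.51.100.7"))  # listed address
print(is_blacklisted("198.51.100.8"))  # fresh address, not on the list

# Flaw 2: every subscriber inside the listed /24 is rejected, legitimate or not.
print(is_blacklisted("203.0.113.42"))
```

The first pair of calls shows a false negative waiting to happen, and the last call shows how a broad entry turns into false positives for everyone sharing that range.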

Blacklisting is the most common, and often the only, approach used by cloaking services in the affiliate marketing domain. While viable in many cases, it is crude and unreliable and should not be used on its own. Blacklist false negatives are the most common cause of cloaking failures.

Adspect maintains massive built-in IP address blacklists of known bad traffic sources, totaling up to two billion addresses.

Fingerprinting

Fingerprinting is, as the name suggests, the process of collecting a fingerprint of a visitor that identifies them. However, unlike human fingerprints, which are unique to each person, machine fingerprints are not unique. Depending on the implementation, they are composed of varying numbers of features, some of which are very common, like the user agent strings of popular browsers. But some of the less common features happen to indicate, with high accuracy, exactly the “bad” traffic that we protect against, and we know which ones.

Fingerprinting is a much more advanced technique normally used by business-oriented fraud protection companies. You may see their services employed, in particular, by value-added services (VAS) providers, protecting mobile “wap-click” offers from click fraud. Adspect is proud to call itself the pioneer of fingerprint scanning in the adtech industry.

Adspect has great expertise in JavaScript fingerprinting, that is, analyzing fingerprints composed of features of the visitor’s JavaScript execution environment. Our fingerprints usually consist of 1600 to 2200 different facts, giving us a very detailed view into every visitor’s inner workings. We run collected fingerprints against dozens of high-precision tests that allow us to detect malicious visitors with unmatched accuracy. Adspect aims to bring high-end fraud protection into the realm of affiliate marketing.
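
To make the idea of running a fingerprint against a set of tests more concrete, here is a small Python sketch. Everything in it is an assumption made for illustration: the feature names, the three tests, and the failed_tests helper are hypothetical and say nothing about which features or tests Adspect actually uses.

```python
from typing import Callable, Dict, List, Tuple

# A fingerprint is modelled here as a flat mapping of feature name to value.
Fingerprint = Dict[str, object]
Test = Tuple[str, Callable[[Fingerprint], bool]]

# Hypothetical tests; each returns True when the fingerprint looks suspicious.
TESTS: List[Test] = [
    ("webdriver flag set",
     lambda fp: fp.get("navigator.webdriver") is True),
    ("zero-sized window",
     lambda fp: fp.get("window.outerWidth") == 0),
    ("headless user agent",
     lambda fp: "HeadlessChrome" in str(fp.get("navigator.userAgent", ""))),
]


def failed_tests(fp: Fingerprint) -> List[str]:
    """Return the names of all tests that flag this fingerprint."""
    return [name for name, test in TESTS if test(fp)]


suspicious = {
    "navigator.webdriver": True,
    "window.outerWidth": 1280,
    "navigator.userAgent": "Mozilla/5.0 HeadlessChrome/120.0",
}
print(failed_tests(suspicious))
```

Each test is deterministic: it either fires or it does not, which is why checks of this kind can only cover threats that are already known.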

Machine Learning

Machine learning (ML) is a broad term colloquially referring to making computers learn and then apply what they have learned to their task. With respect to traffic protection, machine learning is used to analyze the features of each visitor and classify them as either legitimate or malicious. This can be done with great precision, given enough data to train the model.

Machine learning makes a perfect solution for inspecting fingerprints. Adspect is powered by a proprietary machine learning technology called VLA™, constantly trained to detect features of bad traffic well beyond the criteria initially built into it.

VLA™ stands for Virtual Learning Appliance. It is the trademark of our machine learning technology that powers the most advanced filtering capabilities of Adspect. In simple terms, it is a self-adapting mathematical machine that observes incoming traffic and finds suspicious recurring patterns in its fingerprints (thousands of features in every fingerprint) that indicate moderators, fraud, and other malicious activity. VLA constantly teaches itself, evolving and adapting to new types of threats as they emerge. We believe that VLA is our strongest weapon in the arms race of affiliate marketing, as it is able to see well beyond what we initially put into it. What a human analyst may overlook will not escape the mathematically strict scrutiny of a carefully programmed machine.

The concept behind machine learning is best described by analogy. Suppose a policeman at an airport is instructed to detain all passengers with a specific tattoo, as they are known to be part of a dangerous gang. Over the last month the policeman has detained ten such people, noticing each time that they were also wearing T-shirts with the same symbol as their tattoo. From now on, the policeman will also stop people wearing those T-shirts under the same suspicion, regardless of whether they have the tattoo.

Whereas fingerprint checks yield close to 100% confidence that a given fingerprint belongs to a bot (moderator, spy service, etc.), VLA is inherently probabilistic. The key difference is that fingerprint checks cover only those threats that we already know of, while VLA detects previously unknown dangers. It takes a fingerprint, inspects every feature encoded in it, and yields a confidence percentage, as if saying, e.g., “I am 97% sure that this fingerprint belongs to someone you had better filter out!”

Now, it only remains to determine what confidence is high enough to trigger the filter, and the choice is yours where to draw that line. The VLA section of every stream has a “VLA precision” setting that serves that very purpose: you specify the minimum confidence that you require VLA to have in order to filter out a visitor. For example, if you set VLA precision to 95%, then VLA will filter out all visitors for which it yields a confidence of 95% or above, but will let through those that it is less confident about. This single precision parameter lets you fine-tune the system in accordance with your own idea of what is “confident enough”. Our tests have shown that 95% is a good value to begin with.
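
The thresholding rule described above fits in a few lines of Python. This is only a sketch of the decision rule as stated in the text; the function and parameter names are hypothetical.

```python
def should_filter(confidence_pct: float, vla_precision_pct: float = 95.0) -> bool:
    """Filter the visitor when the confidence reaches the configured precision."""
    return confidence_pct >= vla_precision_pct


print(should_filter(97.0))         # at or above the default 95% threshold
print(should_filter(90.0))         # below the threshold, visitor is let through
print(should_filter(97.0, 99.0))   # a stricter threshold lets the same visitor through
```

Raising the precision makes the filter more conservative (fewer false positives, more false negatives); lowering it does the opposite.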

Under the hood, VLA is a self-trained discrete Bayes classifier that maintains an extensive global dataset (the template) and offspring per-stream datasets (specializations). This means that it accumulates stream-specific knowledge over time, adapting to the features of each particular traffic stream in Adspect.
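
The template and specialization idea can be sketched with a toy discrete Bayes classifier in Python. This is not VLA itself, whose internals are proprietary. The sketch assumes a naive (feature-independent) model with add-one smoothing that scores only the features present in a fingerprint, and it assumes that a specialization simply adds its own counts on top of its parent's. The class name, feature names, and training data are all hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Optional

LABELS = ("bad", "good")


class DiscreteBayes:
    """Toy naive Bayes over sets of discrete features (illustrative only)."""

    def __init__(self, parent: Optional["DiscreteBayes"] = None):
        self.parent = parent  # global template, or None for the template itself
        self.class_counts = Counter()
        self.feature_counts = {label: Counter() for label in LABELS}

    def train(self, features: Iterable[str], label: str) -> None:
        """Record one labelled fingerprint in this dataset only."""
        self.class_counts[label] += 1
        self.feature_counts[label].update(set(features))

    def _count(self, label: str, feature: Optional[str] = None) -> int:
        """Own count plus whatever the parent template has accumulated."""
        if feature is None:
            own = self.class_counts[label]
        else:
            own = self.feature_counts[label][feature]
        inherited = self.parent._count(label, feature) if self.parent else 0
        return own + inherited

    def confidence(self, features: Iterable[str]) -> float:
        """Return P(bad | features) as a percentage."""
        total = sum(self._count(label) for label in LABELS)
        scores = {}
        for label in LABELS:
            n = self._count(label)
            score = math.log((n + 1) / (total + 2))  # smoothed class prior
            for f in set(features):
                score += math.log((self._count(label, f) + 1) / (n + 2))
            scores[label] = score
        top = max(scores.values())  # subtract the maximum for numerical stability
        weights = {label: math.exp(s - top) for label, s in scores.items()}
        return 100.0 * weights["bad"] / sum(weights.values())


template = DiscreteBayes()
for _ in range(3):
    template.train({"webdriver"}, "bad")
    template.train({"plugins"}, "good")

stream = DiscreteBayes(parent=template)  # per-stream specialization
for _ in range(3):
    stream.train({"odd_font"}, "bad")

print(template.confidence({"odd_font"}))  # the template has never seen this feature
print(stream.confidence({"odd_font"}))    # the stream has learned it is suspicious
```

In this sketch the stream's training never modifies the template, so knowledge learned on one stream stays local to it while every stream still benefits from the global dataset.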

Our Approach

Adspect employs all three of these techniques together without relying wholly on any single one. This allows us to make accurate decisions with the lowest rates of false positives and false negatives. We firmly believe that extensive fingerprinting coupled with machine learning appliances will play the leading role in defensive adtech because of the immense potential of both technologies, especially if combined.