How It Works¶
Reading this chapter is optional.
There are several techniques used to detecting unwanted visitors in web traffic.
This is the most primitive and naïve approach. A visitor gets blocked if one or several of its superficial attributes (IP address, user agent string, country, etc.) is blacklisted. This approach has a fatal flaw: blacklists are never complete and thus are trivial to circumvent.
A couple of examples:
- Google has access to virtually unlimited residential and mobile IP addresses around the world via its Google Global Cache (GGC) infrastructure and uses them to review ads, so blacklisting their well-known corporate IP networks is largely useless.
- Facebook occasionally reviews ads using Bright Data (ex-Luminati) residential proxies, which are impossible to blacklist due to their ever-changing nature.
Blacklisting is the only approach used by existing cloaking services on the market, Adspect being one of just a few exceptions. Many of our competitors lack feasible IPv6 address blacklists and thus insist on blocking of all visitors using IPv6, which is a terrible overkill.
Adspect maintains massive built-in IP address blacklists that count over two billion IPv4 addresses and tens of thousands of IPv6 networks.
Fingerprinting is the process of collecting and analyzing device fingerprints that identify visitors. Unlike human fingerprints that are universally unique, device fingerprints aren’t unique. Depending on implementation, they are composed of varying numbers of features, some of which are very common, like user agent strings of popular browsers. But some of the less common features happen to indicate with high accuracy the exact “bad” traffic that we protect against. And we know which.
Fingerprinting is a much more advanced technique normally used by business-oriented fraud protection companies. You may see their services employed, in particular, by value-added services (VAS) providers, protecting mobile “wap-click” offers from click fraud. Adspect is proud to call itself the pioneer of fingerprint scanning in the adtech industry.
Machine learning (ML) is a broad term colloquially referring to making computers learn and then use what they have learned to do their task. With respect to traffic protection, machine learning is used to analyze the features of each visitor to classify them as either legitimate or malicious. This can be done with great precision, given enough information to teach the learning model.
Machine learning makes a perfect solution for inspecting fingerprints. Adspect is powered by a proprietary machine learning technology called VLA™, constantly trained to detect features of bad traffic well beyond the criteria initially built into it.
VLA™ stands for Virtual Learning Appliance. It is the trademark of the machine learning technology that powers the most advanced filtering capabilities of Adspect. In simple terms, it is a self-adapting mathematical machine that observes incoming traffic and finds suspicious recurring patterns in its fingerprints (thousands of features in every fingerprint) that indicate moderators, fraud, and other malicious activity. VLA constantly teaches itself, evolving and adapting to new types of threats as they emerge. We believe that VLA is our strongest weapon in the race of arms of affiliate marketing as it is able to see well beyond what we initially put into it. What a human analyst may overlook will never escape the mathematically strict scrutiny of a carefully programmed machine.
Whereas fingerprint checks yield a close to 100% confidence in that a given fingerprint belongs to a bot (moderator, spy service, etc.), VLA is inherently probabilistic in nature. The real deal here is that fingerprint checks encompass only those threats that we already know of while VLA detects previously unknown dangers. It takes a fingerprint, inspects every feature encoded in it, and yields a confidence percentage, as if saying, e.g., “I am 97% sure that this fingerprint belongs to someone you better filter out!”
Now, it only remains to determine what confidence is high enough to trigger the filter, and the choice is yours where to draw that line. The VLA section of every stream has a “VLA precision” setting that serves that very purpose: you specify the minimum confidence that you require VLA to have in order to filter out a visitor. For example, if you set VLA precision to 95%, then VLA will filter out all visitors for which it yields certainty of 95% and above, but will let through those that it is less confident about. This single precision parameter lets you fine-tune the system in accordance to your own idea of what is “confident enough”. Our tests have shown that 95% is a good value to begin with.
Under the hood, VLA is a self-trained discrete Bayes classifier that maintains an extensive global dataset (template) and offspring per-stream datasets (specializations.) This means that it will accumulate stream-specific knowledge over time, adapting to the features of each particular traffic stream in Adspect.
Adspect employs all three of these techniques together without relying wholly on any single one. This allows us to make accurate decisions with the lowest rates of false positives and false negatives. We firmly believe that extensive fingerprinting coupled with machine learning appliances will play the leading role in defensive adtech because of the immense potential of both technologies, especially if combined.