November 2, 2016 By Paul Gillin 4 min read

If you want to know how important spam filters are to your online experience, try turning them off for a day. You’ll quickly see why these tools we tend to take for granted are so essential. We may not know how spam filters work, but we’re grateful that they do.

Spam volumes have been dropping in recent years, but there’s still plenty of junk out there. According to Trend Micro’s Global Spam Map, volumes exceed 400 billion messages on some days, but we almost never see spam in our inbox. Why is that?

In the cat-and-mouse game of cybersecurity, spam is one area where the good guys have kept reasonably well ahead of the bad. And the outlook for the future is bright: Machine learning could take spam filtering to a new level.

There are many approaches to catching spam, but they all do basically the same thing: scan header information for evidence of malice, look up senders on blacklists of known spammers and filter content for patterns that point to junk mail. The first two tasks are mostly science — the third is art.

Deciphering Header Data

Header information is that long river of text at the top of an email that you thankfully never have to see. It looks like this:

Received: by 10.107.191.69 with SMTP id p66csp1537538iof

X-Received: by 10.107.175.218 with SMTP id p87mr2784731ioo.80.1477075567036

Fri, 21 Oct 2016 11:46:07 -0700 (PDT)

Buried beneath all that gobbledygook is important information: the IP address of every server that touched the email, date and time stamps, security signatures and other details you don’t need to read but that help establish where the mail came from. Spam filters look for attempts to deceive the recipient (e.g., g00gle.com instead of google.com) and compare addresses to blacklists of known spammers, automatically filtering out those that match.
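The two header checks described above can be sketched with Python's standard email module. This is a minimal illustration, not how any particular filter is implemented: the From line, the KNOWN_DOMAINS set and the digit-substitution table are assumptions made up for the example, and the date line is assumed to be a folded continuation of the X-Received header.

```python
import re
from email import message_from_string
from email.utils import parseaddr

# Header lines quoted from the article. The From line and body are invented
# for this example, and the date is assumed to continue the X-Received header.
RAW = (
    "Received: by 10.107.191.69 with SMTP id p66csp1537538iof\n"
    "X-Received: by 10.107.175.218 with SMTP id p87mr2784731ioo.80.1477075567036\n"
    "        Fri, 21 Oct 2016 11:46:07 -0700 (PDT)\n"
    "From: Support <support@g00gle.com>\n"
    "\n"
    "Body text\n"
)

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

# Hypothetical list of domains worth protecting from lookalikes.
KNOWN_DOMAINS = {"google.com", "paypal.com"}

# Digits commonly swapped in for letters (an illustrative subset).
LOOKALIKE_DIGITS = str.maketrans({"0": "o", "1": "l", "3": "e", "5": "s"})


def relay_ips(msg):
    """Collect every IPv4 address found in the Received-style headers."""
    ips = []
    for name in ("Received", "X-Received"):
        for value in msg.get_all(name, []):
            ips.extend(IPV4.findall(value))
    return ips


def sender_domain(msg):
    """Return the lowercased domain part of the From address."""
    _, address = parseaddr(msg.get("From", ""))
    return address.rpartition("@")[2].lower()


def looks_like_spoof(domain):
    """True if undoing digit-for-letter swaps yields a known domain."""
    normalized = domain.translate(LOOKALIKE_DIGITS)
    return normalized != domain and normalized in KNOWN_DOMAINS


msg = message_from_string(RAW)
print(relay_ips(msg))
print(sender_domain(msg), looks_like_spoof(sender_domain(msg)))
```

The extracted addresses are what a filter would then compare against a blacklist.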

Blacklists

Blacklists are lists of known spammers collected by internet service providers (ISPs), email providers and server administrators. Anyone can create and publish a blacklist, but the most popular ones, such as SpamCop, Spamhaus and URIBL, have the most credibility. Publishers create these lists by monitoring spam reports from users. That’s why it’s important to label unwanted email as spam. When you do so, you’re helping to keep everyone’s mailbox pristine.
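Many blacklists are published over DNS: the filter reverses the octets of the sending IP address, appends the list's zone and checks whether that name resolves. The sketch below assumes that lookup style and uses Spamhaus's zen.spamhaus.org zone as the default; the resolver is passed in as a parameter so the logic can be tried without network access. A production filter would also inspect the returned address, since operators use it to signal which list matched or that the query was refused.

```python
import socket


def dnsbl_query_name(ip, zone="zen.spamhaus.org"):
    """Reverse the IPv4 octets and append the blacklist zone."""
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError(f"not an IPv4 address: {ip!r}")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone="zen.spamhaus.org", resolve=socket.gethostbyname):
    """Return True if the zone answers for this IP, False if the name is unknown."""
    try:
        resolve(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False
    return True


# The name a filter would look up for the first relay in the sample header.
print(dnsbl_query_name("10.107.191.69"))
```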

Smart spammers have ways of disguising header information to make their messages look genuine. Not all spammers are smart, however, so header analysis alone catches a lot of the most obvious spam. Even spammers who are good at cloaking information may overlook some telltale details. If delivery reporting is disabled, for example, it’s a sign that the sender is transmitting a large volume of mail and doesn’t want to be bothered with bounce messages. That’s a possible spammer.

There’s no single rule for how spam filters work; each engine has its own quirks. Some frown on email sent from free services like Hotmail and Gmail, for example, while others downgrade messages addressed to a bare email address without an accompanying name. Fortunately, email administrators can adjust most of these settings to their liking.

Content Filters

The art of spam filtering comes into play when analyzing the contents of a message. This is where the best filters shine, but it’s also where legitimate messages can end up in spam purgatory.

Some content tactics are almost certain to land a message in the spam folder. Emails containing attached executable files or links to blacklisted websites are sure giveaways, as are those with common spam keywords. A few years ago, many spam filters flagged emails containing shortened links from services like bit.ly and 3.ly. With the profusion of shortened links spawned by Twitter, however, that tactic is less common today.
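The giveaways listed above lend themselves to a simple additive score, where each rule contributes points and the message is flagged once the total crosses a threshold. The sketch below is one hypothetical way to arrange that; the keyword set, the blacklisted host, the file extensions, the weights and the threshold are all made-up values of the kind an administrator would tune.

```python
import re
from dataclasses import dataclass, field

# All of the values below are illustrative assumptions, not a real rule set.
EXECUTABLE_EXTENSIONS = (".exe", ".scr", ".bat")
SPAM_KEYWORDS = {"viagra", "lottery", "winner"}
BLACKLISTED_HOSTS = {"malware.example"}
DEFAULT_WEIGHTS = {"executable": 5.0, "bad_link": 5.0, "keyword": 1.5}
SPAM_THRESHOLD = 5.0

URL_HOST = re.compile(r"https?://([^/\s]+)", re.IGNORECASE)


@dataclass
class Message:
    body: str
    attachments: list = field(default_factory=list)  # attachment file names


def content_score(message, weights=DEFAULT_WEIGHTS):
    """Add up the points for every content rule the message trips."""
    score = 0.0
    for name in message.attachments:
        if name.lower().endswith(EXECUTABLE_EXTENSIONS):
            score += weights["executable"]
    for host in URL_HOST.findall(message.body):
        if host.lower() in BLACKLISTED_HOSTS:
            score += weights["bad_link"]
    words = set(re.findall(r"[a-z0-9']+", message.body.lower()))
    score += weights["keyword"] * len(words & SPAM_KEYWORDS)
    return score


def is_spam(message, weights=DEFAULT_WEIGHTS, threshold=SPAM_THRESHOLD):
    return content_score(message, weights) >= threshold


suspect = Message("You are a WINNER! Claim at http://malware.example/prize")
print(content_score(suspect), is_spam(suspect))
```

Keeping the weights in a plain dictionary mirrors the point that each engine scores things differently and that administrators can adjust the settings.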

If those schemes are so easily detected, you might wonder why spammers continue to use them. Unfortunately, there are enough gullible people out there that even a very low hit rate can be profitable. High-volume spammers don’t expect more than about a 0.1 percent open rate, but that still translates to 1,000 people for every 1 million messages sent.

“When you get a reply, it’s 70 percent sure that you’ll get the money,” one spammer told the Los Angeles Times in a 2005 interview. Although much has changed since then, even a minuscule response rate can be profitable if the volumes are large enough, and spam is free and easy to send.

Machine Learning: Changing How Spam Filters Work

With the advent of powerful machine learning algorithms and big data economics, there’s potential to change how spam filters work.

Apache SpamAssassin is a widely used platform that incorporates advanced statistical techniques to score incoming messages. The same tactics that are applied to detecting fraudulent reviews on travel and e-commerce sites can work in spam analysis as well. When you mark a message as spam, it goes into a hopper with millions of messages that others have flagged. Algorithms churn through these messages to find similar characteristics, such as word proximity or misspellings, that show up frequently in spam.
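One common statistical technique for learning from flagged messages is a naive Bayes classifier, which compares how often each word appears in known spam versus known legitimate mail. The sketch below is a toy version written for illustration only: the class name, the four training messages and the tokenizer are assumptions, and it looks at individual words rather than richer signals such as word proximity.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9$']+", text.lower())


class NaiveBayesSpamFilter:
    """Toy classifier; train on at least one spam and one ham message first."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def _log_score(self, tokens, label):
        counts = self.word_counts[label]
        total_words = sum(counts.values())
        vocabulary = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        # Start from the prior: the share of training messages with this label.
        score = math.log(self.message_counts[label] / sum(self.message_counts.values()))
        for token in tokens:
            # Add-one smoothing keeps unseen words from zeroing the probability.
            score += math.log((counts[token] + 1) / (total_words + vocabulary))
        return score

    def is_spam(self, text):
        tokens = tokenize(text)
        return self._log_score(tokens, "spam") > self._log_score(tokens, "ham")


# Made-up training data standing in for messages users have flagged.
spam_filter = NaiveBayesSpamFilter()
spam_filter.train("Win cash now, claim your free prize", "spam")
spam_filter.train("Free pills, cheap meds, act now", "spam")
spam_filter.train("Meeting moved to Monday at noon", "ham")
spam_filter.train("Here are the notes from the project review", "ham")

print(spam_filter.is_spam("Claim your free cash prize"))
print(spam_filter.is_spam("Notes from the Monday meeting"))
```

With millions of flagged messages instead of four, the same counting approach picks up patterns no one wrote a rule for.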

Cloud computing is also changing the rules of spam filtering by making more powerful filters available to a broader audience at lower cost. Cloud services are increasingly displacing on-premises filters, bringing the benefits of economies of scale. Because cloud providers collect data from many sources, they can compile large databases for machine learning processing. The result should be better content filtering.

You can fine-tune your own spam settings by specifying senders or domains to exclude. Some email administrators even like to loosen controls to be sure legitimate messages don’t get caught. Either way, it’s a good idea to check your spam folder every few days to ensure messages you’ve been waiting for aren’t lurking there. Spam filters are pretty good these days, but nothing’s perfect.
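The sender and domain exclusions mentioned above amount to a lookup that runs before any other check. A minimal sketch follows, assuming the lists are stored in lowercase; the function name and the example addresses are hypothetical.

```python
def is_excluded_from_filtering(sender, allowed_senders=(), allowed_domains=()):
    """True if the sender, its domain or a parent domain is on an allow list."""
    address = sender.strip().lower()
    domain = address.rpartition("@")[2]
    if address in allowed_senders:
        return True
    # Match the domain itself or any subdomain of an allowed domain.
    return any(domain == d or domain.endswith("." + d) for d in allowed_domains)


print(is_excluded_from_filtering("Boss@Example.com", allowed_senders={"boss@example.com"}))
```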
