LAST UPDATED April 4, 2022
The following is the first post in a three-part series on bot detection and neutralization based on botnet analysis. The series begins by addressing commodity form/comment spam.
One of the unfortunate realities of running a site on the Internet is the amount of “background noise”: the automated, unsophisticated, poorly targeted attacks that make up the bulk of malicious web traffic. For the sake of this series, we’re calling this ‘botnet’ traffic.
Detecting Bot Behavior – Form Spam
The Behavior
We’ve written before about active interrogation as a method for distinguishing bots from human users, but sometimes telling the two apart requires far less precision and no active measures at all.
Tracking Spammers
Form spam looks very different from legitimate use of a form. A legitimate user will post once or twice, at a rate consistent with typing messages by hand and with, you know, having other things to do, sleeping, etc. A spammer or commodity bot, on the other hand, just doesn’t look right: its POSTs are sustained, even if at a low rate, the content doesn’t match the site it’s posted to, and the posts overwhelmingly include links.
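As a rough illustration of how two of these behavioral signals might be combined without reading posts for meaning, here is a minimal sketch that looks at one client's posts and checks how many links they carry and how fast or how persistently the client keeps posting. The `FormPost` record, the `looks_like_spam` function, and every threshold in it are hypothetical and chosen only for illustration. The content-mismatch signal is left out because it needs site-specific knowledge.

```python
import re
from dataclasses import dataclass

# Hypothetical thresholds, for illustration only; tune them against real traffic.
MAX_POSTS_PER_HOUR = 5    # more than this in one clock hour is faster than hand-typing
MIN_ACTIVE_HOURS = 3      # posting spread over this many distinct hours is "sustained"
MAX_LINKS_PER_POST = 1.0  # legitimate comments rarely average more than one link

LINK_PATTERN = re.compile(r"https?://", re.IGNORECASE)


@dataclass
class FormPost:
    epoch: int  # time the POST was received, in epoch seconds
    body: str   # raw (URL-encoded) form body


def looks_like_spam(posts):
    """Return True if one client's form posts look automated rather than human."""
    if not posts:
        return False
    # bucket posts by clock hour to measure both speed and persistence
    per_hour = {}
    for post in posts:
        hour = post.epoch // 3600
        per_hour[hour] = per_hour.get(hour, 0) + 1
    too_fast = max(per_hour.values()) > MAX_POSTS_PER_HOUR
    sustained = len(per_hour) >= MIN_ACTIVE_HOURS
    # average number of links per post
    avg_links = sum(len(LINK_PATTERN.findall(p.body)) for p in posts) / len(posts)
    link_heavy = avg_links > MAX_LINKS_PER_POST
    # link-heavy posting that is either fast or persistent looks like a bot
    return link_heavy and (too_fast or sustained)


human = [FormPost(0, "Great+post,+thanks!"), FormPost(7200, "See+http://example.com")]
bot = [FormPost(h * 3600, "buy+http://x.example+http://y.example") for h in range(4)]
print(looks_like_spam(human), looks_like_spam(bot))  # prints False True
```

Note that neither check depends on the source IP. Both are judged from the posting pattern itself, so the same logic could later be applied to a group of clients instead of a single one.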
Some form spam examples:
comment=
331>Air+Max+90+Gialle +Make+existence+skills+an+element+of+your+home+schooling+encounter.+Training+a+young+child+to+harmony+a+checkbook,+prepare+a+dish+or+shingle+a+roof+top+has+many+worth.+Moreover,+various+subject+areas,+which+include+math+concepts,+looking+at+and+technology+may+be+incorporated+into+these+ability+lessons.+It+is+a+smart+way+for+a+child+to+have+genuine-world+encounter,+achieve+a+valuable+expertise+and+require+a+hands-on+strategy+to+their+discovering+path. +http://www.some-more-spam[.]com/balenciaga-envelope-clutch-with-strap-review-545.html +Engage+in+golf+using+a+buddy+rather+than+single+if+you+really+want+to+boost+your+video+game.+Not+only+will+you+be+capable+of+talk+about+tips+and+phrases+of+assistance+having+a+close+friend,+and+viceversa,+but+there’s+yet+another+tiny+rivalry+there+that+will+draw+out+the+most+effective+within+you.
[.]it/nike-free-5.0-uomo-nere>Nike+Free+5.0+Uomo+Nere +http://wow-i-guess-youre-going-for-3-spam-sites[.]com/635-converse-black-high-tops-mens.html

author=EugeneLieft&email=inbox458@a-mail-host[.]&comment=it’s+my+first+time+visiting+your+site+and+I+am+very+fascinated.+Thanks+for+sharing+and+keep+up+;)+ [url=http://www.spam-site[.]nl/2016/03/12/essay-writing-online-service-custom-paper-writing/]http://www.spam-site[.]nl/2016/03/12/essay-writing-online-service-custom-paper-writing/[/url]&recaptcha_challenge_field=it’s+my+first+time+visiting+your+site+and+I+am+very+fascinated.+Thanks+for+sharing+and+keep+up+;)+ [url=http://www.spam-site[.]nl/2016/03/12/essay-writing-online-service-custom-paper-writing/]http://www.spam-site[.]nl/2016/03/12/essay-writing-online-service-custom-paper-writing/[/url] &recaptcha_response_field=manual_challenge&submit=Post+Comment&comment_post_ID=841&comment_parent=0

ohid=709498&chkshowshipaddr=1&savestep=g1&nextstep=&offer_2=offer_2_182_US_IGZN&ship_firstname=daytona&ship_lastname=180&ship_address=180&ship_address2=180&ship_city=New+York&ship_country=US&ship_state=68&ship_zip=180&countryinput=180&giftemail=1&giftemailaddress=11849@another-mail-host[.]com&giftdate=60165@another-mail-host[.]&gifttext=[b]daytona[/b][b][url=http://spam-site[.]com/cgi_bin/]daytona[/url][/b][b][url=http://spam-site[.]com/cgi_bin/]rolex+oyster+perpetual+date+price+list[/url][/b]
spam-site[.]com/cgi_bin/”>daytona
com/cgi_bin/”>daytona
cgi_bin/”>rolex+oyster+perpetual+date+price+list
&giftfromname=41053@a-mail-host[.]com&submit=Next+Recipient
These posts are easily recognized as spam by a human reader, and many popular blogging platforms can detect them and send them to a spam folder if configured to do so. However, this typically relies on IP reputation (error-prone), analysis of every single post (inefficient), or a phrase denylist (inefficient AND error-prone).
Protecting Your Application
Bot Behavior – Distributed Attacks
Distributed Attack Behavior
Botnets let an attacker avoid basic volumetric detection of their malicious traffic by spreading the same overall number of malicious requests across multiple bots under their control. To a web server with no way to aggregate or group these bots (because each uses a unique source IP address), a distributed attack can look very similar to legitimate user traffic.
For a security analyst or site administrator, this means you can no longer simply ask, “Why is this IP address sending us 1,000 times as many requests as a regular user?” Nor can you ask, “This source IP address has failed to log in 100 times in the past few minutes; can we block it?” That leaves your web application open to several classes of distributed attacks, illustrated below:
Case #1: Distributed Password Guessing/Brute Force Attack
The following example traffic shows a number of HTTP requests to a WordPress login page. Note that each request comes from a different IP address but sends the same log=Administrator username field.
10.73.140.3 POST https://wordpress.threatxlabs.local/wp-login.php log=Administrator&pwd=Password1
10.73.140.4 POST https://wordpress.threatxlabs.local/wp-login.php log=Administrator&pwd=Abcdefg
10.73.140.5 POST https://wordpress.threatxlabs.local/wp-login.php log=Administrator&pwd=1qwerty
10.74.220.1 POST https://wordpress.threatxlabs.local/wp-login.php log=Administrator&pwd=@admin@
10.74.220.2 POST https://wordpress.threatxlabs.local/wp-login.php log=Administrator&pwd=Administrator99
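To see why per-IP counting misses this attack, here is a minimal sketch that parses the five requests above and counts login attempts both per source IP and per submitted username. The parsing assumes the simplified `IP METHOD URL BODY` layout shown here, not any particular web server's log format. `count_logins` is a hypothetical helper written for illustration.

```python
from collections import Counter
from urllib.parse import parse_qs

# the example requests from Case #1, in the simplified "IP METHOD URL BODY" layout
LOG = """\
10.73.140.3 POST https://wordpress.threatxlabs.local/wp-login.php log=Administrator&pwd=Password1
10.73.140.4 POST https://wordpress.threatxlabs.local/wp-login.php log=Administrator&pwd=Abcdefg
10.73.140.5 POST https://wordpress.threatxlabs.local/wp-login.php log=Administrator&pwd=1qwerty
10.74.220.1 POST https://wordpress.threatxlabs.local/wp-login.php log=Administrator&pwd=@admin@
10.74.220.2 POST https://wordpress.threatxlabs.local/wp-login.php log=Administrator&pwd=Administrator99
"""


def count_logins(log_text):
    """Count wp-login.php POSTs per source IP and per submitted username."""
    per_ip, per_user = Counter(), Counter()
    for line in log_text.splitlines():
        ip, method, url, body = line.split(" ", 3)
        if method != "POST" or not url.endswith("/wp-login.php"):
            continue
        per_ip[ip] += 1
        # parse_qs maps each form field to a list of its values
        username = parse_qs(body).get("log", [""])[0]
        per_user[username] += 1
    return per_ip, per_user


per_ip, per_user = count_logins(LOG)
print(max(per_ip.values()))       # prints 1: no single IP stands out
print(per_user["Administrator"])  # prints 5: one account is under attack
```

A per-IP threshold never fires here, but keying on the targeted account exposes the attack. The targeted account is only one possible grouping key, chosen because it is what these five requests share.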
Case #2: Distributed Parameter Fuzzing (local file inclusion)
Our second example shows several bot actors attempting to abuse a debugging script to read sensitive system data through a potential local file inclusion (LFI) vulnerability.
10.120.10.6 GET https://www.threatxlabs.local/path/to/debug/script.php?path=../../../../../../../etc/passwd
10.120.10.6 GET https://www.threatxlabs.local/path/to/debug/script.php?path=../../../../../../etc/passwd
10.115.70.8 GET https://securecheckout.threatxlabs.local/path/to/debug/script.php?path=../../../../../../etc/passwd
10.115.70.8 GET https://securecheckout.threatxlabs.local/path/to/debug/script.php?path=’/system/.restricted’
10.115.70.4 GET https://securecheckout.threatxlabs.local/path/to/debug/script.php?path=/system/.restricted
10.115.70.4 GET https://securecheckout.threatxlabs.local/path/to/debug/script.php?path=../../../etc/passwd
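In the same spirit, a minimal sketch can flag the traversal payloads in Case #2 and group them by the targeted resource path instead of by source IP or host. `TRAVERSAL_MARKERS` is a deliberately naive, hypothetical signature list used only for illustration. Real LFI detection would first decode and normalize the parameter. The log layout is the simplified `IP METHOD URL` form shown above, and the quoted payload is written with straight quotes.

```python
from collections import defaultdict
from urllib.parse import parse_qs, urlsplit

# the example requests from Case #2, in the simplified "IP METHOD URL" layout
LOG = """\
10.120.10.6 GET https://www.threatxlabs.local/path/to/debug/script.php?path=../../../../../../../etc/passwd
10.120.10.6 GET https://www.threatxlabs.local/path/to/debug/script.php?path=../../../../../../etc/passwd
10.115.70.8 GET https://securecheckout.threatxlabs.local/path/to/debug/script.php?path=../../../../../../etc/passwd
10.115.70.8 GET https://securecheckout.threatxlabs.local/path/to/debug/script.php?path='/system/.restricted'
10.115.70.4 GET https://securecheckout.threatxlabs.local/path/to/debug/script.php?path=/system/.restricted
10.115.70.4 GET https://securecheckout.threatxlabs.local/path/to/debug/script.php?path=../../../etc/passwd
"""

# hypothetical, deliberately naive markers of a file-inclusion attempt
TRAVERSAL_MARKERS = ("../", "/etc/passwd", "/system/")


def group_lfi_attempts(log_text):
    """Group requests with traversal-like parameters by resource path."""
    groups = defaultdict(lambda: {"ips": set(), "hosts": set()})
    for line in log_text.splitlines():
        ip, _method, url = line.split(" ", 2)
        parts = urlsplit(url)
        # flatten every value of every query parameter
        values = [v for vals in parse_qs(parts.query).values() for v in vals]
        if any(marker in value for value in values for marker in TRAVERSAL_MARKERS):
            groups[parts.path]["ips"].add(ip)
            groups[parts.path]["hosts"].add(parts.hostname)
    return dict(groups)


for path, seen in group_lfi_attempts(LOG).items():
    print(path, len(seen["ips"]), "IPs across", sorted(seen["hosts"]))
```

Grouping on the resource path folds three source IPs and two hostnames into a single entry. This is the same intuition the clustering heuristic later in this series builds on.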
Gaining Visibility Into Malicious Traffic
Blocking individual bots or source IP addresses is a losing battle against a distributed attack. If the attack traffic has any particularly distinctive characteristics, such as an unusual User-Agent HTTP header or a very specific request URL, it may make sense to block requests that match them. Ultimately, however, we need a more robust approach.
So how can you detect and block a distributed attack?
To detect and block these kinds of distributed attacks, we instead need to look for trends in how traffic is distributed across the pages of a given site. Simple traffic baselining tells us how much traffic a page typically receives, as well as how many requests a typical client sends to it. A basic alerting heuristic, then, might be: “We usually receive 10 login requests a minute; suddenly we’re seeing 1,000.”
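That alerting heuristic can be sketched as follows, assuming per-minute request counts have already been collected for each URL path. The `find_spikes` function, the mean-based baseline, the `min_baseline` floor, and the default `spike_factor` of 10 are all illustrative assumptions, not a recommended configuration.

```python
from statistics import mean


def find_spikes(history, current, spike_factor=10.0, min_baseline=1.0):
    """Return {path: (baseline, count)} for paths far above their usual rate.

    history: URL path -> list of past per-minute request counts
    current: URL path -> request count observed in the latest minute
    """
    alerts = {}
    for path, count in current.items():
        past = history.get(path)
        # baseline is the mean past rate, floored so that rarely seen
        # (or never seen) paths don't alert on a handful of requests
        baseline = max(mean(past), min_baseline) if past else min_baseline
        if count > spike_factor * baseline:
            alerts[path] = (baseline, count)
    return alerts


# "We usually receive 10 login requests a minute; suddenly we're seeing 1,000."
history = {"/wp-login.php": [9, 11, 10, 10], "/": [120, 130, 125, 125]}
current = {"/wp-login.php": 1000, "/": 140}
print(find_spikes(history, current))
```

This check counts requests per page rather than per client. As a result, it still fires when the 1,000 login attempts come from 1,000 different IP addresses.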
Behavioral Analysis – Grouping Bot Actors
Tracking Behavior Across Multiple Actors
As we saw in the previous post, Bot Behavior – Distributed Attacks, bots can be used to evade basic volumetric attack detection. Because each bot may use a unique source IP address, aggregating the traffic for alerting, reporting, or blocking can be difficult.
We can try to work around this by baselining traffic patterns for a given URL within a web application, but rules built on these traffic norms are not always sufficient to distinguish botnet traffic and take appropriate action against offenders.
To complement web application baselining we also need a method for grouping attackers by their behavior. Below is an example heuristic for clustering attacks based on common characteristics: the first step in using behavior to identify likely botnets.
Clustering Attacks – Analysis
For the purpose of this analysis, the overall profile of an attacker, or “threat actor”, is made up of all of their attacks, which are just a subset of their HTTP requests (those we’ve already decided are malicious in some way).
If one actor’s attacks are similar enough to another threat actor’s, we can group the two and call the combined entity a ‘botnet’, attributing control of each threat actor to a single entity or individual. In other words, each actor is a bot, and its individual risk, behavior, etc. make up one component of the whole entity’s.
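One way to represent these profiles is sketched below. It makes two simplifying assumptions: a threat actor is identified by its source IP, and two actors are "similar enough" if they attacked at least one common path. The `build_profiles` and `group_actors` helpers and that shared-path merge rule are hypothetical. The clustering example later in this post uses a different, score-based rule.

```python
from collections import defaultdict


def build_profiles(attacks):
    """Group attack records (dicts with an 'ip' key) into per-actor profiles."""
    profiles = defaultdict(list)
    for attack in attacks:
        profiles[attack["ip"]].append(attack)
    return dict(profiles)


def group_actors(profiles):
    """Merge actors that attacked at least one common path (hypothetical rule).

    Uses a small union-find so that chains of shared paths end up in one group.
    """
    parent = {ip: ip for ip in profiles}

    def find(ip):
        # walk up to the group's representative, halving the path as we go
        while parent[ip] != ip:
            parent[ip] = parent[parent[ip]]
            ip = parent[ip]
        return ip

    first_actor_for_path = {}
    for ip, attacks in profiles.items():
        for attack in attacks:
            other = first_actor_for_path.setdefault(attack["path"], ip)
            parent[find(ip)] = find(other)

    groups = defaultdict(set)
    for ip in profiles:
        groups[find(ip)].add(ip)
    # largest groups first; ties broken by their sorted members
    return sorted(groups.values(), key=lambda g: (-len(g), sorted(g)))


# a few records from attacks.json, trimmed to the fields used here
attacks = [
    {"ip": "10.73.140.3", "path": "/wp-login.php"},
    {"ip": "10.73.140.4", "path": "/wp-login.php"},
    {"ip": "10.74.220.1", "path": "/wp-login.php"},
    {"ip": "10.73.150.8", "path": "/includes"},
    {"ip": "10.73.150.8", "path": "/wwwstats"},
]
for group in group_actors(build_profiles(attacks)):
    print(sorted(group))
```

Merging on a single shared path is deliberately crude. A common path such as `/` would merge unrelated actors, which is one reason a practical heuristic also weighs other fields such as hostname and timing.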
Example attacks.json
The following 23 records are a stripped-down and sanitized set of attack logs from the ThreatX WAF, containing only enough information for the clustering example in the next section. In practice, each attack log has more than 15 fields on which we could base clustering decisions.
[
{"ip":"10.120.10.6","timestamp":"2018-06-25T13:09:03+00:00","hostname":"www.threatxlabs.local","path":"/path/to/debug/script.php"},
{"ip":"10.120.10.6","timestamp":"2018-06-25T13:09:03+00:00","hostname":"www.threatxlabs.local","path":"/path/to/debug/script.php"},
{"ip":"10.254.40.5","timestamp":"2018-06-25T13:13:51+00:00","hostname":"app-dev.threatxlabs.local","path":"/"},
{"ip":"10.254.40.5","timestamp":"2018-06-25T13:17:33+00:00","hostname":"app-dev.threatxlabs.local","path":"/"},
{"ip":"10.115.70.8","timestamp":"2018-06-25T13:23:54+00:00","hostname":"securecheckout.threatxlabs.local","path":"/path/to/debug/script.php"},
{"ip":"10.115.70.8","timestamp":"2018-06-25T13:23:54+00:00","hostname":"securecheckout.threatxlabs.local","path":"/"},
{"ip":"10.115.70.8","timestamp":"2018-06-25T13:23:54+00:00","hostname":"securecheckout.threatxlabs.local","path":"/path/to/debug/script.php"},
{"ip":"10.115.70.4","timestamp":"2018-06-25T13:23:54+00:00","hostname":"securecheckout.threatxlabs.local","path":"/path/to/debug/script.php"},
{"ip":"10.115.70.4","timestamp":"2018-06-25T13:23:55+00:00","hostname":"securecheckout.threatxlabs.local","path":"/path/to/debug/script.php"},
{"ip":"10.73.140.3","timestamp":"2018-06-25T13:24:02+00:00","hostname":"wordpress.threatxlabs.local","path":"/wp-login.php"},
{"ip":"10.73.140.4","timestamp":"2018-06-25T13:24:21+00:00","hostname":"wordpress.threatxlabs.local","path":"/wp-login.php"},
{"ip":"10.73.140.5","timestamp":"2018-06-25T13:24:30+00:00","hostname":"wordpress.threatxlabs.local","path":"/wp-login.php"},
{"ip":"10.73.150.8","timestamp":"2018-06-25T13:32:14+00:00","hostname":"payments.threatxlabs.local","path":"/includes"},
{"ip":"10.73.150.8","timestamp":"2018-06-25T13:35:08+00:00","hostname":"payments.threatxlabs.local","path":"/wwwstats"},
{"ip":"10.121.30.9","timestamp":"2018-06-25T13:36:07+00:00","hostname":"store.threatxlabs.local","path":"/reply"},
{"ip":"10.73.150.8","timestamp":"2018-06-25T13:39:45+00:00","hostname":"payments.threatxlabs.local","path":"/redirect"},
{"ip":"10.55.130.2","timestamp":"2018-06-25T13:40:11+00:00","hostname":"wordpress.threatxlabs.local","path":"/wp-content/plugins/lazy-content-slider/lzcs_admin.php"},
{"ip":"10.55.130.2","timestamp":"2018-06-25T13:40:11+00:00","hostname":"wordpress.threatxlabs.local","path":"/wp-content/plugins/lazy-content-slider/lzcs_admin.php"},
{"ip":"10.121.30.9","timestamp":"2018-06-25T13:43:55+00:00","hostname":"store.threatxlabs.local","path":"/wp-config.php"},
{"ip":"10.212.40.7","timestamp":"2018-06-25T15:56:37+00:00","hostname":"securecheckout.threatxlabs.local","path":"/backup"},
{"ip":"10.212.40.7","timestamp":"2018-06-25T15:56:51+00:00","hostname":"securecheckout.threatxlabs.local","path":"/htaccess"},
{"ip":"10.74.220.1","timestamp":"2018-06-25T15:57:07+00:00","hostname":"wordpress.threatxlabs.local","path":"/wp-login.php"},
{"ip":"10.74.220.2","timestamp":"2018-06-25T15:57:14+00:00","hostname":"wordpress.threatxlabs.local","path":"/wp-login.php"}
]
Clustering Attacks – Python Example
The example code below clusters the attacks in the JSON from the previous section. To cluster attacks, we’re using a very basic set of rules:
1. Start each attack with a likeness score of 0 against each existing cluster
2. If the attack hostname (the HTTP Host: header) matches the cluster’s hostname, add 0.25 to likeness
3. If the attack path (the resource path portion of the request URL) matches the cluster’s path, add 0.50 to likeness
4. If the attack timestamp is within ~16.7 hours (60000s) of the cluster’s last timestamp, add 0.25 to likeness
5. If the attack’s total likeness is >= 0.75 for an existing cluster, add the attack to that cluster (and stop checking likeness against the rest of the clusters)
6. If the attack’s total likeness is < 0.75 for every existing cluster, create a new cluster for the attack, using the attack’s hostname, path, and timestamp as metadata for the new cluster
Clustering based primarily on resource path gives us a pretty simple approximation of intent:
The actor intended to perform some kind of attack against the resource located at this path.
At this point, we don’t really know if the actor was successful, the type of attack (we’ve stripped those parts out), or who actually controls the actor, but adding the other scores and clustering based on total likeness allows us to at least state:
These actors intended to perform some attacks against this resource path, perhaps at around the same time, and possibly against the same host.
That, while a big leap, is enough for the purpose of this example to say:
These actors could be coordinated; let’s consider them to be controlled by the same entity (a botnet).
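To make rules 2 through 4 concrete, here is a small standalone scoring function applied to records from attacks.json. The `likeness` and `epoch` helpers are illustrative restatements of the inline checks in the example code below, written only to show the arithmetic.

```python
from datetime import datetime

WINDOW_SECONDS = 60000  # ~16.7 hours, as in rule 4


def epoch(ts):
    """Convert an ISO 8601 timestamp with a UTC offset to epoch seconds."""
    return int(datetime.fromisoformat(ts).timestamp())


def likeness(attack, prototype):
    """Score an attack against a cluster prototype using rules 2-4."""
    score = 0.0
    if attack["hostname"] == prototype["hostname"]:
        score += 0.25
    if attack["path"] == prototype["path"]:
        score += 0.50
    if abs(epoch(attack["timestamp"]) - prototype["last_timestamp"]) < WINDOW_SECONDS:
        score += 0.25
    return score


# prototype created by the first debug-script attack (10.120.10.6 against www)
prototype = {
    "hostname": "www.threatxlabs.local",
    "path": "/path/to/debug/script.php",
    "last_timestamp": epoch("2018-06-25T13:09:03+00:00"),
}
# same path on a different host, ~15 minutes later
debug_other_host = {"hostname": "securecheckout.threatxlabs.local",
                    "path": "/path/to/debug/script.php",
                    "timestamp": "2018-06-25T13:23:54+00:00"}
# different host and path, but close in time
wp_login = {"hostname": "wordpress.threatxlabs.local",
            "path": "/wp-login.php",
            "timestamp": "2018-06-25T13:24:02+00:00"}

print(likeness(debug_other_host, prototype))  # 0.0 + 0.50 + 0.25: prints 0.75 (joins)
print(likeness(wp_login, prototype))          # 0.0 + 0.0 + 0.25: prints 0.25 (new cluster)
```

The path match carries the most weight. A matching path plus either a matching host or a close timestamp reaches the 0.75 threshold, while host and time together (0.50) never do.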
Example Code
#!/usr/bin/env python3
# cluster-example.py
import argparse
import datetime
import hashlib
import json


def cluster(attack, clusters):
    """Attempt to add an attack to an existing cluster; otherwise prototype a new one."""
    # get an epoch timestamp we can work with easily; fromisoformat() honors the
    # +00:00 offset, so the epoch value is UTC-based on every platform
    last_timestamp = int(datetime.datetime.fromisoformat(attack['timestamp']).timestamp())
    for existing in clusters:
        likeness = 0.0
        # test attack likeness to the existing cluster based on field matches
        if attack['hostname'] == existing['prototype_hostname']:
            likeness += 0.25
        if attack['path'] == existing['prototype_path']:
            likeness += 0.50
        # within ~16.7 hours (60000 s) of the cluster's most recent attack
        if abs(last_timestamp - existing['last_timestamp']) < 60000:
            likeness += 0.25
        # if alike enough, add the attack to this cluster and stop checking
        if likeness >= 0.75:
            existing['attacks'].append(attack)
            # update the cluster last_timestamp if newer
            if last_timestamp > existing['last_timestamp']:
                existing['last_timestamp'] = last_timestamp
            return clusters
    # if the attack didn't match a cluster, it is unique enough to prototype a new cluster
    cid = "%s:%s:%s" % (attack['ip'], attack['hostname'], attack['path'])
    new_cluster = {
        'attacks': [attack],
        # cluster metadata
        'cluster_id': hashlib.sha1(cid.encode('utf-8')).hexdigest(),
        'last_timestamp': last_timestamp,
        # the prototype defines the cluster
        'prototype_hostname': attack['hostname'],
        'prototype_path': attack['path'],
    }
    # add the new cluster to the list of clusters
    clusters.append(new_cluster)
    return clusters


def main():
    # get the attacks source file
    parser = argparse.ArgumentParser()
    parser.add_argument('-a', '--attack_file', required=True, help='path to attacks.json')
    args = parser.parse_args()
    # load the attack data
    with open(args.attack_file, 'r') as data:
        attack_data = json.load(data)
    # cluster the attacks, in file order
    clusters = []
    for attack in attack_data:
        clusters = cluster(attack, clusters)
    # print the clusters
    print(json.dumps(clusters, indent=4))


if __name__ == '__main__':
    main()
Example Clustering Results
Below are selected results from our basic clustering algorithm. It successfully grouped the attacks from Part II, Case #1 (Distributed Password Guessing against /wp-login.php) and Case #2 (Distributed Parameter Fuzzing against /path/to/debug/script.php), even though each attack came from multiple source IP addresses and, in the case of the debug script, targeted different hosts.
[
  { "cluster_id": "2d303dc16dc0a40c5c2b97d28a3c2d32cf881362", "last_timestamp": 1529963834, "prototype_path": "/wp-login.php", "prototype_hostname": "wordpress.threatxlabs.local", "attacks": [
    {"ip":"10.73.140.3","timestamp":"2018-06-25T13:24:02+00:00","path":"/wp-login.php","hostname":"wordpress.threatxlabs.local"},
    {"ip":"10.73.140.4","timestamp":"2018-06-25T13:24:21+00:00","path":"/wp-login.php","hostname":"wordpress.threatxlabs.local"},
    {"ip":"10.73.140.5","timestamp":"2018-06-25T13:24:30+00:00","path":"/wp-login.php","hostname":"wordpress.threatxlabs.local"},
    {"ip":"10.74.220.1","timestamp":"2018-06-25T15:57:07+00:00","path":"/wp-login.php","hostname":"wordpress.threatxlabs.local"},
    {"ip":"10.74.220.2","timestamp":"2018-06-25T15:57:14+00:00","path":"/wp-login.php","hostname":"wordpress.threatxlabs.local"}
  ] },
  { "cluster_id": "94d0b5a7b4f6ff580d7474f26eac0a6a3ecc9c10", "last_timestamp": 1529954635, "prototype_path": "/path/to/debug/script.php", "prototype_hostname": "www.threatxlabs.local", "attacks": [
    {"ip":"10.120.10.6","timestamp":"2018-06-25T13:09:03+00:00","path":"/path/to/debug/script.php","hostname":"www.threatxlabs.local"},
    {"ip":"10.120.10.6","timestamp":"2018-06-25T13:09:03+00:00","path":"/path/to/debug/script.php","hostname":"www.threatxlabs.local"},
    {"ip":"10.115.70.8","timestamp":"2018-06-25T13:23:54+00:00","path":"/path/to/debug/script.php","hostname":"securecheckout.threatxlabs.local"},
    {"ip":"10.115.70.8","timestamp":"2018-06-25T13:23:54+00:00","path":"/path/to/debug/script.php","hostname":"securecheckout.threatxlabs.local"},
    {"ip":"10.115.70.4","timestamp":"2018-06-25T13:23:54+00:00","path":"/path/to/debug/script.php","hostname":"securecheckout.threatxlabs.local"},
    {"ip":"10.115.70.4","timestamp":"2018-06-25T13:23:55+00:00","path":"/path/to/debug/script.php","hostname":"securecheckout.threatxlabs.local"}
  ] },
  { "cluster_id": "adc0522852e356492dbc3939658ebe3c6ad5db27", "last_timestamp": 1529955308, "prototype_path": "/wwwstats", "prototype_hostname": "payments.threatxlabs.local", "attacks": [
    {"ip":"10.73.150.8","timestamp":"2018-06-25T13:35:08+00:00","path":"/wwwstats","hostname":"payments.threatxlabs.local"}
  ] },
  { "cluster_id": "1f1d730e936e217d903c895f7f8ff5485b7725ef", "last_timestamp": 1529963811, "prototype_path": "/htaccess", "prototype_hostname": "securecheckout.threatxlabs.local", "attacks": [
    {"ip":"10.212.40.7","timestamp":"2018-06-25T15:56:51+00:00","path":"/htaccess","hostname":"securecheckout.threatxlabs.local"}
  ] }
]
We also see examples of one-off attacks from the example data set; these were sorted into clusters of their own.
Taking Action on the Entire Botnet
What does grouping attackers get us?
The clustering example in this post is just the beginning of the techniques that can be applied to identify similar attack traffic. By clustering attacks and grouping attackers, we gain the ability to make decisions based on the entire group’s behavior. For botnet traffic, this means quickly detecting, tarpitting, and/or blocking malicious traffic from all identified members of the botnet, potentially before an individual member can fully participate in the attack.
ThreatX continuously updates its capability to perform this kind of clustering based on attacker behavior. Combined with active interrogation of suspicious actors and dynamic site profiling, this ensures malicious bots are quickly identified and stopped.
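As an illustration of acting on the group rather than on individual IPs, here is a minimal sketch. It takes clusters shaped like the output of the example code above and collects the distinct source IPs of every cluster that spans more than one IP. The `botnet_members` helper and its `min_members` threshold are hypothetical. How such a list is enforced (blocklist, tarpit) depends entirely on your WAF or proxy.

```python
import ipaddress


def botnet_members(clusters, min_members=2):
    """Map cluster_id -> distinct source IPs for clusters that span several IPs.

    clusters: a list of dicts shaped like the clustering example's output,
    each with a 'cluster_id' and a list of 'attacks' that carry an 'ip' field.
    """
    members = {}
    for cluster in clusters:
        ips = sorted({a["ip"] for a in cluster["attacks"]}, key=ipaddress.ip_address)
        # single-IP (one-off) clusters are left to ordinary per-IP handling
        if len(ips) >= min_members:
            members[cluster["cluster_id"]] = ips
    return members


# two clusters from the example results, trimmed to the fields used here
clusters = [
    {"cluster_id": "2d303dc16dc0a40c5c2b97d28a3c2d32cf881362",
     "attacks": [{"ip": ip} for ip in ("10.73.140.3", "10.73.140.4", "10.73.140.5",
                                       "10.74.220.1", "10.74.220.2")]},
    {"cluster_id": "adc0522852e356492dbc3939658ebe3c6ad5db27",
     "attacks": [{"ip": "10.73.150.8"}]},
]
for cluster_id, ips in botnet_members(clusters).items():
    # hypothetical hand-off point: push these IPs to a blocklist or tarpit
    print(cluster_id, ips)
```

Because membership is derived from the cluster, a bot that joins the cluster later can be acted on as soon as its first attack is clustered. There is no need to wait for it to cross a per-IP threshold.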