Botnet Research Survey
Zhaosheng Zhu
Northwestern Univ.
zzh321@cs.northwestern.edu
Guohan Lu
Tsinghua Univ.
lguohan@gmail.com
Yan Chen
Northwestern Univ.
ychen@northwestern.edu
Zhi Judy Fu
Motorola Labs
Judy.fu@motorola.com
Phil Roberts
Internet Society
roberts@isoc.org
Keesook Han
AFRL
Keesook.Han@rl.af.mil
Abstract
Botnets are emerging threat with hundreds of millions
of computers infected. A study shows that about 40% of
all computers connected to the internet in the world are in-
fected bots and controlled by attackers( [2]). This article is
a survey of recent advances in botnet research. The survey
classifies the botnet research into three areas: understand-
ing botnets, detecting and tracking botnets, and defending
against botnets. While botnets are widespread, the research
and solutions for botnets are still in their infancy. The paper
also summarizes the existing research and proposes future
directions for botnet research.
1 Introduction
According to the explanation in [1], botnet is a term
for a collection of software robots, or bots, which run au-
tonomously and automatically. They run on groups of zom-
bie computers controlled remotely by attackers. A typical
bot can be created and maintained in four phases.
1. Initial Infection: A computer can be infected in sev-
eral different ways. For example, 1) Being actively
exploited. The host has some vulnerability (e.g. DCE-
RPC). A malicious program then exploits the vulner-
ability and runs on the host. 2) Malware was auto-
matically downloaded while viewing web pages. 3)
Malware was automatically downloaded and executed
through opening an email attachment. 4) USB autorun.
2. Secondary Injection: In this phase, the infected hosts
download and run the bot code, then become a real
bot. The download can be via ftp, http and P2P
(e.g., Trojan.Peacomm) as discussed in § 2.1.
3. Malicious Activities: The bot communicates to its con-
troller to get commands/instructions for conducting ac-
tivities such as spam, DDoS and scanning. Currently
a more sophisticated technique called fast-flux service
networks are gaining popularity(§ 2.1.4). The com-
mand communication can be IRC-based, HTTP-based,
DNS-based or using P2P protocol to avoid single point
of failure.
4. Maintenance and Upgrade: The bot continuously up-
grades its binary in this phase.
Botnets are always classified according to their command
and control architecture. For example, those who use the
Internet Relay Chat (IRC) protocol are known as IRC based
botnets.
We classify current botnet research into three areas: un-
derstanding botnets, detecting and tracking botnets, and
countering against botnets. We will discuss them respec-
tively in the subsequent sections.
2 Understanding Botnet
Most current research focuses on understanding botnets.
There are mainly three types of papers in this area.
• Bot Anatomy: The papers in this category provide ex-
tensive analysis of a specific kind of bot for case study.
The analysis mainly focuses on its network level be-
havior, usually involving the use of binary analysis
tools.
• Wide-area Measurement Study: The second group of
papers provides measurement studies through tracking
botnets to reveal different aspects of botnets in the in-
ternet, such as botnet size, traffic generated, their us-
ages and dynamics. Currently only IRC-based botnets
have been studied.
• Botnet Modeling and Future Botnet Prediction: The
third group of papers discusses the theoretical model-
Annual IEEE International Computer Software and Applications Conference
0730-3157/08 $25.00 © 2008 IEEE
DOI
967
Annual IEEE International Computer Software and Applications Conference
0730-3157/08 $25.00 © 2008 IEEE
DOI 10.1109/COMPSAC.2008.205
967
Annual IEEE International Computer Software and Applications Conference
0730-3157/08 $25.00 © 2008 IEEE
DOI 10.1109/COMPSAC.2008.205
967
ing of botnets, the possible future evolution of botnets
and countermeasures against them.
We will describe each in the following subsections.
2.1 Bot Anatomy
2.1.1 IRC Bot
In [6], it analyzed the source code for four bots, Agobot,
SDBot, SpyBot and GT bot, which are all IRC-based bots.
Among these botnets, only Agobot is a fully-developed bot,
and the other three are like toys. Agobot has provided the
following five features.
• Exploits: It can exploit many well known OS vulner-
abilities (e.g. buffer overflow) and back doors left by
other viruses.
• Delivery: It separates exploits and delivery. Once the
first step exploits succeed, it opens a shell on the re-
mote host to download bot binary. The binary is en-
coded to avoid network-based signature detection.
• Deception: The bot has the module to test for debug-
gers (e.g. SoftIce) and VMWare once it is installed. If
it detected VMWare it stopped running. So VMWare-
based Honeypot cannot run Agobot.
• Function: It can steal system information and monitor
local network traffic.
• Recruiting: It recruits using botmaster controlled hori-
zontal and vertical scannings.
Although using direct source analysis can give us a clear in-
sight about a bot, this approach is quite limited. The biggest
problem is that most bots do not have source code avail-
able. Therefore, more sophisticated methods, for example,
system-level analysis and networking-level analysis for the
botnet behavior are needed.
2.1.2 HTTP Bot
It analyzed the binary of an HTTP-based spam bot module
in Rustock rootkit( [8]). The command and control (C&C)
is http based. To ensure the anonymity, the communication
channel is encrypted. In this paper a binary analysis tool
IDA Pro is used to analyze the binary and find the encryp-
tion key. The paper summarizes that a typical process for
the spam bot to send a spam is as following.
1. The bot asks the controller for local processes/files to
kill and delete.
2. The controller sends back system information.
3. The bot asks for SMTP servers.
4. The bot gets failure responses from the SMTP servers.
5. The bot gets spam message
6. The bot gets target email addresses.
In [21], the author described an HTTP-based DDoS bot,
BlackEnergy. The bot is only used for DDoS attacks.
However, the bot does not have any exploit activities, so
it cannot be captured by Honeynet. The paper only de-
scribed the commands used in C&C, but did not describe
their method to obtain samples. Once a sample is captured,
the botmaster can be tracked.
In [12], it discussed Clickbot.A, a low-noise click
fraud bot. The client is propagated via email attachment.
The botnet also uses HTTP protocol for their command and
control. The paper provided the source code of its botmas-
ter, written in PHP. The paper discussed the details of its
click fraud process. It is also shown that the client partic-
ipating in click fraud also sends spams, implying that the
client performs multi-tasks.
2.1.3 P2P Bot
The author claims that centralized control of botnets offers
a single point of failure for the botnet( [15]). So more sta-
ble architectures, like P2P based architecture, will be used
by botnet operators. And it analyzes one case study: Tro-
jan.Peacomm with binary analysis. The author captures this
binary using Honeypot. The analysis is mainly based on
blackbox techniques. They only discuss the network activi-
ties of an infected host, but did not perform binary analysis
of the code. In their paper they found the P2P technology
(Kademlia algorithm) is used to get the URL to download
real bot binary in the secondary injections discussed in § 1.
In [19], the author analyzes one of the most widespread
P2P botnets by analyzing the binary and networking traces.
Also it proposes some techniques, for example, Eclipsing
Content and Polluting the file, to disrupt the communication
of botnet P2P networks.
2.1.4 Fast-flux Networks
We have mentioned that fast-flux networks are increasingly
used as botnets Command and control networks. There are
many servers in the blackhat circles, such as the phishing
websites. These websites are valuable assets of them, so
they really want to hide their IP addresses from outsiders.
In order to achieve such a goal, they let a user first connect
to a compromised computer, which serves as a proxy, to
forward the user requests to a real server and the response
from the server to the user.
In [4] it introduces a new type of techniques called Fast-
flux service networks for this purpose. The DNS records
968968968
of a real website point to the computers of Fast-flux net-
works. The network uses a combination of round-robin
IP addresses and a very short Time-To-Live (TTL) for any
given particular DNS Resource Record (RR) to distribute a
user’s request to a large number of compromised comput-
ers.
The Fast-flux motherships are the controlling parts of the
fast-flux service networks. It is very similar to the command
and control (C&C) systems found in conventional botnets
but provides more features. It is observed that these nodes
are always hosting both DNS and HTTP services, for be-
ing able to manage the content availability for thousands of
domains simultaneously on a single host.
This paper also presents a case-study for one specific
fast-flux network. They collected information on the IP ad-
dresses assigned to the domain name and how those IP ad-
dresses (A and NS records) changed over time. They then
did some statistical analysis, for example, the distribution
of AS Breakdown for DNS Flux Networks. They found lots
of compromised computers were involved. There are alto-
gether 3,241 unique IP addresses. 1,516 were advertised
as NS records, while 2,844 were short lived TTL used for
HTTP proxy. The above result is only one example, alto-
gether they monitored 80,000 flux IPs with over 1.2million
unique mappings.
2.2 Wide-area Measurement Study
It presents a honeynet-based botnet detection system as
well as some findings on botnets across the Internet( [24]).
The systems are composed of three module.
1. malware collection: use a lightweight responder
nepenthes and unpatched WindowsXP in a virtu-
alized environment.
2. Graybox testing: to learn botnet ”dialect”.
3. Botnets tracking: an IRC tracker (drone) to lurk in IRC
channel and record commands. DNS tracking, a novel
method to estimate botnet size using DNS cache.
For data collection, it deploys a modified version of the
nepenthes platform in darknet to collect malware. To com-
plement the role of nepenthes, it also uses Honeynet in
which the honeypots are running unpatched instances of
Windows XP in a virtualized environment.
There are also several interesting findings in this paper.
• Botnet scanning traffic containing large percentage of
Internet background radiation.
• Most of botnet scanning behavior is well-controlled by
its commander.
• 90% bots stay in IRC channel for less than 50 minutes.
• Over 80% bots are generally detected by Anti-virus
software, e.g. Norton.
• Small botnets receive a larger portion of control and
mining commands. Large botnets have a larger
percentage of cloning and downloading commands
(DDoS).
In [26] it mainly characterizes the network-level behav-
ior of spammers. For example, (1) IP address, AS and
country of spammers. (2) The characteristics of spamming
botnets. To identify a set of hosts that are sending email
from botnets, they used a trace of hosts infected by the
W32/Bobax (.Bobax.) worm from April 28-29, 2005. And
based on the findings, it suggests developing algorithms to
identify botnet membership based on network-level proper-
ties.
Several papers are developing methods to reveal more
properties of botnets, for example, to estimate botnet sizes,
either its footprint or live population. The following are
some existing work.
1. Botnet Infiltration: In [13] it lets a drone to join the
botnet and record joining bot information on the chan-
nel. However only 52% of the botnets they tracked
make bot join information available. A well-developed
botnet will surely not make such information available.
Moreover, this method is solely IRC-based.
2. DNS Redirection: In [10] it counts infected bots by
manipulating the DNS entry associated with a botnet’s
IRC server and redirecting connections to a local sink-
hole. However, it can only count bots which issue DNS
requests to this DNS server.
3. DNSBL In [27] it monitors lookups to a DNS-based
blackhole list to expose botnet membership. However,
it does not reveal which botnet the bot belongs to, and
only applies when the bots are used to send spam.
4. DNS Cache: In [24] it uses DNS cache snooping to un-
cover a botnet’s footprint. However, the result is just a
lower bound on its true DNS footprint and subjected to
three problems. For example, a cache hit was recorded
only if a bot made a lookup query to its local DNS
server.
However, to estimate the botnet size is still a problem.
In [25], the author points out that several issues may make
counting botnet memberships more complicate. For exam-
ple, temporary bot mitigation and bot cloning. It suggests
synthesizing the results from multiple independent views of
a botnet’s behavior.
969969969
2.3 Botnet Modeling and Future Botnet
Prediction
There are also several papers on modeling botnets. It
creates a diurnal propagation model based on the fact that
computers that are offline are not infectious, and any re-
gional bias in infections will affect the overall growth of the
botnet( [11]). Realizing that the trend to small botnets may
be more dangerous than big botnets, in [28] it proposes a
superbot model that the botnets are designed to be coordi-
nated into a network of botnets. [29] discusses an advanced
botnet which considers the following challenges:
1. How to generate a robust botnet even though some bots
are removed?
2. How to prevent significant exposure of the network
topology even though some bots are detected?
3. How to easily monitor and obtain the complete infor-
mation of a botnet by its botmaster?
4. How to prevent (or make it harder) defenders from de-
tecting bots via their communication traffic patterns?
So it proposes to use a hybrid P2P botnet instead of pure
P2P structure to improve the stability of a botnet. Tradi-
tional C&C botnet uses one or two hosts as central con-
trollers. Because the controllers of a botnet can be easily
identified and shut down once one of the bots has been iden-
tified, the paper suggests using some bots as botnet con-
trollers (servant bots), which resembles the super node in
current P2P network.
In [9] it mainly discusses the botnet structures based on
their utilities to botmasters. One conclusion shows that ran-
dom graph botnets (e.g., those using P2P formations) are
highly resistant to both random and targeted responses.
Although there are several papers on the modeling of
botnet, we still have no idea how close these models are
to the botnets in the real world. More accurate models may
help us get more knowledge about botnet and give a better
prediction to the development of botnet.
3 Detecting and Tracking Botnet
There are mainly two approaches of botnet detection and
tracking methods. One is honeynet based method and the
other is based on passive traffic monitoring.
3.1 Honeynet
There are many papers [23, 24] discussed how to track
botnet using Honeynet, and how to use tools to collect mal-
ware [5]. In [22], Jose Nazario from Arbor Networks dis-
cusses several challenges in developing a botnet tracking
tool. In summary, first, there are several tools available to
collect malware, but no tool for tracking the botnet. Sec-
ondly, the tracking tool needs to understand the botnet’s
”jargon” in order to be accepted by the botmaster. More-
over, the increasing use of anti-analysis techniques used by
the blackhat circle makes the development of the tool even
more challenging.
3.2 Traffic Monitoring
In [20] it described a network-wide system to identify
botmasters based on transport layer flow information. It
gathers traffic flow information from many vantage points
within the network. The core idea is based on the attack
and control chain of the botnet. The major steps are listed
as follows:
1. Identify bots based on their attack activities, such as
scanning, emailing of spam and viruses, or DDoS traf-
fic generation. The activities are reported by other se-
curity system.
2. Analyze the flows of these bots to find candidate con-
troller connections (CCC).
3. Analyze the CCC to locate the botmaster.
This paper also gives us some interesting results. For exam-
ple, based on the long-time observation, it estimates the bot
stays 2-3 days on the same controller in average. In [14] it
presented a passive monitoring system (Rishi) to track bot-
nets based on the bots’ IRC nicknames. The core idea is that
the format of nicknames used by the bots is different from
that of a normal user, e.g. USA|016887436 is a typical
nickname used by the bots. The author uses regular expres-
sion for the detection. The system is deployed on a border
router of a campus network running two weeks, and here
are their findings:
• Results are compared with their NIDS system (Blast-
o-Mat). 82 bots were detected while only 34 were de-
tected by Blast-o-Mat. Blast-o-Mat detected 20 hosts
which were not picked up by Rishi.
• None of the botnets uses port traditional IRC port 6667
for C&C.
However, this approach is quite limited. For example,
IRC Nickname can be changed to resemble normal user.
And it can not detect HTTP botnet, or the botnet of which
the communication is encrypted, e.g. Rustock mentioned in
§ 2.1.
The following are two more advanced detection tools. A
BotHunter system is presented which consists of a corre-
lation engine that is driven by three malware-focused net-
work packet sensors, each charged with detecting specific
970970970
stages of the malware infection process( [17]). It finds the
suspicious flows which match BotHunter’s infection dialog
model. Based on the observation that bots within the same
botnet will likely have spatial-temporal correlation and sim-
ilarity, it proposes using network-based anomaly detection
to identify botnet C&C channels( [18]).
The most recent work appears in [16]. In this pa-
per presents classifying networking traffic to detect botnet,
which is independent of the botnet protocol and structure.
4 Defenses Against Botnet
Unfortunately, only a few papers proposed defense tech-
nologies against botnet. The most effective way is to shut-
down the botmaster once we identify it. However, this task
is far from trivial. The following discusses the defense and
some practical issues with this approach.
4.1 Spam
In [7] it proposed a distributed, content independent
spam classification system to defend from botnet generated
spams. A little bit unexpected, the system does not utilize
previous botnet detection results to ban emails generated by
bots. The basic idea of the system is that ”A host that has
recently sent large amounts of e-mails may be a spam-bot.
Consequently, any e-mail coming from such hosts is poten-
tially spam, and if the source has a dynamically allocated
IP address (or simply a dynamic IP address) and the sender
is not in the recipient’s address book or list of past recipi-
ents or senders, then it is almost certain that the e-mail is
spam.”
The system consists of following parts:
1. Identifying the source of emails
2. Keeping track of how many emails were recently sent
by a source
3. Disseminating this information for the purposes of
classifying future emails.
The effectiveness of this system is unknown since it is still
in the process of Implement.
4.2 Enterprise Solutions
Trend Micro provided Botnet Identification Service(
[3]). The company provide the customers the real-time bot-
net C&C botmaster address list via BGP peering between
Trend Micro BIS router and the customers’ BGP border
router. This service charges 9 cents per user for 500,000
users. However, Fast-Flux networks can make Trend Mi-
cro’s sol
本文档为【Botnet Research Survey(1)】,请使用软件OFFICE或WPS软件打开。作品中的文字与图均可以修改和编辑,
图片更改请在作品中右键图片并更换,文字修改请直接点击文字进行修改,也可以新增和删除文档中的内容。
该文档来自用户分享,如有侵权行为请发邮件ishare@vip.sina.com联系网站客服,我们会及时删除。
[版权声明] 本站所有资料为用户分享产生,若发现您的权利被侵害,请联系客服邮件isharekefu@iask.cn,我们尽快处理。
本作品所展示的图片、画像、字体、音乐的版权可能需版权方额外授权,请谨慎使用。
网站提供的党政主题相关内容(国旗、国徽、党徽..)目的在于配合国家政策宣传,仅限个人学习分享使用,禁止用于任何广告和商用目的。