Flies in the honeypot

Catching more flies with SSH than with honey. — 20 Jun 2022

About a decade ago I ran a few servers for a side web dev business. I was in college at the time and thus had plenty of free time, curiosity, and spare compute cycles my clients’ sites would never consume.

I knew a little about system administration and, as a freshly minted sysadmin, cared about being a good steward of the Internet. I installed fail2ban, patched my boxes, made sure my httpd and sshd configurations were secure, and watched my logs.

The logs were something else. I hadn’t expected the near-constant deluge of attempted attacks – it was exciting. So I ran a Kippo SSH honeypot for nine months to learn more about who the attackers were, what they wanted, and what their tricks were. I captured live recordings, payloads, and IP addresses, which I’ll show below. They painted a picture that permanently changed the way I saw hackers.

The honeypot

Before we jump into the data, recordings, and payloads it’s important to talk about the honeypot’s capabilities.

Traditionally, honeypots fell into two buckets: low interaction and high interaction. Low interaction honeypots replicate just enough of a protocol to record that an attack took place. They produce low fidelity information that’s more useful in aggregate, if you’ve deployed dozens of them.

On the opposite side of the spectrum, high interaction honeypots are typically real software that has been sandboxed and modified to record attacks. These produce a rich set of information but are dangerous to deploy because they are likely to actually be abused.

The honeypot I deployed, Kippo, claimed to be in a newer class of medium interaction honeypots. Like low interaction honeypots, these were built from the ground up with security in mind, but they attempted to provide a convincing enough environment that even a human attacker would be fooled.

Kippo could be run as an unprivileged user on the machine and rather convincingly mimicked a shell on a Debian box. It could record terminal sessions, and faked out wget to download payloads.
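To make the "medium interaction" idea concrete, here is a minimal sketch of a fake shell. It is not Kippo's actual code; the class name, the handler names, and the canned outputs are all hypothetical. The point is only that every command is logged and answered from a lookup of fake handlers, and nothing the attacker types is ever executed.

```python
import shlex


class FakeShell:
    """Toy medium-interaction shell (hypothetical, not Kippo's implementation)."""

    def __init__(self):
        self.log = []        # every command line the "attacker" typed
        self.requested = []  # URLs passed to wget, to be fetched later in a sandbox

    def run(self, line):
        """Record the line and return canned output; never execute anything."""
        self.log.append(line)
        try:
            argv = shlex.split(line)
        except ValueError:
            return "bash: syntax error"
        if not argv:
            return ""
        # Look up a fake handler such as cmd_id or cmd_wget.
        handler = getattr(self, "cmd_" + argv[0], None)
        if handler is None:
            return f"bash: {argv[0]}: command not found"
        return handler(argv[1:])

    def cmd_id(self, args):
        return "uid=0(root) gid=0(root) groups=0(root)"

    def cmd_uname(self, args):
        return "Linux"

    def cmd_wget(self, args):
        # Only remember the URL here; a real honeypot would download it
        # out of band and store the payload for later analysis.
        urls = [a for a in args if not a.startswith("-")]
        if not urls:
            return "wget: missing URL"
        self.requested.extend(urls)
        return f"Saving to: {urls[-1].rsplit('/', 1)[-1]}"


if __name__ == "__main__":
    shell = FakeShell()
    for line in ["id", "wget http://example.com/a.tgz", "yum install nano"]:
        print("$", line)
        print(shell.run(line))
```

Dispatching on a fixed set of handlers keeps the attack surface small, which is the trade-off this class of honeypot makes: anything without a handler simply looks like a missing command.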

I haven’t linked it because the tool is very outdated and likely insecure, but at the time it wasn’t bad.

The numbers

Over nine months my honeypot had 462,588 login attempts – roughly one per minute. There were 795 successful logins (~0.2% success rate):

634 immediately disconnected,¹
124 were automated probes, and
37 were humans.
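As a quick sanity check on those figures, the arithmetic can be reproduced in a few lines. The only assumption is the length of a month (about 30.4 days on average), since the exact start and end dates aren't given.

```python
attempts = 462_588
successes = 795

# Assumption: nine average months of about 30.4 days each.
minutes = 9 * 30.4 * 24 * 60

per_minute = attempts / minutes
success_rate = successes / attempts

# Breakdown of the successful logins, as listed above.
breakdown = {"disconnected": 634, "automated": 124, "human": 37}
assert sum(breakdown.values()) == successes

print(f"{per_minute:.2f} attempts per minute")   # about 1.17
print(f"{success_rate:.2%} success rate")        # about 0.17%
```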

The honeypot recorded 1036 total commands from these sessions and captured 49 payloads. The commands weren’t really anything exotic, but the proportions showed a general paranoia or unfamiliarity with the box (which was masquerading as a Debian machine):

100+ invocations: ls, cd, w
50-99 invocations: wget
20-49 invocations: cat, exit, tar, ps, apt-get, uname, rm, echo
10-19 invocations: unset, history, unzip, chmod, perl, yum, id

Normal users don’t typically immediately log in and ask who they are (id), who else is there (w), what the box is (uname), or try to unset history – they also don’t reach for yum when they’d know the system uses Debian packages.
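A tally like the one above is easy to reproduce from a list of recorded command lines. The sketch below is not the script used for the original numbers; the function names are made up, and the shortlist of "looking around" commands is just the ones called out in the previous paragraph.

```python
from collections import Counter

# Assumption: the reconnaissance shortlist is only the commands named above.
RECON_COMMANDS = {"id", "w", "uname", "unset"}


def command_counts(lines):
    """Count the first word of each non-empty command line."""
    return Counter(line.split()[0] for line in lines if line.strip())


def bucket(count):
    """Map an invocation count to the ranges used in the list above."""
    if count >= 100:
        return "100+"
    if count >= 50:
        return "50-99"
    if count >= 20:
        return "20-49"
    if count >= 10:
        return "10-19"
    return "<10"


def recon_share(lines):
    """Fraction of commands that are on the reconnaissance shortlist."""
    counts = command_counts(lines)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    recon = sum(n for cmd, n in counts.items() if cmd in RECON_COMMANDS)
    return recon / total


if __name__ == "__main__":
    # Hypothetical session, not taken from the real logs.
    session = ["w", "uname -a", "ls -la", "cd /tmp", "wget http://example.com/x.tgz"]
    for cmd, n in command_counts(session).most_common():
        print(cmd, n, bucket(n))
    print(f"recon share: {recon_share(session):.0%}")
```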

Making fun of script kiddies

Unless otherwise noted, recordings have been edited for time. I haven't edited content, but domains and IP addresses are likely to have been recycled in the 10+ years since these interactions were recorded.

In this next section we’ll take a look at some of the recordings. A surprising number of interactions with the honeypot were amateurs fumbling around or trying to follow scripts, with no idea what to do when things went wrong.

Hacker tries to log into machine they’re already logged into:

Don’t you hate it when people are watching you type?

Is it the capitalization that is wrong? No, it must be the tar flags!

When you forget to log out overnight and come back in the morning (real time 17 hour recording):

nano is clearly the superior editor:

Some competency

This hacker uses a clever directory named “.. ” (two dots and a space) to hide it from ls. You can see them clearly think about pulling down one piece of malware, reconsider, then choose another:

Here’s a nice automated probe that gets in, captures, and gets out:

Here’s another probe where the interaction seems calm and collected, apparently looking for something the tool was supposed to upload but that Kippo failed to handle:

The payloads

The honeypot I was using fetched payloads when an attacker used wget and stored them in a directory for later analysis. The forty-nine payloads I received roughly broke down in the following ways:

6 were large files used to test network speed.
5 were attempts to download files that no longer existed.
3 were SSH keys.
1 was a utility.
35 were malware.

My rough classification of the malware resulted in the following categories:

15 SSH scanners.
9 general port scanners (SIP, NNTP, POP, SMTP, others).
8 botnet bots (mostly IRC).
2 “rootkits”/persistent backdoors.
1 Counter Strike bot.

Most of the scanners weren’t unexpected, except the one looking for Usenet servers, which I thought were on their way out by that time.

The botnet bots were mostly IRC-based and would connect to a channel and wait for commands to run from the server. Most were simplistic and did not validate the commands they received. One actually did use public/private key crypto to make sure the botnet couldn’t be taken over.

Software stone soup

Calling the payloads “poor quality” would be charitable. Plagiaristic might be more accurate. I found a few places where otherwise identical files were claimed to have been invented by different individuals. Below are headers from the same port scanning code in two different payloads:

/*
** pscan.c - Originally by Volatile
** modified by riksta, lizard
**
*/

/*
** pscan.c - Made By KidRck
** modified by #KidRockS Team
**
*/
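One way to spot this kind of relabeling is to strip the leading comment banner and hash what remains, so two files that differ only in their credits get the same fingerprint. This is a sketch of the idea, not the method used at the time; the function name is hypothetical and it only handles a single leading C-style block comment.

```python
import hashlib
import re

# Matches one /* ... */ block comment at the very start of the file.
LEADING_COMMENT = re.compile(r"\A\s*/\*.*?\*/", re.DOTALL)


def body_fingerprint(source: str) -> str:
    """Hash the source with its leading banner and trailing whitespace removed."""
    body = LEADING_COMMENT.sub("", source, count=1)
    normalized = "\n".join(line.rstrip() for line in body.strip().splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # The banners are from the payloads above; the body is a stand-in.
    body = "int main(void) { return 0; }\n"
    a = "/*\n** pscan.c - Originally by Volatile\n** modified by riksta, lizard\n**\n*/\n" + body
    b = "/*\n** pscan.c - Made By KidRck\n** modified by #KidRockS Team\n**\n*/\n" + body
    print(body_fingerprint(a) == body_fingerprint(b))
```

Exact hashing only catches copies whose code is byte-for-byte identical after the banner; renamed variables or reordered functions would need a fuzzier comparison.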

In some places, provenance was entirely lost:

# Spreader
# this 'spreader' code isnot mine, i dont know who coded it.
# update: well, i just fix0red this shit a bit.

Most of the payloads were cobbled together from other pieces and may have organically grown over time. It wasn’t uncommon to see multiple spoken or computer languages used across files in a single payload. An extreme example is the toolkit below that’s a combination of some OSS code (MultipartPostHandler.py), Python, PHP, Perl, Shell, and compiled binaries.

  .
  ├── 41
  ├── core
  │   ├── check.py
  │   ├── MultipartPostHandler.py
  │   ├── MultipartPostHandler.pyc
  │   └── x.php
  ├── g3t
  ├── get
  ├── get.py
  ├── go
  ├── mass
  ├── pscan2
  ├── ss
  └── thread.pl

The secretive and anonymous nature of the groups involved may exacerbate the problem. When hackers with true skills shy away from attention, it leaves a vacuum for charlatans to creep in.

Where the one eyed man is king

Many of the toolkits contained help files, READMEs, or tools like pico and bash to make using them easier for their script kiddie audiences. This is a readme.txt file included in one of the payloads:

######################################################
## Edit wn                                          ##
## Replace [REDACTED]@yahoo.com with your email     ##
## ./wn <a/b> <ClassA/ClassB> <Interface> <Speed>   ##
######################################################
######################################################
## ChannelHelp @ Undernet                           ##
## Powered by wn                                    ##
## Contact me at [REDACTED]@yahoo.com               ##
######################################################
######################################################

It wasn’t uncommon to see attackers immediately reach for pico or nano, and try to install it if it wasn’t on the system, much to the chagrin of one of the rootkit tool makers:

# PICO WILL MAKE RK GROW BIG!
# SO FUCK OFF AND USE vi !

For educational purposes only

It was also fairly common for the authors to state that the malware was for educational/testing purposes only and under some kind of open source license. I suppose it’s the thought that counts?

# this spreader is coded by xdh
# xdh@[REDACTED]
# only for testing...

*** (C)'ed Under a BSDish license. Please look at LICENSE-file.
*** SO YOU USE THIS AT YOUR OWN RISK!
*** YOU ARE ONLY ALLOWED TO USE THIS IN LEGAL MANNERS.
*** !!! FOR EDUCATIONAL PURPOSES ONLY !!!

*#########################################################################
*#### This program is for educational purposes only         ##############
*####     I'm not responsible any  damages of this program  ##############
*####            Use it with your own risk                  ##############
*#########################################################################

Wrapping it up

This was an interesting enough project that I’ve been thinking about the data for almost ten years. There’s something chilling about watching commands get typed, seeing a mistake and a backspace, and knowing there’s a human on the other end of the line.

When the honeypot was running it would report the geolocation of the incoming request by doing an IP lookup. Usually the IPs belonged to other hacked servers. But sometimes they came from residential ISPs where some hacker was trying to take over their first box. These IPs were often in Eastern Europe or South America – places where the money you’d get for a hacked machine was actually worth something. That revelation was deeply saddening.

I hope these people aren’t in a place where they need to do this anymore – and if they enjoy doing it, then I hope they’ve at least learned to use vi because my new honeypot is ready and looking for flies.

¹ I now suspect these were actually SCP or non-interactive invocations, but the honeypot didn’t support them.