Recently at the Recorded Future User Network (RFUN) conference, I had the privilege of meeting Dr. Ben Shneiderman from the University of Maryland. Dr. Shneiderman is a computer science professor and the founding director of the university's Human-Computer Interaction Lab (HCIL).
Dr. Shneiderman demonstrated for us several amazing data analysis tools which have been developed at the HCIL, including LifeLines and EventFlow, two tools designed for temporal analysis and visualization of events. While these tools were designed to analyze medical events around patient care, I wondered if they could also be applied to analyze patterns used by attackers against my honeypots.
The first step was to take all of my honeypot logs and turn them into something EventFlow could understand. I imported the logs into Splunk and started identifying fields. After careful consideration, I decided the only fields I really cared about for this analysis were the session number, the source IP address, and the main commands entered by the attacker, such as "who", "ls", and "rm". I combined the source IP and session number to create a session ID, so that EventFlow would treat each connection from each IP address as a separate record.
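For readers who want to try something similar, here is a minimal Python sketch of that preparation step. It is not the Splunk export I actually used: the field names "src_ip", "session", "input", and "timestamp" are hypothetical, and the three-column tab-separated layout is only an assumption, so check the EventFlow documentation for the input format it really expects.

```python
import csv
import io


def make_session_id(src_ip, session):
    """Combine source IP and session number into a single record ID."""
    return f"{src_ip}-{session}"


def to_event_rows(log_rows):
    """Turn exported log rows into (session_id, command, timestamp) tuples.

    Each row is a dict with the hypothetical keys
    "src_ip", "session", "input", and "timestamp".
    """
    events = []
    for row in log_rows:
        parts = row["input"].split()
        if not parts:
            continue  # skip empty input lines
        command = parts[0]  # keep only the main command, drop its arguments
        session_id = make_session_id(row["src_ip"], row["session"])
        events.append((session_id, command, row["timestamp"]))
    return events


def write_tsv(events):
    """Serialize the events as tab-separated text (assumed layout)."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows(events)
    return buf.getvalue()
```

Keeping only the first token of each input line is what reduces "wget http://..." to the bare command "wget", so that sessions can be compared by command sequence rather than by exact arguments.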
After exporting the data from Splunk and doing a little formatting, I loaded the data into EventFlow. I removed the extremely basic sessions, those with two or fewer commands, as well as a few other records that showed simple data gathering but no attempt to infect the system with malware. This reduced the number of sessions from 24 to 14.
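The length filter is easy to reproduce outside EventFlow. The sketch below assumes the events are (session_id, command, timestamp) tuples already in chronological order, which is a format chosen here purely for illustration. It covers only the "two or fewer commands" rule; the handful of data-gathering sessions were removed by hand and are not modeled.

```python
from collections import defaultdict


def group_by_session(events):
    """Collect the commands of each session, preserving event order."""
    sessions = defaultdict(list)
    for session_id, command, _timestamp in events:
        sessions[session_id].append(command)
    return dict(sessions)


def drop_trivial_sessions(sessions, min_commands=3):
    """Keep only sessions with at least min_commands commands.

    The default of 3 discards sessions with two or fewer commands.
    """
    return {
        session_id: commands
        for session_id, commands in sessions.items()
        if len(commands) >= min_commands
    }
```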
In the data set that was left, the most common command across all sessions is "wget", a command for retrieving files from the Internet over HTTP, HTTPS, or FTP. This is typically how an attacker pulls down malware from the web to infect a system.
At first glance, the sessions don't appear to have much in common prior to "wget". However, simply removing a few commonly used commands such as "cd" makes patterns start to appear before the "wget".
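Filtering out that kind of noise takes only a few lines of Python. In this sketch the noise set contains just "cd", because that is the only command named here; anything else you add to it is your own judgment call.

```python
# Commands treated as noise. Only "cd" comes from the analysis above;
# extend this set at your own discretion.
NOISE_COMMANDS = frozenset({"cd"})


def strip_noise(commands, noise=NOISE_COMMANDS):
    """Return the command sequence with noise commands removed, order kept."""
    return [command for command in commands if command not in noise]
```

For example, a session of "who", "cd", "cd", "wget" collapses to "who", "wget", which makes it line up with other sessions that reached "wget" by a slightly different route.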
A pattern then emerges: 10 sessions start with "w" or "who" before issuing a "wget". For those not familiar with these commands, "who" lets the user see who else is logged on to the system, and "w" additionally shows what those users are doing.
So out of our original 24 example sessions on the honeypot, about 42 percent (10 of 24) started with a "w" or "who" command followed shortly afterward by a "wget". In my experience as a Linux admin, both of these commands are rarely used, and almost never used together, so the false positive rate of this detection method should be pretty low. Perhaps detection systems should be designed to alert an administrator if this behavior is observed?
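As a rough illustration of what such a rule could look like, the Python sketch below flags a session whose first command is "w" or "who" and which later runs "wget". It assumes noise commands such as "cd" have already been removed, and it reads "followed shortly afterwards" loosely as "anywhere later in the session"; a real detector would probably want a tighter window. The function names are made up for this example.

```python
RECON_COMMANDS = frozenset({"w", "who"})


def starts_with_recon_then_wget(commands):
    """True if the session opens with "w" or "who" and later runs "wget"."""
    if not commands or commands[0] not in RECON_COMMANDS:
        return False
    return "wget" in commands[1:]


def share_matching(sessions, total_sessions):
    """Return (match_count, fraction of total_sessions that match).

    sessions maps a session ID to its ordered list of commands.
    total_sessions is passed separately so the share can be measured
    against the original session count, not only the filtered one.
    """
    matches = sum(
        1 for commands in sessions.values()
        if starts_with_recon_then_wget(commands)
    )
    return matches, matches / total_sessions
```

Passing the original session count as the denominator mirrors the calculation above, where the matching sessions are measured against all 24 sessions rather than the 14 that survived filtering.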
Would it be possible then to create "smart" intrusion detection systems based upon user behavior? Maybe. The "who"/"wget" sequence above is just one of many behaviors an attacker exhibits when compromising a system, and others might not be as easy to detect. Banks already use similar behavior analysis methods to combat identity theft.
As I analyze additional attacks using this tool, I hope to provide more interesting insights into the behavior of attackers, such as patterns in SQL or PHP code injection attacks against my web-based honeypot.
Many thanks to the University of Maryland for providing me access to this exciting tool!