Two years ago I started a project, for fun, to try to catch as much malware, and as many malware-related URLs, as possible. I have written about this before. In this post I’ll explain the heuristics I use to classify URLs as malicious with “Malware Intelligence” (the name of the system that generates the daily Malware URLs feed). What a normal user sees in either of the 2 text files offered are simply domains or URLs that I classified as “malicious”. But how does “Malware Intelligence” classify them as malicious? What heuristics, techniques, etc. does it use?
This is a story of failure. I was analysing a piece of assembly code that I thought was vulnerable to a zero-allocation bug, which would let me overwrite some bytes of heap space (overwriting a structure with many function pointers!). However, after spending about 2 hours statically analysing and documenting the “bug”, I finally discovered it wasn’t vulnerable. #Fail.
While auditing a product recently, I noticed a curious scenario in which I control the following:
- Unix-based: The limited vulnerability allows one to create any file as root, controlling the contents of that file. I can even overwrite existing files.
- Windows-based: The vulnerability allows one to execute an operating system command but, for some reason, doesn’t allow copying files the way the Unix vulnerability does.
In the next paragraphs I will explain how one could exploit such limited-scope vulnerabilities to execute arbitrary code remotely in the context of the running application (root under Unix and SYSTEM under Windows). I’ll also explain the opposite cases: on Unix-based systems, one can execute an arbitrary operating system command but can’t create an arbitrary file; on Windows, one can create an arbitrary file anywhere in the system but cannot execute an arbitrary command.
It’s been a while since I started writing a first prototype to try to catch as much malware (URLs and samples) as possible. Today I can say my project is all grown up: every day it generates a feed of around 9,000 malware URLs with a low rate of false positives (although there may be some).
At first, the process of finding malware URLs in my tool was only a matter of finding suspicious URLs in social networks (Twitter and Identi.ca) and checking mail accounts that receive loads of bad stuff, and nothing else. Today I’m using crawlers, honeypots, sandboxes, third-party public URL feeds, private URL feeds (provided with consent), executable unpackers, heuristic engines for Flash movies, PDFs, OLE2 documents, etc. It has changed a lot and become a big project that, I hope, can give useful information to malware researchers.
Today I was performing some tests on the random number generators of some browsers and found, by chance, this mail sent to Bugtraq by Michal Zalewski called “Unix entropy source can be used for keystroke timing attacks”. While Michal’s idea is very good, I failed to find a reliable way of doing it on my home computer after some time (well, honestly, after just 1 hour…). However, a simpler idea came to my mind: if /dev/random blocks when the entropy pool is empty, and most of the entropy is generated when mouse or keyboard events happen, then at the very least I can quite easily write an activity monitor based on /dev/random.
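The idea above can be sketched in a few lines of Python. This is only a minimal illustration, not the original tool: the function names (`timed_read`, `label`, `monitor`), the 8-byte read size and the 1-second idle threshold are all assumptions made for the example. It also assumes a kernel where /dev/random really blocks on an empty pool; since Linux 5.6 it no longer blocks once the generator has been seeded, so there every read would return immediately.

```python
import os
import time


def timed_read(path="/dev/random", nbytes=8):
    """Read up to nbytes from path; return how long the read took, in seconds."""
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.monotonic()
        remaining = nbytes
        while remaining > 0:
            chunk = os.read(fd, remaining)  # blocks while the pool is empty
            if not chunk:  # EOF; only happens on regular files or /dev/null
                break
            remaining -= len(chunk)
        return time.monotonic() - start
    finally:
        os.close(fd)


def label(elapsed, idle_threshold=1.0):
    """Hypothetical rule: a long block means no fresh entropy, so 'idle'."""
    return "idle" if elapsed >= idle_threshold else "active"


def monitor(samples, path="/dev/random", nbytes=8, idle_threshold=1.0):
    """Yield (elapsed_seconds, label) for each of `samples` timed reads."""
    for _ in range(samples):
        elapsed = timed_read(path, nbytes)
        yield elapsed, label(elapsed, idle_threshold)
```

Something like `for elapsed, state in monitor(100): print(round(elapsed, 3), state)` would then print one line per read. The first reads just drain whatever is left in the pool and return quickly regardless of activity; only after that does a long block suggest that nobody is touching the keyboard or mouse. Other entropy sources (disk and interrupt timing, for example) also feed the pool, so the result is noisy at best.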