Two years ago I started a project, for fun, to try to catch as much malware and URLs related to malware as possible. I have written about this before. In this post I’ll explain the heuristics I use for trying to classify URLs as malicious with “Malware Intelligence” (the name of the system that generates the daily Malware URLs feed). What a normal user sees in any of the 2 text files offered are simply domains or URLs that I classified as “malicious”. But, how does “Malware Intelligence” classifies them as malicious? What heuristics, techniques, etc… does it use?
This is a history of fail. I was analysing a piece of code, in assembly, that I thought would be vulnerable to a zero allocation bug allowing me to overwrite some bytes of heap space (overwriting a structure with many function pointers!). However, after spending like 2 hours analysing statically the “bug”, and documenting it, I finally discovered it wasn’t vulnerable. #Fail.
It’s been a while since I started writing a first prototype to try to catch as much malware (URLs and samples) as possible. Today I can say my project is all grown up as it’s generating, daily, a feed with around 9.000 malware URLs and with a low rate of false positives (although there may be some).
The process of finding malware URLs in my tool used to be only a matter of finding suspicious URLs in social networks (Twitter and Identi.ca), checking mail accounts receiving loads of bad stuff and nothing else. At first. Today I’m using crawlers, honeypots, sandboxes, thirdy party public URL feeds, private URL feeds (provided under consent), executable unpackers, heuristic engines for Flash movies, PDFs, OLE2 documents, etc… It changed a lot and became a big project that, I hope, can give useful information for malware researchers.
Today I was performing some tests in the random number generators of some browsers and found, by chance, this mail sent to Bugtraq by Michal Zalewsky called “Unix entropy source can be used for keystroke timing attacks”. While the idea of Michal is very good, I failed to find a reliable way of doing it in my house computer after some time (well, honestly, after just 1 hour…). However, a more simpler idea come to my mind: if /dev/random blocks when the entropy pool is empty and most of the events are generated when mouse or keyboard events happens, at least, I can write quite easily an activity monitor based on /dev/random.
From time to time I need to use some old binary created for older Linux versions like Redhat 6.2, for example. The problem with those binaries is that they were compiled with a very old version of the glibc and they cannot be run ‘like this’ in newer systems. Sometimes, just making a symbolic link from the new library to the old name can be enough but not always. In this brief post I will talk about how to workaround the typical relocation errors and undefined symbols problems with old binaries.