<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Unintended Results &#187; Malware</title>
	<atom:link href="http://joxeankoret.com/blog/category/malware/feed/" rel="self" type="application/rss+xml" />
	<link>http://joxeankoret.com/blog</link>
	<description>Or maybe not</description>
	<lastBuildDate>Fri, 14 May 2010 23:41:09 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>A typical work day with DeepToad</title>
		<link>http://joxeankoret.com/blog/2010/03/08/a-typical-work-day-with-deeptoad/</link>
		<comments>http://joxeankoret.com/blog/2010/03/08/a-typical-work-day-with-deeptoad/#comments</comments>
		<pubDate>Mon, 08 Mar 2010 19:31:43 +0000</pubDate>
		<dc:creator>joxean</dc:creator>
				<category><![CDATA[DeepToad]]></category>
		<category><![CDATA[Fuzzy hashing]]></category>
		<category><![CDATA[Malware]]></category>
		<category><![CDATA[Research]]></category>

		<guid isPermaLink="false">http://joxeankoret.com/blog/?p=117</guid>
		<description><![CDATA[Sometimes I receive so many malware samples that it becomes impossible (or at least inhuman) to analyze them all by hand, and I need to automate the typical (boring) tasks: clustering the samples into smaller sets and running an initial (superficial) analysis of each one. For the first task I [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: left;">Sometimes I receive so many malware samples that it becomes impossible (or at least inhuman) to analyze them all by hand, and I need to automate the typical (boring) tasks: clustering the samples into smaller sets and running an initial (superficial) analysis of each one. For the first task I created <a href="http://code.google.com/p/deeptoad" target="_blank">DeepToad</a>, a tool that clusters any kind of file using fuzzy hashing techniques.</p>
<p><span id="more-117"></span></p>
<p><strong>Clusterization of malware samples</strong></p>
<p>The very first step is to run DeepToad and see which groups it finds among 145 PDF malware samples:</p>
<pre lang="asm">$ deeptoad.py .
sQOxsT6xPj7LPsvLgcuBgayBrKyGrIaG;.\c0d1dde49be3a07c4ef4acb79da7050afa6df5b8
Sb+//PzY2BgYCAj4+AgIXl6UlCoq7+/V;.\63a18865ae6b8851ed9e18f12333308f93e156eb
fITKfLLKSrJHSv5Haf5FaXFFbnFAbmZA;.\a30b967a495074e71c711a8cac93b836053e46c1
PzY/P7Q/tLSrtKur1qvW1l3WXV06XTo6;.\f5970550268e6a8bf2eeb96ed4a48ccb319e7cde
iZDqiRzqSBwWSNUWg9WIg0mIJElTJElT;.\61dd9d7899d0d6a73d397cf3b9d0af6f5c2fa68d
LkREk5NMTKGhcHDz829vJyfe3qOjuLho;.\b5c5bd76bbb56c43ef67c3acb9d62908057c5fc6
WzMQW1oQ91oK9+8KDe/fDW/fc2/ScxzS;.\cdbf4d2f16ae742cc9b8f25bd0c5490fb73e9144
sqysenq3t5mZUVFVVYuLZGT39xwcMDDN;.\1956954f28800edb72d3d05db908cc0a37d1c1a4
pmWmpmemZ2fRZ9HRgNGAgJuAm5ucm5yc;.\d5c3757ea828bed5ad4a184f7654140ae45e1f3f
xFTExIHEgYE7gTs7fTt9ff19/f2A/YCA;.\8239d3db30f1527a01e1ddd3fc5b93c189fdb567
iVGJiUWJRUXzRfPzefN5eY95j4+gj6Cg;.\2875dc2f6b8ba232f2b86361f0b929ac3d670f35
iVGJiUWJRUXzRfPzefN5eY95j4+gj6Cg;.\6e309298423e3e4d04e9432900768e9d9493e972
XpuPC8IfyYpfb+Y8+y88GkAcw4pVelnN;.\a45b64d4c6ef074f25a772c841b2041fa118189c
QyIiYmLW1oCAm5ucnGtrGhoaGtXVTU3Z;.\17f5b212aa41ab7aea7f3d5dc9ba99f2b88bb069
7NraXV0dHfv7FxcLC62tFxfR0erqysrJ;.\dffe57c2b63204b5c812f64fbfc77c6e267827f1
dhQBdmEBR2GQR8SQ+MRs+G1sPW08PS48;.\134d4325d2dbed016d898996e7359f0169df4a21
(...a lot more different hashes...)
VIODRUVFRaqqXFydnfX1lZUPD6WlW1sN;.\9e8e153d80248bd88a178d831210ceec963a3d1d
WqtaWlNaU1PFU8XFl8WXl56Xnp6DnoOD;.\18a0300a0147764a516702a29841d63d43d8b5c4
WqtaWlNaU1PFU8XFl8WXl56Xnp6DnoOD;.\2b388c3f53f87d20af00099a8b2d903043fd7c8f
WqtaWlNaU1PFU8XFl8WXl56Xnp6DnoOD;.\9ebdbce3ecf04b477aa322c12c4370d79807879f
ypa7ymG78WHz8erzBOpfBOZf/uZL/qZL;.\06ea2c25ac8b148efc447e86d7d09dc8960b0316
svDwb2/k5Pf3aGjb22JiSEjDw1RUpqal;.\9effb1fcf09e77f3f9f2ed404e604d58d44fc37f
svDwb2/k5Pf3aGjb22JiSEjDw1RUpqal;.\db567d9f380b194a06afa40e6c26fa55859f5fa2
QRIMQXsMoHuKoPKKivJWih9WSh+JSn2J;.\b6ca92fa83b9f938f6c766c672faa93c4ff6ed64
qVgHqXoHmXqTmfmTdvlndgxnQAyoQHOo;.\002ae4bc6822fad96998cc5814d81d957bfa980c
qVgHqXoHmXqTmfmTdvlndgxnQAyoQHOo;.\8a0414600a0ac1665611fba114dd2878a5e003f1
AlACAu8C7++v76+vmK+YmKqYqqpnqmdn;.\9b9be301a440f9c4b2bc5b88475859e4907ba74a
pS3MpZDMnJCKnOqKuOo4uAU4WAWdWHGd;.\8c388936f594c469003a1585ac8b7d0b10d92c6b
e8V7e1Z7VlamVqamBaYFBWMFY2M2YzY2;.\79136bdc3c121e6b28045e4e6be2b6140f2262ea
Tgd3ToR3NISENEmE9km89qu886tU87dU;.\f87303c057fbca4bd2315798336bea26774858ff
mHSYmNCY0NDS0NLSiNKIiJ6Inp6Nno2N;.\4bef1507a5c2e751b0b7f96f8ccce688a709730f
noZonnBosXCCsdmCdtmrdnirUnjWUjrW;.\ff890f7475a4571d1cfc8f144b2c8141f3cc8559
0JGRMTGoqElJZ2c1NXp6paXBwb293Nzo;.\9406d9612a6405b95d6316ada56a39ea9e55e2a5</pre>
<p>Uhm&#8230; It doesn&#8217;t seem to be working well. I can see some groups (<em>WqtaWlNaU1PFU8XFl8WXl56Xnp6DnoOD</em> and <em>qVgHqXoHmXqTmfmTdvlndgxnQAyoQHOo</em>, for example), but that doesn&#8217;t help much: there are so many different fuzzy hashes that no one can tell how close they are to each other. The poor output is caused by the default block size used in DeepToad (512 bytes). The files are very small, so that block size is too coarse to cluster them. The next step is to change the block size (sorry for the long output, scroll down&#8230;):</p>
<pre>$ deeptoad.py -b=64 .
ddh2de52bO7JbMfJ9cfL9VzLm1z7m137;.\05ca2f4386d77c8f344ff24a0a9e1869f4dc3fe3
ddh2de52bO7JbMfJ9cfL9VzLm1z7m137;.\09d26510be759a54e9de9c011d171e2d30bdf61d
ddh2de52bO7JbMfJ9cfL9VzLm1z7m137;.\0bdde8fdb58848ee1e9bacd3d61bac1f670a1b1e
ddh2de52bO7JbMfJ9cfL9VzLm1z7m137;.\12875bebac82cef1392aec33902161340cba51a2
ddh2de52bO7JbMfJ9cfL9VzLm1z7m137;.\15f7e4b04fd6f3bce2bab4df4ed6f52ea06b74e7
ddh2de52bO7JbMfJ9cfL9VzLm1z7m137;.\1a269b62104ad68238e5bb412bf7c22c4d5d757b
(...more samples with the same hash...)
ddh2de52bO7JbMfJ9cfL9VzLm1z7m137;.\1d0033c9fa4181dd839b8a30e98380487fadce37
ddh2de52bO7JbMfJ9cfL9VzLm1z7m137;.\c9b7024aba6fcae432d177e604dddf95444a5733
ddh2de52bO7JbMfJ9cfL9VzLm1z7m137;.\cb59987e37857e5d3e2e87f5803a8679c39691ee
ddh2de52bO7JbMfJ9cfL9VzLm1z7m137;.\d33b12256cc68971f9355b8ed2dbf5ba6650c733
amuzat2zmd1OmThO5Tgm5TsmSjvUSi/U;.\134d4325d2dbed016d898996e7359f0169df4a21
reRTrX1TlX1llVdlXVfWXR/WWx+EW8iE;.\3be38b2a2d39d7a21c4e388c48238543152bc4e8
a71ra1trW1vTW9PTztPOzgHOAQF2AXZ2;.\e60bca8c871c04384cfc4dccce704afb7e40d703
yZoJyQEJ3wHs373sQL0uQKMulqNHlndH;.\9406d9612a6405b95d6316ada56a39ea9e55e2a5
yZsxyfgx7fji7dfi9tfk9nzk9nyM9umM;.\01e22ef6d30aabc76f87fd7c37aa4b2ccc85cfe6
hbSFhXmFeXlGeUZGY0ZjY+Nj4+N443h4;.\17f5b212aa41ab7aea7f3d5dc9ba99f2b88bb069
hbSFhXmFeXlGeUZGY0ZjY+Nj4+N443h4;.\18b7c952396cbb7c467b32209f2dae8aed830a64
hbSFhXmFeXlGeUZGY0ZjY+Nj4+N443h4;.\1956954f28800edb72d3d05db908cc0a37d1c1a4
(...more samples with the same hash...)
hbSFhXmFeXlGeUZGY0ZjY+Nj4+N443h4;.\e764df606d3af0d8ce4b741689ea7712d12d7f42
hbSFhXmFeXlGeUZGY0ZjY+Nj4+N443h4;.\ea44e955662633d1ac18c542c999e8619a120058
hbSFhXmFeXlGeUZGY0ZjY+Nj4+N443h4;.\f8b1aecede7003a54dcb8d34a7fa6bcdc3bd74a7
hbSFhXmFeXlGeUZGY0ZjY+Nj4+N443h4;.\fa436c794b6b167b3bc905ae418b44057b913feb
a71rayNrIyO+I76+rb6trTatNjahNqGh;.\a45b64d4c6ef074f25a772c841b2041fa118189c
J+k2Jxs23Ruk3e6kDO5ZDC5Zzy6oz3Co;.\90e4bbc93e7a576b975ec034c4abfd884d9a33ad
a71rawFrAQGUAZSUEJQQEP0Q/f0T/RMT;.\8643f5678f44314a5f63b4ef571f8eaf1585faff
yqUZyggZ3gju3r/uQL8uQJwuQ5w2Qws2;.\9effb1fcf09e77f3f9f2ed404e604d58d44fc37f
yqUZyggZ3gju3r/uQL8uQJwuQ5w2Qws2;.\db567d9f380b194a06afa40e6c26fa55859f5fa2
rmv8ri/8Fi/BFnjB93i392q32Gp02J90;.\8239d3db30f1527a01e1ddd3fc5b93c189fdb567
xq0Xxu8X3+8N33QNRHSKRA+KVg8FVkUF;.\97a47252c2deff9062c421f01399d904b6be9d25
xq0Xxu8X3+8N33QNRHSKRA+KVg8FVkUF;.\b6ca92fa83b9f938f6c766c672faa93c4ff6ed64
xq0Xxu8X3+8N33QNRHSKRA+KVg8FVkUF;.\be334d38fef5221c4047ec6f89f378c5246b38f2
xq0Xxu8X3+8N33QNRHSKRA+KVg8FVkUF;.\eb3736f0e85a939a9e09092b3d9fc119616cea76
xq0Xxu8X3+8N33QNRHSKRA+KVg8FVkUF;.\ef5ed9ec17fcf1dd957b6886c7e8cbe2f686d303
xq0Xxu8X3+8N33QNRHSKRA+KVg8FVkUF;.\f87303c057fbca4bd2315798336bea26774858ff
a71ra/hr+Pji+OLiUeJRUYdRh4cshyws;.\79136bdc3c121e6b28045e4e6be2b6140f2262ea
BPUfxLU19QJqrr/o0BDIT73U7rz0dqdB;.\86d04e76947116a96d09ed2af959250f48f8bd56
a71ra+5r7u4Y7hgY7Rjt7QXtBQW8Bby8;.\cdbf4d2f16ae742cc9b8f25bd0c5490fb73e9144
uqYNuuUN3eUd3QcdfQezfWOz7WP97b/9;.\2b388c3f53f87d20af00099a8b2d903043fd7c8f
uqYNuuUN3eUd3QcdfQezfWOz7WP97b/9;.\31b6a93c36064fe3124a0ad4c28491b3b6ca0398
uqYNuuUN3eUd3QcdfQezfWOz7WP97b/9;.\4bef1507a5c2e751b0b7f96f8ccce688a709730f
uqYNuuUN3eUd3QcdfQezfWOz7WP97b/9;.\591800184d27139e34d8f8b3fe3537f74909cb6b
uqYNuuUN3eUd3QcdfQezfWOz7WP97b/9;.\61dd9d7899d0d6a73d397cf3b9d0af6f5c2fa68d
uqYNuuUN3eUd3QcdfQezfWOz7WP97b/9;.\6a40e26883cb6295df1a0ebdee0e317974613749
(...more samples with the same hash...)
uqYNuuUN3eUd3QcdfQezfWOz7WP97b/9;.\fce5cc2165843bf9f9379b8933c9b3d07c5687e6
uqYNuuUN3eUd3QcdfQezfWOz7WP97b/9;.\fe05130ef9c841ba6e6013dae5e639bca1f32003
uqYNuuUN3eUd3QcdfQezfWOz7WP97b/9;.\ff890f7475a4571d1cfc8f144b2c8141f3cc8559
26cX2/gX+fjx+Xjxmngzmugz+ejo+Zbo;.\2316da2ad647d61985026d4ac2a1c1fdf665fa8b
26cX2/gX+fjx+Xjxmngzmugz+ejo+Zbo;.\f5970550268e6a8bf2eeb96ed4a48ccb319e7cde
saMTsfQT4vQi4gcifQezfWOz7WP97cH9;.\002ae4bc6822fad96998cc5814d81d957bfa980c
saMTsfQT4vQi4gcifQezfWOz7WP97cH9;.\06ea2c25ac8b148efc447e86d7d09dc8960b0316
saMTsfQT4vQi4gcifQezfWOz7WP97cH9;.\09daa78a232de5db932ef8abe3c859eacc41f3ba
saMTsfQT4vQi4gcifQezfWOz7WP97cH9;.\0bfdc7242efeeb497b99b8e6dda1cd5fac0d1015
saMTsfQT4vQi4gcifQezfWOz7WP97cH9;.\118d7b731ca29d316c2a65c58b9617dfa242d9cf
saMTsfQT4vQi4gcifQezfWOz7WP97cH9;.\158822ec614d73b3027a9ef7590625f11f6873a5
saMTsfQT4vQi4gcifQezfWOz7WP97cH9;.\18a0300a0147764a516702a29841d63d43d8b5c4
saMTsfQT4vQi4gcifQezfWOz7WP97cH9;.\2875dc2f6b8ba232f2b86361f0b929ac3d670f35
saMTsfQT4vQi4gcifQezfWOz7WP97cH9;.\d5c3757ea828bed5ad4a184f7654140ae45e1f3f
5N2K5J6KK560K2y0amykasWka8U4a204;.\e1b60cb7b05e93fadcd3c0e328150353cde8540a
5N2K5J6KK560K2y0amykasWka8U4a204;.\ec7f7a1bd9810007197df99dba763dd7ccd9b931
xuGCxhWCUxU5UyA5IyB5I6J5O6IJO7YJ;.\52ee636ee7038affdefadd84f23ebee45411852d
z5kEz/EE6vEK6sUKCMVTCOtTTOsXTCsX;.\50fa0d3f79fcfa81ef6e6b9755aa335603a09f18
vPu8vLa8trbVttXVdNV0dEl0SUmdSZ2d;.\9d9659de8bd199d24e4d18a63e12f05b7b9fd07e</pre>
<p>This time the output is better, isn&#8217;t it? <img src='http://joxeankoret.com/blog/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' />  We can clearly see 5 different groups. I will change the block size again to something smaller, 32 instead of 64, to see what happens:</p>
<pre>$ deeptoad.py -b=32 .
6Ijv6IPv8YO+8be+irdkirpksbrJsY3J;.\05ca2f4386d77c8f344ff24a0a9e1869f4dc3fe3
6Ijv6IPv8YO+8be+irdkirpksbrJsY3J;.\09d26510be759a54e9de9c011d171e2d30bdf61d
6Ijv6IPv8YO+8be+irdkirpksbrJsY3J;.\0bdde8fdb58848ee1e9bacd3d61bac1f670a1b1e
6Ijv6IPv8YO+8be+irdkirpksbrJsY3J;.\12875bebac82cef1392aec33902161340cba51a2
6Ijv6IPv8YO+8be+irdkirpksbrJsY3J;.\15f7e4b04fd6f3bce2bab4df4ed6f52ea06b74e7
(...more samples with the same hash...)
6Ijv6IPv8YO+8be+irdkirpksbrJsY3J;.\f8ed9cca28a9c566b2c98bec903d63ebadc88b35
6Ijv6IPv8YO+8be+irdkirpksbrJsY3J;.\fc90cf6a5c72dbd29843bc6a14f486192ac4ef1d
6Ijv6IPv8YO+8be+irdkirpksbrJsY3J;.\fd3acefb4eb5b8677f9e4481f09bd7b2ddb1fdef
6Ijv6IPv8YO+8be+irdkirpksbrJsY3J;.\ff7ebc93b56e74c17a2bfcc2d96a676ab124670a
swqzs4yzjIzejN7ejN6MjICMgIAogCgo;.\01e22ef6d30aabc76f87fd7c37aa4b2ccc85cfe6
swqzs4yzjIzejN7ejN6MjICMgIAogCgo;.\134d4325d2dbed016d898996e7359f0169df4a21
swqzs4yzjIzejN7ejN6MjICMgIAogCgo;.\2316da2ad647d61985026d4ac2a1c1fdf665fa8b
swqzs4yzjIzejN7ejN6MjICMgIAogCgo;.\50fa0d3f79fcfa81ef6e6b9755aa335603a09f18
swqzs4yzjIzejN7ejN6MjICMgIAogCgo;.\79136bdc3c121e6b28045e4e6be2b6140f2262ea
swqzs4yzjIzejN7ejN6MjICMgIAogCgo;.\8239d3db30f1527a01e1ddd3fc5b93c189fdb567
swqzs4yzjIzejN7ejN6MjICMgIAogCgo;.\8643f5678f44314a5f63b4ef571f8eaf1585faff
swqzs4yzjIzejN7ejN6MjICMgIAogCgo;.\a45b64d4c6ef074f25a772c841b2041fa118189c
swqzs4yzjIzejN7ejN6MjICMgIAogCgo;.\cdbf4d2f16ae742cc9b8f25bd0c5490fb73e9144
swqzs4yzjIzejN7ejN6MjICMgIAogCgo;.\e60bca8c871c04384cfc4dccce704afb7e40d703
swqzs4yzjIzejN7ejN6MjICMgIAogCgo;.\f5970550268e6a8bf2eeb96ed4a48ccb319e7cde
rU6trXGtcXFLcUtL8Uvx8cTxxMScxJyc;.\9d9659de8bd199d24e4d18a63e12f05b7b9fd07e
oOr5oOv53esW3fIWB/L5B8n5FsntFv7t;.\9406d9612a6405b95d6316ada56a39ea9e55e2a5
mlvqmrTqxLSExMGEBsFdBmldemlhehdh;.\17f5b212aa41ab7aea7f3d5dc9ba99f2b88bb069
mlvqmrTqxLSExMGEBsFdBmldemlhehdh;.\18b7c952396cbb7c467b32209f2dae8aed830a64
mlvqmrTqxLSExMGEBsFdBmldemlhehdh;.\1956954f28800edb72d3d05db908cc0a37d1c1a4
mlvqmrTqxLSExMGEBsFdBmldemlhehdh;.\2c64cf6430662e93acd85789f1d7e75e6de6c2e8
(...more samples with the same hash...)
mlvqmrTqxLSExMGEBsFdBmldemlhehdh;.\dffe57c2b63204b5c812f64fbfc77c6e267827f1
mlvqmrTqxLSExMGEBsFdBmldemlhehdh;.\e764df606d3af0d8ce4b741689ea7712d12d7f42
mlvqmrTqxLSExMGEBsFdBmldemlhehdh;.\ea44e955662633d1ac18c542c999e8619a120058
mlvqmrTqxLSExMGEBsFdBmldemlhehdh;.\f8b1aecede7003a54dcb8d34a7fa6bcdc3bd74a7
mlvqmrTqxLSExMGEBsFdBmldemlhehdh;.\fa436c794b6b167b3bc905ae418b44057b913feb
CgYKClEKUVGMUYyMgoyCgvWC9fXh9eHh;.\9effb1fcf09e77f3f9f2ed404e604d58d44fc37f
CgYKClEKUVGMUYyMgoyCgvWC9fXh9eHh;.\db567d9f380b194a06afa40e6c26fa55859f5fa2
CgYKClEKUVGMUYyMgoyCgvWC9fXh9eHh;.\e1b60cb7b05e93fadcd3c0e328150353cde8540a
CgYKClEKUVGMUYyMgoyCgvWC9fXh9eHh;.\ec7f7a1bd9810007197df99dba763dd7ccd9b931
swqzs4mziYmCiYKCgIKAgPGA8fG28ba2;.\002ae4bc6822fad96998cc5814d81d957bfa980c
swqzs4mziYmCiYKCgIKAgPGA8fG28ba2;.\06ea2c25ac8b148efc447e86d7d09dc8960b0316
swqzs4mziYmCiYKCgIKAgPGA8fG28ba2;.\09daa78a232de5db932ef8abe3c859eacc41f3ba
swqzs4mziYmCiYKCgIKAgPGA8fG28ba2;.\0bfdc7242efeeb497b99b8e6dda1cd5fac0d1015
(...more samples with the same hash...)
swqzs4mziYmCiYKCgIKAgPGA8fG28ba2;.\eb3736f0e85a939a9e09092b3d9fc119616cea76
swqzs4mziYmCiYKCgIKAgPGA8fG28ba2;.\ef5ed9ec17fcf1dd957b6886c7e8cbe2f686d303
swqzs4mziYmCiYKCgIKAgPGA8fG28ba2;.\f87303c057fbca4bd2315798336bea26774858ff
swqzs4mziYmCiYKCgIKAgPGA8fG28ba2;.\fce5cc2165843bf9f9379b8933c9b3d07c5687e6
swqzs4mziYmCiYKCgIKAgPGA8fG28ba2;.\fe05130ef9c841ba6e6013dae5e639bca1f32003
swqzs4mziYmCiYKCgIKAgPGA8fG28ba2;.\ff890f7475a4571d1cfc8f144b2c8141f3cc8559
lPQDlOwD6ewj6fwj+vz1+tv1FNsBFAkB;.\52ee636ee7038affdefadd84f23ebee45411852d
Tha1Tim1zCl/zJ9/2p/p2tPp4dPR4WPR;.\86d04e76947116a96d09ed2af959250f48f8bd56
XkX+XvL+7PI+7P0+Bv3qBtXqL9X7L6/7;.\3be38b2a2d39d7a21c4e388c48238543152bc4e8
n+YDn9wDHdwCHfgC9/jQ9wzQ9AwB9BIB;.\90e4bbc93e7a576b975ec034c4abfd884d9a33ad</pre>
<p>This time the output is even better. There are 4 or 5 groups and 2 of them seem to be pretty close: the hashes <em>swqzs4yzjIzejN7ejN6MjICMgIAogCgo</em> and <em>swqzs4mziYmCiYKCgIKAgPGA8fG28ba2</em>. Both hashes start with the same string (<em>swqzs4</em>), so it seems that the files in both groups start with the same content. However, by default DeepToad shows only the hash that creates the lowest number of sets, so we don&#8217;t know whether the files from the 2 groups start or end with the same content. To show all the generated signatures (the signature, the reverse signature and the simple signature), use the &#8220;-p&#8221; argument (print all the hashes) and redirect the output to a file, as in the following example:</p>
<pre>$ deeptoad.py -b=32 -p . &gt; files.csv</pre>
<p>Now we have a CSV-formatted file with all the hashes. Open it with some sort of &#8220;advanced analysis tool&#8221; like OpenOffice Calc, StarCalc, Gnumeric or Microsoft Excel and sort the columns as in the following picture:</p>
<p><a href="http://joxeankoret.com/blog/wp-content/uploads/2010/03/screen1.png"><img class="size-medium wp-image-118 aligncenter" title="Samples and signatures" src="http://joxeankoret.com/blog/wp-content/uploads/2010/03/screen1-300x233.png" alt="" width="300" height="233" /></a></p>
<p>As we can see, there are 3 similar-looking groups, and the matching signature (the &#8220;Signature&#8221; field) shows that their files start with similar content, so we may consider all the files whose hash starts with &#8220;swqzs4&#8221; a single group. I reduced the number of elements to analyze from 145 samples to 5 groups plus 6 completely different (unique) malware samples. Now it&#8217;s time to see what tricks they are using and what their purpose is <img src='http://joxeankoret.com/blog/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' />  But that will be for another post&#8230;</p>
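<p>The grouping done above by eye and with a spreadsheet can also be scripted. The following is only a minimal sketch, not part of DeepToad: it assumes the plain &#8220;hash;path&#8221; lines shown in the listings above, and the function names and the 6-character prefix (the length of &#8220;swqzs4&#8221;) are choices made just for this illustration.</p>
<pre lang="python">from collections import defaultdict

def parse_deeptoad_output(text):
    """Map each fuzzy hash to the list of files that produced it."""
    groups = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, the shell prompt and "(...more...)" placeholders.
        if ";" not in line or line.startswith(("$", "(")):
            continue
        fuzzy_hash, path = line.split(";", 1)
        groups[fuzzy_hash].append(path)
    return dict(groups)

def merge_by_prefix(groups, prefix_len=6):
    """Merge groups whose hashes share their first prefix_len characters."""
    merged = defaultdict(list)
    for fuzzy_hash, paths in groups.items():
        merged[fuzzy_hash[:prefix_len]].extend(paths)
    return dict(merged)

# Four lines copied from the -b=32 listing above.
SAMPLE = r"""
swqzs4yzjIzejN7ejN6MjICMgIAogCgo;.\01e22ef6d30aabc76f87fd7c37aa4b2ccc85cfe6
swqzs4mziYmCiYKCgIKAgPGA8fG28ba2;.\002ae4bc6822fad96998cc5814d81d957bfa980c
swqzs4mziYmCiYKCgIKAgPGA8fG28ba2;.\06ea2c25ac8b148efc447e86d7d09dc8960b0316
mlvqmrTqxLSExMGEBsFdBmldemlhehdh;.\17f5b212aa41ab7aea7f3d5dc9ba99f2b88bb069
"""

if __name__ == "__main__":
    exact = parse_deeptoad_output(SAMPLE)
    for prefix, paths in merge_by_prefix(exact).items():
        print(prefix, len(paths))</pre>
<p>A shared prefix is only a hint that the files start with similar content, so the merged groups still deserve a manual look before being treated as one family.</p>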
]]></content:encoded>
			<wfw:commentRss>http://joxeankoret.com/blog/2010/03/08/a-typical-work-day-with-deeptoad/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Antiemulation Techniques (Malware Tricks II)</title>
		<link>http://joxeankoret.com/blog/2010/02/23/antiemulation-techniques-malware-tricks-ii/</link>
		<comments>http://joxeankoret.com/blog/2010/02/23/antiemulation-techniques-malware-tricks-ii/#comments</comments>
		<pubDate>Tue, 23 Feb 2010 18:55:00 +0000</pubDate>
		<dc:creator>joxean</dc:creator>
				<category><![CDATA[Malware]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[antidebugging]]></category>
		<category><![CDATA[antiemulation]]></category>
		<category><![CDATA[unpacking]]></category>
		<category><![CDATA[virtual machine detection]]></category>

		<guid isPermaLink="false">http://joxeankoret.com/blog/?p=74</guid>
		<description><![CDATA[From time to time, when reversing malware, I find new antiemulation techniques. They are widely used by malware to evade detection by AVs that use emulation; however, it seems that no one has written about them, maybe because there are a lot of them or, maybe, because they aren&#8217;t very interesting. Anyway, a friend and I decided [...]]]></description>
			<content:encoded><![CDATA[<p>From time to time, when reversing malware, I find new antiemulation techniques. They are widely used by malware to evade detection by AVs that use emulation; however, it seems that no one has written about them, maybe because there are a lot of them or, maybe, because they aren&#8217;t very interesting. Anyway, a friend and I decided to look for antiemulation techniques and we found a bunch of them in just about 2 days. Surprise. Well, the following is a list of antiemulation techniques &#8220;found&#8221; by us.<br />
<span id="more-74"></span><br />
<strong>API Emulation</strong></p>
<p>The most commonly used antiemulation technique is calling undocumented APIs or uncommon ones such as, for example, <a href="http://msdn.microsoft.com/en-us/library/ms680621(VS.85).aspx">SetErrorMode</a>:</p>
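<pre lang="c">DWORD dwCode = 1024;

  /* SetErrorMode returns the previous error mode, so the second call
     must give back the value set by the first one. */
  SetErrorMode(dwCode);
  if (SetErrorMode(0) != dwCode)
    printf("Hi emulator!\n");</pre>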
<p>This technique catches, at least, the IDAPro+Bochs debugger and Norman Sandbox.</p>
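<p>The logic of the check is easier to see when the API is replaced by a small model. The following Python sketch is only an illustration: <em>FaithfulErrorMode</em> and <em>stubbed_error_mode</em> are hypothetical stand-ins written for this example (they are not code from any emulator), and the call to the real API through ctypes is only attempted on Windows.</p>
<pre lang="python">import sys

def error_mode_round_trip(set_error_mode, probe=1024):
    """Return True when the API hands back the value that was just set."""
    original = set_error_mode(probe)     # returns the previous mode
    returned = set_error_mode(original)  # restores it and reads probe back
    return returned == probe

class FaithfulErrorMode:
    """Stand-in for the real API: stores the mode and returns the old one."""
    def __init__(self):
        self.mode = 0

    def __call__(self, new_mode):
        old, self.mode = self.mode, new_mode
        return old

def stubbed_error_mode(new_mode):
    """Stand-in for a lazy emulator that always answers 0."""
    return 0

if __name__ == "__main__":
    if sys.platform == "win32":
        import ctypes
        real = ctypes.windll.kernel32.SetErrorMode
        print("real API consistent:", error_mode_round_trip(real))
    print("faithful model consistent:", error_mode_round_trip(FaithfulErrorMode()))
    print("stubbed model consistent:", error_mode_round_trip(stubbed_error_mode))</pre>
<p>Unlike the C snippet, this version restores the original error mode in the second call, so running it does not change the state of the process.</p>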
<p>Another typical trick is the use of nonexistent APIs. Many emulators will try to &#8220;emulate&#8221; the function by simply returning 0 instead of failing with a null pointer exception. Yet another one is to load a library that is vital for the operating system but is not emulated, and then call one of its exported functions; just trying to load the library will fail in almost any emulator:</p>
<pre lang="c">int test6(void)
{
HANDLE hProc;

    hProc = LoadLibrary("ntoskrnl.exe");

    if (hProc == NULL)
        return EMULATOR_DETECTED;
    else
        return EMULATOR_NOT_DETECTED;
}</pre>
<p>Just in case an emulator allows loading any library and returns a pseudo handle, here are two slightly more complex examples:</p>
<pre lang="c">struct data1
{
  int a1;
  int a2;
};

struct data2
{
  int a1;
  int a2;
  int a3;
  int a4;
  int a5;
  int a6;
  struct data1 *a7;
};

typedef int (WINAPI *FCcSetReadAheadGranularity)(struct data2 *a1, int num);
typedef int (WINAPI *FIofCallDriver)();

int test8(void)
{
HINSTANCE hProc;
FIofCallDriver pIofCallDriver;

	hProc = LoadLibrary("ntkrnlpa.exe");

	if (hProc == NULL)
		return 0;

	pIofCallDriver = (FIofCallDriver) GetProcAddress(hProc, "IofCallDriver");

	if (pIofCallDriver == NULL)
		return 0;

	// Two bytes before the export there is a 0xCC byte, so calling it should raise an INT3
	pIofCallDriver = (FIofCallDriver)((BYTE *)pIofCallDriver - 2);

	try
	{
		pIofCallDriver();
		return EMULATOR_DETECTED;
	}
	catch(...)
	{
		return EMULATOR_NOT_DETECTED;
	}

}

int test9(void)
{
HINSTANCE hProc;
FCcSetReadAheadGranularity CcSetReadAheadGranularity;
struct data1 s1;
struct data2 s2;
int ret;

	hProc = LoadLibrary("ntkrnlpa.exe");

	if (hProc == NULL)
		return 0;

	CcSetReadAheadGranularity = (FCcSetReadAheadGranularity)GetProcAddress(hProc, "CcSetReadAheadGranularity");

	if (CcSetReadAheadGranularity == NULL)
		return 0;

	s1.a2 = 0;
	s2.a7 = &amp;s1;

        // After this call, ret must be 0x666, the given 2nd argument minus 1
	ret = CcSetReadAheadGranularity(&amp;s2, 0x667);

	if (ret != 0x666)
		return EMULATOR_DETECTED;
	else
		return EMULATOR_NOT_DETECTED;

}</pre>
<p>These techniques work in the 3 emulators I tested (Norman Sandbox, IDA+Bochs and Wine) and I&#8217;m pretty sure that they will work in any emulator.</p>
<p><strong>Old Features</strong></p>
<p>In the old -<em>good?</em>- days of MS-DOS and Windows 9x, AUX, CON and other special devices were used to read data from the keyboard, change terminal colors, etc&#8230; This behavior, while no longer officially supported (if I&#8217;m not wrong), still works in current Microsoft Windows operating systems but not in emulators. The following is an easy example:</p>
<pre lang="c">FILE *f;

    f = fopen("c:\\con", "r");

    if (f == NULL)
        return EMULATOR_DETECTED;
    else
        return EMULATOR_NOT_DETECTED;</pre>
<p>The only &#8220;emulator&#8221; that simulates this behavior correctly is Wine. This technique was found by 2 of my co-workers, nicknamed &#8220;PE_Luchin&#8221; and &#8220;Shaddy&#8221;.</p>
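<p>To pass this test, an emulated file system has to recognize that the path names a device and not a regular file. The following Python sketch shows one way such a check could look; the function name is made up for this example, it only compares the last path component against the classic DOS device names, and the exact rules applied by each Windows version are not modeled.</p>
<pre lang="python">import ntpath

# Classic DOS device names; COM1-COM9 and LPT1-LPT9 are reserved too.
DOS_DEVICES = {"CON", "PRN", "AUX", "NUL"}
DOS_DEVICES.update("COM%d" % i for i in range(1, 10))
DOS_DEVICES.update("LPT%d" % i for i in range(1, 10))

def is_dos_device(path):
    """True when the last component of a Windows path is a device name."""
    name = ntpath.basename(path)
    return name.upper() in DOS_DEVICES

if __name__ == "__main__":
    for candidate in ("c:\\con", "AUX", "c:\\windows\\notepad.exe"):
        print(candidate, is_dos_device(candidate))</pre>
<p>The ntpath module is used instead of os.path so that the check parses Windows paths the same way on any platform.</p>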
<p><strong>Assembly</strong></p>
<p>Correctly emulating a complete CPU is a very hard task, which also makes it the most error-prone area and the best place to look for inconsistencies. Norman Sandbox works remarkably badly in this sense: the emulator fails (or it failed, I haven&#8217;t tested it since last year) with instructions like ICEBP or UD2 and allows changing, for example, the debug registers via privileged instructions. This is easier to see in the following 5 examples:</p>
<pre lang="c">int test1(void)
{
    try
    {
		__asm
		{
			mov eax, 1
			mov dr0, eax
		}
    }
    catch(...)
    {
        return EMULATOR_NOT_DETECTED;
    }

    return EMULATOR_DETECTED;
}

int test2(void)
{
    try
    {
		__asm
		{
			mov eax, 1
			mov cr0, eax
		}
    }
    catch(...)
    {
        return EMULATOR_NOT_DETECTED;
    }

    return EMULATOR_DETECTED;
}

int test3(void)
{
    try
    {
        __asm int 4
    }
    catch(...)
    {
        return EMULATOR_NOT_DETECTED;
    }

    return EMULATOR_DETECTED;
}

/** Norman Sandbox stopped execution at this point :( */
int test4(void)
{
    try
    {
        __asm ud2
    }
    catch(...)
    {
        return EMULATOR_NOT_DETECTED;
    }

    return EMULATOR_DETECTED;
}

/** Norman Sandbox stopped execution at this point :( */
int test5(void)
{
    try
    {
        // icebp
	__asm  _emit 0xf1
    }
    catch(...)
    {
        return EMULATOR_NOT_DETECTED;
    }

    return EMULATOR_DETECTED;
}</pre>
<p>These tests were launched against Wine, IDA+Bochs and Norman. While they don&#8217;t detect Bochs, they make both Norman Sandbox and Wine fail: both think the process has crashed and stop execution.</p>
<p><strong>Conclusion</strong></p>
<p>There are a lot of antiemulation techniques and these are just simple examples; writing much more elaborate ones is a matter of time, and it&#8217;s simply impossible to circumvent all the antiemulation techniques. The old cat &amp; mouse game <img src='http://joxeankoret.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://joxeankoret.com/blog/2010/02/23/antiemulation-techniques-malware-tricks-ii/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Analyzing PDF exploits with Pyew</title>
		<link>http://joxeankoret.com/blog/2010/02/21/analyzing-pdf-exploits-with-pyew/</link>
		<comments>http://joxeankoret.com/blog/2010/02/21/analyzing-pdf-exploits-with-pyew/#comments</comments>
		<pubDate>Sun, 21 Feb 2010 14:46:23 +0000</pubDate>
		<dc:creator>joxean</dc:creator>
				<category><![CDATA[Malware]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[pyew]]></category>
		<category><![CDATA[obfuscated]]></category>
		<category><![CDATA[pdf]]></category>

		<guid isPermaLink="false">http://joxeankoret.com/blog/?p=95</guid>
		<description><![CDATA[Something I really hate to do when analyzing PDF malware exploits is manually extracting the streams and manually decoding them to see the (typically hidden) JavaScript code, so I decided to extend the PDF plugin for Pyew to show them automatically. Now, with the new version of the plugin (download it from the Mercurial [...]]]></description>
			<content:encoded><![CDATA[<p>Something I really hate to do when analyzing PDF malware exploits is manually extracting the streams and manually decoding them to see the (typically hidden) JavaScript code, so I decided to extend the PDF plugin for <a title="Pyew" href="http://code.google.com/p/pyew" target="_blank">Pyew</a> to show them automatically. Now, with the new version of the plugin (download it from the <a href="http://code.google.com/p/pyew/source/checkout" target="_blank">Mercurial repository</a>) we can see which filters are used in the exploit and, most importantly, we can see the decoded streams, regardless of how many filters are being used.<br />
<span id="more-95"></span><br />
<strong>Example</strong></p>
<p>For example, I will take one obfuscated PDF exploit (SHA256 6a8204ee7b703f96f811f32f903ac9df4045b05910d633fc34fed89e2e0a7576). To see what is inside, I simply open it in Pyew by running the command &#8220;pyew sample.pdf&#8221;:</p>
<blockquote><p>$ pyew sample.pdf<br />
PDF File</p>
<p>PDFiD 0.0.9_PL 6a8204ee7b703f96f811f32f903ac9df4045b05910d633fc34fed89e2e0a7576<br />
PDF Header: %PDF-1.1<br />
obj                    4<br />
endobj                 4<br />
stream                 1<br />
endstream              1<br />
xref                   1<br />
trailer                1<br />
startxref              1<br />
/Page                  1<br />
/Encrypt               0<br />
/ObjStm                0<br />
/JS                    1<br />
/JavaScript            1<br />
/AA                    0<br />
/OpenAction            1<br />
/AcroForm              0<br />
/JBIG2Decode           0<br />
/RichMedia             0<br />
/Colors &gt; 2^24         0<br />
%%EOF                  1<br />
After last %%EOF       0<br />
Total entropy:           4.293999 (      5547 bytes)<br />
Entropy inside streams:  3.669587 (      4773 bytes)<br />
Entropy outside streams: 5.132696 (       774 bytes)</p>
<p>(&#8230;)</p>
<p>[0x00000000]&gt; p<br />
%PDF-1.1<br />
%&#226;&#227;&#207;&#211;<br />
1 0 obj<br />
&lt;&lt;<br />
/Type /Catalog<br />
/OpenAction &lt;&lt;<br />
/JS 4 0 R<br />
/S /JavaScript<br />
&gt;&gt;<br />
/Pages 2 0 R<br />
&gt;&gt;<br />
endobj<br />
2 0 obj<br />
&lt;&lt;<br />
/Type /Pages<br />
/Kids [ 3 0 R ]<br />
/Count 1<br />
&gt;&gt;<br />
endobj<br />
3 0 obj<br />
&lt;&lt;<br />
/Type /Page<br />
/Parent 2 0 R<br />
/Resources &lt;&lt;<br />
/Font &lt;&lt;<br />
/F1 &lt;&lt;<br />
/Type /Font<br />
/Name /F1<br />
/Subtype /Type1<br />
/BaseFont /Helvetica<br />
&gt;&gt;<br />
&gt;&gt;<br />
&gt;&gt;<br />
/MediaBox [ 0 0 795 842 ]<br />
&gt;&gt;<br />
endobj<br />
4 0 obj<br />
&lt;&lt;<br />
/Length 4769<br />
/Filter [/ASCIIHexDecode /ASCII85Decode /#4c</p></blockquote>
<p>What do we see in Pyew? The output of <a href="http://blog.didierstevens.com/programs/pdf-tools/" target="_blank">PDFiD</a> (a great tool by Didier Stevens) as well as the hexadecimal output of the first block (512 bytes). Taking a brief look at the 1st block of data we see one "OpenAction" to execute JavaScript. Surprise. The code "/JS 4 0 R" specifies that the JavaScript code to be executed is in object number 4. Seeking to the offset of object #4 and printing the buffer (in ASCII) we find the following:</p>
<blockquote>
<pre>[0x000001b7]&gt; s 0x1b7
[0x000001b7]&gt; p
4 0 obj
&lt;&lt;
        /Length 4769
        /Filter [/ASCIIHexDecode /ASCII85Decode /#4c#5a#57De#63#6fde /R#75nLen#67t#68#44ecod#65 /FlateDecode ]
&gt;&gt;stream
4A2E3539605651222D714E634326304C5A47725A236A63494B26682C323A4E532&#8230;</pre>
</blockquote>
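<p>The lookup just done by hand (read the "/JS 4 0 R" reference, then seek to the object it names) can be sketched in a few lines of Python. This is not how Pyew does it; it is a naive search over the raw bytes, the function name is invented for this example, and it ignores the xref table, so treat it as a rough approximation.</p>
<pre lang="python">import re
import sys

def find_js_object(data):
    """Return (object number, offset) for the object named by /JS n g R."""
    ref = re.search(rb"/JS\s+(\d+)\s+(\d+)\s+R", data)
    if ref is None:
        return None
    num, gen = ref.group(1), ref.group(2)
    # \b keeps "4 0 obj" from matching inside "14 0 obj".
    header = re.compile(rb"\b" + num + rb"\s+" + gen + rb"\s+obj\b")
    obj = header.search(data)
    if obj is None:
        return None
    return int(num), obj.start()

if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], "rb") as handle:
        print(find_js_object(handle.read()))</pre>
<p>For the sample above the reference names object 4, which the Pyew session places at offset 0x1b7.</p>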
<p>The object is encoded multiple times and, what is more, the names that specify which filters must be used to decode the stream are encoded too. This is perfectly legal according to the PDF specification, although pretty suspicious. Pyew does a good job decoding both the encoded names and the multiply encoded stream. To see the decoded streams in the console, just type "pdfvi":</p>
<blockquote>
<pre>eval(unescape("%76%61%72%20%56%68%4C%66%4E%20%3D..."))</pre>
</blockquote>
<p>Wow! It's a <em>small</em> chunk of JavaScript data <img src='http://joxeankoret.com/blog/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' />  Pyew <em>automagically</em> applied all the needed filters (ASCIIHexDecode, ASCII85Decode, LZWDecode, RunLengthDecode and FlateDecode) and printed out the obfuscated code. We can also see it in a graphical user interface: instead of typing "pdfvi", execute the command "pdfview". You will see the following screen:</p>
<div id="attachment_96" class="wp-caption aligncenter" style="width: 310px"><a href="http://joxeankoret.com/blog/wp-content/uploads/2010/02/pdf1.png"><img class="size-medium wp-image-96" title="Obfuscated Stream View" src="http://joxeankoret.com/blog/wp-content/uploads/2010/02/pdf1-300x156.png" alt="Obfuscated Stream View" width="300" height="156" /></a><p class="wp-caption-text">Obfuscated Stream View</p></div>
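<p>The obfuscated filter names seen in object #4 rely on the "#xx" hexadecimal escapes that the PDF name syntax allows. As a minimal sketch of how they can be undone (this is not Pyew's code, and the function name is made up for this example), a single regular expression is enough:</p>
<pre lang="python">import re

def decode_pdf_name(name):
    """Expand the #xx hexadecimal escapes allowed inside a PDF name."""
    return re.sub(r"#([0-9A-Fa-f]{2})",
                  lambda match: chr(int(match.group(1), 16)),
                  name)

# The /Filter array of object #4, exactly as it appears in the sample.
FILTERS = ["/ASCIIHexDecode", "/ASCII85Decode", "/#4c#5a#57De#63#6fde",
           "/R#75nLen#67t#68#44ecod#65", "/FlateDecode"]

if __name__ == "__main__":
    for raw in FILTERS:
        print(raw, "is", decode_pdf_name(raw))</pre>
<p>Applied to the sample, the two escaped names turn into /LZWDecode and /RunLengthDecode, which matches the list of filters that Pyew reports.</p>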
<p><strong>More Examples</strong></p>
<p>OK, so now we can see the encoded stream but, what if there are a lot of encoded streams and we must check them all, or we want to see just one of them? For this purpose, and also to show Pyew's APIs, I created an example that uses the PDF API. The example reads all the streams and shows a list of the encoded ones, as you can see in the following snapshot:</p>
<div id="attachment_97" class="wp-caption aligncenter" style="width: 310px"><a href="http://joxeankoret.com/blog/wp-content/uploads/2010/02/pdf2.png"><img class="size-medium wp-image-97" title="Usage example of the PDF API" src="http://joxeankoret.com/blog/wp-content/uploads/2010/02/pdf2-300x156.png" alt="Usage example of the PDF API" width="300" height="156" /></a><p class="wp-caption-text">Usage example of the PDF API</p></div>
<p>Using this simple screen we can see all the streams or just one specific (encoded) stream. This is the code of the example, which uses Pyew's API for the PDF format:</p>
<pre lang="python">#!/usr/bin/env python

import os
import sys

from pyew_core import CPyew
from easygui import choicebox, fileopenbox, msgbox

def main(filename=None):
    if filename is None:
        filename = fileopenbox(msg="Select PDF file", default="*.pdf", filetypes=["*.pdf"])
        if filename is None:
            return

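    # Load the file in batch mode (no interactive console)
    pyew = CPyew(batch=True)
    pyew.loadFile(filename)

    # "pdfilter" returns the encoded streams: stream id -> filters used (see the loop below)
    streams = pyew.plugins["pdfilter"](pyew, doprint=True)
    if len(streams) == 0:
        msgbox(title="PDF Streams",msg="No encoded streams found")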

    l = []
    l.append("About PDF Streams Viewer")
    l.append("See all streams (both encoded and unencoded)")
    for x in streams:
        l.append("Stream %d encoded with %s" % (x, streams[x]))
    l.append("Quit")

    while True:
        c = choicebox(msg="Select one stream to view it decoded", title="Stream Viewer", choices=l)
        if c is None:
            break
        elif c.lower() == "quit":
            break
        elif c.lower().startswith("about"):
            msgbox(title="About PDF Streams Viewer",
                   msg="Example usage of the Pyew APIs to see PDF streams. Written by Joxean Koret")
        elif c.lower().startswith("see all"):
            pyew.plugins["pdfview"](pyew, doprint=False, stream_id=-1)
        else:
            # The choice reads "Stream <id> encoded with ...", so the id is the second word
            stream_id = int(c.split(" ")[1])
            pyew.plugins["pdfview"](pyew, stream_id=stream_id)

if __name__ == "__main__":
    if len(sys.argv) == 1:
        main()
    else:
        main(sys.argv[1])</pre>
<p>And that's all for the moment. I hope you like Pyew's new features <img src='http://joxeankoret.com/blog/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://joxeankoret.com/blog/2010/02/21/analyzing-pdf-exploits-with-pyew/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Pyew! A Python tool to analyze malware</title>
		<link>http://joxeankoret.com/blog/2010/02/08/pyew-a-python-tool-to-analyze-malware/</link>
		<comments>http://joxeankoret.com/blog/2010/02/08/pyew-a-python-tool-to-analyze-malware/#comments</comments>
		<pubDate>Mon, 08 Feb 2010 18:37:11 +0000</pubDate>
		<dc:creator>joxean</dc:creator>
				<category><![CDATA[Malware]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[tools]]></category>

		<guid isPermaLink="false">http://joxeankoret.com/blog/?p=80</guid>
		<description><![CDATA[While working on a disassembler with code analysis to speed up (graph) analysis of malware dumps (malware dumped from memory while running), I decided to write a malware-analysis tool built on this core, and the result is Pyew! Pyew is a tool like radare or biew/hiew. It&#8217;s a hexadecimal viewer and disassembler for IA32 and [...]]]></description>
			<content:encoded><![CDATA[<p>While working on a disassembler with code analysis to speed up (graph) analysis of malware dumps (malware dumped from memory while running), I decided to write a malware-analysis tool built on this core, and the result is <a href="http://code.google.com/p/pyew/">Pyew</a>!<br />
<span id="more-80"></span><br />
Pyew is a tool like <a href="http://www.radare.org" target="_blank">radare</a> or <a href="http://biew.sourceforge.net/" target="_blank">biew</a>/<a href="http://www.hiew.ru/" target="_blank">hiew</a>. It&#8217;s a hexadecimal viewer and disassembler for IA32 and AMD64 with support for the PE &amp; ELF formats as well as non-executable formats such as OLE2 or PDF. On the <a href="http://code.google.com/p/pyew/" target="_blank">project&#8217;s page</a> you can find <a href="http://code.google.com/p/pyew/wiki/UsageExample" target="_blank">usage examples</a> (like the superficial analysis of some <a href="http://code.google.com/p/pyew/wiki/AnalysisMebroot" target="_blank">Mebroot downloaders</a>) as well as the <a href="http://code.google.com/p/pyew/wiki/Features" target="_blank">features</a> of the version available for download as a package (however, I recommend downloading the bleeding-edge version from the <a href="http://mercurial.selenic.com/" target="_blank">Mercurial</a> repository available <a href="http://code.google.com/p/pyew/source/checkout" target="_blank">here</a>).</p>
<p>Anyway, even though Pyew has a command line interface (and a graphical user interface is planned), it was written for batch analysis of malware. Imagine the following situation: you need to analyze a bunch of malware samples, say 1000 new samples. What would you do? Analyze all of them manually, one by one? It&#8217;s better to write some sort of batch script to analyze the samples and get a simple report about the malware. In the <a href="http://code.google.com/p/pyew/w/list" target="_blank">wiki</a> of Pyew you can find a <a href="http://code.google.com/p/pyew/wiki/BatchExample" target="_blank">batch script example</a> that checks for some specific marks in the file header, gets the API calls made at the entry point, or gets a list of uncommon mnemonics found at the entry point.</p>
<p>Just to show another example of Pyew in batch mode, I will explain how to write a simple script that gets the mnemonics of instructions commonly used as anti-debugging tricks. Let&#8217;s start writing the script. First, import the libraries we need:</p>
<pre lang="python">
from pyew_core import CPyew
</pre>
<p>We need to import the class CPyew from pyew_core (the kernel of Pyew). Next, write the code that loads one file and, once it is loaded, prints the antidebugs found:</p>
<pre lang="python">import sys
from pyew_core import CPyew

filename = sys.argv[1]
pyew = CPyew(batch=True) # Specify that we're in batch mode
pyew.codeanalysis = True # Just in case, by default code analysis is always performed
pyew.loadFile(filename) # Load the file and read all the structures, perform code analysis, etc...

print(pyew.antidebug)</pre>
<p>That&#8217;s all! This simple script takes a file as input and analyzes it for mnemonics used as antidebug tricks (like INT 3 or RDTSC). Now it&#8217;s time to write a better script that takes a directory and recursively traverses every subdirectory to analyze all the files. The final result is <a href="http://code.google.com/p/pyew/source/browse/batch_example.py">here</a>.</p>
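<p>As a rough idea of what such a recursive script involves, here is a minimal sketch. It is not the linked batch_example.py: the helper names analyze_with_pyew and analyze_tree are hypothetical, and the only Pyew calls used are the ones already shown in this post.</p>

```python
import os


def analyze_with_pyew(path):
    """Load one file with Pyew in batch mode and return the antidebugs found."""
    # Imported lazily so the traversal helper can be reused without Pyew installed
    from pyew_core import CPyew

    pyew = CPyew(batch=True)
    pyew.codeanalysis = True
    pyew.loadFile(path)
    return pyew.antidebug


def analyze_tree(root, analyze=analyze_with_pyew):
    """Walk root recursively and return a {file path: analysis result} dictionary."""
    results = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                results[path] = analyze(path)
            except Exception as exc:
                # One broken sample should not stop the whole batch
                results[path] = "error: %s" % exc
    return results
```

<p>Passing the analysis function as a parameter keeps the directory traversal independent of Pyew, so a different check can be plugged in without touching the loop.</p>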
]]></content:encoded>
			<wfw:commentRss>http://joxeankoret.com/blog/2010/02/08/pyew-a-python-tool-to-analyze-malware/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Malware Tricks I</title>
		<link>http://joxeankoret.com/blog/2009/12/02/malware-tricks-i/</link>
		<comments>http://joxeankoret.com/blog/2009/12/02/malware-tricks-i/#comments</comments>
		<pubDate>Wed, 02 Dec 2009 21:57:42 +0000</pubDate>
		<dc:creator>joxean</dc:creator>
				<category><![CDATA[Malware]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[pyew]]></category>

		<guid isPermaLink="false">http://joxeankoret.com/blog/?p=76</guid>
		<description><![CDATA[Today, while analyzing a family of malware (the family that some vendors call &#8220;Krap&#8221;), I noticed a good and new, at least for me, antiemulation technique. What do you think this sample code does? some_func: ; Do stuff... start: push offset some_func jmp edx What is this? We&#8217;re pushing the address of the function [...]]]></description>
			<content:encoded><![CDATA[<p>Today, while analyzing a family of malware (the family that some vendors call &#8220;Krap&#8221;), I noticed a good and new, at least for me, antiemulation technique. What do you think this sample code does?</p>
<pre lang="asm">some_func:
  ; Do stuff...

start:
   push offset some_func
   jmp edx</pre>
<p><span id="more-76"></span><br />
What is this? We&#8217;re pushing the address of the function some_func onto the stack and, after this, jumping unconditionally to the address contained in EDX. The question here is: what value does the EDX register hold before your first line of assembly code executes? It holds the address of ntdll!KiFastSystemCallRet:</p>
<p style="text-align: center;">
<a href="http://joxeankoret.com/blog/wp-content/uploads/2009/12/anal_edx.png"><img class="size-medium wp-image-77 aligncenter" title="Value of EDX at the very first program's instruction" src="http://joxeankoret.com/blog/wp-content/uploads/2009/12/anal_edx-300x178.png" alt="" width="300" height="178" /></a></p>
<p>So, basically, we&#8217;re jumping to a return-only function (see a detailed description of <a href="http://www.dumpanalysis.org/blog/index.php/2008/01/10/what-is-kifastsystemcallret/">KiFastSystemCallRet</a>), effectively returning into the &#8220;some_func&#8221; function. The emulators I tested (for example, the Bochs Debugger module that comes with IDA Pro) initialize all the registers to 0, so under emulation the jump goes to address 0 instead of the return stub. A cool trick, and the first time I have seen it!</p>
<p>The tricks I typically find in malware are undocumented (or atypical) API calls mixed with junk code, as in the following example extracted from a Mebroot downloader:</p>
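<p>To make the reasoning concrete, here is a toy model in Python of the two instructions at &#8220;start&#8221;. It is only a sketch and not a real emulator: both addresses are hypothetical values chosen for illustration, and the &#8220;memory&#8221; is just a dictionary.</p>

```python
# Hypothetical addresses, chosen only for illustration
KI_FAST_SYSTEM_CALL_RET = 0x7C90E4F4   # stands in for ntdll!KiFastSystemCallRet
SOME_FUNC = 0x00401000                 # stands in for some_func


def run_entry_point(initial_edx):
    """Model 'push offset some_func; jmp edx' and report where execution ends up."""
    # Toy memory map: address -> what lives there
    memory = {
        KI_FAST_SYSTEM_CALL_RET: "ret",
        SOME_FUNC: "some_func",
    }
    stack = [SOME_FUNC]          # push offset some_func
    eip = initial_edx            # jmp edx

    if eip not in memory:
        return "fault: jump to unmapped address 0x%08x" % eip
    if memory[eip] == "ret":     # the stub just returns: pop the pushed address
        eip = stack.pop()
    return memory[eip]
```

<p>With EDX set as the real loader leaves it, run_entry_point(KI_FAST_SYSTEM_CALL_RET) ends up in some_func, while run_entry_point(0), which models an emulator that zeroes the registers, reports a fault at address 0.</p>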
<pre lang="asm">
000013a7 PUSH 0x74327ebc
000013ac CALL KERNEL32.dll!WriteFile
000013b2 TEST EAX, EAX
000013b4 JZ 0x000013bb      ; 1
000013b6 JMP 0x0000108e     ; 2
000013bb PUSH 0x0
000013bd CALL KERNEL32.dll!DisconnectNamedPipe
</pre>
<p>Junk code using relatively common APIs:</p>
<pre lang="asm">
00001c1f PUSH 0x0
00001c21 PUSH 0x0
00001c23 CALL SHLWAPI.dll!SHDeleteKeyA
00001c29 PUSH 0x100
00001c2e CALL msvcrt.dll!malloc
00001c34 ADD ESP, 0x4
00001c37 PUSH EAX
00001c38 CALL msvcrt.dll!free
00001c3e ADD ESP, 0x4
00001c41 PUSH 0x0
00001c43 CALL WINMM.dll!timeKillEvent
00001c49 PUSH 0x10005129
00001c4e LEA EAX, [EBP-0x20]
00001c51 PUSH EAX
00001c52 CALL USER32.dll!wsprintfA
00001c58 ADD ESP, 0x8
00001c5b PUSH 0x0
00001c5d CALL ADVAPI32.dll!RegCloseKey
00001c63 CALL ole32.dll!OleUninitialize
</pre>
<p>Very simple API calls not commonly emulated (extracted from the dropper of the rootkit TDSS):</p>
<pre lang="asm">
00000813 XOR ESI, ESI
00000815 PUSH ESI
00000816 MOV EAX, [0x40600c]        ; kernel32.dll!GetModuleHandleA
0000081d CALL EAX
0000081f PUSH 0x74
00000821 MOV EAX, [0x406080]        ; msvcrt.dll!iscntrl
00000827 CALL EAX
00000829 POP ECX
0000082a TEST EAX, EAX
0000082c JNZ 0x000008ad     ; 1
00000832 PUSH 0x6d
00000834 PUSH 0x68
00000836 MOV EAX, [0x40607c]        ; msvcrt.dll!is_wctype
0000083d CALL EAX
</pre>
<p>Or strange x86 assembly instructions like multibyte NOPs with redundant prefixes and so on (found in some variants of Sality): </p>
<pre lang="asm">
f30f1f90909090. rep nop [eax+0x66909090]
</pre>
<p>I know it&#8217;s just one antiemulation trick and there are thousands of them but this trick is new (at least for me), special and cool!</p>
]]></content:encoded>
			<wfw:commentRss>http://joxeankoret.com/blog/2009/12/02/malware-tricks-i/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Zerowine: Better reports, network conversations and bug fixes</title>
		<link>http://joxeankoret.com/blog/2009/02/10/zerowine-better-reports-network-conversations-and-bug-fixes/</link>
		<comments>http://joxeankoret.com/blog/2009/02/10/zerowine-better-reports-network-conversations-and-bug-fixes/#comments</comments>
		<pubDate>Tue, 10 Feb 2009 10:05:59 +0000</pubDate>
		<dc:creator>joxean</dc:creator>
				<category><![CDATA[Malware]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[antidebugging]]></category>
		<category><![CDATA[unpacking]]></category>
		<category><![CDATA[virtual machine detection]]></category>

		<guid isPermaLink="false">http://joxeankoret.com/blog/?p=67</guid>
		<description><![CDATA[Single user version of Zerowine Yesterday I finished the (surely) last single-user version of Zerowine and added some interesting features to it. Many Zerowine users told me that the reports were very confusing and, yes, that&#8217;s true. I fixed this problem by adding new debugging channels to the currently latest stable version of Wine (1.1.10) [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Single user version of Zerowine</strong></p>
<p>Yesterday I finished the (surely) last single-user version of Zerowine and added some interesting features to it. Many Zerowine users told me that the reports were very confusing and, yes, that&#8217;s true. I fixed this problem by adding new debugging channels to the current latest stable version of Wine (1.1.10) and, well, the reports are now less confusing and more readable. The new debugging channels I added to Wine are the following:</p>
<ol>
<li>humanmalware: This channel shows in human readable format what the malware is doing.</li>
<li>malware: Quite similar to the TRACE channel, but just logs the calls to APIs interesting for malware research.</li>
<li>malwaredump: This channel shows the network conversations.</li>
<li>malwarereg: Shows registry operations.</li>
<li>malwarelib: Shows what libraries the malware is loading/unloading.</li>
</ol>
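<p>Wine debugging channels are switched on through the WINEDEBUG environment variable, with each channel name prefixed by a plus sign. The following sketch builds that value for the channels listed above. How Zerowine itself launches Wine is not shown in this post, so the helper names are hypothetical, and the channels only exist in the patched Wine.</p>

```python
import os

# Channel names taken from the list above
MALWARE_CHANNELS = ["humanmalware", "malware", "malwaredump", "malwarereg", "malwarelib"]


def build_winedebug(channels):
    """Build a WINEDEBUG value that enables each of the given channels."""
    return ",".join("+" + name for name in channels)


def build_environment(channels, base_env=None):
    """Return a copy of the environment with WINEDEBUG set for the given channels."""
    env = dict(os.environ if base_env is None else base_env)
    env["WINEDEBUG"] = build_winedebug(channels)
    return env
```

<p>The resulting dictionary can be passed as the env argument of subprocess.Popen when starting wine. Wine writes its debug output to stderr, so that is the stream to capture for the report.</p>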
<p>The following is an example report from running a malware sample in the sandbox with the latest features:</p>
<div id="attachment_68" class="wp-caption aligncenter" style="width: 300px"><a href="http://joxeankoret.com/blog/wp-content/uploads/2009/02/zerowine_channels1.png"><img class="size-medium wp-image-68" title="Zerowine reports with the new channels" src="http://joxeankoret.com/blog/wp-content/uploads/2009/02/zerowine_channels1-290x300.png" alt="Zerowine reports with the new channels" width="290" height="300" /></a><p class="wp-caption-text">Zerowine reports with the new channels</p></div>
<p>We can see how the malware connects to some remote web server, the HTTP query executed, the local file downloaded, etc&#8230; All of this appears in the &#8220;Report&#8221; section; the &#8220;Signature&#8221; section shows just the &#8220;human readable&#8221; format of the report (which, as expected, is not as detailed as the &#8220;Report&#8221; section).</p>
<p>I also fixed various bugs (in both Wine and Zerowine). Zerowine is now able to detect more anti-debugging techniques and to dump new malware formats, and it is more <em>secure</em>: I removed some features in the patched version of Wine that are a bit insecure for malware analysis.</p>
<p>Well, and that&#8217;s all for the mono-user version (I will be releasing it this week, or at least I hope to do so). I will update this entry when the file I&#8217;m uploading to Sourceforge.net finishes; the upload is very slow (really, a pain in the ass).</p>
<p><strong>Multiuser Version of Zerowine</strong></p>
<p>The new multi-user version of Zerowine will take a long while because it requires a lot of changes; however, many features are already implemented (queues, multiple malware analysis nodes, database support, etc&#8230;). The changes will be mainly, but not only, architectural. For example, I&#8217;m currently implementing new &#8220;engines&#8221; to analyze malware on other platforms: one is an IDA Pro based agent that executes the malware with the Bochs Debugger inside IDA, dumps &amp; analyzes it and produces an unpacked IDB database.</p>
<p>Another (possible) agent I&#8217;m planning is a Windows hooker to analyze the malware on a real Windows box (but the problem that comes to my mind is how to clean the environment automatically after the malware execution&#8230;).</p>
]]></content:encoded>
			<wfw:commentRss>http://joxeankoret.com/blog/2009/02/10/zerowine-better-reports-network-conversations-and-bug-fixes/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Zerowine: Malware dumping and detection tricks [Updated]</title>
		<link>http://joxeankoret.com/blog/2009/01/18/zerowine-malware-dumping-and-detection-tricks/</link>
		<comments>http://joxeankoret.com/blog/2009/01/18/zerowine-malware-dumping-and-detection-tricks/#comments</comments>
		<pubDate>Sun, 18 Jan 2009 17:24:30 +0000</pubDate>
		<dc:creator>joxean</dc:creator>
				<category><![CDATA[Malware]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[antidebugging]]></category>
		<category><![CDATA[unpacking]]></category>
		<category><![CDATA[virtual machine detection]]></category>

		<guid isPermaLink="false">http://joxeankoret.com/blog/?p=54</guid>
		<description><![CDATA[Update: I released the new version now! Download the prebuilt QEmu virtual machine (or the source code) from here. Remember that the root&#8217;s password is &#8216;zerowine&#8217;. There is also another user account: &#8216;malware&#8217; with password &#8216;malware&#8217;. I recently added 3 new interesting features to Zerowine. The very first one is the ability to dump the [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Update</strong>: I released the new version now! Download the prebuilt QEmu virtual machine (or the source code) from <a href="https://sourceforge.net/project/platformdownload.php?group_id=248410" target="_blank">here</a>. Remember that the root&#8217;s password is &#8216;zerowine&#8217;. There is also another user account: &#8216;malware&#8217; with password &#8216;malware&#8217;.</p>
<p>I recently added 3 new interesting features to <a href="http://sourceforge.net/projects/zerowine" target="_blank">Zerowine</a>. The very first one is the ability to dump the malware from memory while running and analyze the memory. This way, strings and code hidden in a packed malware can be analyzed because it is completely unpacked, as in the following example showing the strings from a variant of the MyTob malware packed with MEW.</p>
<div id="attachment_56" class="wp-caption aligncenter" style="width: 310px"><a href="http://joxeankoret.com/blog/wp-content/uploads/2009/01/zerowine1.png"><img class="size-medium wp-image-56" title="zerowine1" src="http://joxeankoret.com/blog/wp-content/uploads/2009/01/zerowine1-300x242.png" alt="Zerowine: String analysis of the MyTob malware after dumping it from memory" width="300" height="242" /></a><p class="wp-caption-text">Zerowine: String analysis of the MyTob malware after dumping it from memory </p></div>
<p>The memory dumps can also be downloaded for later analysis with <a href="http://www.hex-rays.com/idapro/" target="_blank">IDA Pro</a>. The dumping process is done from outside <a href="http://www.winehq.org" target="_blank">WINE</a> with a <a href="http://www.python.org" target="_blank">Python</a> script (/home/malware/bin/dump_process.py) that uses <a href="http://python-ptrace.hachoir.org/" target="_blank">python-ptrace</a> to attach to the running malware and dump the memory.</p>
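<p>The script dump_process.py itself is not reproduced in this post. As a rough illustration of the idea, the sketch below uses a different, simpler technique than python-ptrace: it reads the Linux /proc/&lt;pid&gt;/maps and /proc/&lt;pid&gt;/mem files directly. The function names are hypothetical, and the caller is assumed to have permission to trace the target process.</p>

```python
import os


def parse_maps(text):
    """Parse the text of /proc/<pid>/maps into (start, end, perms, path) tuples."""
    regions = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        start, end = (int(value, 16) for value in fields[0].split("-"))
        path = fields[5].strip() if len(fields) > 5 else ""
        regions.append((start, end, fields[1], path))
    return regions


def dump_process(pid, out_dir):
    """Write every readable region of a traceable process to files in out_dir."""
    with open("/proc/%d/maps" % pid) as maps_file:
        regions = parse_maps(maps_file.read())
    with open("/proc/%d/mem" % pid, "rb") as mem:
        for start, end, perms, _path in regions:
            if not perms.startswith("r"):
                continue                      # skip regions without read permission
            try:
                mem.seek(start)
                data = mem.read(end - start)
            except (IOError, OSError, OverflowError, ValueError):
                continue                      # some regions cannot be read
            name = os.path.join(out_dir, "%08x-%08x.bin" % (start, end))
            with open(name, "wb") as out:
                out.write(data)
```

<p>Each region ends up in its own file named after its address range, which makes it easy to run string or signature analysis on the unpacked code afterwards.</p>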
<p>I also added signatures that use this new feature to detect the most typical Virtual Machine detection tricks (such as the <a href="http://www.invisiblethings.org/papers/redpill.html" target="_blank">redpill</a> trick or the VMware backdoor).</p>
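<p>The signatures themselves are not listed in this post. As an illustration of the approach, the sketch below scans a memory dump for two byte patterns: the SIDT opcode with an absolute address used by the redpill code, and the VMware backdoor magic value &#8216;VMXh&#8217; (0x564D5868) as it appears as a little-endian immediate. These patterns and names are assumptions made for the example, and such short patterns will produce false positives on real dumps.</p>

```python
# Byte patterns assumed for illustration; they are not Zerowine's actual signatures
VM_SIGNATURES = {
    "Red Pill (SIDT with absolute address)": b"\x0f\x01\x0d",
    "VMware backdoor magic 'VMXh' as an immediate": b"\x68\x58\x4d\x56",
}


def scan_dump(data, signatures=VM_SIGNATURES):
    """Return {signature name: [offsets]} for every pattern found in a memory dump."""
    hits = {}
    for name, pattern in signatures.items():
        offsets = []
        pos = data.find(pattern)
        while pos != -1:
            offsets.append(pos)
            pos = data.find(pattern, pos + 1)
        if offsets:
            hits[name] = offsets
    return hits
```

<p>Scanning the dumped memory rather than the file on disk matters here, because in a packed sample these bytes only appear once the code has been unpacked.</p>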
<p style="text-align: center;"><a href="http://joxeankoret.com/blog/wp-content/uploads/2009/01/zerowine11.png"><img class="size-medium wp-image-57 aligncenter" title="Red Pill Virtual Machine trick detected by Zerowine" src="http://joxeankoret.com/blog/wp-content/uploads/2009/01/zerowine11-291x300.png" alt="" width="291" height="300" /></a></p>
<p style="text-align: left;">In this screenshot you can also see the &#8220;Debugger detection tricks&#8221; section. The detection is done by analyzing the behavior of the malware. The following is an analysis of some Chinese malware packed with <a href="http://www.oreans.com/products.php" target="_blank">Themida</a>:</p>
<p style="text-align: center;"><a href="http://joxeankoret.com/blog/wp-content/uploads/2009/01/zerowine2.png"><img class="alignnone size-medium wp-image-58" title="Zerowine: Antidebugging techniques detection" src="http://joxeankoret.com/blog/wp-content/uploads/2009/01/zerowine2-278x300.png" alt="" width="278" height="300" /></a></p>
<p>And, well, that&#8217;s all at the moment. The new version will be released (or at least I hope to do so) in a week.</p>
<p>Cheers!</p>
]]></content:encoded>
			<wfw:commentRss>http://joxeankoret.com/blog/2009/01/18/zerowine-malware-dumping-and-detection-tricks/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Malware Behavior Analysis: Zero Wine</title>
		<link>http://joxeankoret.com/blog/2008/12/28/malware-behavior-analysis-zero-wine/</link>
		<comments>http://joxeankoret.com/blog/2008/12/28/malware-behavior-analysis-zero-wine/#comments</comments>
		<pubDate>Sun, 28 Dec 2008 20:44:27 +0000</pubDate>
		<dc:creator>joxean</dc:creator>
				<category><![CDATA[Malware]]></category>
		<category><![CDATA[Research]]></category>

		<guid isPermaLink="false">http://joxeankoret.com/blog/?p=33</guid>
		<description><![CDATA[As a research project, I decided to create a &#8220;sandbox&#8221; to analyze malware and automatically generate reports based on its behavior. The sandbox is a Debian based distribution with WINE and various Python libraries and tools. Generally, it works quite well for analyzing malware even when it&#8217;s packed (as is pretty common in today&#8217;s malware). [...]]]></description>
			<content:encoded><![CDATA[<p>As a research project, I decided to create a &#8220;sandbox&#8221; to analyze malware and automatically generate reports based on its behavior. The sandbox is a <a title="Debian" href="http://www.debian.org" target="_blank">Debian</a> based distribution with <a title="WINE" href="http://www.winehq.org" target="_blank">WINE</a> and various <a title="Python" href="http://www.python.org" target="_blank">Python</a> libraries and tools.</p>
<p>Generally, it works quite well for analyzing malware even when it&#8217;s packed (as is pretty common in today&#8217;s malware). However, WINE fails with some packers such as <a title="Armadillo" href="http://www.siliconrealms.com/" target="_blank">Armadillo</a> when the &#8220;Compatibility Mode&#8221; is disabled. Anyway, almost all the packers I tried work (themida, aspack, upx, etc&#8230;).</p>
<p>Zero Wine is distributed in source code form or as a prebuilt QEmu virtual machine: download, unpack and run the virtual machine. Using the scripts supplied in the tar.gz file, the VM&#8217;s port 8000 will be redirected to port 8000 on your computer and the following very simple web page will be presented:</p>
<p style="text-align: center;"><a href="http://joxeankoret.com/blog/wp-content/uploads/2008/12/zerowine-img1.png"><img class="size-medium wp-image-27 aligncenter" title="Zero Wine Homepage" src="http://joxeankoret.com/blog/wp-content/uploads/2008/12/zerowine-img1-300x135.png" alt="" width="300" height="135" /></a></p>
<p style="text-align: left;">Quite simple: just select the malware to upload, specify a timeout and click the submit button. After a while, a report summary with 4 options will be presented:</p>
<p style="text-align: center;"><a href="http://joxeankoret.com/blog/wp-content/uploads/2008/12/zerowine-img2.png"><img class="size-medium wp-image-28 aligncenter" title="Zero Wine Report Summary" src="http://joxeankoret.com/blog/wp-content/uploads/2008/12/zerowine-img2-300x155.png" alt="" width="300" height="155" /></a></p>
<p style="text-align: left;">
<p style="text-align: left;">The options available are the following:</p>
<ol>
<li>Report: The complete raw report of all the APIs called by the malware. Hard to follow and hard to understand (a 10 MB report is not uncommon).</li>
<li>Strings: Just the output of the typical Unix command &#8220;strings&#8221;.</li>
<li>File headers: All the information gathered from the PE using the library <a title="PEFile" href="http://code.google.com/p/pefile/" target="_blank">PEFile</a>.</li>
<li>Signature: The signature report is an extract of the full raw report with the most interesting calls.</li>
</ol>
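<p>For readers who do not know the &#8220;strings&#8221; command, the sketch below shows roughly what it extracts: runs of printable ASCII characters of at least four bytes. The function name is hypothetical, and Zero Wine simply runs the real command rather than code like this.</p>

```python
import re


def extract_strings(data, min_length=4):
    """Return the runs of printable ASCII characters at least min_length long."""
    # Build the pattern as text first, then encode it so it can match bytes
    pattern = re.compile(("[\\x20-\\x7e]{%d,}" % min_length).encode("ascii"))
    return [match.decode("ascii") for match in pattern.findall(data)]
```

<p>Run against a packed file, this usually returns little of interest, which is why analyzing the strings of the unpacked memory image is more useful.</p>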
<p>When the malware has been analyzed correctly, the &#8220;Signature&#8221; report is all you need. A sample malware report looks like the following:</p>
<p style="text-align: center;"><a href="http://joxeankoret.com/blog/wp-content/uploads/2008/12/zerowine-img6.png"><img class="size-medium wp-image-29 aligncenter" title="zerowine-img6" src="http://joxeankoret.com/blog/wp-content/uploads/2008/12/zerowine-img6-300x250.png" alt="" width="300" height="250" /></a></p>
<p style="text-align: left;">In this very first release, the reports aren&#8217;t saved in the virtual machine and you can analyze just one malware sample at a time (as the malware runs in a fixed WINEPREFIX). However, in future releases all the malware reports will be added to an SQLite database and a new WINEPREFIX specific to each malware sample will be created.</p>
<p style="text-align: left;">The project is hosted on <a title="Zero Wine" href="http://sourceforge.net/projects/zerowine" target="_blank">Sourceforge</a> and, well, that&#8217;s all at the moment. Bye!</p>
<p style="text-align: left;">Joxean Koret</p>
<p style="text-align: left;">
]]></content:encoded>
			<wfw:commentRss>http://joxeankoret.com/blog/2008/12/28/malware-behavior-analysis-zero-wine/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
	</channel>
</rss>
