# A typical work day with DeepToad

Sometimes I receive so many malware samples that it is impossible (or at least inhuman) to analyze them all by hand, and I need to automate the typical (boring) tasks: clustering the samples into smaller sets, and an initial (and superficial) analysis of the different samples. For the first task I created DeepToad, a tool to cluster any kind of file using fuzzy hashing techniques.

## Clustering malware samples

The very first step is to run DeepToad and see what groups it finds among 145 PDF malware samples:

$ deeptoad.py .
sQOxsT6xPj7LPsvLgcuBgayBrKyGrIaG;.\c0d1dde49be3a07c4ef4acb79da7050afa6df5b8
Sb+//PzY2BgYCAj4+AgIXl6UlCoq7+/V;.\63a18865ae6b8851ed9e18f12333308f93e156eb
fITKfLLKSrJHSv5Haf5FaXFFbnFAbmZA;.\a30b967a495074e71c711a8cac93b836053e46c1
PzY/P7Q/tLSrtKur1qvW1l3WXV06XTo6;.\f5970550268e6a8bf2eeb96ed4a48ccb319e7cde
iZDqiRzqSBwWSNUWg9WIg0mIJElTJElT;.\61dd9d7899d0d6a73d397cf3b9d0af6f5c2fa68d
LkREk5NMTKGhcHDz829vJyfe3qOjuLho;.\b5c5bd76bbb56c43ef67c3acb9d62908057c5fc6
WzMQW1oQ91oK9+8KDe/fDW/fc2/ScxzS;.\cdbf4d2f16ae742cc9b8f25bd0c5490fb73e9144
sqysenq3t5mZUVFVVYuLZGT39xwcMDDN;.\1956954f28800edb72d3d05db908cc0a37d1c1a4
pmWmpmemZ2fRZ9HRgNGAgJuAm5ucm5yc;.\d5c3757ea828bed5ad4a184f7654140ae45e1f3f
xFTExIHEgYE7gTs7fTt9ff19/f2A/YCA;.\8239d3db30f1527a01e1ddd3fc5b93c189fdb567
iVGJiUWJRUXzRfPzefN5eY95j4+gj6Cg;.\2875dc2f6b8ba232f2b86361f0b929ac3d670f35
iVGJiUWJRUXzRfPzefN5eY95j4+gj6Cg;.\6e309298423e3e4d04e9432900768e9d9493e972
XpuPC8IfyYpfb+Y8+y88GkAcw4pVelnN;.\a45b64d4c6ef074f25a772c841b2041fa118189c
QyIiYmLW1oCAm5ucnGtrGhoaGtXVTU3Z;.\17f5b212aa41ab7aea7f3d5dc9ba99f2b88bb069
7NraXV0dHfv7FxcLC62tFxfR0erqysrJ;.\dffe57c2b63204b5c812f64fbfc77c6e267827f1
dhQBdmEBR2GQR8SQ+MRs+G1sPW08PS48;.\134d4325d2dbed016d898996e7359f0169df4a21
(…a lot more different hashes…)
VIODRUVFRaqqXFydnfX1lZUPD6WlW1sN;.\9e8e153d80248bd88a178d831210ceec963a3d1d
WqtaWlNaU1PFU8XFl8WXl56Xnp6DnoOD;.\18a0300a0147764a516702a29841d63d43d8b5c4
WqtaWlNaU1PFU8XFl8WXl56Xnp6DnoOD;.\2b388c3f53f87d20af00099a8b2d903043fd7c8f
WqtaWlNaU1PFU8XFl8WXl56Xnp6DnoOD;.\9ebdbce3ecf04b477aa322c12c4370d79807879f
ypa7ymG78WHz8erzBOpfBOZf/uZL/qZL;.\06ea2c25ac8b148efc447e86d7d09dc8960b0316
svDwb2/k5Pf3aGjb22JiSEjDw1RUpqal;.\9effb1fcf09e77f3f9f2ed404e604d58d44fc37f
svDwb2/k5Pf3aGjb22JiSEjDw1RUpqal;.\db567d9f380b194a06afa40e6c26fa55859f5fa2
QRIMQXsMoHuKoPKKivJWih9WSh+JSn2J;.\b6ca92fa83b9f938f6c766c672faa93c4ff6ed64
qVgHqXoHmXqTmfmTdvlndgxnQAyoQHOo;.\002ae4bc6822fad96998cc5814d81d957bfa980c
qVgHqXoHmXqTmfmTdvlndgxnQAyoQHOo;.\8a0414600a0ac1665611fba114dd2878a5e003f1
AlACAu8C7++v76+vmK+YmKqYqqpnqmdn;.\9b9be301a440f9c4b2bc5b88475859e4907ba74a
pS3MpZDMnJCKnOqKuOo4uAU4WAWdWHGd;.\8c388936f594c469003a1585ac8b7d0b10d92c6b
e8V7e1Z7VlamVqamBaYFBWMFY2M2YzY2;.\79136bdc3c121e6b28045e4e6be2b6140f2262ea
Tgd3ToR3NISENEmE9km89qu886tU87dU;.\f87303c057fbca4bd2315798336bea26774858ff
mHSYmNCY0NDS0NLSiNKIiJ6Inp6Nno2N;.\4bef1507a5c2e751b0b7f96f8ccce688a709730f
noZonnBosXCCsdmCdtmrdnirUnjWUjrW;.\ff890f7475a4571d1cfc8f144b2c8141f3cc8559
0JGRMTGoqElJZ2c1NXp6paXBwb293Nzo;.\9406d9612a6405b95d6316ada56a39ea9e55e2a5

Uhm… It doesn’t seem to be working well. I can see some groups (WqtaWlNaU1PFU8XFl8WXl56Xnp6DnoOD and qVgHqXoHmXqTmfmTdvlndgxnQAyoQHOo, for example), but that doesn’t help much: there are so many different fuzzy hashes that no one can tell how close they are to each other. The output is this bad because of the default block size used in DeepToad (512 bytes). The files are very small, so that block size doesn’t work well for clustering them. The next step is to change the block size (sorry for the long output, scroll down…):

$ deeptoad.py -b=64 .
ddh2de52bO7JbMfJ9cfL9VzLm1z7m137;.\05ca2f4386d77c8f344ff24a0a9e1869f4dc3fe3
ddh2de52bO7JbMfJ9cfL9VzLm1z7m137;.\09d26510be759a54e9de9c011d171e2d30bdf61d
ddh2de52bO7JbMfJ9cfL9VzLm1z7m137;.\0bdde8fdb58848ee1e9bacd3d61bac1f670a1b1e
ddh2de52bO7JbMfJ9cfL9VzLm1z7m137;.\12875bebac82cef1392aec33902161340cba51a2
ddh2de52bO7JbMfJ9cfL9VzLm1z7m137;.\15f7e4b04fd6f3bce2bab4df4ed6f52ea06b74e7
(...more samples with the same hash...)
ddh2de52bO7JbMfJ9cfL9VzLm1z7m137;.\c9b7024aba6fcae432d177e604dddf95444a5733
ddh2de52bO7JbMfJ9cfL9VzLm1z7m137;.\cb59987e37857e5d3e2e87f5803a8679c39691ee
ddh2de52bO7JbMfJ9cfL9VzLm1z7m137;.\d33b12256cc68971f9355b8ed2dbf5ba6650c733
amuzat2zmd1OmThO5Tgm5TsmSjvUSi/U;.\134d4325d2dbed016d898996e7359f0169df4a21
reRTrX1TlX1llVdlXVfWXR/WWx+EW8iE;.\3be38b2a2d39d7a21c4e388c48238543152bc4e8
a71ra1trW1vTW9PTztPOzgHOAQF2AXZ2;.\e60bca8c871c04384cfc4dccce704afb7e40d703
yZsxyfgx7fji7dfi9tfk9nzk9nyM9umM;.\01e22ef6d30aabc76f87fd7c37aa4b2ccc85cfe6
hbSFhXmFeXlGeUZGY0ZjY+Nj4+N443h4;.\17f5b212aa41ab7aea7f3d5dc9ba99f2b88bb069
hbSFhXmFeXlGeUZGY0ZjY+Nj4+N443h4;.\18b7c952396cbb7c467b32209f2dae8aed830a64
hbSFhXmFeXlGeUZGY0ZjY+Nj4+N443h4;.\1956954f28800edb72d3d05db908cc0a37d1c1a4
(...more samples with the same hash...)
hbSFhXmFeXlGeUZGY0ZjY+Nj4+N443h4;.\e764df606d3af0d8ce4b741689ea7712d12d7f42
hbSFhXmFeXlGeUZGY0ZjY+Nj4+N443h4;.\ea44e955662633d1ac18c542c999e8619a120058
hbSFhXmFeXlGeUZGY0ZjY+Nj4+N443h4;.\f8b1aecede7003a54dcb8d34a7fa6bcdc3bd74a7
hbSFhXmFeXlGeUZGY0ZjY+Nj4+N443h4;.\fa436c794b6b167b3bc905ae418b44057b913feb
a71rayNrIyO+I76+rb6trTatNjahNqGh;.\a45b64d4c6ef074f25a772c841b2041fa118189c
a71rawFrAQGUAZSUEJQQEP0Q/f0T/RMT;.\8643f5678f44314a5f63b4ef571f8eaf1585faff
yqUZyggZ3gju3r/uQL8uQJwuQ5w2Qws2;.\9effb1fcf09e77f3f9f2ed404e604d58d44fc37f
yqUZyggZ3gju3r/uQL8uQJwuQ5w2Qws2;.\db567d9f380b194a06afa40e6c26fa55859f5fa2
rmv8ri/8Fi/BFnjB93i392q32Gp02J90;.\8239d3db30f1527a01e1ddd3fc5b93c189fdb567
xq0Xxu8X3+8N33QNRHSKRA+KVg8FVkUF;.\97a47252c2deff9062c421f01399d904b6be9d25
xq0Xxu8X3+8N33QNRHSKRA+KVg8FVkUF;.\b6ca92fa83b9f938f6c766c672faa93c4ff6ed64
xq0Xxu8X3+8N33QNRHSKRA+KVg8FVkUF;.\be334d38fef5221c4047ec6f89f378c5246b38f2
xq0Xxu8X3+8N33QNRHSKRA+KVg8FVkUF;.\eb3736f0e85a939a9e09092b3d9fc119616cea76
xq0Xxu8X3+8N33QNRHSKRA+KVg8FVkUF;.\ef5ed9ec17fcf1dd957b6886c7e8cbe2f686d303
xq0Xxu8X3+8N33QNRHSKRA+KVg8FVkUF;.\f87303c057fbca4bd2315798336bea26774858ff
a71ra/hr+Pji+OLiUeJRUYdRh4cshyws;.\79136bdc3c121e6b28045e4e6be2b6140f2262ea
BPUfxLU19QJqrr/o0BDIT73U7rz0dqdB;.\86d04e76947116a96d09ed2af959250f48f8bd56
a71ra+5r7u4Y7hgY7Rjt7QXtBQW8Bby8;.\cdbf4d2f16ae742cc9b8f25bd0c5490fb73e9144
uqYNuuUN3eUd3QcdfQezfWOz7WP97b/9;.\2b388c3f53f87d20af00099a8b2d903043fd7c8f
uqYNuuUN3eUd3QcdfQezfWOz7WP97b/9;.\4bef1507a5c2e751b0b7f96f8ccce688a709730f
uqYNuuUN3eUd3QcdfQezfWOz7WP97b/9;.\591800184d27139e34d8f8b3fe3537f74909cb6b
uqYNuuUN3eUd3QcdfQezfWOz7WP97b/9;.\61dd9d7899d0d6a73d397cf3b9d0af6f5c2fa68d
uqYNuuUN3eUd3QcdfQezfWOz7WP97b/9;.\6a40e26883cb6295df1a0ebdee0e317974613749
(...more samples with the same hash...)
uqYNuuUN3eUd3QcdfQezfWOz7WP97b/9;.\fce5cc2165843bf9f9379b8933c9b3d07c5687e6
uqYNuuUN3eUd3QcdfQezfWOz7WP97b/9;.\fe05130ef9c841ba6e6013dae5e639bca1f32003
uqYNuuUN3eUd3QcdfQezfWOz7WP97b/9;.\ff890f7475a4571d1cfc8f144b2c8141f3cc8559
26cX2/gX+fjx+Xjxmngzmugz+ejo+Zbo;.\f5970550268e6a8bf2eeb96ed4a48ccb319e7cde
saMTsfQT4vQi4gcifQezfWOz7WP97cH9;.\06ea2c25ac8b148efc447e86d7d09dc8960b0316
saMTsfQT4vQi4gcifQezfWOz7WP97cH9;.\09daa78a232de5db932ef8abe3c859eacc41f3ba
saMTsfQT4vQi4gcifQezfWOz7WP97cH9;.\0bfdc7242efeeb497b99b8e6dda1cd5fac0d1015
saMTsfQT4vQi4gcifQezfWOz7WP97cH9;.\118d7b731ca29d316c2a65c58b9617dfa242d9cf
saMTsfQT4vQi4gcifQezfWOz7WP97cH9;.\158822ec614d73b3027a9ef7590625f11f6873a5
saMTsfQT4vQi4gcifQezfWOz7WP97cH9;.\18a0300a0147764a516702a29841d63d43d8b5c4
saMTsfQT4vQi4gcifQezfWOz7WP97cH9;.\2875dc2f6b8ba232f2b86361f0b929ac3d670f35
5N2K5J6KK560K2y0amykasWka8U4a204;.\ec7f7a1bd9810007197df99dba763dd7ccd9b931
z5kEz/EE6vEK6sUKCMVTCOtTTOsXTCsX;.\50fa0d3f79fcfa81ef6e6b9755aa335603a09f18
vPu8vLa8trbVttXVdNV0dEl0SUmdSZ2d;.\9d9659de8bd199d24e4d18a63e12f05b7b9fd07e
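
Counting groups by eye in a listing this long is error-prone. The following is a minimal Python sketch for tallying them, assuming the output keeps exactly the `hash;path` layout shown above, one pair per line. The names `group_by_hash` and `sample_output` are made up for this illustration and are not part of DeepToad; the sample lines are copied from the listing above.

```python
from collections import defaultdict


def group_by_hash(lines):
    """Map each fuzzy hash to the list of file paths that produced it."""
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blank lines and anything that is not a "hash;path" pair.
        if ";" not in line:
            continue
        # Split on the first ';' only: the hash alphabet never contains it.
        fuzzy_hash, path = line.split(";", 1)
        groups[fuzzy_hash].append(path)
    return dict(groups)


# A few lines copied from the -b=64 run (raw string keeps the backslashes).
sample_output = r"""
yqUZyggZ3gju3r/uQL8uQJwuQ5w2Qws2;.\9effb1fcf09e77f3f9f2ed404e604d58d44fc37f
yqUZyggZ3gju3r/uQL8uQJwuQ5w2Qws2;.\db567d9f380b194a06afa40e6c26fa55859f5fa2
rmv8ri/8Fi/BFnjB93i392q32Gp02J90;.\8239d3db30f1527a01e1ddd3fc5b93c189fdb567
"""

groups = group_by_hash(sample_output.splitlines())

# Largest groups first, so the real clusters stand out from the singletons.
for fuzzy_hash, paths in sorted(groups.items(), key=lambda kv: -len(kv[1])):
    print(len(paths), fuzzy_hash)
```

In practice the input would be the saved output of a run rather than an inline string. Lines without a `;`, such as the "(...more samples with the same hash...)" markers, are simply ignored.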

This time the output is better, isn’t it? 😉 We can clearly see 5 different groups. I will change the block size again to something smaller, 32 instead of 64, to see what happens:

$ deeptoad.py -b=32 .
6Ijv6IPv8YO+8be+irdkirpksbrJsY3J;.\05ca2f4386d77c8f344ff24a0a9e1869f4dc3fe3
6Ijv6IPv8YO+8be+irdkirpksbrJsY3J;.\09d26510be759a54e9de9c011d171e2d30bdf61d
6Ijv6IPv8YO+8be+irdkirpksbrJsY3J;.\0bdde8fdb58848ee1e9bacd3d61bac1f670a1b1e
6Ijv6IPv8YO+8be+irdkirpksbrJsY3J;.\12875bebac82cef1392aec33902161340cba51a2
6Ijv6IPv8YO+8be+irdkirpksbrJsY3J;.\15f7e4b04fd6f3bce2bab4df4ed6f52ea06b74e7
(...more samples with the same hash...)
6Ijv6IPv8YO+8be+irdkirpksbrJsY3J;.\f8ed9cca28a9c566b2c98bec903d63ebadc88b35
6Ijv6IPv8YO+8be+irdkirpksbrJsY3J;.\fc90cf6a5c72dbd29843bc6a14f486192ac4ef1d
6Ijv6IPv8YO+8be+irdkirpksbrJsY3J;.\fd3acefb4eb5b8677f9e4481f09bd7b2ddb1fdef
6Ijv6IPv8YO+8be+irdkirpksbrJsY3J;.\ff7ebc93b56e74c17a2bfcc2d96a676ab124670a
swqzs4yzjIzejN7ejN6MjICMgIAogCgo;.\01e22ef6d30aabc76f87fd7c37aa4b2ccc85cfe6
swqzs4yzjIzejN7ejN6MjICMgIAogCgo;.\134d4325d2dbed016d898996e7359f0169df4a21
swqzs4yzjIzejN7ejN6MjICMgIAogCgo;.\2316da2ad647d61985026d4ac2a1c1fdf665fa8b
swqzs4yzjIzejN7ejN6MjICMgIAogCgo;.\50fa0d3f79fcfa81ef6e6b9755aa335603a09f18
swqzs4yzjIzejN7ejN6MjICMgIAogCgo;.\79136bdc3c121e6b28045e4e6be2b6140f2262ea
swqzs4yzjIzejN7ejN6MjICMgIAogCgo;.\8239d3db30f1527a01e1ddd3fc5b93c189fdb567
swqzs4yzjIzejN7ejN6MjICMgIAogCgo;.\8643f5678f44314a5f63b4ef571f8eaf1585faff
swqzs4yzjIzejN7ejN6MjICMgIAogCgo;.\a45b64d4c6ef074f25a772c841b2041fa118189c
swqzs4yzjIzejN7ejN6MjICMgIAogCgo;.\cdbf4d2f16ae742cc9b8f25bd0c5490fb73e9144
swqzs4yzjIzejN7ejN6MjICMgIAogCgo;.\e60bca8c871c04384cfc4dccce704afb7e40d703
swqzs4yzjIzejN7ejN6MjICMgIAogCgo;.\f5970550268e6a8bf2eeb96ed4a48ccb319e7cde
rU6trXGtcXFLcUtL8Uvx8cTxxMScxJyc;.\9d9659de8bd199d24e4d18a63e12f05b7b9fd07e
oOr5oOv53esW3fIWB/L5B8n5FsntFv7t;.\9406d9612a6405b95d6316ada56a39ea9e55e2a5
mlvqmrTqxLSExMGEBsFdBmldemlhehdh;.\17f5b212aa41ab7aea7f3d5dc9ba99f2b88bb069
mlvqmrTqxLSExMGEBsFdBmldemlhehdh;.\18b7c952396cbb7c467b32209f2dae8aed830a64
mlvqmrTqxLSExMGEBsFdBmldemlhehdh;.\1956954f28800edb72d3d05db908cc0a37d1c1a4
mlvqmrTqxLSExMGEBsFdBmldemlhehdh;.\2c64cf6430662e93acd85789f1d7e75e6de6c2e8
(...more samples with the same hash...)
mlvqmrTqxLSExMGEBsFdBmldemlhehdh;.\dffe57c2b63204b5c812f64fbfc77c6e267827f1
mlvqmrTqxLSExMGEBsFdBmldemlhehdh;.\e764df606d3af0d8ce4b741689ea7712d12d7f42
mlvqmrTqxLSExMGEBsFdBmldemlhehdh;.\ea44e955662633d1ac18c542c999e8619a120058
mlvqmrTqxLSExMGEBsFdBmldemlhehdh;.\f8b1aecede7003a54dcb8d34a7fa6bcdc3bd74a7
mlvqmrTqxLSExMGEBsFdBmldemlhehdh;.\fa436c794b6b167b3bc905ae418b44057b913feb
CgYKClEKUVGMUYyMgoyCgvWC9fXh9eHh;.\9effb1fcf09e77f3f9f2ed404e604d58d44fc37f
CgYKClEKUVGMUYyMgoyCgvWC9fXh9eHh;.\db567d9f380b194a06afa40e6c26fa55859f5fa2
CgYKClEKUVGMUYyMgoyCgvWC9fXh9eHh;.\e1b60cb7b05e93fadcd3c0e328150353cde8540a
CgYKClEKUVGMUYyMgoyCgvWC9fXh9eHh;.\ec7f7a1bd9810007197df99dba763dd7ccd9b931
swqzs4mziYmCiYKCgIKAgPGA8fG28ba2;.\002ae4bc6822fad96998cc5814d81d957bfa980c
swqzs4mziYmCiYKCgIKAgPGA8fG28ba2;.\06ea2c25ac8b148efc447e86d7d09dc8960b0316
swqzs4mziYmCiYKCgIKAgPGA8fG28ba2;.\09daa78a232de5db932ef8abe3c859eacc41f3ba
swqzs4mziYmCiYKCgIKAgPGA8fG28ba2;.\0bfdc7242efeeb497b99b8e6dda1cd5fac0d1015
(...more samples with the same hash...)
swqzs4mziYmCiYKCgIKAgPGA8fG28ba2;.\eb3736f0e85a939a9e09092b3d9fc119616cea76
swqzs4mziYmCiYKCgIKAgPGA8fG28ba2;.\ef5ed9ec17fcf1dd957b6886c7e8cbe2f686d303
swqzs4mziYmCiYKCgIKAgPGA8fG28ba2;.\f87303c057fbca4bd2315798336bea26774858ff
swqzs4mziYmCiYKCgIKAgPGA8fG28ba2;.\fce5cc2165843bf9f9379b8933c9b3d07c5687e6
swqzs4mziYmCiYKCgIKAgPGA8fG28ba2;.\fe05130ef9c841ba6e6013dae5e639bca1f32003
swqzs4mziYmCiYKCgIKAgPGA8fG28ba2;.\ff890f7475a4571d1cfc8f144b2c8141f3cc8559
lPQDlOwD6ewj6fwj+vz1+tv1FNsBFAkB;.\52ee636ee7038affdefadd84f23ebee45411852d
Tha1Tim1zCl/zJ9/2p/p2tPp4dPR4WPR;.\86d04e76947116a96d09ed2af959250f48f8bd56
XkX+XvL+7PI+7P0+Bv3qBtXqL9X7L6/7;.\3be38b2a2d39d7a21c4e388c48238543152bc4e8
n+YDn9wDHdwCHfgC9/jQ9wzQ9AwB9BIB;.\90e4bbc93e7a576b975ec034c4abfd884d9a33ad

This time the output is even better. There are 4 or 5 groups, and 2 of them seem to be pretty close: the hashes swqzs4yzjIzejN7ejN6MjICMgIAogCgo and swqzs4mziYmCiYKCgIKAgPGA8fG28ba2. Both generated hashes start with the same string (swqzs4), so it seems that the files in both groups start with the same content. However, by default DeepToad shows only the hash that creates the lowest number of sets, so we don’t know whether the files from the 2 groups start or end with the same string. To show all the generated signatures (the signature, the reverse signature and the simple signature), use the argument “-p” (to print all the hashes) and redirect the output to some file, as in the following example:

$ deeptoad.py -b=32 -p . > files.csv

Now we have a CSV-formatted file with all the hashes. Open it with some sort of “advanced analysis tool” like OpenOffice Calc, StarCalc, Gnumeric or Microsoft Excel and sort the columns as in the following picture:

As we can see, there are 3 similar-looking groups, and the matching signature (the “Signature” field) shows that the files start with similar content, so we may consider all the files whose hash starts with “swqzs4” to be one group. I reduced the number of different elements to be analyzed from 145 to 5 groups and 6 completely different (unique) malware samples. Now it’s time to see what tricks they are using and what their purpose is 😉 But this will be for another post…
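
The spreadsheet step of merging groups whose hashes share a leading string can also be approximated in a few lines of Python. This is only a rough sketch of that manual check, not how DeepToad compares signatures internally: the function name `group_by_prefix` and the prefix length of 6 (taken from the “swqzs4” example) are assumptions made for illustration. The hashes are copied from the -b=32 run.

```python
import os
from collections import defaultdict


def group_by_prefix(hashes, prefix_len=6):
    """Bucket fuzzy hashes by their first prefix_len characters."""
    buckets = defaultdict(list)
    for fuzzy_hash in hashes:
        buckets[fuzzy_hash[:prefix_len]].append(fuzzy_hash)
    return dict(buckets)


# Distinct hashes taken from the -b=32 output.
hashes = [
    "swqzs4yzjIzejN7ejN6MjICMgIAogCgo",
    "swqzs4mziYmCiYKCgIKAgPGA8fG28ba2",
    "mlvqmrTqxLSExMGEBsFdBmldemlhehdh",
    "CgYKClEKUVGMUYyMgoyCgvWC9fXh9eHh",
]

buckets = group_by_prefix(hashes)

for prefix, members in buckets.items():
    # os.path.commonprefix compares character by character, so it works on
    # arbitrary strings and reports how far the hashes in a bucket agree.
    shared = os.path.commonprefix(members)
    print(f"{prefix}: {len(members)} hash(es), shared prefix {shared!r}")
```

A fixed prefix length is a blunt criterion: too short merges unrelated groups, too long splits related ones, so the buckets are candidates for a closer look rather than a final answer.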

## 2 thoughts on “A typical work day with DeepToad”

1. talfiq

Hi bro.

I am curious about the different block sizes that you use. What if you use -b=1 and get a lot of (or fewer) groups? Which one is better: more groups or fewer groups?

Do fewer groups mean similarity down to the “atomic level”? 😐