A typical work day with DeepToad

Sometimes I receive so many malware samples that it is impossible (or at least inhuman) to analyze them all by hand, so I need to automate the typical (boring) tasks: clustering the samples into smaller sets and running an initial (and superficial) analysis of each one. For the first task I created DeepToad, a tool that clusters any kind of file using fuzzy hashing techniques.

Clusterization of malware samples

The very first step is to run DeepToad on 145 PDF malware samples and see what groups it finds:

$ deeptoad.py .
sQOxsT6xPj7LPsvLgcuBgayBrKyGrIaG;.\c0d1dde49be3a07c4ef4acb79da7050afa6df5b8
Sb+//PzY2BgYCAj4+AgIXl6UlCoq7+/V;.\63a18865ae6b8851ed9e18f12333308f93e156eb
fITKfLLKSrJHSv5Haf5FaXFFbnFAbmZA;.\a30b967a495074e71c711a8cac93b836053e46c1
PzY/P7Q/tLSrtKur1qvW1l3WXV06XTo6;.\f5970550268e6a8bf2eeb96ed4a48ccb319e7cde
iZDqiRzqSBwWSNUWg9WIg0mIJElTJElT;.\61dd9d7899d0d6a73d397cf3b9d0af6f5c2fa68d
LkREk5NMTKGhcHDz829vJyfe3qOjuLho;.\b5c5bd76bbb56c43ef67c3acb9d62908057c5fc6
WzMQW1oQ91oK9+8KDe/fDW/fc2/ScxzS;.\cdbf4d2f16ae742cc9b8f25bd0c5490fb73e9144
sqysenq3t5mZUVFVVYuLZGT39xwcMDDN;.\1956954f28800edb72d3d05db908cc0a37d1c1a4
pmWmpmemZ2fRZ9HRgNGAgJuAm5ucm5yc;.\d5c3757ea828bed5ad4a184f7654140ae45e1f3f
xFTExIHEgYE7gTs7fTt9ff19/f2A/YCA;.\8239d3db30f1527a01e1ddd3fc5b93c189fdb567
iVGJiUWJRUXzRfPzefN5eY95j4+gj6Cg;.\2875dc2f6b8ba232f2b86361f0b929ac3d670f35
iVGJiUWJRUXzRfPzefN5eY95j4+gj6Cg;.\6e309298423e3e4d04e9432900768e9d9493e972
XpuPC8IfyYpfb+Y8+y88GkAcw4pVelnN;.\a45b64d4c6ef074f25a772c841b2041fa118189c
QyIiYmLW1oCAm5ucnGtrGhoaGtXVTU3Z;.\17f5b212aa41ab7aea7f3d5dc9ba99f2b88bb069
7NraXV0dHfv7FxcLC62tFxfR0erqysrJ;.\dffe57c2b63204b5c812f64fbfc77c6e267827f1
dhQBdmEBR2GQR8SQ+MRs+G1sPW08PS48;.\134d4325d2dbed016d898996e7359f0169df4a21
(…a lot more different hashes…)
VIODRUVFRaqqXFydnfX1lZUPD6WlW1sN;.\9e8e153d80248bd88a178d831210ceec963a3d1d
WqtaWlNaU1PFU8XFl8WXl56Xnp6DnoOD;.\18a0300a0147764a516702a29841d63d43d8b5c4
WqtaWlNaU1PFU8XFl8WXl56Xnp6DnoOD;.\2b388c3f53f87d20af00099a8b2d903043fd7c8f
WqtaWlNaU1PFU8XFl8WXl56Xnp6DnoOD;.\9ebdbce3ecf04b477aa322c12c4370d79807879f
ypa7ymG78WHz8erzBOpfBOZf/uZL/qZL;.\06ea2c25ac8b148efc447e86d7d09dc8960b0316
svDwb2/k5Pf3aGjb22JiSEjDw1RUpqal;.\9effb1fcf09e77f3f9f2ed404e604d58d44fc37f
svDwb2/k5Pf3aGjb22JiSEjDw1RUpqal;.\db567d9f380b194a06afa40e6c26fa55859f5fa2
QRIMQXsMoHuKoPKKivJWih9WSh+JSn2J;.\b6ca92fa83b9f938f6c766c672faa93c4ff6ed64
qVgHqXoHmXqTmfmTdvlndgxnQAyoQHOo;.\002ae4bc6822fad96998cc5814d81d957bfa980c
qVgHqXoHmXqTmfmTdvlndgxnQAyoQHOo;.\8a0414600a0ac1665611fba114dd2878a5e003f1
AlACAu8C7++v76+vmK+YmKqYqqpnqmdn;.\9b9be301a440f9c4b2bc5b88475859e4907ba74a
pS3MpZDMnJCKnOqKuOo4uAU4WAWdWHGd;.\8c388936f594c469003a1585ac8b7d0b10d92c6b
e8V7e1Z7VlamVqamBaYFBWMFY2M2YzY2;.\79136bdc3c121e6b28045e4e6be2b6140f2262ea
Tgd3ToR3NISENEmE9km89qu886tU87dU;.\f87303c057fbca4bd2315798336bea26774858ff
mHSYmNCY0NDS0NLSiNKIiJ6Inp6Nno2N;.\4bef1507a5c2e751b0b7f96f8ccce688a709730f
noZonnBosXCCsdmCdtmrdnirUnjWUjrW;.\ff890f7475a4571d1cfc8f144b2c8141f3cc8559
0JGRMTGoqElJZ2c1NXp6paXBwb293Nzo;.\9406d9612a6405b95d6316ada56a39ea9e55e2a5

Uhm… It doesn’t seem to be working well. I can see some groups (WqtaWlNaU1PFU8XFl8WXl56Xnp6DnoOD and qVgHqXoHmXqTmfmTdvlndgxnQAyoQHOo, for example), but that doesn’t help much: there are so many different fuzzy hashes that nobody can tell how close they are to each other. The poor output is caused by DeepToad’s default block size (512 bytes). The files are very small, so that block size is too large to cluster them. The next step is to change the block size (sorry for the long output, scroll down…):

$ deeptoad.py -b=64 .
ddh2de52bO7JbMfJ9cfL9VzLm1z7m137;.\05ca2f4386d77c8f344ff24a0a9e1869f4dc3fe3
ddh2de52bO7JbMfJ9cfL9VzLm1z7m137;.\09d26510be759a54e9de9c011d171e2d30bdf61d
ddh2de52bO7JbMfJ9cfL9VzLm1z7m137;.\0bdde8fdb58848ee1e9bacd3d61bac1f670a1b1e
ddh2de52bO7JbMfJ9cfL9VzLm1z7m137;.\12875bebac82cef1392aec33902161340cba51a2
ddh2de52bO7JbMfJ9cfL9VzLm1z7m137;.\15f7e4b04fd6f3bce2bab4df4ed6f52ea06b74e7
ddh2de52bO7JbMfJ9cfL9VzLm1z7m137;.\1a269b62104ad68238e5bb412bf7c22c4d5d757b
(...more samples with the same hash...)
ddh2de52bO7JbMfJ9cfL9VzLm1z7m137;.\1d0033c9fa4181dd839b8a30e98380487fadce37
ddh2de52bO7JbMfJ9cfL9VzLm1z7m137;.\c9b7024aba6fcae432d177e604dddf95444a5733
ddh2de52bO7JbMfJ9cfL9VzLm1z7m137;.\cb59987e37857e5d3e2e87f5803a8679c39691ee
ddh2de52bO7JbMfJ9cfL9VzLm1z7m137;.\d33b12256cc68971f9355b8ed2dbf5ba6650c733
amuzat2zmd1OmThO5Tgm5TsmSjvUSi/U;.\134d4325d2dbed016d898996e7359f0169df4a21
reRTrX1TlX1llVdlXVfWXR/WWx+EW8iE;.\3be38b2a2d39d7a21c4e388c48238543152bc4e8
a71ra1trW1vTW9PTztPOzgHOAQF2AXZ2;.\e60bca8c871c04384cfc4dccce704afb7e40d703
yZoJyQEJ3wHs373sQL0uQKMulqNHlndH;.\9406d9612a6405b95d6316ada56a39ea9e55e2a5
yZsxyfgx7fji7dfi9tfk9nzk9nyM9umM;.\01e22ef6d30aabc76f87fd7c37aa4b2ccc85cfe6
hbSFhXmFeXlGeUZGY0ZjY+Nj4+N443h4;.\17f5b212aa41ab7aea7f3d5dc9ba99f2b88bb069
hbSFhXmFeXlGeUZGY0ZjY+Nj4+N443h4;.\18b7c952396cbb7c467b32209f2dae8aed830a64
hbSFhXmFeXlGeUZGY0ZjY+Nj4+N443h4;.\1956954f28800edb72d3d05db908cc0a37d1c1a4
(...more samples with the same hash...)
hbSFhXmFeXlGeUZGY0ZjY+Nj4+N443h4;.\e764df606d3af0d8ce4b741689ea7712d12d7f42
hbSFhXmFeXlGeUZGY0ZjY+Nj4+N443h4;.\ea44e955662633d1ac18c542c999e8619a120058
hbSFhXmFeXlGeUZGY0ZjY+Nj4+N443h4;.\f8b1aecede7003a54dcb8d34a7fa6bcdc3bd74a7
hbSFhXmFeXlGeUZGY0ZjY+Nj4+N443h4;.\fa436c794b6b167b3bc905ae418b44057b913feb
a71rayNrIyO+I76+rb6trTatNjahNqGh;.\a45b64d4c6ef074f25a772c841b2041fa118189c
J+k2Jxs23Ruk3e6kDO5ZDC5Zzy6oz3Co;.\90e4bbc93e7a576b975ec034c4abfd884d9a33ad
a71rawFrAQGUAZSUEJQQEP0Q/f0T/RMT;.\8643f5678f44314a5f63b4ef571f8eaf1585faff
yqUZyggZ3gju3r/uQL8uQJwuQ5w2Qws2;.\9effb1fcf09e77f3f9f2ed404e604d58d44fc37f
yqUZyggZ3gju3r/uQL8uQJwuQ5w2Qws2;.\db567d9f380b194a06afa40e6c26fa55859f5fa2
rmv8ri/8Fi/BFnjB93i392q32Gp02J90;.\8239d3db30f1527a01e1ddd3fc5b93c189fdb567
xq0Xxu8X3+8N33QNRHSKRA+KVg8FVkUF;.\97a47252c2deff9062c421f01399d904b6be9d25
xq0Xxu8X3+8N33QNRHSKRA+KVg8FVkUF;.\b6ca92fa83b9f938f6c766c672faa93c4ff6ed64
xq0Xxu8X3+8N33QNRHSKRA+KVg8FVkUF;.\be334d38fef5221c4047ec6f89f378c5246b38f2
xq0Xxu8X3+8N33QNRHSKRA+KVg8FVkUF;.\eb3736f0e85a939a9e09092b3d9fc119616cea76
xq0Xxu8X3+8N33QNRHSKRA+KVg8FVkUF;.\ef5ed9ec17fcf1dd957b6886c7e8cbe2f686d303
xq0Xxu8X3+8N33QNRHSKRA+KVg8FVkUF;.\f87303c057fbca4bd2315798336bea26774858ff
a71ra/hr+Pji+OLiUeJRUYdRh4cshyws;.\79136bdc3c121e6b28045e4e6be2b6140f2262ea
BPUfxLU19QJqrr/o0BDIT73U7rz0dqdB;.\86d04e76947116a96d09ed2af959250f48f8bd56
a71ra+5r7u4Y7hgY7Rjt7QXtBQW8Bby8;.\cdbf4d2f16ae742cc9b8f25bd0c5490fb73e9144
uqYNuuUN3eUd3QcdfQezfWOz7WP97b/9;.\2b388c3f53f87d20af00099a8b2d903043fd7c8f
uqYNuuUN3eUd3QcdfQezfWOz7WP97b/9;.\31b6a93c36064fe3124a0ad4c28491b3b6ca0398
uqYNuuUN3eUd3QcdfQezfWOz7WP97b/9;.\4bef1507a5c2e751b0b7f96f8ccce688a709730f
uqYNuuUN3eUd3QcdfQezfWOz7WP97b/9;.\591800184d27139e34d8f8b3fe3537f74909cb6b
uqYNuuUN3eUd3QcdfQezfWOz7WP97b/9;.\61dd9d7899d0d6a73d397cf3b9d0af6f5c2fa68d
uqYNuuUN3eUd3QcdfQezfWOz7WP97b/9;.\6a40e26883cb6295df1a0ebdee0e317974613749
(...more samples with the same hash...)
uqYNuuUN3eUd3QcdfQezfWOz7WP97b/9;.\fce5cc2165843bf9f9379b8933c9b3d07c5687e6
uqYNuuUN3eUd3QcdfQezfWOz7WP97b/9;.\fe05130ef9c841ba6e6013dae5e639bca1f32003
uqYNuuUN3eUd3QcdfQezfWOz7WP97b/9;.\ff890f7475a4571d1cfc8f144b2c8141f3cc8559
26cX2/gX+fjx+Xjxmngzmugz+ejo+Zbo;.\2316da2ad647d61985026d4ac2a1c1fdf665fa8b
26cX2/gX+fjx+Xjxmngzmugz+ejo+Zbo;.\f5970550268e6a8bf2eeb96ed4a48ccb319e7cde
saMTsfQT4vQi4gcifQezfWOz7WP97cH9;.\002ae4bc6822fad96998cc5814d81d957bfa980c
saMTsfQT4vQi4gcifQezfWOz7WP97cH9;.\06ea2c25ac8b148efc447e86d7d09dc8960b0316
saMTsfQT4vQi4gcifQezfWOz7WP97cH9;.\09daa78a232de5db932ef8abe3c859eacc41f3ba
saMTsfQT4vQi4gcifQezfWOz7WP97cH9;.\0bfdc7242efeeb497b99b8e6dda1cd5fac0d1015
saMTsfQT4vQi4gcifQezfWOz7WP97cH9;.\118d7b731ca29d316c2a65c58b9617dfa242d9cf
saMTsfQT4vQi4gcifQezfWOz7WP97cH9;.\158822ec614d73b3027a9ef7590625f11f6873a5
saMTsfQT4vQi4gcifQezfWOz7WP97cH9;.\18a0300a0147764a516702a29841d63d43d8b5c4
saMTsfQT4vQi4gcifQezfWOz7WP97cH9;.\2875dc2f6b8ba232f2b86361f0b929ac3d670f35
saMTsfQT4vQi4gcifQezfWOz7WP97cH9;.\d5c3757ea828bed5ad4a184f7654140ae45e1f3f
5N2K5J6KK560K2y0amykasWka8U4a204;.\e1b60cb7b05e93fadcd3c0e328150353cde8540a
5N2K5J6KK560K2y0amykasWka8U4a204;.\ec7f7a1bd9810007197df99dba763dd7ccd9b931
xuGCxhWCUxU5UyA5IyB5I6J5O6IJO7YJ;.\52ee636ee7038affdefadd84f23ebee45411852d
z5kEz/EE6vEK6sUKCMVTCOtTTOsXTCsX;.\50fa0d3f79fcfa81ef6e6b9755aa335603a09f18
vPu8vLa8trbVttXVdNV0dEl0SUmdSZ2d;.\9d9659de8bd199d24e4d18a63e12f05b7b9fd07e
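
Counting groups by eye in a listing this long gets tedious. The following is a minimal sketch, not part of DeepToad, that groups records by hash and reports the size of each cluster. It assumes only the `hash;path` line format visible in the listings above; the function names `cluster_by_hash` and `summarize` are hypothetical, and the sample records are copied from the end of the 64-byte run.

```python
from collections import defaultdict

# A few records copied from the listing above (format: hash;path).
SAMPLE = r"""
5N2K5J6KK560K2y0amykasWka8U4a204;.\e1b60cb7b05e93fadcd3c0e328150353cde8540a
5N2K5J6KK560K2y0amykasWka8U4a204;.\ec7f7a1bd9810007197df99dba763dd7ccd9b931
xuGCxhWCUxU5UyA5IyB5I6J5O6IJO7YJ;.\52ee636ee7038affdefadd84f23ebee45411852d
"""


def cluster_by_hash(lines):
    """Group file paths by the fuzzy hash printed before the ';' separator."""
    clusters = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blank lines and anything that is not a "hash;path" record,
        # such as the "(...more samples...)" placeholders.
        if ";" not in line:
            continue
        fuzzy_hash, path = line.split(";", 1)
        clusters[fuzzy_hash].append(path)
    return dict(clusters)


def summarize(clusters):
    """Return (hash, cluster size) pairs, largest cluster first."""
    sizes = [(fuzzy_hash, len(paths)) for fuzzy_hash, paths in clusters.items()]
    return sorted(sizes, key=lambda item: item[1], reverse=True)


for fuzzy_hash, size in summarize(cluster_by_hash(SAMPLE.splitlines())):
    label = "group" if size > 1 else "unique"
    print(f"{size:4d}  {label:6s}  {fuzzy_hash}")
```

To use it on a real run, save DeepToad's output to a file and pass the file's lines to `cluster_by_hash` instead of `SAMPLE`.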

With a 64-byte block size the output is better, isn’t it? 😉 We can clearly see 5 different groups. I will change the block size again to something smaller, 32 instead of 64, to see what happens:

$ deeptoad.py -b=32 .
6Ijv6IPv8YO+8be+irdkirpksbrJsY3J;.\05ca2f4386d77c8f344ff24a0a9e1869f4dc3fe3
6Ijv6IPv8YO+8be+irdkirpksbrJsY3J;.\09d26510be759a54e9de9c011d171e2d30bdf61d
6Ijv6IPv8YO+8be+irdkirpksbrJsY3J;.\0bdde8fdb58848ee1e9bacd3d61bac1f670a1b1e
6Ijv6IPv8YO+8be+irdkirpksbrJsY3J;.\12875bebac82cef1392aec33902161340cba51a2
6Ijv6IPv8YO+8be+irdkirpksbrJsY3J;.\15f7e4b04fd6f3bce2bab4df4ed6f52ea06b74e7
(...more samples with the same hash...)
6Ijv6IPv8YO+8be+irdkirpksbrJsY3J;.\f8ed9cca28a9c566b2c98bec903d63ebadc88b35
6Ijv6IPv8YO+8be+irdkirpksbrJsY3J;.\fc90cf6a5c72dbd29843bc6a14f486192ac4ef1d
6Ijv6IPv8YO+8be+irdkirpksbrJsY3J;.\fd3acefb4eb5b8677f9e4481f09bd7b2ddb1fdef
6Ijv6IPv8YO+8be+irdkirpksbrJsY3J;.\ff7ebc93b56e74c17a2bfcc2d96a676ab124670a
swqzs4yzjIzejN7ejN6MjICMgIAogCgo;.\01e22ef6d30aabc76f87fd7c37aa4b2ccc85cfe6
swqzs4yzjIzejN7ejN6MjICMgIAogCgo;.\134d4325d2dbed016d898996e7359f0169df4a21
swqzs4yzjIzejN7ejN6MjICMgIAogCgo;.\2316da2ad647d61985026d4ac2a1c1fdf665fa8b
swqzs4yzjIzejN7ejN6MjICMgIAogCgo;.\50fa0d3f79fcfa81ef6e6b9755aa335603a09f18
swqzs4yzjIzejN7ejN6MjICMgIAogCgo;.\79136bdc3c121e6b28045e4e6be2b6140f2262ea
swqzs4yzjIzejN7ejN6MjICMgIAogCgo;.\8239d3db30f1527a01e1ddd3fc5b93c189fdb567
swqzs4yzjIzejN7ejN6MjICMgIAogCgo;.\8643f5678f44314a5f63b4ef571f8eaf1585faff
swqzs4yzjIzejN7ejN6MjICMgIAogCgo;.\a45b64d4c6ef074f25a772c841b2041fa118189c
swqzs4yzjIzejN7ejN6MjICMgIAogCgo;.\cdbf4d2f16ae742cc9b8f25bd0c5490fb73e9144
swqzs4yzjIzejN7ejN6MjICMgIAogCgo;.\e60bca8c871c04384cfc4dccce704afb7e40d703
swqzs4yzjIzejN7ejN6MjICMgIAogCgo;.\f5970550268e6a8bf2eeb96ed4a48ccb319e7cde
rU6trXGtcXFLcUtL8Uvx8cTxxMScxJyc;.\9d9659de8bd199d24e4d18a63e12f05b7b9fd07e
oOr5oOv53esW3fIWB/L5B8n5FsntFv7t;.\9406d9612a6405b95d6316ada56a39ea9e55e2a5
mlvqmrTqxLSExMGEBsFdBmldemlhehdh;.\17f5b212aa41ab7aea7f3d5dc9ba99f2b88bb069
mlvqmrTqxLSExMGEBsFdBmldemlhehdh;.\18b7c952396cbb7c467b32209f2dae8aed830a64
mlvqmrTqxLSExMGEBsFdBmldemlhehdh;.\1956954f28800edb72d3d05db908cc0a37d1c1a4
mlvqmrTqxLSExMGEBsFdBmldemlhehdh;.\2c64cf6430662e93acd85789f1d7e75e6de6c2e8
(...more samples with the same hash...)
mlvqmrTqxLSExMGEBsFdBmldemlhehdh;.\dffe57c2b63204b5c812f64fbfc77c6e267827f1
mlvqmrTqxLSExMGEBsFdBmldemlhehdh;.\e764df606d3af0d8ce4b741689ea7712d12d7f42
mlvqmrTqxLSExMGEBsFdBmldemlhehdh;.\ea44e955662633d1ac18c542c999e8619a120058
mlvqmrTqxLSExMGEBsFdBmldemlhehdh;.\f8b1aecede7003a54dcb8d34a7fa6bcdc3bd74a7
mlvqmrTqxLSExMGEBsFdBmldemlhehdh;.\fa436c794b6b167b3bc905ae418b44057b913feb
CgYKClEKUVGMUYyMgoyCgvWC9fXh9eHh;.\9effb1fcf09e77f3f9f2ed404e604d58d44fc37f
CgYKClEKUVGMUYyMgoyCgvWC9fXh9eHh;.\db567d9f380b194a06afa40e6c26fa55859f5fa2
CgYKClEKUVGMUYyMgoyCgvWC9fXh9eHh;.\e1b60cb7b05e93fadcd3c0e328150353cde8540a
CgYKClEKUVGMUYyMgoyCgvWC9fXh9eHh;.\ec7f7a1bd9810007197df99dba763dd7ccd9b931
swqzs4mziYmCiYKCgIKAgPGA8fG28ba2;.\002ae4bc6822fad96998cc5814d81d957bfa980c
swqzs4mziYmCiYKCgIKAgPGA8fG28ba2;.\06ea2c25ac8b148efc447e86d7d09dc8960b0316
swqzs4mziYmCiYKCgIKAgPGA8fG28ba2;.\09daa78a232de5db932ef8abe3c859eacc41f3ba
swqzs4mziYmCiYKCgIKAgPGA8fG28ba2;.\0bfdc7242efeeb497b99b8e6dda1cd5fac0d1015
(...more samples with the same hash...)
swqzs4mziYmCiYKCgIKAgPGA8fG28ba2;.\eb3736f0e85a939a9e09092b3d9fc119616cea76
swqzs4mziYmCiYKCgIKAgPGA8fG28ba2;.\ef5ed9ec17fcf1dd957b6886c7e8cbe2f686d303
swqzs4mziYmCiYKCgIKAgPGA8fG28ba2;.\f87303c057fbca4bd2315798336bea26774858ff
swqzs4mziYmCiYKCgIKAgPGA8fG28ba2;.\fce5cc2165843bf9f9379b8933c9b3d07c5687e6
swqzs4mziYmCiYKCgIKAgPGA8fG28ba2;.\fe05130ef9c841ba6e6013dae5e639bca1f32003
swqzs4mziYmCiYKCgIKAgPGA8fG28ba2;.\ff890f7475a4571d1cfc8f144b2c8141f3cc8559
lPQDlOwD6ewj6fwj+vz1+tv1FNsBFAkB;.\52ee636ee7038affdefadd84f23ebee45411852d
Tha1Tim1zCl/zJ9/2p/p2tPp4dPR4WPR;.\86d04e76947116a96d09ed2af959250f48f8bd56
XkX+XvL+7PI+7P0+Bv3qBtXqL9X7L6/7;.\3be38b2a2d39d7a21c4e388c48238543152bc4e8
n+YDn9wDHdwCHfgC9/jQ9wzQ9AwB9BIB;.\90e4bbc93e7a576b975ec034c4abfd884d9a33ad
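
Trying block sizes one at a time, as in the three runs above, could also be scripted. The sketch below is an assumption-laden helper, not a DeepToad feature: it assumes `deeptoad.py` is executable and on the PATH, uses only the `-b=N` form shown in this post, and relies on the `hash;path` output format. The names `run_deeptoad`, `count_hashes` and `sweep` are hypothetical.

```python
import subprocess


def run_deeptoad(block_size, directory="."):
    """Run deeptoad.py with the -b=N form used in this post; return its stdout.

    Assumption: deeptoad.py can be invoked directly from the current shell.
    """
    result = subprocess.run(
        ["deeptoad.py", f"-b={block_size}", directory],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def count_hashes(output):
    """Return (distinct hashes, total files) for 'hash;path' output."""
    hashes = [line.split(";", 1)[0] for line in output.splitlines() if ";" in line]
    return len(set(hashes)), len(hashes)


def sweep(block_sizes, directory=".", runner=run_deeptoad):
    """Map each block size to (distinct hashes, total files).

    The runner is injectable so the counting logic can be exercised
    without DeepToad installed.
    """
    return {size: count_hashes(runner(size, directory)) for size in block_sizes}


# Example call (requires DeepToad):
#   for size, (distinct, total) in sweep([512, 64, 32]).items():
#       print(f"-b={size}: {distinct} distinct hashes for {total} files")
```

A lower count of distinct hashes is not automatically better: a block size that is too small may lump unrelated files together, so the resulting groups still need a manual look.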

With a 32-byte block size the output is even better. There are 4 or 5 groups, and 2 of them seem to be pretty close: the hashes swqzs4yzjIzejN7ejN6MjICMgIAogCgo and swqzs4mziYmCiYKCgIKAgPGA8fG28ba2. Both hashes start with the same string (swqzs4), so it seems that the files in both groups start with the same content. However, by default DeepToad shows only the hash that produces the lowest number of sets, so we don’t know whether the files in the 2 groups start or end with the same content. To show all the generated signatures (the signature, the reverse signature and the simple signature), use the “-p” argument (print all the hashes) and redirect the output to a file, as in the following example:

$ deeptoad.py -b=32 -p . > files.csv

Now we have a CSV-formatted file with all the hashes. Open it with some sort of “advanced analysis tool” like OpenOffice Calc, Star Calc, Gnumeric or Microsoft Excel and sort the columns like in the following picture:

As we can see, there are 3 similar-looking groups, and the matching signature (the “Signature” field) shows that the files start with similar content, so we may consider all the files whose hash starts with “swqzs4” a single group. I reduced the number of elements to analyze from 145 samples to 5 groups and 6 completely different (unique) malware samples. Now it’s time to see what tricks they use and what their purpose is 😉 But that will be for another post…
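
The prefix comparison done by eye above can also be sketched in a few lines. This is only an illustration: the function name `merge_by_prefix` and the 6-character default are hypothetical choices based on the “swqzs4” prefix seen in this post, not DeepToad's own comparison logic. The three hashes are copied from the 32-byte run.

```python
import os
from collections import defaultdict

# Three hashes copied from the -b=32 listing.
HASH_A = "swqzs4yzjIzejN7ejN6MjICMgIAogCgo"
HASH_B = "swqzs4mziYmCiYKCgIKAgPGA8fG28ba2"
HASH_C = "CgYKClEKUVGMUYyMgoyCgvWC9fXh9eHh"


def merge_by_prefix(hashes, prefix_len=6):
    """Group hashes that share their first prefix_len characters."""
    groups = defaultdict(list)
    for fuzzy_hash in hashes:
        groups[fuzzy_hash[:prefix_len]].append(fuzzy_hash)
    return dict(groups)


# os.path.commonprefix compares strings character by character,
# so it works on any strings, not only on file paths.
shared = os.path.commonprefix([HASH_A, HASH_B])
print(f"shared prefix: {shared!r} ({len(shared)} characters)")

for prefix, members in merge_by_prefix([HASH_A, HASH_B, HASH_C]).items():
    print(prefix, len(members))
```

A shared prefix is only a hint that the files begin alike, which is why the post checks the full set of signatures before treating the two groups as one.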

2 thoughts on “A typical work day with DeepToad”

  1. talfiq

    Hi bro.

    I am curious about the differences of bit that you use. What if you use -b=1 and get a lot(or less) groups. Which one is better. More groups or less groups?

    Less groups means down to “atomic level” similarity? 😐
