<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Unintended Results &#187; pyew</title>
	<atom:link href="http://joxeankoret.com/blog/tag/pyew/feed/" rel="self" type="application/rss+xml" />
	<link>http://joxeankoret.com/blog</link>
	<description>Or maybe not</description>
	<lastBuildDate>Fri, 14 May 2010 23:41:09 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Analyzing PDF exploits with Pyew</title>
		<link>http://joxeankoret.com/blog/2010/02/21/analyzing-pdf-exploits-with-pyew/</link>
		<comments>http://joxeankoret.com/blog/2010/02/21/analyzing-pdf-exploits-with-pyew/#comments</comments>
		<pubDate>Sun, 21 Feb 2010 14:46:23 +0000</pubDate>
		<dc:creator>joxean</dc:creator>
				<category><![CDATA[Malware]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[pyew]]></category>
		<category><![CDATA[obfuscated]]></category>
		<category><![CDATA[pdf]]></category>

		<guid isPermaLink="false">http://joxeankoret.com/blog/?p=95</guid>
		<description><![CDATA[Something I really hate to do when analyzing PDF malware exploits is to manually extract the streams and manually decode them to see the, typically, hidden JavaScript code, so I decided to extend the PDF plugin for Pyew to automatically see them. Now, with the new version of the plugin (download it from the Mercurial [...]]]></description>
			<content:encoded><![CDATA[<p>Something I really hate to do when analyzing PDF malware  exploits is to manually extract the streams and manually decode them to see the, typically, hidden JavaScript code, so I decided to extend the PDF plugin for <a title="Pyew" href="http://code.google.com/p/pyew" target="_blank">Pyew</a> to automatically see them. Now, with the new version of the plugin (download it from the <a href="http://code.google.com/p/pyew/source/checkout" target="_blank">Mercurial repository</a>) we can see what filters are used in the exploit and, the most important thing, we can see the decoded streams, independently of how many filters are being used.<br />
<span id="more-95"></span><br />
<strong>Example</strong></p>
<p>For example, I will take one obfuscated PDF exploit (SHA256 6a8204ee7b703f96f811f32f903ac9df4045b05910d633fc34fed89e2e0a7576). I will open it in Pyew to see what is inside so, simply, run the command &#8220;pyew pdf.file&#8221;:</p>
<blockquote><p>$ pyew sample.pdf<br />
PDF File</p>
<p>PDFiD 0.0.9_PL 6a8204ee7b703f96f811f32f903ac9df4045b05910d633fc34fed89e2e0a7576<br />
PDF Header: %PDF-1.1<br />
obj                    4<br />
endobj                 4<br />
stream                 1<br />
endstream              1<br />
xref                   1<br />
trailer                1<br />
startxref              1<br />
/Page                  1<br />
/Encrypt               0<br />
/ObjStm                0<br />
/JS                    1<br />
/JavaScript            1<br />
/AA                    0<br />
/OpenAction            1<br />
/AcroForm              0<br />
/JBIG2Decode           0<br />
/RichMedia             0<br />
/Colors &gt; 2^24         0<br />
%%EOF                  1<br />
After last %%EOF       0<br />
Total entropy:           4.293999 (      5547 bytes)<br />
Entropy inside streams:  3.669587 (      4773 bytes)<br />
Entropy outside streams: 5.132696 (       774 bytes)</p>
<p>(&#8230;)</p>
<p>[0x00000000]&gt; p<br />
%PDF-1.1<br />
%&amp;#1074;&amp;#1075;&amp;#1055;&amp;#1059;<br />
1 0 obj<br />
&lt;&lt;<br />
/Type /Catalog<br />
/OpenAction &lt;&lt;<br />
/JS 4 0 R<br />
/S /JavaScript<br />
&gt;&gt;<br />
/Pages 2 0 R<br />
&gt;&gt;<br />
endobj<br />
2 0 obj<br />
&lt;&lt;<br />
/Type /Pages<br />
/Kids [ 3 0 R ]<br />
/Count 1<br />
&gt;&gt;<br />
endobj<br />
3 0 obj<br />
&lt;&lt;<br />
/Type /Page<br />
/Parent 2 0 R<br />
/Resources &lt;&lt;<br />
/Font &lt;&lt;<br />
/F1 &lt;&lt;<br />
/Type /Font<br />
/Name /F1<br />
/Subtype /Type1<br />
/BaseFont /Helvetica<br />
&gt;&gt;<br />
&gt;&gt;<br />
&gt;&gt;<br />
/MediaBox [ 0 0 795 842 ]<br />
&gt;&gt;<br />
endobj<br />
4 0 obj<br />
&lt;&lt;<br />
/Length 4769<br />
/Filter [/ASCIIHexDecode /ASCII85Decode /#4c</p></blockquote>
<p>What we see in Pyew? The output of <a href="http://blog.didierstevens.com/programs/pdf-tools/" target="_blank">PDFId</a> (a great tool by Didier Stevens) as well as the hexadecimal output of the first block (512 bytes). Taking a brief look to the 1st block of data we see one "OpenAction" to execute JavaScript. Surprise. The code "/JS 4 0 R" specifies that the JavaScript code to be executed is the object number 4. Seeking to the offset where the object #4 is and printing the buffer (in ASCII) we will find the following:</p>
<blockquote>
<pre>[0x000001b7]&gt; s 0x1b7
[0x000001b7]&gt; p
4 0 obj
&lt;&lt;
        /Length 4769
        /Filter [/ASCIIHexDecode /ASCII85Decode /#4c#5a#57De#63#6fde /R#75nLen#67t#68#44ecod#65 /FlateDecode ]
&gt;&gt;stream
4A2E3539605651222D714E634326304C5A47725A236A63494B26682C323A4E532&#8230;</pre>
</blockquote>
<p>The object is multiple times encoded and, which is more, the strings to specify what filters must be used in order to decode the stream are encoded too. It's perfectly legal according to the PDF specifications, although pretty suspicious. Pyew does a good job decoding both the encoded strings and the multiple times encoded stream. To see the streams just type "pdfvi" to see the encoded streams in the console:</p>
<blockquote>
<pre>eval(unescape("%76%61%72%20%56%68%4C%66%4E%20%3D..."))</pre>
</blockquote>
<p>Wow! it's a <em>small</em> chunk of JavaScript data <img src='http://joxeankoret.com/blog/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' />  Pyew <em>automagically</em> applied all the filters needed (ASCIIHexDecode, ASCII85Decode, LZWDecode, RunLengthDecode and FlateDecode) and printed out the obfuscated code. We can see it, too, in a graphical user interface. Instead of typing "pdfvi" execute the command "pdfview". You will see the following screen:</p>
<div id="attachment_96" class="wp-caption aligncenter" style="width: 310px"><a href="http://joxeankoret.com/blog/wp-content/uploads/2010/02/pdf1.png"><img class="size-medium wp-image-96" title="Obfuscated Stream View" src="http://joxeankoret.com/blog/wp-content/uploads/2010/02/pdf1-300x156.png" alt="Obfuscated Stream View" width="300" height="156" /></a><p class="wp-caption-text">Obfuscated Stream View</p></div>
<p><strong>More Examples</strong></p>
<p>OK, so we can see now the encoded stream but, what if there are a lot of encoded streams and we must check them all or if we want to see just one of them? For this purpose, and also to show the Pyew's APIs, I created an example usage of the PDF API. The example reads all the streams and shows a list of all the encoded streams as you may see in the following snapshot:</p>
<div id="attachment_97" class="wp-caption aligncenter" style="width: 310px"><a href="http://joxeankoret.com/blog/wp-content/uploads/2010/02/pdf2.png"><img class="size-medium wp-image-97" title="Usage example of the PDF API" src="http://joxeankoret.com/blog/wp-content/uploads/2010/02/pdf2-300x156.png" alt="Usage example of the PDF API" width="300" height="156" /></a><p class="wp-caption-text">Usage example of the PDF API</p></div>
<p>Using this simple screen we can see all the streams or just one specific (encoded) stream. This is the code of this example usage of the Pyew's API for the PDF format:</p>
<pre lang="python">#!/usr/bin/env python

import os
import sys

from pyew_core import CPyew
from easygui import choicebox, fileopenbox, msgbox

def main(filename=None):
    if filename is None:
        filename = fileopenbox(msg="Select PDF file", default="*.pdf", filetypes=["*.pdf"])
        if filename is None:
            return

    pyew = CPyew(batch=True)
    pyew.loadFile(filename)

    streams = pyew.plugins["pdfilter"](pyew, doprint=True)
    if len(streams) == 0:
        msgbox(title="PDF Streams",msg="No encoded streams found")

    l = []
    l.append("About PDF Streams Viewer")
    l.append("See all streams (both encoded and unencoded)")
    for x in streams:
        l.append("Stream %d encoded with %s" % (x, streams[x]))
    l.append("Quit")

    while 1:
        c = choicebox(msg="Select one stream to view it decoded", title="Stream Viewer", choices=l)
        if c is None:
            break
        elif c.lower() == "quit":
            break
        elif c.lower().startswith("about"):
            msgbox(title="About PDF Streams Viewer",
                   msg="Example usage of the Pyew APIs to see PDF streams. Written by Joxean Koret")
        elif c.lower().startswith("see all"):
            pyew.plugins["pdfview"](pyew, doprint=False, stream_id=-1)
        else:
            stream_id = int(c.split(" ")[1])
            pyew.plugins["pdfview"](pyew, stream_id=stream_id)

if __name__ == "__main__":
    if len(sys.argv) == 1:
        main()
    else:
        main(sys.argv[1])</pre>
<p>And, that's all for the moment. I hope you like the new Pyew's features <img src='http://joxeankoret.com/blog/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://joxeankoret.com/blog/2010/02/21/analyzing-pdf-exploits-with-pyew/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
