Extracting URLs Out of PCAP File



c0dist
03-12-2015, 10:51 PM
Hello everyone,

Just sharing something new that I learnt today, so that others can benefit from it and so that I'll know where to look when I forget. :p
I believe we all know what PCAP files are. Often we capture some network traffic; let's assume it was HTTP traffic. What if you need the URLs your PCAP contains?
Fret not, Python comes to the rescue. We can use "dpkt (https://pypi.python.org/pypi/dpkt)", a fast, simple packet creation / parsing library. While reading today, I came across this blog post, which wonderfully explains how to parse PCAP files using the 'dpkt' library in Python - https://jon.oberheide.org/blog/2008/...g-a-pcap-file/.

So, I took the liberty of editing the code, added some things here and there, and this is what it looks like now.
Please note: All credit goes to the original author. Also, I take no liability for whether this code works. Enough talk, here's your code.



#!/usr/bin/env python

import dpkt
import sys

http_ports = [80, 8080]  # Add other ports if your website runs on a non-standard port.
urls = []

# PCAP files are binary, so open the file in "rb" mode.
with open(sys.argv[1], "rb") as f:
    pcap = dpkt.pcap.Reader(f)

    for timestamp, buf in pcap:
        eth = dpkt.ethernet.Ethernet(buf)
        ip = eth.data
        # Frames with an unrecognised payload leave raw bytes here, which have no .data.
        tcp = getattr(ip, "data", None)
        if isinstance(tcp, dpkt.tcp.TCP):
            # Only look at non-empty segments sent to an HTTP port.
            if tcp.dport in http_ports and len(tcp.data) > 0:
                try:
                    http = dpkt.http.Request(tcp.data)
                    urls.append(http.headers['host'] + http.uri)
                except Exception as e:
                    # Just in case we come across some stubborn kid.
                    print("[-] Some error occurred. - %s" % str(e))

print("[+] URLs extracted from PCAP file are:\n")
for url in urls:
    print(url)
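
To make it clearer what the script pulls out of each TCP payload, here is a rough standard-library sketch of the same idea: read the request line and the Host header from raw bytes and join them. This is not how dpkt implements it; the function name host_and_uri and the sample request are made up for illustration.

```python
def host_and_uri(payload):
    """Return host + URI from raw HTTP request bytes, or None if it is not a usable request."""
    # Headers end at the first blank line; anything after that is the body.
    head = payload.split(b"\r\n\r\n", 1)[0]
    lines = head.split(b"\r\n")

    # A request line looks like: METHOD SP URI SP HTTP/x.y
    parts = lines[0].split(b" ")
    if len(parts) != 3 or not parts[2].startswith(b"HTTP/"):
        return None
    uri = parts[1].decode("latin-1")

    # Header names are case-insensitive, so compare in lower case.
    for line in lines[1:]:
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"host":
            return value.strip().decode("latin-1") + uri
    return None  # No Host header (e.g. a bare HTTP/1.0 request).


# Hypothetical sample payload, not taken from a real capture.
sample = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\nAccept: */*\r\n\r\n"
print(host_and_uri(sample))
```

Payloads that are not HTTP requests (TLS handshakes, continuation segments of a large POST, and so on) return None here. In the script, those are the cases that land in the except branch.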



Usage:
You'll need to install the "dpkt" package using pip/easy_install/whatever you want. On a Linux machine, it can be installed with:

batman@gothamcity /tmp $ sudo pip install dpkt # Yup, that's all.

Just copy the code into a file and save it as <filename>.py (pcap-url.py in this example). Then run it as:

batman@gothamcity /tmp $ python pcap-url.py testing.pcap



Example:
I downloaded a PCAP file from here - http://malware-traffic-analysis.net/.../01/index.html and ran the script against it. Here's the result:

http://i.imgur.com/ovciCLx.png?1
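
The script prints every request it sees as host + URI, so a busy capture may list the same URL many times and without a scheme. A small optional post-processing sketch, assuming plain HTTP (which is what ports 80/8080 normally carry), could look like this. The helper name unique_urls and the sample list are hypothetical.

```python
def unique_urls(urls, scheme="http://"):
    """Prefix a scheme where missing and drop duplicates, keeping first-seen order."""
    seen = set()
    result = []
    for url in urls:
        # Assumption: entries without a scheme came from plain HTTP traffic.
        full = url if url.startswith(("http://", "https://")) else scheme + url
        if full not in seen:
            seen.add(full)
            result.append(full)
    return result


# Made-up entries in the same host + URI shape the script collects.
collected = ["example.com/", "example.com/a.js", "example.com/"]
for url in unique_urls(collected):
    print(url)
```

In the script, this would be called on the urls list just before the final print loop.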


Hope this helps. :)
Regards,
c0dist