
Fb1h2s aka Rahul Sasi's Blog

Cracking a Captcha: Nullcon | EMC2 CTF 2015

The EMC2/Nullcon CTF wrapped up last week. Even though I really wanted to, I did not have enough time to play the CTF. I was (and still am) busy working on my "hacking drones" research for Nullcon.

Last year I was one of the top 30 finalists of the EMC2 Defenders League and stood 5th in the final ranking.

Anyway, on Sunday night I got a bit bored with drones and decided to take a sneak peek at the CTF, but by that time the winners had been declared and the scoreboard was closed. I went straight to my favorite categories, web and reversing, and decided to solve one of those challenges. Web 5 was a captcha solver worth 500 points, and I decided that would be easy.
[Image: 10926458_10153215485122454_4416938377600721468_n.jpg]

The challenge was to break as many captchas as possible and submit the solutions using a given session token within a time frame of 2 minutes.

Analyzing the captcha, we can easily see four clearly distinguishable colour groups:
Black == background
Dark violet == dots
Gray == lines
Shades of light violet == letters

[Image: imagedemo.png]

From the look of it, it was an easily crackable captcha.

This is a small AI problem: we need to create a program that can recognize these captchas, and we need to teach that program what is right and what is wrong. So the first step is to build a training data set, which serves as the input for our captcha solver. People choose different methods for working with the training data, such as neural networks or vector space search. In our current situation we do not have a complicated data set: the captcha is simple and has only [a-z, A-Z] characters in it.

Building the Training Data set.

Step 1 :

The captcha image we were given was a PNG file in RGBA mode [Red Green Blue Alpha]. We have to bring it down to a palette of at most 256 colours, and the easiest way to do that is to convert the image from RGBA to the 8-bit palette mode ("P") that GIF uses. We will use the Python PIL module to do that.

captcha_image = captcha_image.convert("P")


The next step is to find the image's pixel concentration, that is, the number of pixels of each colour.

We can use PIL's histogram() for this and print each colour index with its pixel count. We get the following.

[-] Image pixel concentration

0 8344
190 938
53 301
96 184
204 113
205 69
60 24
210 14
211 7
95 4

Here 0 stands for black and has the highest pixel count, 8344, followed by colour 190. At this point I assumed colours 204 and 205 are the ones used for the captcha letters.
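As a rough sketch of the counting step, the ranking above can be reproduced on a plain list of palette indices. The pixel list and the helper name pixel_concentration are made up for illustration; on a real palette-mode image, PIL's histogram() already returns these counts as a 256-entry list.

```python
from collections import Counter

def pixel_concentration(pixels, top=10):
    """Return the `top` most common palette indices as (colour, count) pairs."""
    return Counter(pixels).most_common(top)

# Hypothetical flat pixel list: mostly background (0), some dots (190),
# and a few letter pixels (204 and 205).
pixels = [0] * 12 + [190] * 5 + [204] * 3 + [205] * 2
ranking = pixel_concentration(pixels)
```

The colours that belong to the letters are usually not the most frequent ones, so the ranking is only a starting point for picking them by eye.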

Step 2:

Remove the noise from the image. This is easy to do, as we can simply drop every pixel that is not used for the captcha letters.

Simply copy the pixels that have the captcha letter colours to a new image and leave everything else out.

if pix == 204 or pix == 205: # these are the numbers to get

Now we get an image with a white background and all the noise removed.
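The filtering idea can be sketched without PIL on a small grid of palette indices. The grid, the constant names and filter_noise are assumptions for illustration; the only values taken from the post are the letter colours 204 and 205 and the white background 255.

```python
LETTER_COLOURS = (204, 205)  # palette indices assumed to belong to letters
BACKGROUND = 255             # white

def filter_noise(grid, keep=LETTER_COLOURS, background=BACKGROUND):
    """Return a copy of `grid` with every non-letter pixel set to white."""
    return [[pix if pix in keep else background for pix in row] for row in grid]

# Hypothetical 2x4 captcha fragment with background, dots and lines mixed in.
grid = [
    [0, 190, 204, 0],
    [53, 205, 204, 96],
]
filtered = filter_noise(grid)
```

Because the function builds a new grid, the original pixels stay available if the colour guess turns out to be wrong.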
[Image: 10926789_10153215507002454_389180500482001232_o.jpg]

Step 3:

The next step is to find the captcha letter spacing and slice each character out of the captcha.

This is easy, as we have only three different colours in our new image: 255, 204 and 205.

Horizontal positions where letters start and stop:


Image 1 Line Spacing is [(5, 13), (35, 43), (65, 73), (95, 102), (125, 133), (155, 163)]
Image 2 Line Spacing is [(5, 13), (35, 43), (66, 73), (96, 102), (125, 133), (155, 163)]
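A minimal sketch of how such start/stop pairs can be found: walk the columns left to right and note where a column with letter pixels follows a blank one, and the other way round. The grid and the name find_letter_spans are hypothetical, and the handling of a letter that touches the right edge is an addition of this sketch.

```python
BACKGROUND = 255  # white, as in the filtered image

def find_letter_spans(grid, background=BACKGROUND):
    """Return (start, end) column pairs; `end` is the first blank column after a letter."""
    width = len(grid[0])
    spans = []
    start = None
    for col in range(width):
        has_ink = any(row[col] != background for row in grid)
        if has_ink and start is None:
            start = col                 # a letter begins here
        elif not has_ink and start is not None:
            spans.append((start, col))  # the letter ended in the previous column
            start = None
    if start is not None:               # letter touching the right edge
        spans.append((start, width))
    return spans

W = 255
grid = [
    [W, 204, 204, W, W, 205, W],
    [W, 204, W,   W, W, 205, W],
]
spans = find_letter_spans(grid)
```

This only works while letters do not overlap horizontally, which seems to hold for this captcha given how evenly the positions above are spaced.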

Each letter in the captcha occupies almost the same space.
Cut out each character and place it inside a folder.
Rename each letter image [file name] to its respective letter.
Now we have a folder of sliced letters, each named after the letter it shows.
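The cut-and-label step might look roughly like this when done in memory. Here a dictionary stands in for the letter folder, and crop_columns, build_training_set and the sample grid are illustrative names and data, not part of the original script.

```python
def crop_columns(grid, start, end):
    """Cut columns start..end-1 out of every row."""
    return [row[start:end] for row in grid]

def build_training_set(grid, spans, labels):
    """Map each hand-assigned label to its cropped letter grid."""
    if len(spans) != len(labels):
        raise ValueError("need exactly one label per letter span")
    return {label: crop_columns(grid, s, e) for label, (s, e) in zip(labels, spans)}

W = 255
grid = [
    [W, 204, 204, W, W, 205, W],
    [W, 204, W,   W, W, 205, W],
]
# Hypothetical spans and the letters a human read off the captcha.
samples = build_training_set(grid, [(1, 3), (5, 6)], "ab")
```

Labelling still has to be done by a person once; after that the samples can be reused for every new captcha.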
[Image: 10904415_10153215506997454_4402739535327763108_o.jpg]

Final Solution algorithm:

The final algorithm to solve the captcha would be:

a) Read a new captcha and the session cookie
b) Filter the noise out
c) Slice the filtered captcha and extract each letter
d) Compare each letter with those kept in the letter folder and find the best match
e) The best match is the captcha letter
f) Continue for all letters in the captcha
g) Submit the full captcha along with the session cookie to the application
h) Fetch a new captcha with the session cookie, go to step b

Compare two images in Python:

There are multiple ways to compare two images in Python:

1) Root mean square difference
2) Euclidean distance
3) Normalized cross-correlation
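For reference, the three measures listed above can be written down in a few lines for two images flattened into equal-length lists of pixel values. The function names and sample data are illustrative, and the cross-correlation shown is the zero-mean variant, which is one common definition among several.

```python
import math

def rms_difference(a, b):
    """Root mean square of the per-pixel differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / float(len(a)))

def euclidean_distance(a, b):
    """Straight-line distance between the two pixel vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalized_cross_correlation(a, b):
    """Zero-mean NCC: 1.0 for identical patterns, -1.0 for inverted ones."""
    mean_a = sum(a) / float(len(a))
    mean_b = sum(b) / float(len(b))
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0  # a flat image has no pattern to correlate
    return sum(x * y for x, y in zip(da, db)) / denom

same = [255, 204, 204, 255]
inverted = [204, 255, 255, 204]
```

For the first two measures a lower value means a closer match; for cross-correlation a higher value does.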

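We will use something even simpler in the script: the sum of the absolute pixel differences between the two images, where the lowest sum is the best match.

PIL's ImageChops.difference returns the absolute value of the pixel-by-pixel difference between the two images.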

ImageChops.difference(image1, image2) ⇒ image

out = abs(image1 - image2)

Our letter images all have roughly the same shape and size, so this is the best bet.
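A small sketch of the matching step on plain pixel grids: add up abs(a - b) over all pixels for every labelled sample and keep the label with the smallest total. The sample letters and the names total_difference and best_match are invented for illustration.

```python
def total_difference(a, b):
    """Sum of abs(a - b) over the overlapping area of two pixel grids."""
    return sum(abs(x - y)
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))

def best_match(letter, samples):
    """Return the label of the sample with the smallest total difference."""
    return min(samples, key=lambda label: total_difference(letter, samples[label]))

W = 255
samples = {
    "I": [[W, 204, W], [W, 204, W], [W, 204, W]],
    "L": [[204, W, W], [204, W, W], [204, 204, 204]],
}
# An "I" whose middle pixel uses the second letter colour.
unknown = [[W, 204, W], [W, 205, W], [W, 204, W]]
guess = best_match(unknown, samples)
```

Small colour variations barely change the total, while a different shape changes it a lot, which is why this crude score is good enough for a captcha with one fixed font.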

Python Code:
# Python 2 script (urllib2, cStringIO, print statements)
from PIL import Image, ImageChops
from operator import itemgetter
import glob
import urllib
import urllib2
import cStringIO

# we have kept all our labelled letters in this folder
files_names = glob.glob("/root/ctf/let/*.*")

# we need to get the captcha and at the same time the session cookie,
# and use that cookie for every solved captcha request
response = urllib2.urlopen('')  # challenge URL is blank in the original post
cookie = response.headers['Set-Cookie']
#print cookie


class Fit:
    # holds one candidate sample and its total pixel difference
    letter = None
    difference = 0


opener = urllib2.build_opener()
opener.addheaders = [
    ('Accept', 'application/json, text/javascript, */*; q=0.01'),
    ('Cookie', cookie),
]

# let's make 500 requests and read one captcha each time
for attempt in range(1, 500):
    captcha = ""
    response = opener.open('')  # captcha image URL is blank in the original post
    length = response.headers['content-length']
    print "[-] Image Content length " + length
    # cStringIO creates a file-like object from memory
    image_read = cStringIO.StringIO(response.read())
    captcha_image = Image.open(image_read)
    captcha_image = captcha_image.convert("P")
    # white canvas that receives only the letter pixels
    captcha_filtered = Image.new("P", captcha_image.size, 255)
    temp = {}

    his = captcha_image.histogram()
    values = {}
    for i in range(256):
        values[i] = his[i]
    print "[-] Image pixel concentration \n"
    for color, concentrate in sorted(values.items(), key=itemgetter(1), reverse=True)[:10]:
        print color, concentrate

    # Step 2: keep only the letter colours
    for x in range(captcha_image.size[1]):
        for y in range(captcha_image.size[0]):
            pix = captcha_image.getpixel((y, x))
            temp[pix] = pix
            if pix == 204 or pix == 205:  # these are the numbers to get
                captcha_filtered.putpixel((y, x), pix)

    # Step 3: find the columns where each letter starts and stops
    inletter = False
    foundletter = False
    start = 0
    end = 0
    letters = []

    for y in range(captcha_filtered.size[0]):  # slice across
        for x in range(captcha_filtered.size[1]):  # slice down
            pix = captcha_filtered.getpixel((y, x))
            if pix != 255:
                inletter = True
        if foundletter == False and inletter == True:
            foundletter = True
            start = y

        if foundletter == True and inletter == False:
            foundletter = False
            end = y
            letters.append((start, end))
        inletter = False

    print "[-] Horizontal Position Where letter start and stop \n"
    print letters
    print "\n"

    count = 0
    for letter in letters:
        im3 = captcha_filtered.crop((letter[0], 0, letter[1], captcha_filtered.size[1]))
        # Training runs only: save each slice under a unique name, then rename it
        # by hand. The md5 input is an assumption (the original line is cut off)
        # and it needs "import hashlib".
        # m = hashlib.md5("%s%s" % (length, count))
        # im3.save("/root/ctf/let/%s.gif" % (m.hexdigest()), quality=95)
        count += 1
        #print files_names

        # Match current letter with sample data
        best = Fit()
        for sample_path in files_names:
            #print sample_path
            current = Fit()
            current.letter = sample_path
            sample = Image.open(sample_path)
            difference = ImageChops.difference(im3, sample)
            for x in range(difference.size[0]):
                for y in range(difference.size[1]):
                    current.difference += difference.getpixel((x, y))

            # the sample with the lowest total difference wins
            if not best.letter or best.difference > current.difference:
                best = current
        # final captcha decoded: the first character of the file name,
        # right after the 14-character prefix "/root/ctf/let/"
        tmp = best.letter[14:15]
        captcha = captcha + tmp

    # let us post the captcha to the server along with the session token
    print "[+] Captcha is " + captcha
    data = urllib.urlencode({'solution': captcha.strip(), 'Submit': 'Submit'})
    response = opener.open('', data)  # submit URL is blank in the original post

My program had a 97% success rate, and after 50 successful entries I got the flag.
GitHub Code:
[Image: ctf_final_uploadn.jpg]
