Tutorial - Sed and Awk From a Pen Tester's Perspective

04-17-2011, 10:10 PM
I am starting a series of tutorials on my blog. It will cover the basics of hacking-related topics, starting with Sed and Awk.

Prerequisite - You need a Linux box installed. A little familiarity would be nice, but I have tried to keep it simple. If you hit any problem, let me know, I would be glad to help.

Also give me your feedback, since it's my first post on security. I would be happy to learn and get better at it.


04-18-2011, 12:12 AM
Today we are going to see what sed and a little bit of awk are and how we can use them in pen testing. I am using BackTrack 4 in VMware.

Recently a hacker group called Anonymous attacked rootkit.com and uploaded the site's whole database online. It's a huge database, with some 80,000+ users and a lot of other information in it.

Let's say we are the pen testers, and we just got hold of that huge database. Inside the database there are many tables, but the one of interest is called people. It has many fields, but we are interested in only the user name and its hashed password. We want to extract every user name and its hash in this format:

username:hash

You certainly can't do this manually. And manual work doesn't suit us, ahem .. CTRL-C -> CTRL-V :).

We like to do things fast and smart.... right ;)
Sed and Awk (and grep too) to the rescue!

So what is this... Sed

Sed is a Stream Editor: we feed it some text, and it processes the text line by line, performing commands that manipulate it the way we want. For example, we can replace every " " with ":" or every occurrence of the string "hello" with "hi", and do many other awesome things.
Hold ...Hold, before you say "Big deal, I can do that with the Replace All command in Notepad" (yeah, even I thought the same before I learned Sed).

OK, let's start the Magic Show.

Here's the link (http://dazzlepod.com/rootkit/), download the gzip file rootkit_com_mysqlbackup_02_06_11.gz, and put it in any folder on your Linux machine.

Once you have downloaded the file, rename it to something shorter

root@bt:~/blog# mv rootkit_com_mysqlbackup_02_06_11.gz database.gz

Now we need to decompress the file.

root@bt:~/blog# gunzip database.gz

Now that you have decompressed it, you should have a file called database (without any .gz).

OK. Just to get a feel for how BIG the database is, run the cat command on the database file, go have some coke, sleep and come back. :)
No, don't worry, it is possible to extract fields from this huge file in a clever yet elegant way. Hold on, the magic is about to begin.

OK. First we need to know what we are dealing with.
Open the database file in Vim.
We will search for all the CREATE TABLE statements. Do

/CREATE TABLE

and keep pressing "n" for the next occurrence of the given string. You will notice that there are many tables in this database. Keep going till you hit the people table. Saw it? OK. Now keep going down (using the down arrow key) slowly and notice the fields (yeah, there are many fields in this table). You will hit the INSERT INTO statement and notice that a single INSERT INTO line wraps across many lines of the screen.
INSERT INTO `people` VALUES (1,'admin','51a42fa118e77f95f70d4efff4395f8d','roo tkit sysop','hoglund@rootkit.com',10,0,'','','','','',' ',0,'http://www.rootkit.com/usericons/admin.jpg','',1296966693,'',12967051 13,1283501911,1296457930,1294556469,1294942812,0,0 ,'','','','','',-1,'P'),(2,'aaronh','7cb8b36d','Aaron Heady','hackdoctor@aol.com',1,0,'','','','','','', 0,'http://www.rootkit.com/usericons/aaronh.jpg','',0,'',0,0,0,0,0,0,0,'','0','','','',-1,''),(3,'abcdef','e80
.................................................. ........................and so on

To be sure that one line really is wrapping across multiple screen lines, run the following command inside Vim
[ESC] :set nu

It will show the line number of every line in the file. If you are at the first INSERT INTO
(1,'admin','51a42fa118e77f95f70d4efff4395f8d','roo tkit...
then the line number should be 425.
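
If you would rather not page through Vim, grep can list the same information. This is only a sketch on a tiny made-up dump (the table names and rows are invented, not taken from the real file). -n prints line numbers and -m1 stops at the first match, assuming a grep that supports -m, as GNU grep does.

```shell
# Tiny invented stand-in for the real dump file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
CREATE TABLE `news` (
  id int
);
CREATE TABLE `people` (
  id int
);
INSERT INTO `people` VALUES (1,'alice','aaaa1111');
EOF

# List every CREATE TABLE statement with its line number.
tables=$(grep -n 'CREATE TABLE' "$tmp")

# Line number of the first INSERT INTO `people` statement.
first_insert=$(grep -n -m1 'INSERT INTO `people`' "$tmp" | cut -d: -f1)

rm -f "$tmp"
printf '%s\n' "$tables"
printf 'first insert at line %s\n' "$first_insert"
```

On the real file you would point both grep commands at database instead of the temporary file.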

Anyway, the point is that a single line of the file holds many rows and wraps across the screen. It is important to know this. Why? Because the cut command works per line.

What is this cut command, you ask? We will see that later in detail. The mist is about to clear. Forget cut for now.

Now you know which lines you want to work on, i.e. INSERT INTO `people`
(Note: the characters around people are not single quotes but backticks; the backtick key lies above the Tab key.)

To select only particular lines from this file we will use the grep command.
The grep command takes one or more patterns to match and a file as input, and prints the lines of that file that match. There are many more features of grep (check out this command -> # man grep)

OK, now quit Vim:
[ESC] :q!

Oh, and if you are wondering how to clear so much clutter on the screen, just press CTRL-L. :)

OK, do

root@bt:~/blog# grep "INSERT INTO \`people\`" database

You will still see a huge amount of output, but don't worry, we have extracted only the INSERT INTO `people` statements.

Wait, why did we put those \ in front of the backticks "`"?
Because inside double quotes the shell treats backticks as special characters (they start a command substitution), and we want them passed to grep as normal characters. To make the shell treat a special character as a normal character, we put "\" in front of it.
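
A small sketch of the same point, run on an invented three-line file rather than the real dump: inside single quotes the shell leaves backticks alone, so the backslashes are only needed with double quotes. Both forms hand grep the same pattern.

```shell
# Made-up three-line stand-in for the dump file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
CREATE TABLE `people` (id int);
INSERT INTO `people` VALUES (1,'alice','aaaa1111');
INSERT INTO `posts` VALUES (1,'hello');
EOF

# Inside double quotes the backticks need a backslash each...
escaped=$(grep "INSERT INTO \`people\`" "$tmp")

# ...inside single quotes the shell passes them through untouched.
quoted=$(grep 'INSERT INTO `people`' "$tmp")

rm -f "$tmp"
printf '%s\n' "$quoted"
```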

OK. Now the very important part.
Each INSERT INTO statement inserts many rows of values.
(1,'admin',...),(2, 'aaronh',....) etc

To simplify things we are going to put each row of values on a separate line.
How are we going to do this? By asking sed to substitute "),(" with a newline. Why "),("? Because that is where one row ends and a new one begins.

So we do,
root@bt:~/blog# grep "INSERT INTO \`people\`" database | sed s/\),\(/\\n/g

OK. Is that too much? :) Don't worry, I'll explain.
1. grep "INSERT INTO \`people\`" database

This is what we did before.

2. |

The pipe. It passes the output of one command as input to the next command. So here, the lines selected by grep are passed as input to sed. Simple!

3. sed s/\),\(/\\n/g

o here s stands for substitute.
o what to replace is given after the first /
o i.e. replace the string "),("
o we added \ before ) and ( so that the shell treats them as normal characters, not special characters.
o the 2nd / starts the string to replace with
o the replacement is a newline, i.e. \n, but since \ is a special character to the shell we make it a normal character by adding one more \ :)
o after the 3rd / comes the flag g (g = global), which substitutes all the occurrences on the line, not just the first
o Note: replacing a string with a newline is something you cannot do in Notepad with Replace All :)
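
The breakdown above can be tried on a tiny sample before running it on the real file. The two rows below are invented. The sed script is written in single quotes, which is an alternative to the backslash escaping used in this tutorial, and turning \n into a newline assumes GNU sed (the one on BackTrack).

```shell
# Invented two-row sample standing in for one line of the real dump.
sample="INSERT INTO \`people\` VALUES (1,'alice','aaaa1111','x'),(2,'bob','bbbb2222','y');"

# Single quotes stop the shell from touching ) ( and \,
# so sed receives the script s/),(/\n/g exactly as written.
rows=$(printf '%s\n' "$sample" | sed 's/),(/\n/g')

printf '%s\n' "$rows"
```

Each row now sits on its own line, which is what cut needs in the later steps.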
Now when you run the command, you still can't read the output, because there is so much of it.
Append a head command to the pipeline. By default head outputs only the first ten lines of its input (and tail outputs the last ten).

So just do

root@bt:~/blog# grep "INSERT INTO \`people\`" database | sed s/\),\(/\\n/g | head

and this should be the output
INSERT INTO `people` VALUES (1,'admin','51a42fa118e77f95f70d4efff4395f8d','roo tkit sysop','hoglund@rootkit.com',10,0,'','','','','',' ',0,'http://www.rootkit.com/usericons/admin.jpg','',1296966693,'',12967051 13,1283501911,1296457930,1294556469,1294942812,0,0 ,'','','','','',-1,'P'
2,'aaronh','7cb8b36d','Aaron Heady','hackdoctor@aol.com',1,0,'','','','','','', 0,'http://www.rootkit.com/usericons/aaronh.jpg','',0,'',0,0,0,0,0,0,0,'','0','','','',-1,''
3,'abcdef','e80b5017','Ashish Rungta','ASHTME@YAHOO.COM',1,0,'','','','','','',0 ,'http://www.rootkit.com/usericons/abcdef.jpg','',0,'',0,0,0,0,0,0,0,'','0','','','',-1,''
4,'abel','cd779e8a','Adi A','adia@opsynet.com',1,0,'','','','','','',0,'htt p://www.rootkit.com/usericons/abel.jpg','',0,'',0,0,0,0,0,0,0,'','0','','','',-1,''
5,'abhi0070','a6a7c0ce','kbcack','unknownbuddy@yah oo.com',1,0,'','','','','','',0,'http://www.rootkit.com/usericons/abhi0070.jpg','',0,'',0,0,0,0,0,0,0,'','0','','',' ',-1,''
6,'abm','e50624ea','alex murphy','abm@mitre.org',1,0,'','','','','','',0,'h ttp://www.rootkit.com/usericons/abm.jpg','',0,'',0,0,0,0,0,0,0,'','0','','','',-1,''
7,'abraxas','7f9cc44f','Alex Mellor','i_love_g0ats@hotmail.com',1,0,'','','','' ,'','',0,'http://www.rootkit.com/usericons/abraxas.jpg','',0,'',0,0,0,0,0,0,0,'','0','','','' ,-1,''
8,'acc_chen','c4025c6f','Jun.chen','johnychen@nete ase.com',1,0,'','','','','','',0,'http://www.rootkit.com/usericons/acc_chen.jpg','',0,'',0,0,0,0,0,0,0,'','0','','',' ',-1,''
9,'access55','e9f5bda6','WHY YOU ASK','access55@manx.net',1,0,'','','','','','',0,' http://www.rootkit.com/usericons/access55.jpg','',0,'',0,0,0,0,0,0,0,'','0','','',' ',-1,''
10,'accobra','66f363a6','Rob','rlstephe@yahoo.com' ,1,0,'','','','','','',0,'http://www.rootkit.com/usericons/accobra.jpg','',0,'',0,0,0,0,0,0,0,'','0','','','' ,-1,''

I can see a faint smile on your face now :) Don't worry, you will have a strong urge to show off by the end of this tutorial. Just a few more steps.

OK. Before we extract the user and password, you need to know what the cut command is. The cut command works on field separators.

To understand how cut works before we continue, let's take an example. Do

root@bt:~/blog# cat /etc/passwd


All the fields in this file are separated by ":". The user name is the first one, the password the 2nd, the uid the 3rd, and so on.

What if you want to extract only the user and uid from this file?
This is where cut comes into play. The cut command works on field separators, line by line. By default the field separator is a tab, but we can specify any single character as the field separator with -d.

root@bt:~/blog# cut -d":" -f1,3 /etc/passwd
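
To see what that command produces without depending on your own /etc/passwd, here is a sketch on two made-up lines in the same layout.

```shell
# Two made-up lines in /etc/passwd layout:
# name:password:uid:gid:comment:home:shell
passwd_sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh'

# -d sets the delimiter, -f1,3 keeps fields 1 and 3 (user name and uid).
users=$(printf '%s\n' "$passwd_sample" | cut -d: -f1,3)

printf '%s\n' "$users"
```

cut keeps the delimiter between the selected fields, so the output lines are root:0 and daemon:1.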


04-18-2011, 12:19 AM
Nice.. Now we are gonna cut each line by comma. Why? Because the fields are separated by commas. We want the user and hash, which are the 2nd and 3rd fields when the field separator is ",".

So we do

root@bt:~/blog# grep "INSERT INTO \`people\`" database | sed s/\),\(/\\n/g | cut -d "," -f2,3 | head

I have put the cut command before head. We are simply telling cut to
use , as the field separator ( -d "," )
and select only fields 2 and 3 ( -f2,3 )
and pass the result to head so that we get the first ten lines only (it's easier to read the output and be sure that the command is working)

It should give the user name and hash of each row, still wrapped in single quotes.


Now we just need to remove those single quotes.
You know what to do :)

root@bt:~/blog# grep "INSERT INTO \`people\`" database | sed s/\),\(/\\n/g | cut -d "," -f2,3 | sed s/\'//g | head

i.e. replace all ( g )
single quotes ( \' , escaped because the single quote is a special character to the shell)
with nothing ( // )

Last thing, and it's Game Over: replace "," with ":"

root@bt:~/blog# grep "INSERT INTO \`people\`" database | sed s/\),\(/\\n/g | cut -d "," -f2,3 | sed s/\'//g | sed s/,/:/g | head

:) Done!
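
Here is the cut and sed part of the pipeline on two invented rows, as a sketch of what each stage contributes. It assumes, as the tutorial does, that user names and hashes contain no commas or single quotes of their own. A row where they do would come out mangled.

```shell
# Two invented rows, shaped like the output of the "),(" splitting step.
rows="1,'alice','aaaa1111','Alice A'
2,'bob','bbbb2222','Bob B'"

pairs=$(printf '%s\n' "$rows" |
    cut -d, -f2,3 |      # keep fields 2 and 3: 'alice','aaaa1111'
    sed "s/'//g" |       # drop the single quotes: alice,aaaa1111
    sed 's/,/:/g')       # comma to colon: alice:aaaa1111

printf '%s\n' "$pairs"
```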

You can now remove the head command to check that it works for all rows.

root@bt:~/blog# grep "INSERT INTO \`people\`" database | sed s/\),\(/\\n/g | cut -d "," -f2,3 | sed s/\'//g | sed s/,/:/g

Wait, there's more.
What if you want the hash first and the user name second? i.e.

hash:username

With sed you can do it, but it's too complicated.
Meet Sed's elder brother Awk. Awk is more powerful (and more complicated). Awk is mainly used as a data extraction and reporting tool.

So do,

root@bt:~/blog# grep "INSERT INTO \`people\`" database | sed s/\),\(/\\n/g | cut -d "," -f2,3 | sed s/\'//g | sed s/,/:/g | awk -F':' '{print $2 ":" $1}' | head

What did we do?
-F is like -d in the cut command, i.e. it specifies the field separator.
Each field is put in $1 $2 $3 etc., where $1 is the user name here and $2 the password hash.

With '{print $2 ":" $1}'
we are asking it to print them in reverse order. Don't put a comma between $2 and $1 in place of the ":", because awk then joins the two fields with its output field separator, which is a space by default.
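
The difference is easy to see on a single made-up line. The third variant, setting OFS with -v, is not used in this tutorial. It is shown only as another way to get the colon back when a comma is used in print.

```shell
line='alice:aaaa1111'

# String concatenation: the ":" is printed because we put it there.
concat=$(printf '%s\n' "$line" | awk -F: '{print $2 ":" $1}')

# A comma in print inserts the output field separator (OFS), a space by default.
comma=$(printf '%s\n' "$line" | awk -F: '{print $2, $1}')

# Setting OFS makes the comma form print a colon instead.
with_ofs=$(printf '%s\n' "$line" | awk -F: -v OFS=: '{print $2, $1}')

printf '%s\n' "$concat" "$comma" "$with_ofs"
```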

The output is now in hash:username order.


Now just redirect the final output (without the head command) to a file.


root@bt:~/blog# grep "INSERT INTO \`people\`" database | sed s/\),\(/\\n/g | cut -d "," -f2,3 | sed s/\'//g | sed s/,/:/g | awk -F':' '{print $2 ":" $1}' > output.txt

With > output.txt we are redirecting the final output of the awk command into the file output.txt rather than to the terminal.

Now just for curiosity, to count the number of users we got, we do
root@bt:~/blog# wc output.txt

81450 82345 3145329 output.
wc command prints
1) newline
2) word
3) byte
counts for each file

So we have 81450 users in the final output.txt. Phew, that's a lot!
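
If only the number of rows is wanted, wc -l prints just the newline count. A sketch on a three-line invented file. Reading the file through < keeps the file name out of the output.

```shell
# Three invented hash:user lines standing in for output.txt.
tmp=$(mktemp)
printf 'aaaa1111:alice\nbbbb2222:bob\ncccc3333:carol\n' > "$tmp"

# With input on stdin, wc -l prints only the number of newlines.
count=$(wc -l < "$tmp")

rm -f "$tmp"
echo "$count"
```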



Super Crisp Command Mode

The above command has a total of 6 commands (excluding > output.txt), i.e.

grep "INSERT INTO \`people\`" database
| sed s/\),\(/\\n/g
| cut -d "," -f2,3
| sed s/\'//g
| sed s/,/:/g
| awk -F':' '{print $2 ":" $1}'

Do you think we can crisp it down to only 2 commands?
No, I am not crazy. It is possible.
Why do I want to do it? Just 'cause we can ;)

Wanna See ? :)
Here's the command

root@bt:~/blog# sed -n /"INSERT INTO \`people\`"/s/\),\(/\\n/pg database | gawk -F\' '{print $4 ":" $2}' | head

[ The head command is there so that you see a short version of the output; it is not required ]

What is going on? That's for you to figure out.
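
For anyone who wants to check their answer, here is a sketch of the same two-command idea on an invented sample, with comments on each part. It uses single-quoted scripts and plain awk instead of gawk, and again assumes GNU sed for the \n replacement.

```shell
# Invented sample: one line that should be ignored, one INSERT with two rows.
sample="CREATE TABLE \`people\` (id int);
INSERT INTO \`people\` VALUES (1,'alice','aaaa1111','x'),(2,'bob','bbbb2222','y');"

result=$(printf '%s\n' "$sample" |
    # -n: print nothing by default. The /.../ address limits the s command
    # to INSERT lines; g replaces every "),(" and p prints the changed line.
    sed -n '/INSERT INTO `people`/s/),(/\n/gp' |
    # With ' as the field separator, $2 is the user name and $4 the hash.
    awk -F"'" '{print $4 ":" $2}')

printf '%s\n' "$result"
```

One thing to keep in mind with this form: the p flag only prints a line when a substitution actually happened, so an INSERT statement holding a single row would be dropped.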

Have Fun. :D

Here are the references used while writing this tutorial.
1. Sed (http://en.wikipedia.org/wiki/Sed)
2. Awk (http://en.wikipedia.org/wiki/AWK)
3. Basic Linux (http://code.google.com/edu/tools101/linux/basics.html)
4. Grep (http://code.google.com/edu/tools101/linux/grep.html)
5. Anonymous (http://en.wikipedia.org/wiki/Anonymous_%28group%29)
6. Sed & Awk Book (http://www.amazon.com/sed-awk-2nd-Dale-Dougherty/dp/1565922255)
7. HackingDojo (http://www.hackingdojo.com)

04-18-2011, 07:59 AM
The pains you took to document this are really appreciable, "mayjune". I liked the way you took a live example and showed the power of bash scripting.
Good work! and keep it up :)

For bash scripting I recommend:
"Bash Guide for Beginners" by Machtelt Garrels. The free pdf is available online.

For awk programming, there is one very nice small document by "Michael Stutz". Actually I did not find the coverage of awk up to the mark in "Bash Guide for Beginners", so for me the document by "Michael Stutz" is quite good.


04-18-2011, 08:18 AM

04-18-2011, 01:12 PM
:) Thanks guys. It was homework in my online hacking class - hackingdojo.
I felt whatever I learned while doing that homework should be available to others, since people learn hacking tools but ignore these simple tools, which enhance the information gathering phase to a great extent.

Thanks a lot for your encouragement.
Will keep more tutorials like this coming.

The following text is not correct. Read bond's post for the correct information.

PS - @bond Sed is not a scripting language, but a Stream Editor. I haven't used any scripting language in it. All you require for this tutorial is the sed, grep, cut and optionally awk commands to perform the extraction.

Scripting can be used to make a generalized script for future projects, but it is not mandatory.

Check this link (http://en.wikipedia.org/wiki/Sed)

04-18-2011, 01:24 PM
Awesome !!! Thanks for such an excellent tutorial. Enjoying it :)

04-18-2011, 03:37 PM
PS - @bond Sed is not a scripting language, but a Stream Editor. I haven't used any scripting language in it. All you require for this tutorial is the sed, grep, cut and optionally awk commands to perform the extraction.

Scripting can be used to make a generalized script for future projects, but it is not mandatory.

Check this link (http://en.wikipedia.org/wiki/Sed)

Dear mayjune, instead of finding a link for you with the definition of bash/shell scripting, I will put it in simple words:
Shell scripting in *nix or batch programming in Windows is nothing but putting various commands or small utilities together to perform a job.... exactly the way you have done in your post. Now whether you perform your current task on the console itself or put it in a script, the result would be the same.

Thanks for letting me know in "BOLD" that Sed stands for Stream Editor. Mind when you are supposed to use bold and when not. Btw, if you love going by definitions, then Awk is not a command, as you stated, but a programming language in itself.

04-18-2011, 04:02 PM
I used to use a good amount of flame on such arrogance...
Right now the flame thrower is not in action :D

The poor old juniors were all victims of the flame thrower at one time or another :)

04-18-2011, 09:13 PM
My bad :o :p
No wonder that I am still a newbie and you are the bond! :)
But it's good that I made this mistake now and got the embarrassment here within the community rather than in the field. Thanks a lot for clarifying. I had a misconception about what a shell script is and what it's not.

Will be careful next time before saying anything. Sorry, I didn't mean to offend you in any way when I made it bold.

I am still a newbie; if there are any mistakes on my side, I would be happy to learn from them.

04-19-2011, 07:19 AM
And that's appreciable mayjune.
The bottom line is, Garage is a family where sometimes we have differences in thoughts and we do argue over that but there is no place for arrogance or ego. And that helps keeping the environment clean and makes it a nice place to stay.


05-24-2011, 02:14 AM
For serious shell scripters and Linux system/network administrators, I recommend learning "Expect".

Expect is actually a program controller, especially for interactive programs, i.e. where some program "X" needs interaction with a user or with some other program.

For Example:
I have posted a reply about the MD5 trend, which you can find --> [ Here (http://www.garage4hackers.com/showthread.php?931-Help-on-MD5-hash...&p=3844&viewfull=1#post3844) ]
However, I used Ruby for the example there, for a quick reply.
Let us use a bash script to encrypt [ which is actually not possible with the tool we use, authpasswd (the authpasswd tool is used to encrypt a string with different schemes like MD5, SHA1 etc.) ]. Below is how we use authpasswd to encrypt a word.


As you can see, when we use "authpasswd md5raw", it prompts us for a word to encrypt as md5raw, and then it prompts again, so we have to type the same word a second time.

So, if you use a shell script, it is not possible for you to pass that value/word to the program.

If you use, say

authpasswd md5raw
somecode what ever

once the script runs, it just stops after executing "authpasswd md5raw" and waits for user interaction, and will not continue unless the user does something. For brute forcing, we can't type every value every time, can we? So basically that's the limit of bash scripting. Here comes the beauty of "expect".

Here is the expect code:


set __myValue cool

spawn authpasswd md5raw
expect "password:"
send "$__myValue\r"
expect "password:"
send "$__myValue\r"

expect eof

You just need to run the script, and our value (in our case "cool") is passed to the program (authpasswd) as if a human were typing the word twice: total automation.


Another example:
We know we can get the header information of a page, let's say on "insecure.org", by using "Netcat" ( nc ) to connect on port 80, like:

Linux~$nc -v insecure.org 80

and our netcat connects to the server on port 80 and waits for our interaction. Can you pass the value to get the headers (i.e. "HEAD / HTTP/1.0" followed by hitting the Enter key twice) with a bash script? No, you cannot. If you use a bash script, after connecting to "insecure.org" netcat just waits for user interaction, i.e. for someone to type "HEAD / HTTP/1.0" and hit the Enter key twice.
With expect, you could overcome this by,


set __myValue "HEAD / HTTP/1.0\r\r"

spawn nc -v insecure.org 80
expect "open"
send "$__myValue\r\r"

expect eof


Expect helps Linux network/system administrators very efficiently, and it is very easy to learn.


05-24-2011, 07:08 AM
That's quite interesting and useful Hackuin and surely covering up for bash. TFS

BTW for the following I have a workaround:

and our netcat connects to the server on port 80 and waits for our interaction. Can you pass the value to get the headers (i.e. "HEAD / HTTP/1.0" followed by hitting the Enter key twice) with a bash script? No, you cannot.

To automate the whole procedure of finding banner or checking other HTTP methods like TRACE / TRACK etc.:

For Linux platform (Tested on BackTrack 3):

You need two text files: iplist.txt and header.txt.
iplist.txt would contain the list of IP addresses, one IP per line.
header.txt would contain your HTTP commands.

E.g. content of iplist.txt file would be like:

Content of header.txt file:
E.g. if you want to do banner grabbing then the content of your header.txt file would be:
HEAD / HTTP/1.0 (press two returns)

If you want to do trace, the content would be:
TRACE / HTTP/1.0 (press single return)
HOST:anything (press single return)
X-HEADER:anything (press return twice)

Similarly for OPTIONS and TRACK.
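
Instead of typing the returns in an editor, the two files can also be written with printf, which makes the line endings explicit. This is only a sketch: the temporary directory and the address 192.0.2.10 (from the range reserved for documentation) are placeholders, and the \r\n line ending is what HTTP specifies. Most servers accept a bare newline as well.

```shell
# Scratch directory so the sketch does not overwrite anything.
workdir=$(mktemp -d)

# HEAD request followed by the empty line that ends the header block.
printf 'HEAD / HTTP/1.0\r\n\r\n' > "$workdir/header.txt"

# One address per line; 192.0.2.10 is a documentation placeholder.
printf '192.0.2.10\n' > "$workdir/iplist.txt"

# 15 characters of request line plus two CRLF pairs.
header_bytes=$(wc -c < "$workdir/header.txt")
ip_lines=$(wc -l < "$workdir/iplist.txt")

rm -rf "$workdir"
echo "header.txt: $header_bytes bytes, iplist.txt: $ip_lines line(s)"
```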

Command to be executed:

# for f in `cat iplist.txt`;do nc -q 2 -w 2 -v $f 80 < header.txt; done;

It will automate the whole procedure.

: -q 2 : Quit 2 seconds after header.txt has been sent, so nc does not hang when the listener on the opposite side is in fact netcat rather than a web server.
: -w 2 : Time out the connection (2 seconds, change it according to the requirement).

Now instead of typing a long command on the console you can make a shell script for it. Make a blank shell script and name it netcat.sh. Edit it and put the following lines into it:

#!/bin/sh
for f in `cat iplist.txt`;do
nc -q 2 -w 2 -v $f 80 < header.txt
done

Make the script executable (chmod +x netcat.sh) and execute it by running the following command

# ./netcat.sh

Now depending on the content of the "header.txt" file, output will be displayed on the console.
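
A variant of the same loop, sketched with while read instead of a backtick cat so that blank lines are skipped. The function name probe_hosts and the DRY_RUN switch are made up for this illustration. By default it only prints the nc commands it would run, and -q is left out because not every netcat build has it.

```shell
# probe_hosts HOSTS_FILE HEADER_FILE
# Hypothetical helper: with DRY_RUN=1 (the default here) it only prints the
# nc command for each host; with DRY_RUN=0 it actually runs it.
probe_hosts() {
    hosts_file=$1
    header_file=$2
    while IFS= read -r host; do
        [ -n "$host" ] || continue          # skip blank lines
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "nc -w 2 -v $host 80 < $header_file"
        else
            nc -w 2 -v "$host" 80 < "$header_file"
        fi
    done < "$hosts_file"
}

# Dry run on two placeholder addresses with a blank line in between.
tmp_list=$(mktemp)
printf '192.0.2.10\n\n192.0.2.11\n' > "$tmp_list"
plan=$(probe_hosts "$tmp_list" header.txt)
rm -f "$tmp_list"
printf '%s\n' "$plan"
```

Because nc takes its input from header.txt, it does not swallow the rest of the host list that the loop is reading.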

For Windows Platform (Tested on XP):

Do exactly the same i.e. make iplist.txt and header.txt file. Only command will differ:

C:\> for /f %1 in (iplist.txt) do nc -vv -w2 %1 80 < header.txt

Note: the -q option is not present in the Windows version of netcat.

Obviously your iplist.txt and header.txt need to reside in the current directory where you are executing the command; otherwise specify an absolute/relative path for these files.


05-24-2011, 10:07 AM
Information is overflowing here..... awesome shares by everyone.... special thanks to mayjune for his simple yet cover it all explanation..

I would love to try out "expect". It seems that it will be very helpful for my future work.

Thanks and regards to all

P.S. : What can work as a fuel for the flame thrower neo....

05-25-2011, 02:56 PM
Nitrogen+Petrol works as fuel for flamethrower (For a real flamethrower that is LoL)

Normally people who begin with a foolish/stubborn/script-kiddie/know-it-all attitude fuel the flamethrower.

Flamethrower is on hold since, we are now senior members with responsibility. :-D
But sometime it gets out of control you know what I mean :)

05-25-2011, 05:04 PM
Yep, there is always a workaround; file descriptors come in handy, like,

my post was aimed at people who are learning bash scripting for system administration. :]

05-26-2011, 07:35 AM
Yep, there is always a workaround; file descriptors come in handy, like,

my post was aimed at people who are learning bash scripting for system administration. :]
And I fall under the first category :)
Thanks for the file descriptor way of doing it.