Link Extractor in Python



D4rk357
11-09-2010, 02:35 PM
This is my first Python code. Feedback on improving it would be great.

# A small link extractor program.
import os,sys,urllib,re,httplib

# Print the banner unless exactly one command-line argument was given.
if len(sys.argv) != 2:
    print "\n|-----------------------------------------------------------------|"
    print "| lastman100[@]gmail[dot]com |"
    print "| 10/2010 Link Extractor v0.1 |"
    print "| Visit : www.garage4hackers.com |"
    print "|-----------------------------------------------------------------|\n"


ab=raw_input("enter URL to extract the link\n")
# Prepend http:// if the input does not already contain it.
ht=re.compile("http://")
if ht.search(ab):
    sa=urllib.urlopen(ab)
else:
    sa=urllib.urlopen('http://'+ab)


st=sa.read()

# Match "http", one non-space char (e.g. ":"), one or more non-word
# chars (e.g. "//"), then a run of non-space chars.
link=re.compile('http\S\W+'+'\S+')

y =link.finditer(st)

# Print every match found in the page source.
for i in y:
    print i.group()
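The script above is Python 2 (print statements, raw_input, urllib.urlopen), so it will not run on Python 3. Here is a rough Python 3 port that keeps the same regex and the same "add http:// if missing" logic, but splits fetching from extraction so the pattern can be tried on a plain string. The helper names and the UTF-8 decoding are assumptions of this sketch, not part of the original post.

```python
import re
import urllib.request

# Same pattern as the original post: "http", one non-space character,
# one or more non-word characters, then a run of non-space characters.
LINK_RE = re.compile(r'http\S\W+\S+')


def extract_links(text):
    """Return every substring of text matched by LINK_RE."""
    return [m.group() for m in LINK_RE.finditer(text)]


def fetch(url):
    """Download url as text, prepending http:// when it is absent (as the original does)."""
    if "http://" not in url:
        url = "http://" + url
    with urllib.request.urlopen(url) as resp:
        # Assumption: the page is UTF-8; undecodable bytes become U+FFFD.
        return resp.read().decode("utf-8", errors="replace")


def main():
    """Prompt for a URL and print each match, like the Python 2 script."""
    url = input("enter URL to extract the link\n")
    for link in extract_links(fetch(url)):
        print(link)
```

Calling main() gives the original prompt-and-print behaviour. Because \S+ is greedy up to the next whitespace, in HTML it also swallows the closing quote and tag: extract_links('<a href="http://example.com/a">x</a>') returns ['http://example.com/a">x</a>'].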

Punter
11-09-2010, 03:00 PM
Nice start, bro. Also, we are planning to set up SVN where we can host all the tools and keep track of changes and updates.

D4rk357
11-09-2010, 03:34 PM
Thanks for the encouragement, bro. I am planning to improve it further and add new features as my knowledge of this awesome language improves :)

prashant_uniyal
11-09-2010, 06:38 PM
Awesome start, bro!!!! Good to see. I'm using this tool :)

AnArKI
11-09-2010, 07:02 PM
Impressive, and I am encouraged to see such stuff. I believe we have enough tools to populate our new tools section.

D4rk357
11-10-2010, 12:10 AM
Thanks for the encouragement, prashant and AnArKI bro. It means a lot to me :)

prashant_uniyal
11-10-2010, 09:50 AM
Working great ! :)

http://img573.imageshack.us/img573/9154/lnk1.jpg

d4rkd4wn
11-10-2010, 01:09 PM
Awesome, bro!!!

the_empty
11-10-2010, 06:30 PM
Yeah... I see now that Darkest is really encouraged and enlightened after his one-night stay with FB1. Hope to see you rocking this stream, bro. Keep it up.

And we shall seriously start keeping proper track of GARAGE developments.

neo
11-12-2010, 12:34 PM
Well, the code is fine; just some comments from a programmer's point of view.
NOTE: These comments are about good programming practice. If you just want a quick-and-dirty script, don't read ahead. :D

Use re / re.compile as little as possible: for a fixed-string check, plain string methods are simpler and faster, and as your code grows, needless regexes add up to performance problems.
So the first if:

ht=re.compile("http://")
if ht.search(ab):

Could be replaced with

if ab.startswith("http://"):
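A small Python 3 sketch of the string-method approach, wrapped in a hypothetical normalize_url helper (the name is illustrative, not from the thread). Accepting https:// as well is an addition beyond what was posted; str.startswith accepts a tuple of prefixes, so both are checked in one call.

```python
def normalize_url(raw):
    """Prepend http:// unless the input already starts with a known scheme.

    Uses str.startswith instead of a regex; passing a tuple checks
    several prefixes at once (https:// support is an assumption here).
    """
    raw = raw.strip()
    if raw.startswith(("http://", "https://")):
        return raw
    return "http://" + raw
```

Unlike re.search, startswith only looks at the beginning of the input, so something like example.com/?next=http://x gets the prefix added instead of being passed through unchanged.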

Furthermore, the regular expression is not that good.
Give it test cases such as

st = "www.com www..com www,yahoo,com mail?yahoo?com"

With an http:// prefix on each (the pattern requires a literal "http"), it will recognise all of these as proper URLs.
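To check this, here is a small Python 3 test of the posted pattern against neo's strings. As posted, the bare strings produce no match at all, because the pattern needs a literal "http"; once http:// is prepended to each, all four malformed hosts are accepted.

```python
import re

# The pattern from the original post.
LINK_RE = re.compile(r'http\S\W+\S+')

# neo's test cases, exactly as posted.
cases = "www.com www..com www,yahoo,com mail?yahoo?com"

# As posted, nothing matches: the pattern needs a literal "http".
bare = [m.group() for m in LINK_RE.finditer(cases)]

# With "http://" prepended to each case, every one is accepted.
prefixed = " ".join("http://" + c for c in cases.split())
accepted = [m.group() for m in LINK_RE.finditer(prefixed)]

print(bare)      # → []
print(accepted)  # → ['http://www.com', 'http://www..com', 'http://www,yahoo,com', 'http://mail?yahoo?com']
```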

The regular expression needs to be improved.
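One possible improvement, sketched in Python 3 under assumptions of its own rather than anything posted in this thread: a candidate regex that stops at whitespace, quotes and angle brackets (so it ends at the close of an href="..." value), followed by a check of the host part with urllib.parse.urlsplit. The label rules (letters, digits, inner hyphens, at least two dot-separated labels) and the trailing-punctuation stripping are heuristics; this validates syntax only and cannot tell whether a domain actually exists.

```python
import re
from urllib.parse import urlsplit

# Scheme, "://", then a run of characters up to whitespace, a quote or an
# angle bracket, so the match stops at the end of an href="..." value.
CANDIDATE_RE = re.compile(r'https?://[^\s"\'<>]+')

# One hostname label: letters/digits, inner hyphens allowed, 1-63 chars.
LABEL_RE = re.compile(r'[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?')

# Heuristic: punctuation that usually ends a sentence rather than a URL.
TRAILING_PUNCT = ".,;:!?)"


def is_valid_host(host):
    """True if host has at least two dot-separated labels, all well formed."""
    labels = host.split(".")
    return len(labels) >= 2 and all(LABEL_RE.fullmatch(label) for label in labels)


def extract_links(text):
    """Return candidate URLs from text whose host part passes is_valid_host."""
    links = []
    for match in CANDIDATE_RE.finditer(text):
        url = match.group().rstrip(TRAILING_PUNCT)
        try:
            host = urlsplit(url).hostname or ""
        except ValueError:  # e.g. an unbalanced "[" in the host part
            continue
        if is_valid_host(host):
            links.append(url)
    return links
```

On neo's cases with http:// prepended, only http://www.com survives: "www..com" has an empty label, while "www,yahoo,com" and "mail?yahoo?com" (whose host is just "mail") have fewer than two valid labels.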

D4rk357
11-12-2010, 04:27 PM
Thanks for the feedback, neo.

I will try to incorporate these ideas in my next version.

fb1h2s
11-13-2010, 01:02 AM
@the-empty It's all his work. He has been working on Python, and I never helped him in any way; it's his work :) Good job, Darkest, but I want to see the full code. I remember how bond used to scold me for how impatient I was ("which I still am :P"), and now you are the same here. Just put up the rest of your ideas and make it huge :D