
Thread: Using Beautiful Soup Library for Parsing HTML in python

  1. #1
    Garage Member D4rk357
    Join Date
    Jul 2010
    Location
    localhost@mumbai
    Posts
    153
    Blog Entries
    1

    Using Beautiful Soup Library for Parsing HTML in python

    This script downloads an HTML page and then parses a table from it to display the output. Without Beautiful Soup this would require a lot of work and a lot of exception handling, but Beautiful Soup makes the work a lot easier.

    Code:
    #!/usr/bin/python
    
    # D4rk-Parser -- A small script for parsing HTML tables in Python using Beautiful Soup
    # Coded By D4rk357[2013]
    
    
    import urllib2
    from bs4 import BeautifulSoup

    # Download the report page
    response = urllib2.urlopen('https://urlquery.net/report.php?id=2602506')
    html1 = response.read()

    # Name the parser explicitly; 'html.parser' ships with Python
    soup = BeautifulSoup(html1, 'html.parser')

    # Take the first table on the page and collect its rows
    table = soup.find('table')
    rows = table.find_all('tr')

    # Note the use of stripped_strings: it yields every text fragment in the
    # row with surrounding whitespace removed. This matters when the cells
    # contain other HTML tags such as <b>, where a plain strip() would not
    # work properly. That is the case for this page as well.
    for x in rows:
        print '|'.join(x.stripped_strings)
    Output: (attached image, not reproduced here)
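For anyone who wants to try the stripped_strings idea without hitting the network, here is a minimal sketch in Python 3. The HTML snippet and the helper name `table_rows` are made up for illustration and are not taken from the urlquery page; the parser is the built-in `html.parser`.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: cells contain nested tags and stray whitespace
html = """
<table>
  <tr><th>Field</th><th>Value</th></tr>
  <tr><td> <b>Status</b> </td><td> <i>clean</i> </td></tr>
</table>
"""

def table_rows(soup):
    """Return one '|'-joined string per row of the first table, or [] if none."""
    table = soup.find("table")
    if table is None:
        return []
    # stripped_strings skips whitespace-only fragments and strips the rest
    return ["|".join(row.stripped_strings) for row in table.find_all("tr")]

soup = BeautifulSoup(html, "html.parser")
lines = table_rows(soup)
for line in lines:
    print(line)
```

On this sample it prints `Field|Value` and then `Status|clean`. The `None` check is an addition so that a page without a table gives an empty list instead of an AttributeError.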
    Last edited by D4rk357; 05-23-2013 at 10:47 AM.
    Spirit was turned 2 ashes ,soul endured so much pain..
    now the darker time evanescence ,the fallen shall rise again.

  2. #2
    Garage Member
    Join Date
    Aug 2012
    Location
    India
    Posts
    97
    Blog Entries
    1
    Hi D4rk357,

    Nice work.
    Though I didn't get the chance to actually use it because it asked for an external parser in my case. I will install lxml/html5lib later.
    You might want to check the code I wrote some time ago to un-shorten shortened URLs using BeautifulSoup -> RahulB's Blog | InfoSec n' All: [Python] URL Un-shorten-er.
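On the external parser point, one possible workaround is to try the parsers in order of preference and fall back to the one bundled with Python. This is only a sketch in Python 3: the helper name `make_soup` and the preference order are assumptions, and it relies on bs4 raising `FeatureNotFound` when the requested parser is not installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound

def make_soup(markup, preferred=("lxml", "html5lib", "html.parser")):
    """Return (soup, parser_name) using the first parser that is available."""
    for name in preferred:
        try:
            return BeautifulSoup(markup, name), name
        except FeatureNotFound:
            # This parser is not installed; try the next one
            continue
    raise RuntimeError("no usable HTML parser found")

soup, used = make_soup("<table><tr><td>a</td></tr></table>")
print(used, soup.find("td").get_text())
```

Since `html.parser` is part of the standard library, the last fallback should normally succeed even when neither lxml nor html5lib is installed.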

    Cheers.
    Anyone who stops learning is old, whether at twenty or eighty. Anyone who
    keeps learning stays young. The greatest thing in life is to keep your mind young.
    - Henry Ford

  3. #3
    Thanks. Can it be used to create a word list?

  4. #4
    Garage Member D4rk357
    Join Date
    Jul 2010
    Location
    localhost@mumbai
    Posts
    153
    Blog Entries
    1
    Quote: Originally posted by jimmy
    Thanks. Can it be used to create a word list?
    What kind of word list? This is an HTML parsing library.
