Monday, September 10, 2007

URLView - URL and email extractor

Although we can combine Linux commands like cut, awk and grep to parse URLs and email addresses out of a file, this entry looks at a ready-made tool that does the job. Plenty of email and URL extractor programs are available in the Windows world, but most of them are commercial rather than free.
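For comparison, the grep route mentioned above can be sketched as two small shell functions. This is a minimal sketch that assumes a grep supporting -E and -o (GNU and BSD grep both do). The names extract_urls and extract_emails are hypothetical, and the patterns are simplified assumptions, not the ones urlview uses:

```shell
#!/bin/sh
# Minimal grep-based extractor sketch (hypothetical helper names, simplified patterns).

# extract_urls FILE: print http/https/ftp URLs found in FILE, sorted and de-duplicated.
# The last character may not be . , ; : so sentence punctuation is not swallowed.
extract_urls() {
    grep -Eo '(https?|ftp)://[^ <>"'\'']*[^ <>"'\''.,;:]' "$1" | sort -u
}

# extract_emails FILE: print user@host.tld addresses found in FILE, sorted and de-duplicated.
extract_emails() {
    grep -Eo '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}' "$1" | sort -u
}
```

# extract_urls pagefile.txt
# extract_emails pagefile.txt

Unlike urlview, this prints plain text with no menu, which makes it easy to redirect into a file, but such simple patterns will miss or mangle unusual addresses.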

Extracting URLs and email addresses from a web page or text file in Linux can be done with urlview. urlview usually comes alongside mutt, the command-line email client, and on some distributions it ships inside the mutt package itself.

urlview is a screen oriented program for extracting URLs from text files and displaying a menu from which you may launch a command to view a specific item.


urlview extracts URL strings and email address links from a file and presents them interactively as a numbered menu. Selecting an entry launches your browser (or whichever command urlview is configured to run) on that item.

URLView USAGE:
--------------

Here are a few ways to run the URL and email extractor:

# urlview pagefile.txt
# urlview pagefile.html

Besides extracting URL strings from a local file or saved web page, you can chain urlview with other Linux commands to pull links and email addresses from a live site: fetch the page first, then run urlview on the downloaded copy, as shown below:

# wget -c "http://www.domain.com/contacts.html"
# urlview contacts.html

Running these two commands in a loop from a shell script, fed with a batch of URLs read from a text file, gives you the basic idea behind the commercial email and URL extractor programs so common in the Windows world.
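urlview itself is interactive, so a loop that calls it stops at a menu for every page. A non-interactive sketch of the same batch idea swaps urlview for grep -E -o. It assumes wget is installed, reads one URL per line from a list file, and uses the hypothetical helper names fetch_page and harvest_list; the pattern to search for is passed in by the caller:

```shell
#!/bin/sh
# Batch extraction sketch: a non-interactive stand-in for looping over urlview.
# Assumptions: wget is installed; fetch_page and harvest_list are hypothetical names.

# fetch_page URL: print the page body on stdout, quietly, without saving a file.
fetch_page() {
    wget -q -O - "$1"
}

# harvest_list LISTFILE PATTERN: fetch every URL listed (one per line) in LISTFILE
# and print each distinct string matching the extended regex PATTERN.
harvest_list() {
    list=$1
    pattern=$2
    while IFS= read -r url || [ -n "$url" ]; do
        [ -n "$url" ] || continue               # skip blank lines
        fetch_page "$url" | grep -Eo "$pattern"
    done < "$list" | sort -u                    # merge all pages, drop duplicates
}
```

# harvest_list urls.txt '(https?|ftp)://[^ <>"]+'

Here urls.txt is a hypothetical list file. Keeping the pattern as an argument means the same loop works with the urlview expression quoted later in this entry or with a simpler one of your own.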

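This is the default regular expression urlview uses to find links. It matches http, https, ftp and gopher URLs, mailto:, file: and news: links, and bare host names starting with www., web. or w3. Note that an email address is only picked up when it is written as a mailto: link; a plain address such as user@domain.com does not match:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(((https?|ftp|gopher)://|(mailto|file|news):)[^' <>"]+|(www|web|w3)\.[-a-z0-9.]+)[^' .,;<>":]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~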
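To see roughly what that pattern catches without opening the interactive menu, the sketch below hands the same expression to grep -E -o. The names URLVIEW_RE and urlview_matches are hypothetical, and grep's regex engine may treat corner cases slightly differently from the library urlview is built against:

```shell
#!/bin/sh
# urlview's default pattern, stored for reuse (hypothetical variable name).
# Inside double quotes, \" yields a literal " and \\. yields the regex escape \.
URLVIEW_RE="(((https?|ftp|gopher)://|(mailto|file|news):)[^' <>\"]+|(www|web|w3)\\.[-a-z0-9.]+)[^' .,;<>\":]"

# urlview_matches FILE: print every substring of FILE the pattern matches, in order.
urlview_matches() {
    grep -Eo "$URLVIEW_RE" "$1"
}
```

# urlview_matches pagefile.txt

Running it on a sample file shows that a bare address such as info@domain.com is skipped; only the mailto:info@domain.com form matches.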

This entry is not meant to help anyone build an army of email and URL harvesters that spam and email-bomb the web.


HTH

ILoveTux - howtos and news