Although we can combine Linux commands such as cut, awk and grep to parse URLs and email addresses out of a file, this entry covers a tool built for that job. Many email and URL extractor programs exist in the Windows world, but most of them are commercial rather than free.
On Linux, URLs and email links can be pulled from a saved page or text file with urlview. The urlview command ships with the mutt package on some distributions (others package it separately); mutt itself handles email from the command line.
urlview is a screen oriented program for extracting URLs from text files and displaying a menu from which you may launch a command to view a specific item.
URLView extracts URL strings and email links from a file and presents them interactively in a numbered list. Selecting an entry from the list launches your browser on that particular site.
URLView USAGE:
--------------
Here are a few usage examples of the URL and email extractor:
# urlview pagefile.txt
# urlview pagefile.html
Besides extracting URL strings from a local file, urlview can also be used on a page from a remote site. Download the page first with wget, then run urlview on the saved copy, as shown below:
# wget -c "http://www.domain.com/contacts.html"
# urlview contacts.html
Run from a loop in a shell script that is fed batches of URLs from a text file, these two commands give you the basic idea behind the email and URL extractor software that is sold commercially and widely used in the Windows world.
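The loop described above could be sketched roughly as follows. This is only a minimal illustration, not a finished tool: the function name fetch_batch, the list file layout (one URL per line, with blank lines and lines starting with # skipped) and the page-N.html naming scheme are all assumptions made for this example.

```shell
#!/bin/sh
# fetch_batch LIST_FILE OUT_DIR   (hypothetical helper, not part of urlview)
# Downloads every URL listed in LIST_FILE, one per line, into OUT_DIR
# and saves each page as page-N.html.
fetch_batch() {
    list_file=$1
    out_dir=$2
    n=0
    mkdir -p "$out_dir" || return 1
    # "|| [ -n "$url" ]" also handles a last line without a trailing newline
    while IFS= read -r url || [ -n "$url" ]; do
        case $url in
            ''|'#'*) continue ;;    # skip blank lines and comments
        esac
        n=$((n + 1))
        # -q: quiet, -O: write the page to the given file name
        wget -q -O "$out_dir/page-$n.html" "$url" ||
            echo "warning: could not fetch $url" >&2
    done < "$list_file"
}
```

After something like `fetch_batch urls.txt pages`, each saved file can be opened in turn with `for f in pages/*.html; do urlview "$f"; done`. Because urlview is interactive, it still shows one menu per page.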
This is the URL and email regular expression that the urlview command is designed to search for:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(((https?|ftp|gopher)://|(mailto|file|news):)[^' <>"]+|(www|web|w3)\.[-a-z0-9.]+)[^' .,;<>":]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
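Since urlview is interactive, a plain non-interactive list can be approximated by handing the same expression to grep. The sketch below assumes a grep that supports -E and -o (GNU and BSD grep both do), and it assumes the pattern is meant to be read as one expression with a plain apostrophe and an escaped dot after www. The names URL_REGEX and extract_links are made up for this example. Note that the expression only picks up email addresses written as mailto: links, not bare addresses.

```shell
#!/bin/sh
# The pattern quoted above, joined into a single expression.
# It contains both ' and ", so it is written inside double quotes
# with each " escaped; \\. becomes \. (a literal dot) after the shell reads it.
URL_REGEX="(((https?|ftp|gopher)://|(mailto|file|news):)[^' <>\"]+|(www|web|w3)\\.[-a-z0-9.]+)[^' .,;<>\":]"

# extract_links [FILE...]   (hypothetical helper)
# Prints each distinct match once, reading stdin when no file is given.
extract_links() {
    # -E extended regex, -i ignore case, -o print only the matched text,
    # -h omit file names when several files are given
    grep -E -i -o -h -e "$URL_REGEX" "$@" | sort -u
}
```

For example, `extract_links contacts.html > links.txt` writes the matches to a file, and `wget -q -O - "http://www.domain.com/contacts.html" | extract_links` does the same without saving the page first.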
This blog entry is not meant to create an army of email and URL extractors that spam and email-bomb the web.
HTH
Monday, September 10, 2007
URLView - URL and email extractor