Hyperlink to URL && Extracting all the links from a site
This is a small experiment I tried: converting a hyperlink to a URL, or rather extracting the URI from a hyperlink.
Consider an example:
href='<a href="http://www.w3schools.com/" target="_blank">Visit W3Schools!</a>'
To get the URL, we need only the part that starts with http and sits between the double quotes.
So I made a simple regular expression (R.E) to strip the URL from any given hyperlink. The R.E is designed for `sed` (GNU sed, since it relies on -r and the T command), hence it can be used with stdout or any file.
sed -nr 's/(.*)href="?([^ ">]*).*/\2\n\1/; T; P; D;'
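As is evident, the href="?([^ ">]*).* part says: get me the thing after href= (skipping an optional opening quote) that is made up of characters which are not ([^ ]) a space, " or >. The substitution puts that URL on the first line of the pattern space; T skips lines with no match, P prints the URL, and D drops it and reruns the script on whatever is left, so every href on a line gets picked up.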
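To see the pattern at work, here is a minimal sketch that runs the same expression over three made-up inputs: a quoted href, an unquoted one, and a line with no link at all. The example.com URLs and the variable names are assumptions for illustration, and GNU sed is assumed.

```shell
# The expression from above, kept in a variable so it can be reused.
extract='s/(.*)href="?([^ ">]*).*/\2\n\1/; T; P; D;'

# Quoted attribute value: the optional "? consumes the opening quote.
quoted=$(printf '%s\n' '<a href="http://example.com/a">A</a>' | sed -nr "$extract")

# Unquoted value: the match stops at the closing >.
unquoted=$(printf '%s\n' '<a href=http://example.com/b>B</a>' | sed -nr "$extract")

# No href at all: T skips the line and -n keeps sed silent.
none=$(printf '%s\n' '<p>no link here</p>' | sed -nr "$extract")

echo "quoted:   $quoted"
echo "unquoted: $unquoted"
echo "none:     [$none]"
```

The first two calls should print the bare URL, and the third should print nothing between the brackets.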
So, to get the URL, we can just echo the contents of the variable href and capture the sed output in the URL variable:
URL=$(echo "$href" | sed -nr 's/(.*)href="?([^ ">]*).*/\2\n\1/; T; P; D;')
An echo on URL, echo $URL, then gives us the URL part of the href, hence converting href to URL.
Digging further: if it's a file with loads of hrefs, you can just cat the file and use the R.E as mentioned; it works well with multiple hrefs.
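A small sketch of the multi-link case, assuming GNU sed and a scratch file filled with made-up example.com links (the file contents and variable names are not from the original experiment):

```shell
# Scratch file with made-up links; two of them share a line.
page=$(mktemp)
cat > "$page" <<'EOF'
<p><a href="http://example.com/one">one</a> <a href="http://example.com/two">two</a></p>
<p>no link on this line</p>
<a href="http://example.com/three" target="_blank">three</a>
EOF

# Same expression as before; sed works through the file line by line.
urls=$(cat "$page" | sed -nr 's/(.*)href="?([^ ">]*).*/\2\n\1/; T; P; D;')
echo "$urls"
rm -f "$page"
```

One thing to keep in mind: because the leading (.*) is greedy, links that share a line come out last one first, so this prints the "two" link before the "one" link.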
You can also dump the URL to an *.html file using lynx or just curl, cat the *.html, and sed the output stream to get your URLs, which can again be redirected to a file.
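The download-then-extract idea could be wrapped up roughly like this. Both function names are hypothetical helpers invented for this sketch, and curl and GNU sed are assumed to be installed.

```shell
# extract_hrefs: hypothetical helper. Reads HTML on stdin and prints
# one href value per line, using the same expression as before.
extract_hrefs() {
    sed -nr 's/(.*)href="?([^ ">]*).*/\2\n\1/; T; P; D;'
}

# save_page_links: hypothetical helper. Fetches the page at $1 into the
# file $2 with curl, then writes the extracted URLs to the file $3.
save_page_links() {
    curl -s "$1" -o "$2" && extract_hrefs < "$2" > "$3"
}
```

A call such as save_page_links "http://www.h3manth.com" page.html urls.txt would leave the raw page in page.html and the links in urls.txt. Relative hrefs such as /about are printed exactly as they appear in the page, since the expression does not resolve them against the site address.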
If you just want to collect all the links in a site:
lynx -dump "http://www.h3manth.com" | grep -o "http:.*" > links
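The grep pattern above matches only http: links and keeps everything up to the end of the line. If https links and duplicates matter, a slightly stricter variant might look like the sketch below; unique_links is a hypothetical helper name, and the pattern is an assumption about what counts as a URL, not something taken from lynx itself.

```shell
# unique_links: hypothetical helper. Reads text on stdin (for example the
# output of lynx -dump) and prints each http or https URL once, sorted.
unique_links() {
    grep -oE 'https?://[^[:space:]]+' | sort -u
}
```

It slots into the same pipeline: lynx -dump "http://www.h3manth.com" | unique_links > links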