So lately I’ve been reading up on push-based web servers (e.g. APE, Node.js), and I was reading an article on installing Node.js. While reading it I was curious whether 0.4.4 was in fact the latest version (it was), and in the process of checking the download page I thought, “gee, wouldn’t it be nice to have these things automatically updated?” That got me thinking about how this could be done, and after 2 minutes and a little bit of command line magic with sed, the problem was solved. Now, I know this can be done very easily in Perl or with PHP’s PCRE; however, for the sake of brevity, simplicity and overall compatibility, I’m going to show you how to do it from a crontab with curl, grep, sed, sort and tail (which, with the exception of maybe curl, are all common tools on any Linux/Unix-based system).
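If you want to check up front which of those tools are actually present on your box, something like the following sketch should do. The have_cmd helper is just a name I made up for illustration; it relies on the shell's standard command -v lookup.

```shell
#!/bin/sh
# have_cmd NAME: succeeds if the shell can find a command called NAME.
# (have_cmd is a hypothetical helper name, not a standard utility.)
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report which of the tools used in this post are available.
for tool in curl grep sed sort tail; do
    if have_cmd "$tool"; then
        echo "ok: $tool"
    else
        echo "missing: $tool"
    fi
done
```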
First off, you need to get your output, which I used curl --silent to get. To digress for a second, there’s a minor chance that curl *may* not be installed on your system. If you’re using an RPM-based system such as Red Hat/CentOS/Fedora you can simply run “yum install curl” or “sudo yum install curl”. For a dpkg-based system such as Debian/Ubuntu/Mint you can use “apt-get install curl”. If you’re using Arch Linux or Gentoo I’ll just assume you know what you’re doing already (but for those that don’t, pacman and emerge are their respective package managers). Last but not least, if you’re using OS X or some other flavor of BSD-based system you can use the ports system (which will need to be installed separately on a Mac, which could also use Homebrew or Fink) and enter “port install curl”. Anyhow... to get the output I used curl --silent http://nodejs.org/dist/ . This gives me a nice dump of the page contents (including the HTML), which looks a little bit (minus some truncation on my part) like the following:
Now that you can see the output you’re working with, it’s time to pipe it to grep to pull out the content we want. So the command now becomes the following:
curl --silent http://nodejs.org/dist/ | grep node-v
This will give you only the lines you’ll need to work with for determining the latest version. Now on to the magic of using sed, sort and tail. If you need to grab just the text you will definitely need sed; otherwise you can get away with just a numerical sort and a shell variable to store the page URL:
curl --silent http://nodejs.org/dist/ | grep node-v | sort -nk1 | tail -1
Assuming you’re just grabbing the version and require sed, you can use the following:
curl --silent http://nodejs.org/dist/ | grep node-v | sed 's@.*">\(node-v.*\)</a>.*@\1@g' | sort -nk1 | tail -1
The use of sed will effectively discard all the HTML tags and only spit out the file names. The sort command is then asked for a numerical sort (denoted by the “-n” in its arguments), but since every file name starts with the same non-numeric text (node-v), there is no leading number to compare, and sort falls back to ordering the lines character by character, lowest to greatest (you could also use the reverse flag and pipe to head instead). That is fine while every version component is a single digit; once a component reaches two digits (say 0.4.10 versus 0.4.4) the order goes wrong, and a version-aware sort such as GNU sort’s -V is the safer choice. The last bit of the piped command uses the tail binary with the argument -1 to denote that you only want to grab the last line of the stream (it’s important to recognize that the minus is not a negative sign but rather a switch introducing an argument to the tail executable).
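To try the grep, sed, sort and tail stages without hitting the network, here is a small sketch that feeds them a canned listing instead of curl output. The three listing lines (dates and sizes included) are made up for illustration and only approximate what the real page returns; sample_listing and latest_name are names I picked for this example.

```shell
#!/bin/sh
# Hypothetical stand-in for the curl output; the real page layout may differ.
sample_listing() {
    cat <<'EOF'
<html><body><h1>Index of /dist/</h1>
<a href="node-v0.4.2.tar.gz">node-v0.4.2.tar.gz</a>   10-Mar-2011 01:02   4.6M
<a href="node-v0.4.4.tar.gz">node-v0.4.4.tar.gz</a>   30-Mar-2011 01:02   4.7M
<a href="node-v0.4.3.tar.gz">node-v0.4.3.tar.gz</a>   18-Mar-2011 01:02   4.7M
</body></html>
EOF
}

# The pipeline from the post minus curl: keep the node-v lines,
# strip the HTML down to the file name, sort, and keep the last line.
latest_name() {
    grep node-v | sed 's@.*">\(node-v.*\)</a>.*@\1@' | sort -nk1 | tail -1
}

sample_listing | latest_name
```

With this sample input the last line after sorting is node-v0.4.4.tar.gz, even though it is not the last line of the listing.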
Now that I’ve shown you the sed way to grab the line, let’s get a bit more crazy and actually output an HTML-formatted line with the magic of some shell variables (this will also make our little one-liner a lot more fun and modular):
SITE="http://nodejs.org/dist/" # defining site URL
LATESTLINK=$(curl --silent "$SITE" | grep node-v | sort -nk1 | tail -1) # Latest version line including HTML output
OUTPUT=$(printf '%s' "$LATESTLINK" | sed 's_href="_&'"$SITE"'_') # HTML output reflecting site URL (& re-inserts the matched href=")
FINALLINK=$(printf '%s' "$OUTPUT" | sed 's#^\(<a.*</a>\).*#\1#') # using sed to strip out excess non-link material
And there you have it: the FINALLINK variable will now hold the HTML anchor linking to the latest version of Node.js (provided the formatting of the HTML page remains the same).
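To see what each of the two sed substitutions does on its own, here is a sketch that runs them against a single listing line. The LINE value is a made-up example of what the page might return, and STEP1 and STEP2 are names used only for this illustration.

```shell
#!/bin/sh
SITE="http://nodejs.org/dist/"
# Hypothetical listing line; the date and size are invented.
LINE='<a href="node-v0.4.4.tar.gz">node-v0.4.4.tar.gz</a>   30-Mar-2011 01:02   4.7M'

# Step 1: "&" in the replacement stands for the matched text (href="),
# so the site URL is inserted right after it, making the link absolute.
STEP1=$(printf '%s\n' "$LINE" | sed 's_href="_&'"$SITE"'_')

# Step 2: keep everything from <a up to </a> and drop the trailing date and size.
STEP2=$(printf '%s\n' "$STEP1" | sed 's#^\(<a.*</a>\).*#\1#')

printf '%s\n' "$STEP1" "$STEP2"
```

The underscore and hash delimiters are what let the URL, with all its slashes, sit inside the sed expressions without escaping.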
To incorporate this into a website you can merely make it a shell script. So fire up vi or nano and prepend:
...to the top line, and then append the following line to the end:
Once you have done that and saved the file (let’s call it latestnode_ver.sh and save it in /home/doug/bin/), you must make it executable, so run the following:
chmod +x ~/bin/latestnode_ver.sh
Now you can get the latest version by running latestnode_ver.sh (assuming ~/bin is within your default PATH variable).
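If you would rather skip the editor, the whole thing could also be created and made executable in one go from a terminal, roughly as in the sketch below. Treat the #!/bin/sh shebang and the closing printf line as assumptions on my part about what a finished script would need; BIN_DIR is a variable name introduced only for this example.

```shell
#!/bin/sh
# Where the script should live; defaults to ~/bin as used in this post.
BIN_DIR="${BIN_DIR:-$HOME/bin}"
mkdir -p "$BIN_DIR"

# The quoted 'EOF' stops the shell from expanding $SITE and friends
# while the file is being written.
cat > "$BIN_DIR/latestnode_ver.sh" <<'EOF'
#!/bin/sh
# Assumed shebang above; the final printf line is likewise an assumption.
SITE="http://nodejs.org/dist/" # defining site URL
LATESTLINK=$(curl --silent "$SITE" | grep node-v | sort -nk1 | tail -1)
OUTPUT=$(printf '%s' "$LATESTLINK" | sed 's_href="_&'"$SITE"'_')
FINALLINK=$(printf '%s' "$OUTPUT" | sed 's#^\(<a.*</a>\).*#\1#')
printf '%s\n' "$FINALLINK"
EOF

chmod +x "$BIN_DIR/latestnode_ver.sh"
```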
You can then create a cron job to modify your HTML file regularly (using crontab -e).
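How the cron job modifies the page is up to you. One possible approach, sketched below, is to put a marker comment on the line of the HTML file that should carry the link and have a small function rewrite that line. The marker text, the update_page name, the update_node_link.sh wrapper and the /var/www/index.html path are all hypothetical choices for this example.

```shell
#!/bin/sh
# update_page FILE LINK: replace the line in FILE that carries the marker
# comment with LINK, keeping the marker so the next run finds it again.
# Assumes LINK contains no "#" or "&", which holds for a plain anchor tag.
update_page() {
    file=$1
    link=$2
    tmp=$(mktemp) || return 1
    # Write through a temp file, then copy back so the page keeps its permissions.
    sed "s#.*<!-- latest-node -->.*#$link <!-- latest-node -->#" "$file" > "$tmp" &&
        cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}

# Example crontab entry (added via crontab -e), running daily at 03:00:
#   0 3 * * * /home/doug/bin/update_node_link.sh
# where update_node_link.sh defines update_page as above and then runs:
#   update_page /var/www/index.html "$(/home/doug/bin/latestnode_ver.sh)"
```

Cron runs with a minimal environment, so full paths are used in the crontab entry rather than relying on ~/bin being in PATH.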