When Written: Oct 2011
Last column I wrote about the problems of building scalable web-based solutions. The need for software that can handle large volumes of data and transactions becomes ever more pressing as web applications grow more complex and popular. The most popular web server on the internet is without doubt Apache; while it is a very powerful and capable web server, there are always those amongst us who need ‘more power’, even if we don’t work on ‘Top Gear’.
There is an open-source framework from the Apache Software Foundation called Hadoop (http://hadoop.apache.org/) which does just this. Several modules are built around the framework, the main one perhaps being the Hadoop Distributed File System (HDFS). The curious name comes from the original creator’s son’s toy elephant! I guess you have to call it something, and I can sympathise as someone who is currently struggling to think of names for two new Burmese kittens we are picking up next week to replace my sadly missed Burmese cat, Emale.
Hadoop is written in Java. Companies such as Yahoo and Facebook use the Hadoop file system, with Facebook claiming 30 PB of storage and Yahoo running it on a Linux cluster of over 10,000 cores; these are huge numbers by anyone’s standards. The really clever thing about Hadoop is that rather than relying on expensive specialised hardware to provide redundancy, the framework is designed to detect and handle failures at the application layer, so it will run on a simple cluster of low-cost servers, safe in the knowledge that a hardware failure will have no effect on the web service.
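To make that a little more concrete, here is a minimal sketch, not taken from any of the companies above, of what talking to HDFS looks like from Java. The NameNode address (hdfs://namenode.example.com:8020), the file path and the class name are placeholders of my own; note too that older Hadoop releases use the configuration key fs.default.name rather than fs.defaultFS. The point to notice is that the client code never worries about which disk or machine holds the data; replication and failure handling happen behind the FileSystem API.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRoundTrip {
    public static void main(String[] args) throws Exception {
        // Point the client at the cluster's NameNode; this host and
        // port are placeholders for your own cluster's address.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/demo/hello.txt");

        // Write a small file. HDFS replicates each block (three copies
        // by default), which is why losing a disk or node loses no data.
        try (FSDataOutputStream out = fs.create(path, true)) {
            out.write("Hello from HDFS\n".getBytes(StandardCharsets.UTF_8));
        }

        // Read it back; the client transparently fetches blocks from
        // whichever DataNode holds a healthy replica.
        try (FSDataInputStream in = fs.open(path);
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {
            System.out.println(reader.readLine());
        }

        fs.close();
    }
}

That design choice, redundancy in software rather than in hardware, is what lets the clusters at Yahoo and Facebook grow to the sizes quoted above using commodity servers.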
Article by: Mark Newton