When Written: Dec 2007
Security is never far from the headlines these days, whether it’s the latest chunk of data lost by a government department or a story about a group of hackers attempting to gain access to bank accounts. These stories should concern anyone who develops an application that is used by other people or runs on a machine connected to the internet, so that’s …err… all your web applications. Whilst not directly a security risk, machine-generated entries in web forms can be used to find machines that are vulnerable to certain exploits, to say nothing of the annoyance of removing these postings from the web server’s database.
For some time there has been a technology that gives us a reasonable way of preventing such input: CAPTCHA, or Completely Automated Public Turing test to tell Computers and Humans Apart (Turing being the British scientist who first proposed a test to decide whether a computer had intelligence). It uses a dynamically generated image showing a code made of distorted characters, which the user has to type in to confirm that they are a real person. Distorted characters are used because currently a machine cannot reliably read them. I say ‘currently’ as there is a game going round that asks users to enter a CAPTCHA code to make an image of a woman appear with fewer clothes on as each code is entered. The purpose of this apparently harmless ‘game’ is to build up a database of CAPTCHA images with their ASCII equivalents so that in the future machines will be able to read them.
The use of the vast number of users on the internet to do work that computers currently cannot do is growing; the idea is to encourage real people to enter information so that the computers can ‘learn’ from it. If at first this sounds a little too much like science fiction, then just go and take a look at our old friend Amazon and one of their current beta projects.

MTurk – not much good at chess – yet
This project is called ‘The Amazon Mechanical Turk’ or ‘Amazon MTurk’ (http://www.amazon.com/Mechanical-Turk-AWS-home-page/b/ref=sc_fe_l_2/002-4312207-3892809?ie=UTF8&node=15879911&no=3435361&me=A36L942TSJ2AJA). The name comes from a ‘mechanical’ device which was built in 1769 and consisted of a model of the top half of a Turkish man, mounted on a box, which not only played chess against any opponent but usually beat them! The doors of the box underneath would be opened to reveal a series of cogs and gears. In reality it turned out that there was a person concealed inside who did the chess playing. Nevertheless, it fooled a lot of people at the time, including Napoleon.
The idea behind MTurk is not to ‘con’ people but to combine human and mechanical (in this case computer) effort to achieve something that would not be possible with either alone. As with any new idea like this, how it is used is up to us as developers to decide but, for example, a web site could use this system to automatically tag images with keywords or to do complicated ‘fuzzy’ data de-duplication, things that computers are currently not so good at but that humans can do with relative ease. People are ‘commissioned’ to do the work for a set fee per ‘HIT’ (Human Intelligence Task, Amazon’s name for a single unit of work) so, in the case of the first idea, they would assign a keyword to an image and be paid for doing so. I think that this use of the web is an interesting development and a departure from the endless community web sites we read about.
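To make the ‘fee per hit’ idea a little more concrete, here is a rough Python sketch of the image-tagging example. It does not use Amazon’s actual API; the `Hit` class, its field names, the example URL and the rule that two workers must agree before a keyword is accepted are all assumptions of mine, made purely for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Hit:
    """Hypothetical model of one small paid task: 'suggest a keyword for this image'."""
    image_url: str
    fee_cents: int                       # assumed: a set fee paid per submitted answer
    answers: List[str] = field(default_factory=list)

    def submit(self, keyword: str) -> int:
        """Record one worker's keyword (normalised) and return the fee owed for it."""
        self.answers.append(keyword.strip().lower())
        return self.fee_cents

    def agreed_keyword(self, min_votes: int = 2) -> Optional[str]:
        """Return the most common keyword, but only if at least min_votes workers gave it."""
        if not self.answers:
            return None
        keyword, votes = Counter(self.answers).most_common(1)[0]
        return keyword if votes >= min_votes else None


# Three (imaginary) workers each tag the same image and are each paid the set fee.
hit = Hit(image_url="https://example.com/photos/1.jpg", fee_cents=2)
total_paid = sum(hit.submit(k) for k in ["Lighthouse", "lighthouse ", "tower"])
tag = hit.agreed_keyword()
print(tag, total_paid)
```

Asking several people the same question and only keeping an answer that more than one of them gives is one simple way for the computer side to check the human side; a real system would need to decide how many workers to ask and what to do when nobody agrees.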
The next step is for the computer system to learn from the humans’ responses. Imagine a computer with the ability to learn, connected to the internet as its memory, with the backup support of thousands of humans supplying the information it does not have. Now that is sci-fi, for now!
Article by: Mark Newton
Published in: Mark Newton