Tag Archives: dataset

Tinkering with Apache Hadoop – Map Reduce Framework

I have used a MapReduce-based system at my present employer (Bank of America – Merrill Lynch) to process (read “crunch”) extremely large datasets in a matter of seconds. Sometimes I used it to price bonds in real time; at other times it was used for data processing and reporting. It is an in-house product, known as the Hugs framework […]

Computing similarities between datasets

Similarity measures are used all over the web and are familiar to anyone who has performed Internet searches using a search engine. Think of the entire Internet, comprising all of its websites, as one single database that can be divided into two classes: sites that can answer your query and sites that cannot. […]
