Computing similarities between datasets

Similarity measures are used all over the web and are familiar to anyone who has used a search engine. Think of the entire Internet, made up of all its websites, as one single database that can be divided into two classes: the pages that can answer your query and the pages that cannot. The pages that answer your search query will be more similar to it than the pages that do not. Similarity in this case depends on the query you come up with, which takes the form of keywords.

There are many similarity and dissimilarity (distance) measures that we could use to quantify how alike two datasets are.
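To make this concrete, here is a minimal sketch of three commonly used measures: Euclidean distance, cosine similarity, and Jaccard similarity. The function names and the choice of these three measures are assumptions made for illustration and are not necessarily what Similarity.py contains. Numeric data is assumed to be given as equal-length lists of numbers, and keyword data as any iterable of strings.

```python
import math


def euclidean_distance(a, b):
    """Distance measure: 0 means identical, larger means less alike."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Similarity measure based on the angle between two vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumption: a zero vector has no direction, so report 0.0 similarity.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_similarity(a, b):
    """Similarity measure for sets: shared items divided by all items."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    # Assumption: two empty sets are treated as identical.
    if not union:
        return 1.0
    return len(set_a & set_b) / len(union)
```

For keyword-style data such as a search query, the Jaccard measure is a natural fit: `jaccard_similarity(["python", "similarity"], ["similarity", "measures"])` compares the two keyword sets by how many terms they share. For numeric feature vectors, the Euclidean and cosine measures are the more usual choices.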

Source code: Similarity.py
