Synonyms

Semantic Similarity Between Any Pair Of Words

A script that calculates the semantic similarity between any pair of words.

First, a semantic descriptor vector is created for each word and stored as a dictionary. This is done by reading a text and, for the chosen word, building a dictionary that records the number of times each other word occurs in the same sentence as that word.
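The descriptor-building step can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function name `build_semantic_descriptors` is hypothetical, the input is assumed to be already split into sentences of lowercase words, and a word is assumed to count at most once per sentence.

```python
from collections import defaultdict


def build_semantic_descriptors(sentences):
    """Map each word to a dict counting the words that share a sentence with it.

    `sentences` is assumed to be a list of sentences, each a list of
    lowercase words (tokenization is outside the scope of this sketch).
    """
    descriptors = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        # Assumption: repeated words within one sentence count only once.
        unique_words = set(sentence)
        for word in unique_words:
            for other in unique_words:
                if other != word:
                    descriptors[word][other] += 1
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {word: dict(counts) for word, counts in descriptors.items()}


sentences = [
    ["i", "am", "a", "sick", "man"],
    ["i", "am", "a", "spiteful", "man"],
]
descriptors = build_semantic_descriptors(sentences)
print(descriptors["man"])
```

Here "man" shares both sentences with "i", "am", and "a", so each of those gets a count of 2, while "sick" and "spiteful" each get a count of 1.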

Then, the similarity between two words is calculated as the cosine similarity of their descriptor vectors. The result is a float, and a higher value indicates greater similarity.
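For reference, cosine similarity over dictionary-based vectors might look like the sketch below. The function name `cosine_similarity` and the choice to return -1.0 for an empty vector are assumptions made for illustration; the script may handle that case differently.

```python
import math


def cosine_similarity(vec1, vec2):
    """Cosine similarity of two sparse vectors stored as {word: count} dicts."""
    # Only words present in both vectors contribute to the dot product.
    dot = sum(count * vec2.get(word, 0) for word, count in vec1.items())
    norm1 = math.sqrt(sum(count * count for count in vec1.values()))
    norm2 = math.sqrt(sum(count * count for count in vec2.values()))
    if norm1 == 0 or norm2 == 0:
        # Assumption: treat an empty vector as "cannot be compared".
        return -1.0
    return dot / (norm1 * norm2)


similarity = cosine_similarity(
    {"a": 1, "b": 2, "c": 3},
    {"b": 4, "c": 5, "d": 6},
)
print(similarity)
```

With non-negative counts the result lies between 0 and 1, so the sentinel -1.0 always ranks below any real comparison.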

The program can also measure its own accuracy when the correct synonym is already known. Given a text file where each line contains a word, the correct answer, and the choices, the program returns the percentage of questions for which the semantic descriptor algorithm chose the correct answer.
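A rough sketch of this scoring step is shown below. Several details are assumptions: the names `most_similar_word` and `score_test_lines` are hypothetical, the fields on each line are assumed to be separated by whitespace, ties go to the earliest choice, and the similarity function is passed in as a parameter so the example stays self-contained (a plain dot product stands in for it here).

```python
def most_similar_word(word, choices, descriptors, similarity_fn):
    """Return the choice whose descriptor is most similar to that of `word`."""
    best_choice, best_score = None, float("-inf")
    for choice in choices:
        if word in descriptors and choice in descriptors:
            score = similarity_fn(descriptors[word], descriptors[choice])
        else:
            # Assumption: words never seen in the text score lowest.
            score = -1.0
        # Strict comparison keeps the earliest choice on ties.
        if score > best_score:
            best_choice, best_score = choice, score
    return best_choice


def score_test_lines(lines, descriptors, similarity_fn):
    """Return the percentage of lines where the guess matches the answer.

    Each line is assumed to look like: word answer choice1 choice2 ...
    """
    correct = total = 0
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        word, answer, choices = parts[0], parts[1], parts[2:]
        total += 1
        if most_similar_word(word, choices, descriptors, similarity_fn) == answer:
            correct += 1
    return 100.0 * correct / total if total else 0.0


def dot_product(vec1, vec2):
    # Stand-in similarity for this example only.
    return sum(count * vec2.get(word, 0) for word, count in vec1.items())


toy_descriptors = {
    "cat": {"pet": 2, "fur": 1},
    "dog": {"pet": 2, "fur": 1},
    "car": {"road": 3},
}
toy_lines = ["cat dog dog car", "car dog cat dog"]
accuracy = score_test_lines(toy_lines, toy_descriptors, dot_product)
print(accuracy)
```

To score a real test file, the lines could be read with `open(path)` and passed in directly, since iterating over a file object yields one line at a time.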

The current test case reads two books, Swann's Way and War and Peace, as text files. A test file in which each line contains a word, the correct synonym, and a list of choices is then passed in, and the program returns the percentage of cases in which it guessed the correct synonym.