Semantic Textual Similarity

Thanks to the proliferation of web search engines and their increasing efficiency, it has become possible to develop other types of similarity measures based on them.

The main advantage of using search engines is that almost any word or meaning is likely to be indexed, so there is no need to rely on restricted data sources or vocabularies whose descriptions may be limited or missing altogether.

One of the first works based on web search engines is the one developed by Strube. It uses a basic measure: it takes the number of results a search engine returns for a query (hits, or page counts) and applies the Jaccard coefficient to those counts.
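As a rough illustration of the idea, the Jaccard coefficient over page counts can be sketched as the number of pages containing both words divided by the number of pages containing either word. The function name `web_jaccard`, its parameters, and the page counts below are assumptions made up for this example; they do not come from the original work, and no real search engine is queried.

```python
def web_jaccard(hits_p: int, hits_q: int, hits_both: int) -> float:
    """Jaccard coefficient computed from search-engine page counts.

    hits_p    -- page count for the query "P" (hypothetical input)
    hits_q    -- page count for the query "Q" (hypothetical input)
    hits_both -- page count for the query "P AND Q" (hypothetical input)
    """
    if min(hits_p, hits_q, hits_both) < 0:
        raise ValueError("page counts must be non-negative")

    # Pages containing P or Q: add both counts, then subtract the overlap
    # so that pages containing both words are not counted twice.
    union = hits_p + hits_q - hits_both
    if union <= 0:
        # Neither word was found anywhere; treat the pair as unrelated.
        return 0.0
    return hits_both / union


# Made-up page counts, for illustration only.
score = web_jaccard(hits_p=1000, hits_q=500, hits_both=250)
print(score)  # 250 / (1000 + 500 - 250) = 0.2
```

A score near 1 means the two words almost always appear on the same pages, and a score near 0 means they rarely do. In practice the counts reported by search engines are estimates, so a sketch like this would normally sit behind whatever function fetches them.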

However, these measures capture the overall relatedness of two words rather than their semantic similarity. In addition, some works point out a problem with this type of measurement: the result count ignores the position of a word on a page, so two words may appear on the same page without being related.
