An Empirical Study on Word Sense Disambiguation for Adult Content Filtering

Abstract

The Internet is a powerful source of information. However, as with other media, different types of information are aimed at different audiences. In particular, adult content should not be accessible to children. In this context, several content-filtering approaches have been proposed in both industry and academia. Some of these approaches use the text of a webpage to build a classic bag-of-words model, which is then used to categorise pages and filter inappropriate content. To the best of our knowledge, these methods use no semantic information and may therefore be evaded by attacks that exploit the well-known ambiguity of natural language. Given this background, we present the first semantics-aware adult content filtering approach, which models webpages after a prior word-sense disambiguation step in order to deal with this ambiguity. We show that this approach can improve the filtering results of classic statistical models.

Publication
International Joint Conference SOCO’14-CISIS’14-ICEUTE’14

Related