Date of Award
5-2013
Document Type
Thesis
Degree Name
Master of Science (MS)
College/School
College of Science and Mathematics
Department/Program
Computer Science
Thesis Sponsor/Dissertation Chair/Project Chair
Aparna Varde
Committee Member
Eileen M. Fitzpatrick
Committee Member
Anna Feldman
Abstract
Collocations are words that frequently occur together in English. Non-native speakers of English tend to confuse certain terms with similar ones, substituting a term that is not commonly used with its partner. “Powerful tea” is an example of such an odd collocation; the commonly used form is “strong tea”.
This paper proposes an approach called CollOrder to detect such odd collocations in written English. CollOrder also provides suggestions to correct the odd collocate. These suggestions are filtered and ranked as top-k suggestions.
We use large text corpora such as the American National Corpus (ANC) to identify common collocations, and we use search heuristics to speed up both the search for collocations and the preparation of a list of suggestions. We have created a Collocation Frequency Knowledge Base (CFKB). Using a machine learning classifier, we combine various measures of similarity and frequency of usage to arrive at a formula for filtering and ranking the top-k suggestions.
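The idea of filtering and ranking suggestions by combining similarity with frequency of usage can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy `CFKB` counts, the `SIMILAR` scores, the function names `is_odd` and `suggest`, and the fixed weights `w_sim` and `w_freq`. The thesis derives its formula with a classifier; the linear weighting here is only a stand-in.

```python
import heapq
import math

# Hypothetical toy knowledge base: (collocate, base word) -> corpus frequency.
# These counts are invented for illustration, not taken from the ANC.
CFKB = {
    ("strong", "tea"): 120,
    ("hot", "tea"): 300,
    ("powerful", "engine"): 95,
    ("strong", "engine"): 4,
}

# Hypothetical similarity scores in [0, 1] between a term and its candidates.
SIMILAR = {
    "powerful": {"strong": 0.9, "mighty": 0.7, "hot": 0.1},
}


def is_odd(collocate, base, min_count=1):
    """Flag a pair as odd when it is seen fewer than min_count times."""
    return CFKB.get((collocate, base), 0) < min_count


def suggest(collocate, base, k=3, w_sim=0.7, w_freq=0.3):
    """Return up to k replacement collocates, best first.

    The weights are assumed values; a learned formula would replace them.
    """
    max_freq = max(CFKB.values())
    scored = []
    for cand, sim in SIMILAR.get(collocate, {}).items():
        freq = CFKB.get((cand, base), 0)
        if freq == 0:
            continue  # filter: candidate never observed with this base word
        # Log-scale the frequency so very common pairs do not dominate.
        freq_score = math.log1p(freq) / math.log1p(max_freq)
        scored.append((w_sim * sim + w_freq * freq_score, cand))
    return [cand for _, cand in heapq.nlargest(k, scored)]


if is_odd("powerful", "tea"):
    print(suggest("powerful", "tea"))
```

With this toy data, "strong" ranks above "hot" for the base word "tea" because it is both similar to "powerful" and frequently attested, while "mighty" is filtered out because it never appears with "tea" in the knowledge base.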
We have implemented a web-based solution to evaluate the approach and have considered factors such as caching of intermediate results so that it can be used in real time.
We claim that our approach would be useful in semantically enhancing Web information retrieval, providing automated error correction in machine-translated documents, and offering assistance to students using ESL tools.
Recommended Citation
Varghese, Alan T., "Collocation Error Correction in Web Queries and Text Documents" (2013). Theses, Dissertations and Culminating Projects. 1076.
https://digitalcommons.montclair.edu/etd/1076