Date of Award

5-2013

Document Type

Thesis

Degree Name

Master of Science (MS)

College/School

College of Science and Mathematics

Department/Program

Computer Science

Thesis Sponsor/Dissertation Chair/Project Chair

Aparna Varde

Committee Member

Eileen M. Fitzpatrick

Committee Member

Anna Feldman

Abstract

Collocations are words that frequently occur together in English. Non-native speakers of English tend to confuse a term with a similar one, substituting a word that is not commonly used with its partner. “Powerful tea” is an example of such an odd collocation; the commonly used form is “strong tea”.

This paper proposes an approach called CollOrder that detects such odd collocations in written English and suggests corrections for the odd collocate. The suggestions are filtered and ranked to produce the top-k suggestions.

We use large text corpora such as the American National Corpus (ANC) to identify common collocations, and we use search heuristics to speed up both the search for collocations and the preparation of a list of suggestions. We have created a Collocation Frequency Knowledge Base (CFKB). We combine various measures of similarity and frequency of usage using a machine learning classifier to arrive at a formula that can be used to filter and rank the top-k suggestions.
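
As a rough illustration of filtering and ranking top-k suggestions, the sketch below combines a frequency count with a similarity score. It is not the formula derived in the thesis: the toy tables, the function names, the log-scaled linear combination, and the equal weights are all assumptions made for illustration, whereas the thesis learns its combination with a machine learning classifier over corpus data.

```python
from heapq import nlargest
from math import log

# Hypothetical stand-in for a collocation frequency knowledge base.
# All counts are invented for illustration; they are not ANC figures.
TOY_FREQ = {
    ("strong", "tea"): 120,
    ("powerful", "tea"): 1,
    ("potent", "tea"): 4,
}

# Hypothetical similarity scores between the odd collocate and each candidate.
TOY_SIM = {
    ("powerful", "strong"): 0.9,
    ("powerful", "potent"): 0.8,
}


def is_odd(collocate, base, threshold=2):
    """Flag a pair as odd when its (toy) frequency falls below a threshold."""
    return TOY_FREQ.get((collocate, base), 0) < threshold


def rank_suggestions(collocate, base, candidates, k=3,
                     w_freq=0.5, w_sim=0.5, min_freq=2):
    """Filter out rare candidates, then rank the rest by a weighted score.

    The weights are placeholders, not learned values.
    """
    scored = []
    for cand in candidates:
        freq = TOY_FREQ.get((cand, base), 0)
        if freq < min_freq:
            continue  # filter: candidate is itself an uncommon collocation
        sim = TOY_SIM.get((collocate, cand), 0.0)
        score = w_freq * log(1 + freq) + w_sim * sim
        scored.append((score, cand))
    # Keep the k highest-scoring candidates, best first.
    return [cand for _, cand in nlargest(k, scored)]


if __name__ == "__main__":
    if is_odd("powerful", "tea"):
        print(rank_suggestions("powerful", "tea", ["strong", "potent", "mighty"]))
```

With these invented numbers, “powerful tea” is flagged as odd, “mighty” is filtered out because it never occurs with “tea” in the toy table, and “strong” ranks ahead of “potent”.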

We have implemented a web-based solution to evaluate the approach and have considered factors such as caching of intermediate results so that it can be used in real time.

We claim that our approach would be useful in semantically enhancing Web information retrieval, providing automated error correction in machine-translated documents, and offering assistance to students using ESL tools.

File Format

PDF

Recommended Citation

Varghese, Alan T., "Collocation Error Correction in Web Queries and Text Documents" (2013). Theses, Dissertations and Culminating Projects. 1076.
https://digitalcommons.montclair.edu/etd/1076

Included in

Computer Sciences Commons

Theses, Dissertations and Culminating Projects

Collocation Error Correction in Web Queries and Text Documents
