Date of Award
8-2022
Document Type
Thesis
Degree Name
Master of Science (MS)
College/School
College of Science and Mathematics
Department/Program
Computer Science
Thesis Sponsor/Dissertation Chair/Project Chair
Bharath Samanthula
Committee Member
Boxiang Dong
Committee Member
Jiacheng Shang
Abstract
Searching for similar documents is a well-known problem on the Web and in the field of Information Retrieval. For example, identifying similar profiles across different government agencies is an important step in intelligence gathering. However, when data belongs to multiple parties, internal security policies and government regulations do not allow the participating parties to freely share their sensitive documents. In this project, we address the following problem: given a user’s query Q and an encrypted database of documents stored on a third-party cloud server, retrieve the top-k documents most similar to Q without disclosing Q or the contents of the database to the cloud server. We translated documents into Bloom filter representations and used the Jaccard coefficient to measure the similarity between each document and Q. We conducted empirical tests to evaluate the reliability, speed, and space efficiency of using Bloom filters to perform operations over encrypted data. Our proposed solution keeps the contents of documents hidden from unauthorized parties while allowing end-users to efficiently retrieve the top-k similar documents from the cloud in a secure manner. In our experiments, we found that using Bloom filters provided adequate security among the entities involved, and that estimating the Jaccard coefficient from Bloom filters gave good accuracy at a reasonable filter size (in bits).
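The plaintext core of the idea, encoding each document as a Bloom filter and ranking documents by a Jaccard coefficient computed over filter bits, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the filter size (`M_BITS`), the number of hash functions (`NUM_HASHES`), the salted SHA-256 hashing, whitespace tokenization, and the function names are all hypothetical choices. The encryption layer and the protocol with the cloud server are deliberately omitted, because the abstract does not describe them.

```python
import hashlib
import heapq

# Hypothetical parameters (assumptions, not taken from the thesis).
M_BITS = 1024      # Bloom filter size in bits
NUM_HASHES = 4     # number of hash functions per token


def bloom_filter(tokens, m=M_BITS, num_hashes=NUM_HASHES):
    """Encode a set of tokens as a Bloom filter held in a Python int (bit i = position i)."""
    bits = 0
    for token in set(tokens):
        for i in range(num_hashes):
            # Derive the i-th hash function by salting SHA-256 with the index i.
            digest = hashlib.sha256(f"{i}:{token}".encode("utf-8")).digest()
            bits |= 1 << (int.from_bytes(digest[:8], "big") % m)
    return bits


def jaccard_estimate(a, b):
    """Approximate the Jaccard coefficient as |A AND B| / |A OR B| over set bits."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # two empty filters: treat similarity as zero
    return bin(a & b).count("1") / union


def top_k(query_text, documents, k=3):
    """Return the k (score, doc_id) pairs most similar to the query (plaintext, no encryption)."""
    q = bloom_filter(query_text.lower().split())
    scored = (
        (jaccard_estimate(q, bloom_filter(text.lower().split())), doc_id)
        for doc_id, text in documents.items()
    )
    return heapq.nlargest(k, scored)


if __name__ == "__main__":
    docs = {
        "d1": "secure retrieval of encrypted documents",
        "d2": "bloom filters for similarity search",
        "d3": "cooking recipes for pasta",
    }
    print(top_k("encrypted document retrieval", docs, k=2))
```

Because hash collisions can set bits that the two underlying sets do not actually share, this bit-level ratio only approximates the true set Jaccard coefficient; a larger filter lowers the collision rate at the cost of space, which is the trade-off behind choosing a reasonable bit size.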
Recommended Citation
Guzman, German, "Secure Retrieval of Encrypted Similar Documents using Bloom Filters" (2022). Theses, Dissertations and Culminating Projects. 1129.
https://digitalcommons.montclair.edu/etd/1129