Date of Award

8-2022

Document Type

Thesis

Degree Name

Master of Science (MS)

College/School

College of Science and Mathematics

Department/Program

Computer Science

Thesis Sponsor/Dissertation Chair/Project Chair

Bharath Samanthula

Committee Member

Boxiang Dong

Committee Member

Jiacheng Shang

Abstract

Searching for similar documents is a well-known problem in the Web and Information Retrieval fields. For example, identifying similar profiles across different government agencies is an important step in intelligence gathering. However, when data belongs to multiple parties, internal security policies and government regulations may not allow the participating parties to freely share their sensitive documents. In our project, we address the following problem: given a user’s query Q and an encrypted database of documents stored on a third-party cloud server, retrieve the top-k documents similar to Q without disclosing Q or the contents of the database to the cloud server. We translated documents into Bloom filter representations and used the Jaccard coefficient to measure the similarity between each document and Q. We conducted empirical tests to validate the reliability, speed, and space efficiency of using Bloom filters to perform operations over encrypted data. Our proposed solution allows users to keep the contents of documents hidden from unauthorized parties while enabling end users to efficiently and securely retrieve the top-k similar documents from the cloud. In our experiments, we found that using a Bloom filter provided adequate security among the entities involved. Estimating the Jaccard coefficient from Bloom filters gave good accuracy at a reasonable bit size.
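
The core idea in the abstract, comparing Bloom filter representations of documents with the Jaccard coefficient and returning the top-k matches, can be illustrated with a minimal plaintext sketch. This is not the thesis's protocol: it omits encryption and the cloud server entirely, and the function names, the filter size m=1024, the k=3 hash positions, the SHA-256 double-hashing scheme, and whitespace tokenization are all assumptions made for illustration.

```python
import hashlib
import heapq


def bloom_filter(tokens, m=1024, k=3):
    """Build an m-bit Bloom filter, stored as a Python int, from a collection of tokens.

    Assumption: the k bit positions come from SHA-256 via double hashing.
    """
    bits = 0
    for token in set(tokens):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(k):
            bits |= 1 << ((h1 + i * h2) % m)
    return bits


def jaccard_estimate(a, b):
    """Approximate the Jaccard coefficient as |A AND B| / |A OR B| over set bits."""
    union = bin(a | b).count("1")
    if union == 0:
        return 1.0  # two empty filters are treated as identical
    return bin(a & b).count("1") / union


def top_k(query_tokens, documents, k_results=2, m=1024, k=3):
    """Return the k_results highest-scoring (score, name) pairs for the query."""
    q = bloom_filter(query_tokens, m, k)
    scored = (
        (jaccard_estimate(q, bloom_filter(tokens, m, k)), name)
        for name, tokens in documents.items()
    )
    return heapq.nlargest(k_results, scored)


# Hypothetical toy corpus; tokens are whitespace-separated words.
documents = {
    "doc_a": "secure search over encrypted data".split(),
    "doc_b": "bloom filters estimate set similarity".split(),
    "doc_c": "weather report for the coming week".split(),
}
results = top_k("secure search over encrypted data".split(), documents)
```

Because different tokens can set the same bits, the score is only an estimate of the true Jaccard coefficient; a larger m reduces such collisions at the cost of space, which is the trade-off behind the abstract's remark about bit size.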

File Format

PDF
