Date of Award
Master of Science (MS)
College of Science and Mathematics
Thesis Sponsor/Dissertation Chair/Project Chair
Searching for similar documents is a well-known problem in the Web and information retrieval fields. For example, identifying similar profiles across different government agencies is an important step in intelligence gathering. However, when data belongs to multiple parties, internal security policies and government regulations may not allow the participating parties to freely share their sensitive documents. In our project, we address the following problem: given a user's query Q and an encrypted database of documents stored on a third-party cloud server, we want to retrieve the top-k documents most similar to Q without disclosing Q or the contents of the database to the cloud server. We translated documents into Bloom filter representations and used the Jaccard coefficient to measure the similarity between each document and Q. We conducted empirical tests to evaluate the reliability, speed, and space efficiency of using Bloom filters to perform operations over encrypted data. Our proposed solution allows users to keep the contents of documents hidden from unauthorized parties while enabling end users to efficiently and securely retrieve the top-k similar documents from the cloud. In our experiments, we found that using a Bloom filter provided adequate security among the entities involved. Estimating the Jaccard coefficient from Bloom filters gave good accuracy at a reasonable bit size.
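The core idea of comparing documents through Bloom filters and the Jaccard coefficient can be illustrated with a minimal Python sketch. It is not the thesis implementation: the filter size, the number of hash functions, the SHA-256-based hashing, the word-level tokens, and all function names are assumptions chosen for illustration, and the encryption layer and the client/cloud protocol are not modelled. The Jaccard coefficient is estimated directly on the filters as the number of bits set in both divided by the number of bits set in either.

```python
import hashlib
import heapq

M_BITS = 1024   # filter size in bits (hypothetical value, not from the thesis)
K_HASHES = 3    # hash functions per token (hypothetical value)


def bloom_filter(tokens, m=M_BITS, k=K_HASHES):
    """Return an int whose set bits form the Bloom filter of `tokens`."""
    bits = 0
    for token in set(tokens):
        for seed in range(k):
            # Derive k bit positions per token by salting SHA-256 with a seed.
            digest = hashlib.sha256(f"{seed}:{token}".encode("utf-8")).digest()
            position = int.from_bytes(digest[:8], "big") % m
            bits |= 1 << position
    return bits


def jaccard(a, b):
    """Estimate the Jaccard coefficient of two filters: |A AND B| / |A OR B|."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # both filters are empty; treated as no similarity here
    return bin(a & b).count("1") / union


def top_k(query_tokens, documents, k=2):
    """Return the k highest-scoring (score, name) pairs for the query."""
    query_filter = bloom_filter(query_tokens)
    scored = (
        (jaccard(query_filter, bloom_filter(tokens)), name)
        for name, tokens in documents.items()
    )
    return heapq.nlargest(k, scored)


# Toy corpus; names and contents are invented for the example.
documents = {
    "doc_a": "secure retrieval of encrypted documents".split(),
    "doc_b": "secure storage of encrypted files".split(),
    "doc_c": "weather forecast for tomorrow morning".split(),
}
query = "secure retrieval of encrypted documents".split()

for score, name in top_k(query, documents, k=2):
    print(f"{name}: {score:.3f}")
```

Because the score is computed on bits rather than on the original token sets, hash collisions make it an approximation of the true Jaccard coefficient; a larger filter reduces collisions at the cost of space, which is the trade-off behind the "reasonable bit size" mentioned above.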
Guzman, German, "Secure Retrieval of Encrypted Similar Documents using Bloom Filters" (2022). Theses, Dissertations and Culminating Projects. 1129.