Private indexing refers to a set of approaches for finding records that are identical or similar to one another without exposing the underlying data. In a database, an index keeps track of keys and the values associated with them. This research focuses on private indexing in record linkage as a means of keeping the data secure. Unique personal identification numbers, such as social security numbers, are not available in most countries or databases. Record linkage must therefore rely on attributes such as names and dates of birth to decide which records refer to the same real-world entity. For security reasons, these identifiers must be encrypted. Privacy-preserving record linkage is frequently used to link private data held in the databases of different companies, and it prevents sensitive information from being exposed to the other companies. This research evaluates a combined method that uses both classic and new indexing techniques. In terms of privacy, the combined approach is more secure than standard indexing alone. Multibit tree indexing groups similar records in several ways, producing a scalable tree structure that is efficient in both space and time because it avoids redundant block structures. Indexing matters because, without it, the record pairs to compare form the Cartesian product of the two files. The number of comparisons therefore equals the product of their record counts and grows quadratically with file size: two files of 10,000 records each already yield 100 million pairs. The evaluation shows that the combined method scales with the number of databases to be linked, the database size, and the time required.
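The abstract does not spell out the combined method, so the Python sketch below only illustrates the two ingredients it names. The first is encoding identifiers so they are never exchanged in clear text. The second is a multibit-tree index that avoids comparing every pair in the Cartesian product. Several choices here are assumptions, not details taken from the paper:

* Names are encoded as Bloom filters of character bigrams hashed with HMAC-SHA256 under a hypothetical shared key (`SECRET`). This is a common PPRL encoding, not necessarily the authors' encoding.
* Similarity is the Tanimoto (Jaccard) coefficient.
* The tree is a simplified multibit tree. It splits on the most balanced bit and stores, per node, the bits that all of its filters share.

All function names (`bloom_encode`, `build`, `search`, `link`) are illustrative.

```python
import hashlib
import hmac
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

M = 256                              # Bloom filter length in bits (assumed)
K = 4                                # hash functions per bigram (assumed)
SECRET = b"hypothetical-shared-key"  # hypothetical key shared by the data holders


def popcount(x: int) -> int:
    return bin(x).count("1")


def bloom_encode(value: str, m: int = M, k: int = K, key: bytes = SECRET) -> int:
    """Hash the padded bigrams of `value` with keyed HMAC-SHA256 into an m-bit Bloom filter."""
    padded = f"_{value.lower()}_"
    bits = 0
    for i in range(len(padded) - 1):
        gram = padded[i:i + 2]
        for j in range(k):
            digest = hmac.new(key, f"{j}:{gram}".encode(), hashlib.sha256).digest()
            bits |= 1 << (int.from_bytes(digest[:8], "big") % m)
    return bits


def tanimoto(a: int, b: int) -> float:
    union = popcount(a | b)
    return popcount(a & b) / union if union else 1.0


@dataclass
class Node:
    ones: int   # bits set in every filter below this node
    zeros: int  # bits clear in every filter below this node
    items: List[Tuple[int, int]] = field(default_factory=list)  # (record id, filter), leaves only
    children: List["Node"] = field(default_factory=list)


def build(items: List[Tuple[int, int]], m: int = M, leaf_size: int = 4) -> Node:
    full = (1 << m) - 1
    ones, zeros = full, full
    for _, fp in items:
        ones &= fp
        zeros &= ~fp & full
    node = Node(ones, zeros)
    free = [b for b in range(m) if not ((ones | zeros) >> b) & 1]
    if len(items) <= leaf_size or not free:
        node.items = items
        return node
    # Split on the non-shared bit that divides the filters most evenly (both halves non-empty).
    bit = min(free, key=lambda b: abs(2 * sum((fp >> b) & 1 for _, fp in items) - len(items)))
    node.children = [
        build([it for it in items if not (it[1] >> bit) & 1], m, leaf_size),
        build([it for it in items if (it[1] >> bit) & 1], m, leaf_size),
    ]
    return node


def search(node: Node, q: int, threshold: float, stats: Dict[str, int]) -> List[Tuple[int, float]]:
    nq = popcount(q)
    # Upper bound on tanimoto(q, f) for every f in this subtree:
    # |q & f| <= |q| - |q & zeros| and |q | f| >= |q| + |ones & ~q|.
    max_common = nq - popcount(q & node.zeros)
    min_union = nq + popcount(node.ones & ~q)
    if min_union and max_common / min_union < threshold:
        return []  # prune: no filter below this node can reach the threshold
    hits = []
    for rid, fp in node.items:
        stats["comparisons"] += 1
        score = tanimoto(q, fp)
        if score >= threshold:
            hits.append((rid, score))
    for child in node.children:
        hits.extend(search(child, q, threshold, stats))
    return hits


def link(file_a: List[str], file_b: List[str], threshold: float = 0.8):
    """Return matching (i, j, score) pairs, exact comparisons made, and the Cartesian-product size."""
    tree = build([(j, bloom_encode(v)) for j, v in enumerate(file_b)])
    stats = {"comparisons": 0}
    pairs = []
    for i, v in enumerate(file_a):
        for j, score in search(tree, bloom_encode(v), threshold, stats):
            pairs.append((i, j, round(score, 3)))
    return pairs, stats["comparisons"], len(file_a) * len(file_b)


if __name__ == "__main__":
    a = ["smith", "johnson", "williams"]
    b = ["smyth", "jonson", "william", "brown", "taylor", "davies", "evans", "wilson"]
    pairs, done, naive = link(a, b, threshold=0.5)
    print(pairs)
    print(f"{done} exact comparisons instead of {naive}")
```

The pruning test is conservative. A subtree is skipped only when an upper bound on the similarity falls below the threshold, and that bound is derived from the bits every filter in the subtree shares. The sketch therefore returns the same matches as a brute-force comparison while typically performing fewer than `len(file_a) * len(file_b)` exact comparisons.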
Desai, Pranita Maruti Ms. and Shelake, Vijay Maruti Mr. "A Combined Approach For Private Indexing Mechanism," Journal of Digital Forensics, Security and Law: Vol. 17, Article 6.
Available at: https://commons.erau.edu/jdfsl/vol17/iss1/6