Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12202/9035
Title: Minimum edit distance on a probabilistic string
Authors: Broder, Alan
Schwartz, Dahlia
Keywords: Optical Character Recognition (OCR)
probabilistic string
Levenshtein distance
distance algorithms
Issue Date: 23-May-2023
Publisher: Yeshiva University
Citation: Schwartz, D. (2023, May 23). Minimum edit distance on a probabilistic string [Unpublished undergraduate honors thesis]. Yeshiva University.
Series/Report no.: S. Daniel Abraham Honors Program; May 23, 2023
Abstract: Optical character recognition, or OCR, is a method used to convert images of typed or handwritten text into machine-encoded text. Text is often illegible or worn, and therefore ambiguous. In these situations, OCR models can output a probabilistic string: a sequence of characters accompanied by a ranking of several less likely options. To quantify how dissimilar the output string is from another string, Levenshtein distance or another edit distance algorithm is used. These algorithms count the number of operations required to convert one string into another. In most edit distance algorithms, the possible operations are inserting a character, deleting a character, and replacing a character. The smaller the Levenshtein distance between two strings, the more similar the strings are to one another. There are various edit distance algorithms, each with its own run-time, efficiency, and readability; however, most of these algorithms do not take probabilistic strings into account. This paper’s contribution is to survey prominent methods of calculating the minimum edit distance of two strings and to evaluate each method in terms of run-time, efficiency, and the ability to work with probabilistic strings. This will help automate the process of finding missing and extra letters in scrolls that were copied from the Masoretic Text, which can aid Biblical researchers.
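Illustrative sketch (not taken from the thesis): the abstract describes edit distance as a count of insertions, deletions, and replacements, and notes that most algorithms ignore probabilistic strings. The Python below first shows the standard dynamic-programming Levenshtein distance, then one possible way to extend it to a probabilistic string. The representation of a probabilistic string as a list of character-to-probability dictionaries, the function name `expected_cost_distance`, and the substitution cost of `1 - P(character)` are all assumptions made for this example; the thesis may define and handle probabilistic strings differently.

```python
from typing import Dict, Sequence


def levenshtein(source: str, target: str) -> int:
    """Standard Levenshtein distance with unit costs, using two DP rows."""
    # previous[j] = distance between the empty prefix of source and target[:j]
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]  # deleting the first i characters of source
        for j, t_char in enumerate(target, start=1):
            substitution = previous[j - 1] + (s_char != t_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def expected_cost_distance(
    prob_string: Sequence[Dict[str, float]], target: str
) -> float:
    """Hypothetical variant: each position is a dict {character: probability}.

    ASSUMPTION: matching a position against target character c costs
    1 - P(c), so a certain match costs 0 and an impossible one costs 1.
    Insertions and deletions keep unit cost.
    """
    previous = [float(j) for j in range(len(target) + 1)]
    for i, options in enumerate(prob_string, start=1):
        current = [float(i)]
        for j, t_char in enumerate(target, start=1):
            substitution = previous[j - 1] + (1.0 - options.get(t_char, 0.0))
            deletion = previous[j] + 1.0
            insertion = current[j - 1] + 1.0
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3

    # An OCR reading where the middle character is ambiguous (made-up values).
    reading = [{"c": 1.0}, {"a": 0.6, "o": 0.4}, {"t": 1.0}]
    print(expected_cost_distance(reading, "cat"))
    print(expected_cost_distance(reading, "cot"))
```

When every position assigns probability 1.0 to a single character, the second function returns the same value as the first, so under these assumptions it behaves as a generalization of the unit-cost algorithm rather than a replacement for it.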
Description: Undergraduate honors thesis / Open access
URI: https://hdl.handle.net/20.500.12202/9035
Appears in Collections: S. Daniel Abraham Honors Student Theses

Files in This Item:
File: Dahlia Schwartz Thesis.pdf
Size: 5.1 MB
Format: Adobe PDF


This item is licensed under a Creative Commons License.