One Who Has Acquired a Good Name Has Acquired Something for Himself: Named Entity Recognition on Talmudic Texts

Bruce, Adina

Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12202/7895

Title:	One Who Has Acquired a Good Name Has Acquired Something for Himself: Named Entity Recognition on Talmudic Texts
Authors:	Waxman, Joshua; Bruce, Adina
Keywords:	Named Entity Recognition; Talmudic Texts
Issue Date:	27-Dec-2021
Citation:	Bruce, A. (2021, December 27). One Who Has Acquired a Good Name Has Acquired Something for Himself: Named Entity Recognition on Talmudic Texts (Undergraduate honors thesis, Yeshiva University).
Series/Report no.:	S. Daniel Abraham Honors Student Theses; December 27, 2021
Abstract:	This paper explores the intersection of Natural Language Processing and Talmudic texts. Working with Professor Joshua Waxman at the Stern Natural Language Processing Lab, I created a Named Entity Recognizer for use on Talmudic texts. The process included building gazetteers, that is, lists of the names of people and places found in the Talmud and the Bible. The gazetteers were created by extracting data from the Jastrow Dictionary and the Brown-Driver-Briggs Dictionary in Sefaria's MongoDB database, using the Compass client and regular expressions. The gazetteers were used to tag Talmudic texts, which were then passed as training data to a Naive Bayes Named Entity Recognizer. Features generated for the training data included the words surrounding each named entity, suffixes and prefixes, and a gazetteer lookup.
As part of this research, I present a survey of current state-of-the-art research on Natural Language Processing for Hebrew-language texts, especially rabbinic texts. Hebrew has features that make it difficult to apply popular Natural Language Processing techniques and tools developed for languages such as English. Furthermore, Hebrew texts from different time periods and historical sources differ slightly in grammar, sentence structure, and vocabulary, so tools built for Hebrew of one period must be adapted before they can be used on texts from another. Nevertheless, techniques developed to address general aspects of the Hebrew language, such as its high morphological ambiguity, are worth examining regardless of the period of the texts they target, to see what common challenges researchers face and what solutions the Natural Language Processing field has developed.
Description:	Undergraduate honors thesis / 2-year embargo
URI:	https://hdl.handle.net/20.500.12202/7895
Appears in Collections:	S. Daniel Abraham Honors Student Theses
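
The abstract names three kinds of features generated for the Naive Bayes recognizer: surrounding words, prefixes and suffixes, and a gazetteer lookup. The following is a minimal sketch of what such per-token features might look like. It is not the thesis's code: the function name `token_features`, the affix length of 2, the boundary markers, and the two-entry gazetteer are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def token_features(
    tokens: List[str], i: int, gazetteer: Set[str], affix_len: int = 2
) -> Dict[str, object]:
    """Build a feature dictionary for the token at position i.

    Hypothetical helper: the feature names and affix length are
    illustrative assumptions, not taken from the thesis.
    """
    word = tokens[i]
    return {
        "word": word,
        # Context features: the neighboring words, with boundary markers.
        "prev_word": tokens[i - 1] if i > 0 else "<START>",
        "next_word": tokens[i + 1] if i < len(tokens) - 1 else "<END>",
        # Affix features: first and last characters of the token.
        "prefix": word[:affix_len],
        "suffix": word[-affix_len:],
        # Gazetteer feature: exact-match lookup in the name list.
        "in_gazetteer": word in gazetteer,
    }


# Illustrative data only; a real gazetteer would hold the extracted
# dictionary entries described in the abstract.
gazetteer = {"עקיבא", "בבל"}
tokens = "אמר רבי עקיבא".split()  # "Rabbi Akiva said"
features = [token_features(tokens, i, gazetteer) for i in range(len(tokens))]
```

Dictionaries of this shape, paired with a label per token, are the kind of input accepted by common Naive Bayes implementations such as NLTK's `NaiveBayesClassifier.train`; the abstract does not say which toolkit the thesis used.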

Files in This Item:

File	Description	Size	Format
AdinaBruce_HonorsThesisFinal Textual recognition 27Dec21 2yrEmbargo.pdf		367.39 kB	Adobe PDF

This item is licensed under a Creative Commons License