One Who Has Acquired a Good Name Has Acquired Something for Himself: Named Entity Recognition on Talmudic Texts

Date

2021-12-27

Abstract

In this paper, I explore the intersection between Natural Language Processing and Talmudic texts. During this research project, I worked with Professor Joshua Waxman at the Stern Natural Language Processing Lab to create a Named Entity Recognizer that can be used on Talmudic texts. This process included the creation of gazetteers, that is, lists of the names of people and places found in the Talmud and the Bible. The gazetteers were created by extracting data from the Jastrow Dictionary and the Brown-Driver-Briggs Dictionary in Sefaria’s MongoDB database, using the Compass client and regular expressions. The gazetteers were then used to tag Talmudic texts, which were passed as training data to a Naive Bayes Named Entity Recognizer. Features generated for the training data included the words surrounding each named entity, prefixes and suffixes, and a gazetteer lookup.

As part of this research, I also present a survey of the current state-of-the-art research on using Natural Language Processing for Hebrew-language texts, especially rabbinic texts. The Hebrew language has features that make it difficult to apply popular Natural Language Processing techniques and tools that were developed for languages such as English. Furthermore, Hebrew texts from different time periods and historical sources differ slightly in grammar, sentence structure, and vocabulary. Natural Language Processing tools created for Hebrew from one time period therefore need to be adapted before they can be used on a text from a different time period. However, techniques developed to address particular aspects of the Hebrew language, such as its high morphological ambiguity, are worth examining regardless of the time period of the texts they target, because they show which challenges researchers commonly face and which solutions the Natural Language Processing field has developed.
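The feature types named in the abstract (surrounding words, prefixes and suffixes, and a gazetteer lookup) can be illustrated with a minimal sketch. This is not the thesis's actual code: the function name `token_features`, the two-character affix length, the `<START>`/`<END>` padding symbols, the toy gazetteer entries, and the transliterated example tokens are all assumptions made for illustration.

```python
from typing import Dict, List, Set

# Hypothetical toy gazetteers (assumption). The thesis built its lists from
# dictionary data; these few transliterated names only stand in for them.
PEOPLE: Set[str] = {"Akiva", "Hillel"}
PLACES: Set[str] = {"Yavne", "Bavel"}


def token_features(tokens: List[str], i: int, affix_len: int = 2) -> Dict[str, object]:
    """Build a feature dictionary for tokens[i].

    The features mirror the three kinds listed in the abstract:
    neighbouring words, prefix/suffix, and gazetteer membership.
    The affix length of 2 is an assumed value, not one from the thesis.
    """
    word = tokens[i]
    return {
        "word": word,
        # Context words, padded at the sentence boundaries.
        "prev_word": tokens[i - 1] if i > 0 else "<START>",
        "next_word": tokens[i + 1] if i < len(tokens) - 1 else "<END>",
        # Leading and trailing characters of the token.
        "prefix": word[:affix_len],
        "suffix": word[-affix_len:],
        # Gazetteer lookups.
        "in_people_gazetteer": word in PEOPLE,
        "in_places_gazetteer": word in PLACES,
    }


# Transliterated example sentence (assumption, for readability only).
tokens = ["amar", "Rabbi", "Akiva", "be", "Yavne"]
features = [token_features(tokens, i) for i in range(len(tokens))]
```

Feature dictionaries of this shape, each paired with a tag such as person, place, or other, are the kind of input a Naive Bayes classifier can be trained on. A real pipeline would operate on Hebrew and Aramaic tokens rather than transliterations.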

Description

Undergraduate honors thesis / 2-year embargo

Keywords

Named Entity Recognition, Talmudic Texts

Citation

Bruce, A. (2021, December 27). One Who Has Acquired a Good Name Has Acquired Something for Himself: Named Entity Recognition on Talmudic Texts (Undergraduate honors thesis, Yeshiva University).