CNN as an approach to handwritten Hebrew letter classification

Date

2023-05

Publisher

Yeshiva University

Abstract

One of the most notable milestones in machine learning came when models became capable of recognizing handwritten text. The ability to write a number or an English letter and have it accurately recognized by a computer has been refined to such an extent that computers can even outperform humans at the task, unfazed by messy handwriting. Current work aims to extend this technology to a diverse set of alphabets and texts, such as Swedish, Arabic, and Urdu. However, one alphabet that does not currently have an advanced, publicly available classification model is handwritten Hebrew, also known as Hebrew script.

A significant impediment to developing a classification model is obtaining a large quantity of high-quality data for the model to learn from. This data-collection step has been accomplished by a group from Ben-Gurion University of the Negev, who in 2020 published a paper describing their release of the Hebrew Handwritten Dataset (HHD) [1]. They discuss the formal methods they used to build a dataset of handwritten Hebrew letters that is robust and represents a wide variety of handwriting for every letter, and that can therefore be used to produce a classification model.

Description

Undergraduate honors thesis / YU only

Keywords

Machine Learning, Hebrew script

Citation

Siegman, R. (2023, May). CNN as an approach to handwritten Hebrew letter classification [Unpublished undergraduate honors thesis, Yeshiva University].