CNN as an approach to handwritten Hebrew letter classification
Abstract
One of the most notable milestones in machine learning was when models became capable of recognizing handwritten text. The ability to write a digit or an English letter and have it accurately recognized by a computer has been refined to such an extent that computers can even outperform humans at the task, unfazed by messy handwriting. Current work is extending this technology to a diverse set of alphabets and languages, such as Swedish, Arabic, and Urdu. However, one alphabet that does not currently have an advanced, publicly available classification model is handwritten Hebrew, also known as Hebrew script.

A significant impediment to developing a classification model is obtaining a large quantity of high-quality data for the model to learn from. This step has been accomplished by a group from the Ben-Gurion University of the Negev, who in 2020 published a paper describing their release of the Hebrew Handwritten Dataset (HHD) [1]. They describe the formal methods they used to build a dataset of handwritten Hebrew letters that is robust and represents a wide variety of handwriting for every letter, and that can therefore be used to develop a classification model.