Abstract
When ChatGPT was released to the public in late 2022, people quickly came to realize that many
things were about to change. While teachers started to get headaches trying to figure out whether their
students’ essays were written by AI, software engineers rejoiced, as they boosted their productivity levels
by leveraging the new AI to write boilerplate code far more quickly than any human could. How
does this technology work, and what makes it so different from what came before it? In this paper, we will
explore the history of natural language processing (NLP), starting with the basic, statistics-based approach
of the Naive Bayes classifier. Then we’ll explore two different types of neural networks, and end with the
transformer model that backs GPT and other large language models, doing a deep dive into the attention
mechanism that forms the basis of the transformer architecture.
Citation
Cantor, N. (2023, May). From Naive Bayes to transformers: Evolution of natural language processing models for improved text understanding [Undergraduate honors thesis, Yeshiva University].