Genre Analysis Via Constituent Tree Structure

Schick, Moriyah

Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12202/5642

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Waxman, Joshua
dc.contributor.author	Schick, Moriyah
dc.date.accessioned	2020-06-11T18:30:45Z
dc.date.available	2020-06-11T18:30:45Z
dc.date.issued	2020-05-22
dc.identifier.citation	Schick, Moriyah. Genre Analysis Via Constituent Tree Structure. Presented to the S. Daniel Abraham Honors Program in Partial Fulfillment of the Requirements for Completion of the Program. NY: Stern College for Women. Yeshiva University, May 22, 2020. Mentor: Dr. Joshua Waxman, Computer Science.	en_US
dc.identifier.uri	https://hdl.handle.net/20.500.12202/5642
dc.description	Senior honors thesis. Opt-out: For access, please contact yair@yu.edu	en_US
dc.description.abstract	Among the many tasks within the field of natural language processing, genre analysis is one of the most difficult, as there is no objective standard for what the features of a genre are. Past works have attempted to apply a combination of syntactic and lexical machine learning and deep learning models to categorize texts by genre effectively. Syntactic features have also been found to be important in authorship analysis. This paper applies previous findings on the use of syntactic features to the area of genre analysis, specifically testing whether constituency-based parse trees derived from the Penn Treebank, and other related lexical features, are valuable to different supervised machine learning models, such as the Naive Bayes and Maximum Entropy classifiers, in determining genre. The accuracies of these models, compared to the baseline, show that these syntactic features are indeed important and result in a significant increase in accuracy.	en_US
dc.description.sponsorship	S. Daniel Abraham Honors Program	en_US
dc.language.iso	en_US	en_US
dc.publisher	New York, NY. Stern College for Women. Yeshiva University.	en_US
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 United States	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/us/	*
dc.subject	Senior honors thesis	en_US
dc.subject	natural language processing	en_US
dc.subject	genre analysis	en_US
dc.subject	syntactic machine learning	en_US
dc.subject	lexical machine learning	en_US
dc.subject	Penn Treebank	en_US
dc.subject	authorship analysis	en_US
dc.subject	machine learning models	en_US
dc.subject	Naive Bayes	en_US
dc.subject	Maximum Entropy classifiers	en_US
dc.subject	constituent tree structure	en_US
dc.title	Genre Analysis Via Constituent Tree Structure	en_US
dc.type	Thesis	en_US
Appears in Collections:	S. Daniel Abraham Honors Student Theses

Files in This Item:

File	Description	Size	Format
Genre Analysis Via Constituent Tree Structure.pdf (Restricted Access)		229.7 kB	Adobe PDF

This item is licensed under a Creative Commons License