dc.contributor.advisor: Waxman, Joshua
dc.contributor.author: Schick, Moriyah
dc.date.accessioned: 2020-06-11T18:30:45Z
dc.date.available: 2020-06-11T18:30:45Z
dc.date.issued: 2020-05-22
dc.identifier.citation: Schick, Moriyah. Genre Analysis Via Constituent Tree Structure Presented to the S. Daniel Abraham Honors Program in Partial Fulfillment of the Requirements for Completion of the Program. NY: Stern College for Women. Yeshiva University, May 22, 2020. Mentor: Dr. Joshua Waxman, Computer Science.
dc.identifier.uri: https://hdl.handle.net/20.500.12202/5642
dc.description: Senior honors thesis. Opt-out: For access, please contact yair@yu.edu
dc.description.abstract: Among the many tasks within the field of natural language processing, genre analysis is one of the most difficult, as there is no objective standard for what the features of a genre are. Past works have attempted to apply a combination of syntactic and lexical machine learning and deep learning models to categorize texts by genre effectively. Syntactic features have additionally been found to be important in authorship analysis. This paper applies previous findings on the use of syntactic features to the area of genre analysis, specifically testing whether constituency-based parse trees derived from the Penn Treebank, and other related lexical features, are valuable to different supervised machine learning models, such as the Naive Bayes and Maximum Entropy classifiers, in determining genre. The accuracies of these models as compared to the baseline show that these syntactic features are indeed important and result in a significant increase in accuracy.
dc.description.sponsorship: S. Daniel Abraham Honors Program
dc.language.iso: en_US
dc.publisher: New York, NY. Stern College for Women. Yeshiva University.
dc.rights: Attribution-NonCommercial-NoDerivs 3.0 United States
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/us/
dc.subject: Senior honors thesis
dc.subject: natural language processing
dc.subject: genre analysis
dc.subject: syntactic machine learning
dc.subject: lexical machine learning
dc.subject: Penn Treebank
dc.subject: authorship analysis
dc.subject: machine learning models
dc.subject: Naive Bayes
dc.subject: Maximum Entropy classifiers
dc.subject: constituent tree structure
dc.title: Genre Analysis Via Constituent Tree Structure
dc.type: Thesis


Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 United States