Among the many tasks in natural language processing, genre analysis
is one of the most difficult, because there is no objective standard for which features define
a genre. Past work has attempted to categorize texts by genre using combinations of syntactic
and lexical features with machine learning and deep learning models. Syntactic features
have also been found to be important in authorship analysis. This paper
applies previous findings on the use of syntactic features to genre analysis,
specifically testing whether constituency-based parse trees derived from the Penn Treebank,
and other related lexical features, are valuable to different supervised machine learning
models, such as Naive Bayes and Maximum Entropy classifiers, in determining genre.
The accuracies of these models, compared with the baseline, show that these syntactic features
are indeed important and yield a significant increase in accuracy.
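
As a rough illustration of the approach summarized above, the sketch below turns Penn Treebank-style bracketed parses into counts of constituency production rules and feeds them to a small multinomial Naive Bayes classifier. It is a minimal sketch under stated assumptions, not the thesis's implementation: the helper names (`parse_bracketed`, `productions`, `features`, `NaiveBayes`), the two toy genres, and the four training sentences are all hypothetical, and the Maximum Entropy classifier and the lexical features are not shown.

```python
import math
from collections import Counter, defaultdict


def parse_bracketed(s):
    """Parse a Penn Treebank-style bracketed string into (label, children) tuples."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])  # leaf word
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return read()


def productions(tree):
    """Yield non-lexical production rules such as 'S -> NP VP'."""
    label, children = tree
    kids = [c for c in children if isinstance(c, tuple)]
    if kids:
        yield f"{label} -> {' '.join(k[0] for k in kids)}"
        for k in kids:
            yield from productions(k)


def features(parse):
    """Count production rules in one bracketed parse (hypothetical feature set)."""
    return Counter(productions(parse_bracketed(parse)))


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over rule counts."""

    def fit(self, feature_counts, labels):
        self.class_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for fc, y in zip(feature_counts, labels):
            self.feat_counts[y].update(fc)
            self.vocab.update(fc)
        return self

    def predict(self, fc):
        n = sum(self.class_counts.values())
        v = len(self.vocab)

        def score(y):
            total = sum(self.feat_counts[y].values())
            s = math.log(self.class_counts[y] / n)
            for f, c in fc.items():
                s += c * math.log((self.feat_counts[y][f] + 1) / (total + v))
            return s

        return max(self.class_counts, key=score)


# Toy, made-up training data: (bracketed parse, genre label).
train = [
    ("(S (NP (DT The) (NN court)) (VP (VBD ruled)))", "news"),
    ("(S (NP (DT The) (NN mayor)) (VP (VBD spoke)))", "news"),
    ("(S (VP (VB Stir) (NP (DT the) (NN sauce))))", "recipe"),
    ("(S (VP (VB Chop) (NP (DT the) (NN onion))))", "recipe"),
]
model = NaiveBayes().fit([features(p) for p, _ in train], [g for _, g in train])
prediction = model.predict(features("(S (VP (VB Boil) (NP (DT the) (NN water))))"))
```

Here the unseen imperative sentence shares the rules `S -> VP` and `VP -> VB NP` with the "recipe" examples, so the classifier can separate the two toy genres on syntactic structure alone, without looking at any words.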
Senior honors thesis.