Finalized instructions for the final assignment

The final assignment of the S20 semester uses R notebooks to look at sentiment analysis in one of two textual media of your choice. As you recall, sentiment analysis attempts to extract affective or subjective information from textual data, and the method we are using is a rule-based one in which we match the words of a text against a predefined lexicon. (There is non-textual sentiment analysis as well, but that goes beyond this course!)
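
As a reminder of what rule-based lexicon matching means in practice, here is a minimal sketch. It is written in Python purely for illustration and is not taken from the class R notebooks. The tiny `TOY_LEXICON` is a made-up stand-in for a real lexicon such as `bing`, and the function names are my own.

```python
import re
from collections import Counter

# Hypothetical toy lexicon for illustration only; a real lexicon such as
# bing contains thousands of entries.
TOY_LEXICON = {
    "happy": "positive",
    "love": "positive",
    "wonderful": "positive",
    "sad": "negative",
    "fear": "negative",
    "terrible": "negative",
}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def sentiment_counts(text, lexicon=TOY_LEXICON):
    """Count how many tokens match each sentiment label in the lexicon."""
    counts = Counter()
    for word in tokenize(text):
        label = lexicon.get(word)
        if label is not None:  # words missing from the lexicon are ignored
            counts[label] += 1
    return counts

def net_sentiment(text, lexicon=TOY_LEXICON):
    """Positive matches minus negative matches."""
    counts = sentiment_counts(text, lexicon)
    return counts["positive"] - counts["negative"]
```

Because each word is matched on its own, a phrase like "not happy" still counts as one positive match. That kind of blind spot is worth keeping in mind when you interpret your results.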

The final assignment is not a full-length final project, but is comparable in length and complexity to the midterm.

Value:  20% of your final grade.
Length: approx. 1000 words
Notebooks: available on GitHub or in our shared class drive.

To complete the final assignment, you should choose one of the two following topics. If you have another idea for a final assignment and would like to discuss it with me, please get in touch early enough for us to define the assignment together.

In both cases, you should explain in your writeup (1) why you chose the texts/hashtags you did, (2) what you expected the results to be (your hypothesis), (3) what your results actually were, and (4) what tentative conclusions you can draw. If you modify the code or use custom stopwords, please explain your choices. If you could design another lexicon, explain what it would look like. Please make sure that your writeup references some of the course readings and includes captioned visuals explaining your findings.

 

(1) The first project possibility deals with sentiment analysis using the `bing` and `nrc` lexicons and a small corpus of texts of your choice. I would suggest that you use texts from Project Gutenberg. Please choose at least 3 texts for comparative purposes. The choice of the corpus is what helps you define the analysis you are doing.

Here is my suggestion, but you can approach it differently if you like. In Project Gutenberg, search for texts that might correspond to genres for which this kind of sentiment analysis could be interesting (horror, sci-fi, melodrama, autobiography, memoir). Alternatively, you can choose three texts by the same author that are known to be very different in “feeling.” Choose books that you know something about, so that your analysis is richer. Your goal in this project is to discuss what sentiment analysis helps us understand about texts, and what it does not.

https://twitter.com/DJWrisley/status/1255411882107777024?s=20

 

(2) The second project possibility deals with sentiment analysis using the `bing` lexicon and two or three Twitter hashtags. The choice of hashtags is what helps you define the analysis you are doing. In the notebook I oppose #trumpdisinfectant and #OPENAMERICANOW, precisely because they are opposing viewpoints on the same moment of the pandemic: the first represents a somewhat sarcastic approach from left politics, and the second a right, pro-business approach (although the second ended up having different kinds of voices tweeting to it). One place to begin is the “trends” tab in Twitter.

If you take the Twitter route, I would suggest that you choose hashtags for a subject that you know something about (finance, sports, music, space discovery, etc.) and that, once you have chosen your hashtags, you repeat the analysis (saving the tweets each time) over a few days. Depending on the hashtag, the stream of tweets might move very quickly or very slowly. Repeating the analysis would give you a good basis for comparison, perhaps in reaction to current events. Be careful when you are choosing a hashtag to make sure that its tweets are mostly in English, otherwise the sentiment lexicon matching won’t work. Remember that with a developer account, you cannot request an infinite number of tweets per day.
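
To make the day-by-day comparison and the English-language caveat more concrete, here is a rough sketch, again in Python for illustration rather than the R used in the class notebooks. The lexicon, the sample tweets, and the `summarize_snapshot` helper are all invented for this example. It assumes you have already saved each day's tweets as a list of strings.

```python
import re

# Hypothetical toy lexicon (+1 positive, -1 negative), not the real bing lexicon.
TOY_LEXICON = {"good": 1, "great": 1, "win": 1, "bad": -1, "awful": -1, "lose": -1}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def summarize_snapshot(tweets, lexicon=TOY_LEXICON):
    """Summarize one saved batch of tweets.

    'net' is positive matches minus negative matches; 'coverage' is the
    share of all tokens found in the lexicon. Very low coverage can be a
    hint that many tweets are not in English.
    """
    tokens = [word for tweet in tweets for word in tokenize(tweet)]
    matched = [word for word in tokens if word in lexicon]
    net = sum(lexicon[word] for word in matched)
    coverage = len(matched) / len(tokens) if tokens else 0.0
    return {"tweets": len(tweets), "net": net, "coverage": coverage}

# Invented sample data standing in for tweets saved on two different days.
snapshots = {
    "day 1": ["Great win today", "What a good game"],
    "day 2": ["Awful result", "We lose again, bad day"],
}

summaries = {day: summarize_snapshot(tweets) for day, tweets in snapshots.items()}
for day, summary in summaries.items():
    print(day, summary)
```

Keeping one summary per saved batch makes it easy to see whether the net sentiment for a hashtag shifts from one day to the next.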

If you choose this option, getting a developer account at Twitter in the current environment takes about a week.  After you have it set up, creating the app to get the codes necessary is a relatively fast process.  Plan ahead accordingly. When you are requesting it, you can simply state as justification that you are working on a class project about sentiment on Twitter. You can even send the URL of this assignment page.

https://twitter.com/DJWrisley/status/1255751117461434368?s=20

Alternative to this Twitter assignment: if you have been active enough on Twitter to have more than 1500 tweets, and you either have another account with a similar number of tweets or have a friend who has been similarly active, you can follow the instructions in the original case study #7 from Text Mining with R and compare the two Twitter archives. Instructions for requesting your Twitter archive are here. Please make sure that you let me know which Twitter handles you are comparing.

Good luck!