Week 6: Concretizing Text, February 12
Note: This week, instead of Zoom office hours on Wednesday, I will hold office hours right after class on February 12, 2-4 PM, in my office, Moore 2005a. You can sign up in class or via this link: https://tinyurl.com/officehoursnguyen
PART 1: Frameworks (30 min)
What is possible to “know” by computing or reading? (Bode, 2023)
The supposed binary/incompatibility of computational technologies and literary studies → rethinking computational practices and the material conditions of knowledge, not just as distanced representation
Transparency of methodologies, possibilities, limitations: What are we doing with the text? What is knowable?
- What is reductive?
- What is generative?
Though often conflated with distant reading computational modeling makes a principled departure from assertions of transparency, presenting data and models not as mechanisms for transmitting patterns but heuristic methods for complicating and deepening understandings of literary phenomena…In computational modeling, data and models do not reveal literary phenomena as they really are but re-present or externalize an understanding of them… [crude simplifications of a complex reality]
Katherine Bode, "What's the Matter with Computational Literary Studies?" (see also Reading by Numbers, 2012, and A World of Fiction, 2018)
DH Project Development (Nguyen)
- Data: The What You Have
- Development: The Research Question(s) & Interventions ***
- Publication: The What You Want It To Be
PART 2: PRACTICES (60 min)
Practice 1: Computational Text Analysis with Voyant
- For class we will focus on Voyant Tools and this dataset; download the dataset here. Optionally, bring your own dataset in .txt file format. We will explore one document in the corpus.
- Explore
- Cirrus
- What are the most frequent words in the corpus?
- Remove the stop word ‘like’ by selecting ‘define options for this tool’
- Terms
- What do the colors red and green mean for a term? (Experimental feature 'categories'; explore https://voyant-tools.org/docs/#!/guide/categories)
- TermsBerry
- What are the top two terms?
- What happens when you hover over a term in the TermsBerry and Trends visualization panels?
- Reference: https://voyant-tools.org/docs/#!/guide/termsberry
- Contexts
- Explore the contexts for the top two most frequent words. What did you learn?
- Reference: https://voyant-tools.org/docs/#!/guide/contexts
- Correlations
- Select two terms to study the correlation between them. What might this mean? What is correlation vs. significance?
- Reference: https://voyant-tools.org/docs/#!/guide/correlations
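Outside of Voyant's interface, the same word-frequency exploration can be sketched in Python. This is a minimal illustration, not Voyant's actual code: the sample text is a placeholder, and the tokenizer and stop-word list are assumptions standing in for Voyant's defaults and the 'define options for this tool' stop-word setting.

```python
import re
from collections import Counter

# Placeholder text standing in for one document from the class corpus
text = "I like the city like a memory, and the city remembers the rain"

# Tokenize into lowercase words (a rough stand-in for Voyant's tokenizer)
tokens = re.findall(r"[a-z']+", text.lower())

# Remove stop words, mirroring adding 'like' to the stop-word list in Cirrus
stop_words = {"the", "a", "and", "i", "like"}
filtered = [t for t in tokens if t not in stop_words]

# Most frequent remaining terms, the kind of ranking Cirrus and Terms display
print(Counter(filtered).most_common(3))
```

Changing the `stop_words` set and re-running shows how strongly a frequency view like Cirrus depends on these filtering choices.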
Voyant Tutorials Slides by Dr. Wendy Kurtz PDF Download
Link to Cindy Nguyen Slides and Case Study of Vietnamese Visual Texts
Final Project Milestone 2: Carving Time and Energy Plan (20 minutes)
*Reminder: align your tasks with 'getting the proposal done within this quarter' rather than 'getting the entire project done.'
Given the short nature of our class, you most likely won't complete the entire project, so I would focus your attention on completing the proposal. You *could* also include a rough 'proof of concept' demo, images, or examples of what you hope the project might look like at the end. This could go into your public communication blog post or in the appendix of your proposal.
→ In-class quick reflection / midterm check-in: http://tinyurl.com/dh201checkin
- What is one thing you have learned so far?
- Identify the aspects of the course you have found most useful or valuable for learning.
- What suggestions would you make to me for improving the course?
Practice 2: Topic Modeling Tool (Optional)
For class 6 we will focus on Voyant Tools (which you don't need to download), but if you would like, here are optional instructions for topic modeling:
- Follow these GitHub instructions to download the Topic Modeling Tool: https://senderle.github.io/topic-modeling-tool/documentation/2017/01/06/quickstart.html (After downloading, you might need to right-click and select Open.)
- Read DSC #20 by Quinn Dombrowski: “Xanda Rescues the Topic Model Disaster”
- Read “Very basic strategies for interpreting results from the Topic Modeling Tool” by Miriam Posner and Andy Wallace
- Read “Topic modeling made just simple enough” by Ted Underwood
- Topic modeling with MALLET: if you use the Topic Modeling Tool for a GUI-based interface, be sure to go into the “optional settings” and remove the text in the “Tokenize with regular expression” field. Also, your text files must be saved as UTF-8; otherwise it won’t work.
Resources
On Data and Publishing
UCLA Library – Access Collections
This link includes many tools including the following data and publication platforms:
- Data Dryad – An open data publishing platform, Dryad allows UC researchers to archive and publish their data. Free for UC community.
- UCLA Dataverse – UCLA’s local data publishing repository
- eScholarship – UC’s scholarly publication platform, an option for publishing white papers
In addition, Zenodo is another data publishing platform to consider.
On Text Analysis
Multilingual NLP – Quinn Dombrowski
AntConc to find term frequencies
TagAnt multi-language segmenter and Part-Of-Speech (POS) tagger
Some favorite places to read about text analysis
- Cultural Analytics (academic journal)
- Ted Underwood’s blog such as https://tedunderwood.com/2013/02/20/wordcounts-are-amazing/ (quick blog post)
- Katherine Bode’s Reading By Numbers: Recalibrating the Literary Field
Intense debates, if you want a deep dive into computational literary studies/cultural analytics/statistical modeling: in 2019, Nan Da wrote a 'field-breaking' article (if you're curious, it's here) and the DH field responded. Literary scholar and digital humanist Ted Underwood responds here and also summarizes Da's argument: “Da’s own argument remains limited by its assumption that statistics is an alien world, where humanistic guidelines like “acknowledge context” are replaced by rigid hypothesis-testing protocols. But the colleagues who follow her will recognize, I hope, that statistical reasoning is an extension of ordinary human activities like exploration and debate. Humanistic principles still apply here. Quantitative models can test theories, but they are also guided by theory, and they shouldn’t pretend to answer questions more precisely than our theories can frame them. In short, I am glad Da wrote “The Computational Case” because her argument has ended up demonstrating—as a social gesture—what its text denied: that questions about mathematical modeling are continuous with debates about interpretive theory.” (In other words, quantitative models and interpretive theory work together; the digital works together with the humanities.)
Michael Gavin, “Is There a Text in My Data? (Part 1): On Counting Words,” Journal of Cultural Analytics 5.1 (2020), https://culturalanalytics.org/article/11830-is-there-a-text-in-my-data-part-1-on-counting-words
Andrew Piper, “Novel Devotions: Conversional Reading, Computational Modeling, and the Modern Novel,” New Literary History 46 (2015): 63-98. PDF