Week 6: Concretizing Text, February 12
Note: This week, instead of Zoom office hours on Wednesday, I will hold office hours right after class on February 12, 2-4 PM, in my office, Moore 2005a. You can sign up in class or via this link: https://tinyurl.com/officehoursnguyen
PART 1: Frameworks (30 min)
What is possible to “know” by computing or reading? (Bode, 2023)
The supposed binary/incompatibility of computational technologies and literary studies → rethinking computational practices and the material conditions of knowledge, not just as distanced representation
Transparency of methodologies, possibilities, limitations: What are we doing with the text? What is knowable?
- What is reductive?
- What is generative?
Though often conflated with distant reading computational modeling makes a principled departure from assertions of transparency, presenting data and models not as mechanisms for transmitting patterns but heuristic methods for complicating and deepening understandings of literary phenomena…In computational modeling, data and models do not reveal literary phenomena as they really are but re-present or externalize an understanding of them… [crude simplifications of a complex reality]
Katherine Bode, "What's the Matter with Computational Literary Studies?" (see also Reading by Numbers, 2012, and A World of Fiction, 2018)
DH Project Development (Nguyen)
- Data: The What You Have
- Development: The Research Question(s) & Interventions ***
- Publication: The What You Want It To Be
PART 2: PRACTICES (60 min)
Practice 1: Computational Text Analysis with Voyant
- For class we will focus on Voyant Tools and this dataset; download the dataset here. Optionally, bring your own dataset in .txt file format. We will explore one document in the corpus.
- Explore
- Cirrus
- What are the most frequent words in the corpus?
- Remove the stop word ‘like’ by selecting ‘define options for this tool’
- Terms
- What do the colors red and green mean for a term? (Experimental feature 'categories'; explore https://voyant-tools.org/docs/#!/guide/categories)
- TermsBerry
- What are the top two terms?
- What happens when you hover over a term in the TermsBerry and Trends visualization panels?
- Reference: https://voyant-tools.org/docs/#!/guide/termsberry
- Contexts
- Explore the contexts for the top two most frequent words. What did you learn?
- Reference: https://voyant-tools.org/docs/#!/guide/contexts
- Correlations
- Select two terms to study the correlation between them. What might this mean? What is correlation vs. significance?
- Reference: https://voyant-tools.org/docs/#!/guide/correlations
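Outside of Voyant's interface, the same word-frequency exploration can be sketched in Python. This is a minimal illustration, not Voyant's actual code: the sample text is a placeholder, and the tokenizer and stop-word list are assumptions standing in for Voyant's defaults and the 'define options for this tool' stop-word setting.

```python
import re
from collections import Counter

# Placeholder text standing in for one document from the class corpus
text = "I like the city like a memory, and the city remembers the rain"

# Tokenize into lowercase words (a rough stand-in for Voyant's tokenizer)
tokens = re.findall(r"[a-z']+", text.lower())

# Remove stop words, mirroring adding 'like' to the stop-word list in Cirrus
stop_words = {"the", "a", "and", "i", "like"}
filtered = [t for t in tokens if t not in stop_words]

# Most frequent remaining terms, the kind of ranking Cirrus and Terms display
print(Counter(filtered).most_common(3))
```

Changing the `stop_words` set and re-running shows how strongly a frequency view like Cirrus depends on these filtering choices.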
Voyant Tutorials Slides by Dr. Wendy Kurtz PDF Download
Link to Cindy Nguyen Slides and Case Study of Vietnamese Visual Texts
Final Project Milestone 2: Carving Time and Energy Plan (20 minutes)
*Reminder: align your tasks with 'getting the proposal done within this quarter' rather than 'getting the entire project done.'
Given the short nature of our class, you most likely won't complete the entire project, so I would focus your attention on completing the proposal. You *could* also include a rough 'proof of concept' demo, images, or examples of what you hope the project might look like at the end. This could go into your public communication blog post or in the appendix of your proposal.
→ In-class quick reflection / midterm check-in: http://tinyurl.com/dh201checkin
- What is one thing you have learned so far?
- Identify the aspects of the course you have found most useful or valuable for learning.
- What suggestions would you make to me for improving the course?
Practice 2: Topic Modeling Tool (Optional)
For class 6 we will focus on Voyant Tools (which you don't need to download), but if you would like, here are optional instructions for topic modeling:
- Follow these GitHub instructions to download the Topic Modeling Tool: https://senderle.github.io/topic-modeling-tool/documentation/2017/01/06/quickstart.html (After downloading, you might need to right-click and select Open.)
- Read DSC #20 by Quinn Dombrowski: “Xanda Rescues the Topic Model Disaster”
- Read “Very basic strategies for interpreting results from the Topic Modeling Tool” by Miriam Posner and Andy Wallace
- Read “Topic modeling made just simple enough” by Ted Underwood
- Topic modeling with MALLET: if you use the Topic Modeling Tool for a GUI-based interface, be sure to go into the “optional settings” and remove the text in the “Tokenize with regular expression” field. Also, your text files must be saved as UTF-8; otherwise it won’t work.
Resources
On Data and Publishing
UCLA Library – Access Collections
This link includes many tools including the following data and publication platforms:
- Data Dryad – An open data publishing platform, Dryad allows UC researchers to archive and publish their data. Free for UC community.
- UCLA Dataverse – UCLA’s local data publishing repository
- eScholarship – UC’s scholarly publication platform, an option for publishing white papers
In addition, Zenodo is another data publishing platform to consider.
On Text Analysis
Multilingual NLP – Quinn Dombrowski
AntConc to find term frequencies
TagAnt multi-language segmenter and Part-Of-Speech (POS) tagger
Some favorite places to read about text analysis
- Cultural Analytics (academic journal)
- Ted Underwood’s blog such as https://tedunderwood.com/2013/02/20/wordcounts-are-amazing/ (quick blog post)
- Katherine Bode’s Reading By Numbers: Recalibrating the Literary Field
Intense debates, if you want a deep dive into computational literary studies/cultural analytics/statistical modeling: in 2019, Nan Da wrote a 'field-breaking' article (if you're curious, it's here) and the DH field responded. Literary scholar and digital humanist Ted Underwood responds here and also summarizes Da's argument: “Da’s own argument remains limited by its assumption that statistics is an alien world, where humanistic guidelines like “acknowledge context” are replaced by rigid hypothesis-testing protocols. But the colleagues who follow her will recognize, I hope, that statistical reasoning is an extension of ordinary human activities like exploration and debate. Humanistic principles still apply here. Quantitative models can test theories, but they are also guided by theory, and they shouldn’t pretend to answer questions more precisely than our theories can frame them. In short, I am glad Da wrote “The Computational Case” because her argument has ended up demonstrating—as a social gesture—what its text denied: that questions about mathematical modeling are continuous with debates about interpretive theory.” (In other words, quantitative models and interpretive theory work together; the digital works together with the humanities.)
Michael Gavin, “Is There a Text in My Data? (Part 1): On Counting Words,” Journal of Cultural Analytics 5.1 (2020), https://culturalanalytics.org/article/11830-is-there-a-text-in-my-data-part-1-on-counting-words
Andrew Piper, “Novel Devotions: Conversional Reading, Computational Modeling, and the Modern Novel,” New Literary History 46 (2015): 63-98. PDF