Week 3 Readings & Responses: Humanities Data

Important: Sign up for facilitation here by January 15

Complete readings and post your response by Friday January 19. Post a peer comment by Sunday January 21.

Submit an extended ‘comment’ on this blog post below with the following:

  1. Write one takeaway point from the readings. (This could be a few brief sentences, reflecting on any of the following questions: Why is this significant? How might I apply some of these ideas to my own research and life? You can draw the takeaway point from a summation of the readings, put the readings in conversation with one another, or focus on just one reading.)
  2. Write one clarification question to the class. What did not make sense, or what do you want to understand better?
  3. Write one discussion question to the class. When writing a discussion question, think about how the question might open up a multi-directional conversation about the topic.

Readings

  1. Posner, Miriam. “Humanities Data: A Necessary Contradiction.”
  2. Rawson, Katie, and Trevor Muñoz. “Against Cleaning.”
  3. Groskopf, Christopher. “The Quartz Guide to Bad Data.” Quartz.
  4. Wickham, Hadley. “Tidy Data.” Journal of Statistical Software [Online], Volume 59, Issue 10 (12 September 2014).
  5. Sanders, Ashley. “Humanistic Data – Classifying Individuals & Visualizing Silences,” Chapter 3 of Visualizing History’s Fragments. Link to preprint file here.
  6. Explore The Library of Missing Datasets (2016), an art installation by Mimi Onuoha.

Preparation for Class 3

  1. Download OpenRefine. Once you’ve done that, double-click on the application to be sure it opens. If you’re on a Mac and you get a warning that says “macOS cannot verify the developer of OpenRefine.app,” please see these instructions.
  2. Optional: Bring your own “dataset.” It can be in any form (spreadsheet, PDF, image); we’ll set aside some time to work through how to transform the source into data.

*If you have any issues with the download, don’t worry: we will have a few backup machines with the software, and we can also pair up in groups to troubleshoot. It would be wonderful if you could get it downloaded to your local machine so that you can use it on your own data in the future, but if you can’t get to it before Week 3, don’t worry about it.

Responses to “Week 3 Readings & Responses: Humanities Data”

  1. xinmenghu

    Coco Hu
    Digital Humanities 201 Week 3 Reading Takeaways

    One of the most essential takeaways from Katie Rawson and Trevor Muñoz’s article is to recognize the complexity and nuances of data cleaning in humanities research. The article points out that data cleaning in humanities research requires a more detailed and context-sensitive approach than in other fields. It is not simply a matter of correcting errors and inconsistencies in datasets; it also includes working with the underlying information and examining contextual aspects pertinent to the humanities. Recognizing the complexity of data cleaning in humanities research allows researchers to design more effective and contextually sensitive approaches to data management, resulting in more accurate and relevant research outcomes.

    Also concerning Rawson and Munoz’s reading, I want to know more about what specific challenges humanists encounter when engaging in data-cleaning processes within the context of historical and cultural data, and how these challenges differ from those encountered in other fields.

    I have come up with a discussion question from Hadley Wickham’s article introducing “tidy data” as an effective tool for providing a standard way of structuring a dataset, which makes it easier to manipulate, model, and visualize data. I am wondering how the principles of tidy data and the development of tidy tools might impact the reproducibility and transparency of data analysis in research and industry.
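
    For a concrete sense of what Wickham means, here is a minimal, hypothetical sketch in Python with pandas (the paper itself uses R, but the idea is the same): a small “messy” table with one column per year is reshaped so that each variable has its own column and each observation its own row. The course names and enrollment numbers are made up purely for illustration.

    ```python
    import pandas as pd

    # A "messy" wide table: one row per course, one column per year (made-up numbers).
    messy = pd.DataFrame({
        "course": ["DH 201", "DH 150"],
        "2022": [18, 25],
        "2023": [21, 30],
    })

    # Tidy form: every variable (course, year, enrollment) gets its own column,
    # and every observation (one course in one year) gets its own row.
    tidy = messy.melt(id_vars="course", var_name="year", value_name="enrollment")
    print(tidy)
    ```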
    


    1. Melinda Linyu Xiang

      Yes, data transparency is a huge problem. The datasets owned by private companies and governments are massive, and most of the time, what we can get is only the outcome. It is difficult to eliminate the possibility of data manipulation without knowing the initial data, the research methodology, and the analysis process. Perhaps introducing a third, neutral institution to appraise published outcomes would help.


  2. Melinda Linyu Xiang

    After reading Miriam’s article, I have gained a comprehensive understanding of how digital tools contribute to enhancing the support for humanities, making findings more traceable and valid from others’ perspectives. Technologies continue to advance daily, and data formats may eventually become unprocessable, much like our current inability to apply most software to recordings stored on CDs. Beyond technical barriers, societal issues are in constant change, such as those related to gender and race.

    From my perspective, digital humanities are making strides in documenting previously uncounted elements: undocumented gender, unheard voices, and overlooked events. However, there exists a potential risk when analyzing humanities data through digital means, particularly in the realm of categorization. Human minds have a tendency to create schemas, placing diverse individuals and subjects into similar categories for efficiency. Unfortunately, this inclination extends to man-made digital tools. Consequently, if we establish fixed categories and employ them in data analysis, we risk overlooking valuable nuances. Worse still, when we extend categorization to influence society, we may inadvertently reinforce rigid stereotypes.

    I am interested in the process of how we decide which aspects to transform from recordings into data – accent, tone, flow, choice of words, voice, or something else?


    1. edenwetzel

      Melinda, I think this is a very interesting question you have brought up, and I am interested to learn more about what our peers think as well, as I have little experience in this field myself! It may be useful in some settings to transform one or a few of these aspects from recordings into data, depending on what you are trying to analyze, and it may be valuable to transform all of these aspects as an act of preservation. What do others think?


    2. jo alvarado

      Melinda, this is such a fascinating question! Since I’m in literary studies, the act of transcription and translation is such a vexed, complicated, and subjective process. Transforming recordings into print opens up many questions on authenticity and preservation like Eden points out above. I agree that it depends on the kind of story the researcher wants to tell. But in making these specific choices, I wonder how we can avoid suppressing diversity and constructing restrictive schemas. I think this opens up to a bigger discussion about data transparency and researcher bias, and we must ensure that we are communicating nuanced stories rather than objective and static conclusions.


      1. Melinda Linyu Xiang

        Hi Jo! Yes, I believe the choice of which aspects shall be taken into consideration, and how we can be neutral during the process, might open up a huge discussion. And since you are in literary studies and I am interested in learning about linguistics, I am looking forward to having further communication with you!


    3. craigdavidsmith

      Melinda, I also find your question regarding the “aspects to transform from recordings into data” to be intriguing. As Jo stated, transcription and translation are such complicated processes. As someone new to digital humanities, I am also curious what means are typically provided to researchers to directly access the source material. How much access should be given?


  3. jo alvarado

    Reading Miriam Posner’s “Humanities Data” and Katie Rawson and Trevor Muñoz’s “Against Cleaning” together helped illuminate the various ways we can approach digital humanities work that engages with the historical, cultural, and diverse contexts of datasets. As a literary scholar, it was beneficial to think about humanities data through the methods of archiving/archival work – that is, asking important questions such as: how do we collect objects? where do the objects come from? who is collecting them? who has access to these objects? how do the objects speak to each other? what story are they telling? Through this lens, it’s interesting to interrogate how we structure data in a way that preserves rather than suppresses its diversity in order to articulate a particular kind of story. In the same vein, I found Mimi Onuoha’s “Library of Missing Datasets” so fascinating because she understands that agency, collection, and ownership are sites of power. Data isn’t objective; it is intertwined with institutions of power and can help reproduce modes of exclusion, oppression, and violence.

    I’m interested in how these readings have reoriented your relationship to your own research, datasets, and data management. Do you have a newfound understanding of how you can use digital tools to explore your humanist inquiries?


    1. edenwetzel

      Responding to your question, I think I have a newfound appreciation for how humanists view datasets and their definitions of “data,” especially as someone who has spent almost all of my academic career dealing with the “traditional” view of data as a life scientist. Before completing this week’s readings, I had not really considered the idea that some humanists may define their nuanced understandings of different concepts/phenomena as data itself, and that it can be very difficult to gather traditional tangible data for some humanities research. I think I will take this acknowledgement with me throughout the rest of the class and into my public health work.


    2. rachel1232

      Engaging with these readings has led me to rethink my approach to research, particularly in the realm of Japanese contemporary novels. Traditionally, my research has been deeply textual and interpretative, with a focus on narrative structures, themes, and character development within the context of Japanese culture and history. However, the insights gleaned from these articles have introduced me to a novel conceptualization of data in the humanities and the potential of digital tools.

      The notion that humanistic data extends beyond mere numbers or discrete categories, as discussed in the Posner article, resonates with my study of novels. I now perceive each narrative, character interaction, or thematic development as a unique data point that, while not quantifiable in the traditional sense, holds value in understanding the broader tapestry of Japanese contemporary literature. This perspective encourages me to look beyond the surface of the text and consider the metadata – the data about data, such as publication context, authorship, and reception – as integral elements of my research.


    3. vmariebarrios

      These readings have definitely opened a new avenue of inquiry for me. I am also used to more “traditional” datasets, and so thinking about how I can pursue data inquiry within my work as a literary scholar has been something of a roadblock. After reading these articles however, I feel I have a better understanding of how humanists can create data and models with different kinds of “objects”, and how that can be useful in my own work, particularly as a Chicanx literary scholar, since much of my work will probably be in the “hidden” data space.


    4. esperanzabey

      I totally agree with your point on Mimi Onuoha’s “Library of Missing Datasets.” This is such an issue within the field of archives especially in regards to archival erasure/silences, which are both examples of symbolic annihilation. This happens so much within institutional repositories, which often try to take an objective approach. However, there have been many calls for action – for archivists to state their political orientation to the material they’re stewarding. This would also include taking accountability for the silences and erasures within archives spaces. Great post!


  4. edenwetzel

    One thing I really took away from the readings was a conversation from the Posner article, which was the acknowledgement that humanists may define and view data in a way that differs from traditional quantitative scientific research. This was something I had not really considered previously as a life scientist who is used to dealing with datasets consisting of discrete values like blood pressure, or binomial data like whether someone is vaccinated.

    Something that I would like to understand more about is the idea of non-scalability theory, as discussed in the Against Cleaning article. I understand the idea that not all data can be or should be aggregated or cataloged, but I would like to have a fuller understanding of how indexing helps to solve the problem of non-scalability.

    After reading the Quartz guide article, I wondered about the claim that manually edited data could potentially be bad data. In this case, would data “cleaning” as mentioned in the Against Cleaning article be considered data editing, therefore making the dataset less reliable?


    1. xinmenghu

      Hi! First of all, thank you for bringing up your perspective as a life scientist! Responding to your question, I think manually editing data doesn’t mean changing and rewriting data as individuals go along; rather, it means making changes based on set rules and models. In this case, as long as the data cleaning rules have been developed and applied consistently, the process will not make the dataset less reliable.
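
      One way to picture this “set rules” idea is as an explicit, documented mapping that is applied uniformly rather than as silent hand edits. Here is a minimal, hypothetical sketch in Python with pandas, in the spirit of the “Au Gratin Potatoes” example from “Against Cleaning” (the variant spellings below are invented for illustration):

      ```python
      import pandas as pd

      # Invented variant spellings of one dish, as they might appear in a menu dataset.
      dishes = pd.Series(["Potatoes au gratin", "Au Gratin Potatoes", "potatoes, au gratin"])

      # The cleaning "rule" is written down once and can be published with the data,
      # so every edit is documented and repeatable.
      normalization_rules = {
          "Potatoes au gratin": "au gratin potatoes",
          "Au Gratin Potatoes": "au gratin potatoes",
          "potatoes, au gratin": "au gratin potatoes",
      }

      cleaned = dishes.replace(normalization_rules)
      print(cleaned)
      ```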


    2. aaronggoodman

      Hi Eden,

      I was also intrigued by the scalability framework referenced in the “Against Cleaning” article. Also going with Posner’s depiction of humanities “data” as anything but sterile, it is clear that a lot of literary and archival “data” mustn’t be treated as scalable items to be processed with one-size-fits-all methods. This makes sense but it’s a challenging pickle to be in when enticed with the potential of scale.

      I suppose, as a few of these readings have suggested, that success in aggregation or indexing requires the input of a trained social scientist or historian. I work in a music library, and while the project I am assigned to is one of great scale, each step in our digitization workflow depends on expertise. The processes for preservation and transfer to digital audio are streamlined but require careful configuration and tuning by the studio technician, whose “engineering” choices are informed by handbooks of historical EQ settings, for example. Even record companies like Columbia and RCA Victor issued discs with all kinds of textual and technical inconsistencies, so as a discographic database our project documents them all. In this way, the detailed and focused work requires nonscaled methods.

      In other ways this project is super scaled. Our output is free online streaming (digital/digitized) audio, which represents massive scale, as pointed out by Rawson and Munoz. Beyond this format, though, all kinds of metadata must be aggregated and indexed in order for our database to offer sorting or browsing by genre, culture, region, etc.

      This categorization/aggregation process is precisely where careful consideration of history and justice comes into play, on the part of the curator. Leaning into Rawson and Muñoz and Tsing’s scalability framework, I suppose effective curation is a job which demands constant consideration of both scalability and nonscalability.


  5. rachel1232

    In Sanders’ Chapter 3, titled “Humanistic Data—Classifying Individuals & Visualizing Silences,” the emphasis is placed on the approach humanists adopt towards data selection and organization, stemming from a recognition that sources are not merely data but are imbued with nuanced categories, codes, and silences.
    Takeaway points:
    A pivotal element discussed is the significance of probing into the absent data, or ‘silences,’ and deliberating on the potential reasons behind these informational voids. Missing data is essential and can point to a great deal of understudied information.
    In addition, I learned that “prosopography,” defined as the collective study of individuals’ lives within a specific societal segment during a particular era, aims to shed light on the shared characteristics and dynamics of the group.
    Clarification question:
    How does the transparency in documenting classification decisions aid in understanding and interpreting the ‘silences’ or gaps within the datasets, such as the brief tenures of certain governors?
    Discussion question:
    Towards the end of the chapter, it touches upon the logistical aspects of data organization, suggesting a transition from spreadsheets to databases. I’m still not sure what a database is. What are a database’s characteristics, and why are databases recommended for their capacity to overcome the inherent limitations of spreadsheets, facilitating a more robust and scalable approach to data management?


    1. Melinda Linyu Xiang

      Hi Rachel! Just based on my limited knowledge of databases, I believe the difference between spreadsheets and databases is the processability for more complex research purposes. A database can contain more connections among variables, supporting something like multiple-variable analysis (I guess), while spreadsheets only handle two-dimensional data. My understanding might not be that accurate, and I am open to more discussion!
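
      To make the spreadsheet-versus-database distinction a little more concrete: a relational database stores several related tables and lets you query across them, whereas a flat spreadsheet keeps everything on one two-dimensional sheet. Here is a minimal sketch using Python’s built-in sqlite3 module; the governors/regions tables and their contents are hypothetical, loosely echoing the Sanders chapter.

      ```python
      import sqlite3

      conn = sqlite3.connect(":memory:")  # throwaway in-memory database
      cur = conn.cursor()

      # Two related tables instead of one flat sheet (hypothetical schema and rows).
      cur.execute("CREATE TABLE regions (id INTEGER PRIMARY KEY, region TEXT)")
      cur.execute("CREATE TABLE governors (id INTEGER PRIMARY KEY, name TEXT, region_id INTEGER)")
      cur.execute("INSERT INTO regions VALUES (1, 'Constantine')")
      cur.execute("INSERT INTO governors VALUES (1, 'Example Governor', 1)")

      # A join pulls related information together without duplicating it in every row.
      cur.execute("""
          SELECT governors.name, regions.region
          FROM governors JOIN regions ON governors.region_id = regions.id
      """)
      print(cur.fetchall())
      ```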


  6. collinmoat
    1. I found the step of class creation especially interesting. I liked that Ashley Sanders addresses the inherent reduction of complexity in creating general categories and urges us to mitigate the loss of diversity in aggregation by creating classes that respect the original context. This could include using the vocabulary found in the sources and observing the contours and nuances of cultural institutions as well as other ways of trying to maintain an emic perspective on the information. Additionally, I enjoyed the section about data silences in the same chapter. Silences in the historical record are very relevant to the field I work in (Classics). Sometimes, these silences are overlooked. For example, canonical texts have such a gravity that they not only marginalize lesser read texts but also distract from the fact that, for example, 95–97% of Ancient Greek texts have disappeared. Other times, these silences are gestured at. For example, we have many different kinds of inscriptions from diverse localities and times that offer important information about people about whom we would have no record otherwise. We may learn the names, families, occupations, etc., but these gesture at the rest of the information we do not have about the lives of these people, not to mention the innumerable others about whom we know nothing. As Sanders notes, this both makes the data that we do have all the more precious and urges us to make space in our research for those who have been marginalized (via reference and inference). Meanwhile, we must be careful in handling the data in such a way that minimizes introducing new silences (cf. Fordham’s five ways).

    2. With new paywalls and proprietary rights, how are digital humanists now ensuring that all the stages of their datasets (from raw to the most refined) are open and accessible for future digital humanists to reevaluate and reconfigure for their own questions and insights?

    3. Sanders notes that silencing happens and is not necessarily a bad thing at times because it can “highlight…cases that meet certain criteria” (p. 28). But if these cases become canonical in understanding a certain dataset, they can create a cycle of silencing other cases or attributes. Can you think of some instances of this in your field? What are some steps we can take to break cycles of silencing the same cases and attributes?


  7. vmariebarrios

    Reading “Against Cleaning” and “Tidy Data” was very informative for me in learning how digital humanities creates unique challenges and opportunities to analyze data sets, and to create data sets out of objects. I have an Economics background, but am now studying English, and these readings created a very interesting “combining of worlds” for me. One thing that strikes me is the way “cleaning data” is something that is still being worked through in the humanities. Unlike other disciplines, where there are set rules on things such as outliers and statistical significance, digital humanities seems to me to create a much more open space for us to analyze objects in new ways beyond traditional data sets.

    I would like to understand how, in the absence of a set way to “clean” or “create” datasets, digital humanists are able to engage in the peer review process and engage with one another, and what the “rules” (for lack of a better term) are for data transparency.

    I’m curious to know whether people in other disciplines found this way of thinking about and organizing “objects” and “data” helpful. My brain is split between my Economics training and my Literary training, so I am very interested in hearing people’s perspectives from other fields!


  8. aaronggoodman

    Katie Rawson and Trevor Munoz’s “Against Cleaning” piece makes some interesting points building upon Anna Tsing’s “valuable theoretical framework through which to approach… preserving diversity within [a] large dataset” (283). Through their examinations of practical “scalable” and “nonscalable” methods of cataloging in the context of “cleaning” extensive library collections, the authors highlight reasons for which one would adopt scalable or nonscalable mindsets for various projects, or different parts of a project. Especially with social science data, the expertise with which data must be handled demands a nonscalable frame; data are unique and alive. The potential and power of computation, though, also beckon scalable interests. Rawson and Munoz’s portrayal of digitization projects as prime examples of library scalability is interesting; I suppose it is up to librarians and curators, and those “checking” them, to determine the balance of scale and nonscale within the workflows of their projects. Rawson and Munoz’s piece highlights their commitment to preserving diversity within data; I hope this is the norm.

    My clarification question: what are indices exactly, and can they be implemented in all kinds of databases? (Rawson & Munoz 289). I’ve used indices to improve performance with GIS layers in PostgreSQL databases, for both tabular data and geometries (B-tree and GIST); but I don’t really know what I’m creating… Rawson and Munoz reference the indices’ explicit ability to improve user access; does the configuration of an index require an expert in the data’s respective field, be it food history or biodiversity?
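
    Roughly speaking, an index is an auxiliary lookup structure (a B-tree by default in PostgreSQL and SQLite; GiST for spatial data in PostGIS) that the database maintains on chosen columns so it can find matching rows without scanning the whole table. Below is a minimal, hypothetical sketch using Python’s built-in sqlite3 module; the table, columns, and rows are invented for illustration, and the spatial GiST case is PostgreSQL/PostGIS-specific and not shown.

    ```python
    import sqlite3

    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE recordings (id INTEGER PRIMARY KEY, label TEXT, year INTEGER)")
    cur.executemany("INSERT INTO recordings (label, year) VALUES (?, ?)",
                    [("Columbia", 1928), ("RCA Victor", 1931), ("Columbia", 1933)])

    # The index is a sorted structure on the 'label' column; queries filtering on
    # label can use it instead of reading every row.
    cur.execute("CREATE INDEX idx_recordings_label ON recordings (label)")

    cur.execute("SELECT year FROM recordings WHERE label = ?", ("Columbia",))
    print(cur.fetchall())
    ```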

    My discussion question: Rawson and Munoz highlight the need to balance scalability and nonscalability with humanistic data. Humanistic “data” tends to demand the expertise and care of nonscalable methods. While this is certainly true – that digital humanities research, in particular, must balance scale and nonscale for accurate and thorough presentation – is this not true of all fields?


  9. craigdavidsmith

    1. In her lecture “Humanities Data: A Necessary Contradiction,” Miriam Posner makes a number of compelling arguments about the ways in which libraries can meaningfully assist “traditional humanists.” Her anecdote of a humanist who describes an elaborate visualization of information without data is meant to point out the fundamentally different ways that humanists conceptualize information with respect to natural or social scientists. She furthers this conceptual difference by noting how a humanist might be offended by the reduction of a family photo album or a silent film series to a dataset and might view their own subjective expertise as infinitely more valuable. Posner does not take an ideological stance against this stereotypical humanist point of view; rather, she argues that the possibility of analyzing “something computationally” renders it an inevitability. I, however, would like to take a clear stance against this traditional humanist perspective. Although I come from the field of humanities, I have often been appalled by humanists’ disregard for facts and data and their overestimation of their own “expertise.” Obviously photo albums and silent film series are more than just datasets, but this does not mean that these datasets cannot be a powerful tool in their analysis. I, for one, have read much humanistic scholarship that would have been more informative and enlightening had it included datasets rather than solely the critic’s own subjective experience of the works.

    2. I found Katie Rawson and Trevor Muñoz’s article “Against Cleaning” to be quite informative, but I was confused by their assertion that one should avoid creating normalized values for the various forms of “Au Gratin Potatoes.” Might there be a better example of this point?

    3. Christopher Groskopf’s article “The Quartz Guide to Bad Data” discusses the many problems encountered with bad data and possible fixes. Which of these problems is most notable in your own particular field or experience?


    1. collinmoat

      Hi Craig! You make a good point about some humanists disregarding the analysis of some types of evidence as data. I think some people see this type of analysis as a threat to the “humanity” of the evidence since it often reduces it to something quantifiable, but this view casts the situation as a zero sum game, as if looking at the evidence computationally precludes more traditional analysis. As you say, there is a lot of scholarship that could benefit from a balance of both kinds of methodologies.


  10. esperanzabey

    Miriam Posner’s speech titled “Humanities Data: A Necessary Contradiction” was filled with quite a few takeaways in understanding how humanists think about data, how data mining is used by researchers, and the necessity to work against the intended use of data mining. As an Art History major during undergrad, I found that her fifth example, on data modeling, put everything into perspective by simplifying abstract research into categories of a data set. However, the most profound point was in the second-to-last paragraph, as she stated “taking datasets we’ve been given – which were not at all created for our purposes – and working against their grain or reinventing them to try and tease out the things we think are really interesting.” This point in particular reminds me of Saidiya Hartman’s book titled “Wayward Lives, Beautiful Experiments: Intimate Histories of Riotous Black Girls, Troublesome Women, and Queer Radicals.” Hartman’s use of silences/erasures in the archives of Black women navigating landscapes of impunity as data, while also liberating their narratives by engaging in critical fabulation, is precisely what I envision when I think of Posner’s quote. It also brings me to think of the use of data modeling and digital mapping with projects like UC Berkeley’s Urban Displacement Project.

    Clarification Question:
    What is the process of making an API?

    Discussion Question:
    Can APIs be dangerous or extractive to non-Western cultures?


    1. collinmoat

      Hi Esperanza! Thank you so much for bringing up Saidiya Hartman! I think her tool of critical fabulation is a really interesting comparison to the visualization of silences in datasets. I wonder how these methodologies could weave into each other. For instance, maybe the visualization of silence could identify previously unseen gaps in the archive, and as an appendix to the data/visualization, critical fabulation could engage with these gaps and fill the silence with narrative.


    2. Jo Fobbs (they/she)

      Hi Esperanza! I love the connections you drew between the Posner quote and Hartman’s “critical fabulation”! Her framework came to mind for me as well when reading the Sanders article and looking at the Library of Missing Data installation, but I feel like the quote you pulled is most relevant to the practice of narrative reconstruction. It’s so refreshing to see similar concepts be employed outside the context of Black Studies, as I’ve been concerned that the framework wouldn’t be seen as legitimate in quantitative studies or fields that are typically more eurocentric. My understanding of critical fabulation mainly comes from “Venus in Two Acts”, as I’ve yet to read Wayward Lives, but your reference here makes me excited to read it very soon!


  11. Jo Fobbs (they/she)
    1. The Sanders reading and Library of Missing Data brought to mind what role certain frameworks — namely Sylvia Wynter’s “deciphering practice” and Saidiya Hartman’s “critical fabulation” — can play in addressing the absence of data. Wynter and Hartman, similar to Sanders, read for the concealed facts, silences and gaps in various texts, but rather than “abandon[ing] any hope of completeness” (6), they use their generative frameworks to fill such gaps in their primary sources. For example, Hartman studies the life of Venus, an enslaved Black girl who was killed along the Middle Passage. There is scant information on Venus other than the fact that she died, prompting Hartman’s attempt to re-present, rearrange and reconstruct a narrative of her life that may be understood as more imaginative than factual. Would such a framework be viable if we think of narratives as “datasets” with their own respective gaps?

    2. For what kind of projects is data cleaning absolutely necessary? As someone who has primarily used qualitative research methods, and welcomes messiness and incoherency in their data collection, I felt a bit out of my comfort zone in reading about “tidy” data. I’m struggling to figure out how it might be relevant to my own work, as it just sounds very cold and sanitized to me (no pun intended).

    3. How can digital humanities scholars intending to create social justice projects using quantitative data come to terms with the racist history of data collection and statistical analysis?


  12. masonmcclay

    Miriam Posner’s speech “Humanities Data: A Necessary Contradiction” foregrounded a fundamental principle about abstraction that in the sciences is often referred to as “levels of analysis”. This idea refers to the dilemma that any biological, psychological, or social system will be represented at multiple levels of a concept hierarchy, and what we can infer from our constructs of the system of interest will be constrained by the level at which we represent it. In other words, it is very easy to reduce a complex social phenomenon to numbers and lose all of the context that provides relevance. In psychology, this often manifests as a fetishization of neural/brain data when attempting to explain something that exists primarily at a higher-order level of representation, like the experience of love. In a similar way, Miriam Posner points out that humanities knowledge is represented at much higher-order levels of construal. When we represent complex socio-political events via data, what surrounding context is lost? What I find especially crucial about this is that the data (in most cases at least) has to be taken as just one level of analysis and therefore cannot represent the overarching structure of a phenomenon.

    Clarifying question:

    When digitizing an archive or piece of media, how do we decide what information, or missing information, gets interpolated?

    Discussion question:

    What is absolutely necessary to represent before any data can be meaningfully interpreted?


  13. sparksdv

    Miriam Posner’s article on humanities data is a great overview of the utility and validity of digital humanities. The reading helped illustrate different ways it can be used to engage with historical, cultural, and other varieties of complex datasets. The example of how to model artistic periods, geographic movement, and the convention of time was directly applicable to my research on ceramic production, helping me theorize ways I might show changes within the time period. I am interested in knowing how we reposition our relationship with research and data management, and how we connect these models to wider audiences outside of academia.

