Text mining for the humanities?

What might we want to know about interpreting and understanding a work of literature that digital text mining could help with? I went to part of a dissertation proposal defense today (Bei's). Someone on her committee pointed out that literary scholars use digital tools for pre-critical work: finding the books they wish to criticize. Humanities scholars would benefit only if the text mining strategies used for classification had some way of giving them access to surprising or new meanings.

But what if these tools were used for post-critical work, to see whether the critical reception of a book, or of a group of books, had some basis in the frequency of word occurrences in the text? Say, for instance, librarians in the late 19th century deemed some books "realistic" and "true to life" and other books "improbable" or "false." Would there be anything in a text analysis to back up this kind of distinction? Particularly if the two sets of books were both about orphans... (see the post comparing books by Yonge and Finley)
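To make the question a little more concrete, here is a minimal sketch in Python of the kind of comparison I have in mind: take two groups of texts, count how often each word occurs relative to the total, and list the words that are most over-represented in one group. The function names, the grouping into "realistic" and "improbable," and the sample sentences are all my own placeholders, not anything from Bei's proposal or from the actual books, and a real study would need full texts, stopword handling, and some statistical care.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def relative_frequencies(texts):
    """Return each word's share of all tokens across a group of texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}


def most_distinctive(group_a, group_b, top_n=10):
    """Rank words by how much more frequent they are in group_a than in group_b."""
    freq_a = relative_frequencies(group_a)
    freq_b = relative_frequencies(group_b)
    vocab = set(freq_a) | set(freq_b)
    diffs = {w: freq_a.get(w, 0.0) - freq_b.get(w, 0.0) for w in vocab}
    return sorted(diffs.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Placeholder sentences standing in for whole books (hypothetical, not quotations).
realistic = ["The orphan swept the kitchen and mended her dress."]
improbable = ["The orphan found a fortune and a long lost duke."]

for word, diff in most_distinctive(realistic, improbable, top_n=5):
    print(f"{word}: {diff:+.3f}")
```

Whether differences like these line up with the librarians' labels is exactly the open question; the sketch only shows that the comparison itself is simple to set up.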

It seems to me that text mining might be useful for criticizing cultural distinctions made between different sorts of books, such as books designated high or low culture.

If these tools were accessible... Bei said they weren't practical yet. But I still wonder why they would be useful. It seems to me children's literature scholarship is the most likely venue, in part because we're always concerned with reading level (and so need to think about running comprehensibility analyses on texts) and in part because we're concerned with "good" or "bad" books for kids. We want to distinguish quality from non-quality in order to select which books get handed to kids. Even more importantly for critical literary scholarship, we have to ask ourselves careful questions about what we mean by those distinctions, and whether our assessments change over the course of history.
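As one example of a comprehensibility analysis, here is a rough Python sketch of the Flesch Reading Ease score, which combines average sentence length with average syllables per word (higher scores mean easier text). The syllable counter is a crude assumption of mine that simply counts groups of vowels, so the numbers are approximate at best, and this is only one of several readability measures one might choose.

```python
import re


def count_syllables(word):
    """Approximate syllables as the number of vowel groups (a crude heuristic)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text):
    """Flesch Reading Ease: 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word and one sentence")
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


# Made-up sample sentences, used only to show the score moving in the expected direction.
simple = "The cat sat. The dog ran."
dense = "Incomprehensible terminology obfuscates understanding considerably."

print(round(flesch_reading_ease(simple), 1))
print(round(flesch_reading_ease(dense), 1))
```

A score like this says nothing about whether a book is "good" or "bad," of course; it only measures surface difficulty, which is part of why the distinctions above need careful questioning.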
