Text analysis as analysis

James O’Sullivan
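He previously taught at the University of Sheffield (UK) and Pennsylvania State University, and served as an adjunct professor at Washington State University. He is currently an associate professor in Digital Arts and Humanities at University College Cork and an adjunct lecturer at Cork Institute of Technology. His research focuses on technologies for the digital transformation of cultural products, as well as on reading and writing practices. He is also an editor at New Binary Press and of Digital Literary Studies, and has published three co-authored books and more than thirty journal and conference papers.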


  • ANY communication using words (novels, poems, newspaper and magazine articles, letters, webpages, films, tweets, t-shirt slogans, bathroom graffiti…)


  • Does not usually involve computers
  • Simply the analysis and interpretation of text
  • Makes sense of culture
  • saying, political leanings, ideas, communicate, patient
  • no one always
  • highly interpretive

Texts can be communicative
Texts can be creative

Culture is predominantly textual

What are we trying to do here?

  • Come up with a question that certain texts might help us answer
  • Identify texts that we think can tell us something about our question
  • Establish the best way to examine those texts to get our answer

Why do this using computers?

  • Because quantitative approaches to text analysis can lend new (not better) forms of empirical evidence to interpretations
  • Computers can help us to identify patterns, trends, anomalies.
  • Computers can help us to explore a lot of texts.

Text analysis is the analysis of texts.
We analyse texts because of their cultural value…
…using computers, which are powerful tools.

“Quantitative Analysis and Literary Studies” (Hoover 2008)

  • “Quantitative approaches to literature represent elements or characteristics of literary texts numerically, applying the powerful, accurate, and widely accepted methods of mathematics to measurement, classification, and analysis”
  • Almost any item, feature, or characteristic of a text that can be reliably identified can be counted, and most of them have been counted.
  • Decisions about what to count can be obvious, problematic, or extremely difficult, and poor initial choices can lead to wasted effort and worthless results.
  1. Visualise single texts
    • xkcd.com/657
  2. Choose features to represent texts
  3. Identify distinctive vocabulary
  4. Find or organize works. Maps. Similarity.
  5. Model literary forms or genres
  6. Model social boundaries
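
Two of the ideas above, counting features of a text and identifying distinctive vocabulary, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tokenizer is deliberately crude, the function names (tokenize, relative_frequencies, distinctive_words) are hypothetical, and the two sample sentences are invented rather than drawn from any corpus mentioned in these notes.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract runs of letters/apostrophes as tokens.

    Assumption: this crude rule is good enough for a demonstration; real
    projects need to think harder about what counts as a word.
    """
    return re.findall(r"[a-z']+", text.lower())


def relative_frequencies(text):
    """Return each word's share of all tokens in the text."""
    counts = Counter(tokenize(text))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}


def distinctive_words(text_a, text_b, top_n=3):
    """Rank words by how much more frequent they are in text_a than in text_b.

    Ties are broken alphabetically so the ranking is reproducible.
    """
    freq_a = relative_frequencies(text_a)
    freq_b = relative_frequencies(text_b)
    diffs = {w: f - freq_b.get(w, 0.0) for w, f in freq_a.items()}
    return sorted(diffs, key=lambda w: (-diffs[w], w))[:top_n]


# Invented sample texts, for illustration only.
sample_a = "the sea the sea the ship sailed on the sea"
sample_b = "the city the street the crowd walked in the city"

print(Counter(tokenize(sample_a)).most_common(2))
print(distinctive_words(sample_a, sample_b))
```

Even in this toy case the choice of what to count matters: "the" is the most frequent word in both samples, yet it says nothing about what separates them, which is why the second function compares relative frequencies instead of raw counts.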

Dylan’s Lyrics (Schmidtke 2013)

The myth

  • learn

The reality

  • a lot of work


  • Increased and more accessible computing power
  • Bigger (dirtier) data
  • Expanding repertoire of algorithms and techniques

Distant Reading - Moretti (2013)


  • Quantitative evidence to support qualitative claims
  • Big data
  • Identification of unforeseen trends
  • Representation of findings for:
    • interpretation
    • Representation


  • Parse not Read
  • Context-less approaches
  • Texts do not always obey laws
  • Professionalisation
  • Research driven by tools
  • Privileged knowledge
  • Scientification of Humanities

Some golden rules…

  • The question should dictate the technology
  • Something unexpected: a discovery or a mistake
  • Use tools with strong support and a strong community


  • voyant-tools.org
    • voyant-tools.org/docs/#!guide/tools
  • ictclas.nlpir.org/nlpir
  • Topic Modelling, Stylometry

Sentiment Analysis
Part of Speech Tagging


  • Trump… crazy
  • diplomacy


  • Measuring Joycean Influences on Flann O’Brien
  • Authors: James O’Sullivan, Katarzyna Bazarnik, Maciej Eder, Jan Rybicki
  • Abstract
  • This paper examines the stylometric similarities between James Joyce and Flann O’Brien, demonstrating which works from the latter’s oeuvre are stylistically the most Joycean. We will outline the results of a series of quantitative enquiries focused specifically on Joyce and O’Brien, before offering a number of literary interpretations. It has long been argued that Brian O’Nolan, operating under the pseudonym of Flann O’Brien, is a disciple of James Joyce. This relationship remains a concern for scholars, and so our purpose here is to contribute some computational evidence to the discussion. We pinpoint those exact moments where O’Brien’s style is quantitatively similar to that of Joyce, using our results to re-engage existing arguments with renewed statistical precision.
  • Keywords: Stylometry, James Joyce, Flann O’Brien
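
To give a feel for what "quantitatively similar" style can mean, here is a minimal sketch of one widely used stylometric distance, Burrows's Delta. The abstract above does not say which measure the authors used, so this is an assumption made purely for illustration. The function names are hypothetical, the population standard deviation is one of several possible choices, every text is assumed to be non-empty, and the three sample texts are invented (they are not Joyce or O’Brien).

```python
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text):
    """Crude tokenizer: lowercase runs of letters/apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def burrows_delta(texts, n_words=5):
    """Return {(name_a, name_b): delta} for every pair of texts.

    texts maps a label to a non-empty string. Delta is the mean absolute
    difference between the z-scored relative frequencies of the n_words
    most frequent words in the whole collection.
    """
    tokenized = {name: tokenize(t) for name, t in texts.items()}

    # Most frequent words across the whole collection.
    corpus_counts = Counter(tok for toks in tokenized.values() for tok in toks)
    vocabulary = [w for w, _ in corpus_counts.most_common(n_words)]

    # Relative frequency of each vocabulary word in each text.
    freqs = {}
    for name, toks in tokenized.items():
        counts = Counter(toks)
        freqs[name] = [counts[w] / len(toks) for w in vocabulary]

    # Standardise each word's frequencies across texts (z-scores).
    columns = list(zip(*freqs.values()))
    means = [mean(col) for col in columns]
    sds = [pstdev(col) for col in columns]
    z = {
        name: [(f - m) / s if s > 0 else 0.0 for f, m, s in zip(row, means, sds)]
        for name, row in freqs.items()
    }

    names = sorted(z)
    return {
        (a, b): mean([abs(x - y) for x, y in zip(z[a], z[b])])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Invented sample texts, for illustration only.
sample_texts = {
    "a": "the cat and the dog and the bird",
    "b": "the cat and the dog and the bird",
    "c": "a storm of rain, a storm of wind, a storm of hail",
}

distances = burrows_delta(sample_texts)
for pair, value in distances.items():
    print(pair, round(value, 3))
```

A smaller Delta means two texts use the most frequent words in more similar proportions. Real studies work with far longer texts and far more words (often hundreds), and the numbers still need literary interpretation.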

2 Tools and Techniques

James O’Sullivan


Maciej Eder, Jan Rybicki, Mike Kestemont, David Hoover
Karina van Dalen-Oskam
Joanna Byszuk

Topic Model

3 Open Knowledge and Writing


Open access, open science, open data



DH: Future and Emerging Technologies