Lecture Notes: Digital Humanities

Jul 17, 2020 | Lecture Notes

Text analysis as analysis

2020-07-16
James O’Sullivan
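Previously taught at the University of Sheffield (UK) and Pennsylvania State University, and held an adjunct professorship at Washington State University. Currently associate professor in Digital Arts and Humanities at University College Cork and part-time lecturer at Cork Institute of Technology. His main research concerns the digital transformation of cultural products, with attention to reading and writing practices. He also serves as an editor for New Binary Press and Digital Literary Studies, and has published three co-authored books and more than thirty journal and conference papers.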

Text

ANY communication using words (novels, poems, newspaper and magazine articles, letters, webpages, films, tweets, T-shirt slogans, bathroom graffiti…)

Text-analysis

Does not usually involve a computer
Simply the analysis and interpretation of text
Make sense of culture
saying, political leanings, ideas, communicate, patient
no one always
highly interpretive

Texts can be communicative
Texts can be creative

Culture is predominantly textual

What are we trying to do here?

Come up with a question that certain texts might help us answer
Identify texts that we think can tell us something about our question
Establish the best way to examine those texts to get our answer

Why do this using computers?

Because quantitative approaches to text analysis can lend new (not better) forms of empirical evidence to interpretations
Computers can help us to identify patterns, trends, anomalies.
Computers can help us to explore a lot of texts.

Text analysis is the analysis of texts.
We analyse texts because of their cultural value…
using Computers.
powerful tools.

“Quantitative Analysis and Literary Studies” (Hoover 2008)

“Quantitative approaches to literature represent elements or characteristics of literary texts numerically, applying the powerful, accurate, and widely accepted methods of mathematics to measurement, classification, and analysis”
Almost any item, feature, or characteristic of a text that can be reliably identified can be counted, and most of them have been counted.
Decisions about what to count can be obvious, problematic, or extremely difficult, and poor initial choices can lead to wasted effort and worthless results.
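As a rough illustration of "counting" a feature, here is a minimal sketch that counts word tokens and turns the counts into relative frequencies. The tokenizer (lowercase runs of letters and apostrophes) and the sample sentence are assumptions made for this example, not something from the lecture; choosing what counts as a "word" is exactly the kind of decision Hoover warns about.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Assumption: a token is a run of ASCII letters or apostrophes.
    """
    return re.findall(r"[a-z']+", text.lower())


def relative_frequencies(text):
    """Return each word's share of all tokens in the text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens)
    # An empty text yields an empty dict, so no division by zero occurs.
    return {word: count / total for word, count in counts.items()}


# Hypothetical sample text, used only to show the calls.
sample = "The cat sat on the mat. The mat was flat."
counts = Counter(tokenize(sample))
print(counts.most_common(2))
print(relative_frequencies(sample)["the"])
```

Relative frequencies rather than raw counts make texts of different lengths comparable.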

Visualise single Texts

xkcd.com/657

Choose features to represent Texts
Identify distinctive vocabulary
Find or organize works. Maps. Similarity.
Model literary forms or genres
Model social boundaries
…
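One simple way to "identify distinctive vocabulary" is to compare how often each word occurs in one text relative to another. The sketch below uses a smoothed frequency ratio. This measure, the function name `distinctive_words`, and the two toy texts are assumptions chosen for illustration; the lecture did not name a specific method.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed definition)."""
    return re.findall(r"[a-z']+", text.lower())


def distinctive_words(target, reference, top_n=5):
    """Rank words by how much more frequent they are in target than in reference.

    Add-one smoothing keeps words that are absent from one text
    from causing a division by zero.
    """
    target_counts = Counter(tokenize(target))
    reference_counts = Counter(tokenize(reference))
    vocabulary = set(target_counts) | set(reference_counts)
    target_total = sum(target_counts.values()) + len(vocabulary)
    reference_total = sum(reference_counts.values()) + len(vocabulary)

    scores = {}
    for word in vocabulary:
        target_rate = (target_counts[word] + 1) / target_total
        reference_rate = (reference_counts[word] + 1) / reference_total
        scores[word] = target_rate / reference_rate
    # Highest ratio first; ties are broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Hypothetical toy texts.
text_a = "the sea the ship the storm the sea"
text_b = "the city the street the crowd the city"
print(distinctive_words(text_a, text_b, top_n=3))
```

Words shared equally by both texts (here "the") score close to 1, so they drop out of the top of the ranking.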

Dylan’s Lyrics (Schmidtke 2013)

The myth

learn
the reality
a lot of work

How

Increased and more accessible computing power
Bigger (dirtier) data
Expanding repertoire of algorithms and techniques

Distant Reading - Moretti (2013)

Affordances

Quantitative evidence to support qualitative claims
Big data
Identification of unforeseen trends
Representation of findings for:
- interpretation
- Representation
  ….

Limitations

Parse not Read
Context-less approaches
Texts do not always obey laws
Professionalisation
Research driven by tools
Privileged knowledge
Scientification of Humanities

Some golden rules…

Question should dictate the technology
Something unexpected: a discovery or a mistake
Use tools with strong (support / community)

Text analysis tools:

voyant-tools.org
- voyant-tools.org/docs/#!guide/tools
ictclas.nlpir.org/nlpir
Topic Modelling, Stylometry

Sentiment Analysis
Part of Speech Tagging

Twitter

trump… crazy
diplomacy

www.digitalstudies.org/articles/10.16995/dscn.288

Measuring Joycean Influences on Flann O’Brien
Authors: James O’Sullivan , Katarzyna Bazarnik, Maciej Eder, Jan Rybicki
Abstract
This paper examines the stylometric similarities between James Joyce and Flann O’Brien, demonstrating which works from the latter’s oeuvre are stylistically the most Joycean. We will outline the results of a series of quantitative enquiries focused specifically on Joyce and O’Brien, before offering a number of literary interpretations. It has long been argued that Brian O’Nolan, operating under the pseudonym of Flann O’Brien, is a disciple of James Joyce. This relationship remains a concern for scholars, and so our purpose here is to contribute some computational evidence to the discussion. We pinpoint those exact moments where O’Brien’s style is quantitatively similar to that of Joyce, using our results to re-engage existing arguments with renewed statistical precision.
Keywords: Stylometry, James Joyce, Flann O’Brien
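The abstract does not say which stylometric measure the authors used. As an assumed stand-in, the sketch below implements Burrows's Delta, a widely used stylometric distance: take the most frequent words in the corpus, z-score each word's relative frequency across the texts, and average the absolute z-score differences between two texts. The function name, the use of the population standard deviation, and the toy texts are all choices made for this example, and every text is assumed to be non-empty.

```python
import re
import statistics
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed definition)."""
    return re.findall(r"[a-z']+", text.lower())


def burrows_delta(texts, top_n=50):
    """Return pairwise Delta distances for a dict of {label: text}.

    Assumes every text contains at least one token.
    """
    counts = {label: Counter(tokenize(text)) for label, text in texts.items()}
    totals = {label: sum(c.values()) for label, c in counts.items()}

    # Features: the most frequent words across the whole corpus.
    corpus_counts = Counter()
    for c in counts.values():
        corpus_counts.update(c)
    features = [word for word, _ in corpus_counts.most_common(top_n)]

    # z-score each feature's relative frequency across the texts.
    z_scores = {label: [] for label in texts}
    for word in features:
        rates = {label: counts[label][word] / totals[label] for label in texts}
        values = list(rates.values())
        spread = statistics.pstdev(values)
        if spread == 0:
            continue  # identical rate everywhere: no discriminating power
        mean = statistics.mean(values)
        for label in texts:
            z_scores[label].append((rates[label] - mean) / spread)

    # Delta: mean absolute difference of z-scores for each pair of texts.
    labels = list(texts)
    distances = {}
    for i, first in enumerate(labels):
        for second in labels[i + 1:]:
            diffs = [abs(a - b) for a, b in zip(z_scores[first], z_scores[second])]
            distances[(first, second)] = sum(diffs) / len(diffs) if diffs else 0.0
    return distances


# Hypothetical toy corpus: "a" and "b" share most of their vocabulary.
toy = {
    "a": "the cat and the dog and the bird",
    "b": "the cat and the dog and the fish",
    "c": "of a to of a to in in in",
}
delta = burrows_delta(toy)
print(delta)
```

A smaller Delta means the two texts use frequent words in more similar proportions; real studies work with full-length texts and hundreds of features rather than toy sentences.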

2 Tools and Techniques

2020-07-20
James O’Sullivan

computationalstylistics.github.io
mallet.cs.umass.edu
www.nltk.org
www.digitalhumanities.org/dhq/vol/11/1/000286/000286.html

Maciej Eder, Jan Rybicki, Mike Kestemont, David Hoover
Karina van Dalen-Oskam
Joanna Byszuk

Topic Model

3 Open Knowledge and Writing

07.21.18:00~20:35

Open access, open science, open data

chnmuseum.cn/zl

projects.iq.harvard.edu/chinesecbdb

07.21.18:00~20:35
DH: Future and Emerging Technologies

Digital Humanities