Following up on one of my latest book reviews, I
dug deeper into the topic of stylometry. Several chapters in Edmondson and
Wells's book deal with this topic in terms of proving Shakespeare's authorship of
his plays. Very interesting stuff to follow up on.
It was recently in the news that computer
analysis had been used to determine that the play “Double Falsehood” by Lewis
Theobald, first published in 1728, is possibly the work of William Shakespeare
in collaboration with John Fletcher. Computer programs were used to analyze the
writings of all three men, and as a result some people think this is a
lost play by Shakespeare. It is still quite controversial; I am just offering
it up as an example of computers being used in conjunction with literary
analysis.
Personally, I think computer programs are only
as good as the people who write them, meaning that they may inadvertently
contain flaws or biases. An additional layer of human error enters the mix in
how the data is interpreted. Then there is the question: is there such a thing
as a fixed and rigid interpretation? I have been doing my own literary
analysis for many years, and when I revisit something like a play by Shakespeare
I find that my views and interpretations have changed over the years. However,
I wouldn't totally discard Data Science. It produces interesting facts and
features, which might serve to reinforce our initial human reactions to what we
are reading.
Having said that, let’s delve into it a bit more.
Performing a textual analysis of a Shakespeare
text nevertheless has some interesting points, namely our ability to improve
the techniques we use in Data Science. Is there a worthier subject than
devising a machine learning algorithm that would enable us to pinpoint what
makes Shakespeare? When I say "pinpoint" I'm thinking in
mathematical terms.
Everyone will have a distinct opinion on what
makes Shakespeare the greatest playwright of the English language. My
(Shakespeare) Nirvana would be to have some kind of enlightenment coming from
the field of Computer Science that would tell us that certain
"traits" are what differentiate Shakespeare from the rest of the
pack (e.g., Ben Jonson and Thomas Middleton).
Shakespeare's flair is one of a kind, granted,
but is it possible to name instances where we can say for sure why Jonson and
Middleton did not capture people's imaginations the way Shakespeare did? Again,
I'm not talking about individual opinions; I have my own on what makes
Shakespeare. I'm more interested in identifying data (e.g.,
patterns) that would have the weight of science behind it. Might these
"patterns" be identified and supported through word
selection and frequency?
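By way of illustration, here is a minimal sketch (in Python, my own choice) of the most basic such measure: each word's share of the total tokens in a text. The sample input is a single famous line rather than a full play:

```python
import re
from collections import Counter

def frequency_profile(text):
    """Return each word's relative frequency in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

# Tiny illustrative input; a real run would load a whole play's text.
profile = frequency_profile("To be, or not to be, that is the question.")
```

A frequency profile like this, computed per author, is the raw material most stylometric comparisons start from.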
I've just run a statistical analysis on the
"Much Ado About Nothing" play in order to identify all of the atomic
components in the text, and what came out was this:
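I can't reproduce the chart here, but to give a flavour of one such "atomic component" count, here is a rough sketch that measures the share of prepositions against a small hand-picked list (a real analysis would use a proper part-of-speech tagger, e.g. NLTK's; the word list below is illustrative, not exhaustive):

```python
import re

# Small, hand-picked list of common prepositions (illustrative only).
PREPOSITIONS = {"of", "in", "to", "for", "with", "on", "at", "by", "from", "about"}

def preposition_share(text):
    """Fraction of tokens in the text that appear in the preposition list."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = sum(1 for word in words if word in PREPOSITIONS)
    return hits / len(words) if words else 0.0
```

Running this over whole plays by different authors would give directly comparable numbers.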
Surprisingly (or not), the number of
prepositions is not very high, maybe because in Elizabethan times
their use was not as widespread as it is today (similarly, the pronoun
"its" is seldom used by Shakespeare). It'd be interesting, just for
analysis' sake, to make a comparison with the works of Jonson, Marlowe and
Middleton, to name three of the icons writing at around the same time.
Another analysis
I did was with Google Books Ngram Viewer, selecting three words that
Shakespeare is said to have coined. I went to Shakespeare-online.com for this
and chose the first three words that appeared on the list: academe, accused
and addicted. I selected a start date of 1600, and this is what I came up with.
It is interesting that the first large spike for the word "accused"
occurs around the time of the English Civil War. I also reran the query with a
start date earlier than 1600, and the word "accused" may actually
predate Shakespeare's writing! So one does tend to wonder where the data from
shakespeare-online.com came from.
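For intuition about what the Ngram Viewer is computing: each curve is simply a word's share of all the words printed in a given year. A toy version over a hypothetical (year, text) corpus (the data layout and sample sentences are my own assumptions) could look like this:

```python
import re
from collections import defaultdict

def ngram_curve(corpus, word):
    """Relative frequency of `word` per year, Ngram Viewer style.

    `corpus` is an iterable of (year, text) pairs.
    """
    totals = defaultdict(int)  # tokens printed per year
    hits = defaultdict(int)    # occurrences of `word` per year
    for year, text in corpus:
        tokens = re.findall(r"[a-z']+", text.lower())
        totals[year] += len(tokens)
        hits[year] += sum(1 for token in tokens if token == word)
    return {year: hits[year] / totals[year] for year in totals if totals[year]}

# Made-up miniature corpus, purely for illustration.
toy = [(1600, "the accused man stood"), (1642, "accused accused of treason")]
curve = ngram_curve(toy, "accused")
```

The real Viewer works the same way, just over millions of scanned books and with smoothing applied.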
It seems that the number of words Shakespeare
likely coined has been exaggerated, for a couple of reasons. Not all that much
survives from Shakespeare's day and before, so he's one of the few places to
look for any words in usage at the time. Of the works available, Shakespeare's
is far and away the most famous and a 'go-to' source. According to one
article I read, the originators of the OED had a tendency to stop if they found
something in Shakespeare and attribute it to him as the first usage. Many words
attributed to him have since been found in earlier works, but often the
attribution to Shakespeare hasn't been changed. It's never made sense to me
that Shakespeare could have had SO many new words in individual plays; he wasn't
writing cutting-edge, pretentious plays, he was writing plays that average
people would go to, and if every fifth word was 'new,' the plays would have been
practically unintelligible to their initial audience. Shakespeare certainly
invented some words and skewed the meaning of others by using them in
intelligible ways that they hadn't been used in previously. I'm sure he came up
with even more new phrases, which have passed into common usage and could
have been understood on first hearing, but he didn't 'invent' nearly as many
words as some claim.
Nevertheless, I
completely agree about the primacy of the text. Even when I use computational
analyses, I find they should be coupled with close readings of the texts. The
promise I see in computer analysis is that it can point out large-scale
patterns that may not be visible through close reading alone, particularly
when you are working with a large corpus -- it would be difficult
to compare 1,000 early modern plays with close reading only (and the reason we
might want to look at 1,000 early modern plays is to better characterize early
modern literature and the individual texts therein). There has been a lot of talk
recently among digital humanists about how computers can help us access what
Margaret Cohen calls the "great unread" of literature, which would
include texts that are left out of traditional canons and that we can't
feasibly close read because there are so many.
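One standard stylometric tool for exactly this kind of large-corpus comparison is Burrows's Delta: normalize each candidate author's function-word frequencies into z-scores, then take the mean absolute difference from the disputed text. A compressed sketch, with made-up frequencies:

```python
import statistics

def burrows_delta(target, candidates, features):
    """Return the candidate author closest to `target` by Burrows's Delta.

    `target` and each candidate profile map a function word to its
    relative frequency in that author's corpus.
    """
    means = {f: statistics.mean(c[f] for c in candidates.values()) for f in features}
    # Guard against a zero stdev when all candidates agree on a feature.
    stdevs = {f: statistics.pstdev(c[f] for c in candidates.values()) or 1.0
              for f in features}

    def z(profile, feature):
        return (profile[feature] - means[feature]) / stdevs[feature]

    deltas = {
        name: sum(abs(z(target, f) - z(profile, f)) for f in features) / len(features)
        for name, profile in candidates.items()
    }
    return min(deltas, key=deltas.get)

# Made-up function-word frequencies, purely for illustration.
candidates = {
    "A": {"the": 0.060, "and": 0.030},
    "B": {"the": 0.030, "and": 0.060},
}
disputed = {"the": 0.058, "and": 0.031}
closest = burrows_delta(disputed, candidates, ["the", "and"])
```

In practice one would use dozens of the most frequent function words, precisely because their usage is largely unconscious and hard to imitate.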
There is so much to gain from Shakespeare. Why
would we limit our gains by restricting the methods with which we can derive
meaning? Watching a performance, making a close reading, performing algorithmic
analysis--all these methods can work hand-in-hand.
Shakespeare reflects life and life is a
glorious muddle of comedy, tragedy, romance and problem plays! And let's not
forget history. Indeed, how can we even say that Shakespeare writes history
plays? They are not accurate enough to be used as a history source, but they
are wonderfully rich dramas.
I'm not sure whether a machine would be able to
write Shakespeare-like literature. But let's lower the bar: what about a
sonnet of average quality? Could a machine write
"something" that we'd consider to have some quality? (I'm not going
into the debate over what I mean by quality.)
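For the curious, the crudest machine "poet" is a Markov chain: record which word follows which in a source text, then take a random walk. This little sketch (a toy of my own, not how the sonnet below was produced) shows the idea:

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def generate(chain, start, length, seed=0):
    """Walk the chain from `start`, picking a random follower each step."""
    rng = random.Random(seed)  # seeded so the output is repeatable
    out = [start]
    for _ in range(length - 1):
        followers = chain.get(out[-1])
        if not followers:
            break
        out.append(rng.choice(followers))
    return " ".join(out)

chain = build_chain("to be or not to be that is the question")
line = generate(chain, "to", 5)
```

Trained on a single sentence it can only parrot; trained on the complete sonnets it starts producing lines that are locally plausible and globally meaningless, which is exactly the gap the test below probes.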
Let's do a little test.
Would you say the
following sonnet was written by a human or a machine?
(This is another form of a Turing test.)
"Whose shade in dreams doth wake the sleeping morn,
The daytime shadow of my love betrayed
Lends hideous night to dreaming’s faded form;
Were painted frowns to gild mere false rebuff,
Then shouldst my heart be patient as the sands,
For nature’s smile is ornament enough
When thy gold lips unloose their drooping bands.
As clouds occlude the globe’s enshrouded fears,
Which can by no astronomy be assail’d,
Thus thine appearance, tears in atmospheres,
No fond perceptions, nor no gaze unveils.
Disperse the clouds which banish light from thee,
For no tears be true until we truly see."
(Later on I'll post the provenance of the sonnet
above...)


