Stand-up Comedy Analysis


In this post, we delve into the most recent work of comedian Bill Burr, among others. Transcripts of his most recent comedy specials are provided by

The files could easily have been scraped with Python’s BeautifulSoup library, but I had already copied and pasted them into .txt files months before.
Below, we load the folder containing those files as a corpus.
from nltk.corpus import PlaintextCorpusReader
corpus_root = 'standup'
standup = PlaintextCorpusReader(corpus_root, '.*')

First, I’m interested in how many words (tokens, including punctuation marks) are in each special.

for filename in standup.fileids():
    print(int(len(standup.words(filename))), filename)
10801 burr-2010-let_it_go.txt
17294 burr-2012-you_people_are_all_the_same.txt
16722 burr-2014-im_sorry_you_feel_that_way.txt
17547 burr-2017-walk_your_way_out.txt
7842 jesl-2013-caligula.txt
8871 jesl-2015-thoughts_and_prayers.txt
10444 loui-2008-chewed_up.txt
11417 loui-2011-live_at_the_bacon_theatre.txt
10131 loui-2013-oh_my_god.txt
12258 loui-2015-live_at_the_comedy_store.txt
12456 loui-2017-2017.txt
16743 mora-2015-off_the_hook.txt
It looks like Bill Burr leads in the length of specials as measured in tokens, with 17,547 for his 2017 special. His 2012-2017 specials each run roughly 5,900 to 6,700 tokens longer than his 2010 special. Anthony Jeselnik has the shortest specials.
The program below displays three statistics for each text: average word length, average sentence length, and the number of times each vocabulary item appears in the text on average. The third number is a repetition rate, so a lower value means a more lexically diverse text. All three are truncated to integers by int(). Each row ends with the name of the special/file.
for fileid in standup.fileids():
    num_chars = len(standup.raw(fileid))
    num_words = len(standup.words(fileid))
    num_sents = len(standup.sents(fileid))
    num_vocab = len(set([w.lower() for w in standup.words(fileid)]))
    print(int(num_chars/num_words), int(num_words/num_sents), int(num_words/num_vocab), fileid)
3 14 8 burr-2010-let_it_go.txt
3 12 9 burr-2012-you_people_are_all_the_same.txt
3 13 8 burr-2014-im_sorry_you_feel_that_way.txt
3 12 9 burr-2017-walk_your_way_out.txt
3 14 7 jesl-2013-caligula.txt
3 12 7 jesl-2015-thoughts_and_prayers.txt
3 26 8 loui-2008-chewed_up.txt
3 13 8 loui-2011-live_at_the_bacon_theatre.txt
3 16 7 loui-2013-oh_my_god.txt
3 15 10 loui-2015-live_at_the_comedy_store.txt
3 13 9 loui-2017-2017.txt
4 29 7 mora-2015-off_the_hook.txt

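Note that Dylan Moran, the only non-American on the list, has the highest average word length (4) and the highest average sentence length (29). His score of 7 in the third column is tied for the lowest repetition rate, which puts him among the most lexically diverse performers.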
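To make the third column easier to read, here is a minimal sketch of the same three ratios on two toy strings, using only the standard library. The text_stats helper and its regex tokenization are my own simplifications, not NLTK's tokenizers, and word length here counts letters only, so the numbers are not comparable to the table above.

```python
import re

def text_stats(raw):
    """Return (avg word length, avg sentence length, avg uses per vocabulary item)."""
    # Crude tokenization (an assumption of this sketch): runs of letters or
    # apostrophes are words; '.', '!' and '?' end sentences.
    words = re.findall(r"[A-Za-z']+", raw)
    sentences = [s for s in re.split(r"[.!?]+", raw) if s.strip()]
    vocab = {w.lower() for w in words}
    return (
        sum(len(w) for w in words) / len(words),   # letters per word
        len(words) / len(sentences),               # words per sentence
        len(words) / len(vocab),                   # uses per distinct word
    )

repetitive = "I like it. I like it. I like it."
varied = "Comedians relate distant topics. Audiences laugh."

print(text_stats(repetitive))
print(text_stats(varied))
```

The repetitive string reuses each of its three distinct words three times, while the varied one never repeats a word, so its third ratio is 1.0. Keeping the ratios as floats also avoids the ties that int() produces in the table.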

Now we’ll chart the most used words in each standup special.

import nltk
from collections import Counter
from nltk.stem import WordNetLemmatizer
from nltk.corpus import stopwords
import matplotlib.pyplot as plt
# english_stops was not defined in the original listing; NLTK's English stop word list is assumed
english_stops = set(stopwords.words('english'))
# Instantiate the WordNetLemmatizer once, outside the loop
word_net_lemmatizer = WordNetLemmatizer()
for file in standup.fileids():
    # Convert the tokens into lowercase
    lower_words = [t.lower() for t in standup.words(file)]
    # Retain alphabetic words only
    alpha_words_only = [t for t in lower_words if t.isalpha()]
    # Remove all stop words
    no_stops = [t for t in alpha_words_only if t not in english_stops]
    # Lemmatize all tokens into a new list
    tokens_lemmatized = [word_net_lemmatizer.lemmatize(t) for t in no_stops]
    # Create the bag-of-words
    word_bow = Counter(tokens_lemmatized)
    # Plot the 20 most common tokens as a horizontal bar chart
    top_words = word_bow.most_common(20)
    plt.barh(range(len(top_words)), [count for _, count in top_words], align='center')
    plt.yticks(range(len(top_words)), [word for word, _ in top_words])
    plt.title(file)
    plt.show()
[Charts of the 20 most common words: Let It Go, You People Are All the Same, I'm Sorry You Feel That Way, Live at the Comedy Store, Off the Hook]
Since comedians often draw comparisons between topics for comedic effect, we should expect ‘like’ to be among the most used words, and the charts above bear this out. Also, ‘fucking’ is the go-to curse word for these comedians.
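The counting steps behind those charts can be sketched without NLTK or matplotlib. The snippet below is only an illustration: the STOPS set is a tiny hypothetical stop list standing in for NLTK's, the top_words helper is my own name, and lemmatization is skipped.

```python
from collections import Counter

# Hypothetical mini stop list; the post uses NLTK's English stop words instead.
STOPS = {"i", "the", "a", "it", "is", "and", "you", "to"}

def top_words(text, n=3):
    """Lowercase, keep alphabetic tokens, drop stop words, count the rest."""
    tokens = [t.lower() for t in text.split()]
    alpha = [t for t in tokens if t.isalpha()]
    kept = [t for t in alpha if t not in STOPS]
    return Counter(kept).most_common(n)

sample = "It is like a job and I like the job like you like to nap"
print(top_words(sample))
```

Because ‘like’ is not in the stop list used here, it survives the filtering and ends up at the top of the counts.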
A lexical dispersion plot shows where in the corpus each word occurs; NLTK draws one with dispersion_plot.
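What such a plot draws is simply the token positions at which each target word occurs. Here is a small standard-library sketch of that bookkeeping; word_offsets is a hypothetical helper of mine, not an NLTK function, and it matches words exactly as written.

```python
def word_offsets(tokens, targets):
    """Map each target word to the list of token positions where it occurs."""
    offsets = {t: [] for t in targets}
    for position, token in enumerate(tokens):
        if token in offsets:
            offsets[token].append(position)
    return offsets

tokens = "people say war is bad but people love a war movie".split()
print(word_offsets(tokens, ["people", "war", "plague"]))
```

Each position becomes one tick on the word's row of the plot. Since the whole corpus is tokenized as one long text, a tick's horizontal position also hints at which special it came from.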
tokens = nltk.word_tokenize(standup.raw())  # tokenize the whole corpus
text = nltk.Text(tokens)        # turn the tokens into an NLTK Text object
text.dispersion_plot(["president", "war", "politics", "woman", "bitch", "plague", "people"])
# Show every occurrence of a word in context using text.concordance(...)
text.concordance("men")
Displaying 25 of 51 matches:
here ! How many ? How many more great men are gon na get chopped in half before
gging whores are the wife beaters for men . Yeah , they are . Except we don ’ t
 so many rich , famous , and powerful men act like absolute pigs ? ” Right ? An
. That brings resentment amongst your men . You ’ ve got to lead them into batt
e a man happy . The great thing about men is we ’ re fucking simple . We ’ re f
e . No . No . You have to call them “ men who talk too much. ” Right ? But I he
. One of those guys who believed that men just learn by doing things . You know
and it becomes the difference between men and women really Because a man will l
’ s not a skill they have generally . Men have it , that ’ s just different , w
t , we have different sexual skills . Men can fuck whatever , we don ’ t care .
ople . When women go wild , they kill men and drown their kids in a tub , that 
re . I can ’ t lift my arms . And for men , sex just is such a constant thing .
 front of our faces all the time . To men it ’ s just an element of the univers
how much we love it , we suck at it . Men are terrible at sex . It never even o
 and they ride . They go for a ride . Men don ’ t . We just climb on . [ grunti
desire as we do . That ’ s how stupid men are . We think they ’ re just weird .
. ” Another thing that proves how bad men are at sex is that after sex you ’ re
nts to cuddle . It ’ s something that men love to make fun of women for . “ The
re is no greater threat to women than men ? We ’ re the number-one threat to wo
 . Guys are– We love women a lot– all men do– And we just look at you . That ’ 
. Thank you . It ’ s really a flaw in men that we would all do that . If you ’ 
men ! Women ! ” And you ’ re like , “ Men ! “ Men , men ! Men ! “ Men ! “ Men !
men ! ” And you ’ re like , “ Men ! “ Men , men ! Men ! “ Men ! “ Men ! Men ! M
” And you ’ re like , “ Men ! “ Men , men ! Men ! “ Men ! “ Men ! Men ! Men ! ”
you ’ re like , “ Men ! “ Men , men ! Men ! “ Men ! “ Men ! Men ! Men ! ” Anywa
text.concordance("women")
Displaying 25 of 103 matches:
nhaling all that coal dust. ” Dude , women are just constantly patting themselv
 party , and I quickly realized that women age a lot better towards the end , y
now ? I don ’ t know . I ’ m sick of women trying… every girl I ever date ’ s a
now ? Where are all those old-school women you can just take your day out on , 
st of us ? You ’ re never annoying ? Women , how many times have you thought ab
kin ’ middle . Talking about hitting women , sweetheart , and I think you just 
 what it is ? They never address how women argue , which I think is a core of a
hing , man . Like , I never knew how women argued , but after 20 years of losin
 na hit you . Like that ’ s how many women I ’ ve pissed off in my lifetime . I
oing down . This is how it is . Most women , they ’ re flailers . All right , u
amn ! I don ’ t get it . What is it… women , do you think I ’ m calling you… I 
 . I get it . There ’ s guys hitting women . They need to be stopped . We got t
I got ta tell you , I ’ m envious of women , okay ? I ’ m not saying your probl
 there ’ s a busload of Scandinavian women waiting to fuck my brains out . “ So
re is programmed to fuck 85 % of the women in this room . Right ? Yeah , we are
here , he wants to fuck just as many women as a celebrity , right ? But he can 
 never hit a woman . You can ’ t hit women . You honestly can not . You ever se
 out there telling people not to hit women , people still do it . What do you t
 the shit– I ’ m learning that about women . You just want to keep them calm . 
 ll kiss up against a wall . I guess women like walls . I didn ’ t know that . 
re . All right ? I don ’ t know what women rub one out to , but I know it ain ’
alk to the car . It ’ s incredible . Women are screaming , people tearing at yo
ave the best cars , we have the best women . Oktoberfest is the shit ! ” He ’ s
r man a sandwich. ” I ’ m not saying women belong in the kitchen , barefoot and
ng that . Okay ? I ’ m just saying , women , go in the kitchen . Just go in the
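The listings above line up every hit of a word with the text on either side of it. A rough sketch of the idea follows, assuming a window counted in tokens (NLTK's own output is laid out by character width instead); the concordance function here is my own toy version, not NLTK's implementation.

```python
def concordance(tokens, word, width=3):
    """Return each occurrence of `word` with `width` tokens of context per side."""
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == word.lower():  # case-insensitive match
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines

tokens = "I never knew how women argued but men just learn by doing".split()
for line in concordance(tokens, "women"):
    print(line)
```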

A concordance permits us to see words in context. For example, we saw that ‘women’
occurred in contexts such as “how women argue” and “hit women”. What other words
appear in a similar range of contexts? The similar method of the Text object lists them for a given word:

you the people i them women me have that god myself yourself here day
my and her oh do so
you i they it me people and that what right there shit fuck night men
this dude so we god

The common_contexts method allows us to examine just the contexts that are shared by
two or more words, such as ‘women’ and…

text.common_contexts(["women", "fuck"])
the_are like_and a_i to_you
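Each item in that output reads left_right: the word before and the word after the slot that both query words can fill. A minimal sketch of the idea, using set intersection on a toy sentence, is below. The contexts and common_contexts functions are my own illustrations, not NLTK's implementation.

```python
def contexts(tokens, word):
    """Set of (previous, next) token pairs surrounding each occurrence of word."""
    found = set()
    for i in range(1, len(tokens) - 1):
        if tokens[i] == word:
            found.add((tokens[i - 1], tokens[i + 1]))
    return found

def common_contexts(tokens, words):
    """Contexts shared by every word in `words`, formatted as left_right."""
    shared = set.intersection(*(contexts(tokens, w) for w in words))
    return sorted(f"{left}_{right}" for left, right in shared)

tokens = "the men are loud and the women are loud but a dog is quiet".split()
print(common_contexts(tokens, ["men", "women"]))
```

Both ‘men’ and ‘women’ appear between ‘the’ and ‘are’ in the toy sentence, so that context is shared, while ‘dog’ shares none with them.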


Finally, we look at the most popular bigrams for Bill Burr’s ‘Let It Go’ and Anthony Jeselnik’s ‘Thoughts and Prayers’ specials, respectively. These bigrams reflect both the subject matter and the manner of speech of each performer.
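For readers who want to reproduce this kind of count, bigrams are just adjacent token pairs, and counting them needs nothing beyond the standard library. The sketch below is an assumption about the approach, since the original code for this step is not shown; top_bigrams is a hypothetical helper and the sentence is made up.

```python
from collections import Counter

def top_bigrams(tokens, n=3):
    """Count adjacent token pairs and return the n most frequent."""
    pairs = zip(tokens, tokens[1:])
    return Counter(pairs).most_common(n)

tokens = "you know what i mean you know what i said".split()
for pair, count in top_bigrams(tokens):
    print(" ".join(pair), count)
```

On a real transcript you would probably lowercase the tokens and drop punctuation first, as in the word-frequency step earlier.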





