In this post, we delve into the most recent work of comedian Bill Burr, among others. Transcripts of their comedy specials are provided by http://scrapsfromtheloft.com/
from nltk.corpus import PlaintextCorpusReader

corpus_root = 'standup'
standup = PlaintextCorpusReader(corpus_root, '.*')
standup.fileids()
['burr-2010-let_it_go.txt',
 'burr-2012-you_people_are_all_the_same.txt',
 'burr-2014-im_sorry_you_feel_that_way.txt',
 'burr-2017-walk_your_way_out.txt',
 'jesl-2013-caligula.txt',
 'jesl-2015-thoughts_and_prayers.txt',
 'loui-2008-chewed_up.txt',
 'loui-2011-live_at_the_bacon_theatre.txt',
 'loui-2013-oh_my_god.txt',
 'loui-2015-live_at_the_comedy_store.txt',
 'loui-2017-2017.txt',
 'mora-2015-off_the_hook.txt']
First, I’m interested in how many words each special contains.
for filename in standup.fileids():
    print(len(standup.words(filename)), filename)
10801 burr-2010-let_it_go.txt
17294 burr-2012-you_people_are_all_the_same.txt
16722 burr-2014-im_sorry_you_feel_that_way.txt
17547 burr-2017-walk_your_way_out.txt
7842 jesl-2013-caligula.txt
8871 jesl-2015-thoughts_and_prayers.txt
10444 loui-2008-chewed_up.txt
11417 loui-2011-live_at_the_bacon_theatre.txt
10131 loui-2013-oh_my_god.txt
12258 loui-2015-live_at_the_comedy_store.txt
12456 loui-2017-2017.txt
16743 mora-2015-off_the_hook.txt
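To rank the specials by length rather than scan the list by eye, the same counts can be sorted. A small sketch using values copied from the output above (only a subset shown):

```python
# Word counts copied from the output above (filename: count); subset only
counts = {
    'burr-2017-walk_your_way_out.txt': 17547,
    'burr-2012-you_people_are_all_the_same.txt': 17294,
    'mora-2015-off_the_hook.txt': 16743,
    'jesl-2013-caligula.txt': 7842,
}

# Sort the (filename, count) pairs by count, largest first
for name, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print(n, name)
```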
Next, for each special we print three statistics: average word length, average sentence length, and the number of times each vocabulary item appears on average (a lexical diversity score). These are followed by the name of the special/file.
for fileid in standup.fileids():
    num_chars = len(standup.raw(fileid))
    num_words = len(standup.words(fileid))
    num_sents = len(standup.sents(fileid))
    num_vocab = len(set(w.lower() for w in standup.words(fileid)))
    print(int(num_chars / num_words), int(num_words / num_sents), int(num_words / num_vocab), fileid)
3 14 8 burr-2010-let_it_go.txt
3 12 9 burr-2012-you_people_are_all_the_same.txt
3 13 8 burr-2014-im_sorry_you_feel_that_way.txt
3 12 9 burr-2017-walk_your_way_out.txt
3 14 7 jesl-2013-caligula.txt
3 12 7 jesl-2015-thoughts_and_prayers.txt
3 26 8 loui-2008-chewed_up.txt
3 13 8 loui-2011-live_at_the_bacon_theatre.txt
3 16 7 loui-2013-oh_my_god.txt
3 15 10 loui-2015-live_at_the_comedy_store.txt
3 13 9 loui-2017-2017.txt
4 29 7 mora-2015-off_the_hook.txt
Note that Dylan Moran, the only non-American on the list, has the highest average word length and the highest average sentence length. His words-per-vocabulary-item ratio is also among the lowest, which makes him one of the most lexically diverse comedians here: the fewer times each vocabulary item is repeated, the more varied the vocabulary.
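The lexical diversity score printed above is simply the token count divided by the number of distinct lowercased word forms. A self-contained sketch on a toy token list (hypothetical tokens, not the real corpus):

```python
def lexical_diversity(tokens):
    """Tokens per distinct lowercased word form, as printed above."""
    vocab = {t.lower() for t in tokens}
    return len(tokens) / len(vocab)

# Toy token list (hypothetical, not from the corpus):
tokens = ["Men", "men", "love", "to", "make", "fun", "of", "men"]
print(lexical_diversity(tokens))  # 8 tokens / 6 distinct forms
```

A lower ratio means each word form is repeated less often, i.e. a more varied vocabulary.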
Now we’ll chart the most used words in each standup special.
from collections import Counter

import matplotlib.pyplot as plt
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

english_stops = set(stopwords.words('english'))
for file in standup.fileids():
    # Convert the tokens to lowercase
    lower_words = [t.lower() for t in standup.words(file)]
    # Retain alphabetic words only
    alpha_words_only = [t for t in lower_words if t.isalpha()]
    # Remove all stop words
    no_stops = [t for t in alpha_words_only if t not in english_stops]
    # Instantiate the WordNetLemmatizer and lemmatize all tokens
    word_net_lemmatizer = WordNetLemmatizer()
    tokens_lemmatized = [word_net_lemmatizer.lemmatize(t) for t in no_stops]
    # Create the bag-of-words
    word_bow = Counter(tokens_lemmatized)
    # Chart the 20 most common tokens
    print(file)
    words, counts = zip(*word_bow.most_common(20))
    plt.barh(range(len(words)), counts, align='center')
    plt.yticks(range(len(words)), words)
    plt.show()
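A note on the plotting step: Counter.most_common returns a list of (token, count) tuples, so the bar widths and the tick labels have to be unpacked into separate sequences before being handed to matplotlib. A self-contained sketch with toy tokens:

```python
from collections import Counter

# Toy tokens (hypothetical, not from the corpus)
tokens = ["dude", "dude", "people", "fuck", "people", "dude"]
bow = Counter(tokens)

pairs = bow.most_common(2)      # list of (token, count) tuples
words, counts = zip(*pairs)     # unpack into parallel tuples

print(pairs)   # [('dude', 3), ('people', 2)]
print(words)   # ('dude', 'people')
print(counts)  # (3, 2)
```

The `counts` tuple is what goes to `plt.barh` as bar widths, and `words` is what goes to `plt.yticks` as labels.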
tokens = nltk.word_tokenize(standup.raw())  # tokenize the whole corpus
text = nltk.Text(tokens)  # turn the tokens into an NLTK Text object
text.dispersion_plot(["president", "war", "politics", "woman", "bitch", "plague", "people"])
# Show every occurrence of a word in context with text.concordance()
text.concordance("men")
Displaying 25 of 51 matches: here ! How many ? How many more great men are gon na get chopped in half before gging whores are the wife beaters for men . Yeah , they are . Except we don ’ t so many rich , famous , and powerful men act like absolute pigs ? ” Right ? An . That brings resentment amongst your men . You ’ ve got to lead them into batt e a man happy . The great thing about men is we ’ re fucking simple . We ’ re f e . No . No . You have to call them “ men who talk too much. ” Right ? But I he . One of those guys who believed that men just learn by doing things . You know and it becomes the difference between men and women really Because a man will l ’ s not a skill they have generally . Men have it , that ’ s just different , w t , we have different sexual skills . Men can fuck whatever , we don ’ t care . ople . When women go wild , they kill men and drown their kids in a tub , that re . I can ’ t lift my arms . And for men , sex just is such a constant thing . front of our faces all the time . To men it ’ s just an element of the univers how much we love it , we suck at it . Men are terrible at sex . It never even o and they ride . They go for a ride . Men don ’ t . We just climb on . [ grunti desire as we do . That ’ s how stupid men are . We think they ’ re just weird . . ” Another thing that proves how bad men are at sex is that after sex you ’ re nts to cuddle . It ’ s something that men love to make fun of women for . “ The re is no greater threat to women than men ? We ’ re the number-one threat to wo . Guys are– We love women a lot– all men do– And we just look at you . That ’ . Thank you . It ’ s really a flaw in men that we would all do that . If you ’ men ! Women ! ” And you ’ re like , “ Men ! “ Men , men ! Men ! “ Men ! “ Men ! men ! ” And you ’ re like , “ Men ! “ Men , men ! Men ! “ Men ! “ Men ! Men ! M ” And you ’ re like , “ Men ! “ Men , men ! Men ! “ Men ! “ Men ! Men ! Men ! ” you ’ re like , “ Men ! “ Men , men ! Men ! “ Men ! 
“ Men ! Men ! Men ! ” Anywa
And the same for “women”:

text.concordance("women")
Displaying 25 of 103 matches: nhaling all that coal dust. ” Dude , women are just constantly patting themselv party , and I quickly realized that women age a lot better towards the end , y now ? I don ’ t know . I ’ m sick of women trying… every girl I ever date ’ s a now ? Where are all those old-school women you can just take your day out on , st of us ? You ’ re never annoying ? Women , how many times have you thought ab kin ’ middle . Talking about hitting women , sweetheart , and I think you just what it is ? They never address how women argue , which I think is a core of a hing , man . Like , I never knew how women argued , but after 20 years of losin na hit you . Like that ’ s how many women I ’ ve pissed off in my lifetime . I oing down . This is how it is . Most women , they ’ re flailers . All right , u amn ! I don ’ t get it . What is it… women , do you think I ’ m calling you… I . I get it . There ’ s guys hitting women . They need to be stopped . We got t I got ta tell you , I ’ m envious of women , okay ? I ’ m not saying your probl there ’ s a busload of Scandinavian women waiting to fuck my brains out . “ So re is programmed to fuck 85 % of the women in this room . Right ? Yeah , we are here , he wants to fuck just as many women as a celebrity , right ? But he can never hit a woman . You can ’ t hit women . You honestly can not . You ever se out there telling people not to hit women , people still do it . What do you t the shit– I ’ m learning that about women . You just want to keep them calm . ll kiss up against a wall . I guess women like walls . I didn ’ t know that . re . All right ? I don ’ t know what women rub one out to , but I know it ain ’ alk to the car . It ’ s incredible . Women are screaming , people tearing at yo ave the best cars , we have the best women . Oktoberfest is the shit ! ” He ’ s r man a sandwich. ” I ’ m not saying women belong in the kitchen , barefoot and ng that . Okay ? I ’ m just saying , women , go in the kitchen . 
Just go in the
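Under the hood, a concordance is just a window of tokens around each match. A bare-bones pure-Python version (the helper name and toy tokens are hypothetical; NLTK's Text.concordance does roughly this, plus fixed-width formatting):

```python
def concordance(tokens, target, width=3):
    """Collect each occurrence of `target` with up to `width` tokens
    of context on either side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

# Toy token list (hypothetical, not the real transcript):
toks = "how many more great men are gon na".split()
for line in concordance(toks, "men"):
    print(line)  # many more great [men] are gon na
```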
A concordance permits us to see words in context. For example, we saw that “women” occurred in contexts such as “how women argue” and “hit women”. What other words appear in a similar range of contexts?
text.similar("men")

you the people i them women me have that god myself yourself here day my and her oh do so

text.similar("women")

you i they it me people and that what right there shit fuck night men this dude so we god
The common_contexts function allows us to examine just the contexts that are shared by two or more words, such as ‘women’ and…
the_are like_and a_i to_you
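The shared-context idea behind common_contexts can be sketched in a few lines of plain Python (a hypothetical helper, not NLTK's implementation). Note how the toy output mirrors the the_are context above:

```python
from collections import defaultdict

def common_contexts(tokens, w1, w2):
    """Return the (left, right) neighbour pairs in which both
    w1 and w2 occur, case-insensitively."""
    ctx = defaultdict(set)
    for left, word, right in zip(tokens, tokens[1:], tokens[2:]):
        ctx[word.lower()].add((left.lower(), right.lower()))
    return ctx[w1.lower()] & ctx[w2.lower()]

# Toy token list (hypothetical):
toks = ["the", "women", "are", "here", "and", "the", "men", "are", "gone"]
print(common_contexts(toks, "women", "men"))  # {('the', 'are')}
```

NLTK prints each shared pair joined with an underscore, which is where the the_are style of output comes from.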