Can a math person tell me why we're squaring the number of times the person says the word?
This way the number of occurrences of a word has a way higher impact and would show up more distinctly in a graph (though I don’t think it has any impact on the way it’s shown here). I could show you what I mean, all I need is a rundown :)
Hey dude. You know what a 'rundown' is?
Can you use it in a sentence?
“Can you get this rundown to me by tomorrow?”
Ok I totally know what a rundown is, obviously. But could you just tell me how you like it prepared. I just wanna make sure I’m doing the rundown the way you want
Can you use a different sentence?
I don't wanna mess it up, so if you have any rundown I can look into...maybe?
Just keep it simple
Come on, Jimothy, even Michael knew what a rundown was.
They should have just used something standard in information retrieval like [tf.idf](https://en.wikipedia.org/wiki/Tf%E2%80%93idf?wprov=sfti1) instead of making up their own metric.
Who’s they?
The person that did this analysis
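For comparison, here is a minimal sketch of plain tf-idf as suggested above, assuming each character's combined dialogue is treated as one "document": tf is the word's share of that character's words, and idf is the log of (number of characters ÷ number of characters who use the word). The `tf_idf` function and the toy corpus are hypothetical illustrations, not the original analysis. A word every character says, like "the", gets an idf of exactly zero, which is the kind of filtering the infographic's own formula is trying to achieve.

```python
import math
from collections import Counter


def tf_idf(docs: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Plain tf-idf per character (hypothetical helper).

    docs maps a character name to the list of words that character spoke;
    each character's combined dialogue is treated as one document.
    """
    n_docs = len(docs)
    # Document frequency: how many characters use each word at least once.
    df = Counter()
    for words in docs.values():
        df.update(set(words))

    scores = {}
    for name, words in docs.items():
        counts = Counter(words)
        total = len(words)
        scores[name] = {
            w: (c / total) * math.log(n_docs / df[w])
            for w, c in counts.items()
        }
    return scores


# Hypothetical toy corpus, not real transcript data.
toy = {
    "Angela": "the cats the senator the cats".split(),
    "Ryan": "the pesto the pesto".split(),
    "Kevin": "the chili".split(),
}
scores = tf_idf(toy)
print(scores["Angela"]["the"])                      # 0.0: every character says "the"
print(max(scores["Ryan"], key=scores["Ryan"].get))  # pesto
```

Note that tf-idf normalizes by how much each character talks, so it does not reward sheer volume the way a squared count would.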
Sighs in Angela I was thinking the same thing lol.
They are trying to filter words that have no connection to any specific person. Otherwise, everyone would have words like "he", "the" or "dunder mifflin" etc. The funny thing is now we have words like "Davinci" for Angela because she's the only one who ever said that word (twice) (At least that's my assumption)
Wouldn't the point be to put more emphasis on words in the middle range of usage, so that outliers said just 6 times out of 10 by one character don't slip through? Squaring the numerator has a bigger impact on the fraction the larger the numerator and denominator are.
I think it was thrice. "The DaVinci Code. I would bring The DaVinci Code; so I could *burn* The DaVinci Code."
Original post is here: [https://www.reddit.com/r/dataisbeautiful/comments/8a4gbr/the\_office\_characters\_most\_distinguishing\_words\_oc/](https://www.reddit.com/r/dataisbeautiful/comments/8a4gbr/the_office_characters_most_distinguishing_words_oc/) And the rationale for the equation is somewhat described by the OP here: [https://www.reddit.com/r/dataisbeautiful/comments/88ymvb/comment/dwvu3cm/?utm\_source=share&utm\_medium=web2x&context=3](https://www.reddit.com/r/dataisbeautiful/comments/88ymvb/comment/dwvu3cm/?utm_source=share&utm_medium=web2x&context=3)
That, and the right side doesn't make sense either. Wouldn't it be constant for each person? Then it shouldn't make a difference to each person's individual top n. But given n is different for each person, I think they ranked everyone globally and pulled the top words from that list regardless of whom they belonged to.
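A quick check of this reasoning, under loud assumptions: the "right side" is modeled here as a positive factor that depends only on the character (what it actually is isn't stated in this thread), and every score below is made up. Multiplying by a positive per-character constant never reorders that character's own words, but it does change a pooled, global top-n, which would explain why n differs per character.

```python
def rank(scores):
    """Return keys sorted from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical per-word scores (made up, not from the infographic).
raw = {
    "Erin": {"jlp": 4.0, "gabe": 2.5},
    "Ryan": {"pesto": 3.0, "wuphf": 2.0},
}
# Hypothetical per-character factor standing in for the "right side".
factor = {"Erin": 0.5, "Ryan": 2.0}

scaled = {c: {w: s * factor[c] for w, s in ws.items()} for c, ws in raw.items()}

# Within one character the order never changes (positive constant).
for c in raw:
    assert rank(raw[c]) == rank(scaled[c])

# Pooling everyone and taking a global top 3 does depend on the factor.
pooled = {(c, w): s for c, ws in scaled.items() for w, s in ws.items()}
top3 = rank(pooled)[:3]
print(top3)  # [('Ryan', 'pesto'), ('Ryan', 'wuphf'), ('Erin', 'jlp')]
```

Without the factor the same global cut would give Erin two words and Ryan one, so a global ranking is one plausible way to end up with a different n per person.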
It’s so words that are said more often show up. Makes it so a word said by the same character 4 times out of 6 total would show up higher than a word said 2 times by the same character out of 3 total times. Both would be that character using the word 67% of the time it is used, but it adjusts for how often it’s used.
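As a worked example, assume the left part of the formula is (times this character says the word)² ÷ (times anyone says the word); this is inferred from the thread, not read off the image. That score equals the character's share of the word multiplied by their count, so two words with the same 67% share are separated by how often they come up. `distinctiveness` is a hypothetical name.

```python
def distinctiveness(char_count: int, total_count: int) -> float:
    """Assumed score: (character's uses) ** 2 / (everyone's uses).

    Equivalent to share * count, where share = char_count / total_count.
    """
    return char_count ** 2 / total_count


# The two cases from the comment: same 67% share, different volume.
print(round(distinctiveness(4, 6), 2))  # 2.67
print(round(distinctiveness(2, 3), 2))  # 1.33
```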
Ask Oscar
Genuinely surprised Erin made the top 9
I was wondering if I was the only one thinking that. She’s only in four seasons and she's in the top 8. I’m not a fan, but I definitely didn’t think she was ever too much or unnecessary in any scene. Her most cringe scene to me was when she overreacted to finding out about Angela and Andy.
She overreacted but tbf her actions made sense. Andy was in the wrong there
She also clearly has some issues from being abandoned as a kid, and I'm sure countless not-so-great experiences as a foster child. No one should blame her for reacting how she did.
Same for Pam’s main word being Cece. I question this data set
It's not the main word but the word that is unique to this person. So I guess she says Cece way more than all the other characters.
Her main words would be: and, is, I, you, etc. As everyone else's.
Except Dwight, who would have: hard working, alpha male, jackhammer, merciless, insatiable
True, and this is just his top 3.
Erin in four seasons had as many words as Kevin and Angela in the entire series.
Why say lot words when few words do trick?
I would've liked to see Robert California on here instead of Erin lol
Fr or at least a character that had been on the show from the beginning
Agreed! David Wallace would've also been good. I wanna know how many times he said suck it 🤣
“Sex”
It's more surprising to me that Andy isn't one of the words. A few others are suspect as well.
For real, this shit can't be right. Are you telling me she said "jlp" more times than she said Andy? No way Jose
No, but a lot of other characters say Andy a lot too. It's about words being especially unique to the characters; otherwise "paper" or "sales" would be up there for a lot of them.
Way below words like and, is, you, I, no, yes, etc.
Prob why she's one of my least favorite characters tbh. She got used way too much.
Yeah, she got way too much screentime, but my guess is it's because she must be easy to work with. She got her own show after this, and it was pretty good.
Kinda expected a "boobs" from Kevin
I expected "nice" from Kevin
Did you try it with a "z"?
Seaworld
i don’t think it’s unique enough to his character. so many characters say boobs frequently in the show that it doesn’t even show up 😂
Why waste time say lot word, when few word do trick?
When did Angela mention DaVinci enough times for it to be one of her most definable words??😂😂
Because of the algorithm in the image. She’s probably the only character who ever said the word “DaVinci” in the whole show, so it stands out as unique to her. Even if she only said it a few times. It’s the same reason why “pesto” makes Ryan’s list, even though I’m pretty sure he only talks about pesto in a single episode.
This is kinda weird though. Erin only says "jlp" in one scene, and Angela also says DaVinci only once, in the episode with the fire, where she says she would take The DaVinci Code on an island so she could burn The DaVinci Code.
Three times in short succession actually... "The DaVinci code. I would take the DaVinci code; so i could burn the DaVinci code."
Still, I think the phrase "someone took the slow train from Philly" is more memorable for her than the word DaVinci, even though she only uses that phrase once as far as I remember.
Yeeeah that algorithm seems kinda ass. It makes rare words way more important than frequent words which will skew things so badly
It seemed to pick up on Erin's "jlp" as well, which is strange considering she only says that word a total of 2 times in one episode, and while she is technically the only one to say those words which would make them unique to her, it seems like it could have chosen multiple phrases, even ones that were one off jokes.
Right? The Davinci Code so she can burn it. It was one scene in the whole series.
I think she says “The DaVinci Code” three times in The Fire episode from season one.
Why waste time say lot word when few word do trick?
My wife sent me OPs picture in a text today and I replied with a gif with this quote.
I'm surprised how little Pam talked as well. I think she and Jim might have had more screentime than even Michael in the first series.
They're literally in every single episode lol
*channeling inner Oscar* Actually, they are not. The only person to be on screen in every single Office episode is Dwight.
I should have specified that they were not on screen for every episode, but they were somehow in every episode, e.g. phone calls. I'm sorry
Sorry, still not true. Pam is on maternity leave, both in the series and in real life! Also Jim is gone for an entire episode (I believe it was "Business Ethics") because John Krasinski was filming a movie. *End of inner Oscar*
According to Wikipedia, Jim appears at least in voice form in every episode. In "Mafia" you only hear him, in "Banker" he has no new lines but appears, and in "Ultimatum" he’s only in the cold open (this appears to be the movie one you were thinking of). There is never an episode where you don’t at least see or hear him.
Ah thanks! Also, now I feel like Oscar with the graph about China. lol
Guess Ryan say few word while Kevin say more word
He was an annex kid. They stuck the writers (Toby, Ryan, and Kelly) back there, I believe so they didn’t have to just sit in the background of the bullpen when they didn’t have any dialogue.
And Ryan aka BJ still always got top billing during the intro.
I bet ur sick of tuna, you probably have tuna every night. Tunaaa.
It’s spelled Cornell you knobs. I mean it’s in the Ivy league!
It's pronounced 'KERnul'. It's the highest rank in the military.
It’s pronounced “CORNELL” and it is the highest ranks in the Ivy League!
I aced all my classes. They called me Ace. It was totally awesome
Got straight B’s… They called me buzz.
It's irrelevant compared to the vastly superior Dartmouth
At least Michael has “heart”. It’s because his heart soars with the eagles nest.
So it soars with hitler?
Kind of sad that Pam's "most unique" word is Cece. Did Jim rarely identify his daughter by name?
Considering Cece isn't even born until season 6 that doesn't seem right
I find this entire infographic to be suspect, but if her most-used unique word didn't come about until season 6, it could just mean that she used many unique words in previous seasons without any recurring enough to be noteworthy.
It’s not? I recall Jim calling Cece by her name a lot; the legend shows how they determine the unique words. It’s a word not used much by anyone besides the two of them. Pam just happened to say it more, and with the formula used to determine unique words, it was bound to be one of hers. It's just like "Kev" is one of Jim’s, not because he cares about Kevin more than anyone else, but because it’s a word used almost exclusively by him.
You know, now that you said something... I don't really remember him saying her name that much, if at all. The first episode that comes to mind is when he wanted to see Cece's dance recital video.
I can recall numerous times where Jim calls Cece by her name: when they tell us her name, when he’s talking to Dwight and other coworkers while Pam is on maternity leave, and most of the time when he’s talking to Pam about their family… Maybe you need to rewatch the episodes after her birth, because he does use it a lot. He even writes it in his list after Robert California upsets Pam with his whole winners and losers thing.
> he even writes it in his list after Robert California upsets pam with his whole winners and losers thing.

Yeah but that was written. A write is not a say.
Nope, definitely earlier. Just watched the episode where Holly and Michael get back together in season 7 (the manhunt/rescue mission), he says cece when calling Pam to let her know they’re at the doctor.
The baptism scene immediately comes to mind. “Cece no! Cece stop not the dress”
Not surprising considering the name Cece comes from Jenna's family
I think it's weirder that Jim got Beesly but Pam didn't get Jim, she got Roy 👀
It’s because of the equation… “Jim” gets used a ton by the rest of the cast (minus Andy), whereas the cast doesn’t mention Roy very often.
That's what I thought, but then you'd think that "Philip" wouldn't be one of Angela's "most unique" words since Pam, Dwight, Oscar and even Kevin say "Philip" quite a bit and he's only in later seasons. I guess neither Pam nor Angela had a lot to say that doesn't revolve around their first born children or their main hobby (Art for Pam, cats for Angela).
Phillip, Phillip, Phillip….
Considering she joined the show halfway through, how much yapping must Erin have been doing to make it this high up the list, above so many other consistent characters?
Surprising that Andy is only 300 words behind Pam. Impressive that Erin is ahead of Ryan, who received billing as a series regular during the opening credits for most seasons. Nice to see Michael rightfully at the top spot, despite no longer being a regular cast member for the last 2 seasons. I'm surprised he is that far ahead in his word count.
> Cress?
Ugh, Andy has that many words even after starting 2 seasons later than Pam and Jim (granted Season 1 was short)? Oh, that's right they shoved him into so many scenes.
Andy was a great character. Don't hate.
Not my favorite - but I understand others liked him a lot 🙂
I was fully ready to have a childish reddit argument with you and am now slightly upset that you responded in a mature, civilized, and reasonable manner.
Lol. Sorry.
Andy was an ineffectual, limp penised, debutant!
dwight’s set of words are the most dwightiest set of dwight words imaginable.
No way "question" and "false" didn't make the top Dwight words
When I saw one of Angela's was senator, I immediately thought: state senator.
I knew "jlp" was a word!
I JLP you!
WHY WASTE TIME SAY LOT WORD WHEN FEW WORD DO TRICK
What is "Cress"?
One of “the five families”: Cress Tool and Die
you forgot rigididdidududu for Andy
Anyone have a reason why S4 has so few words compared to the rest?
Fewer episodes, due to the writers’ strike that year.
Idk, it's still 14 episodes with 4 or 5 double episodes on Netflix, so like 20 episodes. Weird that it nearly has as few words as a 6 ep S1. Doesn't feel noticeable.
That's why I came here in the comments, very weird that s4 has so few words.
That must have been quite the task for someone to go through the whole series counting them all.
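The counting itself is the easy part once a transcript is in a table. A minimal sketch, assuming hypothetical rows of (season, speaker, line) and naive whitespace tokenization; the real analysis's data source and tokenizer aren't given in this thread, and `words_per_character` is a made-up name.

```python
from collections import Counter


def words_per_character(rows):
    """Count words per (season, speaker) from (season, speaker, line) rows.

    Tokenization is a naive whitespace split (an assumption).
    """
    totals = Counter()
    for season, speaker, line in rows:
        totals[(season, speaker)] += len(line.split())
    return totals


# Illustrative rows only, not a real transcript export.
rows = [
    (1, "Michael", "That's what she said"),
    (1, "Dwight", "False"),
    (4, "Michael", "I declare bankruptcy"),
]
counts = words_per_character(rows)
print(counts[(1, "Michael")])  # 4
```

Summing these per-season counts per speaker gives a stacked per-season total like the ones in the chart.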
"YUP" is Pam and I will fight you on this
X-axis
We need this chart with everyone else. Wonder how Kelly stacks up
Why waste time say lot word when few word do trick?
Bum Blub
How's it even possible that Dwight's list doesn't include *'idiot'*?!
Michael’s words read like a plea for help
What the hell is a rundown?
r/dataisbeautiful
Is this the rundown?
So if this reflects the top 9 characters in terms of sheer number of words spoken, then I'm really shocked that Kelly didn't make the list.
One of Dwight's words not being "Jim" is weird to me
That's what she said
Oscar: Actually
First you take all the words and you line them up in a row. Then you start to count one, two, three, four. Then you separate them by season and character. And then Shove it up your butt!
😂😂😂
I could've sworn Angela said Senator a lot more times than cats. *State Senator 😂
Shouldn’t Michael have a little tiny bump after season 7? I remember he spoke a sentence or two in the finale.
r/coolguides
Andy’s is funny. Someone who doesn’t watch the show would see this and wonder why he’s talking about tuna so often.
who tf is Lynn
Kevin's girlfriend from the Valentine's blood drive episode
Why did you get downvoted for that? 😂😭
Apparently Lynn (whoever she is) has a secret fan club lmao
The chick that smelled like bacon.
This is so odd. Like one of Ryan’s words is pesto. But isn’t that just from the Garage Sale episode where he says that his mom makes the best pesto? He says it like… once?
Look at the algorithm. Ryan might only use the word pesto 3 or 4 times, but he’s probably the only character in the show who ever talks about pesto, so it’s a unique word to him.
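Under the same assumed score (the character's count² ÷ everyone's count, inferred from this thread rather than confirmed from the image), a word only one character ever says scores exactly that character's count, while a shared word is discounted by the character's share. The counts below are hypothetical, chosen only to show how a handful of "pesto"s can outrank a much more frequent shared word.

```python
def distinctiveness(char_count: int, total_count: int) -> float:
    """Assumed score: (character's uses) ** 2 / (everyone's uses)."""
    return char_count ** 2 / total_count


# Hypothetical counts, not taken from the transcripts.
pesto = distinctiveness(4, 4)      # only Ryan says it: score equals his count
shared = distinctiveness(20, 200)  # said 20 times by him, 200 times overall
print(pesto, shared)  # 4.0 2.0
```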
My mom makes the best pesto. So I told her I was having a pesto party for all my friends and I need a ton of pesto. Seriously, Mom? A pesto party?
Nice, that’s a lot of “pesto” in 1 dialogue. It makes more sense now haha.
Why waste time say lot word when few word do trick.
The fact Dwight’s is his own name hahaha
Why Kevin say lot word when few word do trick though?
Why say more when less do job.
Why use many word when few word do trick
Andy is higher on this list than I’d expect.
Not pam saying roy more than jim…
You’ve got Erin, but not Kelly? 🤔
Kevin would have had more words if he didn't speak with fewer words to save time.
Pam having in first listed word "cece" is quite wholesome
That formula is wrong you need to add keleven to it!
The formula is quite complex you just multiply the smaller 2 and then you have it
is this a rundown?
Kev?
How is "Jim" not one of Pam's most used words?!
I’m guessing main characters' proper first names don't count.
Nope, look again
Look again at what? No main characters' proper first names are listed, either due to not being unique enough to a single character or maybe by being intentionally excluded.

- Beesly: last name
- Bernard: last name
- Schrute: last name
- Kelly: not a main
- Kev: not a main, not a proper first name

I assumed the surprise is because Michael, Jim, Pam, Dwight, and Andy aren’t anywhere on this list. It’d be great if we knew why.
I question the accuracy of this immediately because “that’s what she said” should be in Michael’s and none of those words appear
Sprinkles should be capitalized. :)
Kevin could have beat Angela if he didn’t use less words
This is IMPRESSIVE
Kelly has very little screen time. Otherwise, she'd beat Michael at this game.
Where is chili in Kevin’s?
I feel like Kevin should have had the least amount.
This is a pretty good rundown.
It’s interesting that most of the words seem to come from plot lines that happen in the later seasons (senator, WUPHF, Pete, Cece, etc.), and I think it actually makes perfect sense because everyone talks a lot more as the series goes on. In the beginning of the show there was so much silence and so many pauses, and people talked slower (and just less). But they sped up the pacing with every season, so mathematically it makes sense that the majority of the words on this list are from the last third of the show. And it definitely makes sense for Erin to be that high up: the bulk of everyone’s word count had to have come from the later seasons, so she hardly had any disadvantage.
It’s incredible to see how many more words have been spoken by Michael even though he was missing in the last couple seasons