by Harry Chiel & Owen Prunskis
In their blog post “Belichick and Brady Address the Media: A Statistical Report,” Ben Blatt and Andrew Mooney accurately describe the mind-numbing, yet occult, phenomenon that is a Bill Belichick postgame interview:
“Bill Belichick and the postgame podium are notorious for being a lethally boring tandem. Week after week, he stymies reporters with concise non-answers, vague summaries, and robotic praise of players, giving them nothing of substance to write about his team.”
His succinct, formulaic speeches seem so simple and devoid of information that a computer could spit them out. In fact, that was precisely the challenge we accepted: to use the word frequencies that Bill Belichick, head coach of the New England Patriots, employs in his postgame pressers, in order to train a computer to produce his speeches.
C.E. Shannon’s “A Mathematical Theory of Communication,” an influential paper in the field of information theory written in 1948, deals with methods of approximating the English language using letter and word frequencies. Although he did not have computers at his disposal, Shannon used two strategies that we found both helpful and particularly straightforward to apply with the power of a computer:
- To randomly generate words based purely on the frequencies with which they are used in the sample text.
- To randomly generate the first word, and then to randomly generate the next word based on the frequencies of the words found after that first word in the sample text. For example, if the first word generated were “football,” the algorithm would search the document for “football,” record each word used after “football,” and choose the next word based on the frequency with which it was used after “football.” This is known as a “2-gram” approximation.
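The two frequency tables behind these strategies can be sketched in a few lines of Python. This is only an illustration under our own assumptions, not the code used for this post: the sample sentence is invented, and the names `word_counts` and `follower_counts` are hypothetical.

```python
from collections import Counter, defaultdict

# Toy sample text, invented for illustration (not a real transcript).
sample = "we played good football and we played hard football today"
words = sample.split()

# Strategy 1: how often each word appears overall.
word_counts = Counter(words)

# Strategy 2: for each word, how often each other word follows it.
follower_counts = defaultdict(Counter)
for current, following in zip(words, words[1:]):
    follower_counts[current][following] += 1

# Turn the counts after "football" into probabilities for the next word.
after_football = follower_counts["football"]
total = sum(after_football.values())
probabilities = {word: count / total for word, count in after_football.items()}

print(word_counts["we"])   # 2
print(probabilities)       # {'and': 0.5, 'today': 0.5}
```

In this toy sample, “football” is followed once by “and” and once by “today,” so each would be chosen half the time.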
Equipped with Shannon’s ideas and some strategies of our own, we gathered a sample of 45 of Belichick’s postgame transcripts, comprising 27 wins and 18 losses. From those speeches, we used his opening statements, the segments unadulterated by the questions of reporters. We then set out to have our laptops spawn Belichick-talk.
For our first-order approximation, we used Shannon’s first idea: to have the computer spit out words using the probability distribution observed in the sample text. Here are one win speech and one loss speech that we obtained:
Win: “Team a high-scoring defense there there that handle on to fought for they game the of Colts glad I’m the they did phases a a and so are hard win football really time we against and on just fought the 357 certainly plays players today in well great more stepped some but Jets guys did champions special played end proud we in were good need we good and good players in touchdown got we lead.”
Loss: “First, why the around on we of I minutes outstanding they Pittsburgh minutes obviously certainly the were balls able next credit a come Dolphins out little everything in credit they give a of a we last chances just ill a us it outplayed by did obviously us tonight here is of our credit they the more next and our to got better do competitive quite next I the they better with better first better describe.”
As you can see, this text is almost completely nonsensical. Although the words appear with the proper probabilities, only by luck do they occasionally form strings that could plausibly be heard in English sentences.
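A first-order generator of this kind might look like the following sketch. It is a reconstruction under our own assumptions rather than the original program: the function name `first_order_speech`, the `seed` parameter, and the toy sample are all hypothetical.

```python
import random
from collections import Counter

def first_order_speech(sample_text, length, seed=None):
    """Draw `length` words independently, weighted by their frequency in the sample."""
    rng = random.Random(seed)
    counts = Counter(sample_text.split())
    vocabulary = list(counts)
    weights = [counts[word] for word in vocabulary]
    # Each draw ignores the previous word entirely, which is why the
    # output rarely resembles a sentence.
    return " ".join(rng.choices(vocabulary, weights=weights, k=length))

# Toy sample, invented for illustration (not a real transcript).
toy_sample = "we played hard and we fought hard but they played better"
print(first_order_speech(toy_sample, length=12, seed=0))
```

Because every word is drawn independently, frequent words such as “we” and “hard” show up often, but nothing ties one word to the next.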
At this point, we thought it worthwhile to split up the words from our sample into their appropriate parts of speech. We only went through this process for our “losses” sample. For our second-order approximation, we set a specific sentence structure (“Subject-verb-article-adjective-object-adverb-conjunction-subject-verb-adjective.”), and for each word, we had the computer randomly choose the word based on its frequency among that part of speech in the text:
“We played a whole credit better and we played good. We fought a every team today and we did any.  We thought the our plays better and game outplayed all.  They have the more chances obviously and Giants had more.  It have a last games quite and we did any.  We had the few us better and guys did enough.  You did the our job better and kicking have those.”
Although the words now come together in a way that sounds more like human sentences, no one speaks with the same sentence structure every time. Thus, for our third-order approximation, we used the same methodology, but allowed the computer to choose among 20 different sentence structures with equal probability:
“Position tonight didn’t was a some a lot loss. The we getting clearly plays better. Well win better is our to do and they were to play tonight. Really they did to did enough of all job of a football. Well they made to were a any job. Defense fought the our advantage really but we just think little team.  They had better a good enough job.”
Our third-order approximation is already a bit more realistic: the computer generated randomly structured sentences containing long strings of words that could fit together in an English sentence. Our main critique of this method, though, is that it required too much human involvement; we had to classify each word as a particular part of speech, and we had to record Belichick’s usual sentence structures by hand.
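The second- and third-order approximations can be sketched together, which also makes the manual effort visible. Everything here is an assumption for illustration: the tiny hand-tagged `LEXICON`, its counts, and the two extra sentence structures are invented (the post used 20 structures), and the function names are hypothetical.

```python
import random
from collections import Counter

# Hypothetical hand-tagged lexicon: part of speech -> word counts.
# In practice every word in the sample had to be classified by hand.
LEXICON = {
    "subject": Counter({"we": 5, "they": 3, "guys": 1}),
    "verb": Counter({"played": 4, "did": 3, "had": 2}),
    "article": Counter({"a": 3, "the": 4}),
    "adjective": Counter({"good": 3, "better": 2, "few": 1}),
    "object": Counter({"job": 2, "game": 3, "chances": 1}),
    "adverb": Counter({"today": 2, "obviously": 1, "better": 2}),
    "conjunction": Counter({"and": 5, "but": 2}),
}

# Second order: the single fixed structure quoted in the text.
FIXED = ["subject", "verb", "article", "adjective", "object",
         "adverb", "conjunction", "subject", "verb", "adjective"]

# Third order: several structures, each equally likely.
# The two extra structures below are invented for this sketch.
STRUCTURES = [
    FIXED,
    ["subject", "verb", "article", "adjective", "object"],
    ["subject", "verb", "adverb", "conjunction",
     "subject", "verb", "article", "object"],
]

def fill(structure, rng):
    """Fill each slot with a word drawn by frequency within its part of speech."""
    words = []
    for part_of_speech in structure:
        counts = LEXICON[part_of_speech]
        word = rng.choices(list(counts), weights=list(counts.values()), k=1)[0]
        words.append(word)
    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:] + "."

def second_order_sentence(rng):
    return fill(FIXED, rng)

def third_order_sentence(rng):
    return fill(rng.choice(STRUCTURES), rng)

rng = random.Random(0)
print(second_order_sentence(rng))
print(third_order_sentence(rng))
```

The word choices are automatic, but both `LEXICON` and `STRUCTURES` have to be written by a person, which is the drawback described above.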
Finally, we decided to use Shannon’s 2-gram approach as our fourth-order approximation. Here are examples of what the computer came up with:
Win: “I really great job with the season but they really proud of football team and that just a huge role in the win I hope we played today in the Jets it was reinforced by Devin McCourty there and that was today they certainly a big plays than this very competitive very competitive very good some guys played hard as today came through it was a long but I thought those games.”
Loss: “We had a better job of games coming up a tough loss like it came down to getting field position to say here but we just not too much to say well they hit the better than we just have some pretty good football game and had their chances disappointing but we had our players fought all the story of the kicking game I give the first half but that is in the things.”
Without requiring us to sift through the text and sort each word by part of speech, this algorithm yielded a remarkably good result. We find several strings here that we might actually expect Belichick to say: “a huge role in the win,” “it was reinforced by Devin McCourty,” “it came down to getting field position,” and “just not too much to say.” Using only the frequencies of Belichick’s words and their sequence within the text, this computer program produced phrases that are positively Belichickian.
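A 2-gram generator along these lines can be sketched as follows. Again, this is an illustration under our own assumptions, not the original program: the names `build_followers` and `two_gram_speech` are hypothetical, the toy sample is invented, and restarting from the overall frequencies when a word has no recorded follower is a choice made for this sketch.

```python
import random
from collections import Counter, defaultdict

def build_followers(words):
    """Map each word to a Counter of the words that come right after it."""
    followers = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        followers[current][following] += 1
    return followers

def two_gram_speech(sample_text, length, seed=None):
    """Generate `length` words (length >= 1), each chosen from the followers of the last."""
    rng = random.Random(seed)
    words = sample_text.split()
    followers = build_followers(words)
    overall = Counter(words)

    def draw(counts):
        return rng.choices(list(counts), weights=list(counts.values()), k=1)[0]

    # The first word comes from the overall word frequencies.
    current = draw(overall)
    output = [current]
    while len(output) < length:
        # Assumption: if the current word never has a follower in the sample
        # (it only appears as the final word), fall back to overall frequencies.
        options = followers.get(current) or overall
        current = draw(options)
        output.append(current)
    return " ".join(output)

# Toy sample, invented for illustration (not a real transcript).
toy_sample = "we played hard and we fought hard but they played better today"
print(two_gram_speech(toy_sample, length=15, seed=0))
```

No parts of speech or sentence structures are supplied by hand here; the only input is the text itself, and every adjacent pair in the output (apart from the fallback case) already occurs somewhere in the sample.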
As these approximations are refined, it is plausible that the Patriots could send a computer to the podium to feed reporters the “concise non-answers” that they are accustomed to. In fact, we suspect that Coach Belichick would prefer this arrangement as well.