Presidential inaugural addresses have a reputation for being light on substance; they tend to follow a formula that seeks to create a sense of bipartisanship by affirming the nation’s values. This stands in stark contrast to the bold claims and aspirations voiced in presidential debates. However, by applying Natural Language Processing, and topic modelling in particular, I was able to cut through the seemingly characterless speeches and surface some interesting insights. “Topic modeling is an unsupervised machine learning technique that’s capable of scanning a set of documents, detecting word and phrase patterns within them, and automatically clustering word groups and similar expressions that best characterize a set of documents¹.” Taking a deeper look at these topics, we get a glimpse of both domestic and international affairs.
I downloaded the historical text file for each address (excluding President Biden’s) from the Center for Open Science. I collected President Biden’s address separately and saved it as a text file.
Exploratory Data Analysis
Prior to diving into topic modelling, I wanted to understand broader trends such as address length in words, words per sentence, and the use of “I” vs. “We”.
- Address Word Length
Looking at the total words in each address, we can see that word count has remained generally constant over time. The notable exception is President William Henry Harrison, whose lengthy address unfortunately cost him his life. The 68-year-old former military officer refused to wear an overcoat, hat or gloves during his speech, which later led him to develop pneumonia.
- Words / Sentence
The number of words per sentence has declined rapidly over time. This is unsurprising given changes in vernacular, as well as the shift in the address’s primary medium from newspaper text to audio and television.
- “I” vs. “We”
The ratio of “We” to “I” has also increased over time, as presidents have sought to create a sense of unity in their addresses.
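As a rough illustration of the three metrics above (not the exact code used for this project), they can be computed with the standard library alone. The function name `address_stats` and the regex-based sentence splitting are assumptions made for this sketch; a proper tokenizer would handle abbreviations and quotations more carefully.

```python
import re

def address_stats(text):
    """Return word count, mean words per sentence, and the we-to-I ratio."""
    # Split on sentence-ending punctuation; a rough heuristic, not a real tokenizer.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of letters, allowing straight and curly apostrophes.
    words = re.findall(r"[A-Za-z'’]+", text)
    lowered = [w.lower() for w in words]
    we_count = lowered.count("we")
    i_count = lowered.count("i")
    return {
        "word_count": len(words),
        "words_per_sentence": len(words) / len(sentences) if sentences else 0.0,
        # None when "I" never appears, to avoid dividing by zero.
        "we_to_i": we_count / i_count if i_count else None,
    }

sample = "I thank you. We are one people, and we will endure!"
stats = address_stats(sample)
print(stats)
```

Running this over each address and plotting the values by year would give the kind of trend lines described above.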
Stop Word and Punctuation Removal
Stop words are common words that carry little meaning on their own, such as “the”, “and”, and “in”. Because these words typically appear at a high frequency within a corpus, they are removed prior to topic modelling. I also removed additional words beyond the default stop word list, since many words that are common to presidential addresses carry little value. The additional words removed were:
government, people, nation, states, make, long, come, day, know, way, fellow, americans, citizens, united, america, shall, must, may, upon, every, let, one, would, great
Additionally, all punctuation was removed from the corpus, as it adds no value to the topics discussed.
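A minimal sketch of this step is shown below. The `DEFAULT_STOP_WORDS` set is a tiny illustrative stand-in for a library's default stop list (an assumption), while `EXTRA_STOP_WORDS` holds the additional words listed above. Tokens are lowercased here only so that they match the lowercase stop list.

```python
import string

# Tiny illustrative subset of a default English stop list (assumption).
DEFAULT_STOP_WORDS = {"the", "and", "in", "of", "to", "a", "is", "our"}

# Domain-specific additions listed in the article.
EXTRA_STOP_WORDS = {
    "government", "people", "nation", "states", "make", "long", "come",
    "day", "know", "way", "fellow", "americans", "citizens", "united",
    "america", "shall", "must", "may", "upon", "every", "let", "one",
    "would", "great",
}

STOP_WORDS = DEFAULT_STOP_WORDS | EXTRA_STOP_WORDS

# Translation table that deletes ASCII punctuation plus curly quotes.
PUNCT_TABLE = str.maketrans("", "", string.punctuation + "“”‘’")

def clean(text):
    """Strip punctuation, lowercase, and drop stop words."""
    tokens = text.translate(PUNCT_TABLE).lower().split()
    return [t for t in tokens if t not in STOP_WORDS]

print(clean("Fellow citizens, the Constitution is our great charter!"))
# → ['constitution', 'charter']
```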
Lemmatization and Lowercase
Words carry the same meaning regardless of their case, so all words were converted to lowercase. All words were also lemmatized so that only the lemma, which carries the meaning of the word, remained. For instance, “studies” and “studying” both carry essentially the same meaning and so should be lemmatized to simply “study”. Note that when lemmatizing, it is important to lemmatize your stop words as well to ensure they are removed.
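The point about lemmatizing the stop list can be sketched as follows. The `LEMMAS` lookup table is a toy stand-in for a real lemmatizer (an assumption for illustration only); the names `lemmatize`, `raw_stop_words`, and `stop_lemmas` are likewise hypothetical.

```python
# Toy lemma lookup standing in for a real lemmatizer (assumption).
LEMMAS = {
    "studies": "study", "studying": "study",
    "states": "state", "nations": "nation", "citizens": "citizen",
}

def lemmatize(token):
    """Lowercase a token and map it to its lemma when one is known."""
    token = token.lower()
    return LEMMAS.get(token, token)

raw_stop_words = {"states", "nation", "citizens"}
# Lemmatize the stop list too, so "states" also blocks the lemma "state".
stop_lemmas = {lemmatize(w) for w in raw_stop_words}

tokens = ["Studies", "studying", "States", "nations", "liberty"]
lemmas = [lemmatize(t) for t in tokens]
kept = [lemma for lemma in lemmas if lemma not in stop_lemmas]
print(kept)
# → ['study', 'study', 'liberty']
```

Had the raw stop list been used instead, the lemma “state” would have slipped through, because only “states” appears in it.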
Part of Speech Removal
I also removed all words that were not nouns, adjectives, verbs or adverbs. Words that fall outside these parts of speech carry limited semantic value in topic modelling.
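A small sketch of the filter, assuming tokens have already been tagged with Universal POS tags by some tagger. The example tags are written by hand, and whether proper nouns (tagged `PROPN`) count as nouns is a choice this sketch leaves out.

```python
# Universal POS tags for the four content classes kept in the article.
# Proper nouns (PROPN) are not included here; that is an assumption.
KEEP_TAGS = {"NOUN", "ADJ", "VERB", "ADV"}

def keep_content_words(tagged_tokens):
    """Keep tokens whose POS tag is noun, adjective, verb, or adverb."""
    return [token for token, tag in tagged_tokens if tag in KEEP_TAGS]

# Hand-tagged example; a real pipeline would get tags from a POS tagger.
tagged = [
    ("we", "PRON"), ("boldly", "ADV"), ("defend", "VERB"),
    ("the", "DET"), ("free", "ADJ"), ("world", "NOUN"),
]
print(keep_content_words(tagged))
# → ['boldly', 'defend', 'free', 'world']
```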
After pre-processing, I performed topic modelling through both LDA (Latent Dirichlet Allocation) and NMF (Non-negative matrix factorization). I found both performed similarly well and ultimately elected to utilize the NMF approach throughout.
2 Component NMF
I started with only a two-component NMF, assigning each address to its majority topic. The top words for each topic show two distinctly different themes. The first portrays the United States as insular in nature and upholding the values of the Constitution, while the second shows a focus on a global world and freedom.
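To make the mechanics concrete, here is a small pure-Python sketch of a two-component NMF using Lee and Seung's multiplicative updates, followed by assigning each document to its highest-weight topic. This is not the code used for the project; the vocabulary and counts are invented toy data, and in practice a library implementation such as scikit-learn's `NMF` class would typically replace the hand-rolled factorization.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(r) for r in zip(*m)]

def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Factor non-negative V (docs x terms) into W (docs x k) and H (k x terms)
    with multiplicative updates that reduce the squared reconstruction error."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h

def majority_topic(w):
    """Assign each document to the topic with the largest weight."""
    return [max(range(len(row)), key=row.__getitem__) for row in w]

# Toy document-term counts over a hypothetical vocabulary.
vocab = ["constitution", "union", "world", "freedom"]
counts = [
    [4, 3, 0, 0],
    [5, 2, 0, 1],
    [0, 0, 4, 5],
    [0, 1, 3, 4],
]
W, H = nmf(counts, k=2)
labels = majority_topic(W)
print(labels)
```

The rows of `H` play the role of the topics (their largest entries are the "top words"), and the rows of `W` hold each document's weight on each topic. Which topic ends up as index 0 or 1 depends on the random initialization, so only the grouping of documents is meaningful.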
These topics fell very neatly when placed on a time-scale. Prior to President McKinley, inaugural addresses fell in the first topic (“Insular/Constitution upholding”) and from President McKinley onward all presidents (excluding Taft) fell in the second topic (“Global/Freedom”).
It is not surprising that there is a marked shift starting at McKinley. During McKinley’s presidency the United States abruptly became a global power. As the U.S. became more involved in foreign affairs, both militarily and through its imperial ambitions, it makes sense that “freedom” and “peace” became key points of discussion. Some events that illustrate this shift:
Early 1890s: The U.S. shows little interest in foreign affairs
1898 (McKinley): The United States wins the Spanish-American War and annexes Puerto Rico, Guam, and the Philippines. The U.S. also separately annexes Hawaii.
1899: Out of fear of being shut out of trade with China, the U.S. issues the Open Door notes, calling for equal trading access for all powers.
1904: The Roosevelt Corollary to the Monroe Doctrine declares that the U.S. will exercise “international police power” in the Western Hemisphere.
20 Component NMF
Next, I wanted to take a more discerning approach and see what more granular topics existed and how they changed through time. To achieve this, I used a 20-component NMF. With fewer components the topics were less interesting; with 20 components, however, several topics overlapped. In these instances, I summed the topic vectors, which left a total of 14 topics. One of these was an “other” bucket for topics that could not be discerned. I will show a series of charts where each column represents a president and the y-axis represents the value of the topic.
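One way to read "summing the topic vectors" is to add up, for each address, the weights of the original components that were judged to overlap. The sketch below takes that reading as an assumption; the weights, the grouping, and the function name `merge_topics` are all hypothetical.

```python
def merge_topics(doc_topic, groups):
    """Collapse overlapping topics by summing their per-document weights.

    doc_topic: list of rows, one per address, one weight per original topic.
    groups: maps a merged-topic label to the original topic indices it absorbs.
    """
    labels = list(groups)
    merged = [
        [sum(row[i] for i in groups[label]) for label in labels]
        for row in doc_topic
    ]
    return labels, merged

# Hypothetical weights for 2 addresses over 4 original topics.
doc_topic = [
    [0.25, 0.125, 0.5, 0.0],
    [0.0, 0.25, 0.125, 0.5],
]
# Hypothetical grouping: topics 0 and 2 overlap, as do topics 1 and 3.
groups = {"War": [0, 2], "Business": [1, 3]}
merged_labels, merged = merge_topics(doc_topic, groups)
print(merged_labels, merged)
# → ['War', 'Business'] [[0.75, 0.125], [0.125, 0.75]]
```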
Topic 1: Power Granted
Key Words: power, executive, grant, control, act, state, sovereignty, possess, grant power, judiciary
This topic is mostly about the power vested in the branches of the government and the president. We can see that, in general, discussion around this topic has declined over time.
Topic 2: Business
Key Words: revenue, business, well, party, system, country
This topic revolves around business, and it is unsurprising that McKinley scored highly here, given he was a proponent of big business through protectionism and high tariffs.
Topic 3: Avoidance of Political Agitation
Key Words: political, agitation, institution, interest, never, subject
In his inaugural address, Martin Van Buren applauded the American people for the nation’s success and for overcoming a multitude of challenges and dangers while avoiding crippling political agitation. He explained that through strict adherence to the Constitution they had avoided the “rapid failure” that other countries had predicted for them.
Topic 4: American Spirit
Key Words: spirit, liberty, party, whole, power, character, interest, free
The use of words related to the founding principles has declined over time.
Topic 5: Upholding Amendments
Key Words: law, man, enforce, pass, amendment, support
Increased discussion of the amendments occurred around the Civil War and World War I.
Topic 6: Protecting National Interests
Key Words: right, revenue, free, interest, respect, protection, duty, American, home, foreign
In John Tyler’s inaugural address, he stressed how he would scale back the “monarchial” Jacksonian democracy. He compared Jackson’s presidency to that of Cromwell or Caesar and feared its continuation would lead to violence and the decline of the nation.
Topic 7: Public Service
Key Words: public, duty, good, service, office, economy
There has been a decline in presidents discussing their public service.
Topic 8: Preserve Values
Key Words: spirit, honor, interest, preserve, love, wish
There has been a decline in presidents discussing the preservation of values.
Topic 9: Uphold Constitution
Key Words: time, hope, believe, constitutional, good, right
At first, this topic seems quite nebulous; however, after reading Franklin Pierce’s inaugural address, it becomes clear that it is all about upholding the Constitution. In his address, he stresses the importance of states’ rights and the balance of the federal government. This address proved to be a harbinger of the Civil War, which would begin less than 10 years after he gave it.
Topic 10: War
Key Words: war, force, invasion, power, time, naval
Discussion of war in its “general” sense is high around the War of 1812 and the Civil War.
Topic 11: War/Peace in an External Context
Key Words: peace, policy, war, world, foreign, treaty
This topic discusses war, but in a more global context. Thus, we see the Civil War era with lower values and the World War I era with higher values.
Topic 12: Political Peace
Key Words: party, political, peace, time, war, government
John Quincy Adams was elected by the House of Representatives because no candidate won a majority of the Electoral College vote. After the election, there was fear of potential violence, and in his address he tried to heal electoral divides.
Topic 13: World Freedom / Peace
Key Words: world, new, freedom, American, work, time
After McKinley, we see a large shift to discussing the world and freedom.
Share of Topics through Time
Finally, I wanted to compare these topics over time as a total share of the speeches. Most topics have seen limited change or a decline, while the “World Freedom / Peace” topic has rapidly increased over time.
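Expressing topics as a share of each speech amounts to normalizing every address's topic weights so they sum to 1. A minimal sketch, using hypothetical weights keyed by year:

```python
def topic_shares(weights):
    """Normalize one address's topic weights so they sum to 1."""
    total = sum(weights)
    if total == 0:
        # An address with no topic weight gets a zero share everywhere.
        return [0.0 for _ in weights]
    return [w / total for w in weights]

# Hypothetical topic weights for two addresses, keyed by year.
addresses = {
    "1841": [3.0, 1.0, 0.0],
    "1961": [0.5, 0.5, 3.0],
}
shares = {year: topic_shares(w) for year, w in addresses.items()}
print(shares)
# → {'1841': [0.75, 0.25, 0.0], '1961': [0.125, 0.125, 0.75]}
```

Plotting these shares as a stacked area chart by year is one way to produce the kind of comparison described above.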
This project was completed as part of a 3-month Metis data science bootcamp program.
For more information on this project, including code and presentation slides, please check out my GitHub repository here.