A New Nation — Topic Modelling Presidential Inaugural Addresses

Andrew Smith
Mar 26, 2021
McKinley and Roosevelt drove a dramatic, imperialism-fraught shift in U.S. foreign affairs, and that shift shows up in topic models of the inaugural addresses.

Introduction

Presidential inaugural addresses are historically known to be devoid of real substance and instead formulaically seek to create a sense of bipartisanship through affirming the nation’s values. This stands in stark contrast to the bold claims and aspirations stated in presidential debates. However, through the use of Natural Language Processing, I was able to cut through the characterless speeches and gain interesting insights through topic modelling. “Topic modeling is an unsupervised machine learning technique that’s capable of scanning a set of documents, detecting word and phrase patterns within them, and automatically clustering word groups and similar expressions that best characterize a set of documents¹.” Taking a deeper look at these topics, we get a glimpse of both domestic and international affairs.

Data Collection

I downloaded the text file of every historical address (excluding President Biden’s) from the Center for Open Science. I collected President Biden’s address separately and saved it to a text file.

Exploratory Data Analysis

Prior to diving into topic modelling, I wanted to understand broader trends like address word length, words per sentence and “I” vs. “We”.

  1. Address Word Length

Looking at the total words in each address, we can see that word count has remained generally constant over time. The notable exception is President William Henry Harrison’s lengthy address, which is traditionally blamed for costing him his life. The 68-year-old former military officer refused to wear an overcoat, hat, or gloves during his speech and later developed pneumonia.

  2. Words / Sentence

The number of words spoken per sentence has declined markedly over time. This is unsurprising given changes in vernacular and the shift in the address’s primary medium from newspaper text to audio and television.

  3. “I” vs. “We”

The ratio of “We” to “I” has also increased over time, as presidents have sought to create a sense of unity in their addresses.
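The three measures above can be computed with the standard library alone. The sketch below is a minimal illustration, not the code behind the charts: the function name `address_stats`, the regex-based tokenization, the naive sentence split, and the sample text are all assumptions made for the example.

```python
import re

def address_stats(text):
    """Return word count, mean words per sentence, and the "we"/"I" ratio."""
    # Treat runs of ASCII letters and straight apostrophes as words (assumption).
    words = re.findall(r"[A-Za-z']+", text)
    # Naive sentence split on ., ! or ?; abbreviations would confuse it.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lowered = [w.lower() for w in words]
    we_count = lowered.count("we")
    i_count = lowered.count("i")
    return {
        "word_count": len(words),
        "words_per_sentence": len(words) / len(sentences) if sentences else 0.0,
        # None when "I" never appears, to avoid dividing by zero.
        "we_to_i_ratio": we_count / i_count if i_count else None,
    }

# Hypothetical sample text, not taken from any address.
sample = "We the people stand together. I thank you. We will endure!"
stats = address_stats(sample)
print(stats)
```

Running the function over each address file and plotting the results by year would give the three trends discussed in this section.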

Preprocessing

Stop Word and Punctuation Removal

Stop words are common words that carry no real meaning, such as “the”, “and”, and “in”. Since these words typically appear at a high frequency within a corpus, they are removed prior to topic modelling. I also removed additional stop words beyond the default list, since many words that are common in presidential addresses carry little value. The additional words removed were:

government, people, nation, states, make, long, come, day, know, way, fellow, americans, citizens, united, america, shall, must, may, upon, every, let, one, would, great

Additionally, all punctuation was removed from the corpus, as it adds nothing to the topics discussed.
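As a rough sketch of this step, the snippet below strips punctuation and filters out stop words. The small `DEFAULT_STOP_WORDS` set is only a stand-in for a library's default English list, `EXTRA_STOP_WORDS` holds a handful of the additions listed above, and the function name and sample sentence are hypothetical.

```python
import string

# A tiny stand-in for a library's default English stop word list (assumption).
DEFAULT_STOP_WORDS = {"the", "and", "in", "of", "a", "to", "is", "our"}
# A few of the corpus-specific additions listed above.
EXTRA_STOP_WORDS = {"government", "people", "nation", "fellow", "citizens", "great"}
STOP_WORDS = DEFAULT_STOP_WORDS | EXTRA_STOP_WORDS

def remove_stop_words_and_punctuation(text):
    # Drop ASCII punctuation; curly quotes would need to be added separately.
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    # Lowercase so that matching against the stop word set is case-insensitive.
    tokens = no_punct.lower().split()
    return [t for t in tokens if t not in STOP_WORDS]

cleaned = remove_stop_words_and_punctuation(
    "Fellow citizens, the liberty of our great nation is secure."
)
print(cleaned)
```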

Lemmatization and Lowercase

Words carry the same meaning regardless of their case, so all words were converted to lowercase. All words were also lemmatized so that only the lemma, which carries the word’s meaning, remained. For instance, “studies” and “studying” have essentially the same meaning and so should both be lemmatized to “study”. Note that when lemmatizing, it is important to lemmatize your stop words as well to ensure they are removed.
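The note about lemmatizing stop words is easy to miss, so here is a small illustration. The `TOY_LEMMAS` lookup table is a hypothetical stand-in for a real lemmatizer, and the sentence is invented; the point is only that a stop word such as “states” no longer matches once the text has been reduced to “state”.

```python
# Toy lemma lookup standing in for a real lemmatizer (assumption for illustration).
TOY_LEMMAS = {
    "studies": "study",
    "studying": "study",
    "states": "state",
    "citizens": "citizen",
}

def lemmatize(token):
    """Lowercase the token and map it to its lemma if the toy table knows it."""
    token = token.lower()
    return TOY_LEMMAS.get(token, token)

raw_stop_words = {"states", "citizens"}
tokens = [lemmatize(t) for t in "The States studies what citizens are studying".split()]

# Filtering with the raw stop words misses the lemmatized forms "state" and "citizen".
kept_with_raw = [t for t in tokens if t not in raw_stop_words]

# Lemmatizing the stop words first lets them match the lemmatized tokens.
lemmatized_stop_words = {lemmatize(w) for w in raw_stop_words}
kept_with_lemmas = [t for t in tokens if t not in lemmatized_stop_words]

print(kept_with_raw)
print(kept_with_lemmas)
```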

Part of Speech Removal

I also removed all words that were not nouns, adjectives, verbs, or adverbs. Words that fall outside these parts of speech carry limited semantic value in topic modelling.

Topic Modelling

After pre-processing, I performed topic modelling with both LDA (Latent Dirichlet Allocation) and NMF (Non-negative Matrix Factorization). Both performed similarly well, and I ultimately chose to use the NMF approach throughout.

2 Component NMF

I started with only a two-component NMF and assigned each address to its majority topic. The top words show two distinctly different topics: the first presents the United States as insular and focused on upholding the values of the constitution, while the second focuses on a global world and freedom.
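To make the idea concrete, here is a minimal from-scratch sketch of NMF using multiplicative updates, followed by the majority-topic assignment. It is not the code used for this project; in practice a library implementation such as scikit-learn's `NMF` would be the usual choice. The toy count matrix, the hypothetical term labels in the comments, and all function names are assumptions made for illustration.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def frobenius_error(v, w, h):
    """Squared Frobenius distance between V and the product W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, n_components, n_iter=200, seed=0, eps=1e-9):
    """Factor V (documents x terms) into non-negative W (documents x topics)
    and H (topics x terms) with multiplicative updates."""
    rng = random.Random(seed)
    n_docs, n_terms = len(v), len(v[0])
    # Strictly positive random starting point.
    w = [[rng.random() + 0.1 for _ in range(n_components)] for _ in range(n_docs)]
    h = [[rng.random() + 0.1 for _ in range(n_terms)] for _ in range(n_components)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_terms)]
             for k in range(n_components)]
        # W <- W * (V H^T) / (W H H^T), element-wise.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(n_components)]
             for i in range(n_docs)]
    return w, h

# Toy matrix: 4 documents x 4 terms. The columns could stand for
# "constitution", "union", "world", "freedom" (hypothetical labels).
v = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 2, 3],
    [0, 0, 3, 2],
]
w, h = nmf(v, n_components=2)

# Assign each document to the topic with the largest weight in its row of W.
majority_topic = [max(range(len(row)), key=lambda k: row[k]) for row in w]
print(majority_topic)
```

Each row of `h` plays the role of a topic: sorting its entries in descending order gives that topic's top words.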

Top words for a two component topic model

These topics separated very neatly when placed on a timeline. Prior to President McKinley, inaugural addresses fell in the first topic (“Insular/Constitution upholding”), and from President McKinley onward all presidents (excluding Taft) fell in the second topic (“Global/Freedom”).

Presidents who are enclosed within the red box fell within the “Global / Freedom” topic and those that did not fell within the “Insular / Constitution” topic.

It is not surprising that there is a marked shift starting at McKinley. During McKinley’s presidency the United States abruptly became a global power. As the U.S. became more involved in foreign affairs, both militarily and through its imperial ambitions, it makes sense that “freedom” and “peace” became key points of discussion. Some events that illustrate this:

Early 1890s: Little interest in foreign affairs

1898 (McKinley): The United States wins the Spanish-American War and annexes Puerto Rico, Guam, and the Philippines. The U.S. also separately annexes Hawaii.

1899: Out of fear of being shut out of trade with China, the U.S. issues the Open Door notes calling for equal trading access.

1904 (Roosevelt): The Roosevelt Corollary to the Monroe Doctrine declares that the U.S. will exercise “international police power” in the Western Hemisphere.

A caricature of Teddy Roosevelt and his Corollary to the Monroe Doctrine

20 Component NMF

Next, I wanted to see what more granular topics existed and how they changed through time. To do this, I used a 20-component NMF. With fewer components the topics were less interesting, but using 20 components produced several overlapping topics. In these instances, I summed the topic vectors, which left a total of 14 topics. One of these was an “other” bucket for topics that could not be discerned. The charts for each topic show one column per president, with the y-axis representing the value of the topic.
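Merging overlapping topics by summation can be sketched as follows. This assumes the sum was taken over each address's weights for the overlapping topics; the weights, the grouping, and the name `merge_topics` are hypothetical.

```python
def merge_topics(doc_topic, groups):
    """Sum the columns of doc_topic listed in each group into one merged column."""
    return [[sum(row[i] for i in group) for group in groups] for row in doc_topic]

# Hypothetical weights: 3 addresses x 4 raw topics.
doc_topic = [
    [0.5, 0.25, 0.0, 0.25],
    [0.0, 0.5, 0.5, 0.0],
    [0.25, 0.0, 0.25, 0.5],
]
# Assume raw topics 0 and 3 were judged to overlap; topics 1 and 2 stay separate.
groups = [[0, 3], [1], [2]]
merged = merge_topics(doc_topic, groups)
print(merged)
```

Applied to 20 raw topics, a grouping with 14 entries would yield the 14 merged topics described above.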

Topic 1: Power Granted

Key Words: power, executive, grant, control, act, state, sovereignty, possess, grant power, judiciary

This topic is mostly about the power vested in the branches of government and the president. In general, discussion of this topic has declined over time.

Topic 2: Business

Key Words: revenue, business, well, party, system, country

This topic revolves around business, and it is unsurprising that McKinley scored highly here given that he was a proponent of big business through protectionism and high tariffs.

Topic 3: Avoidance of Political Agitation

Key Words: political, agitation, institution, interest, never, subject

In his inaugural address, Martin Van Buren applauded the American people for the nation’s success and for overcoming a multitude of challenges and dangers while avoiding crippling political agitation. He explained that, through strict adherence to the constitution, they had avoided the “rapid failure” that other countries had predicted for them.

Topic 4: American Spirit

Key Words: spirit, liberty, party, whole, power, character, interest, free

Words related to the founding principles have declined over time.

Topic 5: Upholding Amendments

Key Words: law, man, enforce, pass, amendment, support

Increased discussion of the amendments occurred around the Civil War and World War I.

Topic 6: Protecting National Interests

Key Words: right, revenue, free, interest, respect, protection, duty, American, home, foreign

In his inaugural address, John Tyler stressed how he would scale back the “monarchial” Jacksonian democracy. He compared Jackson’s presidency to the rule of Cromwell or Caesar and feared that its continuation would lead to violence and the decline of the nation.

Topic 7: Public Service

Key Words: public, duty, good, service, office, economy

There has been a decline in presidents discussing their public service.

Topic 8: Preserve Values

Key Words: spirit, honor, interest, preserve, love, wish

There has been a decline in presidents discussing the preservation of values.

Topic 9: Uphold Constitution

Key Words: time, hope, believe, constitutional, good, right

At first, this topic seems quite nebulous; however, after reading Franklin Pierce’s inaugural address, it becomes clear that it is all about upholding the constitution. In his address, he stresses the importance of states’ rights and the balance of the federal government. This address proved to be a harbinger of the Civil War, which would begin less than 10 years after he gave it.

Topic 10: War

Key Words: war, force, invasion, power, time, naval

Discussion of war in its “general” sense is high around the War of 1812 and the Civil War.

Topic 11: War/Peace in an External Context

Key Words: peace, policy, war, world, foreign, treaty

This topic discusses war, but in a more global context. Thus, we see the Civil War era with lower values and the World War I era with higher values.

Topic 12: Political Peace

Key Words: party, political, peace, time, war, government

John Quincy Adams was elected by the House of Representatives because no candidate won a majority of the Electoral College vote. After the election, there was fear of potential violence, and in his address he tried to heal electoral divides.

Topic 13: World Freedom / Peace

Key Words: world, new, freedom, American, work, time

After McKinley, we see a large shift to discussing the world and freedom.

Share of Topics through Time

Finally, I wanted to compare these topics over time as a share of each speech. Most topics have seen limited change or a decline, while the “World Freedom / Peace” topic has rapidly increased over time.
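One simple way to obtain such shares, assuming they come from normalizing each address's topic weights so that they sum to 1, is sketched below. The address labels, the weights, and the name `topic_shares` are hypothetical.

```python
def topic_shares(weights):
    """Scale one address's topic weights so they sum to 1."""
    total = sum(weights)
    if total == 0:
        # An address with no topic weight at all gets zero shares.
        return [0.0 for _ in weights]
    return [w / total for w in weights]

# Hypothetical merged topic weights for two addresses.
merged_weights = {
    "Address A": [3.0, 1.0, 0.0],
    "Address B": [1.0, 1.0, 2.0],
}
shares = {name: topic_shares(w) for name, w in merged_weights.items()}
print(shares)
```

Plotting these shares in chronological order, for example as a stacked area chart, shows how the mix of topics shifts over time.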

This project was completed as part of the 3-month Metis data science bootcamp program.

For more information on this project, including code and presentation slides, please check out my GitHub repository here.

[1] https://monkeylearn.com/blog/introduction-to-topic-modeling/
