A New Nation — Topic Modelling Presidential Inaugural Addresses

McKinley and Roosevelt brought about a dramatic shift in U.S. foreign affairs (one fraught with imperialism), and that shift shows up in topic models of inaugural addresses


Presidential inaugural addresses have a reputation for being devoid of real substance; instead, they formulaically seek to create a sense of bipartisanship by affirming the nation’s values. This stands in stark contrast to the bold claims and aspirations voiced in presidential debates. Using Natural Language Processing, however, I was able to cut through the characterless speeches and draw out some interesting insights with topic modelling. “Topic modeling is an unsupervised machine learning technique that’s capable of scanning a set of documents, detecting word and phrase patterns within them, and automatically clustering word groups and similar expressions that best characterize a set of documents¹.” A closer look at these topics gives a glimpse of both domestic and international affairs.

Data Collection

I downloaded the text file for every historical address (excluding President Biden’s) from the Center for Open Science. I collected President Biden’s address separately and saved its contents to a text file.

Exploratory Data Analysis

Before diving into topic modelling, I wanted to understand broader trends such as the word count of each address, the average number of words per sentence, and the use of “I” versus “We”.
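
As a rough illustration of these measures, the sketch below computes them for a single piece of text. The function name `address_stats`, the sentence-splitting heuristic, and the sample sentence are assumptions made for this example, not the code or data behind the charts.

```python
import re


def address_stats(text: str) -> dict:
    """Return simple descriptive statistics for one address."""
    # Rough heuristic: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # A word is a run of letters, optionally with one internal apostrophe.
    words = re.findall(r"[A-Za-z]+(?:['’][A-Za-z]+)?", text)
    lowered = [w.lower() for w in words]
    return {
        "word_count": len(words),
        "words_per_sentence": len(words) / len(sentences) if sentences else 0.0,
        # Contractions such as "I'm" or "we'll" are not counted here.
        "i_count": lowered.count("i"),
        "we_count": lowered.count("we"),
    }


# Made-up sample text, used only to exercise the function.
sample = "We are one nation. I thank you all! We will endure."
stats = address_stats(sample)
print(stats)
```

Running the same function over every address and plotting the results by year would give the kind of trend lines described above.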


Stop Word and Punctuation Removal
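
A minimal sketch of this step, using only the standard library, might look like the following. The stop-word list here is a tiny hand-picked set chosen for illustration (an assumption; a real run would more likely use a fuller list from an NLP library), and `preprocess` is a hypothetical helper name.

```python
import string

# Tiny illustrative stop-word list (an assumption, not the list used in the analysis).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "our", "the", "to", "we"}

# Translation table that deletes ASCII punctuation and curly quotes.
# Note: this also strips apostrophes, so "nation's" becomes "nations".
_PUNCT_TABLE = str.maketrans("", "", string.punctuation + "“”‘’")


def preprocess(text: str) -> list[str]:
    """Strip punctuation, lowercase, split on whitespace, and drop stop words."""
    tokens = text.translate(_PUNCT_TABLE).lower().split()
    return [t for t in tokens if t not in STOP_WORDS]


tokens = preprocess("We are, in the end, a nation of laws and of liberty.")
print(tokens)
```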

Topic Modelling

After pre-processing, I performed topic modelling with both LDA (Latent Dirichlet Allocation) and NMF (Non-negative Matrix Factorization). Both performed similarly well, and I ultimately chose NMF for the rest of the analysis.

2 Component NMF

I started with a two-component NMF and assigned each address to its majority topic, that is, the topic with the larger weight for that address. The top words for each component point to two distinctly different topics. The first shows the United States as insular in nature and focused on upholding the values of the Constitution, while the second shows a focus on a global world and freedom.
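
The assignment step can be sketched as follows, assuming the model has already produced a document-topic weight matrix and a topic-word weight matrix. All numbers, the placeholder address names, and the four-word vocabulary are invented for illustration; only the two topic labels come from the analysis.

```python
# Toy document-topic weights: one row per address, one column per topic.
# Values and address names are placeholders, not real model output.
W = {
    "Address A": [0.82, 0.05],
    "Address B": [0.10, 0.64],
    "Address C": [0.31, 0.30],
}

# Toy topic-word weights: one row per topic, one column per vocabulary word.
vocab = ["constitution", "government", "freedom", "world"]
H = [
    [0.9, 0.7, 0.1, 0.0],
    [0.0, 0.2, 0.8, 0.6],
]
topic_labels = ["Insular / Constitution", "Global / Freedom"]


def majority_topic(weights):
    """Index of the topic with the largest weight for one address."""
    return max(range(len(weights)), key=weights.__getitem__)


def top_words(topic_row, vocab, n=2):
    """The n vocabulary words with the largest weight in one topic."""
    ranked = sorted(range(len(vocab)), key=lambda j: topic_row[j], reverse=True)
    return [vocab[j] for j in ranked[:n]]


assignments = {name: topic_labels[majority_topic(w)] for name, w in W.items()}
words_per_topic = [top_words(row, vocab) for row in H]

print(assignments)
print(words_per_topic)
```

Note that an address like the third one, whose weights are nearly tied, still gets a single label, so a hard assignment can hide how mixed a speech really is.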

Top words for a two component topic model
Presidents who are enclosed within the red box fell within the “Global / Freedom” topic and those that did not fell within the “Insular / Constitution” topic.
A caricature of Teddy Roosevelt and his Corollary to the Monroe Doctrine

20 Component NMF

Next, I wanted to take a more discerning approach and see which more granular topics existed and how they changed over time. To do this, I used a 20-component NMF. With fewer components the topics were less interesting, but 20 components produced several overlapping topics. Where topics overlapped, I summed their topic vectors, which left a total of 14 topics. One of these is an “other” bucket for components whose topic could not be discerned. The charts that follow show each president as a column, with the y-axis representing the value of the topic.
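
One way to sketch the merging step is shown below, assuming that “summing the topic vectors” means adding up the per-address weights of the components judged to overlap. The five-component toy matrix, the group labels, the index mapping, and the final normalization into shares are all assumptions made for illustration.

```python
# Toy document-topic weights from a hypothetical 5-component model
# (rows: addresses, columns: components). Values are made up.
W = [
    [0.2, 0.1, 0.0, 0.4, 0.3],
    [0.0, 0.5, 0.1, 0.1, 0.3],
]

# Hand-made mapping from a merged topic label to the component indices
# judged to overlap. Labels and groupings here are hypothetical.
merged_topics = {
    "foreign affairs": [0, 3],
    "economy": [1],
    "other": [2, 4],
}


def merge_components(W, groups):
    """Sum the weights of each group of components, per address."""
    return {
        label: [sum(row[i] for i in idxs) for row in W]
        for label, idxs in groups.items()
    }


def to_shares(merged):
    """Rescale merged weights so each address's topics sum to 1."""
    n_docs = len(next(iter(merged.values())))
    totals = [sum(vals[d] for vals in merged.values()) for d in range(n_docs)]
    return {
        label: [v / t if t else 0.0 for v, t in zip(vals, totals)]
        for label, vals in merged.items()
    }


merged = merge_components(W, merged_topics)
shares = to_shares(merged)
print(merged)
print(shares)
```

With a NumPy array the same merge is a column slice followed by a sum over the selected columns; plain lists are used here only to keep the example dependency-free.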

Share of Topics through Time

