Latent Dirichlet allocation (LDA) is one of the most popular methods for topic modeling. It discovers topics that are hidden (latent) in a set of text documents, and it can be trained via collapsed Gibbs sampling. One way to test how well the learned topic distributions fit the data is to compare the distribution learned on a training set with the distribution of a holdout set; perplexity summarizes this fit, and lower values indicate a better fit. Topic coherence measures score a single topic by measuring the degree of semantic similarity between its high-scoring words. As a rule of thumb, a good LDA model has low perplexity and high coherence. More broadly, natural language processing (NLP) is a field of computer science, artificial intelligence, and computational linguistics concerned with the interactions between computers and human (natural) languages and, in particular, with programming computers to fruitfully process large natural language corpora.
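To make the two evaluation ideas concrete, here is a minimal sketch in plain Python. It assumes the model has already been trained and that the document-topic and topic-word probabilities are available as simple lists and dictionaries; inferring the topic mixture for unseen holdout documents is left out. The function names, the `floor` guard, and the toy data are all hypothetical, and UMass is only one of several coherence measures, chosen here because it needs nothing beyond document co-occurrence counts.

```python
import math


def perplexity(docs, doc_topic, topic_word, floor=1e-12):
    """Per-token perplexity: exp(-(1/N) * sum of log p(w | d)).

    docs       : list of token lists (for example, the holdout set)
    doc_topic  : doc_topic[d][k] = p(topic k | document d)
    topic_word : topic_word[k] is a dict mapping word -> p(word | topic k)
    floor      : hypothetical guard so an unseen word does not give log(0)
    """
    log_lik, n_tokens = 0.0, 0
    for d, doc in enumerate(docs):
        for w in doc:
            # p(w | d) is a mixture over topics.
            p = sum(theta * topic_word[k].get(w, 0.0)
                    for k, theta in enumerate(doc_topic[d]))
            log_lik += math.log(max(p, floor))
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)


def umass_coherence(top_words, docs):
    """UMass coherence of one topic from document co-occurrence counts.

    top_words must be ordered from highest to lowest probability, and each
    word must occur in at least one document.
    """
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(all(w in s for w in words) for s in doc_sets)

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            co = doc_freq(top_words[m], top_words[l])
            score += math.log((co + 1) / doc_freq(top_words[l]))
    return score


# Toy data, made up for illustration only.
corpus = [["cat", "dog"], ["cat", "dog", "fish"],
          ["stock", "market"], ["stock", "market", "bank"]]
uniform_topic = [{"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}]

holdout_perplexity = perplexity([["a", "b", "c", "d"]], [[1.0]], uniform_topic)
coherent = umass_coherence(["cat", "dog"], corpus)
mixed = umass_coherence(["cat", "stock"], corpus)
print(holdout_perplexity, coherent, mixed)
```

In this toy setup, a single topic that spreads its mass uniformly over four words yields a perplexity of 4, which matches the reading of perplexity as an effective number of equally likely words. The word pair that co-occurs in documents ("cat", "dog") scores higher than the pair that never co-occurs ("cat", "stock"), which is the behavior the rule of thumb relies on.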