In this series of liveProjects, you’ll explore different techniques for topic modeling. Topic modeling is an unsupervised machine learning technique that finds topics in text without any manual labelling. It’s a quick way to derive insights from text data and share them with key stakeholders. You’ll work with a variety of text corpora to go hands-on with NMF algorithms from scikit-learn, LDA algorithms from Gensim, and even new neural network techniques using the OCTIS (Optimizing and Comparing Topic Models is Simple!) library.
In this liveProject, you’ll use scikit-learn’s non-negative matrix factorization (NMF) algorithm to perform topic modeling on a dataset of Twitter posts. You’ll step into the role of a data scientist tasked with summarizing Twitter discussions for the customer support team of an airline company, and you’ll use this algorithm to rapidly make sense of a large and complex text corpus. You’ll build a text preprocessing pipeline from scratch, visualize topic models, and finally compile a report of support topics for the customer support team.
In this liveProject, you’ll use the latent Dirichlet allocation (LDA) algorithm from the Gensim library to model topics from a magazine’s article back catalog. Thanks to your work on topic modeling, the new Policy and Ethics editor will be better equipped to strategically commission new articles for under-represented topics. You’ll build your text preprocessing pipeline, use topic coherence to find the number of topics, and visualize and curate the algorithm’s output so your stakeholders can read it easily.
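To give a feel for how topic coherence can guide the choice of topic count, the sketch below computes a simplified UMass-style coherence score in plain Python. It is an illustrative stand-in, not Gensim's own coherence implementation; the function name `umass_coherence` and the toy documents are assumptions. In practice one might train models with several topic counts and keep the count whose topics score best on average.

```python
import math
from itertools import combinations


def umass_coherence(topic_words, documents):
    """Simplified UMass-style coherence for a single topic.

    topic_words: the topic's top words, ordered from most to least probable.
    documents: tokenized documents (lists of words).
    Sums log((D(w_i, w_j) + 1) / D(w_j)) over word pairs where w_j ranks
    above w_i, with D counting the documents that contain the given words.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        return sum(all(word in doc for word in words) for doc in doc_sets)

    score = 0.0
    for j, i in combinations(range(len(topic_words)), 2):  # j < i
        w_j, w_i = topic_words[j], topic_words[i]
        d_j = doc_freq(w_j)
        if d_j == 0:
            continue  # skip words that never occur, to avoid dividing by zero
        score += math.log((doc_freq(w_i, w_j) + 1) / d_j)
    return score


# Hypothetical tokenized articles, for illustration only.
docs = [
    ["privacy", "data", "law"],
    ["privacy", "data"],
    ["robot", "ethics"],
    ["robot", "law"],
]

coherent = umass_coherence(["privacy", "data"], docs)  # log(3/2): words co-occur
mixed = umass_coherence(["privacy", "robot"], docs)    # log(1/2): words never co-occur
print(coherent > mixed)
```

Scores closer to zero or above indicate top words that tend to appear in the same documents, which is the property a coherent topic is expected to have.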
In this liveProject, you’ll use the neural network-inspired Contextual Topic Model to identify and visualize the topics across all of the articles in a scientific magazine’s back catalog. This cutting-edge technique is made easy by the OCTIS (Optimizing and Comparing Topic Models is Simple!) library. Once you’ve established your text-processing pipeline, you’ll use coherence and diversity metrics to evaluate the output of your topic models, tune your neural network’s hyperparameters to improve results, and visualize your results for printing on posters and other media.
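A diversity metric of the kind mentioned above can be sketched in a few lines. The version below measures the share of unique words among the top words of all topics; it is a plain-Python illustration under that assumed definition, not the OCTIS implementation, and the name `topic_diversity` and the sample topics are hypothetical.

```python
def topic_diversity(topics, top_k=10):
    """Fraction of unique words among the top_k words of every topic.

    topics: list of topics, each a list of words ordered by importance.
    Returns 1.0 when no word is shared between topics, and lower values
    as topics increasingly repeat the same words.
    """
    top_words = [word for topic in topics for word in topic[:top_k]]
    if not top_words:
        return 0.0
    return len(set(top_words)) / len(top_words)


# Hypothetical topic outputs, for illustration only.
distinct_topics = [["gene", "cell"], ["planet", "orbit"]]
redundant_topics = [["gene", "cell"], ["gene", "cell"]]

print(topic_diversity(distinct_topics, top_k=2))   # 1.0
print(topic_diversity(redundant_topics, top_k=2))  # 0.5
```

Reading diversity alongside coherence helps catch models whose topics are individually sensible but largely duplicate one another.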
This liveProject series is for data scientists and developers who are confident programming with Python and the Python data ecosystem. To begin this liveProject you will need to be familiar with the following: