Easily identifying themes in text
SimpleTopicModel is a Python package that wraps common theme identification (topic modeling) techniques. It is currently under development and subject to change.
For now, you can git clone this repo and import it locally. Be sure to run pip install -r requirements.txt in the repo folder to install the dependencies.
I’m working on setting up a PyPI release, slated for the near future.
Using it couldn’t be easier. Most topic modeling techniques follow the same three-step paradigm:
Embed the text as vectors. The Sentence-Transformers package does this for us, using Microsoft’s MiniLM model.
Reduce the dimensionality of those vectors. We use UMAP, but you could substitute t-SNE or PCA if you wanted to.
Cluster the reduced vectors. We use HDBSCAN to build hierarchical clusters (which we’d like to traverse in a later release), but you could also use k-means, a GMM, etc.
This builds on previous work including Gensim (LDA), BERTopic, Top2Vec, and pyLDAvis. They’re all excellent, more mature alternatives to SimpleTopicModel, and I’d encourage you to go check them out!
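To make the embed, reduce, cluster paradigm described above concrete, here is a minimal sketch of it. It is not SimpleTopicModel’s actual API: the function name find_themes and its parameters are hypothetical. So that it runs with scikit-learn alone, it swaps in stand-ins for each step: TF-IDF instead of Sentence-Transformers, TruncatedSVD instead of UMAP, and k-means instead of HDBSCAN.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer


def find_themes(docs, n_components=2, n_clusters=2, seed=0):
    """Hypothetical helper: embed -> reduce -> cluster.

    Returns one integer cluster label per input document.
    """
    # Step 1: embed the text as vectors.
    # Stand-in: TF-IDF (the package itself uses Sentence-Transformers).
    vectors = TfidfVectorizer(stop_words="english").fit_transform(docs)

    # Step 2: reduce dimensionality.
    # Stand-in: TruncatedSVD (the package itself uses UMAP).
    reducer = TruncatedSVD(n_components=n_components, random_state=seed)
    reduced = reducer.fit_transform(vectors)

    # Step 3: cluster the reduced vectors.
    # Stand-in: k-means (the package itself uses HDBSCAN).
    clusterer = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return clusterer.fit_predict(reduced).tolist()


docs = [
    "My cat chases the dog around the garden",
    "The dog and the cat sleep in the garden",
    "A cat is a quiet pet and a dog is a loud pet",
    "The stock market fell as bank shares dropped",
    "Bank shares rallied and the stock market rose",
    "Investors watch the stock market and bank earnings",
]
labels = find_themes(docs)
print(labels)  # one cluster id per document
```

Each step only consumes the previous step’s output, so any one of the stand-ins can be replaced independently. Note that k-means needs the number of clusters up front, whereas HDBSCAN infers it from the data.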