Choosistant is an ML-powered system that helps you decide which product to buy by summarizing the pros and cons from written product reviews. The system leverages an NLP model to predict spans of text describing product features that reviewers consider good or bad.
We have all been there. Your old weighing scale stopped working and you need to replace it in a hurry. A quick search on amazon.com and you find a weighing scale with the Amazon’s Choice tag:
The product looks slick, is highly rated, and is reasonably priced. But before clicking the screaming Buy Now button, you are curious about what other people are saying about this product. Most of the reviews are positive and say good things about the scale. But then you stumble upon a seemingly positive review:
Wait, what?! So this weighing scale cannot even provide its basic functionality correctly? You find other consumers reporting similar inaccuracies:
It seems that this scale is neither consistent nor accurate with its measurements. You also find worrying reviews about the scale’s robustness.
Now what? You look for alternative products and read their reviews too. Soon you are spending a huge amount of time wading through reviews, wishing there were a way to see a summary of them all in one place. Enter choosistant!
The problem is as follows:
There are several ways to frame this problem as a machine learning task. One approach is to perform traditional sentiment analysis and capture the words that receive high attention. However, it is not clear how to aggregate these results into coherent pros and cons. Another approach is to formulate the problem as a summarization task. The main issue here is that summarization models tend to hallucinate, i.e. include phrases and words that are not part of the original text. We found that the best method is to cast the problem as an Extractive Question Answering (QA) task: because the model can only extract spans that already appear in the review, this alleviates the hallucination issue of text summarizers.
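To make the framing concrete, here is a minimal sketch of running a single review through an off-the-shelf extractive QA pipeline from Hugging Face. The model name and the two questions are illustrative choices, not necessarily the exact ones we ended up with:

```python
# Sketch: pros and cons as two questions asked against the same review text.
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

review = (
    "The scale looks great and syncs with my phone, "
    "but the weight readings drift by several pounds between uses."
)

for question in [
    "What does the reviewer like about the product?",
    "What does the reviewer dislike about the product?",
]:
    answer = qa(question=question, context=review)
    # The answer is always a span copied verbatim from the review,
    # which is what makes the extractive framing resistant to hallucination.
    print(f"{question} -> {answer['answer']} (score={answer['score']:.2f})")
```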
If you want to learn more about how Extractive QA works, please read the excellent articles written by the folks at Hugging Face and deepset.
Although the pre-trained QA model from deepset seemed promising, its predictions were qualitatively not great. We needed a way to fine-tune the pre-trained model on our task, and for this we had to annotate a training set. Early on, we had identified inconsistent annotations, and the bad models they lead to, as one of the biggest risks for the project, so we first wrote an annotation guideline to keep the annotations consistent. We then sampled a small set (~200 reviews) from the Amazon Review data set and began the annotation process using Label Studio.
We wanted to get an MVP up and running quickly so we could iterate on it. Since our sampled data was not very large, we created a GitHub repo to store Label Studio’s artifacts, including its SQLite database, the reviews as JSON files, and the exported annotations in JSON.
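For illustration, here is a rough sketch of turning a Label Studio span-labeling export into SQuAD-style QA examples. The field names follow Label Studio’s standard JSON export for labeled spans; the data key, label names, and question wording are assumptions, so adjust them to your own labeling config:

```python
import json

# Hypothetical mapping from annotation labels to QA questions.
QUESTIONS = {
    "pro": "What does the reviewer like about the product?",
    "con": "What does the reviewer dislike about the product?",
}


def to_squad_examples(export_path):
    """Convert a Label Studio JSON export into SQuAD-style training examples."""
    with open(export_path) as f:
        tasks = json.load(f)

    examples = []
    for task in tasks:
        review_text = task["data"]["review_text"]  # hypothetical data key
        for annotation in task.get("annotations", []):
            for span in annotation["result"]:
                value = span["value"]
                label = value["labels"][0].lower()  # e.g. "pro" or "con"
                examples.append({
                    "context": review_text,
                    "question": QUESTIONS[label],
                    "answers": {
                        "text": [value["text"]],
                        "answer_start": [value["start"]],
                    },
                })
    return examples
```

Examples in this shape can then be tokenized the same way as SQuAD data before fine-tuning.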
The annotated data were used to fine-tune both the QA model and a seq2seq model. We created a Weights & Biases account and incorporated it into the trainer; PyTorch Lightning made this easy. However, we did not fully utilize W&B because we did not run many experiments with our models. Once a model was fine-tuned, we stored the model artifacts on Hugging Face Models, essentially using the HF service as a light-weight model registry.
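A condensed sketch of what this training plumbing can look like is below. The class name, hyperparameters, and Hub repo id are placeholders rather than our actual setup:

```python
import pytorch_lightning as pl
import torch
from pytorch_lightning.loggers import WandbLogger
from transformers import AutoModelForQuestionAnswering, AutoTokenizer


class QAFineTuner(pl.LightningModule):
    """Minimal LightningModule wrapping an extractive QA model."""

    def __init__(self, model_name="deepset/roberta-base-squad2", lr=3e-5):
        super().__init__()
        self.model = AutoModelForQuestionAnswering.from_pretrained(model_name)
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.lr = lr

    def training_step(self, batch, batch_idx):
        # batch holds tokenized question/context pairs plus answer span positions,
        # so the model returns a loss that we log to W&B and backpropagate.
        outputs = self.model(**batch)
        self.log("train_loss", outputs.loss)
        return outputs.loss

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=self.lr)


wandb_logger = WandbLogger(project="choosistant")      # experiment tracking
trainer = pl.Trainer(max_epochs=3, logger=wandb_logger)
module = QAFineTuner()
# trainer.fit(module, train_dataloaders=...)           # dataloader built from the annotated examples

# Hugging Face Hub as a light-weight model registry:
# module.model.push_to_hub("choosistant/qa-pros-cons")      # hypothetical repo id
# module.tokenizer.push_to_hub("choosistant/qa-pros-cons")
```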
Our models were downloaded by the inference server and used to make predictions. We used FastAPI to expose a /predict REST endpoint. The app assigns each request a unique ID and logs information about the request, the output of the model, the inference time, and the device (CPU or GPU). The prediction ID is sent as part of the HTTP response so that any logging that happens on the client side can be linked to the predictions made by the inference service.
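Below is a minimal sketch of such an inference service; the request schema, questions, and response fields are illustrative rather than the actual choosistant API:

```python
import logging
import time
import uuid

import torch
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("choosistant.inference")

app = FastAPI()
device = 0 if torch.cuda.is_available() else -1
qa = pipeline("question-answering", model="deepset/roberta-base-squad2", device=device)


class PredictRequest(BaseModel):
    review_text: str


@app.post("/predict")
def predict(request: PredictRequest):
    # Unique ID so client-side logging can be linked back to this prediction.
    prediction_id = str(uuid.uuid4())

    start = time.perf_counter()
    pros = qa(question="What does the reviewer like about the product?",
              context=request.review_text)
    cons = qa(question="What does the reviewer dislike about the product?",
              context=request.review_text)
    elapsed = time.perf_counter() - start

    logger.info(
        "prediction_id=%s device=%s inference_time=%.3fs",
        prediction_id, "gpu" if device >= 0 else "cpu", elapsed,
    )
    return {
        "prediction_id": prediction_id,
        "pros": pros["answer"],
        "cons": cons["answer"],
    }
```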
To interact with our models, we built a simple user interface using Gradio. We were able to quickly put together a skeleton UI that calls the inference service and presents the results to the user. The user can also flag the output of the model.
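A skeleton along these lines might look as follows, assuming the /predict endpoint sketched above and a placeholder service URL; Gradio’s built-in flagging stores flagged examples locally by default:

```python
import gradio as gr
import requests

INFERENCE_URL = "http://localhost:8000/predict"   # placeholder service URL


def summarize_review(review_text):
    # Forward the review to the inference service and unpack the spans.
    response = requests.post(INFERENCE_URL, json={"review_text": review_text})
    payload = response.json()
    return payload["pros"], payload["cons"]


demo = gr.Interface(
    fn=summarize_review,
    inputs=gr.Textbox(lines=8, label="Product review"),
    outputs=[gr.Textbox(label="Pros"), gr.Textbox(label="Cons")],
    allow_flagging="manual",   # lets the user flag questionable model output
)

if __name__ == "__main__":
    demo.launch()
```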