Sentiment analysis using MindsDB and Reddit

Sentiment analysis is the process of identifying and extracting the emotional tone of a piece of text, such as a tweet, review, or news article. It is widely used in fields including marketing, customer service, and political analysis to understand people's opinions and attitudes toward a product, service, or topic.

In this article, we will discuss how to perform sentiment analysis on Reddit data using MindsDB's Reddit handler. Reddit is a social news and discussion website with over 55 million daily active users, making it an excellent source of text data for sentiment analysis. You can sign up for MindsDB at cloud.mindsdb.com and for Hashnode at hashnode.com.

Using the Reddit application handler

The Reddit handler is initialized with the following parameters:

client_id: a required Reddit API client ID
client_secret: a required Reddit API client secret
user_agent: a required user agent string that identifies your application

These credentials come from registering an application with the Reddit API. With them in hand, you can create a new database that lets you read from the Reddit API as follows:

CREATE DATABASE my_reddit
WITH
    ENGINE = 'reddit',
    PARAMETERS = {
        "client_id": "YOUR_CLIENT_ID",
        "client_secret": "YOUR_CLIENT_SECRET",
        "user_agent": "YOUR_USER_AGENT"
    };
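If you would rather not paste credentials into the SQL editor by hand, the statement text can be generated from environment variables. The following is a minimal Python sketch and not part of MindsDB: the function name build_create_database and the REDDIT_* environment variable names are assumptions made for illustration, and the script only builds the same CREATE DATABASE text shown above.

```python
import json
import os


def build_create_database(name: str, params: dict) -> str:
    """Return a CREATE DATABASE statement for the Reddit engine.

    json.dumps produces the double-quoted key/value object used in PARAMETERS.
    """
    missing = [key for key, value in params.items() if not value]
    if missing:
        raise ValueError(f"missing Reddit credentials: {', '.join(missing)}")
    return (
        f"CREATE DATABASE {name}\n"
        "WITH\n"
        "    ENGINE = 'reddit',\n"
        f"    PARAMETERS = {json.dumps(params, indent=4)};"
    )


# Hypothetical variable names; the fallbacks are the article's placeholders.
params = {
    "client_id": os.environ.get("REDDIT_CLIENT_ID") or "YOUR_CLIENT_ID",
    "client_secret": os.environ.get("REDDIT_CLIENT_SECRET") or "YOUR_CLIENT_SECRET",
    "user_agent": os.environ.get("REDDIT_USER_AGENT") or "YOUR_USER_AGENT",
}
statement = build_create_database("my_reddit", params)
print(statement)
```

The printed statement can then be pasted into the MindsDB SQL editor in place of the hand-written version.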

Once you've created the database, two tables are available automatically: comment, which holds the comments in a subreddit, and submission, which holds its posts (submissions).

To get the posts in a subreddit, you can run this SQL query:

SELECT *
FROM my_reddit.submission
WHERE subreddit = 'MachineLearning' AND sort_type = 'top' AND items = 5;

items: the number of items to retrieve, similar to a LIMIT clause.

sort_type: how the posts in the subreddit are sorted, for example top, controversial, or new.

Now, let's create a model to identify the sentiment of every reply to a post.

In practical terms, executing the CREATE MODEL statement below prompts MindsDB to create an AI table named sentiment_classifier_model, which uses the OpenAI integration to predict a column called sentiment. The model is stored in the default MindsDB project.

We will run sentiment analysis on the Season 4 general discussion thread in The Marvelous Mrs. Maisel subreddit to see how viewers reviewed the fourth season of the show.

https://www.reddit.com/r/TheMarvelousMrsMaisel/comments/sv7go7/season_4_general_discussion_and_episode_thread_hub/

Take note of the submission id, which is `sv7go7` as seen in the link above.
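When working with several threads, the submission ID can be pulled out of the URL programmatically instead of by eye. This is a small Python sketch, assuming the thread URL follows the /r/<subreddit>/comments/<id>/<slug>/ layout seen in the link above; the helper name extract_submission_id is made up for illustration.

```python
from urllib.parse import urlparse


def extract_submission_id(url: str) -> str:
    """Return the ID that follows the 'comments' segment of a Reddit thread URL."""
    segments = [part for part in urlparse(url).path.split("/") if part]
    try:
        return segments[segments.index("comments") + 1]
    except (ValueError, IndexError):
        raise ValueError(f"no submission ID found in {url!r}") from None


thread_url = (
    "https://www.reddit.com/r/TheMarvelousMrsMaisel/comments/sv7go7/"
    "season_4_general_discussion_and_episode_thread_hub/"
)
submission_id = extract_submission_id(thread_url)
print(submission_id)  # sv7go7
```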

CREATE MODEL sentiment_classifier_model
PREDICT sentiment
USING
  engine = 'openai',
  prompt_template = 'describe the sentiment of the reviews
                     strictly as "positive", "neutral", or "negative".
                     "I love the movie":positive
                     "It is a scam":negative
                     "{{body}}.":',
  api_key = 'YOUR-API-KEY';
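To see what the model is asked for each comment, it can help to fill in the template by hand. The Python sketch below only imitates the {{body}} substitution for illustration; it is an assumption about the resulting text, not MindsDB's actual implementation, and render_prompt is a hypothetical helper.

```python
import re

# The few-shot prompt from the CREATE MODEL statement, with indentation removed.
PROMPT_TEMPLATE = (
    'describe the sentiment of the reviews\n'
    'strictly as "positive", "neutral", or "negative".\n'
    '"I love the movie":positive\n'
    '"It is a scam":negative\n'
    '"{{body}}.":'
)


def render_prompt(template: str, row: dict) -> str:
    """Replace each {{column}} placeholder with the matching value from row."""
    def substitute(match: re.Match) -> str:
        return str(row[match.group(1)])

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


# Made-up comment text, used only to show the shape of the final prompt.
prompt = render_prompt(PROMPT_TEMPLATE, {"body": "Best season so far"})
print(prompt)
```

The two labeled examples in the template act as few-shot demonstrations, and the trailing colon leaves the label for the model to complete.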

After running the statement, the output should show the model with its status as complete.

We can then join the model with the comment table to make batch predictions:


SELECT input.body, output.sentiment
FROM my_reddit.comment AS input
JOIN sentiment_classifier_model AS output
WHERE input.submission_id = 'sv7go7'
LIMIT 3;

Running the SQL query above returns the body of each comment alongside its predicted sentiment.
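Once the predictions are exported, the labels can be tallied to get an overall picture of the thread. The following Python sketch assumes the rows are available as dictionaries with body and sentiment keys. The sample rows are placeholders, not real query output. The label clean-up is a precaution, since a language model may return labels with stray capitalization or punctuation.

```python
from collections import Counter


def normalize_label(raw: str) -> str:
    """Lower-case a predicted label and trim whitespace, quotes, and periods."""
    return raw.strip().strip('."\'').lower()


# Placeholder rows standing in for exported query results.
rows = [
    {"body": "placeholder comment 1", "sentiment": "positive"},
    {"body": "placeholder comment 2", "sentiment": "Positive."},
    {"body": "placeholder comment 3", "sentiment": " negative"},
]

counts = Counter(normalize_label(row["sentiment"]) for row in rows)
total = sum(counts.values())
shares = {label: count / total for label, count in counts.items()}

for label, count in counts.most_common():
    print(f"{label}: {count} ({shares[label]:.0%})")
```

Remove the LIMIT clause from the query if you want the tally to cover every comment in the thread.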

Conclusion

We have analyzed the sentiment of replies to the TV show's discussion thread using MindsDB, OpenAI, and the Reddit application handler.

In summary, MindsDB is a convenient solution for building machine-learning models quickly and efficiently, without requiring extensive expertise in the field. To learn more about the capabilities of MindsDB, consult their documentation. Additionally, all the relevant commands and data used are available on GitHub.