Movies are my passion. There is no language, culture or geography that can limit my love for movies. I love to watch movies and learn about the qualities and nuances that make a movie successful. You may have seen me wonder if I could alter some aspects to make a difference in the movie’s success rating. This would include predicting the success score of the movie and then playing around with variables to adjust the results. Because I wanted to create a quick and simple solution that didn’t require so much effort, it was important to me.
As I was working through the Databases in Google Cloud Blog series recently, I was considering BigQuery as part 6. That is when I realized that BigQuery not only has a multi-cloud, serverless data warehouse option for my training dataset that can scale from bytes up to petabytes with 0 operational overhead but also has Machine Learning (ML), Analytics, and Machine Learning (ML). I can use it to create no-code prediction using SQL queries. Federated Queries allow me to directly access data from other sources without having to store or move it into BigQuery. That’s cool! It’s even cooler!
Finally, I was able to test some of my theories regarding factors that influence a movie’s success or rating. It took me very little time and effort. Are you a movie lover like me? Get ready to buckle your seatbelts , because it’s going to be a bumpy night – read:
This is what you will learn about
1. Cloud Shell creates a BigQuery dataset and loads data from file
2. BigQuery ML was used for supervised learning to create an Multiclass Regression Prediction Model for the Movie Score prediction concept. I did not have to code, but I only used SQL queries to ML.
3. A similar model was also created using Vertex AI AutoML and BigQuery data. I used the new direct integration to BigQuery in Vertex AI. No additional components were required for moving your data.
BigQuery Dataset with Cloud Shell
BigQuery lets you focus on data analysis to uncover meaningful insights. This blog will show you how to use the bq command line tool to load a CSV file from your local computer into a new BigQuery Table. To enable BigQuery, follow the steps below.
Create your project by activating Cloud Shell
- Select or create a Google Cloud Project in the Google Cloud console.
- You must ensure that billing is enabled on your Cloud project. How to verify that billing is enabled for a project
- Navigate to BigQuery. The following URL can be used to open BigQuery’s web UI:
- You will use Cloud Shell to run a command-line environment in Google Cloud.
Predicting Movie Success Score (user score on an scale of 1-10
BigQuery ML supports supervised Learning with the logistical regression model type. To predict whether a value falls within one of the two categories, you can either use the binary logistic regression model type or the multi-class model type. These problems are called classification problems because they try to classify data into multiple categories.
This experiment uses to predict the success score (user score/rating) of the movie using a multi-class classification model based on the movie dataset.
Select your training data
I have divided the movie data into three categories using the field “data_cat”, which has one of the following values: TRAIN, TEST or PREDICT. The model requires that the data be split for training and testing purposes. This scenario involved visualizing the independent variables in the training dataset and their impact on dependent variables. It was important to narrow down the column in the CSV file. For more information about dataset splitting, see the documentation.
Here’s a quick overview of the model selection process:
This is an experiment. I chose this model based on my evaluation of the results. After looking at a few models, I decided to go with LOGISTIC REG. It’s simpler and gives results that are closer to the actual movie rating. This model is not recommended for this purpose and should only be used as an example. Another way to implement this model is to predict the outcome by using the Logistic Regression model, instead of predicting it.
Vertex AI AutoML Integration using BigQuery data
To create an AutoML model directly with Vertex AI, you can use your BigQuery data. This blog does not cover details about the Vertex AI API AutoML API that is invoked after the deployment. That is for another blog. But, I will be covering the part that allows us to integrate BigQuery datasource to Vertex AI directly without moving data between services. This is a demonstration of BigQuery data integration in Vertex AI’s AutoML.
I used the AutoML model API endpoint in this experiment to create a web application that had transactional data managed by MongoDB, Cloud Functions, and MongoDB. This is the next and final part of our series.
You can find a few tasks I have for you in the interim.
- Compare the accuracy and other parameters of the BigQueryML created manually with SQLs and the Vertex AI AutoML model
- Try to adjust the variables independent of each other to improve the prediction accuracy.
- You can take it one step further by trying the Linear Regression model to predict the score using float or decimal point values instead of rounded integers