
Developing ML Models by Using BigQuery ML

Architecting Low-Code AI Solutions

10 Questions
Question 1 of 10
You are building a customer churn prediction model using BigQuery ML. Your dataset contains 500,000 rows with features like customer_age, monthly_charges, contract_type, and a binary label is_churned. Which BigQuery ML model type is most appropriate for this use case?
Explanation
LOGISTIC_REG is the correct choice because customer churn prediction is a binary classification problem (churned vs. not churned). Logistic regression is specifically designed for binary classification tasks. LINEAR_REG is incorrect as it's used for regression problems predicting continuous values, not binary outcomes. KMEANS is an unsupervised clustering algorithm and wouldn't be appropriate for a labeled classification task. ARIMA_PLUS is for time-series forecasting, not binary classification.
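The pattern described above can be sketched as a minimal `CREATE MODEL` statement (project, dataset, and table names here are hypothetical):

```sql
-- Train a binary classifier on the churn dataset.
-- input_label_cols tells BigQuery ML which column is the label.
CREATE OR REPLACE MODEL `my_project.my_dataset.churn_model`
OPTIONS (
  model_type = 'LOGISTIC_REG',
  input_label_cols = ['is_churned']
) AS
SELECT
  customer_age,
  monthly_charges,
  contract_type,
  is_churned
FROM `my_project.my_dataset.customers`;
```

BigQuery ML automatically one-hot encodes categorical columns such as contract_type, so no manual encoding is needed in the training query.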
Question 2 of 10
You need to predict monthly sales revenue for the next 12 months using historical sales data stored in BigQuery. The data shows clear seasonal patterns and trends. Which BigQuery ML model would be most effective?
Explanation
ARIMA_PLUS is the correct choice for time-series forecasting with seasonal patterns. It's specifically designed to handle trend, seasonality, and holiday effects in time-series data. While BOOSTED_TREE_REGRESSOR and DNN_REGRESSOR can predict continuous values, they don't inherently handle time-series components like seasonality and trend as effectively as ARIMA_PLUS. AUTOML_REGRESSOR could work but ARIMA_PLUS is purpose-built for time-series forecasting and would be more efficient and interpretable for this specific use case.
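A minimal sketch of an ARIMA_PLUS forecast, assuming a table with one row per month (table and column names are hypothetical):

```sql
-- Train a time-series model; ARIMA_PLUS decomposes trend,
-- seasonality, and (optionally) holiday effects automatically.
CREATE OR REPLACE MODEL `my_project.my_dataset.sales_forecast`
OPTIONS (
  model_type = 'ARIMA_PLUS',
  time_series_timestamp_col = 'month',
  time_series_data_col = 'revenue'
) AS
SELECT month, revenue
FROM `my_project.my_dataset.monthly_sales`;

-- Forecast the next 12 months with prediction intervals.
SELECT *
FROM ML.FORECAST(MODEL `my_project.my_dataset.sales_forecast`,
                 STRUCT(12 AS horizon));
```

The `horizon` argument to ML.FORECAST controls how many future time steps are returned.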
Question 3 of 10
You're implementing a product recommendation system in BigQuery ML for an e-commerce platform with user-product interaction data. You want to predict which products a user might be interested in based on implicit feedback (views, clicks). Which model type should you use?
Explanation
MATRIX_FACTORIZATION is the correct choice for collaborative filtering-based recommendation systems. It discovers latent factors in user-item interactions to generate personalized recommendations. LOGISTIC_REG would require explicit binary labels and doesn't capture the collaborative filtering aspect. KMEANS is for clustering and wouldn't provide personalized recommendations. BOOSTED_TREE_CLASSIFIER is for classification tasks with explicit labels, not for discovering implicit patterns in user-item interactions that drive recommendations.
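An implicit-feedback matrix factorization model might be defined as follows (names are hypothetical; interaction_score stands in for some weighting of views and clicks):

```sql
-- Collaborative filtering on implicit user-product interactions.
CREATE OR REPLACE MODEL `my_project.my_dataset.product_recs`
OPTIONS (
  model_type = 'MATRIX_FACTORIZATION',
  feedback_type = 'IMPLICIT',
  user_col = 'user_id',
  item_col = 'product_id',
  rating_col = 'interaction_score'
) AS
SELECT user_id, product_id, interaction_score
FROM `my_project.my_dataset.interactions`;

-- Generate recommendations for all users.
SELECT *
FROM ML.RECOMMEND(MODEL `my_project.my_dataset.product_recs`);
```

Note that training matrix factorization models in BigQuery ML requires a slot reservation (for example, flex slots) rather than on-demand pricing.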
Question 4 of 10
You need to perform feature engineering in BigQuery ML to improve model accuracy. Your dataset contains a 'purchase_date' column, and you want to extract meaningful temporal features. Which SQL transformation would be most effective for creating features in your BigQuery ML training query?
Explanation
Extracting multiple temporal components (day of week, month, hour) creates meaningful features that capture patterns like weekly cycles, seasonal trends, and time-of-day behaviors, which significantly improve model performance. Simply using the raw purchase_date or casting it to STRING doesn't create useful numeric features for ML models. While DATE_DIFF could be useful, it only provides one feature (days since purchase) and misses important cyclical patterns. The comprehensive extraction of temporal components provides the richest feature set for the model to learn from.
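The extraction described above might look like this inside a training query (assuming purchase_date is a TIMESTAMP; the hour component is not available on a plain DATE):

```sql
SELECT
  EXTRACT(DAYOFWEEK FROM purchase_date) AS purchase_dow,    -- weekly cycle (1 = Sunday)
  EXTRACT(MONTH     FROM purchase_date) AS purchase_month,  -- seasonal trend
  EXTRACT(HOUR      FROM purchase_date) AS purchase_hour,   -- time-of-day behavior
  * EXCEPT (purchase_date)
FROM `my_project.my_dataset.purchases`;
```

Each extracted component gives the model a separate feature for a distinct cyclical pattern, which a single raw timestamp or day-count cannot express.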
Question 5 of 10
After training a binary classification model in BigQuery ML to detect fraudulent transactions, you need to evaluate its performance. Your business prioritizes minimizing false negatives (missing actual fraud cases) over false positives. Which evaluation metric should you focus on primarily?
Explanation
Recall is the correct metric to prioritize when minimizing false negatives is critical. Recall measures the proportion of actual positive cases (fraud) that were correctly identified, directly addressing the concern about missing fraud cases. Precision measures how many predicted positives were actually positive, but doesn't address missing fraud cases. Accuracy can be misleading in imbalanced datasets and doesn't specifically address false negatives. While ROC AUC provides overall model performance across thresholds, recall specifically measures the ability to catch all fraud cases, which is the stated priority.
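Recall can be read directly from ML.EVALUATE; lowering the classification threshold is one way to trade precision for recall when false negatives are costly (model and table names are hypothetical):

```sql
-- Evaluate at a lower-than-default threshold to favor recall.
SELECT precision, recall
FROM ML.EVALUATE(
  MODEL `my_project.my_dataset.fraud_model`,
  (SELECT * FROM `my_project.my_dataset.eval_data`),
  STRUCT(0.3 AS threshold));
```

Recall is TP / (TP + FN), so it falls exactly when fraud cases are missed, which is why it is the metric to watch here.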
Question 6 of 10
You're using BigQuery ML to predict house prices (continuous values) and want to understand how well your model explains the variance in the target variable. Which metric from ML.EVALUATE should you examine?
Explanation
r2_score (R-squared or coefficient of determination) specifically measures the proportion of variance in the dependent variable that is explained by the model, ranging from 0 to 1 (higher is better). This directly answers the question about variance explanation. mean_absolute_error, mean_squared_error, and root_mean_squared_error measure prediction error magnitude but don't directly indicate how much variance is explained. While error metrics are important for understanding prediction accuracy, R-squared is the standard metric for assessing explanatory power in regression models.
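For a regression model, ML.EVALUATE returns r2_score alongside the error metrics, so checking variance explanation is a single query (names are hypothetical):

```sql
SELECT
  r2_score,             -- proportion of variance explained
  mean_absolute_error,
  mean_squared_error
FROM ML.EVALUATE(MODEL `my_project.my_dataset.house_price_model`);
```

An r2_score near 1 means the features account for most of the variation in house prices; error metrics in the same row give the complementary view of typical prediction error.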
Question 7 of 10
You need to generate real-time predictions from your BigQuery ML model for a web application that requires sub-second latency. Which approach would be most appropriate?
Explanation
Exporting the BigQuery ML model and deploying it to Vertex AI Endpoints provides true real-time, low-latency predictions suitable for web applications requiring sub-second response times. Scheduled queries running every minute cannot meet sub-second latency requirements. Materialized views still involve querying BigQuery and won't consistently provide sub-second latency for individual predictions. Using ML.PREDICT directly in application queries would have higher latency and cost compared to a dedicated prediction endpoint, and isn't designed for high-throughput, low-latency serving scenarios.
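The first step of that approach, exporting the trained model to Cloud Storage, is a single statement (bucket and model names are hypothetical):

```sql
-- Export the model artifacts to Cloud Storage.
EXPORT MODEL `my_project.my_dataset.churn_model`
OPTIONS (URI = 'gs://my-bucket/exported/churn_model/');
-- The exported artifacts can then be registered in Vertex AI Model
-- Registry and deployed to a Vertex AI Endpoint, which serves
-- low-latency online predictions over HTTPS.
```

The deployment itself happens outside BigQuery, via the Vertex AI console, gcloud, or client libraries.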
Question 8 of 10
You're building a BigQuery ML model to classify customer support tickets into 5 categories. You have 100,000 labeled tickets with features including text descriptions and metadata. Which model type would provide the best balance of accuracy and ease of implementation?
Explanation
BOOSTED_TREE_CLASSIFIER is the best choice for multi-class classification with structured and text features, offering excellent accuracy with minimal hyperparameter tuning and handling both numeric and categorical features well. LOGISTIC_REG can handle multi-class classification but typically has lower accuracy than boosted trees for complex patterns. DNN_CLASSIFIER could work but requires more careful feature engineering and hyperparameter tuning. KMEANS is an unsupervised clustering algorithm and cannot be used for labeled classification tasks, making it inappropriate despite having 5 clusters matching the 5 categories.
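A minimal multi-class setup might look like this, using only the metadata features (names are hypothetical; free-text descriptions would first need to be converted to features, for example with n-grams or embeddings, before a boosted tree can use them):

```sql
-- Multi-class classifier over the 5 ticket categories.
CREATE OR REPLACE MODEL `my_project.my_dataset.ticket_classifier`
OPTIONS (
  model_type = 'BOOSTED_TREE_CLASSIFIER',
  input_label_cols = ['category']
) AS
SELECT
  priority,         -- categorical metadata, auto-encoded
  channel,
  ticket_length,
  category          -- label with 5 distinct values
FROM `my_project.my_dataset.tickets`;
```

BigQuery ML infers the multi-class setting from the number of distinct label values, so no extra configuration is needed for the 5 categories.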
Question 9 of 10
You want to perform batch predictions on 10 million new records using your trained BigQuery ML model. The predictions will be used for a weekly marketing campaign. What is the most cost-effective and appropriate approach?
Explanation
Using ML.PREDICT directly in BigQuery for batch predictions is the most cost-effective and straightforward approach when data is already in BigQuery and predictions are needed periodically (weekly). This leverages BigQuery's scalability and eliminates data movement costs. Exporting to Vertex AI adds unnecessary complexity and cost for batch processing when data is already in BigQuery. Individual REST API calls would be extremely slow, expensive, and inefficient for 10 million records. Creating a Dataflow pipeline adds unnecessary infrastructure complexity when BigQuery can handle the batch prediction natively and efficiently.
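The weekly batch scoring described above is a single in-place query; materializing the output keeps it ready for the campaign (table and model names are hypothetical, and the predicted column name follows BigQuery ML's `predicted_<label>` convention):

```sql
-- Score 10M rows in one batch job, entirely inside BigQuery.
CREATE OR REPLACE TABLE `my_project.my_dataset.campaign_scores` AS
SELECT
  customer_id,
  predicted_is_churned,
  predicted_is_churned_probs
FROM ML.PREDICT(
  MODEL `my_project.my_dataset.churn_model`,
  (SELECT * FROM `my_project.my_dataset.new_customers`));
```

This can be wrapped in a scheduled query to refresh the scores weekly with no additional infrastructure.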
Question 10 of 10
You're training a BigQuery ML model and notice that many features have very different scales (e.g., age: 20-80, income: 20000-200000). Some features also have missing values. Which BigQuery ML feature would automatically help address these issues?
Explanation
BigQuery ML automatically performs feature preprocessing, including standardization (scaling), one-hot encoding of categorical variables, and handling of NULL values, making manual preprocessing unnecessary for most features. Manual feature scaling would work, but BigQuery ML handles it automatically, saving development time and reducing errors. AUTO_CLASS_WEIGHTS addresses class imbalance, not feature scaling or missing values. Creating separate models for different scale ranges is an incorrect and inefficient approach. BigQuery ML's automatic preprocessing is a key advantage that simplifies model development while ensuring features are properly prepared for training.
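Automatic preprocessing is the default, but when explicit control is wanted, the TRANSFORM clause can apply the same operations declaratively (a minimal sketch with hypothetical names):

```sql
-- Explicit preprocessing; without TRANSFORM, scaling and NULL
-- handling are applied automatically by BigQuery ML.
CREATE OR REPLACE MODEL `my_project.my_dataset.demo_model`
TRANSFORM (
  ML.STANDARD_SCALER(income) OVER () AS income_scaled,  -- zero mean, unit variance
  ML.IMPUTER(age, 'mean') OVER () AS age_imputed,       -- fill NULLs with the mean
  label
)
OPTIONS (
  model_type = 'LOGISTIC_REG',
  input_label_cols = ['label']
) AS
SELECT age, income, label
FROM `my_project.my_dataset.training_data`;
```

Transformations declared in TRANSFORM are also applied automatically at prediction time, so serving queries can pass raw columns.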