
Developing ML Models by Using BigQuery ML

Architecting Low-Code AI Solutions

10 Questions
No time limit
Question 1 of 10
A retail company wants to predict customer churn based on historical transaction data stored in BigQuery. They have a binary outcome (churned: yes/no) and multiple features including purchase frequency, average order value, and customer tenure. Which BigQuery ML model type should they use?
Explanation
LOGISTIC_REG is correct because customer churn is a binary classification problem (churned vs. not churned). Logistic regression is specifically designed for binary outcomes and will output probabilities between 0 and 1. LINEAR_REG is incorrect as it's used for continuous numerical predictions, not binary classification. KMEANS is incorrect as it's an unsupervised clustering algorithm, not suitable for predicting known binary outcomes. ARIMA_PLUS is incorrect as it's designed for time-series forecasting, not binary classification.
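The churn model described above can be sketched in a single CREATE MODEL statement. This is a minimal example, assuming hypothetical dataset, table, and column names (`mydataset.customer_features`, `churned`, etc.):

```sql
-- Train a binary classifier; 'churned' is the label column (hypothetical names).
CREATE OR REPLACE MODEL `mydataset.churn_model`
OPTIONS (
  model_type = 'LOGISTIC_REG',
  input_label_cols = ['churned']
) AS
SELECT
  purchase_frequency,
  avg_order_value,
  customer_tenure,
  churned
FROM `mydataset.customer_features`;
```

Calling ML.PREDICT on this model returns both a predicted label and per-class probabilities, which can later be thresholded to trade off precision against recall.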
Question 2 of 10
You are building a BigQuery ML model to predict housing prices. After training, you need to evaluate the model's performance. Which SQL statement would allow you to view comprehensive evaluation metrics including mean absolute error and R-squared?
Explanation
ML.EVALUATE is correct because it computes evaluation metrics on the test dataset, including R-squared, mean absolute error, mean squared error, and other regression metrics. ML.PREDICT is incorrect as it generates predictions on new data but doesn't provide evaluation metrics. ML.TRAINING_INFO is incorrect as it shows training progress and loss per iteration but not comprehensive evaluation metrics on test data. ML.WEIGHTS is incorrect as it only shows feature weights/coefficients, not performance metrics.
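A minimal ML.EVALUATE call, assuming a hypothetical model and held-out test table:

```sql
-- Evaluate the trained regression model on a held-out table (hypothetical names).
SELECT *
FROM ML.EVALUATE(
  MODEL `mydataset.housing_model`,
  (SELECT * FROM `mydataset.housing_test`)
);
```

For a regression model this returns one row containing metrics such as mean_absolute_error, mean_squared_error, and r2_score; if the table argument is omitted, the metrics are computed on the evaluation split reserved during training.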
Question 3 of 10
A media streaming company wants to build a recommendation system in BigQuery ML to suggest content to users based on their viewing history. They have a user-item interaction matrix. Which model type is most appropriate for this collaborative filtering use case?
Explanation
MATRIX_FACTORIZATION is correct because it's specifically designed for recommendation systems and collaborative filtering. It decomposes the user-item interaction matrix to discover latent factors and predict missing values (recommendations). BOOSTED_TREE_REGRESSOR is incorrect as it's used for general regression tasks and is not optimized for collaborative filtering. AUTOENCODER is incorrect: although it can support recommendation-like tasks, in BigQuery ML it is used primarily for dimensionality reduction and anomaly detection. DNN_CLASSIFIER is incorrect as it's used for classification tasks, not recommendation systems based on interaction matrices.
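A sketch of the training statement for this use case, assuming hypothetical names and implicit feedback (viewing minutes rather than explicit ratings):

```sql
-- Collaborative filtering on implicit feedback (hypothetical names).
CREATE OR REPLACE MODEL `mydataset.content_recommender`
OPTIONS (
  model_type = 'MATRIX_FACTORIZATION',
  feedback_type = 'IMPLICIT',
  user_col = 'user_id',
  item_col = 'content_id',
  rating_col = 'watch_minutes'
) AS
SELECT user_id, content_id, watch_minutes
FROM `mydataset.viewing_history`;

-- Generate top recommendations for every user-item pair.
SELECT * FROM ML.RECOMMEND(MODEL `mydataset.content_recommender`);
```

Note that matrix factorization models in BigQuery ML require slot reservations rather than on-demand pricing, which is worth verifying before planning a deployment.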
Question 4 of 10
You need to forecast monthly sales for the next 12 months using BigQuery ML. Your data contains 3 years of historical monthly sales with clear seasonal patterns and trends. Which approach should you use?
Explanation
ARIMA_PLUS with HORIZON=12 is correct because it's specifically designed for time-series forecasting and can automatically handle seasonality, trends, and holidays. The HORIZON parameter specifies forecasting 12 periods ahead. LINEAR_REG is incorrect as it doesn't natively handle time-series patterns like seasonality and autocorrelation. LOGISTIC_REG is incorrect as it's for binary classification, not continuous time-series forecasting. KMEANS is incorrect as it's an unsupervised clustering algorithm, not suitable for forecasting future values.
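The forecasting setup can be sketched as follows, assuming hypothetical table and column names:

```sql
-- Time-series model; ARIMA_PLUS handles seasonality and trend automatically
-- (hypothetical names).
CREATE OR REPLACE MODEL `mydataset.sales_forecast`
OPTIONS (
  model_type = 'ARIMA_PLUS',
  time_series_timestamp_col = 'month',
  time_series_data_col = 'total_sales',
  horizon = 12
) AS
SELECT month, total_sales
FROM `mydataset.monthly_sales`;

-- Forecast the next 12 months with 90% prediction intervals.
SELECT *
FROM ML.FORECAST(MODEL `mydataset.sales_forecast`,
                 STRUCT(12 AS horizon, 0.9 AS confidence_level));
```

ML.FORECAST returns the point forecast plus lower and upper prediction-interval bounds for each future timestamp.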
Question 5 of 10
You're implementing feature engineering for a BigQuery ML model to predict customer lifetime value. You want to create a new feature that categorizes customers into spending tiers based on their total purchases. Which SQL transformation should you use within your CREATE MODEL statement?
Explanation
CASE WHEN statements are correct because they allow explicit definition of business-logic-based spending tiers (e.g., low: <$1,000; medium: $1,000 to $5,000; high: >$5,000) directly in the training query. This approach gives full control over tier boundaries. ML.QUANTILE_BUCKETIZE is incorrect because it assigns bucket boundaries from quantiles of the data distribution, not from the explicit business thresholds required here. ML.FEATURE_CROSS is incorrect as it creates interaction features between multiple categorical features, not categories from a single continuous variable. ML.STANDARD_SCALER is incorrect as it normalizes values but doesn't categorize them into discrete tiers.
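Placing the CASE expression in a TRANSFORM clause keeps the tiering logic attached to the model, so it is applied automatically at prediction time. A sketch with illustrative thresholds and hypothetical names:

```sql
-- Tier boundaries below are illustrative business thresholds (hypothetical names).
CREATE OR REPLACE MODEL `mydataset.clv_model`
TRANSFORM (
  CASE
    WHEN total_purchases < 1000 THEN 'low'
    WHEN total_purchases <= 5000 THEN 'medium'
    ELSE 'high'
  END AS spending_tier,
  customer_tenure,
  lifetime_value  -- the label must also pass through TRANSFORM
)
OPTIONS (
  model_type = 'LINEAR_REG',
  input_label_cols = ['lifetime_value']
) AS
SELECT total_purchases, customer_tenure, lifetime_value
FROM `mydataset.customer_history`;
```

The same CASE expression could instead live directly in the SELECT of the training query, but then it would have to be repeated in every ML.PREDICT call.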
Question 6 of 10
Your BigQuery ML classification model is showing high precision (0.95) but low recall (0.45) for detecting fraudulent transactions. The business priority is to catch as many fraudulent transactions as possible, even if it means some false positives. What should you do?
Explanation
Decreasing the classification threshold below 0.5 is correct because it will classify more instances as positive (fraudulent), increasing recall (catching more actual fraud cases) at the cost of lower precision (more false positives). This aligns with the business priority. Increasing the threshold is incorrect as it would further decrease recall, catching even fewer fraud cases. Adding more features might help but doesn't directly address the precision-recall tradeoff needed here. Switching to a regression model is incorrect as fraud detection is a classification problem requiring binary outcomes, not continuous predictions.
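The threshold adjustment happens at prediction time, not training time. A sketch, assuming hypothetical names and an illustrative threshold of 0.3:

```sql
-- Lower the decision threshold from the default 0.5 to favor recall
-- (hypothetical names; 0.3 is illustrative, not a recommendation).
SELECT *
FROM ML.PREDICT(
  MODEL `mydataset.fraud_model`,
  (SELECT * FROM `mydataset.incoming_transactions`),
  STRUCT(0.3 AS threshold)
);
```

In practice, the ML.ROC_CURVE function can be used to inspect recall and false-positive rate at candidate thresholds before picking one that meets the business target.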
Question 7 of 10
You need to generate real-time predictions for a fraud detection system using a BigQuery ML model. The system receives individual transaction requests that need immediate classification. Which approach provides the lowest latency for online predictions?
Explanation
Exporting to Vertex AI Prediction endpoints is correct because it provides dedicated, low-latency infrastructure optimized for real-time predictions with auto-scaling and sub-second response times. Scheduled queries are incorrect as they provide batch predictions, not real-time individual transaction scoring. BigQuery ML doesn't have built-in HTTP endpoints for online predictions, making that option incorrect. Cloud Functions with ML.PREDICT is incorrect as it introduces additional latency from cold starts and BigQuery query execution time, making it unsuitable for real-time requirements.
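The first step of that path, exporting the trained model out of BigQuery, is a single statement. A sketch with a hypothetical bucket path:

```sql
-- Export the trained model to Cloud Storage, from where it can be
-- uploaded to Vertex AI and deployed to a low-latency online endpoint
-- (bucket path is hypothetical).
EXPORT MODEL `mydataset.fraud_model`
OPTIONS (URI = 'gs://my-bucket/fraud_model/');
```

Newer BigQuery ML releases can also register a model directly in the Vertex AI Model Registry via a `model_registry = 'vertex_ai'` training option, which skips the manual export and upload steps; check the current documentation for model-type support.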
Question 8 of 10
You're training a BOOSTED_TREE_CLASSIFIER in BigQuery ML for a multi-class classification problem with imbalanced classes. Which option should you specify in your CREATE MODEL statement to handle the class imbalance?
Explanation
AUTO_CLASS_WEIGHTS=TRUE is correct because it automatically adjusts the weights inversely proportional to class frequencies, giving more importance to minority classes during training. This is the recommended BigQuery ML approach for handling imbalanced datasets. BALANCE_CLASSES is incorrect as it's not a valid BigQuery ML option (it exists in other frameworks). CLASS_WEIGHT with manual values is incorrect as BigQuery ML doesn't support manual class weight specification in this format. ENABLE_GLOBAL_EXPLAIN is incorrect as it's for model interpretability and doesn't address class imbalance.
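A sketch of the training statement with the option in place, assuming hypothetical feature and label columns:

```sql
-- Inverse-frequency class weighting for an imbalanced multi-class label
-- (hypothetical names).
CREATE OR REPLACE MODEL `mydataset.ticket_classifier`
OPTIONS (
  model_type = 'BOOSTED_TREE_CLASSIFIER',
  input_label_cols = ['priority'],
  auto_class_weights = TRUE
) AS
SELECT subject_length, word_count, channel, priority
FROM `mydataset.support_tickets`;
```

With the option enabled, minority classes contribute proportionally more to the training loss, so the model is penalized for ignoring them.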
Question 9 of 10
A financial services company wants to detect anomalous network traffic patterns using BigQuery ML. They have normal traffic data but limited examples of anomalies. Which model type is most suitable for this anomaly detection use case?
Explanation
AUTOENCODER is correct because it learns to reconstruct normal patterns during training, and anomalies will have high reconstruction errors, making them detectable even with limited anomaly examples. This unsupervised approach is ideal when you have abundant normal data but few anomaly examples. LOGISTIC_REG is incorrect as it requires labeled anomaly examples for supervised training. KMEANS with k=2 is incorrect as simple clustering doesn't effectively capture complex normal patterns and may not reliably separate anomalies. ARIMA_PLUS is incorrect as it's for time-series forecasting, not anomaly detection in multi-dimensional traffic patterns.
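The two-step pattern, training on normal data only, then scoring live data by reconstruction error, can be sketched as follows with hypothetical names:

```sql
-- Train the autoencoder on normal traffic only (hypothetical names);
-- the narrow middle layer forces a compressed representation of "normal".
CREATE OR REPLACE MODEL `mydataset.traffic_autoencoder`
OPTIONS (
  model_type = 'AUTOENCODER',
  hidden_units = [32, 8, 32]
) AS
SELECT bytes_sent, bytes_received, packet_count, connection_duration
FROM `mydataset.normal_traffic`;

-- Flag rows whose reconstruction error puts them in roughly the top 2%.
SELECT *
FROM ML.DETECT_ANOMALIES(
  MODEL `mydataset.traffic_autoencoder`,
  STRUCT(0.02 AS contamination),
  (SELECT * FROM `mydataset.live_traffic`)
);
```

The contamination value is an assumption about how rare anomalies are; it should be tuned against whatever labeled anomaly examples are available.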
Question 10 of 10
You've trained a BigQuery ML regression model to predict product demand. During evaluation, you observe an R-squared value of 0.89 on training data but 0.52 on test data. What does this indicate and what should you do?
Explanation
"The model is overfitting" is correct because the significant gap between training R-squared (0.89) and test R-squared (0.52) indicates the model learned the training data too well but generalizes poorly to new data. Solutions include applying L1/L2 regularization, reducing the number of features, or simplifying the model. Deploying to production is incorrect as the poor test performance indicates unreliable predictions on new data. Underfitting is incorrect as the high training performance rules it out; underfitting shows poor performance on both training and test sets. The metrics aren't incorrect; they're revealing legitimate overfitting that needs to be addressed.
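The regularization remedy maps directly onto training options. A sketch of a retrain, assuming hypothetical names and placeholder penalty strengths:

```sql
-- Retrain with L1/L2 penalties to shrink weights and curb overfitting
-- (hypothetical names; penalty values are placeholders to tune).
CREATE OR REPLACE MODEL `mydataset.demand_model`
OPTIONS (
  model_type = 'LINEAR_REG',
  input_label_cols = ['units_sold'],
  l1_reg = 0.05,
  l2_reg = 0.1
) AS
SELECT price, promo_flag, week_of_year, units_sold
FROM `mydataset.demand_features`;
```

Comparing ML.EVALUATE results on held-out data across a few penalty settings shows whether the train/test gap is actually closing.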