
Developing ML Models by Using BigQuery ML

Architecting Low-Code AI Solutions

10 Questions
No time limit
Question 1 of 10
A retail company wants to predict customer churn based on historical transaction data stored in BigQuery. They have a binary outcome (churned: yes/no) and multiple features including purchase frequency, average order value, and customer tenure. Which BigQuery ML model type should they use?
Explanation
LOGISTIC_REG is correct because customer churn is a binary classification problem (churned vs. not churned). Logistic regression is specifically designed for binary outcomes and will output probabilities between 0 and 1. LINEAR_REG is incorrect as it's used for continuous numerical predictions, not binary classification. KMEANS is incorrect as it's an unsupervised clustering algorithm, not suitable for predicting known binary outcomes. ARIMA_PLUS is incorrect as it's designed for time-series forecasting, not binary classification.
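The churn model described above can be sketched in a single CREATE MODEL statement. This is a minimal example, assuming hypothetical dataset, table, and column names (`mydataset.customer_features`, `churned`, etc.):

```sql
-- Train a binary classifier; 'churned' is the label column (hypothetical names).
CREATE OR REPLACE MODEL `mydataset.churn_model`
OPTIONS (
  model_type = 'LOGISTIC_REG',
  input_label_cols = ['churned']
) AS
SELECT
  purchase_frequency,
  avg_order_value,
  customer_tenure,
  churned
FROM `mydataset.customer_features`;
```

Calling ML.PREDICT on this model returns both a predicted label and per-class probabilities, which can later be thresholded to trade off precision against recall.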
Question 2 of 10
You are building a BigQuery ML model to predict housing prices. After training, you need to evaluate the model's performance. Which SQL statement would allow you to view comprehensive evaluation metrics including mean absolute error and R-squared?
Explanation
ML.EVALUATE is correct because it computes evaluation metrics on the test dataset, including R-squared, mean absolute error, mean squared error, and other regression metrics. ML.PREDICT is incorrect as it generates predictions on new data but doesn't provide evaluation metrics. ML.TRAINING_INFO is incorrect as it shows training progress and loss per iteration but not comprehensive evaluation metrics on test data. ML.WEIGHTS is incorrect as it only shows feature weights/coefficients, not performance metrics.
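A minimal ML.EVALUATE call, assuming a hypothetical model and held-out test table:

```sql
-- Evaluate the trained regression model on a held-out table (hypothetical names).
SELECT *
FROM ML.EVALUATE(
  MODEL `mydataset.housing_model`,
  (SELECT * FROM `mydataset.housing_test`)
);
```

For a regression model this returns one row containing metrics such as mean_absolute_error, mean_squared_error, and r2_score; if the table argument is omitted, the metrics are computed on the evaluation split reserved during training.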
Question 3 of 10
A media streaming company wants to build a recommendation system in BigQuery ML to suggest content to users based on their viewing history. They have a user-item interaction matrix. Which model type is most appropriate for this collaborative filtering use case?
Explanation
MATRIX_FACTORIZATION is correct because it's specifically designed for recommendation systems and collaborative filtering. It decomposes the user-item interaction matrix to discover latent factors and predict missing values (recommendations). BOOSTED_TREE_REGRESSOR is incorrect as it's used for general regression tasks and is not optimized for collaborative filtering. AUTOENCODER is incorrect: although it can support recommendation-like tasks, in BigQuery ML it is used primarily for dimensionality reduction and anomaly detection. DNN_CLASSIFIER is incorrect as it's used for classification tasks, not recommendation systems based on interaction matrices.
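A sketch of the training statement for this use case, assuming hypothetical names and implicit feedback (viewing minutes rather than explicit ratings):

```sql
-- Collaborative filtering on implicit feedback (hypothetical names).
CREATE OR REPLACE MODEL `mydataset.content_recommender`
OPTIONS (
  model_type = 'MATRIX_FACTORIZATION',
  feedback_type = 'IMPLICIT',
  user_col = 'user_id',
  item_col = 'content_id',
  rating_col = 'watch_minutes'
) AS
SELECT user_id, content_id, watch_minutes
FROM `mydataset.viewing_history`;

-- Generate top recommendations for every user-item pair.
SELECT * FROM ML.RECOMMEND(MODEL `mydataset.content_recommender`);
```

Note that matrix factorization models in BigQuery ML require slot reservations rather than on-demand pricing, which is worth verifying before planning a deployment.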
Question 4 of 10
You need to forecast monthly sales for the next 12 months using BigQuery ML. Your data contains 3 years of historical monthly sales with clear seasonal patterns and trends. Which approach should you use?
Explanation
ARIMA_PLUS with HORIZON=12 is correct because it's specifically designed for time-series forecasting and can automatically handle seasonality, trends, and holidays. The HORIZON parameter specifies forecasting 12 periods ahead. LINEAR_REG is incorrect as it doesn't natively handle time-series patterns like seasonality and autocorrelation. LOGISTIC_REG is incorrect as it's for binary classification, not continuous time-series forecasting. KMEANS is incorrect as it's an unsupervised clustering algorithm, not suitable for forecasting future values.
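The forecasting setup can be sketched as follows, assuming hypothetical table and column names:

```sql
-- Time-series model; ARIMA_PLUS handles seasonality and trend automatically
-- (hypothetical names).
CREATE OR REPLACE MODEL `mydataset.sales_forecast`
OPTIONS (
  model_type = 'ARIMA_PLUS',
  time_series_timestamp_col = 'month',
  time_series_data_col = 'total_sales',
  horizon = 12
) AS
SELECT month, total_sales
FROM `mydataset.monthly_sales`;

-- Forecast the next 12 months with 90% prediction intervals.
SELECT *
FROM ML.FORECAST(MODEL `mydataset.sales_forecast`,
                 STRUCT(12 AS horizon, 0.9 AS confidence_level));
```

ML.FORECAST returns the point forecast plus lower and upper prediction-interval bounds for each future timestamp.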
Question 5 of 10
You're implementing feature engineering for a BigQuery ML model to predict customer lifetime value. You want to create a new feature that categorizes customers into spending tiers based on their total purchases. Which SQL transformation should you use within your CREATE MODEL statement?
Explanation
CASE WHEN statements are correct because they allow explicit definition of business-logic-based spending tiers (e.g., low: <$1,000; medium: $1,000 to $5,000; high: >$5,000) directly in the training query. This approach gives full control over tier boundaries. ML.QUANTILE_BUCKETIZE is incorrect because it assigns bucket boundaries from quantiles of the data distribution, not from the explicit business thresholds required here. ML.FEATURE_CROSS is incorrect as it creates interaction features between multiple categorical features, not categories from a single continuous variable. ML.STANDARD_SCALER is incorrect as it normalizes values but doesn't categorize them into discrete tiers.
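Placing the CASE expression in a TRANSFORM clause keeps the tiering logic attached to the model, so it is applied automatically at prediction time. A sketch with illustrative thresholds and hypothetical names:

```sql
-- Tier boundaries below are illustrative business thresholds (hypothetical names).
CREATE OR REPLACE MODEL `mydataset.clv_model`
TRANSFORM (
  CASE
    WHEN total_purchases < 1000 THEN 'low'
    WHEN total_purchases <= 5000 THEN 'medium'
    ELSE 'high'
  END AS spending_tier,
  customer_tenure,
  lifetime_value  -- the label must also pass through TRANSFORM
)
OPTIONS (
  model_type = 'LINEAR_REG',
  input_label_cols = ['lifetime_value']
) AS
SELECT total_purchases, customer_tenure, lifetime_value
FROM `mydataset.customer_history`;
```

The same CASE expression could instead live directly in the SELECT of the training query, but then it would have to be repeated in every ML.PREDICT call.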
Question 6 of 10
Your BigQuery ML classification model is showing high precision (0.95) but low recall (0.45) for detecting fraudulent transactions. The business priority is to catch as many fraudulent transactions as possible, even if it means some false positives. What should you do?
Explanation
Decreasing the classification threshold below 0.5 is correct because it will classify more instances as positive (fraudulent), increasing recall (catching more actual fraud cases) at the cost of lower precision (more false positives). This aligns with the business priority. Increasing the threshold is incorrect as it would further decrease recall, catching even fewer fraud cases. Adding more features might help but doesn't directly address the precision-recall tradeoff needed here. Switching to a regression model is incorrect as fraud detection is a classification problem requiring binary outcomes, not continuous predictions.
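The threshold adjustment happens at prediction time, not training time. A sketch, assuming hypothetical names and an illustrative threshold of 0.3:

```sql
-- Lower the decision threshold from the default 0.5 to favor recall
-- (hypothetical names; 0.3 is illustrative, not a recommendation).
SELECT *
FROM ML.PREDICT(
  MODEL `mydataset.fraud_model`,
  (SELECT * FROM `mydataset.incoming_transactions`),
  STRUCT(0.3 AS threshold)
);
```

In practice, the ML.ROC_CURVE function can be used to inspect recall and false-positive rate at candidate thresholds before picking one that meets the business target.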
Question 7 of 10
You need to generate real-time predictions for a fraud detection system using a BigQuery ML model. The system receives individual transaction requests that need immediate classification. Which approach provides the lowest latency for online predictions?
Explanation
Exporting to Vertex AI Prediction endpoints is correct because it provides dedicated, low-latency infrastructure optimized for real-time predictions with auto-scaling and sub-second response times. Scheduled queries are incorrect as they provide batch predictions, not real-time individual transaction scoring. BigQuery ML doesn't have built-in HTTP endpoints for online predictions, making that option incorrect. Cloud Functions with ML.PREDICT is incorrect as it introduces additional latency from cold starts and BigQuery query execution time, making it unsuitable for real-time requirements.
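The first step of that path, exporting the trained model out of BigQuery, is a single statement. A sketch with a hypothetical bucket path:

```sql
-- Export the trained model to Cloud Storage, from where it can be
-- uploaded to Vertex AI and deployed to a low-latency online endpoint
-- (bucket path is hypothetical).
EXPORT MODEL `mydataset.fraud_model`
OPTIONS (URI = 'gs://my-bucket/fraud_model/');
```

Newer BigQuery ML releases can also register a model directly in the Vertex AI Model Registry via a `model_registry = 'vertex_ai'` training option, which skips the manual export and upload steps; check the current documentation for model-type support.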
Question 8 of 10
You're training a BOOSTED_TREE_CLASSIFIER in BigQuery ML for a multi-class classification problem with imbalanced classes. Which option should you specify in your CREATE MODEL statement to handle the class imbalance?
Explanation
AUTO_CLASS_WEIGHTS=TRUE is correct because it automatically adjusts the weights inversely proportional to class frequencies, giving more importance to minority classes during training. This is the recommended BigQuery ML approach for handling imbalanced datasets. BALANCE_CLASSES is incorrect as it's not a valid BigQuery ML option (it exists in other frameworks). CLASS_WEIGHT with manual values is incorrect as BigQuery ML doesn't support manual class weight specification in this format. ENABLE_GLOBAL_EXPLAIN is incorrect as it's for model interpretability and doesn't address class imbalance.
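A sketch of the training statement with the option in place, assuming hypothetical feature and label columns:

```sql
-- Inverse-frequency class weighting for an imbalanced multi-class label
-- (hypothetical names).
CREATE OR REPLACE MODEL `mydataset.ticket_classifier`
OPTIONS (
  model_type = 'BOOSTED_TREE_CLASSIFIER',
  input_label_cols = ['priority'],
  auto_class_weights = TRUE
) AS
SELECT subject_length, word_count, channel, priority
FROM `mydataset.support_tickets`;
```

With the option enabled, minority classes contribute proportionally more to the training loss, so the model is penalized for ignoring them.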
Question 9 of 10
A financial services company wants to detect anomalous network traffic patterns using BigQuery ML. They have normal traffic data but limited examples of anomalies. Which model type is most suitable for this anomaly detection use case?
Explanation
AUTOENCODER is correct because it learns to reconstruct normal patterns during training, and anomalies will have high reconstruction errors, making them detectable even with limited anomaly examples. This unsupervised approach is ideal when you have abundant normal data but few anomaly examples. LOGISTIC_REG is incorrect as it requires labeled anomaly examples for supervised training. KMEANS with k=2 is incorrect as simple clustering doesn't effectively capture complex normal patterns and may not reliably separate anomalies. ARIMA_PLUS is incorrect as it's for time-series forecasting, not anomaly detection in multi-dimensional traffic patterns.
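The two-step pattern, training on normal data only, then scoring live data by reconstruction error, can be sketched as follows with hypothetical names:

```sql
-- Train the autoencoder on normal traffic only (hypothetical names);
-- the narrow middle layer forces a compressed representation of "normal".
CREATE OR REPLACE MODEL `mydataset.traffic_autoencoder`
OPTIONS (
  model_type = 'AUTOENCODER',
  hidden_units = [32, 8, 32]
) AS
SELECT bytes_sent, bytes_received, packet_count, connection_duration
FROM `mydataset.normal_traffic`;

-- Flag rows whose reconstruction error puts them in roughly the top 2%.
SELECT *
FROM ML.DETECT_ANOMALIES(
  MODEL `mydataset.traffic_autoencoder`,
  STRUCT(0.02 AS contamination),
  (SELECT * FROM `mydataset.live_traffic`)
);
```

The contamination value is an assumption about how rare anomalies are; it should be tuned against whatever labeled anomaly examples are available.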
Question 10 of 10
You've trained a BigQuery ML regression model to predict product demand. During evaluation, you observe an R-squared value of 0.89 on training data but 0.52 on test data. What does this indicate and what should you do?
Explanation
"The model is overfitting" is correct because the significant gap between training R-squared (0.89) and test R-squared (0.52) indicates the model learned the training data too well but generalizes poorly to new data. Solutions include applying L1/L2 regularization, reducing the number of features, or simplifying the model. Deploying to production is incorrect as the poor test performance indicates unreliable predictions on new data. Underfitting is incorrect as the high training performance rules it out; underfitting shows poor performance on both training and test sets. The metrics aren't incorrect; they're revealing legitimate overfitting that needs to be addressed.
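The regularization remedy maps directly onto training options. A sketch of a retrain, assuming hypothetical names and placeholder penalty strengths:

```sql
-- Retrain with L1/L2 penalties to shrink weights and curb overfitting
-- (hypothetical names; penalty values are placeholders to tune).
CREATE OR REPLACE MODEL `mydataset.demand_model`
OPTIONS (
  model_type = 'LINEAR_REG',
  input_label_cols = ['units_sold'],
  l1_reg = 0.05,
  l2_reg = 0.1
) AS
SELECT price, promo_flag, week_of_year, units_sold
FROM `mydataset.demand_features`;
```

Comparing ML.EVALUATE results on held-out data across a few penalty settings shows whether the train/test gap is actually closing.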