
Training models by using AutoML

Architecting Low-Code AI Solutions

Question 1 of 10
You are preparing a tabular dataset in BigQuery for AutoML Tables training to predict customer churn. Your dataset contains 50,000 rows with 25 features, including customer demographics, transaction history, and support ticket counts. During initial exploration, you notice that 3 features have more than 40% missing values. What is the BEST approach to handle these features before training with AutoML Tables?
Explanation
AutoML Tables automatically handles missing values during training using sophisticated imputation techniques based on the data distribution and relationships between features. Option A is incorrect because removing 40% of rows would significantly reduce the training dataset size and potentially introduce bias. Option B is unnecessary because AutoML handles this automatically, and manual imputation might not be as sophisticated as AutoML's approach. Option D is incorrect because these features might still contain valuable predictive signals in the 60% of cases where they are present; AutoML can leverage partial information effectively.
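The exploration step the question describes can be sketched in plain Python (in practice this check would run as SQL in BigQuery; the column names and rows below are illustrative):

```python
# Flag features whose missing-value fraction exceeds a threshold.
# Per the explanation above, such features are then left in place:
# AutoML Tables imputes missing values during training.

def missing_fraction(rows, feature):
    """Fraction of rows where `feature` is None (missing)."""
    missing = sum(1 for r in rows if r.get(feature) is None)
    return missing / len(rows)

def flag_sparse_features(rows, features, threshold=0.4):
    """Return the features whose missing fraction exceeds `threshold`."""
    return [f for f in features if missing_fraction(rows, f) > threshold]

rows = [
    {"tenure": 12, "support_tickets": None, "avg_spend": 50.0},
    {"tenure": 3,  "support_tickets": None, "avg_spend": None},
    {"tenure": 8,  "support_tickets": 2,    "avg_spend": 75.0},
    {"tenure": 24, "support_tickets": None, "avg_spend": 60.0},
]
sparse = flag_sparse_features(rows, ["tenure", "support_tickets", "avg_spend"])
```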
Question 2 of 10
Your team is building a time-series forecasting model using AutoML to predict weekly product demand for the next 12 weeks. Your dataset spans 3 years of historical sales data with multiple products across different regions. Which feature engineering approach is MOST appropriate when preparing data for AutoML forecasting?
Explanation
AutoML forecasting automatically extracts temporal features including trends, seasonality, and lag features when you properly identify the time column and configure the forecast horizon. Option A is incorrect because manual feature engineering for time series is redundant and may interfere with AutoML's automated feature extraction. Option C is incorrect because aggregating different products loses granular patterns and reduces model accuracy. Option D is incorrect because AutoML can handle multiple time series simultaneously and will learn patterns across different product-region combinations, which is more efficient than training separate models.
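A minimal sketch of the long-format layout this implies: one row per (week, series), with a time column and a series identifier, leaving lag and seasonality extraction to AutoML. Column names here are assumptions; the spacing check shows the kind of validation worth doing before training, since gaps in a series need handling:

```python
from datetime import date, timedelta
from collections import defaultdict

# One row per (week, series) with the raw target value.
rows = [
    {"week": date(2024, 1, 1) + timedelta(weeks=i),
     "series_id": f"{product}|{region}",
     "demand": 100 + i}
    for product in ("widget", "gadget")
    for region in ("us", "eu")
    for i in range(4)
]

# Group rows into per-series histories and confirm weekly spacing.
series = defaultdict(list)
for r in rows:
    series[r["series_id"]].append(r["week"])

def is_weekly(weeks):
    weeks = sorted(weeks)
    return all((b - a).days == 7 for a, b in zip(weeks, weeks[1:]))

all_weekly = all(is_weekly(w) for w in series.values())
```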
Question 3 of 10
You are using AutoML Vision to train an image classification model to identify defective products on a manufacturing line. You have collected 10,000 images but only 500 show defects (5% of total). During model evaluation, you notice the model achieves 95% accuracy but rarely identifies actual defects. What is the MOST effective approach to improve defect detection?
Explanation
This is a classic imbalanced classification problem where high accuracy is misleading. Changing the optimization objective to maximize recall ensures the model prioritizes identifying defects (positive class) over overall accuracy. AutoML allows you to specify different optimization objectives. Option A won't solve the fundamental imbalance issue. Option B, while helpful, requires additional preprocessing work and AutoML already applies some augmentation; the key issue is the optimization metric. Option D would worsen the imbalance problem and make defect detection even harder.
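The arithmetic behind "high accuracy is misleading" is easy to verify with the counts from the question (10,000 images, 500 defects):

```python
# A model that never predicts "defect" still scores 95% accuracy
# but 0% recall on the defect class.
total, defects = 10_000, 500
labels = [1] * defects + [0] * (total - defects)   # 1 = defective
predictions = [0] * total                          # always "no defect"

correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / total                         # 0.95

true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
recall = true_positives / defects                  # 0.0
```

Optimizing for recall instead of accuracy penalizes exactly this failure mode, which is why changing the optimization objective is the effective fix.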
Question 4 of 10
Your organization needs to train an AutoML Natural Language model for sentiment analysis on customer reviews. The reviews contain personally identifiable information (PII) like customer names, email addresses, and phone numbers. What is the BEST practice for handling this sensitive data while maintaining model performance?
Explanation
Cloud DLP API provides comprehensive, automated PII detection and redaction capabilities that integrate well with AutoML workflows. It can identify various PII types accurately and mask them while preserving text structure for sentiment analysis. Option B is incorrect because AutoML doesn't have built-in PII protection features; you must clean data beforehand. Option C is problematic because regex patterns may miss complex PII patterns and require extensive maintenance. Option D is more complex than necessary and Vertex AI Workbench is primarily for data science notebooks, not automated PII handling.
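The shape of the DLP de-identification request can be sketched as a plain dict (no API call is made here; sending it requires the google-cloud-dlp client and a real project ID, which is hypothetical below). PERSON_NAME, EMAIL_ADDRESS, and PHONE_NUMBER are built-in DLP infoTypes, and the replace-with-infoType transformation masks each match (e.g. "[EMAIL_ADDRESS]") while preserving sentence structure, so sentiment-bearing text survives redaction:

```python
# Request body for DLP content de-identification, built as a plain dict.
review = "Great blender! Email me at jane.doe@example.com with questions."

deidentify_request = {
    "parent": "projects/my-project",  # hypothetical project ID
    "inspect_config": {
        "info_types": [
            {"name": "PERSON_NAME"},
            {"name": "EMAIL_ADDRESS"},
            {"name": "PHONE_NUMBER"},
        ]
    },
    "deidentify_config": {
        "info_type_transformations": {
            "transformations": [
                # Replace each finding with its infoType name.
                {"primitive_transformation": {"replace_with_info_type_config": {}}}
            ]
        }
    },
    "item": {"value": review},
}
# With the client library:
# DlpServiceClient().deidentify_content(request=deidentify_request)
```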
Question 5 of 10
You are preparing video data to train an AutoML Video Classification model that categorizes sports activities. Your dataset contains 2,000 videos ranging from 30 seconds to 10 minutes in length, with various resolutions and formats. Which approach ensures optimal data preparation for AutoML Video?
Explanation
AutoML Video accepts videos in various formats and automatically handles preprocessing, including standardization and frame extraction. You only need to upload videos to Cloud Storage and provide a properly formatted CSV with video URIs and labels. Option A is unnecessary because AutoML handles format variations automatically. Option C loses temporal information critical for video classification and unnecessarily increases data preparation complexity. Option D is overcomplicated and segmenting videos could split important action sequences, reducing model accuracy.
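The import file described above can be generated with the standard library. The uri,label,start,end layout (with 0/inf covering the whole clip) follows AutoML Video's CSV import format, but verify the exact column order against the current documentation; bucket paths and labels below are illustrative:

```python
import csv
import io

# One row per video: Cloud Storage URI, label, and a time segment
# spanning the whole clip.
videos = [
    ("gs://my-bucket/clips/match_001.mp4", "soccer"),
    ("gs://my-bucket/clips/game_017.mp4", "basketball"),
]

buf = io.StringIO()
writer = csv.writer(buf)
for uri, label in videos:
    writer.writerow([uri, label, "0", "inf"])  # whole-video segment

import_csv = buf.getvalue()
```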
Question 6 of 10
Your company is using AutoML Tables to predict loan default risk. During model training configuration, you can specify a budget between 1 and 1,000 node hours. Your dataset has 100,000 rows and 40 features. Early experiments with a 1-node-hour budget yielded an AUC-ROC of 0.72. What is the MOST cost-effective strategy to improve model performance?
Explanation
AutoML Tables shows diminishing returns with increased training budget. Starting with 8-10 node hours typically provides significant improvements over 1 hour, allowing you to assess whether additional budget is justified. Option A is wasteful as returns diminish significantly after a certain point, and 1000 hours would be extremely expensive with likely minimal improvement. Option C may hurt performance as AutoML handles feature selection well, and premature feature reduction could remove important signals. Option D creates operational complexity and loses the benefit of training on the full dataset.
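In the Vertex AI Python SDK, the training budget is expressed in milli node hours via the budget_milli_node_hours parameter, so the 8-node-hour starting point above becomes 8,000. A minimal sketch (the SDK call is shown commented out because it requires a GCP project and dataset; display and column names are hypothetical):

```python
# Convert node hours to the milli-node-hour units the SDK expects.
def to_milli_node_hours(node_hours):
    return int(node_hours * 1000)

budget = to_milli_node_hours(8)  # 8000

# With the SDK (not executed here):
# from google.cloud import aiplatform
# job = aiplatform.AutoMLTabularTrainingJob(
#     display_name="loan-default",
#     optimization_prediction_type="classification",
# )
# model = job.run(dataset=ds, target_column="defaulted",
#                 budget_milli_node_hours=budget)
```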
Question 7 of 10
You are implementing a Tabular Workflow in Vertex AI AutoML to predict product recommendations. Your source data is in BigQuery and updates daily. You want to ensure new data is automatically incorporated into model retraining. Which architecture provides the MOST automated and maintainable solution?
Explanation
Vertex AI Pipelines provides a fully managed, orchestrated workflow that can handle data extraction, transformation, and AutoML training with proper monitoring, versioning, and error handling. It's specifically designed for ML workflow automation. Option A is fragile because Cloud Functions have execution time limits and lack robust orchestration capabilities for complex ML workflows. Option C requires manual scripting and lacks the robustness, monitoring, and versioning that Pipelines provides. Option D misunderstands the use case—if AutoML is the chosen solution, it should be used for production, not just prototyping.
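The workflow reduces to a small skeleton: ordered steps for extraction, transformation, and training. In production, this orchestration (plus scheduling, versioning, retries, and monitoring) is what Vertex AI Pipelines provides; the step bodies below are placeholders, not real BigQuery or AutoML calls:

```python
# Skeleton of the daily retraining pipeline; each step is a placeholder.
def extract_from_bigquery():
    """Pull the latest interaction rows (placeholder data)."""
    return [{"user": "u1", "product": "p1", "clicked": 1}]

def transform(rows):
    """Drop rows unusable for training."""
    return [r for r in rows if r["clicked"] is not None]

def train_automl(rows):
    """Kick off AutoML training on the prepared rows (placeholder)."""
    return {"model": "recommendation-model", "examples": len(rows)}

def run_pipeline():
    rows = extract_from_bigquery()
    rows = transform(rows)
    return train_automl(rows)

result = run_pipeline()
```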
Question 8 of 10
Your team trained an AutoML Natural Language entity extraction model to identify product names, prices, and categories from customer inquiries. After deployment, you notice the model performs poorly on recent queries about a new product line launched last month. What is the BEST approach to address this issue?
Explanation
The issue is data drift—the model hasn't seen examples of the new product line during training. The solution is to collect labeled examples representing the new data distribution and retrain. This is a fundamental principle of maintaining ML models in production. Option A addresses serving performance, not model accuracy. Option C might increase recall but will also increase false positives and doesn't address the root cause (lack of training data). Option D is incorrect because pre-trained foundation models wouldn't have knowledge of your company's specific new product line either.
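The fix is a data refresh before retraining: merge newly labeled queries about the new product line into the existing training set so the retrained model sees the shifted distribution. A sketch with hypothetical records:

```python
from collections import Counter

# Existing labeled queries plus freshly labeled queries about the
# new product line (record contents are hypothetical).
existing = [{"text": "price of Widget Classic?",
             "entities": ["Widget Classic"]}] * 900
new_line = [{"text": "is Widget Nova in stock?",
             "entities": ["Widget Nova"]}] * 100

training_set = existing + new_line
coverage = Counter(e for row in training_set for e in row["entities"])
# `coverage` now shows the new product line represented in training data,
# which the original model never saw.
```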
Question 9 of 10
You are preparing an image dataset for AutoML Vision object detection to identify safety equipment (helmets, vests, gloves) in construction site photos. Your dataset contains 5,000 images with bounding box annotations. During data validation, you discover that 800 images have multiple overlapping bounding boxes for the same object. How should you handle this before training?
Explanation
Overlapping bounding boxes for the same object create conflicting labels that confuse the model during training. The correct approach is to clean annotations by keeping only one accurate bounding box per object instance. Option A wastes valuable training data—15% of your dataset—when the issue can be fixed through annotation cleaning. Option C is incorrect because overlapping boxes for the same object are annotation errors, not valid multiple detections, and will negatively impact training. Option D would incorrectly teach the model that one object is multiple objects, degrading detection quality.
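One way to find and clean the duplicates described above is to compute intersection-over-union (IoU) between same-label boxes in an image and keep a single box per highly overlapping group. The 0.8 threshold is an assumption to tune per dataset:

```python
def iou(a, b):
    """Intersection-over-union of boxes given as (xmin, ymin, xmax, ymax)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)

    def area(box):
        return (box[2] - box[0]) * (box[3] - box[1])

    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def dedupe(boxes, threshold=0.8):
    """Keep one box from each group of near-identical annotations."""
    kept = []
    for box in boxes:
        if all(iou(box, k) < threshold for k in kept):
            kept.append(box)
    return kept

# Two near-duplicate "helmet" boxes plus one distinct "vest" box:
helmet_boxes = [(10, 10, 50, 50), (11, 11, 51, 51)]
vest_boxes = [(100, 60, 160, 140)]
cleaned = dedupe(helmet_boxes) + vest_boxes
```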
Question 10 of 10
Your organization is using AutoML Tables to predict customer lifetime value (CLV). After training, you need to explain individual predictions to the business team. The model uses 30 features including demographics, purchase history, and engagement metrics. Which AutoML feature provides the MOST useful information for understanding individual predictions?
Explanation
Local feature attributions (Shapley values) explain individual predictions by showing how each feature contributed to that specific prediction, which is exactly what's needed to explain predictions to business stakeholders. Option A provides global understanding but doesn't explain individual predictions. Option C shows aggregate performance metrics but doesn't explain why a specific customer received their CLV prediction. Option D is too technical and AutoML abstracts away architecture details; moreover, architecture doesn't explain individual prediction logic in a business-friendly way.
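What a local attribution explanation looks like for a single customer: each feature gets a signed contribution relative to a baseline, and the contributions sum to the gap between baseline and prediction (a defining property of Shapley-style attributions). Feature names and values below are hypothetical:

```python
# Hypothetical local attributions for one customer's CLV prediction.
baseline_clv = 500.0
attributions = {
    "months_active": 180.0,
    "avg_order_value": 95.0,
    "support_tickets": -40.0,
    "email_engagement": 15.0,
}

# Attributions sum to (prediction - baseline).
prediction = baseline_clv + sum(attributions.values())

# Rank by absolute impact for a business-friendly summary.
ranked = sorted(attributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
top_driver = ranked[0][0]
```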