
Training models by using AutoML

Architecting Low-Code AI Solutions

Question 1 of 10
You are preparing a tabular dataset in BigQuery for AutoML Tables training to predict customer churn. Your dataset contains 50,000 rows with 25 features, including customer demographics, transaction history, and support ticket counts. During initial exploration, you notice that 3 features have more than 40% missing values. What is the BEST approach to handle these features before training with AutoML Tables?
Explanation
AutoML Tables automatically handles missing values during training using sophisticated imputation techniques based on the data distribution and relationships between features. Option A is incorrect because removing 40% of rows would significantly reduce the training dataset size and potentially introduce bias. Option B is unnecessary because AutoML handles this automatically, and manual imputation might not be as sophisticated as AutoML's approach. Option D is incorrect because these features might still contain valuable predictive signals in the 60% of cases where they are present; AutoML can leverage partial information effectively.
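The exploration step the question describes can be sketched in plain Python (in practice this check would run as SQL in BigQuery; the column names and rows below are illustrative):

```python
# Flag features whose missing-value fraction exceeds a threshold.
# Per the explanation above, such features are then left in place:
# AutoML Tables imputes missing values during training.

def missing_fraction(rows, feature):
    """Fraction of rows where `feature` is None (missing)."""
    missing = sum(1 for r in rows if r.get(feature) is None)
    return missing / len(rows)

def flag_sparse_features(rows, features, threshold=0.4):
    """Return the features whose missing fraction exceeds `threshold`."""
    return [f for f in features if missing_fraction(rows, f) > threshold]

rows = [
    {"tenure": 12, "support_tickets": None, "avg_spend": 50.0},
    {"tenure": 3,  "support_tickets": None, "avg_spend": None},
    {"tenure": 8,  "support_tickets": 2,    "avg_spend": 75.0},
    {"tenure": 24, "support_tickets": None, "avg_spend": 60.0},
]
sparse = flag_sparse_features(rows, ["tenure", "support_tickets", "avg_spend"])
```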
Question 2 of 10
Your team is building a time-series forecasting model using AutoML to predict weekly product demand for the next 12 weeks. Your dataset spans 3 years of historical sales data with multiple products across different regions. Which feature engineering approach is MOST appropriate when preparing data for AutoML forecasting?
Explanation
AutoML forecasting automatically extracts temporal features including trends, seasonality, and lag features when you properly identify the time column and configure the forecast horizon. Option A is incorrect because manual feature engineering for time series is redundant and may interfere with AutoML's automated feature extraction. Option C is incorrect because aggregating different products loses granular patterns and reduces model accuracy. Option D is incorrect because AutoML can handle multiple time series simultaneously and will learn patterns across different product-region combinations, which is more efficient than training separate models.
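A minimal sketch of the long-format layout this implies: one row per (week, series), with a time column and a series identifier, leaving lag and seasonality extraction to AutoML. Column names here are assumptions; the spacing check shows the kind of validation worth doing before training, since gaps in a series need handling:

```python
from datetime import date, timedelta
from collections import defaultdict

# One row per (week, series) with the raw target value.
rows = [
    {"week": date(2024, 1, 1) + timedelta(weeks=i),
     "series_id": f"{product}|{region}",
     "demand": 100 + i}
    for product in ("widget", "gadget")
    for region in ("us", "eu")
    for i in range(4)
]

# Group rows into per-series histories and confirm weekly spacing.
series = defaultdict(list)
for r in rows:
    series[r["series_id"]].append(r["week"])

def is_weekly(weeks):
    weeks = sorted(weeks)
    return all((b - a).days == 7 for a, b in zip(weeks, weeks[1:]))

all_weekly = all(is_weekly(w) for w in series.values())
```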
Question 3 of 10
You are using AutoML Vision to train an image classification model to identify defective products on a manufacturing line. You have collected 10,000 images but only 500 show defects (5% of total). During model evaluation, you notice the model achieves 95% accuracy but rarely identifies actual defects. What is the MOST effective approach to improve defect detection?
Explanation
This is a classic imbalanced classification problem where high accuracy is misleading. Changing the optimization objective to maximize recall ensures the model prioritizes identifying defects (positive class) over overall accuracy. AutoML allows you to specify different optimization objectives. Option A won't solve the fundamental imbalance issue. Option B, while helpful, requires additional preprocessing work and AutoML already applies some augmentation; the key issue is the optimization metric. Option D would worsen the imbalance problem and make defect detection even harder.
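The arithmetic behind "high accuracy is misleading" is easy to verify with the counts from the question (10,000 images, 500 defects):

```python
# A model that never predicts "defect" still scores 95% accuracy
# but 0% recall on the defect class.
total, defects = 10_000, 500
labels = [1] * defects + [0] * (total - defects)   # 1 = defective
predictions = [0] * total                          # always "no defect"

correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / total                         # 0.95

true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
recall = true_positives / defects                  # 0.0
```

Optimizing for recall instead of accuracy penalizes exactly this failure mode, which is why changing the optimization objective is the effective fix.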
Question 4 of 10
Your organization needs to train an AutoML Natural Language model for sentiment analysis on customer reviews. The reviews contain personally identifiable information (PII) like customer names, email addresses, and phone numbers. What is the BEST practice for handling this sensitive data while maintaining model performance?
Explanation
Cloud DLP API provides comprehensive, automated PII detection and redaction capabilities that integrate well with AutoML workflows. It can identify various PII types accurately and mask them while preserving text structure for sentiment analysis. Option B is incorrect because AutoML doesn't have built-in PII protection features; you must clean data beforehand. Option C is problematic because regex patterns may miss complex PII patterns and require extensive maintenance. Option D is more complex than necessary and Vertex AI Workbench is primarily for data science notebooks, not automated PII handling.
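The shape of the DLP de-identification request can be sketched as a plain dict (no API call is made here; sending it requires the google-cloud-dlp client and a real project ID, which is hypothetical below). PERSON_NAME, EMAIL_ADDRESS, and PHONE_NUMBER are built-in DLP infoTypes, and the replace-with-infoType transformation masks each match (e.g. "[EMAIL_ADDRESS]") while preserving sentence structure, so sentiment-bearing text survives redaction:

```python
# Request body for DLP content de-identification, built as a plain dict.
review = "Great blender! Email me at jane.doe@example.com with questions."

deidentify_request = {
    "parent": "projects/my-project",  # hypothetical project ID
    "inspect_config": {
        "info_types": [
            {"name": "PERSON_NAME"},
            {"name": "EMAIL_ADDRESS"},
            {"name": "PHONE_NUMBER"},
        ]
    },
    "deidentify_config": {
        "info_type_transformations": {
            "transformations": [
                # Replace each finding with its infoType name.
                {"primitive_transformation": {"replace_with_info_type_config": {}}}
            ]
        }
    },
    "item": {"value": review},
}
# With the client library:
# DlpServiceClient().deidentify_content(request=deidentify_request)
```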
Question 5 of 10
You are preparing video data to train an AutoML Video Classification model that categorizes sports activities. Your dataset contains 2,000 videos ranging from 30 seconds to 10 minutes in length, with various resolutions and formats. Which approach ensures optimal data preparation for AutoML Video?
Explanation
AutoML Video accepts videos in various formats and automatically handles preprocessing, including standardization and frame extraction. You only need to upload videos to Cloud Storage and provide a properly formatted CSV with video URIs and labels. Option A is unnecessary because AutoML handles format variations automatically. Option C loses temporal information critical for video classification and unnecessarily increases data preparation complexity. Option D is overcomplicated and segmenting videos could split important action sequences, reducing model accuracy.
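The import file described above can be generated with the standard library. The uri,label,start,end layout (with 0/inf covering the whole clip) follows AutoML Video's CSV import format, but verify the exact column order against the current documentation; bucket paths and labels below are illustrative:

```python
import csv
import io

# One row per video: Cloud Storage URI, label, and a time segment
# spanning the whole clip.
videos = [
    ("gs://my-bucket/clips/match_001.mp4", "soccer"),
    ("gs://my-bucket/clips/game_017.mp4", "basketball"),
]

buf = io.StringIO()
writer = csv.writer(buf)
for uri, label in videos:
    writer.writerow([uri, label, "0", "inf"])  # whole-video segment

import_csv = buf.getvalue()
```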
Question 6 of 10
Your company is using AutoML Tables to predict loan default risk. During model training configuration, you can specify a budget between 1 and 1,000 node hours. Your dataset has 100,000 rows and 40 features. Early experiments with a 1-node-hour budget yielded an AUC-ROC of 0.72. What is the MOST cost-effective strategy to improve model performance?
Explanation
AutoML Tables shows diminishing returns with increased training budget. Starting with 8-10 node hours typically provides significant improvements over 1 hour, allowing you to assess whether additional budget is justified. Option A is wasteful as returns diminish significantly after a certain point, and 1000 hours would be extremely expensive with likely minimal improvement. Option C may hurt performance as AutoML handles feature selection well, and premature feature reduction could remove important signals. Option D creates operational complexity and loses the benefit of training on the full dataset.
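In the Vertex AI Python SDK, the training budget is expressed in milli node hours via the budget_milli_node_hours parameter, so the 8-node-hour starting point above becomes 8,000. A minimal sketch (the SDK call is shown commented out because it requires a GCP project and dataset; display and column names are hypothetical):

```python
# Convert node hours to the milli-node-hour units the SDK expects.
def to_milli_node_hours(node_hours):
    return int(node_hours * 1000)

budget = to_milli_node_hours(8)  # 8000

# With the SDK (not executed here):
# from google.cloud import aiplatform
# job = aiplatform.AutoMLTabularTrainingJob(
#     display_name="loan-default",
#     optimization_prediction_type="classification",
# )
# model = job.run(dataset=ds, target_column="defaulted",
#                 budget_milli_node_hours=budget)
```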
Question 7 of 10
You are implementing a Tabular Workflow in Vertex AI AutoML to predict product recommendations. Your source data is in BigQuery and updates daily. You want to ensure new data is automatically incorporated into model retraining. Which architecture provides the MOST automated and maintainable solution?
Explanation
Vertex AI Pipelines provides a fully managed, orchestrated workflow that can handle data extraction, transformation, and AutoML training with proper monitoring, versioning, and error handling. It's specifically designed for ML workflow automation. Option A is fragile because Cloud Functions have execution time limits and lack robust orchestration capabilities for complex ML workflows. Option C requires manual scripting and lacks the robustness, monitoring, and versioning that Pipelines provides. Option D misunderstands the use case—if AutoML is the chosen solution, it should be used for production, not just prototyping.
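The workflow reduces to a small skeleton: ordered steps for extraction, transformation, and training. In production, this orchestration (plus scheduling, versioning, retries, and monitoring) is what Vertex AI Pipelines provides; the step bodies below are placeholders, not real BigQuery or AutoML calls:

```python
# Skeleton of the daily retraining pipeline; each step is a placeholder.
def extract_from_bigquery():
    """Pull the latest interaction rows (placeholder data)."""
    return [{"user": "u1", "product": "p1", "clicked": 1}]

def transform(rows):
    """Drop rows unusable for training."""
    return [r for r in rows if r["clicked"] is not None]

def train_automl(rows):
    """Kick off AutoML training on the prepared rows (placeholder)."""
    return {"model": "recommendation-model", "examples": len(rows)}

def run_pipeline():
    rows = extract_from_bigquery()
    rows = transform(rows)
    return train_automl(rows)

result = run_pipeline()
```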
Question 8 of 10
Your team trained an AutoML Natural Language entity extraction model to identify product names, prices, and categories from customer inquiries. After deployment, you notice the model performs poorly on recent queries about a new product line launched last month. What is the BEST approach to address this issue?
Explanation
The issue is data drift—the model hasn't seen examples of the new product line during training. The solution is to collect labeled examples representing the new data distribution and retrain. This is a fundamental principle of maintaining ML models in production. Option A addresses serving performance, not model accuracy. Option C might increase recall but will also increase false positives and doesn't address the root cause (lack of training data). Option D is incorrect because pre-trained foundation models wouldn't have knowledge of your company's specific new product line either.
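The fix is a data refresh before retraining: merge newly labeled queries about the new product line into the existing training set so the retrained model sees the shifted distribution. A sketch with hypothetical records:

```python
from collections import Counter

# Existing labeled queries plus freshly labeled queries about the
# new product line (record contents are hypothetical).
existing = [{"text": "price of Widget Classic?",
             "entities": ["Widget Classic"]}] * 900
new_line = [{"text": "is Widget Nova in stock?",
             "entities": ["Widget Nova"]}] * 100

training_set = existing + new_line
coverage = Counter(e for row in training_set for e in row["entities"])
# `coverage` now shows the new product line represented in training data,
# which the original model never saw.
```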
Question 9 of 10
You are preparing an image dataset for AutoML Vision object detection to identify safety equipment (helmets, vests, gloves) in construction site photos. Your dataset contains 5,000 images with bounding box annotations. During data validation, you discover that 800 images have multiple overlapping bounding boxes for the same object. How should you handle this before training?
Explanation
Overlapping bounding boxes for the same object create conflicting labels that confuse the model during training. The correct approach is to clean annotations by keeping only one accurate bounding box per object instance. Option A wastes valuable training data—15% of your dataset—when the issue can be fixed through annotation cleaning. Option C is incorrect because overlapping boxes for the same object are annotation errors, not valid multiple detections, and will negatively impact training. Option D would incorrectly teach the model that one object is multiple objects, degrading detection quality.
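One way to find and clean the duplicates described above is to compute intersection-over-union (IoU) between same-label boxes in an image and keep a single box per highly overlapping group. The 0.8 threshold is an assumption to tune per dataset:

```python
def iou(a, b):
    """Intersection-over-union of boxes given as (xmin, ymin, xmax, ymax)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)

    def area(box):
        return (box[2] - box[0]) * (box[3] - box[1])

    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def dedupe(boxes, threshold=0.8):
    """Keep one box from each group of near-identical annotations."""
    kept = []
    for box in boxes:
        if all(iou(box, k) < threshold for k in kept):
            kept.append(box)
    return kept

# Two near-duplicate "helmet" boxes plus one distinct "vest" box:
helmet_boxes = [(10, 10, 50, 50), (11, 11, 51, 51)]
vest_boxes = [(100, 60, 160, 140)]
cleaned = dedupe(helmet_boxes) + vest_boxes
```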
Question 10 of 10
Your organization is using AutoML Tables to predict customer lifetime value (CLV). After training, you need to explain individual predictions to the business team. The model uses 30 features including demographics, purchase history, and engagement metrics. Which AutoML feature provides the MOST useful information for understanding individual predictions?
Explanation
Local feature attributions (Shapley values) explain individual predictions by showing how each feature contributed to that specific prediction, which is exactly what's needed to explain predictions to business stakeholders. Option A provides global understanding but doesn't explain individual predictions. Option C shows aggregate performance metrics but doesn't explain why a specific customer received their CLV prediction. Option D is too technical and AutoML abstracts away architecture details; moreover, architecture doesn't explain individual prediction logic in a business-friendly way.
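What a local attribution explanation looks like for a single customer: each feature gets a signed contribution relative to a baseline, and the contributions sum to the gap between baseline and prediction (a defining property of Shapley-style attributions). Feature names and values below are hypothetical:

```python
# Hypothetical local attributions for one customer's CLV prediction.
baseline_clv = 500.0
attributions = {
    "months_active": 180.0,
    "avg_order_value": 95.0,
    "support_tickets": -40.0,
    "email_engagement": 15.0,
}

# Attributions sum to (prediction - baseline).
prediction = baseline_clv + sum(attributions.values())

# Rank by absolute impact for a business-friendly summary.
ranked = sorted(attributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
top_driver = ranked[0][0]
```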