
Training models by using AutoML

Architecting Low-Code AI Solutions

10 Questions · No time limit · Practice Mode
Question 1 of 10
You are preparing a dataset for AutoML Tables to predict customer churn. Your dataset contains 50,000 rows with 30 features, including customer_id, timestamp of last login, and various behavioral metrics. Which features should you exclude or handle differently before training?
Explanation
The correct answer is to exclude customer_id and engineer time-based features from the timestamp. customer_id is a unique identifier that provides no predictive value and can cause overfitting. While AutoML can handle timestamps, deriving meaningful features such as 'days_since_last_login' provides more predictive power. Option A is wrong because keeping raw timestamps without feature engineering misses valuable information. Option C is incorrect because customer_id should always be excluded as it's a unique identifier. Option D is wrong because AutoML Tables can handle timestamps, and excluding them would lose valuable temporal information.
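The preprocessing described above can be sketched in plain Python; the column names (customer_id, last_login_ts, sessions_30d) and the reference date are illustrative, not taken from the actual dataset:

```python
from datetime import datetime, timezone

# Illustrative snapshot date against which recency is computed.
REFERENCE_DATE = datetime(2024, 1, 31, tzinfo=timezone.utc)

def prepare_row(row, reference_date=REFERENCE_DATE):
    """Drop the unique identifier and turn the raw timestamp into a derived feature."""
    features = dict(row)
    features.pop("customer_id", None)  # unique ID: no predictive value, overfitting risk
    last_login = datetime.fromisoformat(features.pop("last_login_ts"))
    features["days_since_last_login"] = (reference_date - last_login).days
    return features

row = {"customer_id": "C-0042",
       "last_login_ts": "2024-01-01T00:00:00+00:00",
       "sessions_30d": 7}
clean = prepare_row(row)  # customer_id dropped, recency feature added
```

The same transformation would be applied to every row before creating the AutoML Tables dataset.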
Question 2 of 10
Your team is using Vertex AI AutoML to train an image classification model for defect detection in manufacturing. You have 10,000 labeled images across 8 defect categories, but one category (hairline cracks) has only 200 examples while others have 1,000+ examples. What is the BEST approach to improve model performance for the underrepresented class?
Explanation
The best approach is to use AutoML's automatic class balancing combined with data augmentation for underrepresented classes. Data augmentation (rotation, flipping, brightness adjustment) can artificially increase the training samples for hairline cracks while AutoML's class balancing ensures fair learning. Option B (downsampling) wastes valuable data from other categories and reduces overall model performance. Option C relies only on class weighting which may not be sufficient for such severe imbalance (5:1 ratio). Option D creates unnecessary complexity and loses the benefit of multi-class learning where the model can learn distinguishing features across all defect types simultaneously.
Question 3 of 10
You need to create a time-series forecasting model using AutoML to predict weekly sales for 500 retail stores. Your dataset spans 3 years with weekly granularity. Which configuration approach will yield the most accurate forecasts?
Explanation
The correct approach is to set a reasonable forecast horizon (4-12 weeks), use store_id as the time series identifier, and include context variables. Forecasting accuracy degrades with longer horizons, so 4-12 weeks is more practical than 52 weeks. Context variables (promotions, holidays, location) provide crucial external signals that improve predictions. Option A's 52-week horizon is too long and lacks context. Option C aggregating all stores loses store-specific patterns and is inappropriate for multi-series forecasting. Option D with 1-week horizon and weekly retraining is operationally expensive and doesn't leverage AutoML's ability to learn longer-term patterns, though it might work for very short-term needs.
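The long-format layout AutoML forecasting expects, one row per (store, week) with the identifier, time column, target, and context variables side by side, can be sketched as follows; the column names and the holiday calendar are illustrative assumptions:

```python
from datetime import date, timedelta

# Illustrative holiday calendar used to derive a context column.
HOLIDAYS = {date(2023, 12, 25)}

def make_rows(store_id, start, weekly_sales, promos):
    """Build long-format rows: series ID, time column, target, context variables."""
    rows = []
    for i, (sales, promo) in enumerate(zip(weekly_sales, promos)):
        week = start + timedelta(weeks=i)
        rows.append({
            "store_id": store_id,            # time series identifier
            "week_start": week.isoformat(),  # time column (weekly granularity)
            "sales": sales,                  # forecast target
            "on_promo": promo,               # context variable
            "is_holiday_week": any(
                week <= h < week + timedelta(weeks=1) for h in HOLIDAYS
            ),
        })
    return rows

rows = make_rows("store_17", date(2023, 12, 18), [120, 340], [False, True])
```

All 500 stores would go into one table in this shape, with store_id configured as the series identifier and a 4-12 week forecast horizon.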
Question 4 of 10
Your AutoML text classification model for customer support tickets shows 95% accuracy overall, but when deployed, it performs poorly on tickets related to a new product line launched after training. What is the MOST effective strategy to address this issue?
Explanation
The most effective solution is to collect and label new data from the new product line and retrain the model. This is a classic case of training-serving skew where the production data distribution differs from training data. The model has never seen examples from the new product line, so it cannot classify them accurately regardless of accuracy on old data. Option A (increasing training budget) won't help since the model has no relevant training data. Option C (adjusting confidence threshold) only changes when the model makes predictions, not the quality of those predictions. Option D (automatic model refresh) isn't a real feature in AutoML; models need explicit retraining with new data to adapt to distribution changes.
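A simple way to surface this kind of skew before users notice it is to measure how much production traffic carries labels (or categories) never seen in training; the label names below are made up for illustration:

```python
def coverage_gap(train_labels, prod_labels):
    """Fraction of production examples whose category never appeared in training."""
    seen = set(train_labels)
    unseen = [label for label in prod_labels if label not in seen]
    return len(unseen) / len(prod_labels)

train = ["billing", "shipping", "returns"] * 100
prod = ["billing"] * 50 + ["new_product_line"] * 50

gap = coverage_gap(train, prod)  # half of live traffic is out-of-distribution
```

A large gap is a signal to collect and label examples from the new category and retrain, exactly the remediation described above.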
Question 5 of 10
You are preparing a dataset for AutoML Vision to detect objects in retail store images. Your dataset contains images at various resolutions from 800x600 to 4000x3000 pixels. How should you handle image resolution before uploading to Vertex AI AutoML?
Explanation
The correct approach is to upload images in their original resolutions and let AutoML Vision handle preprocessing. AutoML Vision automatically resizes and preprocesses images optimally based on the model architecture and training requirements. It preserves important details while normalizing inputs appropriately. Option A (manual resizing to 1024x1024) is unnecessary and may introduce distortion if aspect ratios don't match. Option C (downscaling to 640x480) may lose important details needed for object detection. Option D (splitting by resolution) creates unnecessary complexity, reduces training data per model, and doesn't leverage AutoML's built-in capability to handle varied image sizes.
Question 6 of 10
Your AutoML Tables model for predicting loan defaults has been trained successfully, but during evaluation you notice that precision is 0.85 while recall is only 0.45. Your business requirement is to identify at least 80% of potential defaults (even if it means more false positives). What should you do?
Explanation
The correct solution is to adjust the prediction confidence threshold lower to increase recall. AutoML models output probability scores, and lowering the classification threshold will classify more cases as positive (defaults), increasing recall at the expense of precision. This directly addresses the business requirement of identifying 80% of defaults. Option A (higher training budget) might improve overall performance but doesn't specifically address the precision-recall trade-off. Option C (adding features) could help but requires data collection and retraining, and doesn't guarantee higher recall specifically. Option D (ensemble methods) is not a standard AutoML Tables feature and adds unnecessary complexity when threshold adjustment directly solves the problem.
Question 7 of 10
You need to train an AutoML video classification model to categorize surveillance footage into 'normal activity', 'suspicious activity', and 'emergency'. Your dataset consists of 1000 videos ranging from 30 seconds to 5 minutes in length. What is the recommended approach for preparing this data?
Explanation
The recommended approach is to use videos as-is with their original lengths and single labels. AutoML Video Intelligence can handle variable-length videos and automatically extracts temporal features across the entire video. Each video should have one predominant label representing the overall content. Option A (splitting into segments) could work but requires careful labeling of each segment and may lose temporal context across segment boundaries. Option C (extracting key frames) loses the critical temporal information that distinguishes video classification from image classification, missing motion patterns crucial for detecting activities. Option D (standardizing duration) artificially manipulates content through trimming or looping, potentially introducing artifacts and losing important information.
Question 8 of 10
Your organization wants to build a custom entity extraction model using AutoML Natural Language to identify specific product codes and internal terminology from customer emails. You have 5,000 emails with annotations. What is the minimum additional information required for each annotation to properly train the model?
Explanation
AutoML Natural Language for entity extraction requires entity text spans (start/end positions) and their corresponding entity type labels. The model learns to recognize text patterns and associate them with specific entity types you define. Option A (only text spans) is insufficient because the model needs to know what type of entity each span represents. Option C includes confidence scores and relationships, which are model outputs, not required training inputs: AutoML generates confidence scores, and entity relationships are not part of basic entity extraction training. Option D (detailed descriptions) is not used by AutoML; the model learns from labeled examples, not textual explanations of labeling decisions.
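An annotation of this shape can be built as a JSONL record; the field names below follow the Vertex AI text-extraction import schema as I understand it, but treat them as an assumption and check the current schema before importing:

```python
import json

text = "Please replace part AX-9920 under warranty."

# One training record: the document text plus a typed span with character offsets.
record = {
    "textContent": text,
    "textSegmentAnnotations": [
        {"displayName": "product_code", "startOffset": 20, "endOffset": 27}
    ],
}

ann = record["textSegmentAnnotations"][0]
span = text[ann["startOffset"]:ann["endOffset"]]  # the annotated entity text
line = json.dumps(record)  # one line of the JSONL import file
```

Validating that each span slices back to the intended entity text, as the last lines do, catches off-by-one offset errors before training.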
Question 9 of 10
You are using Tabular Workflows in Vertex AI AutoML to build a model predicting house prices. During the data validation step, AutoML flags several columns as having high cardinality (>10,000 unique values) and suggests transformations. The flagged columns include property_id, latitude, longitude, and zip_code. How should you handle these features?
Explanation
The correct approach is to remove property_id (unique identifier with no predictive value), keep zip_code as a categorical feature (potentially with hash bucketing for efficiency), and keep latitude/longitude as numerical features since they're valid coordinates that provide granular location information. Different cardinality features need different treatments based on their nature. Option A is too aggressive: not all high-cardinality features should be removed. Option B is wrong to keep property_id (overfitting risk) and wrong to remove coordinates, which provide more precise location data than zip_code alone. Option D (converting to embeddings) is not necessary with AutoML Tables, which handles categorical features appropriately, and embeddings are more relevant for text/NLP tasks.
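The per-column treatment can be sketched as below; the hash-bucketing helper and the column names are illustrative (AutoML Tables can also apply such transformations itself, so this is one possible manual version, not the required one):

```python
import hashlib

def hash_bucket(value, n_buckets=256):
    """Deterministically map a high-cardinality categorical value to a fixed bucket."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets

def prepare(row):
    features = dict(row)
    features.pop("property_id", None)  # unique ID: drop entirely
    features["zip_bucket"] = hash_bucket(features.pop("zip_code"))  # bucketed categorical
    # latitude/longitude stay as plain numeric features
    return features

row = {"property_id": "P-1", "zip_code": "94103",
       "latitude": 37.77, "longitude": -122.42}
out = prepare(row)
```

Because the hash is deterministic, the same zip_code always lands in the same bucket at training and serving time, which is what makes the trick safe.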
Question 10 of 10
You've trained an AutoML model for document classification, and during the debugging phase, you notice the model has high training accuracy (98%) but much lower validation accuracy (72%). The confusion matrix shows the model performs well on 3 out of 5 classes but poorly on 2 classes. What is the MOST likely cause and appropriate solution?
Explanation
This is a classic overfitting scenario: high training accuracy with significantly lower validation accuracy indicates the model memorized training data rather than learning generalizable patterns. The poor performance on 2 specific classes suggests those classes may have insufficient or non-representative training data. The solution is to apply regularization (which AutoML does automatically but may need stronger settings) and ensure adequate, diverse training examples for underperforming classes. Option A (underfitting) is incorrect because underfitting shows poor performance on both training and validation sets. Option C (increasing validation size) doesn't address the root cause of overfitting. Option D (data leakage) would actually cause artificially HIGH validation accuracy, not low, as the model would have seen validation data during training.
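Reading the weak classes off a confusion matrix amounts to taking the diagonal over the row sums; the 5x5 matrix below is invented to mirror the scenario (three strong classes, two weak ones):

```python
def per_class_recall(cm):
    """Rows are true classes, columns are predictions: diagonal / row sum."""
    return [row[i] / sum(row) for i, row in enumerate(cm)]

# Illustrative validation confusion matrix, 100 examples per class.
cm = [
    [95,  2,  1,  1,  1],
    [ 3, 90,  3,  2,  2],
    [ 1,  2, 94,  2,  1],
    [10, 15, 10, 40, 25],  # weak class
    [12,  8, 14, 20, 46],  # weak class
]

recalls = per_class_recall(cm)
weak = [i for i, r in enumerate(recalls) if r < 0.8]  # classes needing more data
```

The weak list is where to focus the data-collection effort before retraining.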