
Training models by using AutoML

Architecting Low-Code AI Solutions

10 Questions · No time limit · Practice Mode
Question 1 of 10
You are preparing a dataset for AutoML Tables to predict customer churn. Your dataset contains 50,000 rows with 30 features, including customer_id, timestamp of last login, and various behavioral metrics. Which features should you exclude or handle differently before training?
Explanation
The correct answer is to exclude customer_id and engineer time-based features from the timestamp. customer_id is a unique identifier that provides no predictive value and can cause overfitting. While AutoML can handle timestamps, deriving meaningful features such as 'days_since_last_login' provides more predictive power. Option A is wrong because keeping raw timestamps without feature engineering misses valuable information. Option C is incorrect because customer_id should always be excluded as it's a unique identifier. Option D is wrong because AutoML Tables can handle timestamps, and excluding them would lose valuable temporal information.
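The preprocessing described above can be sketched in plain Python; the column names (customer_id, last_login_ts, sessions_30d) and the reference date are illustrative, not taken from the actual dataset:

```python
from datetime import datetime, timezone

# Illustrative snapshot date against which recency is computed.
REFERENCE_DATE = datetime(2024, 1, 31, tzinfo=timezone.utc)

def prepare_row(row, reference_date=REFERENCE_DATE):
    """Drop the unique identifier and turn the raw timestamp into a derived feature."""
    features = dict(row)
    features.pop("customer_id", None)  # unique ID: no predictive value, overfitting risk
    last_login = datetime.fromisoformat(features.pop("last_login_ts"))
    features["days_since_last_login"] = (reference_date - last_login).days
    return features

row = {"customer_id": "C-0042",
       "last_login_ts": "2024-01-01T00:00:00+00:00",
       "sessions_30d": 7}
clean = prepare_row(row)  # customer_id dropped, recency feature added
```

The same transformation would be applied to every row before creating the AutoML Tables dataset.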
Question 2 of 10
Your team is using Vertex AI AutoML to train an image classification model for defect detection in manufacturing. You have 10,000 labeled images across 8 defect categories, but one category (hairline cracks) has only 200 examples while others have 1,000+ examples. What is the BEST approach to improve model performance for the underrepresented class?
Explanation
The best approach is to use AutoML's automatic class balancing combined with data augmentation for underrepresented classes. Data augmentation (rotation, flipping, brightness adjustment) can artificially increase the training samples for hairline cracks while AutoML's class balancing ensures fair learning. Option B (downsampling) wastes valuable data from other categories and reduces overall model performance. Option C relies only on class weighting which may not be sufficient for such severe imbalance (5:1 ratio). Option D creates unnecessary complexity and loses the benefit of multi-class learning where the model can learn distinguishing features across all defect types simultaneously.
Question 3 of 10
You need to create a time-series forecasting model using AutoML to predict weekly sales for 500 retail stores. Your dataset spans 3 years with weekly granularity. Which configuration approach will yield the most accurate forecasts?
Explanation
The correct approach is to set a reasonable forecast horizon (4-12 weeks), use store_id as the time series identifier, and include context variables. Forecasting accuracy degrades with longer horizons, so 4-12 weeks is more practical than 52 weeks. Context variables (promotions, holidays, location) provide crucial external signals that improve predictions. Option A's 52-week horizon is too long and lacks context. Option C aggregating all stores loses store-specific patterns and is inappropriate for multi-series forecasting. Option D with 1-week horizon and weekly retraining is operationally expensive and doesn't leverage AutoML's ability to learn longer-term patterns, though it might work for very short-term needs.
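The long-format layout AutoML forecasting expects, one row per (store, week) with the identifier, time column, target, and context variables side by side, can be sketched as follows; the column names and the holiday calendar are illustrative assumptions:

```python
from datetime import date, timedelta

# Illustrative holiday calendar used to derive a context column.
HOLIDAYS = {date(2023, 12, 25)}

def make_rows(store_id, start, weekly_sales, promos):
    """Build long-format rows: series ID, time column, target, context variables."""
    rows = []
    for i, (sales, promo) in enumerate(zip(weekly_sales, promos)):
        week = start + timedelta(weeks=i)
        rows.append({
            "store_id": store_id,            # time series identifier
            "week_start": week.isoformat(),  # time column (weekly granularity)
            "sales": sales,                  # forecast target
            "on_promo": promo,               # context variable
            "is_holiday_week": any(
                week <= h < week + timedelta(weeks=1) for h in HOLIDAYS
            ),
        })
    return rows

rows = make_rows("store_17", date(2023, 12, 18), [120, 340], [False, True])
```

All 500 stores would go into one table in this shape, with store_id configured as the series identifier and a 4-12 week forecast horizon.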
Question 4 of 10
Your AutoML text classification model for customer support tickets shows 95% accuracy overall, but when deployed, it performs poorly on tickets related to a new product line launched after training. What is the MOST effective strategy to address this issue?
Explanation
The most effective solution is to collect and label new data from the new product line and retrain the model. This is a classic case of training-serving skew where the production data distribution differs from training data. The model has never seen examples from the new product line, so it cannot classify them accurately regardless of accuracy on old data. Option A (increasing training budget) won't help since the model has no relevant training data. Option C (adjusting confidence threshold) only changes when the model makes predictions, not the quality of those predictions. Option D (automatic model refresh) isn't a real feature in AutoML; models need explicit retraining with new data to adapt to distribution changes.
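A simple way to surface this kind of skew before users notice it is to measure how much production traffic carries labels (or categories) never seen in training; the label names below are made up for illustration:

```python
def coverage_gap(train_labels, prod_labels):
    """Fraction of production examples whose category never appeared in training."""
    seen = set(train_labels)
    unseen = [label for label in prod_labels if label not in seen]
    return len(unseen) / len(prod_labels)

train = ["billing", "shipping", "returns"] * 100
prod = ["billing"] * 50 + ["new_product_line"] * 50

gap = coverage_gap(train, prod)  # half of live traffic is out-of-distribution
```

A large gap is a signal to collect and label examples from the new category and retrain, exactly the remediation described above.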
Question 5 of 10
You are preparing a dataset for AutoML Vision to detect objects in retail store images. Your dataset contains images at various resolutions from 800x600 to 4000x3000 pixels. How should you handle image resolution before uploading to Vertex AI AutoML?
Explanation
The correct approach is to upload images in their original resolutions and let AutoML Vision handle preprocessing. AutoML Vision automatically resizes and preprocesses images optimally based on the model architecture and training requirements. It preserves important details while normalizing inputs appropriately. Option A (manual resizing to 1024x1024) is unnecessary and may introduce distortion if aspect ratios don't match. Option C (downscaling to 640x480) may lose important details needed for object detection. Option D (splitting by resolution) creates unnecessary complexity, reduces training data per model, and doesn't leverage AutoML's built-in capability to handle varied image sizes.
Question 6 of 10
Your AutoML Tables model for predicting loan defaults has been trained successfully, but during evaluation you notice that precision is 0.85 while recall is only 0.45. Your business requirement is to identify at least 80% of potential defaults (even if it means more false positives). What should you do?
Explanation
The correct solution is to adjust the prediction confidence threshold lower to increase recall. AutoML models output probability scores, and lowering the classification threshold will classify more cases as positive (defaults), increasing recall at the expense of precision. This directly addresses the business requirement of identifying 80% of defaults. Option A (higher training budget) might improve overall performance but doesn't specifically address the precision-recall trade-off. Option C (adding features) could help but requires data collection and retraining, and doesn't guarantee higher recall specifically. Option D (ensemble methods) is not a standard AutoML Tables feature and adds unnecessary complexity when threshold adjustment directly solves the problem.
Question 7 of 10
You need to train an AutoML video classification model to categorize surveillance footage into 'normal activity', 'suspicious activity', and 'emergency'. Your dataset consists of 1000 videos ranging from 30 seconds to 5 minutes in length. What is the recommended approach for preparing this data?
Explanation
The recommended approach is to use videos as-is with their original lengths and single labels. AutoML Video Intelligence can handle variable-length videos and automatically extracts temporal features across the entire video. Each video should have one predominant label representing the overall content. Option A (splitting into segments) could work but requires careful labeling of each segment and may lose temporal context across segment boundaries. Option C (extracting key frames) loses the critical temporal information that distinguishes video classification from image classification, missing motion patterns crucial for detecting activities. Option D (standardizing duration) artificially manipulates content through trimming or looping, potentially introducing artifacts and losing important information.
Question 8 of 10
Your organization wants to build a custom entity extraction model using AutoML Natural Language to identify specific product codes and internal terminology from customer emails. You have 5,000 emails with annotations. What is the minimum additional information required for each annotation to properly train the model?
Explanation
AutoML Natural Language for entity extraction requires entity text spans (start/end positions) and their corresponding entity type labels. The model learns to recognize text patterns and associate them with specific entity types you define. Option A (only text spans) is insufficient because the model needs to know what type of entity each span represents. Option C includes confidence scores and relationships, which are model outputs, not required training inputs: AutoML generates confidence scores, and entity relationships are not part of basic entity extraction training. Option D (detailed descriptions) is not used by AutoML; the model learns from labeled examples, not textual explanations of labeling decisions.
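An annotation of this shape can be built as a JSONL record; the field names below follow the Vertex AI text-extraction import schema as I understand it, but treat them as an assumption and check the current schema before importing:

```python
import json

text = "Please replace part AX-9920 under warranty."

# One training record: the document text plus a typed span with character offsets.
record = {
    "textContent": text,
    "textSegmentAnnotations": [
        {"displayName": "product_code", "startOffset": 20, "endOffset": 27}
    ],
}

ann = record["textSegmentAnnotations"][0]
span = text[ann["startOffset"]:ann["endOffset"]]  # the annotated entity text
line = json.dumps(record)  # one line of the JSONL import file
```

Validating that each span slices back to the intended entity text, as the last lines do, catches off-by-one offset errors before training.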
Question 9 of 10
You are using Tabular Workflows in Vertex AI AutoML to build a model predicting house prices. During the data validation step, AutoML flags several columns as having high cardinality (>10,000 unique values) and suggests transformations. The flagged columns include property_id, latitude, longitude, and zip_code. How should you handle these features?
Explanation
The correct approach is to remove property_id (unique identifier with no predictive value), keep zip_code as a categorical feature (potentially with hash bucketing for efficiency), and keep latitude/longitude as numerical features since they're valid coordinates that provide granular location information. Different cardinality features need different treatments based on their nature. Option A is too aggressive: not all high-cardinality features should be removed. Option B is wrong to keep property_id (overfitting risk) and wrong to remove coordinates, which provide more precise location data than zip_code alone. Option D (converting to embeddings) is not necessary with AutoML Tables, which handles categorical features appropriately, and embeddings are more relevant for text/NLP tasks.
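The per-column treatment can be sketched as below; the hash-bucketing helper and the column names are illustrative (AutoML Tables can also apply such transformations itself, so this is one possible manual version, not the required one):

```python
import hashlib

def hash_bucket(value, n_buckets=256):
    """Deterministically map a high-cardinality categorical value to a fixed bucket."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets

def prepare(row):
    features = dict(row)
    features.pop("property_id", None)  # unique ID: drop entirely
    features["zip_bucket"] = hash_bucket(features.pop("zip_code"))  # bucketed categorical
    # latitude/longitude stay as plain numeric features
    return features

row = {"property_id": "P-1", "zip_code": "94103",
       "latitude": 37.77, "longitude": -122.42}
out = prepare(row)
```

Because the hash is deterministic, the same zip_code always lands in the same bucket at training and serving time, which is what makes the trick safe.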
Question 10 of 10
You've trained an AutoML model for document classification, and during the debugging phase, you notice the model has high training accuracy (98%) but much lower validation accuracy (72%). The confusion matrix shows the model performs well on 3 out of 5 classes but poorly on 2 classes. What is the MOST likely cause and appropriate solution?
Explanation
This is a classic overfitting scenario: high training accuracy with significantly lower validation accuracy indicates the model memorized training data rather than learning generalizable patterns. The poor performance on 2 specific classes suggests those classes may have insufficient or non-representative training data. The solution is to apply regularization (which AutoML does automatically but may need stronger settings) and ensure adequate, diverse training examples for underperforming classes. Option A (underfitting) is incorrect because underfitting shows poor performance on both training and validation sets. Option C (increasing validation size) doesn't address the root cause of overfitting. Option D (data leakage) would actually cause artificially HIGH validation accuracy, not low, as the model would have seen validation data during training.
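Reading the weak classes off a confusion matrix amounts to taking the diagonal over the row sums; the 5x5 matrix below is invented to mirror the scenario (three strong classes, two weak ones):

```python
def per_class_recall(cm):
    """Rows are true classes, columns are predictions: diagonal / row sum."""
    return [row[i] / sum(row) for i, row in enumerate(cm)]

# Illustrative validation confusion matrix, 100 examples per class.
cm = [
    [95,  2,  1,  1,  1],
    [ 3, 90,  3,  2,  2],
    [ 1,  2, 94,  2,  1],
    [10, 15, 10, 40, 25],  # weak class
    [12,  8, 14, 20, 46],  # weak class
]

recalls = per_class_recall(cm)
weak = [i for i, r in enumerate(recalls) if r < 0.8]  # classes needing more data
```

The weak list is where to focus the data-collection effort before retraining.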