
Training models by using AutoML

Architecting Low-Code AI Solutions

Question 1 of 10
You are preparing a dataset for AutoML Tables to predict customer churn. Your dataset contains 50,000 rows with 25 features, including customer_id, account_creation_date, and last_login_timestamp. Which features should you exclude or transform before training?
Explanation
Correct answer: Exclude customer_id and transform timestamps into derived features. customer_id is a unique identifier with no predictive value and can cause overfitting. Raw timestamps are difficult for models to interpret; they should be transformed into meaningful features such as 'days_since_last_login' or 'account_age_days'. While AutoML performs automatic feature engineering, providing well-engineered temporal features improves model performance. Option B is incorrect because AutoML cannot effectively use unique identifiers. Option C is incorrect because raw timestamps need transformation. Option D is incorrect because timestamps can be very useful when properly transformed.
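The transformations described above can be sketched with the standard library alone. The column names come from the question; the derived feature names and the extra monthly_spend column are illustrative.

```python
# Drop the unique identifier and derive model-friendly temporal features
# before handing the table to AutoML. Illustrative sketch, not a full pipeline.
from datetime import date

def engineer_row(row: dict, today: date) -> dict:
    """Remove customer_id and replace raw dates with derived day counts."""
    out = {k: v for k, v in row.items()
           if k not in ("customer_id", "account_creation_date", "last_login_timestamp")}
    out["account_age_days"] = (today - row["account_creation_date"]).days
    out["days_since_last_login"] = (today - row["last_login_timestamp"]).days
    return out

row = {"customer_id": "C-1042",
       "account_creation_date": date(2022, 1, 15),
       "last_login_timestamp": date(2024, 5, 1),
       "monthly_spend": 42.50}
features = engineer_row(row, today=date(2024, 6, 1))
# customer_id is gone; two derived temporal features remain alongside the rest.
```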
Question 2 of 10
Your team is using Vertex AI AutoML to train an image classification model for detecting defects in manufacturing. You have 10,000 labeled images, but the distribution is highly imbalanced: 8,500 images show no defects and only 1,500 show defects across 5 defect categories. What is the BEST approach to improve model performance?
Explanation
Correct answer: Use data augmentation to generate synthetic images and balance the dataset. For image classification with severe class imbalance, data augmentation (rotation, flipping, cropping, brightness adjustment) creates more training examples for underrepresented classes, helping the model learn better features. Option B (adjusting threshold) only works for binary classification and doesn't help the model learn defect features. Option C (removing data) wastes valuable training data and reduces model accuracy. Option D is incorrect because Vertex AI AutoML for images doesn't support custom sample weights; this feature is available for AutoML Tables but not for image classification.
Question 3 of 10
You need to create a demand forecasting model using AutoML for time series data in Vertex AI. Your retail dataset includes daily sales data for 500 products across 50 stores for the past 3 years. Which data organization approach is required for AutoML Forecasting?
Explanation
Correct answer: Organize data with timestamp, identifiers, target variable, and covariates in a single dataset. AutoML Forecasting expects data in long format with a time column, time series identifier columns (product_id, store_id), the target variable (sales), and optional covariates. This allows training a single model that learns patterns across all time series while respecting their individual characteristics. Option A is inefficient and ignores cross-series learning. Option C loses crucial granularity needed for product-store level forecasts. Option D (wide format) is not supported by AutoML Forecasting, which requires long format data.
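The long format described above can be sketched as one row per (timestamp, series) pair, with the series identified by product_id plus store_id. Column names and values here are illustrative.

```python
# Build a small long-format CSV of the kind AutoML Forecasting expects:
# time column, series identifiers, target, and an optional covariate.
import csv, io

rows = [
    {"date": "2024-05-01", "product_id": "P1", "store_id": "S1", "sales": 120, "promo": 0},
    {"date": "2024-05-01", "product_id": "P1", "store_id": "S2", "sales": 95,  "promo": 1},
    {"date": "2024-05-02", "product_id": "P1", "store_id": "S1", "sales": 131, "promo": 0},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["date", "product_id", "store_id", "sales", "promo"])
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()
# One model trained on this layout learns across all product-store series.
```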
Question 4 of 10
You are configuring an AutoML Tables model to predict loan default risk. Your dataset contains sensitive features including social_security_number, exact_income, date_of_birth, and credit_score. How should you handle these features to follow responsible AI practices while maintaining model performance?
Explanation
Correct answer: Remove direct identifiers, generalize sensitive features, and retain appropriately transformed data. Social security numbers are direct identifiers with no predictive value and pose privacy risks. Exact income and date of birth should be generalized to income brackets and age ranges to reduce privacy exposure while retaining predictive power. Credit score is already a derived metric and appropriate to use. Option B violates privacy principles and risks exposing PII. Option C is incorrect because encryption doesn't prevent the model from learning from and potentially exposing sensitive patterns. Option D unnecessarily discards valuable predictive information when proper generalization would suffice.
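The generalization step can be sketched as follows: drop the direct identifier, bucket exact income, and convert date_of_birth to an age range. The bracket boundaries are illustrative, not taken from any standard.

```python
# Replace sensitive raw values with generalized features that retain
# predictive power. Sketch only; bucket edges are assumptions.
from datetime import date

def income_bracket(income: float) -> str:
    if income < 30_000:  return "<30k"
    if income < 60_000:  return "30k-60k"
    if income < 100_000: return "60k-100k"
    return ">=100k"

def age_range(dob: date, today: date) -> str:
    age = (today - dob).days // 365
    lo = (age // 10) * 10
    return f"{lo}-{lo + 9}"

def generalize(record: dict, today: date) -> dict:
    return {
        "income_bracket": income_bracket(record["exact_income"]),
        "age_range": age_range(record["date_of_birth"], today),
        "credit_score": record["credit_score"],  # already a derived metric
        # social_security_number is dropped entirely
    }

applicant = {"social_security_number": "XXX-XX-XXXX", "exact_income": 72_000,
             "date_of_birth": date(1990, 3, 5), "credit_score": 710}
safe = generalize(applicant, today=date(2024, 6, 1))
```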
Question 5 of 10
You have trained an AutoML Vision model for detecting product categories from images. After deployment, you notice the model performs poorly on images taken in low-light conditions, even though your training set included various lighting conditions. What is the MOST effective debugging approach?
Explanation
Correct answer: Analyze confusion matrix by metadata and augment training data. The confusion matrix in Vertex AI allows filtering by image characteristics to identify systematic failures. If low-light images are underrepresented in training data, adding more such examples or using augmentation to simulate low-light conditions addresses the root cause. Option B (more training time) won't help if the training distribution doesn't match inference conditions. Option C doesn't address the lighting issue. Option D (inference-only preprocessing) creates train-serve skew and may not match AutoML's internal preprocessing; it's better to ensure training data represents the inference distribution.
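Slicing evaluation errors by metadata can be sketched like this. The `lighting` tag and records are illustrative; in practice you would export predictions with metadata or filter in the Vertex AI evaluation UI.

```python
# Group evaluation records by a metadata key and compute per-slice error
# rates, to surface systematic failures such as low-light images.
from collections import defaultdict

def error_rate_by(predictions, key):
    totals, errors = defaultdict(int), defaultdict(int)
    for p in predictions:
        totals[p[key]] += 1
        if p["predicted"] != p["actual"]:
            errors[p[key]] += 1
    return {k: errors[k] / totals[k] for k in totals}

preds = [
    {"lighting": "normal", "predicted": "shoe", "actual": "shoe"},
    {"lighting": "normal", "predicted": "bag",  "actual": "bag"},
    {"lighting": "low",    "predicted": "bag",  "actual": "shoe"},
    {"lighting": "low",    "predicted": "shoe", "actual": "shoe"},
]
rates = error_rate_by(preds, "lighting")  # low-light slice fails far more often
```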
Question 6 of 10
Your company wants to build a text classification model to categorize customer support tickets into 15 different categories using AutoML Natural Language. You have 30,000 labeled tickets, but labeling quality is inconsistent as tickets were labeled by different teams over 2 years. What should you prioritize to improve model quality?
Explanation
Correct answer: Perform label quality analysis, establish guidelines, and create a validation set. Inconsistent labels directly degrade model performance. Vertex AI provides label quality metrics to identify problematic examples. Creating clear labeling guidelines and relabeling a validation set ensures you can properly measure model performance. This approach improves both training quality and evaluation reliability. Option B ignores label quality issues that will limit model accuracy. Option C wastes 20,000 labeled examples and assumes recency equals quality without verification. Option D is incorrect because unsupervised methods cannot accurately assign labels to match your specific 15 categories; supervised relabeling is needed.
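A simple label-quality check can be sketched by having two teams label the same sample of tickets and measuring how often they disagree. Field names and data below are illustrative.

```python
# Measure inter-team labeling disagreement on double-labeled tickets and
# surface the most-confused category pairs, to target relabeling guidelines.
from collections import Counter

def disagreement_rate(double_labeled):
    """Fraction of tickets where the two teams assigned different categories."""
    disagreements = sum(1 for a, b in double_labeled if a != b)
    return disagreements / len(double_labeled)

pairs = [("billing", "billing"), ("billing", "refund"),
         ("login", "login"), ("refund", "billing")]
rate = disagreement_rate(pairs)  # high rate -> guidelines and relabeling needed

# Which category pairs are most often confused between teams:
confusions = Counter((a, b) for a, b in pairs if a != b)
```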
Question 7 of 10
You are using Vertex AI AutoML Tables to predict equipment failure. Your training dataset is stored in BigQuery and includes a TIMESTAMP column for measurement_time and various sensor readings. The table contains 1 million rows collected over 6 months. What is the recommended approach for creating your training dataset?
Explanation
Correct answer: Import directly from BigQuery to Vertex AI managed dataset. Vertex AI AutoML Tables natively supports BigQuery as a data source, automatically handling large datasets efficiently without requiring exports. Direct import preserves data types, including timestamps, and maintains data lineage. Option B (CSV export) is unnecessary, time-consuming, risks data type loss, and requires additional storage. Option C (BigQuery ML) is a different service with different capabilities; the question specifically asks about AutoML Tables. Option D is incorrect because AutoML Tables expects static training datasets, not streaming data; you import a snapshot of data for training.
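The direct-import path can be sketched with the Vertex AI SDK (`google.cloud.aiplatform`). Project, dataset, and table names are placeholders, and the actual call needs GCP credentials, so it is wrapped in a function that is defined but not invoked here.

```python
# Create a Vertex AI managed tabular dataset straight from BigQuery,
# with no intermediate CSV export. Sketch only; names are placeholders.

def bq_uri(project: str, dataset: str, table: str) -> str:
    """Build the bq:// source URI Vertex AI expects for BigQuery tables."""
    return f"bq://{project}.{dataset}.{table}"

source = bq_uri("my-project", "sensors", "equipment_readings")

def create_dataset(source_uri: str):
    """Requires GCP credentials; shown for illustration, not called here."""
    from google.cloud import aiplatform
    aiplatform.init(project="my-project", location="us-central1")
    return aiplatform.TabularDataset.create(
        display_name="equipment-failure-training",
        bq_source=source_uri,  # TIMESTAMP and other types are preserved
    )
```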
Question 8 of 10
You need to train an AutoML forecasting model to predict website traffic for the next 30 days. Your dataset includes 2 years of daily traffic data along with marketing_spend, holidays, and day_of_week features. During model configuration, how should you specify the forecast horizon and context window?
Explanation
Correct answer: Set the forecast horizon to 30 days and the context window to at least 30 days, ideally longer. The forecast horizon should match your business requirement (30 days ahead). The context window (how much historical data the model uses per prediction) should be at least as long as the forecast horizon, and typically longer to capture patterns such as weekly seasonality, monthly trends, or yearly cycles. For daily data, a context window of 60-90 days or more helps capture these patterns. Option B, which forces the context window to exactly match the horizon, is too restrictive. Option C's 365-day forecast horizon doesn't match the requirement, and its 7-day context window is too short for 30-day forecasts. Option D is incorrect; these are required configuration parameters that you must specify.
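The horizon/context choice above can be sketched as the configuration you would pass to a Vertex AI forecasting training job. The horizon and granularity follow the question; the 90-day context window and the covariate split (holidays and day_of_week known at forecast time, marketing_spend assumed known only historically) are assumptions for illustration.

```python
# Illustrative forecasting-job configuration for 30-day-ahead daily traffic.
forecast_config = {
    "forecast_horizon": 30,       # predict 30 days ahead (business requirement)
    "context_window": 90,         # >= horizon; long enough for weekly/monthly patterns
    "data_granularity_unit": "day",
    "data_granularity_count": 1,
    "time_column": "date",
    "target_column": "traffic",
    "available_at_forecast_columns": ["holidays", "day_of_week"],   # known in advance
    "unavailable_at_forecast_columns": ["marketing_spend"],         # historical only
}

# A context window shorter than the horizon starves the model of history.
valid = forecast_config["context_window"] >= forecast_config["forecast_horizon"]
```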
Question 9 of 10
You are preparing video data for training an AutoML Video Classification model to detect different types of customer interactions in retail stores. You have 500 hours of video footage. What is the correct data preparation approach for AutoML Video?
Explanation
Correct answer: Split videos into labeled clips and provide a CSV with URIs and labels. AutoML Video Classification expects videos segmented into meaningful clips (typically 10 seconds to 5 minutes) with each clip labeled. The CSV manifest file contains Cloud Storage URIs pointing to video files and their corresponding labels. This format allows the model to learn temporal patterns within clips. Option B is incorrect because AutoML Video Classification works with clip-level labels, not frame-level annotations (that would be for Video Object Tracking). Option C loses temporal information that's crucial for video understanding. Option D is incorrect because AutoML Video requires supervised learning with labeled data.
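The CSV manifest can be sketched as one row per labeled clip: a Cloud Storage URI, the label, and optional start/end offsets in seconds so several clips can share one source file. Bucket paths, labels, and the exact column order here are illustrative.

```python
# Assemble a small import manifest for video classification:
# gcs_uri, label, start_offset_seconds, end_offset_seconds per row.
rows = [
    ("gs://my-bucket/clips/checkout_001.mp4", "checkout",   "0",   "30"),
    ("gs://my-bucket/clips/browsing_014.mp4", "browsing",   "0",   "25"),
    ("gs://my-bucket/store_cam_day3.mp4",     "assistance", "120", "180"),
]
manifest = "\n".join(",".join(r) for r in rows)
# Each row points the trainer at one labeled clip in Cloud Storage.
```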
Question 10 of 10
After training an AutoML Tables model, you notice the model's precision is 0.92 but recall is only 0.45 for predicting fraudulent transactions. Your business requirement is to catch most fraud cases even if it means more false positives. How should you configure the model for deployment?
Explanation
Correct answer: Adjust the classification threshold lower to increase recall. For binary classification, AutoML provides a default threshold (typically 0.5), but you can adjust it based on business needs. Lowering the threshold (e.g., to 0.3) will classify more cases as fraud, increasing recall (catching more fraud) while decreasing precision (more false positives). This is adjustable at deployment time without retraining. Option B doesn't address the precision-recall tradeoff. Option C ignores the business requirement for high recall. Option D is inefficient; while AutoML allows selecting optimization objectives (AUC-PR, AUC-ROC), adjusting the threshold post-training is faster and gives you more control over the specific precision-recall tradeoff.
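The threshold trade-off can be sketched with the same model scores classified at 0.5 versus 0.3. Scores and labels below are illustrative.

```python
# Lowering the decision threshold flags more transactions as fraud,
# raising recall at the cost of precision.

def recall_at(scores, labels, threshold):
    """Recall for the positive (fraud) class at a given threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return tp / (tp + fn)

scores = [0.9, 0.6, 0.45, 0.35, 0.2, 0.1]
labels = [1,   1,   1,    1,    0,   0]   # 1 = fraud

default = recall_at(scores, labels, 0.5)  # misses the 0.45 and 0.35 fraud cases
lowered = recall_at(scores, labels, 0.3)  # catches all fraud in this sample
```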