
Training models by using AutoML

Architecting Low-Code AI Solutions

Question 1 of 10
You are preparing a dataset for AutoML Tables to predict customer churn. Your dataset contains 50,000 rows with 25 features, including customer_id, account_creation_date, and last_login_timestamp. Which features should you exclude or transform before training?
Explanation
Correct answer: Exclude customer_id and transform timestamps into derived features. customer_id is a unique identifier with no predictive value and can cause overfitting. Raw timestamps are difficult for models to interpret; they should be transformed into meaningful features such as 'days_since_last_login' or 'account_age_days'. While AutoML performs automatic feature engineering, providing well-engineered temporal features improves model performance. Option B is incorrect because AutoML cannot effectively use unique identifiers. Option C is incorrect because raw timestamps need transformation. Option D is incorrect because timestamps can be very useful when properly transformed.
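The transformations described above can be sketched with the standard library alone. The column names come from the question; the derived feature names and the extra monthly_spend column are illustrative.

```python
# Drop the unique identifier and derive model-friendly temporal features
# before handing the table to AutoML. Illustrative sketch, not a full pipeline.
from datetime import date

def engineer_row(row: dict, today: date) -> dict:
    """Remove customer_id and replace raw dates with derived day counts."""
    out = {k: v for k, v in row.items()
           if k not in ("customer_id", "account_creation_date", "last_login_timestamp")}
    out["account_age_days"] = (today - row["account_creation_date"]).days
    out["days_since_last_login"] = (today - row["last_login_timestamp"]).days
    return out

row = {"customer_id": "C-1042",
       "account_creation_date": date(2022, 1, 15),
       "last_login_timestamp": date(2024, 5, 1),
       "monthly_spend": 42.50}
features = engineer_row(row, today=date(2024, 6, 1))
# customer_id is gone; two derived temporal features remain alongside the rest.
```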
Question 2 of 10
Your team is using Vertex AI AutoML to train an image classification model for detecting defects in manufacturing. You have 10,000 labeled images, but the distribution is highly imbalanced: 8,500 images show no defects and only 1,500 show defects across 5 defect categories. What is the BEST approach to improve model performance?
Explanation
Correct answer: Use data augmentation to generate synthetic images and balance the dataset. For image classification with severe class imbalance, data augmentation (rotation, flipping, cropping, brightness adjustment) creates more training examples for underrepresented classes, helping the model learn better features. Option B (adjusting threshold) only works for binary classification and doesn't help the model learn defect features. Option C (removing data) wastes valuable training data and reduces model accuracy. Option D is incorrect because Vertex AI AutoML for images doesn't support custom sample weights; this feature is available for AutoML Tables but not for image classification.
Question 3 of 10
You need to create a demand forecasting model using AutoML for time series data in Vertex AI. Your retail dataset includes daily sales data for 500 products across 50 stores for the past 3 years. Which data organization approach is required for AutoML Forecasting?
Explanation
Correct answer: Organize data with timestamp, identifiers, target variable, and covariates in a single dataset. AutoML Forecasting expects data in long format with a time column, time series identifier columns (product_id, store_id), the target variable (sales), and optional covariates. This allows training a single model that learns patterns across all time series while respecting their individual characteristics. Option A is inefficient and ignores cross-series learning. Option C loses crucial granularity needed for product-store level forecasts. Option D (wide format) is not supported by AutoML Forecasting, which requires long format data.
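The long format described above can be sketched as one row per (timestamp, series) pair, with the series identified by product_id plus store_id. Column names and values here are illustrative.

```python
# Build a small long-format CSV of the kind AutoML Forecasting expects:
# time column, series identifiers, target, and an optional covariate.
import csv, io

rows = [
    {"date": "2024-05-01", "product_id": "P1", "store_id": "S1", "sales": 120, "promo": 0},
    {"date": "2024-05-01", "product_id": "P1", "store_id": "S2", "sales": 95,  "promo": 1},
    {"date": "2024-05-02", "product_id": "P1", "store_id": "S1", "sales": 131, "promo": 0},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["date", "product_id", "store_id", "sales", "promo"])
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()
# One model trained on this layout learns across all product-store series.
```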
Question 4 of 10
You are configuring an AutoML Tables model to predict loan default risk. Your dataset contains sensitive features including social_security_number, exact_income, date_of_birth, and credit_score. How should you handle these features to follow responsible AI practices while maintaining model performance?
Explanation
Correct answer: Remove direct identifiers, generalize sensitive features, and retain appropriately transformed data. Social security numbers are direct identifiers with no predictive value and pose privacy risks. Exact income and date of birth should be generalized to income brackets and age ranges to reduce privacy exposure while retaining predictive power. Credit score is already a derived metric and appropriate to use. Option B violates privacy principles and risks exposing PII. Option C is incorrect because encryption doesn't prevent the model from learning from and potentially exposing sensitive patterns. Option D unnecessarily discards valuable predictive information when proper generalization would suffice.
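The generalization step can be sketched as follows: drop the direct identifier, bucket exact income, and convert date_of_birth to an age range. The bracket boundaries are illustrative, not taken from any standard.

```python
# Replace sensitive raw values with generalized features that retain
# predictive power. Sketch only; bucket edges are assumptions.
from datetime import date

def income_bracket(income: float) -> str:
    if income < 30_000:  return "<30k"
    if income < 60_000:  return "30k-60k"
    if income < 100_000: return "60k-100k"
    return ">=100k"

def age_range(dob: date, today: date) -> str:
    age = (today - dob).days // 365
    lo = (age // 10) * 10
    return f"{lo}-{lo + 9}"

def generalize(record: dict, today: date) -> dict:
    return {
        "income_bracket": income_bracket(record["exact_income"]),
        "age_range": age_range(record["date_of_birth"], today),
        "credit_score": record["credit_score"],  # already a derived metric
        # social_security_number is dropped entirely
    }

applicant = {"social_security_number": "XXX-XX-XXXX", "exact_income": 72_000,
             "date_of_birth": date(1990, 3, 5), "credit_score": 710}
safe = generalize(applicant, today=date(2024, 6, 1))
```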
Question 5 of 10
You have trained an AutoML Vision model for detecting product categories from images. After deployment, you notice the model performs poorly on images taken in low-light conditions, even though your training set included various lighting conditions. What is the MOST effective debugging approach?
Explanation
Correct answer: Analyze confusion matrix by metadata and augment training data. The confusion matrix in Vertex AI allows filtering by image characteristics to identify systematic failures. If low-light images are underrepresented in training data, adding more such examples or using augmentation to simulate low-light conditions addresses the root cause. Option B (more training time) won't help if the training distribution doesn't match inference conditions. Option C doesn't address the lighting issue. Option D (inference-only preprocessing) creates train-serve skew and may not match AutoML's internal preprocessing; it's better to ensure training data represents the inference distribution.
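Slicing evaluation errors by metadata can be sketched like this. The `lighting` tag and records are illustrative; in practice you would export predictions with metadata or filter in the Vertex AI evaluation UI.

```python
# Group evaluation records by a metadata key and compute per-slice error
# rates, to surface systematic failures such as low-light images.
from collections import defaultdict

def error_rate_by(predictions, key):
    totals, errors = defaultdict(int), defaultdict(int)
    for p in predictions:
        totals[p[key]] += 1
        if p["predicted"] != p["actual"]:
            errors[p[key]] += 1
    return {k: errors[k] / totals[k] for k in totals}

preds = [
    {"lighting": "normal", "predicted": "shoe", "actual": "shoe"},
    {"lighting": "normal", "predicted": "bag",  "actual": "bag"},
    {"lighting": "low",    "predicted": "bag",  "actual": "shoe"},
    {"lighting": "low",    "predicted": "shoe", "actual": "shoe"},
]
rates = error_rate_by(preds, "lighting")  # low-light slice fails far more often
```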
Question 6 of 10
Your company wants to build a text classification model to categorize customer support tickets into 15 different categories using AutoML Natural Language. You have 30,000 labeled tickets, but labeling quality is inconsistent as tickets were labeled by different teams over 2 years. What should you prioritize to improve model quality?
Explanation
Correct answer: Perform label quality analysis, establish guidelines, and create a validation set. Inconsistent labels directly degrade model performance. Vertex AI provides label quality metrics to identify problematic examples. Creating clear labeling guidelines and relabeling a validation set ensures you can properly measure model performance. This approach improves both training quality and evaluation reliability. Option B ignores label quality issues that will limit model accuracy. Option C wastes 20,000 labeled examples and assumes recency equals quality without verification. Option D is incorrect because unsupervised methods cannot accurately assign labels to match your specific 15 categories; supervised relabeling is needed.
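A simple label-quality check can be sketched by having two teams label the same sample of tickets and measuring how often they disagree. Field names and data below are illustrative.

```python
# Measure inter-team labeling disagreement on double-labeled tickets and
# surface the most-confused category pairs, to target relabeling guidelines.
from collections import Counter

def disagreement_rate(double_labeled):
    """Fraction of tickets where the two teams assigned different categories."""
    disagreements = sum(1 for a, b in double_labeled if a != b)
    return disagreements / len(double_labeled)

pairs = [("billing", "billing"), ("billing", "refund"),
         ("login", "login"), ("refund", "billing")]
rate = disagreement_rate(pairs)  # high rate -> guidelines and relabeling needed

# Which category pairs are most often confused between teams:
confusions = Counter((a, b) for a, b in pairs if a != b)
```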
Question 7 of 10
You are using Vertex AI AutoML Tables to predict equipment failure. Your training dataset is stored in BigQuery and includes a TIMESTAMP column for measurement_time and various sensor readings. The table contains 1 million rows collected over 6 months. What is the recommended approach for creating your training dataset?
Explanation
Correct answer: Import directly from BigQuery to Vertex AI managed dataset. Vertex AI AutoML Tables natively supports BigQuery as a data source, automatically handling large datasets efficiently without requiring exports. Direct import preserves data types, including timestamps, and maintains data lineage. Option B (CSV export) is unnecessary, time-consuming, risks data type loss, and requires additional storage. Option C (BigQuery ML) is a different service with different capabilities; the question specifically asks about AutoML Tables. Option D is incorrect because AutoML Tables expects static training datasets, not streaming data; you import a snapshot of data for training.
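The direct-import path can be sketched with the Vertex AI SDK (`google.cloud.aiplatform`). Project, dataset, and table names are placeholders, and the actual call needs GCP credentials, so it is wrapped in a function that is defined but not invoked here.

```python
# Create a Vertex AI managed tabular dataset straight from BigQuery,
# with no intermediate CSV export. Sketch only; names are placeholders.

def bq_uri(project: str, dataset: str, table: str) -> str:
    """Build the bq:// source URI Vertex AI expects for BigQuery tables."""
    return f"bq://{project}.{dataset}.{table}"

source = bq_uri("my-project", "sensors", "equipment_readings")

def create_dataset(source_uri: str):
    """Requires GCP credentials; shown for illustration, not called here."""
    from google.cloud import aiplatform
    aiplatform.init(project="my-project", location="us-central1")
    return aiplatform.TabularDataset.create(
        display_name="equipment-failure-training",
        bq_source=source_uri,  # TIMESTAMP and other types are preserved
    )
```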
Question 8 of 10
You need to train an AutoML forecasting model to predict website traffic for the next 30 days. Your dataset includes 2 years of daily traffic data along with marketing_spend, holidays, and day_of_week features. During model configuration, how should you specify the forecast horizon and context window?
Explanation
Correct answer: Set the forecast horizon to 30 days and the context window to at least 30 days, ideally longer. The forecast horizon should match your business requirement (30 days ahead). The context window (how much historical data the model uses per prediction) should be at least as long as the forecast horizon, and typically longer to capture patterns such as weekly seasonality, monthly trends, or yearly cycles. For daily data, a context window of 60-90 days or more helps capture these patterns. Option B, which forces the context window to exactly match the horizon, is too restrictive. Option C's 365-day forecast horizon doesn't match the requirement, and its 7-day context window is too short for 30-day forecasts. Option D is incorrect; these are required configuration parameters that you must specify.
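The horizon/context choice above can be sketched as the configuration you would pass to a Vertex AI forecasting training job. The horizon and granularity follow the question; the 90-day context window and the covariate split (holidays and day_of_week known at forecast time, marketing_spend assumed known only historically) are assumptions for illustration.

```python
# Illustrative forecasting-job configuration for 30-day-ahead daily traffic.
forecast_config = {
    "forecast_horizon": 30,       # predict 30 days ahead (business requirement)
    "context_window": 90,         # >= horizon; long enough for weekly/monthly patterns
    "data_granularity_unit": "day",
    "data_granularity_count": 1,
    "time_column": "date",
    "target_column": "traffic",
    "available_at_forecast_columns": ["holidays", "day_of_week"],   # known in advance
    "unavailable_at_forecast_columns": ["marketing_spend"],         # historical only
}

# A context window shorter than the horizon starves the model of history.
valid = forecast_config["context_window"] >= forecast_config["forecast_horizon"]
```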
Question 9 of 10
You are preparing video data for training an AutoML Video Classification model to detect different types of customer interactions in retail stores. You have 500 hours of video footage. What is the correct data preparation approach for AutoML Video?
Explanation
Correct answer: Split videos into labeled clips and provide a CSV with URIs and labels. AutoML Video Classification expects videos segmented into meaningful clips (typically 10 seconds to 5 minutes) with each clip labeled. The CSV manifest file contains Cloud Storage URIs pointing to video files and their corresponding labels. This format allows the model to learn temporal patterns within clips. Option B is incorrect because AutoML Video Classification works with clip-level labels, not frame-level annotations (that would be for Video Object Tracking). Option C loses temporal information that's crucial for video understanding. Option D is incorrect because AutoML Video requires supervised learning with labeled data.
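The CSV manifest can be sketched as one row per labeled clip: a Cloud Storage URI, the label, and optional start/end offsets in seconds so several clips can share one source file. Bucket paths, labels, and the exact column order here are illustrative.

```python
# Assemble a small import manifest for video classification:
# gcs_uri, label, start_offset_seconds, end_offset_seconds per row.
rows = [
    ("gs://my-bucket/clips/checkout_001.mp4", "checkout",   "0",   "30"),
    ("gs://my-bucket/clips/browsing_014.mp4", "browsing",   "0",   "25"),
    ("gs://my-bucket/store_cam_day3.mp4",     "assistance", "120", "180"),
]
manifest = "\n".join(",".join(r) for r in rows)
# Each row points the trainer at one labeled clip in Cloud Storage.
```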
Question 10 of 10
After training an AutoML Tables model, you notice the model's precision is 0.92 but recall is only 0.45 for predicting fraudulent transactions. Your business requirement is to catch most fraud cases even if it means more false positives. How should you configure the model for deployment?
Explanation
Correct answer: Adjust the classification threshold lower to increase recall. For binary classification, AutoML provides a default threshold (typically 0.5), but you can adjust it based on business needs. Lowering the threshold (e.g., to 0.3) will classify more cases as fraud, increasing recall (catching more fraud) while decreasing precision (more false positives). This is adjustable at deployment time without retraining. Option B doesn't address the precision-recall tradeoff. Option C ignores the business requirement for high recall. Option D is inefficient; while AutoML allows selecting optimization objectives (AUC-PR, AUC-ROC), adjusting the threshold post-training is faster and gives you more control over the specific precision-recall tradeoff.
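The threshold trade-off can be sketched with the same model scores classified at 0.5 versus 0.3. Scores and labels below are illustrative.

```python
# Lowering the decision threshold flags more transactions as fraud,
# raising recall at the cost of precision.

def recall_at(scores, labels, threshold):
    """Recall for the positive (fraud) class at a given threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return tp / (tp + fn)

scores = [0.9, 0.6, 0.45, 0.35, 0.2, 0.1]
labels = [1,   1,   1,    1,    0,   0]   # 1 = fraud

default = recall_at(scores, labels, 0.5)  # misses the 0.45 and 0.35 fraud cases
lowered = recall_at(scores, labels, 0.3)  # catches all fraud in this sample
```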