
Training models by using AutoML

Architecting Low-Code AI Solutions

Question 1 of 10
You are preparing a tabular dataset with 50,000 rows for AutoML Tables to predict customer churn. The dataset contains a column 'customer_email' which is unique for each customer. What is the best practice for handling this column before training?
Explanation
Unique identifiers like customer_email should be excluded from training datasets because they don't provide any generalizable patterns for prediction. Each value appears only once, so the model cannot learn meaningful relationships. Option A is incorrect because while AutoML has some automation, including unique identifiers can lead to overfitting. Option B is incorrect because one-hot encoding a unique identifier would create as many columns as rows, which is inefficient and unhelpful. Option D is incorrect because the target should be the churn indicator, not the email address.
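The exclusion step can be sketched in pandas. This is a minimal illustration (the column names and values are made up, not from any real churn dataset): a column whose number of distinct values equals the row count is a unique identifier and can be dropped before training.

```python
import pandas as pd

# Illustrative churn dataset; column names and values are hypothetical.
df = pd.DataFrame({
    "customer_email": ["a@x.com", "b@x.com", "c@x.com", "d@x.com"],
    "tenure_months": [3, 24, 12, 3],
    "churned": [1, 0, 0, 1],
})

# A column whose cardinality equals the row count is a unique
# identifier and carries no generalizable signal for the model.
id_cols = [c for c in df.columns if df[c].nunique() == len(df)]
features = df.drop(columns=id_cols)
```

In AutoML Tables itself you would achieve the same effect by excluding the column in the schema review step rather than editing the source data.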
Question 2 of 10
Your company needs to train an AutoML Vision model to classify product defects from manufacturing line images. You have 10,000 unlabeled images. What is the most efficient approach to prepare this data for AutoML training?
Explanation
Vertex AI Data Labeling Service is the most efficient approach as it provides a managed service with quality control, specialist labelers, and integration with Vertex AI. It supports active learning and can significantly reduce labeling time and costs. Option B is inefficient and doesn't scale well for 10,000 images. Option C is incorrect because AutoML Vision requires labeled data for supervised learning tasks like classification. Option D won't work for custom defect detection as pre-trained models aren't trained on your specific defect types.
Question 3 of 10
You are using AutoML Tables to build a forecasting model for retail sales prediction. Your dataset has timestamps at hourly intervals for the past 3 years. Which feature engineering step would be MOST beneficial before training?
Explanation
Creating derived temporal features like day_of_week, month, is_holiday, and hour_of_day helps the model capture seasonal patterns and cyclical trends that are crucial for forecasting. These features make temporal patterns explicit and easier for the model to learn. Option A is incorrect as timestamps contain valuable information for forecasting. Option C (Unix epoch) doesn't make patterns obvious to the model. Option D is incorrect because aggregating to yearly intervals loses the granularity needed for accurate forecasting and eliminates important seasonal patterns.
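Deriving those temporal features is straightforward with the pandas `.dt` accessor. A minimal sketch, assuming an hourly `timestamp` column (the holiday list here is a hand-picked placeholder; a real pipeline would use a holiday-calendar library):

```python
import pandas as pd

# Hourly timestamps over a short illustrative window.
ts = pd.DataFrame({"timestamp": pd.date_range("2024-01-01", periods=48, freq="h")})

# Make cyclical/seasonal patterns explicit as separate columns.
ts["hour_of_day"] = ts["timestamp"].dt.hour
ts["day_of_week"] = ts["timestamp"].dt.dayofweek  # Monday = 0
ts["month"] = ts["timestamp"].dt.month
# Illustrative holiday set; swap in a proper calendar for production.
holidays = {pd.Timestamp("2024-01-01").date()}
ts["is_holiday"] = ts["timestamp"].dt.date.isin(holidays)
```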
Question 4 of 10
When preparing image data for AutoML Vision, you have images of varying sizes ranging from 200x200 to 4000x4000 pixels. What is the recommended approach?
Explanation
AutoML Vision automatically handles image preprocessing including resizing, normalization, and augmentation. You should upload images in their original format and let AutoML handle the optimization. Option A is unnecessary and may reduce image quality. Option B would severely limit your dataset and isn't required. Option D is incorrect because converting to grayscale loses color information that might be important for classification, and AutoML can handle color images efficiently.
Question 5 of 10
You're training an AutoML Natural Language model for sentiment analysis with a dataset of 5,000 customer reviews. After the first training run, you notice the model performs poorly on negative sentiments. What should be your first debugging step?
Explanation
Checking for class imbalance is the first step when you notice poor performance on specific classes. If negative sentiments are underrepresented in your training data, the model won't learn to identify them effectively. You may need to collect more negative examples or apply techniques like class weighting. Option A might help but won't solve an imbalance problem. Option C is premature before understanding the root cause. Option D doesn't apply well to text classification where the text itself is the primary input.
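The imbalance check itself is a one-liner. A sketch with made-up label counts (the 20% threshold is an illustrative rule of thumb, not a fixed cutoff):

```python
import pandas as pd

# Illustrative sentiment labels, heavily skewed toward positive.
labels = pd.Series(["positive"] * 90 + ["negative"] * 10)

# First debugging step: inspect the class distribution.
counts = labels.value_counts(normalize=True)
minority_share = counts.min()

# Flag imbalance when the minority class falls below a chosen
# threshold (0.2 here is an assumption, not an AutoML default).
is_imbalanced = minority_share < 0.2
```

If the distribution is skewed like this, collecting more minority-class examples is usually more effective than tuning the model.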
Question 6 of 10
Your organization needs to train an AutoML model on tabular data that includes sensitive PII (Personally Identifiable Information) such as social security numbers and home addresses. What is the BEST practice to handle this data responsibly?
Explanation
Using Cloud DLP to de-identify or tokenize PII before training is the best practice for responsible AI. You should transform PII into aggregate features (e.g., zip code instead of full address, age range instead of birthdate) that preserve predictive value while protecting privacy. Option A violates privacy principles and regulations. Option C still exposes PII during training even if encrypted. Option D is better but storing IDs of sensitive data can still enable re-identification and doesn't fully address the privacy concern.
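The idea of tokenizing an identifier before training can be sketched locally with a keyed hash. This is only a toy illustration of the principle; in practice you would use Cloud DLP's managed de-identification transforms rather than hand-rolled code, and the key below would live in a secret manager, not in source:

```python
import hashlib
import hmac

SECRET_KEY = b"rotate-me"  # illustrative key; never hard-code in production

def tokenize(value: str) -> str:
    """Replace a PII value with a keyed, irreversible token."""
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]

# Hypothetical record: keep the coarse, predictive field (zip),
# tokenize the direct identifier (ssn).
record = {"ssn": "123-45-6789", "zip": "94103", "churned": 0}
safe = {**record, "ssn": tokenize(record["ssn"])}
```

Keyed tokenization keeps the column joinable (the same input always maps to the same token) without exposing the raw value to the training pipeline.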
Question 7 of 10
You are using AutoML Tables to predict equipment failure. Your dataset has 100 features but you suspect many are irrelevant. How does AutoML Tables handle feature selection?
Explanation
AutoML Tables automatically performs feature selection and provides feature importance scores in the evaluation results. This helps identify which features contribute most to predictions. You can review these insights and optionally remove low-importance features for model simplification. Option A is incorrect as AutoML handles feature importance internally. Option C is incorrect because AutoML doesn't require manual PCA. Option D is incorrect because AutoML automatically handles feature normalization and scaling.
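The optional pruning step can be sketched as a simple filter over the importance scores AutoML surfaces. The feature names, scores, and threshold below are all hypothetical, chosen only to illustrate the pattern:

```python
# Hypothetical feature-importance scores as might appear in an
# AutoML Tables evaluation (values are invented for illustration).
importance = {
    "vibration_rms": 0.41,
    "temperature_c": 0.33,
    "runtime_hours": 0.19,
    "machine_color": 0.01,
    "operator_id": 0.006,
}

# Optionally drop features below a chosen threshold to simplify the
# model (the threshold is an assumption, not an AutoML default).
THRESHOLD = 0.02
kept = [name for name, score in importance.items() if score >= THRESHOLD]
```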
Question 8 of 10
When creating a forecasting model using AutoML for time-series data predicting monthly sales, you have 5 years of historical data. What is the recommended minimum data split configuration?
Explanation
For time-series forecasting, chronological splitting is essential to prevent data leakage. Training should use older historical data, and validation/testing should use more recent data to simulate real-world prediction scenarios. Option A (random split) causes data leakage as future information would be in the training set. Option B is backwards and would train on recent data to predict the past. Option D (random k-fold) also causes temporal leakage and doesn't respect the time-series nature of the data.
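A chronological 80/10/10 split can be sketched as follows (the exact proportions are a common convention, not a requirement; AutoML also lets you assign splits via a time column):

```python
import pandas as pd

# Five years of monthly sales; values are illustrative.
dates = pd.date_range("2019-01-01", periods=60, freq="MS")
df = pd.DataFrame({"month": dates, "sales": range(60)})

# Chronological split: oldest data trains, most recent data tests,
# so no future information leaks into training.
df = df.sort_values("month")
n = len(df)
train = df.iloc[: int(n * 0.8)]
val = df.iloc[int(n * 0.8) : int(n * 0.9)]
test = df.iloc[int(n * 0.9) :]
```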
Question 9 of 10
You're preparing a dataset for AutoML Video Classification to identify different types of customer service interactions. Your videos are 2 hours long each. What preprocessing step should you take?
Explanation
Segmenting long videos into shorter, meaningful clips improves training efficiency and model accuracy. Each clip should represent a single class or interaction type. This helps the model focus on relevant content and reduces processing costs. Option A is inefficient and may confuse the model with mixed content. Option C is unnecessary as AutoML handles frame rate normalization. Option D loses temporal information that's crucial for understanding video content and interaction context.
Question 10 of 10
After training an AutoML Tables model, you notice the model achieves 99% accuracy on the training set but only 65% on the test set. What is the MOST likely issue and solution?
Explanation
The large gap between training (99%) and test (65%) accuracy is a classic sign of overfitting, where the model memorizes training data rather than learning generalizable patterns. Solutions include early stopping, regularization, or collecting more diverse training data. In AutoML, you can also try reducing training time or using a simpler model. Option A would likely worsen overfitting. Option C might help evaluation but doesn't address the core overfitting problem. Option D (underfitting) is incorrect because underfitting shows poor performance on both training and test sets.
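The train/test gap heuristic described above can be expressed as a tiny diagnostic function. The 0.10 gap and 0.70 floor are illustrative rules of thumb, not AutoML settings:

```python
def diagnose(train_acc: float, test_acc: float,
             gap_threshold: float = 0.10, floor: float = 0.70) -> str:
    """Classify a train/test accuracy pair (thresholds are illustrative)."""
    if train_acc - test_acc > gap_threshold:
        return "overfitting"   # memorized training data, poor generalization
    if train_acc < floor and test_acc < floor:
        return "underfitting"  # poor on both sets: model too simple
    return "ok"
```

Applied to the question's numbers, `diagnose(0.99, 0.65)` flags overfitting, matching the explanation.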