
Training models by using AutoML

Architecting Low-Code AI Solutions

Question 1 of 10
You are preparing a tabular dataset with 50,000 rows for AutoML Tables to predict customer churn. The dataset contains a column 'customer_email' which is unique for each customer. What is the best practice for handling this column before training?
Explanation
Unique identifiers like customer_email should be excluded from training datasets because they don't provide any generalizable patterns for prediction. Each value appears only once, so the model cannot learn meaningful relationships. Option A is incorrect because while AutoML has some automation, including unique identifiers can lead to overfitting. Option B is incorrect because one-hot encoding a unique identifier would create as many columns as rows, which is inefficient and unhelpful. Option D is incorrect because the target should be the churn indicator, not the email address.
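The exclusion step can be sketched in pandas. This is a minimal illustration (the column names and values are made up, not from any real churn dataset): a column whose number of distinct values equals the row count is a unique identifier and can be dropped before training.

```python
import pandas as pd

# Illustrative churn dataset; column names and values are hypothetical.
df = pd.DataFrame({
    "customer_email": ["a@x.com", "b@x.com", "c@x.com", "d@x.com"],
    "tenure_months": [3, 24, 12, 3],
    "churned": [1, 0, 0, 1],
})

# A column whose cardinality equals the row count is a unique
# identifier and carries no generalizable signal for the model.
id_cols = [c for c in df.columns if df[c].nunique() == len(df)]
features = df.drop(columns=id_cols)
```

In AutoML Tables itself you would achieve the same effect by excluding the column in the schema review step rather than editing the source data.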
Question 2 of 10
Your company needs to train an AutoML Vision model to classify product defects from manufacturing line images. You have 10,000 unlabeled images. What is the most efficient approach to prepare this data for AutoML training?
Explanation
Vertex AI Data Labeling Service is the most efficient approach as it provides a managed service with quality control, specialist labelers, and integration with Vertex AI. It supports active learning and can significantly reduce labeling time and costs. Option B is inefficient and doesn't scale well for 10,000 images. Option C is incorrect because AutoML Vision requires labeled data for supervised learning tasks like classification. Option D won't work for custom defect detection as pre-trained models aren't trained on your specific defect types.
Question 3 of 10
You are using AutoML Tables to build a forecasting model for retail sales prediction. Your dataset has timestamps at hourly intervals for the past 3 years. Which feature engineering step would be MOST beneficial before training?
Explanation
Creating derived temporal features like day_of_week, month, is_holiday, and hour_of_day helps the model capture seasonal patterns and cyclical trends that are crucial for forecasting. These features make temporal patterns explicit and easier for the model to learn. Option A is incorrect as timestamps contain valuable information for forecasting. Option C (Unix epoch) doesn't make patterns obvious to the model. Option D is incorrect because aggregating to yearly intervals loses the granularity needed for accurate forecasting and eliminates important seasonal patterns.
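Deriving those temporal features is straightforward with the pandas `.dt` accessor. A minimal sketch, assuming an hourly `timestamp` column (the holiday list here is a hand-picked placeholder; a real pipeline would use a holiday-calendar library):

```python
import pandas as pd

# Hourly timestamps over a short illustrative window.
ts = pd.DataFrame({"timestamp": pd.date_range("2024-01-01", periods=48, freq="h")})

# Make cyclical/seasonal patterns explicit as separate columns.
ts["hour_of_day"] = ts["timestamp"].dt.hour
ts["day_of_week"] = ts["timestamp"].dt.dayofweek  # Monday = 0
ts["month"] = ts["timestamp"].dt.month
# Illustrative holiday set; swap in a proper calendar for production.
holidays = {pd.Timestamp("2024-01-01").date()}
ts["is_holiday"] = ts["timestamp"].dt.date.isin(holidays)
```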
Question 4 of 10
When preparing image data for AutoML Vision, you have images of varying sizes ranging from 200x200 to 4000x4000 pixels. What is the recommended approach?
Explanation
AutoML Vision automatically handles image preprocessing including resizing, normalization, and augmentation. You should upload images in their original format and let AutoML handle the optimization. Option A is unnecessary and may reduce image quality. Option B would severely limit your dataset and isn't required. Option D is incorrect because converting to grayscale loses color information that might be important for classification, and AutoML can handle color images efficiently.
Question 5 of 10
You're training an AutoML Natural Language model for sentiment analysis with a dataset of 5,000 customer reviews. After the first training run, you notice the model performs poorly on negative sentiments. What should be your first debugging step?
Explanation
Checking for class imbalance is the first step when you notice poor performance on specific classes. If negative sentiments are underrepresented in your training data, the model won't learn to identify them effectively. You may need to collect more negative examples or apply techniques like class weighting. Option A might help but won't solve an imbalance problem. Option C is premature before understanding the root cause. Option D doesn't apply well to text classification where the text itself is the primary input.
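The imbalance check itself is a one-liner. A sketch with made-up label counts (the 20% threshold is an illustrative rule of thumb, not a fixed cutoff):

```python
import pandas as pd

# Illustrative sentiment labels, heavily skewed toward positive.
labels = pd.Series(["positive"] * 90 + ["negative"] * 10)

# First debugging step: inspect the class distribution.
counts = labels.value_counts(normalize=True)
minority_share = counts.min()

# Flag imbalance when the minority class falls below a chosen
# threshold (0.2 here is an assumption, not an AutoML default).
is_imbalanced = minority_share < 0.2
```

If the distribution is skewed like this, collecting more minority-class examples is usually more effective than tuning the model.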
Question 6 of 10
Your organization needs to train an AutoML model on tabular data that includes sensitive PII (Personally Identifiable Information) such as social security numbers and home addresses. What is the BEST practice to handle this data responsibly?
Explanation
Using Cloud DLP to de-identify or tokenize PII before training is the best practice for responsible AI. You should transform PII into aggregate features (e.g., zip code instead of full address, age range instead of birthdate) that preserve predictive value while protecting privacy. Option A violates privacy principles and regulations. Option C still exposes PII during training even if encrypted. Option D is better but storing IDs of sensitive data can still enable re-identification and doesn't fully address the privacy concern.
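The idea of tokenizing an identifier before training can be sketched locally with a keyed hash. This is only a toy illustration of the principle; in practice you would use Cloud DLP's managed de-identification transforms rather than hand-rolled code, and the key below would live in a secret manager, not in source:

```python
import hashlib
import hmac

SECRET_KEY = b"rotate-me"  # illustrative key; never hard-code in production

def tokenize(value: str) -> str:
    """Replace a PII value with a keyed, irreversible token."""
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]

# Hypothetical record: keep the coarse, predictive field (zip),
# tokenize the direct identifier (ssn).
record = {"ssn": "123-45-6789", "zip": "94103", "churned": 0}
safe = {**record, "ssn": tokenize(record["ssn"])}
```

Keyed tokenization keeps the column joinable (the same input always maps to the same token) without exposing the raw value to the training pipeline.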
Question 7 of 10
You are using AutoML Tables to predict equipment failure. Your dataset has 100 features but you suspect many are irrelevant. How does AutoML Tables handle feature selection?
Explanation
AutoML Tables automatically performs feature selection and provides feature importance scores in the evaluation results. This helps identify which features contribute most to predictions. You can review these insights and optionally remove low-importance features for model simplification. Option A is incorrect as AutoML handles feature importance internally. Option C is incorrect because AutoML doesn't require manual PCA. Option D is incorrect because AutoML automatically handles feature normalization and scaling.
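The optional pruning step can be sketched as a simple filter over the importance scores AutoML surfaces. The feature names, scores, and threshold below are all hypothetical, chosen only to illustrate the pattern:

```python
# Hypothetical feature-importance scores as might appear in an
# AutoML Tables evaluation (values are invented for illustration).
importance = {
    "vibration_rms": 0.41,
    "temperature_c": 0.33,
    "runtime_hours": 0.19,
    "machine_color": 0.01,
    "operator_id": 0.006,
}

# Optionally drop features below a chosen threshold to simplify the
# model (the threshold is an assumption, not an AutoML default).
THRESHOLD = 0.02
kept = [name for name, score in importance.items() if score >= THRESHOLD]
```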
Question 8 of 10
When creating a forecasting model using AutoML for time-series data predicting monthly sales, you have 5 years of historical data. What is the recommended minimum data split configuration?
Explanation
For time-series forecasting, chronological splitting is essential to prevent data leakage. Training should use older historical data, and validation/testing should use more recent data to simulate real-world prediction scenarios. Option A (random split) causes data leakage as future information would be in the training set. Option B is backwards and would train on recent data to predict the past. Option D (random k-fold) also causes temporal leakage and doesn't respect the time-series nature of the data.
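A chronological 80/10/10 split can be sketched as follows (the exact proportions are a common convention, not a requirement; AutoML also lets you assign splits via a time column):

```python
import pandas as pd

# Five years of monthly sales; values are illustrative.
dates = pd.date_range("2019-01-01", periods=60, freq="MS")
df = pd.DataFrame({"month": dates, "sales": range(60)})

# Chronological split: oldest data trains, most recent data tests,
# so no future information leaks into training.
df = df.sort_values("month")
n = len(df)
train = df.iloc[: int(n * 0.8)]
val = df.iloc[int(n * 0.8) : int(n * 0.9)]
test = df.iloc[int(n * 0.9) :]
```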
Question 9 of 10
You're preparing a dataset for AutoML Video Classification to identify different types of customer service interactions. Your videos are 2 hours long each. What preprocessing step should you take?
Explanation
Segmenting long videos into shorter, meaningful clips improves training efficiency and model accuracy. Each clip should represent a single class or interaction type. This helps the model focus on relevant content and reduces processing costs. Option A is inefficient and may confuse the model with mixed content. Option C is unnecessary as AutoML handles frame rate normalization. Option D loses temporal information that's crucial for understanding video content and interaction context.
Question 10 of 10
After training an AutoML Tables model, you notice the model achieves 99% accuracy on the training set but only 65% on the test set. What is the MOST likely issue and solution?
Explanation
The large gap between training (99%) and test (65%) accuracy is a classic sign of overfitting, where the model memorizes training data rather than learning generalizable patterns. Solutions include early stopping, regularization, or collecting more diverse training data. In AutoML, you can also try reducing training time or using a simpler model. Option A would likely worsen overfitting. Option C might help evaluation but doesn't address the core overfitting problem. Option D (underfitting) is incorrect because underfitting shows poor performance on both training and test sets.
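The train/test gap heuristic described above can be expressed as a tiny diagnostic function. The 0.10 gap and 0.70 floor are illustrative rules of thumb, not AutoML settings:

```python
def diagnose(train_acc: float, test_acc: float,
             gap_threshold: float = 0.10, floor: float = 0.70) -> str:
    """Classify a train/test accuracy pair (thresholds are illustrative)."""
    if train_acc - test_acc > gap_threshold:
        return "overfitting"   # memorized training data, poor generalization
    if train_acc < floor and test_acc < floor:
        return "underfitting"  # poor on both sets: model too simple
    return "ok"
```

Applied to the question's numbers, `diagnose(0.99, 0.65)` flags overfitting, matching the explanation.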