AWS Certified Machine Learning – Specialty Dump 04

A Machine Learning Specialist is training a model to identify the make and model of vehicles in images. The Specialist wants to use transfer learning and an existing model trained on images of general objects. The Specialist has collected a large custom dataset of pictures of different vehicle makes and models.
What should the Specialist do to initialize the model to re-train it with the custom data?

  • A. Initialize the model with random weights in all layers including the last fully connected layer.
  • B. Initialize the model with pre-trained weights in all layers and replace the last fully connected layer.
  • C. Initialize the model with random weights in all layers and replace the last fully connected layer.
  • D. Initialize the model with pre-trained weights in all layers including the last fully connected layer.

Note: This is a classic question where you can deduce the answer without much prior knowledge. The question is about “transfer” learning and “using an existing model”, so A and C are out: random weights would throw away everything the existing model has learned. D means using the original model as-is, including its output layer for general object classes, which is not correct for vehicle makes and models. That leaves B as the soundest answer. In reality, with a large dataset you would probably fine-tune more layers than just the replaced output layer. A sketch follows after the link below.

https://towardsdatascience.com/transfer-learning-from-pre-trained-models-f2393f124751
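
As a quick illustration of option B, here is a minimal Keras sketch (assuming a MobileNetV2 backbone; any ImageNet-pretrained base works the same way): keep the pre-trained weights in all layers and replace only the fully connected head.

```python
import tensorflow as tf

# Load a base model with pre-trained ImageNet weights, dropping its
# original fully connected head (include_top=False).
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # optionally freeze the pre-trained layers at first

# Replace the last fully connected layer with a new head sized for the
# vehicle make/model classes (196 is a hypothetical class count).
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(196, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, epochs=...)  # re-train on the custom vehicle dataset
```

Unfreezing more of the base layers afterwards (fine-tuning) is what the note above suggests when the custom dataset is large.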


An office security agency conducted a successful pilot using 100 cameras installed at key locations within the main office. Images from the cameras were uploaded to Amazon S3 and tagged using Amazon Rekognition, and the results were stored in Amazon ES. The agency is now looking to expand the pilot into a full production system using thousands of video cameras in its office locations globally. The goal is to identify activities performed by non-employees in real time.
Which solution should the agency consider?

  • A. Use a proxy server at each local office and for each camera, and stream the RTSP feed to a unique Amazon Kinesis Video Streams video stream. On each stream, use Amazon Rekognition Video and create a stream processor to detect faces from a collection of known employees, and alert when non-employees are detected.
  • B. Use a proxy server at each local office and for each camera, and stream the RTSP feed to a unique Amazon Kinesis Video Streams video stream. On each stream, use Amazon Rekognition Image to detect faces from a collection of known employees and alert when non-employees are detected.
  • C. Install AWS DeepLens cameras and use the DeepLens_Kinesis_Video module to stream video to Amazon Kinesis Video Streams for each camera. On each stream, use Amazon Rekognition Video and create a stream processor to detect faces from a collection on each stream, and alert when non-employees are detected.
  • D. Install AWS DeepLens cameras and use the DeepLens_Kinesis_Video module to stream video to Amazon Kinesis Video Streams for each camera. On each stream, run an AWS Lambda function to capture image fragments and then call Amazon Rekognition Image to detect faces from a collection of known employees, and alert when non-employees are detected.

Note: This is the kind of tricky question Amazon promised it wouldn’t throw at us. Yes, DeepLens is not a feasible solution for production. Yes, the original pilot was done with images. However, A and B say to use a proxy for each camera, which is grounds for immediate exclusion: you are not going to deploy thousands of proxies and send their video to thousands of streams. So this is really a “commercial” question promoting DeepLens.

Of the two options left, D fits the original scenario of using images to identify employees, but then there is no way to detect activities. Rekognition Video, via a stream processor, is the one that can detect activity in video, so C is the closest answer (see the sketch below).

Still, this is a BULLSHIT question because: 1) the pilot and the production proposal are in no way connected; 2) DeepLens is proposed for production, which is against its current terms and status; 3) every option mentions only detecting non-employee faces, not their activities, which makes all of them ambiguous. No one can be 100% sure of the right answer here; you are left with the closest guess.
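
For reference, here is a minimal boto3 sketch of the Rekognition Video stream processor that options A and C describe; the stream ARNs, role ARN, and collection name are all placeholders.

```python
import boto3

rekognition = boto3.client("rekognition")

# Create a stream processor that searches faces in a Kinesis video stream
# against a collection of known employees.
rekognition.create_stream_processor(
    Name="office-cam-01-processor",
    Input={"KinesisVideoStream": {
        "Arn": "arn:aws:kinesisvideo:us-east-1:111122223333:stream/cam-01"}},
    Output={"KinesisDataStream": {
        "Arn": "arn:aws:kinesis:us-east-1:111122223333:stream/face-matches"}},
    RoleArn="arn:aws:iam::111122223333:role/RekognitionStreamRole",
    Settings={"FaceSearch": {
        "CollectionId": "known-employees",
        "FaceMatchThreshold": 85.0,
    }},
)
rekognition.start_stream_processor(Name="office-cam-01-processor")
# A consumer of the output data stream can then alert whenever a detected
# face has no match in the employee collection.
```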


A Marketing Manager at a pet insurance company plans to launch a targeted marketing campaign on social media to acquire new customers. Currently, the company has the following data in Amazon Aurora:

  • Profiles for all past and existing customers
  • Profiles for all past and existing insured pets
  • Policy-level information
  • Premiums received
  • Claims paid

What steps should be taken to implement a machine learning model to identify potential new customers on social media?

  • A. Use regression on customer profile data to understand key characteristics of consumer segments. Find similar profiles on social media
  • B. Use clustering on customer profile data to understand key characteristics of consumer segments. Find similar profiles on social media
  • C. Use a recommendation engine on customer profile data to understand key characteristics of consumer segments. Find similar profiles on social media.
  • D. Use a decision tree classifier engine on customer profile data to understand key characteristics of consumer segments. Find similar profiles on social media.

A = wrong; regression fits data to a line or formula to predict a value, which is not the task here. C = wrong; a recommendation engine recommends known items to known users, such as a product (that you created and have data on) to a user (registered at your website, whose data you have). In this case you have no data on the social media users, and you may not have enough social media activity data for your existing customers. D = wrong; a decision tree classifier, like regression, is supervised and needs labeled targets, and there are no labels for the segments you are trying to discover.

B is the answer to this kind of problem: group those things/users/data by their attributes, with no labels needed. Think of astrology :). A minimal clustering sketch follows below.
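
A minimal scikit-learn sketch of option B, assuming the Aurora data has been exported to a CSV; the column names are hypothetical.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

profiles = pd.read_csv("customer_profiles.csv")  # exported from Aurora
features = profiles[["age", "num_pets", "annual_premium", "claims_paid"]]

# Scale the features so no single attribute dominates the distance metric.
X = StandardScaler().fit_transform(features)

# Group customers into segments by their attributes (5 is an arbitrary choice).
profiles["segment"] = KMeans(n_clusters=5, n_init=10,
                             random_state=42).fit_predict(X)

# Per-segment averages reveal the key characteristics of each consumer
# segment, which can then be matched against profiles on social media.
print(profiles.groupby("segment")[features.columns].mean())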


A manufacturing company has a large set of labeled historical sales data. The manufacturer would like to predict how many units of a particular part should be produced each quarter.
Which machine learning approach should be used to solve this problem?

  • A. Logistic regression
  • B. Random Cut Forest (RCF)
  • C. Principal component analysis (PCA)
  • D. Linear regression

A = wrong; “logistic” is a confusing name, but logistic regression makes a yes/no (binary) classification decision, not a quantity prediction. B = wrong; RCF detects anomalies in data. C = wrong; PCA reduces data dimensionality. That leaves D: linear regression predicts a continuous value such as the number of units to produce.
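
A minimal sketch of the correct answer, D, with scikit-learn; the feature columns are hypothetical stand-ins for the labeled historical sales data.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

sales = pd.read_csv("historical_sales.csv")
X = sales[["quarter_index", "order_backlog", "prior_quarter_units"]]
y = sales["units_produced"]  # a continuous target, hence regression

model = LinearRegression().fit(X, y)

# Predict units to produce for the next quarter (hypothetical values).
print(model.predict([[41, 1200, 9500]]))
```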


A financial services company is building a robust serverless data lake on Amazon S3. The data lake should be flexible and meet the following requirements:
– Support querying old and new data on Amazon S3 through Amazon Athena and Amazon Redshift Spectrum.
– Support event-driven ETL pipelines
– Provide a quick and easy way to understand metadata
Which approach meets these requirements?

  • A. Use an AWS Glue crawler to crawl S3 data, an AWS Lambda function to trigger an AWS Glue ETL job, and an AWS Glue Data catalog to search and discover metadata.
  • B. Use an AWS Glue crawler to crawl S3 data, an AWS Lambda function to trigger an AWS Batch job, and an external Apache Hive metastore to search and discover metadata.
  • C. Use an AWS Glue crawler to crawl S3 data, an Amazon CloudWatch alarm to trigger an AWS Batch job, and an AWS Glue Data Catalog to search and discover metadata.
  • D. Use an AWS Glue crawler to crawl S3 data, an Amazon CloudWatch alarm to trigger an AWS Glue ETL job, and an external Apache Hive metastore to search and discover metadata.

B and D = wrong; an external Apache Hive metastore is not serverless. C = wrong; AWS Batch is not serverless either, nor is it an ETL service. A keeps everything serverless: a Glue crawler, a Lambda-triggered Glue ETL job, and the Glue Data Catalog for metadata. This is a free-score question.
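
A minimal sketch of the event-driven piece of option A: a Lambda handler, triggered by an S3 event, that starts an AWS Glue ETL job (the job name and argument key are placeholders).

```python
import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    # Triggered by an S3 object-created event; start the Glue ETL job
    # for each newly arrived object.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        glue.start_job_run(
            JobName="datalake-etl-job",  # placeholder job name
            Arguments={"--source_path": f"s3://{bucket}/{key}"},
        )
    return {"status": "started"}
```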


A company’s Machine Learning Specialist needs to improve the training speed of a time-series forecasting model using TensorFlow. The training is currently implemented on a single-GPU machine and takes approximately 23 hours to complete. The training needs to be run daily.
The model accuracy is acceptable, but the company anticipates a continuous increase in the size of the training data and a need to update the model on an hourly, rather than a daily, basis. The company also wants to minimize coding effort and infrastructure changes.
What should the Machine Learning Specialist do to the training solution to allow it to scale for future demand?

  • A. Do not change the TensorFlow code. Change the machine to one with a more powerful GPU to speed up the training.
  • B. Change the TensorFlow code to implement a Horovod distributed framework supported by Amazon SageMaker. Parallelize the training to as many machines as needed to achieve the business goals.
  • C. Switch to using a built-in AWS SageMaker DeepAR model. Parallelize the training to as many machines as needed to achieve the business goals.
  • D. Move the training to Amazon EMR and distribute the workload to as many machines as needed to achieve the business goals.

A = wrong; going from 23 hours down to under 1 hour would need an enormous boost in GPU power, and even if a single GPU could manage it, it is not future-proof as the data keeps growing. C = wrong; switching models requires a lot of coding. D = wrong; training is better done with SageMaker than EMR, and the move would require a lot of coding and infrastructure change. B’s Horovod change is comparatively small (see the sketch below).
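
A minimal sketch of the Horovod change option B implies, inside an existing TensorFlow/Keras training script; SageMaker launches one copy per GPU across the cluster when the estimator is configured for MPI/Horovod distribution.

```python
import horovod.tensorflow.keras as hvd
import tensorflow as tf

hvd.init()  # one process per GPU across all training instances

# Pin each process to its own GPU.
gpus = tf.config.list_physical_devices("GPU")
if gpus:
    tf.config.set_visible_devices(gpus[hvd.local_rank()], "GPU")

# Scale the learning rate by the worker count and wrap the optimizer so
# gradients are averaged across machines at each step.
opt = hvd.DistributedOptimizer(tf.keras.optimizers.Adam(0.001 * hvd.size()))

# The rest of the existing training script stays unchanged apart from:
# model.compile(optimizer=opt, ...)
# model.fit(..., callbacks=[hvd.callbacks.BroadcastGlobalVariablesCallback(0)])
```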


Which of the following metrics should a Machine Learning Specialist generally use to compare/evaluate machine learning classification models against each other?

  • A. Recall
  • B. Misclassification rate
  • C. Mean absolute percentage error (MAPE)
  • D. Area Under the ROC Curve (AUC)

B = wrong; the misclassification rate alone is ambiguous (it hides class imbalance). C = wrong; MAPE is for forecasting. This is a confusing question. If we take the “generally” in the question to mean “both binary and multiclass classifiers”, then Recall fits best, as AUC for multiclass is not trivial. Otherwise I would pick AUC, as it is much more comprehensive, but it is mostly for binary classifiers.
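
For illustration, a small scikit-learn sketch of both candidate metrics on hypothetical binary labels and scores:

```python
from sklearn.metrics import recall_score, roc_auc_score

y_true  = [0, 1, 1, 0, 1, 0, 1, 1]                   # hypothetical labels
y_pred  = [0, 1, 0, 0, 1, 1, 1, 1]                   # hard labels for recall
y_score = [0.2, 0.9, 0.4, 0.1, 0.8, 0.6, 0.7, 0.95]  # probabilities for AUC

print("recall:", recall_score(y_true, y_pred))
# AUC is threshold-independent, which is why it compares binary
# classifiers well; for multiclass it needs an averaging strategy.
print("AUC:", roc_auc_score(y_true, y_score))
```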
