Enterprise Level Machine Learning Tools & Tips: Part 3

Part 3: Relevant GCP & AWS stack.

This article complements part 2. In the previous part of the series, we covered the necessary steps and precautions for designing, developing, and maintaining an ML system. Now, I'll introduce some relevant GCP and AWS services for building a cloud-based ML system:

GCP: Vertex AI

Vertex AI is an end-to-end MLOps (DevOps for ML) platform that brings together many Google Cloud services. Through its UI or API, you can analyze, design, develop, and monitor a machine-learning solution using prebuilt or custom models.

Here are some notable services that I have personally used and recommend:

Vertex AI Data Labeling

Required Technical Expertise Level: None.

You can outsource the labeling of data to Google Cloud operators. You provide them with the datasets and instructions for the labeling process.
When they are done, you can use the annotation set to train a Vertex AI model or export the labeled data items to use in another machine-learning environment.
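
As a rough sketch of how the hand-off could start with the Vertex AI Python SDK (the project, bucket paths, and display name below are placeholders, and the import schema depends on your labeling task), you would first register the unlabeled items as a dataset:

```python
from google.cloud import aiplatform

# Placeholder project and region -- adjust to your environment.
aiplatform.init(project="my-project", location="us-central1")

# Register the images the operators will annotate; the schema here assumes a
# single-label classification task (use a different schema for other tasks).
dataset = aiplatform.ImageDataset.create(
    display_name="products-to-label",
    gcs_source="gs://my-bucket/unlabeled_images.jsonl",
    import_schema_uri=aiplatform.schema.dataset.ioformat.image.single_label_classification,
)

# The resource name identifies this dataset when you request a labeling job.
print(dataset.resource_name)
```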

For information about pricing, refer to the Data Labeling pricing documentation.

ML Development Phase: 1.2. Data Preparation

AutoML

Required Technical Expertise Level:

  • Basic Data Engineering

  • Basic Machine Learning Understanding

AutoML lets you create and train a model with minimal technical effort. You can use it to quickly prototype models and explore new datasets before investing in development. For example, you can use it to learn which features matter most for a given dataset.

Note that AutoML covers only a limited set of use cases, which depend on the data type you're working with. Supported functionalities include:

  1. Image data: Classification, object detection.

  2. Video data: Action recognition, classification, object tracking.

  3. Text data: Classification, entity extraction, sentiment analysis.

  4. Tabular data: Classification/regression, forecasting.
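
To make this concrete, here is a minimal sketch of an AutoML tabular classification run with the Vertex AI Python SDK; the project, data source, and column names are hypothetical:

```python
from google.cloud import aiplatform

# Placeholder project, region, and data source.
aiplatform.init(project="my-project", location="us-central1")

# Register a tabular dataset stored in Cloud Storage (BigQuery also works).
dataset = aiplatform.TabularDataset.create(
    display_name="churn-dataset",
    gcs_source="gs://my-bucket/churn.csv",
)

# Let AutoML handle architecture search and feature preprocessing.
job = aiplatform.AutoMLTabularTrainingJob(
    display_name="churn-automl",
    optimization_prediction_type="classification",
)

model = job.run(
    dataset=dataset,
    target_column="churned",          # hypothetical label column
    budget_milli_node_hours=1000,     # roughly one node hour of training budget
)
print(model.resource_name)
```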

ML Development Phase:

1.1. Use-Case Definition
1.2. Data Preparation
1.3. Model Design
2.2. Building Model
2.3. Iterative Tuning

Vertex AI Workbench

Required Technical Expertise Level:

  • Expert Data Engineering

  • Expert Machine Learning Understanding

Vertex AI Workbench covers the whole workflow, from experimentation to deployment to managing and monitoring models. It is a Jupyter-notebook-based, fully managed, scalable, enterprise-ready compute infrastructure with security controls and user-management capabilities.
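
From a Workbench notebook you typically drive the rest of Vertex AI through the Python SDK. Here is a minimal sketch of launching a custom training job on managed compute; the training script, container image, and bucket are placeholders:

```python
from google.cloud import aiplatform

aiplatform.init(
    project="my-project",
    location="us-central1",
    staging_bucket="gs://my-bucket/staging",
)

# Package a local training script and run it on managed, scalable compute.
job = aiplatform.CustomTrainingJob(
    display_name="workbench-custom-train",
    script_path="train.py",  # hypothetical training script in the notebook
    container_uri="us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-11.py310:latest",  # example prebuilt image
    requirements=["pandas"],
)

job.run(replica_count=1, machine_type="n1-standard-4")
```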

ML Development Phase:

2.1. Creating ML pipeline
2.2. Building Model
2.3. Iterative Tuning
3.2. Setup Deployment Flow

Vertex AI Vizier

Required Technical Expertise Level:

  • Basic Data Engineering

  • Basic Machine Learning Understanding

For models that have many different hyperparameters, tuning them manually can be difficult and time-consuming. Vizier is a black-box optimization service that helps you tune hyperparameters in complex machine-learning (ML) models, regardless of the form of the objective function.

Tips
  • Since tuning is trial-based, use Vizier only when optimization by other means has proven difficult or when a clear objective function is not at hand.
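
Vizier's search strategy can also be used through a Vertex AI hyperparameter tuning job. A minimal sketch follows; the trainer image, metric name, and parameter ranges are hypothetical:

```python
from google.cloud import aiplatform
from google.cloud.aiplatform import hyperparameter_tuning as hpt

aiplatform.init(
    project="my-project",
    location="us-central1",
    staging_bucket="gs://my-bucket/staging",
)

# The trainable: a custom job whose training code reports "accuracy" back
# (e.g. via the cloudml-hypertune helper) at the end of every trial.
custom_job = aiplatform.CustomJob(
    display_name="vizier-trainable",
    worker_pool_specs=[{
        "machine_spec": {"machine_type": "n1-standard-4"},
        "replica_count": 1,
        "container_spec": {"image_uri": "gcr.io/my-project/trainer:latest"},  # hypothetical image
    }],
)

# The service searches this space and decides which trials to suggest next.
tuning_job = aiplatform.HyperparameterTuningJob(
    display_name="vizier-tuning",
    custom_job=custom_job,
    metric_spec={"accuracy": "maximize"},
    parameter_spec={
        "learning_rate": hpt.DoubleParameterSpec(min=1e-4, max=1e-1, scale="log"),
        "batch_size": hpt.DiscreteParameterSpec(values=[16, 32, 64], scale=None),
    },
    max_trial_count=20,
    parallel_trial_count=4,
)
tuning_job.run()
```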

ML Development Phase:

2.3. Iterative Tuning
3.1. Evaluation Dashboards

Vertex Explainable AI

Required Technical Expertise Level:

  • Basic Data Engineering

  • Expert Machine Learning Understanding

Vertex Explainable AI offers Feature-based and Example-based explanations to provide a better understanding of model decision-making. Knowing how a model behaves, and how it is influenced by its training dataset, gives anyone who builds or uses ML new abilities to improve models, build confidence in their predictions, and understand when and why things go awry.

Explainable AI can highlight the significant features (regions) in image datasets and compare examples from the training and evaluation datasets to give a thorough understanding of the model's predictions and accuracy.
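
As a sketch of the feature-based side (the endpoint ID and instance are hypothetical, and the model behind the endpoint must have been uploaded with an explanation spec), you can request attributions from a deployed model:

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Assumes the deployed model was configured with explanation metadata/parameters.
endpoint = aiplatform.Endpoint(
    "projects/my-project/locations/us-central1/endpoints/1234567890"
)

response = endpoint.explain(instances=[{"feature_a": 1.2, "feature_b": 0.4}])

for explanation in response.explanations:
    for attribution in explanation.attributions:
        # Per-feature contribution to the prediction, relative to the baseline.
        print(attribution.feature_attributions)
```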

ML Development Phase:

2.3. Iterative Tuning
3.1. Evaluation Dashboards

Vertex AI Pipelines

Required Technical Expertise Level:

  • Basic Data Engineering

  • Moderate Machine Learning Understanding

You can create scalable pipelines that manage the data flow: read data from your specified sources, apply the models you wish, monitor the pipeline node by node (step by step), and get an overview of the ML system's performance.

This is a very powerful tool that allows ML engineers to design and manage their pipelines at a higher level without worrying about the underlying infrastructure or required computation resources. You can begin designing a custom pipeline without delving deep into coding.

Note: The pipeline's metadata can be accessed directly; you don't need to go through Vertex ML Metadata.
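
For illustration, here is a minimal sketch of a two-step pipeline with the Kubeflow Pipelines (KFP) SDK, which Vertex AI Pipelines executes; the component bodies, project, and bucket are placeholders:

```python
from kfp import dsl, compiler
from google.cloud import aiplatform


@dsl.component(base_image="python:3.10")
def prepare_data(message: str) -> str:
    # Stand-in for a real ingestion/preprocessing step.
    return f"prepared: {message}"


@dsl.component(base_image="python:3.10")
def train_model(data: str) -> str:
    # Stand-in for a real training step.
    return f"model trained on [{data}]"


@dsl.pipeline(name="demo-pipeline")
def demo_pipeline(message: str = "hello"):
    data_task = prepare_data(message=message)
    train_model(data=data_task.output)


# Compile the DAG definition and submit it to Vertex AI Pipelines.
compiler.Compiler().compile(pipeline_func=demo_pipeline, package_path="demo_pipeline.json")

aiplatform.init(project="my-project", location="us-central1")
job = aiplatform.PipelineJob(
    display_name="demo-pipeline",
    template_path="demo_pipeline.json",
    pipeline_root="gs://my-bucket/pipeline-root",
)
job.run()
```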

ML Development Phase:

2.1. Creating ML pipeline
2.2. Building Model
2.3. Iterative Tuning
3.2. Setup Deployment Flow

Vertex AI Tensorboard

Required Technical Expertise Level:

  • Moderate Data Engineering

  • Moderate Machine Learning Understanding.

TensorBoard (TB) is a Google open-source project for machine-learning experiment visualization. Vertex AI TensorBoard is an enterprise-ready, managed version of TensorBoard.

You can create experiments, visualize metrics, and archive the logs of each experiment, such as its inputs, outputs, and metrics. You can also visualize histograms of weights, biases, or other tensors as they change over time.

Tip: This is the go-to method to quickly assess all of the model's training metrics.
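
For illustration, a minimal sketch of the kind of logs TensorBoard (and the managed Vertex AI variant) visualizes; the log path and metric values are placeholders:

```python
import tensorflow as tf

# Local or GCS log directory; Vertex AI TensorBoard can read from a GCS path.
writer = tf.summary.create_file_writer("gs://my-bucket/tb-logs/run-1")

with writer.as_default():
    for step in range(100):
        fake_loss = 1.0 / (step + 1)  # placeholder metric value
        tf.summary.scalar("loss", fake_loss, step=step)
        tf.summary.histogram("weights", tf.random.normal([128]), step=step)

writer.flush()
```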

ML Development Phase:

2.3. Iterative Tuning
3.1. Evaluation Dashboards
3.2. Setup Deployment Flow
3.3. Planning Ahead

GCP: Cloud Composer

Cloud Composer is a fully managed workflow orchestration service, enabling you to create, schedule, monitor, and manage workflows. Cloud Composer is built on the popular Apache Airflow open-source project and operates using the Python programming language.

Cloud Composer offers a managed Apache Airflow solution without any installation steps. You can create Airflow environments quickly and use Airflow-native tools, such as the powerful Airflow web interface and command-line tools, so you can focus on your workflows instead of your infrastructure.

In data analytics, a workflow represents a series of tasks for ingesting, transforming, analyzing, or utilizing data. In Airflow, workflows are created using DAGs, or "Directed Acyclic Graphs".

A DAG is a collection of tasks that you want to schedule and run, organized in a way that reflects their relationships and dependencies. DAGs are created in Python scripts, which define the DAG structure (tasks and their dependencies) using code.
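
For illustration, here is a minimal DAG sketch using Airflow's Python API; the task logic and schedule are placeholders:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    print("ingesting data...")       # placeholder task logic


def transform():
    print("transforming data...")    # placeholder task logic


with DAG(
    dag_id="example_etl",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)

    # Dependencies form the directed acyclic graph: extract runs before transform.
    extract_task >> transform_task
```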

ML Development Phase:

2.1. Creating ML pipeline
2.2. Building Model
3.2. Setup Deployment Flow

Tip

Cloud Composer is a good environment to orchestrate your MLOps assets and structure your ML pipelines. However, it is not a cost-effective solution, since it is an always-on service: you are charged from the moment you start using it, even when there is no active request/response workflow.

AWS: Amazon Personalize

Amazon Personalize is a fully managed ML service for building real-time personalized recommendations. The official documentation suggests the following use cases:

  • Optimize recommendations: For swifter deployment, automate creating and maintaining personalized recommendations for industries such as retail, media, and entertainment.

  • Target customers more accurately: Apply ML to run more effective prospecting campaigns by segmenting users based on preferences such as product, category, and brand.

  • Maximize your data's value: Unlock information trapped in product descriptions, reviews, or other unstructured text to generate more relevant recommendations.

  • Promote items using business rules: Customize your recommendations by promoting specific items based on business goals, while still ensuring the highest relevance possible.
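
As a rough sketch of the consumption side with boto3 (the region, campaign ARN, and user ID are placeholders, and it assumes a dataset group, solution, and campaign have already been set up):

```python
import boto3

# Runtime client used at inference time, after a campaign has been deployed.
personalize_runtime = boto3.client("personalize-runtime", region_name="us-east-1")

response = personalize_runtime.get_recommendations(
    campaignArn="arn:aws:personalize:us-east-1:123456789012:campaign/my-campaign",  # placeholder ARN
    userId="user-42",
    numResults=10,
)

for item in response["itemList"]:
    print(item["itemId"], item.get("score"))
```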

Personal Take

One of the strong points of this service is its easy-to-use text-based analysis.
However, I only suggest this service for businesses that do not have a strong technical background in machine learning and data analysis. The system can be a great asset to your existing KPIs, but with the extremely limited customization Amazon offers as of this writing, it cannot act as a strong foundation for an IT company's future.

ML Development Phase:

1.1. Use-Case Definition
1.2. Data Preparation
1.3. Model Design
2.2. Building Model
2.3. Iterative Tuning


Final Word

In this part, I only covered and reviewed services that I have hands-on experience with in a production environment. I hope it proves useful when you design a cloud-based ML system.

If you find this interesting, have a comment, or think a section needs more explanation, reach out to me and I'll do my best to be responsive.