The best approach for serving models depends on your deployment environment and expected workload. At a company, you are typically constrained to a specific cloud platform, because your model service may need to interface with other components in that cloud, such as a database or cloud storage. Within AWS there are multiple options for hosting containers, while on GCP the standard choice is GKE. The main question to ask is whether it is more cost-effective to serve your model using serverless function technologies or elastic container technologies.
The correct answer depends on the volume of traffic you need to handle, the amount of latency that is tolerable for end-users, and the complexity of models that you need to host. Containerized solutions are great for serving complex models and making sure that you can meet latency requirements, but they may require a bit more DevOps overhead versus serverless functions.
Container orchestration systems are responsible for managing the life cycles of containers in a cluster. They provide services including provisioning, scaling, failover, load balancing, and service discovery between containers. The general trend has been toward Kubernetes for this functionality, an open-source platform originally designed by Google.
The Dockerfile uses the following commands:
* The FROM command identifies a base image to use. The ubuntu image provides a Linux environment that supports the apt-get command.
* The MAINTAINER command adds the name of the image maintainer to the metadata associated with the image.
* The RUN command installs Python, sets up a symbolic link, and installs Flask and the other Python libraries. For containers with many Python libraries, it's also possible to use a requirements.txt file.
* The COPY and ADD commands insert our script into the image and place the files and folders in the root directory.
* The ENTRYPOINT command (alternatively CMD) specifies the command and arguments to run when the container starts.
FROM ubuntu:latest
MAINTAINER Linh H
RUN apt-get update \
&& apt-get install -y python3.9 python3-pip \
&& cd /usr/local/bin \
&& ln -s /usr/bin/python3 python \
&& pip3 install flask==2.0 pandas==1.4 mlflow==2.1 scikit-learn==1.0.2 statsmodels==0.13.2 gunicorn==20.1.0
COPY echo_docker.py echo.py
ADD models models
ADD db db
ENTRYPOINT ["gunicorn","echo:app", "-b", "0.0.0.0:80"]
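The ENTRYPOINT above assumes that echo.py (copied from echo_docker.py) exposes a Flask object named app. The actual service loads the models copied into the image; as a minimal sketch (the placeholder prediction logic below is hypothetical, not the book's real model code), the module might look like:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/predict")
def predict():
    # the "range" query parameter selects how many predictions to return;
    # a real service would load a model from the models/ folder instead
    n = int(request.args.get("range", 1))
    predictions = [0.0] * n  # placeholder values, not a real model
    return jsonify({"predictions": predictions})

if __name__ == "__main__":
    # only for local debugging; in the container, gunicorn imports echo:app
    app.run(host="0.0.0.0", port=80)
```

gunicorn imports this module as echo:app, so the app.run block is only used when running the script directly.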
Let's build an image named echo_service from the current folder. You need the Docker daemon running in the background.
$ docker image build -t "echo_service" .
After successfully building the image, we can run it as a container to serve the predictive model on port 8080.
$ sudo docker run -d -p 8080:80 echo_service
The demo code files are located in the demo-code\echo_service folder.
Now we want to deploy the image onto GCP using the Kubernetes service. The first step is to save your image to a Docker registry that can interface with the orchestration system. The GCP version of this registry service is called Container Registry.
Make sure we turn on the Google Container Registry API and Kubernetes Engine API by enabling them in the GCP console.
We need the GCP credentials JSON file that we set up in the Step 1 chapter. Then we log in to the GCP Container Registry at https://us.gcr.io .
$ cat newacc_gcp_credential.json | sudo docker login -u _json_key --password-stdin https://us.gcr.io
$ sudo docker tag echo_service us.gcr.io/scalable-model-piplines/echo_service
$ sudo docker push us.gcr.io/scalable-model-piplines/echo_service
Log in on Windows OS:
$ type newacc_gcp_credential.json | docker login -u _json_key --password-stdin https://us.gcr.io
On the Google Kubernetes Engine main page:
When GKE runs the Docker image, it starts a cluster, and we can set up Services, which are sets of Pods with a network endpoint that can be used for discovery and load balancing.

In the Workloads menu, select echo-gke, then open the ACTIONS menu and choose Expose to expose the cluster to external traffic.
Now we choose Load balancer as the Service type, which maps port 80 to our gunicorn server.
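The same exposure can also be done from the command line with kubectl (a sketch, assuming the deployment is named echo-gke and the cluster credentials are already configured):

```shell
kubectl expose deployment echo-gke --type=LoadBalancer --port=80 --target-port=80
```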

Calling http://34.70.212.160/predict?range=1 returns the predictions as JSON.
import requests
result = requests.get("http://34.70.212.160/predict?range=1")
print(result.json())
Autoscale
For the Kubernetes cluster, we can use kubectl to make our deployment scale automatically when a condition is met. In the example below, when average CPU utilization exceeds 80%, Kubernetes adds more replicas until it reaches --max=5 replicas, and keeps at least --min=2.
kubectl autoscale deployment echo-gke --cpu-percent=80 --min=2 --max=5
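The same autoscaling policy can also be expressed declaratively as a HorizontalPodAutoscaler manifest (a sketch assuming the deployment is named echo-gke), which can be applied with kubectl apply -f:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: echo-gke
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: echo-gke
  minReplicas: 2
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
```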
In GKE, there is also an option to turn on autoscaling from the UI:
Go to the GKE UI and select Workloads in the left menu.
Select the deployment of the workload; in this case, it is prediction.
On the top menu, click ACTIONS and choose Autoscale.


Traffic-based autoscaling: Besides Pod autoscaling, we can also scale up the system based on traffic utilization signals. Traffic-based autoscaling is enabled by the Gateway controller.
In this example, I put the trained models inside the container. In the next chapter, we will put these trained models on external storage (i.e., Google Cloud Storage). Whenever the predictor is called, it loads the trained model and runs the prediction. Hence, we only need to set up a container once and can focus on developing model pipelines.
Batch model pipelines automatically perform a sequence of tasks in order to train and store results for a Machine Learning model. In a batch process, we perform a set of operations that store trained models and predictions that are later served by different applications.
However, when working with pipelines, any issues should be resolved quickly. Because there are many tasks and components involved in a pipeline, issues can occur at any part of the system. Hence, we need a system in place that can send alerts to the team that owns the pipeline and rerun portions of the model pipeline to resolve any issues.
Workflow tools provide a solution for managing these types of problems in model pipelines and are responsible for scheduling tasks, managing dependencies between tasks, retrying failed tasks, and alerting the owning team when failures occur.
There are several open-source tools for building workflows, including Airflow, Luigi, MLflow, and Pentaho Kettle. In this project, we focus on Apache Airflow, on which Google Cloud Composer is built.
There are typically two types of batch model pipelines: pipelines that persist the trained model for later serving, and pipelines that persist the predictions themselves.
There are a few situations where workflow tools provide benefits over using cron directly:
One of the key benefits is the ability to handle DAG configuration as code, which enables code reviews and version control for workflows.
Google Cloud Dataflow provides a useful framework for scaling up sklearn models to massive datasets. Instead of fitting all the input data into a data frame, we can score each record individually in the process function and use Apache Beam to stream these outputs to a data sink, such as BigQuery.
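The per-record scoring step described above can be sketched as a plain function; in a Beam pipeline this body would live inside a DoFn's process method (the model object and field names here are assumptions, not the library's API):

```python
import pandas as pd

def score_record(model, record):
    """Score a single record with a fitted sklearn-style model.

    In Cloud Dataflow this logic would run inside a beam.DoFn.process
    method, yielding one output dict per input record instead of
    loading the whole dataset into one data frame.
    """
    features = pd.DataFrame([record])          # one-row frame for the model
    prediction = float(model.predict(features)[0])
    return {**record, "prediction": prediction}
```

The output dicts can then be streamed to a sink such as BigQuery rather than collected in memory.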
There are many examples and tutorials for running TensorFlow batch predictions in Cloud Dataflow.
The core component in Dataflow is a pipeline, which defines the operations to perform as part of a workflow. A pipeline is built from transforms: element-wise transforms apply a function to each element in a collection, resulting in a new collection, while aggregations operate on sets of elements in a collection. Cloud Dataflow builds upon the open-source Apache Beam library, making pipelines usable in other cloud environments. One of the main reasons for using Beam is the ability to switch between multiple runners, such as Apache Spark, Apache Flink, Samza, and Google Cloud Dataflow, because Beam acts as a portable abstraction layer over these engines.

Many organizations are now using streaming platforms in order to build real-time data pipelines that transform streams of data and move data between different components in a cloud environment. These platforms are typically distributed and provide fault tolerance for streaming data.
To connect different systems, we can use Kafka for a self-host solution or cloud services such as Google Pub/Sub or Amazon Kinesis.
Passive prediction (e.g., log or update): when a user performs an action (watches, views, plays, buys, or sells an item), we update the recommendation list for that user for the next serving. The predictions are saved to a database or data store.
Active prediction: a client asks for a prediction by sending a request to the model service. The result may not be needed right away (predictions can be pre-generated in the background), but the results are sent back to the client and are typically not saved to a database.
Although active prediction sounds like a job for Lambda or Cloud Functions, there are key differences when using Cloud Dataflow:
It is easy to connect with other components in cloud platforms. We can push data to BigQuery for storage or forward output to another message consumer.
Able to use distributed tools, such as PySpark, in contrast to the endpoint-based approaches that service requests in isolation.
More tools and different programming languages supported.
Unlike Kafka, PubSub uses separate concepts for producer and consumer data sources. In Kafka, you can publish and subscribe to a topic directly, while in PubSub, consumers subscribe to subscriptions rather than directly to topics. With PubSub, you first set up a topic and then create one or more subscriptions that listen to this topic.
In the PubSub UI in the GCP Console, let's create a topic with ID sale-range.
In this topic, we create a subscription with ID predict-range. There are three delivery types:
Pull: your subscriber client initiates requests to a Pub/Sub server to retrieve messages.
Push: a Pub/Sub server initiates a request to your subscriber client to deliver messages.
Write to BigQuery: A BigQuery subscription writes messages to an existing BigQuery table as they are received. You need not configure a separate subscriber client.
For pull, the subscribers must request delivery. For other subscription types, Pub/Sub delivers messages as soon as they are published.
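The same topic and pull subscription can also be created from the command line (a sketch using the gcloud CLI, assuming the project is already configured):

```shell
gcloud pubsub topics create sale-range
gcloud pubsub subscriptions create predict-range --topic=sale-range
```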
from google.cloud import pubsub_v1
subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(
"google_project_name", "Subscription_ID"
)
def callback(message):
# print the payload and acknowledge so Pub/Sub does not redeliver it
print(message.data)
message.ack()
# subscribe() returns a StreamingPullFuture; block on it so the
# process keeps listening instead of exiting immediately
streaming_pull_future = subscriber.subscribe(subscription_path, callback=callback)
streaming_pull_future.result()
from google.cloud import pubsub_v1
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("google_project_name", "Topic_ID")
data = "Hello World!".encode('utf-8')
publisher.publish(topic_path, data=data)
We need to encode the message as UTF-8 bytes before publishing it, since Pub/Sub message payloads are bytes.