Automating the training and deployment of ML models on Kubernetes

With the rise of Machine Learning, the need to automate and streamline model deployment has become a necessity. This is driven mostly by the fact that ML models, as a new way of programming, are no longer an experimental concept but day-to-day artifacts that can follow a release and versioning process like any other code.

Here is a link to the code used below: github

Throughout this example I will:

  • train a model.
  • serialize it and save it.
  • build a Docker image with a front-end web server.
  • deploy it on a Kubernetes cluster.

Requirements:

  • scikit-learn
  • Docker
  • Minikube & Kubernetes

Building The Model

Training data:

Our training data is generated with the math function y = sin(2*π*tan(x)), where x ranges from 0 to 1 with an increment of 0.001.

x = np.arange(0.0, 1, 0.001).reshape(-1, 1)

x = [[ 0. ]
[ 0.001]
[ 0.002]

………

[ 0.997]
[ 0.998]
[ 0.999]]

y = np.sin(2 * np.pi * np.tan(x).ravel()) #with max/min values of 1,-1

[screenshot: plot of the training data]

Fitting the Model : 

In this example, I will use a multilayer perceptron (MLP) regressor implemented in the scikit-learn Python library.

This is the regressor with all its parameters (already tuned):

from sklearn.neural_network import MLPRegressor

reg = MLPRegressor(hidden_layer_sizes=(500,), activation='relu', solver='adam', alpha=0.001, batch_size='auto',
                   learning_rate='constant', learning_rate_init=0.01, power_t=0.5, max_iter=1000, shuffle=True,
                   random_state=9, tol=0.0001, verbose=False, warm_start=False, momentum=0.9,
                   nesterovs_momentum=True, early_stopping=False, validation_fraction=0.1, beta_1=0.9, beta_2=0.999,
                   epsilon=1e-08)
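The snippet above only constructs the regressor; the fit call is not shown in the post, but it would presumably be something like:

reg.fit(x, y)  # train the MLP on the generated data
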
Test Data :
For testing we will use a generated set of data as well:
test_x = np.arange(0.0, 1, 0.05).reshape(-1, 1)
Prediction :
test_y = reg.predict(test_x)
Results :
The continuous blue curve is the real output; the dotted red curve is the predicted output.
[screenshot: plot of the real output vs. the predicted output]
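The plotting code is not included in the post; a comparison plot like the one above can be produced with matplotlib along these lines (a sketch):

import matplotlib.pyplot as plt

plt.plot(x, y, 'b-', label='real')                   # continuous blue: real output
plt.plot(test_x, test_y, 'r--', label='predicted')   # dotted red: predicted output
plt.legend()
plt.show()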
Saving the model :
I used joblib (which builds on Python's pickle serialization) to serialize the model:

from sklearn.externals import joblib  # on recent scikit-learn versions: import joblib

joblib.dump(reg, 'mlpreg.pkl')

This will save your model to a file named mlpreg.pkl.

Deploying the model

Building an image:
I have created a Docker image that serves the model from a web server:
FROM python:2.7.15-stretch

COPY MLPReg.py .
COPY server.py .

RUN python -m pip install --user numpy scipy matplotlib ipython jupyter pandas sympy nose
RUN python -m pip install -U scikit-learn

# train the model at build time; this produces mlpreg.pkl inside the image
RUN python MLPReg.py

EXPOSE 8088

CMD python server.py
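
The file server.py is not shown in the post; a minimal sketch of what it could look like is below (the /predict route and its x query parameter are my own assumptions, not taken from the original code):

# server.py -- minimal sketch; route and parameter names are hypothetical
from BaseHTTPServer import HTTPServer, BaseHTTPRequestHandler  # Python 2.7, matching the base image
from urlparse import urlparse, parse_qs
import numpy as np
from sklearn.externals import joblib

reg = joblib.load('mlpreg.pkl')  # model serialized at image build time by MLPReg.py

class PredictHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # e.g. GET /predict?x=0.1
        query = parse_qs(urlparse(self.path).query)
        x = float(query['x'][0])
        y = reg.predict(np.array([[x]]))[0]
        self.send_response(200)
        self.end_headers()
        self.wfile.write(str(y))

HTTPServer(('0.0.0.0', 8088), PredictHandler).serve_forever()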

To build the image you can run the command below (the image then needs to be pushed to a registry the cluster can pull from, such as Docker Hub, since the deployment uses imagePullPolicy: Always):
docker build -t mbenachour/mlpreg:latest .
Kubernetes deployment:
This is the Kubernetes YAML file that describes the deployment and its NodePort service:

apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: mlpreg-deployment
  labels:
    app: mlpreg
spec:
  replicas: 3
  selector:
    matchLabels:
      app: mlpreg
  template:
    metadata:
      labels:
        app: mlpreg
    spec:
      terminationGracePeriodSeconds: 30
      containers:
      - name: mlpreg
        image: mbenachour/mlpreg:latest
        imagePullPolicy: "Always"
        ports:
        - containerPort: 8088
---
apiVersion: v1
kind: Service
metadata:
  name: mlpreg-svc
  labels:
    app: mlpreg
    #tier: frontend
spec:
  type: NodePort
  ports:
  - port: 8088
  selector:
    app: mlpreg
    #tier: frontend

You can deploy it to the Kubernetes cluster:
kubectl apply -f mlp.yml
To check on the status of your Kubernetes services:
kubectl get services
You should see something similar to this:
[screenshot: kubectl get services output showing the mlpreg-svc NodePort service]

Making predictions

To get the URL of the service on the Kubernetes cluster (in my case I'm using Minikube), run:

$minikube service mlpreg-svc --url

http://192.168.99.105:32397
To make a prediction using the API for an input of 0.1:
[screenshot: prediction returned by the API for an input of 0.1]
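The call can also be scripted; a sketch in Python, assuming the /predict?x=... route from the server.py sketch above (the exact route depends on how server.py is actually written):

import requests

url = 'http://192.168.99.105:32397/predict'   # NodePort URL returned by minikube service --url
print(requests.get(url, params={'x': 0.1}).text)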

Pipeline.ai

A lot of products have been introduced to help solve this problem; one of them is Chris Fregly's project, pipeline.ai.
The project lets you create, train, and deploy models using different frameworks:
  • TensorFlow
  • scikit-learn
  • PyTorch
and it implements many of the most commonly used ML algorithms, like linear regression.

Using AWS GuardDuty to stop compromised instances and send notifications.

GuardDuty (announced at AWS re:Invent 2017) is a managed threat detection service that continuously monitors for malicious or unauthorized behavior to help you protect your AWS accounts and workloads. It watches for activity such as unusual API calls or potentially unauthorized deployments that indicate a possible account compromise, and it also detects potentially compromised instances or reconnaissance by attackers.

With a minimal amount of code and a few clicks in the AWS console, we can set up GuardDuty to scan EC2 fleets for potential threats, trigger a Lambda function that stops the compromised instances, and send an SMS notification using the AWS SNS service:

[diagram: GuardDuty findings routed through a CloudWatch rule to a Lambda function and an SNS topic]

1- Testing a few threats:

1-a – Bitcoin mining: one of the possible threats is your EC2 instances being used for Bitcoin mining. To simulate it, I started a bitcoind container on my EC2 instance:

[screenshot: bitcoind container running on the EC2 instance]

1-b – SSH brute force: I'm not using any username and password dictionaries here.

[screenshot: SSH brute-force attempts against the instance]

2- SNS topic : create an SNS topic called guardduty_alerts, with an SMS subscription
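
The topic and its SMS subscription can also be created from Python with boto3; a sketch (the phone number is a placeholder):

import boto3

sns = boto3.client('sns', region_name='us-east-1')
topic = sns.create_topic(Name='guardduty_alerts')   # returns the topic ARN (idempotent)
sns.subscribe(TopicArn=topic['TopicArn'],
              Protocol='sms',
              Endpoint='+15555550123')              # placeholder phone number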

3- Lambda: for stopping instances and sending notifications

import boto3

def lambda_handler(event, context):
    print('loading handler')
    # print(event)
    sns = boto3.client(service_name='sns')
    topicArn = 'arn:aws:sns:us-east-1:9999999999:guardduty_alerts'

    # the event is already a dict, so no json.loads is needed
    instanceId = event['detail']['resource']['instanceDetails']['instanceId']
    description = event['detail']['description']
    message = 'your EC2 instance ' + instanceId + ' has been compromised by attack of ' + description + ', it will be stopped'
    sns.publish(
        TopicArn=topicArn,
        Message=message
    )

    ec2 = boto3.client('ec2', region_name='us-east-1')
    ec2.stop_instances(InstanceIds=[instanceId])
    return
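
For context, the handler reads its input from the CloudWatch event that carries the GuardDuty finding; the fields it uses look roughly like this (a truncated sketch, not a complete finding):

# truncated sketch of the event shape consumed by the handler above
event = {
    'detail': {
        'description': 'EC2 instance i-... is communicating with a Bitcoin mining pool ...',
        'resource': {
            'instanceDetails': {
                'instanceId': 'i-0123456789abcdef0'
            }
        }
    }
}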

4- CloudWatch rule: create a CloudWatch Events rule that triggers the Lambda function we created previously.
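
The rule can be created in the console, or scripted with boto3 along these lines (a sketch; the Lambda ARN is a placeholder, and CloudWatch Events also needs permission to invoke the function):

import boto3, json

events = boto3.client('events', region_name='us-east-1')

# match all GuardDuty findings
events.put_rule(
    Name='guardduty-findings',
    EventPattern=json.dumps({
        'source': ['aws.guardduty'],
        'detail-type': ['GuardDuty Finding']
    })
)

# route matched events to the Lambda function (placeholder ARN)
events.put_targets(
    Rule='guardduty-findings',
    Targets=[{'Id': 'guardduty-lambda',
              'Arn': 'arn:aws:lambda:us-east-1:9999999999:function:stop_compromised_instances'}]
)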

 

Et voilà, all the threats we generated earlier show up in the GuardDuty findings:

[screenshot: GuardDuty findings list]

Stopping the compromised instances:

[screenshot: the compromised EC2 instances being stopped]

Sending notifications:

[screenshot: SMS notification sent through SNS]

 

Local (and S3) cloud storage server using Minio

Minio is a local cloud object storage server. It's open source, released under the Apache License v2.0, and gives developers and DevOps teams a local and public cloud storage to:

  • backup VMs
  • backup containers
  • store unstructured data (photos, files, …)
  • store objects in AWS S3
  • store objects using SDKs (Go, JavaScript, Java)

To start a server you can use the Minio container image available on Docker Hub:

minio/minio

You can pull it with this command:

docker pull minio/minio

To start the server run:

 docker run -p 9000:9000 minio/minio server /export

You can access the web UI at http://localhost:9000.

[screenshot: Minio web UI]

The access key and secret key for the local server are generated when the server starts.
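
Since Minio speaks the S3 API, you can also talk to the local server from an S3 SDK; a sketch using boto3 (the keys are placeholders for the ones generated at startup):

import boto3

s3 = boto3.client('s3',
                  endpoint_url='http://localhost:9000',
                  aws_access_key_id='YOUR_ACCESS_KEY',
                  aws_secret_access_key='YOUR_SECRET_KEY')

s3.create_bucket(Bucket='test-bucket')
s3.put_object(Bucket='test-bucket', Key='hello.txt', Body=b'hello from minio')  # store an object
print(s3.list_buckets()['Buckets'])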

Create a bucket:

[screenshot: creating a bucket on the local server]

It is also visible in the web UI:

[screenshot: the new bucket in the Minio web UI]

Using your AWS S3 storage:

With the mc client, we need to add the AWS S3 endpoint to the list of hosts:

mc config host add my-aws https://s3.amazonaws.com YOUR_ACCESS_KEY  YOUR_SECRET_KEY

Create a bucket in S3:

[screenshot: creating a bucket in S3 with mc]

and it's created:

[screenshot: the new bucket in S3]

CI and code promotion for Chef cookbooks with Jenkins – POC

 

I have been browsing the internet for blogs or articles that help Chef developers promote their cookbook code: a way of vetting changes and avoiding code going from the operations team straight to production.

I found a lot of theoretical articles on building a CI pipeline for Chef cookbooks, but not a lot of practical ones, so I decided to build a proof of concept for the public and for my team as well.

When it comes to integration tools, I like Jenkins: it's open source and the community is very active in adding and updating plugins.

In this example I will use a Java cookbook as a code base, and I will be running 4 kinds of tests:

  • Foodcritic : a helpful lint tool you can use to check your Chef cookbooks for common problems. It comes with 61 built-in rules that identify problems ranging from simple style inconsistencies to difficult to diagnose issues that will hurt in production.
  • ChefSpec : a unit testing framework for testing Chef cookbooks. ChefSpec makes it easy to write examples and get fast feedback on cookbook changes without the need for virtual machines or cloud servers.
  • Rubocop :  a Ruby static code analyzer. Out of the box it will enforce many of the guidelines outlined in the community Ruby Style Guide.
  • Test Kitchen : a test harness tool to execute your configured code on one or more platforms in isolation. A driver plugin architecture is used which lets you run your code on various cloud providers and virtualization technologies such as Amazon EC2,  Vagrant, Docker, LXC containers, and more. Many testing frameworks are already supported out of the box including Bats, shUnit2, RSpec, Serverspec, with others being created weekly.

Of course, you can get all these tools in one package: the famous ChefDK.

Code Promotion :

The concept of code promotion helps the CI process distinguish between good and bad builds; I like to define a good build as one where ALL the tests are successful.

Jenkins helps you implement this concept with a community plugin : Promoted Build Plugin

Based on the status of your build (promoted or not), you can control the code that goes into your repository (GitHub or GitLab); for example, you can set up hooks to deny merge requests from builds that are not promoted.

Jobs:

Let's set up our jobs; we will have two categories of jobs:

  • Build Jobs
  • Test Jobs

 

[screenshot: build and test jobs in the Jenkins dashboard]

Whenever a build job is successful, it will trigger all the test jobs to start.

[screenshot: downstream test jobs triggered by the build job]

Build-java-cookbook: clones the code repo and creates a temporary artifact; this is the config section for this job:

[screenshot: Build-java-cookbook job configuration]

Rubocop Test: copies the temporary artifact, decompresses it to get the full code repo, and runs RuboCop on the code:

[screenshot: Rubocop Test job configuration]

ChefSpec Test :

[screenshot: ChefSpec Test job configuration]

FoodCritic : 

[screenshot: FoodCritic job configuration]

Test Kitchen : 

[screenshot: Test Kitchen job configuration]

Test Kitchen will spin up a Vagrant box (ubuntu-14.04), run the cookbook on it, and test the results.

First Run : 

With the configuration above, we run the build and test jobs.

Result :

[screenshot: job results showing the Rubocop Test failure]

The Rubocop test failed; by looking at the execution log we can see why:

+ rubocop
Inspecting 44 files
.......CC...CC........C..C..................

Offenses:

providers/alternatives.rb:34:38: C: Use shell_out("#{alternatives_cmd} --display #{cmd} | grep #{alt_path} | grep 'priority #{priority}$'").exitstatus.zero? instead of shell_out("#{alternatives_cmd} --display #{cmd} | grep #{alt_path} | grep 'priority #{priority}$'").exitstatus == 0.
      alternative_exists_same_prio = shell_out("#{alternatives_cmd} --display #{cmd} | grep #{alt_path} | grep 'priority #{priority}$'").exitstatus == 0
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
providers/alternatives.rb:35:28: C: Use shell_out("#{alternatives_cmd} --display #{cmd} | grep #{alt_path}").exitstatus.zero? instead of shell_out("#{alternatives_cmd} --display #{cmd} | grep #{alt_path}").exitstatus == 0.
      alternative_exists = shell_out("#{alternatives_cmd} --display #{cmd} | grep #{alt_path}").exitstatus == 0
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
providers/alternatives.rb:43:18: C: Use remove_cmd.exitstatus.zero? instead of remove_cmd.exitstatus == 0.
          unless remove_cmd.exitstatus == 0
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
providers/alternatives.rb:57:18: C: Use install_cmd.exitstatus.zero? instead of install_cmd.exitstatus == 0.
          unless install_cmd.exitstatus == 0
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
providers/alternatives.rb:66:28: C: Use shell_out("#{alternatives_cmd} --display #{cmd} | grep \"link currently points to #{alt_path}\"").exitstatus.zero? instead of shell_out("#{alternatives_cmd} --display #{cmd} | grep \"link currently points to #{alt_path}\"").exitstatus == 0.
      alternative_is_set = shell_out("#{alternatives_cmd} --display #{cmd} | grep \"link currently points to #{alt_path}\"").exitstatus == 0
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
providers/alternatives.rb:72:16: C: Use set_cmd.exitstatus.zero? instead of set_cmd.exitstatus == 0.
        unless set_cmd.exitstatus == 0
               ^^^^^^^^^^^^^^^^^^^^^^^
providers/alternatives.rb:87:50: C: Use cmd.exitstatus.zero? instead of cmd.exitstatus == 0.
    new_resource.updated_by_last_action(true) if cmd.exitstatus == 0
                                                 ^^^^^^^^^^^^^^^^^^^
providers/ark.rb:39:20: C: Omit parentheses for ternary conditions.
    package_name = (file_name =~ /^server-jre.*$/) ? 'jdk' : file_name.scan(/[a-z]+/)[0]
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
providers/ark.rb:134:14: C: Use 0o for octal literals.
        mode 0755
             ^^^^
providers/ark.rb:149:14: C: Closing method call brace must be on the line after the last argument when opening brace is on a separate line from the first argument.
            ))
             ^
providers/ark.rb:150:16: C: Use cmd.exitstatus.zero? instead of cmd.exitstatus == 0.
        unless cmd.exitstatus == 0
               ^^^^^^^^^^^^^^^^^^^
providers/ark.rb:157:16: C: Use cmd.exitstatus.zero? instead of cmd.exitstatus == 0.
        unless cmd.exitstatus == 0
               ^^^^^^^^^^^^^^^^^^^
providers/ark.rb:164:16: C: Use cmd.exitstatus.zero? instead of cmd.exitstatus == 0.
        unless cmd.exitstatus == 0
               ^^^^^^^^^^^^^^^^^^^
providers/ark.rb:172:14: C: Use cmd.exitstatus.zero? instead of cmd.exitstatus == 0.
      unless cmd.exitstatus == 0
             ^^^^^^^^^^^^^^^^^^^
recipes/ibm.rb:44:8: C: Use 0o for octal literals.
  mode 00755
       ^^^^^
recipes/ibm_tar.rb:36:8: C: Use 0o for octal literals.
  mode 00755
       ^^^^^
recipes/ibm_tar.rb:49:8: C: Use 0o for octal literals.
  mode 00755
       ^^^^^
recipes/set_java_home.rb:27:8: C: Use 0o for octal literals.
  mode 00755
       ^^^^^
recipes/set_java_home.rb:32:8: C: Use 0o for octal literals.
  mode 00755
       ^^^^^
resources/ark.rb:39:54: C: Use 0o for octal literals.
attribute :app_home_mode, kind_of: Integer, default: 0755
                                                     ^^^^

44 files inspected, 20 offenses detected
Build step 'Execute shell' marked build as failure
Finished: FAILURE

 

 

Let's go ahead and fix these offenses and commit the code:

[screenshot: commit fixing the RuboCop offenses]

We restart the build, and this time everything is green:

[screenshot: all build and test jobs green]

From this point on you can do two things:

  • save your cookbook in a private supermarket with a corresponding version number
  • upload the cookbook to the Chef server

 

Promotion status :

After the completion of all tests, this build can now be promoted.

[screenshot: the build marked as promoted in Jenkins]

 

Running ContainerVMs on an ESXi VMware host

 

Until today I thought that running containers always depended on the existence of a host and an OS of some kind, but then I came across the vSphere Integrated Containers project: a runtime environment that lets developers run containers as VMs, instead of running containers in VMs.

There is a good read on the contrast between traditional containers and containerVMs.

vic-machine:
is a CLI tool that allows for the creation of containerVMs in the following setups:
  • vCenter Server with a cluster
  • vCenter Server with one or more standalone ESXi hosts
  • A standalone ESXi host

This architecture relies on a Virtual Container Host (VCH); the VCH is an endpoint used to start, stop, and delete containers across the datacenter.

“The Virtual Container Host (VCH) is the means of controlling, as well as consuming, container services – a Docker API endpoint is exposed for developers to access, and desired ports for client connections are mapped to running containers as required. Each VCH is backed by a vSphere resource pool, delivering compute resources far beyond that of a single VM or even a dedicated physical host. Multiple VCHs can be deployed in an environment, depending on business requirements. For example, to separate resources for development, testing, and production.”

The binaries can be downloaded from here:

https://bintray.com/vmware/vic-repo/build/view#files

Untar the compressed file:

$ tar xvzf vic_3711.tar.gz

This is the content of the tar file:

[screenshot: contents of the extracted vic archive]

Setting up an ESXi host:

  • download the ISO file from the VMware website: https://my.vmware.com/en/web/vmware/evalcenter?p=free-esxi6
  • use VirtualBox or VMware Fusion to create a host from the ESXi ISO (http://www.vmwareandme.com/2013/10/step-by-step-guide-how-to-install.html#.V6a8rZNViko)

[screenshot: the ESXi host up and running]

Creating a Virtual Container Host :

$ vic-machine-darwin create --target 172.16.127.130 --user root --image-datastore datastore1
INFO[2016-08-06T14:05:48-05:00] Please enter ESX or vCenter password:
INFO[2016-08-06T14:05:50-05:00] ### Installing VCH ####
INFO[2016-08-06T14:05:50-05:00] Generating certificate/key pair - private key in ./virtual-container-host-key.pem
INFO[2016-08-06T14:05:50-05:00] Validating supplied configuration
INFO[2016-08-06T14:05:51-05:00] Firewall status: ENABLED on "/ha-datacenter/host/localhost.localdomain/localhost.localdomain"
INFO[2016-08-06T14:05:51-05:00] Firewall configuration OK on hosts:
INFO[2016-08-06T14:05:51-05:00] "/ha-datacenter/host/localhost.localdomain/localhost.localdomain"
WARN[2016-08-06T14:05:51-05:00] Evaluation license detected. VIC may not function if evaluation expires or insufficient license is later assigned.
INFO[2016-08-06T14:05:51-05:00] License check OK
INFO[2016-08-06T14:05:51-05:00] DRS check SKIPPED - target is standalone host
INFO[2016-08-06T14:05:51-05:00] Creating Resource Pool "virtual-container-host"
INFO[2016-08-06T14:05:51-05:00] Creating VirtualSwitch
INFO[2016-08-06T14:05:51-05:00] Creating Portgroup
INFO[2016-08-06T14:05:51-05:00] Creating appliance on target
INFO[2016-08-06T14:05:51-05:00] Network role "client" is sharing NIC with "external"
INFO[2016-08-06T14:05:51-05:00] Network role "management" is sharing NIC with "external"
INFO[2016-08-06T14:05:52-05:00] Uploading images for container
INFO[2016-08-06T14:05:52-05:00] "bootstrap.iso"
INFO[2016-08-06T14:05:52-05:00] "appliance.iso"
INFO[2016-08-06T14:06:00-05:00] Waiting for IP information
INFO[2016-08-06T14:06:18-05:00] Waiting for major appliance components to launch
INFO[2016-08-06T14:06:18-05:00] Initialization of appliance successful
INFO[2016-08-06T14:06:18-05:00]
INFO[2016-08-06T14:06:18-05:00] vic-admin portal:
INFO[2016-08-06T14:06:18-05:00] https://172.16.127.131:2378
INFO[2016-08-06T14:06:18-05:00]
INFO[2016-08-06T14:06:18-05:00] DOCKER_HOST=172.16.127.131:2376
INFO[2016-08-06T14:06:18-05:00]
INFO[2016-08-06T14:06:18-05:00] Connect to docker:
INFO[2016-08-06T14:06:18-05:00] docker -H 172.16.127.131:2376 --tls info
INFO[2016-08-06T14:06:18-05:00] Installer completed successfully

You can use the vSphere or ESXi web client to take a look:

[screenshot: the Virtual Container Host appliance in the web client]

Creating a containerVM:

$ docker --tls run --name container1 ubuntu

The container has been created:

[screenshot: container1 running as a VM on the ESXi host]

 

Conclusion :

ContainerVMs seem to have the following distinctive characteristics compared to traditional containers:

  1. There is no default shared filesystem between the container and its host
    • Volumes are attached to the container as disks and are completely isolated from each other
    • A shared filesystem could be provided by something like an NFS volume driver
  2. The way that you do low-level management and monitoring of a container is different. There is no VCH shell.
    • Any API-level control plane query, such as docker ps, works as expected
    • Low-level management and monitoring uses exactly the same tools and processes as for a VM
  3. The kernel running in the container is not shared with any other container
    • This means that there is no such thing as an optional privileged mode. Every container is privileged and fully isolated.
    • When a containerVM kernel is forked rather than booted, much of its immutable memory is shared with a parent template
  4. There is no such thing as unspecified memory or CPU limits
    • A Linux container will have access to all of the CPU and memory resource available in its host if not specified
    • A containerVM must have memory and CPU limits defined, either derived from a default or specified explicitly

But traditional containers like Docker are definitely a more mature solution, offering more tools for orchestration and scaling.

Riak Cluster Using Docker Compose

Riak has been hot stuff lately, with the increasing need for clusterization in the world of NoSQL data stores.

Riak is a solution to the big data problem; it is based on the Amazon Dynamo design and built to respond to requests at very large scale.

Basho introduced Riak as fault-tolerant, simple, scalable, and highly available.

It's fairly easy to create and add nodes to a Riak cluster with riak-admin, and combining this with docker-compose gives you an easily deployable and scalable cluster of Riak nodes.

I created a docker-compose YAML file that specifies all the components of the cluster, basically a seed and a set of nodes.

I used hectcastro's Riak Docker image, because that's the beauty of containers: they are reusable!

$git clone https://github.com/mbenachour/riak-cluster-compose.git
$cd riak-cluster-compose

 

  • start Compose as a daemon:
$docker-compose up -d
  • you will have one seed and one node running:

[screenshot: the Riak seed and one node running]

  • list all running containers

[screenshot: docker ps listing the running containers]

  • let's create an "artists" bucket and add an object using node1 (port: 32812):
$curl -i -d '{"name":"Bruce"}' -H "Content-Type: application/json" \
localhost:32812/riak/artists/Bruce
  • check the bucket was created :

[screenshot: verifying the bucket was created]

Scale time!

  • to scale your cluster you can add 3 more nodes:
    $docker-compose scale riak_node=4

[screenshot: docker-compose scaling riak_node to 4]

  • let's query node4 for the list of buckets (port: 32818):

[screenshot: node4 returning the list of buckets]

It works!
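
The same checks can be scripted; a sketch in Python against Riak's classic HTTP API (the mapped ports are the ones from my run, yours will differ):

import requests

# fetch the object stored through node1
print(requests.get('http://localhost:32812/riak/artists/Bruce').json())

# list buckets through node4 to confirm the data is visible cluster-wide
print(requests.get('http://localhost:32818/riak?buckets=true').json())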

If you are more interested in scaling across multiple hosts, you can combine this with Docker Swarm.

 
