Storing your ML Models with parameters

Often when training machine learning models you find yourself creating different estimators and tuning this parameter or that to get the results you want, you may also find yourself wanting to save the results of those iterations, to save you time in the future.

that’s what I’m trying to address in this post, having some sort of artifact repository for machine learning models, but saving your parameters as metadata using the following design :

Screen Shot 2018-12-09 at 8.38.33 PM

1: user uploads artifacts using pre-signed s3 URLs

2  and 3: a putObject event triggers the lambda function to make an API call to an ec2 instance running an HTTP server  to read the estimator from S3 and get the parameters

4: saving the parameters in DynamoDB

 

Uploading artifacts :

I use AWS S3 to store the assets, making use of the pre-signed URL feature that gives you the possibility to use temporary URLs to upload files to S3, which takes away the managing permissions.

to orchestrate all this I like to use my favorite serverless framework.

here is the code on github

Deploying Serverless stack :

[code]$serverless deploy[/code]

Screen Shot 2018-12-10 at 1.19.57 PM

this will create 5 endpoints :

  • POST /dev/asset
  • GET /dev/asset
  • PUT /dev/asset/{asset_id}
  • DELETE /dev/asset/{asset_id}

these endpoints will allow you to update/create/delete an artifact, in this case, is a model.

for more reading about this check out the readme page of this serverless example

Getting the parameters :

in this part, in the EC2 instance, we will try to download the model and get the parameters to store them in dynamo DB

Initially I thought I could leverage all of this work in Lambda, so I don’t have to create an EC2 instance just to read the parameters, unfortunately, there couple issues with that solution, one of them is the size of the dependencies once you add SKlearn libraries as a dependency, the size of the lambda zip reaches 60Mb. But once uploaded there was an issue running SKlearn part of the lambda, for this iteration I decided to use a t2.micro on EC2.

the EC2 has a python web server running that get requests with an asset_id, downloads the asset, get the parameters and store them in dynamodb

this is the code for the server :

https://github.com/mbenachour/store_ml_models/blob/master/server.py

Testing the upload :

to test all this I created a small python script:

[code]

import sys
import requests
from sklearn.externals import joblib

def upload(filename):
model = loadModel(filename)
print (model.get_params())
url = ‘https://oo0cl2av91.execute-api.us-east-1.amazonaws.com/dev/asset’
response = requests.post(url)
print (response)
presigned = response.json().get(‘body’).get(‘upload_url’)
response = requests.put(presigned, data=open(filename).read())
print (response)

def loadModel(model_path):
download_path = model_path
#s3_client.download_file(BUCKET_NAME, model, ‘/tmp/model.pkl’)
return joblib.load(download_path)

upload(sys.argv[1])

[/code]

to run it use :

[code] python test.py  your_model.pkl [/code]

if you look at your dynamodb table you will see that your model has a description :

Screen Shot 2018-12-10 at 11.40.30 PM

Querying Ercot public dataset using AWS Glue and Athena

ERCOT is an acronym for Electric Reliability Council of Texas, it manages the flow of electric power to more than 25 million Texas customers — representing about 90 percent of the state’s electric load. As the independent system operator for the region, ERCOT schedules power on an electric grid that connects more than 46,500 miles of transmission lines and 600+ generation units. It also performs financial settlement for the competitive wholesale bulk-power market and administers retail switching for 7 million premises in competitive choice areas.

ERCOT also offers an online and public dataset giving market participants information on a variety of topics related to the market of electricity in the state of Texas, which makes it a good candidate for AWS products: Glue and Athena.

Tools :

AWS Glue 

AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics

AWS Athena 

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run

Scraping Data :

the data on Ercot website is available as a collection of  .zip files, I used a python scraper from this github repository to only collect CSV files.

As an example, we will be collecting data about the total energy sold from this page

Using the previous tools the command would something like this :

[code]

python -m ercot.scraper “http://mis.ercot.com/misapp/GetReports.do?reportTypeId=12334&reportTitle=DAM%20Total%20Energy%20Sold&showHTMLView=&mimicKey”

[/code]

the script will download the csv files and store them in a data folder :

Screen Shot 2018-11-11 at 9.07.37 PM

At this point, we transfer the data to S3 to be ready for AWS Glue, an optimization of this process could consist of creating a lambda function with a schedule to continuously upload  new datasets

Creating a Crawler 

you can add a Crawler in AWS Glue to be able to traverse datasets in S3 and create a table to be queried.

Screen Shot 2018-11-11 at 9.31.44 PM

 

At the end of its run, the crawler creates a table that contains records gathered from all the CSV files we downloaded from EROCT public dataset, in this instance the table is called: damtotqtyengysoldnp

Screen Shot 2018-11-11 at 9.26.13 PM

 

And now you can query Ahead! 

using AWS Athena, you can run different queries on the table we generated previously, here are some few examples :

Total energy sold by settlement point :

Screen Shot 2018-11-11 at 9.53.54 PM

 

Getting the hours of the day: 11/12/2018 with the Max  energy sold 

Screen Shot 2018-11-11 at 10.07.58 PM

 

 

Automating AWS server builds from ServiceNow requests.

A lot of organizations are using ServiceNow to manage requests for creating new compute resources in the could ( i.e servers, lambda functions, containers …)

Usually, CMP tools like Scalr, or even ServiceNow Cloud Management can ( to a certain degree ) automate this process, but there are two major issues with these approaches :

1 – For these tools to be used properly they need lengthy customization to fit the business architecture of the organization, mapping all the business units, integrating with existing tools …

2 – It can be expensive, the price per node model can add up quickly for companies with a large number of servers deployed.

This a way to automate server build by using ServerNow Orchestration Module, and Serverless to build the integration.

snow

Creating ServiceNow Form

 

Screen Shot 2018-10-15 at 10.16.32 AM

Screen Shot 2018-10-15 at 10.17.35 AM

 

ServiceNow workflow

Screen Shot 2018-10-14 at 5.38.03 PM

to simplify the process I’ve only used a simple RunScript Step, other activities can be added like CMDB integration, approvals, notifications, etc …

here is a look at what the script does :

  • Gets all the parameters from the form
  • make an API call to API Gateway and Lambda
  • Lambda will trigger

[code]

var os = current.variables.os;
var inst_size = current.variables.inst_size;

var request = new sn_ws.RESTMessageV2();
request.setEndpoint(‘https://9drnhpoorj.execute-api.us-east-1.amazonaws.com/dev/instance’);
request.setBasicAuth(“apikey1″,”mutKJWAolpsdfsdflksd8ndew02234LNFQQvq”);
request.setHttpMethod(‘POST’);
request.setRequestHeader(“Accept”, “application/json”);
request.setRequestHeader(‘Content-Type’, ‘application/json’);

request.setStringParameterNoEscape(‘os’,current.variables.os);
request.setStringParameterNoEscape(‘inst_size’,current.variables.inst_size);
request.setStringParameterNoEscape(‘volume_size’,current.variables.volume_size1);

request.setRequestHeader(‘Content-Type’, ‘application/json’);
request.setRequestBody(‘{“os”:”${os}”,”inst_size”:”${inst_size}”}’);
var response = request.execute();

[/code]

 

The AWS side

To build things on the AWS side, I used Serverless,

check out the code at the github repo

Demo

if I request an EC2 instance that’s an m1.small running an amazon linux OS

Screen Shot 2018-10-14 at 5.49.25 PM

if I take a look at the AWS console I can see that the server is getting created  :

Screen Shot 2018-10-15 at 10.23.14 AM

With the right parameters :

Screen Shot 2018-10-15 at 10.21.09 AM

 

How can AIOps help you prevent the next major incident.

What is it?

AIOps is a term that has been used in the last few years to describe the ability to drive intelligence from the day-to-day data that IT operations generate. The data source could vary from monitoring tools like SolarWinds to service desk tools like ServiceNow to automation tools like configuration management ( chef, puppet … ), or log search platforms like Splunk

Untitled Diagram (5)

One area where AIOps can be an asset to operation teams is incident predictability and remediation, there are others like storage and capacity management, resources utilization …

How can AIOPS help prevent the next outage :

the footprint of digital systems and businesses is increasing every day and so is the speed at which the data is produced.

For example, a Palo Alto firewall can produce up to 12 million events in one day, the manual correlation of data is nearly impossible, and that’s why we need an overview of the entire landscape of data produced by IT operations,  transformation of data to be able to serve as training and test sets for machine learning.

Starting from the promise that an incident is a result of a change ( voluntary or involuntary) to a configuration, a device, a network, or an application, all these changes if monitored and reported on correctly can help create a good context to understand the root-cause analysis of the incident.

You can create an ML model that will help you predict the next outage, notify operation teams, and help reduce the downtime.

Suppose that you transformed the input data that you gathered from all your sources, organized it into dataset like the one below and used a supervised learning process to create an ML model :

Screen Shot 2018-09-18 at 10.20.52 PM

your model will be able to make predictions of future incidents when fed with real-time input coming from your tools and logs :

Untitled Diagram (7)

over time, with more data, your model will get better at detecting future anomalies, with much more accuracy.

in conclusion

There is a lot of writing out there about AIOps, but the application, in my opinion, is a bit harder.

For different reasons, one being the spectrum of toolset in IT operations is very wide, and two being that the data structures are different from one organization to another, which means that trying to put a generic machine learning process to produce insights, will be at worst impossible and at best will lack accuracy.

For an organization to be able to get intelligent insights from  AIOps, there has to be an internal effort to train your models, because the quality of your future prediction of major incidents will essentially depend on the quality of your training and test sets.

 

 

 

Links :

AIOps Platforms

https://www.ca.com/us/products/aiops.html

https://www.splunk.com/blog/2017/11/16/what-is-aiops-and-what-it-means-for-you.html

Deploying Apps and ML models on mesosphere DC/OS

Have you ever thought of your data centers and cloud infrastructure ( private and public ) as one big computer? where you can deploy your applications with a click of a button, without worrying too much about the underlying infrastructure? well … DCOS allows you to manage your infrastructure from a single point, offering you the possibility to run distributed applications, containers, services, jobs while maintaining a certain abstraction from the infrastructure layer, as long as it provides computing, storage, and networking capabilities.

After deploying my ML model on a kubernates Cluster, a lambda function, I will deploy it on a DCOS cluster.

what is DCOS:

DCOS is a datacenter operating system, DC/OS is itself a distributed system, a cluster manager, a container platform, and an operating system.

DC/OS Architecture Layers

DCOS manages the 3 layers of software, platform, and infrastructure.

the dashboard :

Screen Shot 2018-09-03 at 7.43.36 PM

the catalog:

DCOS UI offers a catalog of certified and community packages that the users can install in seconds , like kafka, spark, hadoop, MySQL ..

 

 

Deploying Apps and ML models on DCOS :

the application I’m deploying is a web server running the model I created in my previous posts to make predictions.

DCOS relies on an application definition file that looks like this :

app.json :

[code]
{
“volumes”: null,
“id”: “mlpregv3”,
“cmd”: “python server.py”,
“instances”: 1,
“cpus”: 1,
“mem”: 128,
“disk”: 0,
“gpus”: 0,

“container”: {
“type”: “DOCKER”,
“docker”: {
“image”: “mbenachour/dcos-mlpreg:1”,
“forcePullImage”: false,
“privileged”: false,
“network”: “HOST”,
“portMappings”: [
{ “containerPort”: 8088, “hostPort”: 8088 }
]
}
}
}[/code]

 

the rest of the code can be found in my GitHub repo

after you configure your DCOS CLI and log in, you can run this command :

Screen Shot 2018-09-03 at 8.01.37 PM

if we take a look at the UI we can see that app/web server has been deployed :

Screen Shot 2018-09-03 at 8.03.35 PM

Deploy machine learning models on AWS lambda and serverless

in the last post, we talked about how to deploy a Machine learning trained model on Kubernates.

Here is another way of deploying ML models: AWS lambda + API gateway

Untitled Diagram (3)

Basically, your model (mlpreg.pkl) will be stored in S3, your lambda function will download the model use it to make predictions, another call will allow you to get the model hyperparameters, and sent it back to the user.

Screen Shot 2018-08-07 at 9.11.00 AM

to deploy AWS services, we will use a framework called Serverless

serverless allow you with a single configuration file to define functions, create resources, declare permissions, configure endpoints …

serverless uses one main config file and one or multiple code files :

  • handler.py : the lambda function
  • serverless.yml : serverless configuration file

here is what the serverless configuration file for this example would look like :

[code]

service: deploy-ml-service
plugins:
– serverless-python-requirements
provider:
name: aws
runtime: python2.7
iamRoleStatements:
– Effect: Allow
# Note: just for the demo, we are giving full access to s3
Action:
– s3:*
Resource: “*”
functions:
predict:
handler: handler.predict
events:
– http:
path: predict
method: post
cors: true
integration: lambda
getModelInfo:
handler: handler.getModelInfo
events:
– http:
path: params
method: post
cors: true
integration: lambda[/code]

as described in the example we will create two functions one will make a prediction using the model we built in the last post, the other one will display the model hyperparameters :

  • predict
  • getModelInfo

to load the model we have  :

  • load_model : loading the stored model from S3

handler.py

[code]
from sklearn.externals import joblib
import boto3

BUCKET_NAME = ‘asset-s3-uploader-02141’

def predict(event,context):
input = event[“body”][“input”]
modelName = event[“body”][“model_name”]
data = float(input)
return loadModel(modelName).predict(data)[0]

def loadModel(model):
s3_client = boto3.client(‘s3’)
download_path = ‘/tmp/model.pkl’
s3_client.download_file(BUCKET_NAME, model, ‘/tmp/model.pkl’)
return joblib.load(download_path)

def getModelInfo(event,context):
model = event[“body”][“model_name”]
return loadModel(model).get_params()
[/code]

$Serverless Deploy ! 

yep that’s all it takes, and your services will be deployed in seconds:

Screen Shot 2018-08-06 at 9.25.59 PM

Run the tests:

getting the model Hyperparameters :

[code]

root@58920085f9af:/tmp/deploy# curl -s -d “model_name=mlpreg.pkl” https://abcefgh123.execute-api.us-east-1.amazonaws.com/dev/params | python -m json.tool
{
“activation”: “relu”,
“alpha”: 0.001,
“batch_size”: “auto”,
“beta_1”: 0.9,
“beta_2”: 0.999,
“early_stopping”: false,
“epsilon”: 1e-08,
“hidden_layer_sizes”: [
1000
],
“learning_rate”: “constant”,
“learning_rate_init”: 0.01,
“max_iter”: 1000,
“momentum”: 0.9,
“nesterovs_momentum”: true,
“power_t”: 0.5,
“random_state”: 9,
“shuffle”: true,
“solver”: “adam”,
“tol”: 0.0001,
“validation_fraction”: 0.1,
“verbose”: false,
“warm_start”: false
}

[/code]

Making Predictions :

[code]

root@58920085f9af:/tmp/deploy# curl -s -d “input=1&model_name=mlpreg.pkl” https://abcdefg123.execute-api.us-east-1.amazonaws.com/dev/predict | python -m json.tool
0.13994134155335683

[/code]

Automating the training and deployment of ML models on Kubernates

With the rise of Machine Learning and models, the need for automating and streamlining model deployment become a necessity. Pushed mostly by the fact that ML models as a new way of programming, are no longer an experimental concept but rather a day to day artifacts that can also follow a release and versioning process.

here is a link to the code used below: github

throughout this example I will :

  • train a model.
  • serialize it and save it.
  • build a docker image with front-end web server.
  • make a deployment on a kubernates cluster.

requirements:

scikit

docker

minikube & Kubernates

Building The Model

Training  data : 

our training data is going to be generated with a math function y= sin(2*π*tan(x)), where is between 0 and 1 with an increment of 0.001.

x = np.arange(0.0, 1, 0.001).reshape(-1, 1)

x = [[ 0. ]
[ 0.001]
[ 0.002]

………

[ 0.997]
[ 0.998]
[ 0.999]]

y = np.sin(2 * np.pi * np.tan(x).ravel()) #with max/min values of 1,-1

Screen Shot 2018-06-20 at 2.40.48 PM

Fitting the Model : 

In this example, I will use  a MultiLayer Perceptron implemented by SCIKIT python library

This is what the regressor function will all the parameters ( already tuned ) :

reg = MLPRegressor(hidden_layer_sizes=(500,), activation='relu', solver='adam', alpha=0.001,batch_size='auto',

learning_rate='constant', learning_rate_init=0.01, power_t=0.5, max_iter=1000, shuffle=True,

random_state=9, tol=0.0001, verbose=False, warm_start=False, momentum=0.9,

nesterovs_momentum=True, early_stopping=False, validation_fraction=0.1, beta_1=0.9, beta_2=0.999,

epsilon=1e-08)
Test Data :
for testing we will use a generated set of data as well :
test_x = np.arange(0.0, 1, 0.05).reshape(-1, 1)
Prediction :
test_y = reg.predict(test_x)
Results :
continuous blue is the real output
dotted red is the predicted output
Screen Shot 2018-06-20 at 4.27.17 PM
Saving the model :
I used Python object serialization framework Pickle :
joblib.dump(reg, 'mlpreg.pkl')
this will save your model to a file named: mlpreg.pkl

Deploying the model

building an image :
I have created a docker image for deploying the model on a web server :

[code] FROM python:2.7.15-stretch

COPY MLPReg.py .

COPY server.py .

RUN python -m pip install –user numpy scipy matplotlib ipython jupyter pandas sympy nose

RUN python -m pip install -U scikit-learn

RUN python MLPReg.py

EXPOSE 8088

CMD python server.py

[/code]

to build the image you can run this :
docker build -t mbenachour/mlpreg:latest .
kubernates deployment :
this the kubernates yaml file that describes the deployment :

[code ]

apiVersion: apps/v1beta2
kind: Deployment
metadata:
name: mlpreg-deployment
labels:
app: mlpreg
spec:
replicas: 3
selector:
matchLabels:
app: mlpreg
template:
metadata:
labels:
app: mlpreg
spec:
terminationGracePeriodSeconds: 30
containers:
– name: mlpreg
image: mbenachour/mlpreg:latest
imagePullPolicy: “Always”
ports:
– containerPort: 8088

apiVersion: v1
kind: Service
metadata:
name: mlpreg-svc
labels:
app: mlpreg
#tier: frontend
spec:
type: NodePort
ports:
– port: 8088
selector:
app: mlpreg
#tier: frontend

[/code]

you can deploy the kubernates model :
kubectl apply -f mlp.yml
to check on the status of your kubernates services :
kubectl get services
you should see something similar to this :
Screen Shot 2018-06-21 at 4.31.45 PM.png

Making predictions

to get the kubernates cluster IP address, in my case I’m using minikube the command line :

$minikube service mlpreg-svc --url

http://192.168.99.105:32397
to make a prediction using the API for an input of 0.1
Screen Shot 2018-06-21 at 5.33.13 PM

Pipeline.ai

a lot of products have been introduced to help solve this problem, one of them is Chris Fregly project: pipeline.ai
the project gives you the possibility to create-train-deploy models using different frameworks :
– tensorflow
– scikit
– pytorch
implementing a lot of ML most used algorithms like linear regression.

Using AWS GuardDuty to stop compromised instances and send notifications.

GuardDuty  (announced in the 2017 edition of AWS Re:Invent) , is a managed threat detection service that continuously monitors for malicious or unauthorized behavior to help you protect your AWS accounts and workloads. It monitors for activity such as unusual API calls or potentially unauthorized deployments that indicate a possible account compromise. GuardDuty also detects potentially compromised instances or reconnaissance by attackers.

with a minimal amount of code, and few clicks in the AWS console we can set up guardduty to scan EC2 fleets for eventual threats, notify a lambda function to stop the compromised instances and send an SMS notification using AWS SNS service:

Screen Shot 2018-01-04 at 9.43.11 AM

1- Testing few threats :

1-a – Bitcoin mining : one of the possible threats is using your EC2 instances for bitcoin mining , I started a bitcoind container on my EC2 instance to :

Screen Shot 2018-01-04 at 9.53.26 AM

1-b SSH brute-force : I’m not using any username and passwords dictionaries

Screen Shot 2018-01-04 at 9.55.03 AM

2- SNS topic : create an SNS topic called guardduty_alerts, with an SMS subscription

3- Lambda: for stopping instances and sending notifications

import boto3
import json

def lambda_handler(event, context):
print(‘loading handler’)# print(event)
sns = boto3.client(service_name = “sns”)
topicArn = ‘arn:aws:sns:us-east-1:9999999999:guardduty_alerts’

#
result = json.loads(event)# result is now a dict
instanceId = event[‘detail’][‘resource’][‘instanceDetails’][‘instanceId’]
type = event[‘detail’][‘description’]
message = “your EC2 instance ” + instanceId + “has been compromised by attack of ” + type + “, it will be stopped”
sns.publish(
TopicArn = topicArn,
Message = message
)

ec2 = boto3.client(‘ec2’, region_name = ‘us-east-1’)
ec2.stop_instances(InstanceIds = [instanceId])
return

4- CloudWatch rule: create a cloudwatch rule that triggers the lambda function we created previosly

 

et voila , all the threats that we did earlier shows in the GuardDuty findings :

Screen Shot 2018-01-04 at 10.36.08 AM

Stoping the compromised instances :

Screen Shot 2018-01-04 at 10.42.33 AM

sending notifications:

Screen Shot 2018-01-04 at 10.43.29 AM

 

Local (and S3) cloud storage server using Minio

Minio is a local cloud object storage server, it’s open source, released under Apache License V2.0, allowing developers and devops to have a local and a public cloud storage to:

  • backup VMs
  • backup containers
  • store unstructured Data ( photos, files, …)
  • store objects in AWS S3
  • store objects using SDKs (GO, Javascripts, Java )

to start a server you can use the container image of MiniO available on Docker hub here :

mini0/minio

you can run this cmd :

docker pull minio/minio

to start the server run :

 docker run -p 9000:9000 minio/minio server /export

you can access the web UI at http://localhost:9000

screen-shot-2017-01-02-at-7-21-48-pm

the access key and secret key for the local server are generated at the start of the server

create a bucket :

screen-shot-2017-01-02-at-9-15-52-pm

accessible also through the web UI:

screen-shot-2017-01-02-at-9-18-00-pm

using your AWS S3 storage :

we need to add AWS S3 end point to the list of hosts :

mc config host add my-aws https://s3.amazonaws.com YOUR_ACCESS_KEY  YOUR_SECRET_KEY

create a bucket in S3 :

screen-shot-2017-01-02-at-10-30-19-pm

and it’s created  :

screen-shot-2017-01-02-at-10-36-10-pm

CI and code promotion for Chef cookbooks with Jenkins – POC

 

I have been browsing the internet for blogs or articles to help Chef developers have a way of promoting the code of their cookbooks, a way of vetting code, and avoiding that code goes from Operations guys straight to production.

I have found a lot of theoretical articles on building a CI pipeline for chef cookbooks, but not a lot of practical ones,  so I decided to make a proof of concept for  the public and my team as well.

when it comes to integration tools, I like Jenkins, it’s open source and the community is very active in adding  and updating plugins .

In this example I will use a Java cookbook as a code base, and I will be running 4 test :

  • Foodcritic : a helpful lint tool you can use to check your Chef cookbooks for common problems. It comes with 61 built-in rules that identify problems ranging from simple style inconsistencies to difficult to diagnose issues that will hurt in production.
  • ChefSpec : a unit testing framework for testing Chef cookbooks. ChefSpec makes it easy to write examples and get fast feedback on cookbook changes without the need for virtual machines or cloud servers.
  • Rubocop :  a Ruby static code analyzer. Out of the box it will enforce many of the guidelines outlined in the community Ruby Style Guide.
  • Test Kitchen : a test harness tool to execute your configured code on one or more platforms in isolation. A driver plugin architecture is used which lets you run your code on various cloud providers and virtualization technologies such as Amazon EC2,  Vagrant, Docker, LXC containers, and more. Many testing frameworks are already supported out of the box including Bats, shUnit2, RSpec, Serverspec, with others being created weekly.

of course you can have all these tools in one package…. the famous ChefDK

Code Promotion :

The concept of code promotion helps the CI process distinguish between good and bad builds, I like to define a good build as a build where ALL the tests are successful .

Jenkins helps you implement this concept with a community plugin : Promoted Build Plugin

based of the status of your build ( promoted of not)  you can control the code that goes into your repository (github or gitlab), you can use set up hooks to deny merge requests from builds that are not promoted.

Jobs:

let’s setup our jobs, we will too categories of jobs :

  • Build Jobs
  • Test Jobs

 

screen-shot-2016-09-18-at-9-00-30-pm

whenever a build job is successful will trigger all the test jobs to start.

screen-shot-2016-09-19-at-9-24-50-am

Build-java-cookbook : will clone the code repo and create a temporary artifact, this is the config section for this job

screen-shot-2016-09-18-at-9-04-18-pm

Rubocop Test : will copy the temporary artifact decompress it to have all code repo and run rubocop test on the code :

screen-shot-2016-09-18-at-9-18-58-pm

ChefSpec Test :

screen-shot-2016-09-19-at-9-26-48-am

FoodCritic : 

screen-shot-2016-09-19-at-9-34-11-am

Test Kitchen : 

screen-shot-2016-09-19-at-9-39-28-am

Test Kitchen will spin a vagrant box (ubuntu-14.04) and run the cookbook on it and test the results.

First Run : 

with the configurations above we run the build and test jobs :

Result :

screen-shot-2016-09-19-at-9-54-56-am

Rubocop test failed, by looking at the execution log we can see why :

+ rubocop
Inspecting 44 files
.......CC...CC........C..C..................

Offenses:

providers/alternatives.rb:34:38: C: Use shell_out("#{alternatives_cmd} --display #{cmd} | grep #{alt_path} | grep 'priority #{priority}$'").exitstatus.zero? instead of shell_out("#{alternatives_cmd} --display #{cmd} | grep #{alt_path} | grep 'priority #{priority}$'").exitstatus == 0.
      alternative_exists_same_prio = shell_out("#{alternatives_cmd} --display #{cmd} | grep #{alt_path} | grep 'priority #{priority}$'").exitstatus == 0
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
providers/alternatives.rb:35:28: C: Use shell_out("#{alternatives_cmd} --display #{cmd} | grep #{alt_path}").exitstatus.zero? instead of shell_out("#{alternatives_cmd} --display #{cmd} | grep #{alt_path}").exitstatus == 0.
      alternative_exists = shell_out("#{alternatives_cmd} --display #{cmd} | grep #{alt_path}").exitstatus == 0
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
providers/alternatives.rb:43:18: C: Use remove_cmd.exitstatus.zero? instead of remove_cmd.exitstatus == 0.
          unless remove_cmd.exitstatus == 0
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
providers/alternatives.rb:57:18: C: Use install_cmd.exitstatus.zero? instead of install_cmd.exitstatus == 0.
          unless install_cmd.exitstatus == 0
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
providers/alternatives.rb:66:28: C: Use shell_out("#{alternatives_cmd} --display #{cmd} | grep \"link currently points to #{alt_path}\"").exitstatus.zero? instead of shell_out("#{alternatives_cmd} --display #{cmd} | grep \"link currently points to #{alt_path}\"").exitstatus == 0.
      alternative_is_set = shell_out("#{alternatives_cmd} --display #{cmd} | grep \"link currently points to #{alt_path}\"").exitstatus == 0
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
providers/alternatives.rb:72:16: C: Use set_cmd.exitstatus.zero? instead of set_cmd.exitstatus == 0.
        unless set_cmd.exitstatus == 0
               ^^^^^^^^^^^^^^^^^^^^^^^
providers/alternatives.rb:87:50: C: Use cmd.exitstatus.zero? instead of cmd.exitstatus == 0.
    new_resource.updated_by_last_action(true) if cmd.exitstatus == 0
                                                 ^^^^^^^^^^^^^^^^^^^
providers/ark.rb:39:20: C: Omit parentheses for ternary conditions.
    package_name = (file_name =~ /^server-jre.*$/) ? 'jdk' : file_name.scan(/[a-z]+/)[0]
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
providers/ark.rb:134:14: C: Use 0o for octal literals.
        mode 0755
             ^^^^
providers/ark.rb:149:14: C: Closing method call brace must be on the line after the last argument when opening brace is on a separate line from the first argument.
            ))
             ^
providers/ark.rb:150:16: C: Use cmd.exitstatus.zero? instead of cmd.exitstatus == 0.
        unless cmd.exitstatus == 0
               ^^^^^^^^^^^^^^^^^^^
providers/ark.rb:157:16: C: Use cmd.exitstatus.zero? instead of cmd.exitstatus == 0.
        unless cmd.exitstatus == 0
               ^^^^^^^^^^^^^^^^^^^
providers/ark.rb:164:16: C: Use cmd.exitstatus.zero? instead of cmd.exitstatus == 0.
        unless cmd.exitstatus == 0
               ^^^^^^^^^^^^^^^^^^^
providers/ark.rb:172:14: C: Use cmd.exitstatus.zero? instead of cmd.exitstatus == 0.
      unless cmd.exitstatus == 0
             ^^^^^^^^^^^^^^^^^^^
recipes/ibm.rb:44:8: C: Use 0o for octal literals.
  mode 00755
       ^^^^^
recipes/ibm_tar.rb:36:8: C: Use 0o for octal literals.
  mode 00755
       ^^^^^
recipes/ibm_tar.rb:49:8: C: Use 0o for octal literals.
  mode 00755
       ^^^^^
recipes/set_java_home.rb:27:8: C: Use 0o for octal literals.
  mode 00755
       ^^^^^
recipes/set_java_home.rb:32:8: C: Use 0o for octal literals.
  mode 00755
       ^^^^^
resources/ark.rb:39:54: C: Use 0o for octal literals.
attribute :app_home_mode, kind_of: Integer, default: 0755
                                                     ^^^^

44 files inspected, 20 offenses detected
Build step 'Execute shell' marked build as failure
Finished: FAILURE

 

 

let’s go ahead and fix these offenses and commit the code :

screen-shot-2016-09-19-at-10-38-02-am

we restart the build :

and here everything is green this time :

screen-shot-2016-09-19-at-11-40-27-am

from this point on you can create do two things :

  • save your cookbook in a private supermarket with a corresponding version number
  • upload this cookbook to chef server

 

Promotion status :

After the completion of all tests, this build can now be promoted.

screen-shot-2016-09-19-at-11-48-31-am