My book recommendations for AI and ML


With AI and machine learning creeping into every industry, the paradigm of IT as we know it is changing. ML is a natural extension of the software revolution we have seen in the last decades, and knowing how to utilize ML in your industry will be a key element for success and growth in the coming years.

This transformation will need a new vision, as new jobs, new platforms and new ways of doing business will emerge from it. I believe at this point we are past the hype of AI and we are in the middle of a reality where machine learning and inference are helping thousands of businesses grow and prosper.

I have read several books on AI and ML, and the two that stand out are:

  • Human + Machine: Reimagining Work in the Age of AI
  • Pragmatic AI: An Introduction to Cloud-Based Machine Learning

Whether you are an engineer, a manager, an executive, or merely driven by curiosity about AI and ML, I recommend that you read these books to fully grasp their impact on many industries.

Human + Machine: Reimagining Work in the Age of AI

Paul R. Daugherty and H. James Wilson did an amazing job at reimagining what work will look like in the age of AI. They introduce the notion of the Missing Middle: a realistic way of looking at this transformation by defining what machines can do, what humans can do, and where humans and machines share hybrid activities.

Humans can judge, lead, empathize, and create; machines can iterate, predict, and adapt.

AI can give humans superpowers, but humans need to train and sustain machines, and at times explain their decisions.

Paul and James talk about an entirely new set of jobs that will emerge from this alliance.

Pragmatic AI: An Introduction to Cloud-Based Machine Learning

If you are an engineer who likes to understand how training and inference work under the hood, this book is a great resource.

Pragmatic AI explains how you can utilize cloud resources on AWS, Azure, and GCP to train your models, optimize them, and deploy a production-scale, machine-learning-powered application.

The book also contains real applications and code samples to help you reproduce them on your own, and it covers the following topics:

  • AI and ML toolchain: from Python ecosystem tools like NumPy, Jupyter Notebooks, and others, to the tools available on AWS, GCP, and Azure
  • DevOps practices to help you deliver and deploy
  • Creating practical AI applications from scratch
  • Optimization


There are definitely a lot of publications concerning AI and ML, but the combination of the two books above covers both the organizational and structural challenges an organization will face when adopting AI, and the technical background needed to work with it.

Storing your ML Models with parameters

Often when training machine learning models, you find yourself creating different estimators and tuning this parameter or that to get the results you want. You may also find yourself wanting to save the results of those iterations to save time in the future.

That is what I am trying to address in this post: having some sort of artifact repository for machine learning models that also saves your parameters as metadata, using the following design:

(Architecture diagram of the design described in the steps below)

1: The user uploads artifacts using pre-signed S3 URLs.

2 and 3: A putObject event triggers the Lambda function, which makes an API call to an EC2 instance running an HTTP server to read the estimator from S3 and extract its parameters.

4: The parameters are saved in DynamoDB.
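
As a rough sketch of steps 2 and 3 on the Lambda side (this is not the exact code from the repository; the EC2 address and route are placeholders), the handler triggered by the putObject event can simply forward the object key to the EC2 server:

[code]
import json
import urllib.request

# Placeholder address of the EC2 instance running the HTTP server (step 3)
EC2_ENDPOINT = 'http://ec2-xx-xx-xx-xx.compute-1.amazonaws.com:8080/asset'

def handler(event, context):
    # The S3 putObject event carries the bucket and key of the uploaded model
    record = event['Records'][0]['s3']
    asset_id = record['object']['key']

    # Ask the EC2 server to download the model and extract its parameters
    payload = json.dumps({'asset_id': asset_id}).encode('utf-8')
    req = urllib.request.Request(
        EC2_ENDPOINT,
        data=payload,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )
    with urllib.request.urlopen(req) as response:
        return {'statusCode': 200, 'body': response.read().decode('utf-8')}
[/code]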


Uploading artifacts:

I use AWS S3 to store the assets, making use of the pre-signed URL feature, which gives you temporary URLs to upload files to S3 and takes away the need to manage permissions.
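
As a minimal sketch of how such a URL can be generated with boto3 (the bucket and key names here are just placeholders):

[code]
import boto3

s3 = boto3.client('s3')

# Generate a temporary URL that allows uploading the given key
# without granting the caller any extra S3 permissions
upload_url = s3.generate_presigned_url(
    'put_object',
    Params={'Bucket': 'my-models-bucket', 'Key': 'my-model.pkl'},
    ExpiresIn=3600,  # URL is valid for one hour
)

print(upload_url)
[/code]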

To orchestrate all of this, I like to use my favorite tool, the Serverless Framework.

Here is the code on GitHub.

Deploying the Serverless stack:

[code]$serverless deploy[/code]


This will create the following endpoints:

  • POST /dev/asset
  • GET /dev/asset
  • PUT /dev/asset/{asset_id}
  • DELETE /dev/asset/{asset_id}

These endpoints allow you to create, update, and delete an artifact, which in this case is a model.

For more details, check out the README page of this serverless example.

Getting the parameters:

In this part, the EC2 instance downloads the model and extracts the parameters so they can be stored in DynamoDB.

Initially, I thought I could do all of this in Lambda, so I would not have to create an EC2 instance just to read the parameters. Unfortunately, there are a couple of issues with that approach. One of them is the size of the dependencies: once you add the scikit-learn libraries, the Lambda zip reaches 60 MB. Even once uploaded, there was an issue running scikit-learn inside the Lambda, so for this iteration I decided to use a t2.micro EC2 instance.

The EC2 instance runs a Python web server that receives requests with an asset_id, downloads the asset, extracts the parameters, and stores them in DynamoDB.

This is the code for the server:

https://github.com/mbenachour/store_ml_models/blob/master/server.py
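
For a rough idea of what that server does (a minimal sketch rather than the exact code in the repository; the Flask route, bucket name, and table name are assumptions):

[code]
import json

import boto3
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
s3 = boto3.client('s3')
table = boto3.resource('dynamodb').Table('ml-models')  # placeholder table name

@app.route('/asset', methods=['POST'])
def store_parameters():
    # The Lambda function posts the key of the uploaded model
    asset_id = request.get_json()['asset_id']
    local_path = '/tmp/model.pkl'

    # Download the estimator from S3 and load it
    s3.download_file('my-models-bucket', asset_id, local_path)
    model = joblib.load(local_path)

    # Store the estimator's parameters as metadata in DynamoDB
    # (serialized to a string to keep the item DynamoDB-friendly)
    params = json.dumps(model.get_params(), default=str)
    table.put_item(Item={'asset_id': asset_id, 'parameters': params})

    return jsonify({'asset_id': asset_id, 'parameters': params})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)
[/code]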

Testing the upload:

To test all of this, I created a small Python script:

[code]
import sys

import requests
from sklearn.externals import joblib


def upload(filename):
    # Load the model locally and print its parameters
    model = loadModel(filename)
    print(model.get_params())

    # Ask the API for a pre-signed upload URL
    url = 'https://oo0cl2av91.execute-api.us-east-1.amazonaws.com/dev/asset'
    response = requests.post(url)
    print(response)

    # Upload the serialized model to S3 using the pre-signed URL
    presigned = response.json().get('body').get('upload_url')
    response = requests.put(presigned, data=open(filename, 'rb').read())
    print(response)


def loadModel(model_path):
    # Load a serialized scikit-learn estimator from disk
    return joblib.load(model_path)


upload(sys.argv[1])
[/code]

To run it, use:

[code]python test.py your_model.pkl[/code]
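
If you do not have a serialized model handy, a quick way to produce one for testing (a minimal sketch; the file name simply matches the command above):

[code]
from sklearn.externals import joblib
from sklearn.linear_model import LinearRegression

# Fit a trivial estimator and serialize it to disk so it can be uploaded
model = LinearRegression().fit([[0], [1], [2]], [0, 1, 2])
joblib.dump(model, 'your_model.pkl')
[/code]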

If you look at your DynamoDB table, you will see that your model now has a description:

(Screenshot of the DynamoDB table entry)