Storing your ML Models with parameters

Often when training machine learning models you find yourself creating different estimators and tuning this parameter or that to get the results you want, you may also find yourself wanting to save the results of those iterations, to save you time in the future.

that’s what I’m trying to address in this post, having some sort of artifact repository for machine learning models, but saving your parameters as metadata using the following design :

Screen Shot 2018-12-09 at 8.38.33 PM

1: user uploads artifacts using pre-signed s3 URLs

2  and 3: a putObject event triggers the lambda function to make an API call to an ec2 instance running an HTTP server  to read the estimator from S3 and get the parameters

4: saving the parameters in DynamoDB

 

Uploading artifacts :

I use AWS S3 to store the assets, making use of the pre-signed URL feature that gives you the possibility to use temporary URLs to upload files to S3, which takes away the managing permissions.

to orchestrate all this I like to use my favorite serverless framework.

here is the code on github

Deploying Serverless stack :

$serverless deploy

Screen Shot 2018-12-10 at 1.19.57 PM

this will create 5 endpoints :

  • POST /dev/asset
  • GET /dev/asset
  • PUT /dev/asset/{asset_id}
  • DELETE /dev/asset/{asset_id}

these endpoints will allow you to update/create/delete an artifact, in this case, is a model.

for more reading about this check out the readme page of this serverless example

Getting the parameters :

in this part, in the EC2 instance, we will try to download the model and get the parameters to store them in dynamo DB

Initially I thought I could leverage all of this work in Lambda, so I don’t have to create an EC2 instance just to read the parameters, unfortunately, there couple issues with that solution, one of them is the size of the dependencies once you add SKlearn libraries as a dependency, the size of the lambda zip reaches 60Mb. But once uploaded there was an issue running SKlearn part of the lambda, for this iteration I decided to use a t2.micro on EC2.

the EC2 has a python web server running that get requests with an asset_id, downloads the asset, get the parameters and store them in dynamodb

this is the code for the server :

https://github.com/mbenachour/store_ml_models/blob/master/server.py

Testing the upload :

to test all this I created a small python script:


import sys
import requests
from sklearn.externals import joblib

def upload(filename):
model = loadModel(filename)
print (model.get_params())
url = 'https://oo0cl2av91.execute-api.us-east-1.amazonaws.com/dev/asset'
response = requests.post(url)
print (response)
presigned = response.json().get('body').get('upload_url')
response = requests.put(presigned, data=open(filename).read())
print (response)

def loadModel(model_path):
download_path = model_path
#s3_client.download_file(BUCKET_NAME, model, '/tmp/model.pkl')
return joblib.load(download_path)

upload(sys.argv[1])

to run it use :

 python test.py  your_model.pkl 

if you look at your dynamodb table you will see that your model has a description :

Screen Shot 2018-12-10 at 11.40.30 PM

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

Create a free website or blog at WordPress.com.

Up ↑

%d bloggers like this: