Using Smart Meter Data for consumer electricity usage forecasting

Electric energy consumption is essential for promoting economic development and raising the standard of living. In contrast to other energy sources, electric energy cannot be stored for large-scale consumption. From an economic viewpoint, the supply and demand of electric energy must be balanced at any given time. Therefore, a precise forecasting of electric energy consumption is very important for the economic operation of an electric power grid.

The ability to create a forecasting model for an individual consumer can help determine the overall load on the grid for a given time.

For this post, we will classify this as a time series prediction problem and only use one variance.  Check back for a post where introduce mutli variance.

The Dataset :

I used Smart Meters Texas or SMT portal to access my home electricity usage. SMTstores daily, monthly and even 15-minute intervals of energy data. Energy data is  recorded by digital electric meters (commonly known as smart meters), and provides secure access to the data for customers and authorized market participants

In addition to acting as an interface for access to smart meter data, SMT enables secure communications with the customers in-home devices and provides a convenient and easy-to-use process where customers can voluntarily authorize market participants, other than the customer’s retail electric provider or third parties, access to their energy information and in-home devices.

Tools :

Keras: Keras is an open source neural network library written in Python. It is capable of running on top of TensorFlow, Microsoft Cognitive Toolkit, Theano, or PlaidML. Designed to enable fast experimentation with deep neural networks, it focuses on being user-friendly, modular, and extensible

TensorFlow : TensorFlow is an open-source software library for dataflow programming across a range of tasks. It is a symbolic math library and is also used for machine learning applications such as neural networks

The Algorithm: LSTM

In our case, we will be using a variant of Recurrent Neural Network ( RNN ) called Long Short Term Memory ( LSTM), why? time series problems are a difficult type of prediction modeling, LSTMs are good at extracting input features patterns when it’s over a long period of time, LSTM have the capability of retaining it in memory, and use it to predict next sequences, this what my 3 months usage (in KWH) look like ( by 15 min interval ) :

Screen Shot 2019-02-10 at 2.31.38 PM

for this example, I will only use a subset of the overall dataset, 3 days of electricity usage:

11/01/18 00:15,0.005
11/01/18 00:30,0.005
11/01/18 00:45,0.013
11/01/18 01:00,0.029
11/01/18 01:15,0.025
11/01/18 01:30,0.004
11/01/18 01:45,0.005
11/01/18 02:00,0.004
11/01/18 02:15,0.024

Screen Shot 2019-02-10 at 2.44.35 PM

python :

loading the dataset:

# load the dataset

dataframe = read_csv('AMS.csv', usecols=[1], engine='python')

dataset = dataframe.values

dataset = dataset.astype('float32')

Training Set and Test Set :

train_size = int(len(dataset) * 0.67)

test_size = len(dataset) - train_size

train, test = dataset[0:train_size,:], dataset[train_size:len(dataset),:]

creating model :

model = Sequential()</pre>
model.add(LSTM(4, input_shape=(1, look_back))) model.add(Dense(1)) model.add(Activation('tanh')) model.compile(loss='mean_squared_error', optimizer='adam'), trainY, epochs=50, batch_size=1) 
making predictions:

trainPredict = model.predict(trainX)</pre>
testPredict = model.predict(testX) 

for plotting we shift the training and test sets :

# shift train predictions for plotting
trainPredictPlot = numpy.empty_like(dataset)
trainPredictPlot[:, :] = numpy.nan
trainPredictPlot[look_back:len(trainPredict)+look_back, :] = trainPredict
# shift test predictions for plotting
testPredictPlot = numpy.empty_like(dataset)
testPredictPlot[:, :] = numpy.nan
testPredictPlot[len(trainPredict)+(look_back*2)+1:len(dataset)-1, :] = testPredict

if results were plotted, it would look something like the picture below:

Blue: actual usage

Orange: Test Set

Green: Predictions

Screen Shot 2019-02-11 at 8.27.33 AM

if we zoom on the prediction part :

Screen Shot 2019-02-11 at 8.34.54 AM

this is good forecasting, with a 0.19 RMSE

Summary :

With the electricity market undergoing a revolution, load forecasts have gained much more significance spreading across other business departments like energy trading and financial planning. Accurate load forecasts are the basis for most reliability organizations operations, like Electricity Reliability Consul Of Texas more commonly known as ERCOT. Accurate load forecasting will become even more important with Smart Grids. Smart Grids create the opportunity to proactively take action at the consumer level, storage level, and generation side, to avoid a situation of energy scarcity and/ or price surge.

Querying Ercot public dataset using AWS Glue and Athena

ERCOT is an acronym for Electric Reliability Council of Texas, it manages the flow of electric power to more than 25 million Texas customers — representing about 90 percent of the state’s electric load. As the independent system operator for the region, ERCOT schedules power on an electric grid that connects more than 46,500 miles of transmission lines and 600+ generation units. It also performs financial settlement for the competitive wholesale bulk-power market and administers retail switching for 7 million premises in competitive choice areas.

ERCOT also offers an online and public dataset giving market participants information on a variety of topics related to the market of electricity in the state of Texas, which makes it a good candidate for AWS products: Glue and Athena.

Tools :

AWS Glue 

AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics

AWS Athena 

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run

Scraping Data :

the data on Ercot website is available as a collection of  .zip files, I used a python scraper from this github repository to only collect CSV files.

As an example, we will be collecting data about the total energy sold from this page

Using the previous tools the command would something like this :

python -m ercot.scraper ""

the script will download the csv files and store them in a data folder :

Screen Shot 2018-11-11 at 9.07.37 PM

At this point, we transfer the data to S3 to be ready for AWS Glue, an optimization of this process could consist of creating a lambda function with a schedule to continuously upload  new datasets

Creating a Crawler 

you can add a Crawler in AWS Glue to be able to traverse datasets in S3 and create a table to be queried.

Screen Shot 2018-11-11 at 9.31.44 PM


At the end of its run, the crawler creates a table that contains records gathered from all the CSV files we downloaded from EROCT public dataset, in this instance the table is called: damtotqtyengysoldnp

Screen Shot 2018-11-11 at 9.26.13 PM


And now you can query Ahead! 

using AWS Athena, you can run different queries on the table we generated previously, here are some few examples :

Total energy sold by settlement point :

Screen Shot 2018-11-11 at 9.53.54 PM


Getting the hours of the day: 11/12/2018 with the Max  energy sold 

Screen Shot 2018-11-11 at 10.07.58 PM



Blog at

Up ↑