Querying Ercot public dataset using AWS Glue and Athena

ERCOT is an acronym for Electric Reliability Council of Texas, it manages the flow of electric power to more than 25 million Texas customers — representing about 90 percent of the state’s electric load. As the independent system operator for the region, ERCOT schedules power on an electric grid that connects more than 46,500 miles of transmission lines and 600+ generation units. It also performs financial settlement for the competitive wholesale bulk-power market and administers retail switching for 7 million premises in competitive choice areas.

ERCOT also offers an online and public dataset giving market participants information on a variety of topics related to the market of electricity in the state of Texas, which makes it a good candidate for AWS products: Glue and Athena.

Tools :

AWS Glue 

AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics

AWS Athena 

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run

Scraping Data :

the data on Ercot website is available as a collection of  .zip files, I used a python scraper from this github repository to only collect CSV files.

As an example, we will be collecting data about the total energy sold from this page

Using the previous tools the command would something like this :

[code]

python -m ercot.scraper “http://mis.ercot.com/misapp/GetReports.do?reportTypeId=12334&reportTitle=DAM%20Total%20Energy%20Sold&showHTMLView=&mimicKey”

[/code]

the script will download the csv files and store them in a data folder :

Screen Shot 2018-11-11 at 9.07.37 PM

At this point, we transfer the data to S3 to be ready for AWS Glue, an optimization of this process could consist of creating a lambda function with a schedule to continuously upload  new datasets

Creating a Crawler 

you can add a Crawler in AWS Glue to be able to traverse datasets in S3 and create a table to be queried.

Screen Shot 2018-11-11 at 9.31.44 PM

 

At the end of its run, the crawler creates a table that contains records gathered from all the CSV files we downloaded from EROCT public dataset, in this instance the table is called: damtotqtyengysoldnp

Screen Shot 2018-11-11 at 9.26.13 PM

 

And now you can query Ahead! 

using AWS Athena, you can run different queries on the table we generated previously, here are some few examples :

Total energy sold by settlement point :

Screen Shot 2018-11-11 at 9.53.54 PM

 

Getting the hours of the day: 11/12/2018 with the Max  energy sold 

Screen Shot 2018-11-11 at 10.07.58 PM