Historical Price Service

Welcome to this walkthrough on the Tyba Historical API.

By the end of this guide, you’ll know how to retrieve historical energy and ancillary prices.

Also, the various code objects mentioned here all contain hyperlinks to the relevant parts of the Client API Reference, so make sure to click on them for more information!

Setting Up

First, ensure you’ve installed the Tyba Client. After instantiating a Client instance, you can access the historical price client by calling client.services and its sub-interfaces lmp and ancillary

import os
import pandas as pd
from tyba_client.client import Client

PAT = os.environ["TYBA_PAT"]
client = Client(PAT)
services = client.services
lmp = services.lmp
ancillary = services.ancillary

Fetching Available ISOs

The main use of the services interface is the get_all_isos() method. This method lists all of the regional transmission operators and independent system operators (all generally referred to as ISOs) that are represented in Tyba’s historical price dataset. Even if we are confident that the ISO we care about is in the dataset, we usually want to start here to make sure that we have the right string identifier for our ISO.

isos = services.get_all_isos().json()
print(isos)

#returns
['PJMISO', 'CAISO', 'NEISO', 'NYISO', 'ERCOT', 'SPPISO', 'MISO']

Note that get_all_isos() actually returns a requests.Response object, so we have to extract the useful JSON data from the body of the response. This is the pattern used for all of the methods in the historical price API.
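
Because every method follows this response pattern, a small helper can cut down on the repeated raise_for_status()/json() boilerplate. This is purely an illustrative sketch (unwrap is not part of the client library):

```python
def unwrap(res):
    """Raise for HTTP errors, then return the parsed JSON body of a response."""
    res.raise_for_status()
    return res.json()
```

With such a helper, a call like services.get_all_isos() becomes isos = unwrap(services.get_all_isos()).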

Fetching Historical Energy Prices For a Single Node

Let’s suppose we’re interested in HB_HOUSTON’s day-ahead energy prices from 2020 through 2021. First, we use the lmp.search_nodes method to find the relevant node id

res = lmp.search_nodes(node_name_filter="HB_HOUSTON")
res.raise_for_status()  # this will raise an error if our search request was unsuccessful
nodes_list = res.json()['nodes']
print(nodes_list)

#returns
[{'node/id': '10000697077', 'node/name': 'HB_HOUSTON', 'node/iso': 'ERCOT', 'node/lat': 29.852151,
'node/lng': -95.477535}]

In this case, our node search returned data for a single node since we applied such a specific name filter. We can then extract the node id and use it to get a pandas Series of day-ahead prices with the lmp.get_prices method

from tyba_client.client import Market
node_id = nodes_list[0]["node/id"]
res = lmp.get_prices(
    node_ids=[node_id], # note this has to be a list, even if there is just a single ID
    market=Market.DA,
    start_year=2020,
    end_year=2021,
)
res.raise_for_status()
price_data = res.json()[node_id]  # results are also organized by node ID
price_data = pd.Series(index=pd.to_datetime(price_data["datetimes"]), data=price_data["prices"])
print(price_data.head())

#returns
2020-01-01 00:00:00    11.27
2020-01-01 01:00:00     9.73
2020-01-01 02:00:00     9.58
2020-01-01 03:00:00     9.65
2020-01-01 04:00:00    10.18
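
Once the prices are in a datetime-indexed pandas Series like the one above, standard pandas tooling applies directly. As a sketch, here are monthly averages computed with resample, using the five sample values shown (illustrative only, not a live API call):

```python
import pandas as pd

# Hourly prices shaped like the API output above (illustrative sample values)
price_data = pd.Series(
    data=[11.27, 9.73, 9.58, 9.65, 10.18],
    index=pd.to_datetime([
        "2020-01-01 00:00:00",
        "2020-01-01 01:00:00",
        "2020-01-01 02:00:00",
        "2020-01-01 03:00:00",
        "2020-01-01 04:00:00",
    ]),
)
# Average price per calendar month ("MS" = month-start frequency)
monthly_mean = price_data.resample("MS").mean()
```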

Working with Node Data

The search_nodes() method provides the ability to search based on location, node name and ISO. For example, we can look at all the nodes around a particular location:

res = lmp.search_nodes(location="29.760427, -95.369804")
res.raise_for_status()  # this will raise an error if our search request was unsuccessful
node_search_df = pd.DataFrame(res.json()['nodes'])
print(node_search_df.head())

#returns
    node/lng node/iso    node/name   node/lat      node/id  node/distance-meters
0 -95.379870    ERCOT        CR_RN  29.776851  10018267019           2061.858094
1 -95.398456    ERCOT  BRHEIGHT_RN  29.782491  10016437277           3692.691396
2 -95.399247    ERCOT   TC_GTG2_RN  29.703634  10017407316           6909.874096
3 -95.399247    ERCOT      TC_PUN1  29.703634  10000698825           6909.874096
4 -95.399247    ERCOT      TC_PUN2  29.703634  10000698826           6909.874096
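
Because the search results include node/distance-meters, simple distance filters fall out naturally, e.g. keeping only nodes within some radius of the search point. A sketch using a few of the sample rows above (illustrative values, not a live API call):

```python
import pandas as pd

# Rows shaped like a search_nodes() response (illustrative sample values)
node_search_df = pd.DataFrame([
    {"node/name": "CR_RN", "node/id": "10018267019", "node/distance-meters": 2061.86},
    {"node/name": "BRHEIGHT_RN", "node/id": "10016437277", "node/distance-meters": 3692.69},
    {"node/name": "TC_GTG2_RN", "node/id": "10017407316", "node/distance-meters": 6909.87},
])
# Keep only nodes within 5 km of the searched location
nearby = node_search_df[node_search_df["node/distance-meters"] < 5000]
```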

The data in node_search_df can be used to determine relevant nodes near a project site, and for mapping/plotting purposes generally. However, note that the fields returned reflect the search parameters used; for example, node/distance-meters is measured from the location we passed. To access other node data, such as data start and end years, node type, etc., we can use the lmp.get_all_nodes method

res = lmp.get_all_nodes(iso="ERCOT")
res.raise_for_status()
node_df = pd.DataFrame(res.json())
print(node_df.head())

#returns
   da_start_year  rt_end_year     zone          name       type      substation  da_end_year           id  rt_start_year
0         2011.0         2025    SOUTH  OXY_OXY_CCG1  GENERATOR      OXYCHEM CC       2025.0  10000698721           2011
1         2022.0         2025    SOUTH  SLCN_ESS_ALL  GENERATOR             NaN       2025.0  10016671726           2022
2         2024.0         2025    SOUTH   ANEM_ESS_RN  GENERATOR             NaN       2025.0  10017318833           2024
3         2011.0         2025    NORTH     WHCCS_CC1  GENERATOR WOLF HOLLOW GEN       2025.0  10000698379           2011
4         2011.0         2025  HOUSTON     SCLP_PUN4  GENERATOR    SWEENY COGEN       2025.0  10000698780           2011

Note that iso is a keyword argument, so simply typing lmp.get_all_nodes("ERCOT") won’t work.

The search_nodes() and get_all_nodes() methods can be used together for more complex node filtering. For example, let’s suppose that we want the 3 closest unique nodes to our location that also have at least 3 years of real-time price data

node_df = node_df.set_index("id")  # set the id column as the index
node_search_df = node_search_df.set_index("node/id")
node_df = pd.concat([node_df, node_search_df], axis=1, join="inner")  # combine the 2 dfs by id, leaving only the subset in node_search_df
node_df.index.name = "id"
node_df = node_df[node_df["rt_end_year"] - node_df["rt_start_year"] >= 3]  # filter for data span
node_df["latlon"] = node_df["node/lng"].astype(str) + "," + node_df["node/lat"].astype(str)  # comma separator avoids ambiguous concatenations
node_df = node_df.reset_index().groupby("latlon").first().set_index("id").sort_values(by="node/distance-meters")  # filter for unique lat/lon
print(node_df[["rt_start_year", "rt_end_year", "node/distance-meters"]].head())

#returns
             rt_start_year  rt_end_year  node/distance-meters
id
10016437277           2021         2025           3692.691396
10000698826           2011         2025           6909.874096
10000698744           2011         2024          12913.743104
10016232998           2021         2025          13086.640441
10016437499           2021         2024          14409.202592

Fetching Historical Energy Prices For Multiple Nodes

We can pass multiple node IDs to the lmp.get_prices method. Continuing the example from the previous section, let’s fetch real-time energy price data for the top 3 nodes in node_df and build a real-time price dataframe organized by node name

node_df = node_df.reset_index()
node_df = node_df.loc[:2, :]
ids_names = dict(zip(node_df["id"], node_df["name"]))
res = lmp.get_prices(
    node_ids=node_df["id"].values.tolist(),
    market=Market.RT,
    start_year=2021,
    end_year=2023,
)
res.raise_for_status()
price_data = res.json()
price_df = pd.concat([pd.Series(index=v["datetimes"], data=v["prices"], name=k) for k, v in price_data.items()], axis=1)
price_df = price_df.rename(columns=ids_names)
print(price_df.head())

#returns
                     BRHEIGHT_RN  TC_PUN2  PR_PR_G2
2021-01-01T00:00:00      15.4500  15.4375   15.4125
2021-01-01T01:00:00      14.8725  14.8750   14.8700
2021-01-01T02:00:00      14.5350  14.5375   14.5350
2021-01-01T03:00:00      15.8525  15.8475   15.8350
2021-01-01T04:00:00      17.0200  16.9800   16.9175
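
With the prices for several nodes in a single DataFrame keyed by node name, cross-node comparisons become one-line pandas operations. For instance, an hourly basis spread between two of the nodes, sketched with a few of the sample values above (illustrative only):

```python
import pandas as pd

# Real-time prices shaped like price_df above (illustrative sample values)
price_df = pd.DataFrame(
    {
        "BRHEIGHT_RN": [15.4500, 14.8725, 14.5350],
        "TC_PUN2": [15.4375, 14.8750, 14.5375],
    },
    index=pd.to_datetime([
        "2021-01-01T00:00:00", "2021-01-01T01:00:00", "2021-01-01T02:00:00",
    ]),
)
# Hourly price difference (basis) between the two nodes, and its average
basis = price_df["BRHEIGHT_RN"] - price_df["TC_PUN2"]
mean_basis = basis.mean()
```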

Please consult the LMP docs for more details and the Example Notebooks for more examples.

Fetching Historical Ancillary Prices For a Single Region

Let’s suppose we’re interested in Day-Ahead Regulation Up prices in the SP15 region in CAISO for the year 2020. For this task, we would use the ancillary.get_prices method

from tyba_client.models import AncillaryService
res = ancillary.get_prices(
                iso="CAISO",
                service=AncillaryService.REGULATION_UP,
                market=Market.DA,
                region="SP15",
                start_year=2020,
                end_year=2020,
            )
res.raise_for_status()
as_prices = res.json()
as_prices = pd.Series(index=pd.to_datetime(as_prices["datetimes"]).tz_localize(None), data=as_prices["prices"])
print(as_prices.head())

# returns
2020-01-01 00:00:00    12.27
2020-01-01 01:00:00     6.34
2020-01-01 02:00:00     6.49
2020-01-01 03:00:00    10.49
2020-01-01 04:00:00     8.32

Fetching Ancillary Prices For Multiple Regions, Services, and Markets

Unlike the lmp.get_prices method, ancillary.get_prices only pulls a single price time series per call. As such, we pull multiple price series by iterating and then combining the results. For example, let’s fetch Day-Ahead and Real-Time prices for Regulation Up and Down for both the SP15 and NP15 regions in CAISO in 2022

regions = dict()
for region in ["SP15", "NP15"]:
    services = dict()
    for service in [AncillaryService.REGULATION_DOWN, AncillaryService.REGULATION_UP]:
        market_df = pd.DataFrame()
        for market in [Market.DA, Market.RT]:
            as_prices = ancillary.get_prices(
                iso="CAISO",
                service=service,
                market=market,
                region=region,
                start_year=2022,
                end_year=2022,
            ).json()
            market_df[market] = pd.Series(index=pd.to_datetime(as_prices["datetimes"]).tz_localize(None),
                                          data=as_prices["prices"])  # df of market prices for each region-service combo
        services[service] = market_df
    regions[region] = pd.concat(services, axis=1)  # df for each region
as_df = pd.concat(regions, axis=1)  # df combining regions
print(as_df.head())

# returns
                                                               SP15                                          NP15
                             Regulation Down          Regulation Up        Regulation Down          Regulation Up
                           dayahead realtime      dayahead realtime      dayahead realtime      dayahead realtime
2022-01-01 00:00:00            6.35      0.0          1.70     3.30          0.75      0.0          3.00     2.45
2022-01-01 01:00:00            3.00      0.0          1.28     1.50          3.00      0.0          1.88     1.88
2022-01-01 02:00:00            2.66      0.0          1.28     1.70          5.01      0.0          3.42     8.04
2022-01-01 03:00:00            1.44      0.0          1.28     2.56          5.02      0.0          3.42     2.41
2022-01-01 04:00:00            1.44      0.0          1.28     2.00         12.84      0.0          3.42     3.00
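
The resulting as_df has a three-level column MultiIndex of (region, service, market), so sub-frames can be pulled out by level with standard pandas indexing. A sketch with a small stand-in frame (illustrative values, not a live API call):

```python
import pandas as pd

# Stand-in for as_df: 3-level columns of (region, service, market), illustrative values
cols = pd.MultiIndex.from_product(
    [["SP15", "NP15"], ["Regulation Up"], ["dayahead", "realtime"]]
)
as_df = pd.DataFrame([[1.70, 3.30, 3.00, 2.45]], columns=cols)

sp15 = as_df["SP15"]  # all services/markets for one region
da_only = as_df.xs("dayahead", axis=1, level=2)  # day-ahead prices across regions
```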

Working with Ancillary Region Data

The get_prices() method requires 5 data points to fetch prices: ISO, region, service, market, and date range. In the above examples, we knew which pricing regions we were interested in and, implicitly, that the combinations of region, service, market and year were available in the dataset. In situations where we don’t know what date ranges or even regions are available, we can use the ancillary.get_pricing_regions method to find out. We specify 3 of the input data points (ISO, service and market) and fetch the other 2 (available regions and their respective date ranges)

res = ancillary.get_pricing_regions(
    iso="CAISO",
    service=AncillaryService.REGULATION_UP,
    market=Market.DA)
res.raise_for_status()
as_regions = pd.DataFrame(res.json())
print(as_regions.head())

# returns
                     region  start_year  end_year
0  Pacific Northwest - SP15        2010      2025
1                      WAPA        2010      2025
2                      WAPA        2010      2025
3                      WAPA        2010      2025
4                      ZP26        2010      2025

More complex analyses can be performed by combining the results of multiple calls to get_pricing_regions(). For example, let’s suppose we want to know the regions and date ranges that are available in CAISO for real-time and both regulation up and down

import numpy as np
dfs = dict()
for service in [AncillaryService.REGULATION_UP, AncillaryService.REGULATION_DOWN]:
    res = ancillary.get_pricing_regions(
       iso="CAISO",
       service=service,  # use the loop variable so each service is actually requested
       market=Market.RT)
    res.raise_for_status()
    dfs[service] = pd.DataFrame(res.json()).drop_duplicates()  # currently duplicates returned in results
as_regions = dfs[AncillaryService.REGULATION_UP].merge(dfs[AncillaryService.REGULATION_DOWN], on="region", suffixes=["_reg_up", "_reg_down"])
as_regions["min_start_year"] = np.maximum(as_regions["start_year_reg_up"], as_regions["start_year_reg_down"])  # latest start year shared by both services
as_regions["max_end_year"] = np.minimum(as_regions["end_year_reg_up"], as_regions["end_year_reg_down"])  # earliest end year shared by both services
print(as_regions[["region", "min_start_year", "max_end_year"]].head())

# returns
          region  min_start_year  max_end_year
0        Arizona            2011          2025
1          LADWP            2011          2025
2         Mexico            2011          2025
3  Nevada - NP26            2011          2025
4  Nevada - SP15            2011          2025

Please consult the Ancillary docs for more details and the Example Notebooks below for more examples.

Example Notebooks