Historical Price Service
Welcome to this walkthrough on the Tyba Historical API.
By the end of this guide, you’ll know how to retrieve historical energy and ancillary prices.
The various code objects mentioned here link to the relevant parts of the Client API Reference, so click through on them for more information!
Setting Up
First, ensure you’ve installed the Tyba Client. After instantiating a Tyba client instance, you can access the historical price client by calling client.services, along with its sub-interfaces lmp and ancillary:
import os
import pandas as pd
from tyba_client.client import Client
PAT = os.environ["TYBA_PAT"]
client = Client(PAT)
services = client.services
lmp = services.lmp
ancillary = services.ancillary
Fetching Available ISOs
The main use of the services interface is the get_all_isos() method, which lists all of the regional transmission operators and independent system operators (generally referred to collectively as ISOs) represented in Tyba’s historical price dataset. Even if we are confident that the ISO we care about is in the dataset, it is usually worth starting here to make sure we have the right string identifier for it:
isos = services.get_all_isos().json()
print(isos)
#returns
['PJMISO', 'CAISO', 'NEISO', 'NYISO', 'ERCOT', 'SPPISO', 'MISO']
Note that get_all_isos() actually returns a requests.Response object, so we have to extract the useful JSON data from the body of the response. This is the pattern used for all of the methods in the historical price API.
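Since every method follows this same response pattern, a small convenience wrapper (our own sketch, not part of the client library) can cut down on repetition:

```python
def extract_json(res):
    """Check the response for HTTP errors, then return the parsed JSON body."""
    res.raise_for_status()  # raises if the request was unsuccessful
    return res.json()
```

With this helper, the ISO listing above becomes `isos = extract_json(services.get_all_isos())`.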
Fetching Historical Energy Prices For a Single Node
Let’s suppose we’re interested in HB_HOUSTON’s day-ahead energy prices from 2020 through 2021. First, we use the lmp.search_nodes method to find the relevant node ID:
res = lmp.search_nodes(node_name_filter="HB_HOUSTON")
res.raise_for_status() # this will raise an error if our search request was unsuccessful
nodes_list = res.json()['nodes']
print(nodes_list)
#returns
[{'node/id': '10000697077', 'node/name': 'HB_HOUSTON', 'node/iso': 'ERCOT', 'node/lat': 29.852151,
'node/lng': -95.477535}]
In this case, our node search returned data for a single node since we applied such a specific name filter. We can then extract the node ID and use it to get a pandas Series of day-ahead prices with the lmp.get_prices method:
from tyba_client.client import Market
import pandas as pd
node_id = nodes_list[0]["node/id"]
res = lmp.get_prices(
    node_ids=[node_id],  # note this has to be a list, even if there is just a single ID
    market=Market.DA,
    start_year=2020,
    end_year=2021,
)
res.raise_for_status()
price_data = res.json()[node_id] # results are also organized by node ID
price_data = pd.Series(index=pd.to_datetime(price_data["datetimes"]), data=price_data["prices"])
print(price_data.head())
#returns
2020-01-01 00:00:00 11.27
2020-01-01 01:00:00 9.73
2020-01-01 02:00:00 9.58
2020-01-01 03:00:00 9.65
2020-01-01 04:00:00 10.18
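Because the prices now live in a pandas Series with a datetime index, standard pandas time-series tooling applies directly. For example, monthly averages (illustrated on a tiny made-up sample rather than the full 2020–2021 result):

```python
import pandas as pd

# A small made-up sample in the same shape as price_data above
sample = pd.Series(
    index=pd.to_datetime(["2020-01-01 00:00", "2020-01-01 01:00", "2020-02-01 00:00"]),
    data=[11.27, 9.73, 20.00],
)
monthly_mean = sample.resample("MS").mean()  # one average price per month
```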
Working with Node Data
The search_nodes() method provides the ability to search based on location, node name, and ISO. For example, we can look at all the nodes around a particular location:
res = lmp.search_nodes(location="29.760427, -95.369804")
res.raise_for_status() # this will raise an error if our search request was unsuccessful
node_search_df = pd.DataFrame(res.json()['nodes'])
print(node_search_df.head())
#returns
node/lng node/iso node/name node/lat node/id node/distance-meters
0 -95.379870 ERCOT CR_RN 29.776851 10018267019 2061.858094
1 -95.398456 ERCOT BRHEIGHT_RN 29.782491 10016437277 3692.691396
2 -95.399247 ERCOT TC_GTG2_RN 29.703634 10017407316 6909.874096
3 -95.399247 ERCOT TC_PUN1 29.703634 10000698825 6909.874096
4 -95.399247 ERCOT TC_PUN2 29.703634 10000698826 6909.874096
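For instance, the node closest to the search location can be pulled straight from these results; sorting by node/distance-meters makes the choice explicit (sketched here on a reconstruction of the first two rows above):

```python
import pandas as pd

# The first two rows of node_search_df from above, reconstructed for illustration
node_search_df = pd.DataFrame([
    {"node/name": "CR_RN", "node/id": "10018267019", "node/distance-meters": 2061.858094},
    {"node/name": "BRHEIGHT_RN", "node/id": "10016437277", "node/distance-meters": 3692.691396},
])
closest = node_search_df.sort_values("node/distance-meters").iloc[0]  # nearest node row
```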
The data in node_search_df can be used to determine relevant nodes near a project site, as well as for general mapping/plotting purposes. Note, however, that the fields returned depend on the search parameters used; node/distance-meters appears here because of the location parameter. To access other node data, such as data start and end years, node type, etc., we can use the lmp.get_all_nodes method:
res = lmp.get_all_nodes(iso="ERCOT")
res.raise_for_status()
node_df = pd.DataFrame(res.json())
print(node_df.head())
#returns
da_start_year rt_end_year zone name type substation da_end_year id rt_start_year
0 2011.0 2025 SOUTH OXY_OXY_CCG1 GENERATOR OXYCHEM CC 2025.0 10000698721 2011
1 2022.0 2025 SOUTH SLCN_ESS_ALL GENERATOR NaN 2025.0 10016671726 2022
2 2024.0 2025 SOUTH ANEM_ESS_RN GENERATOR NaN 2025.0 10017318833 2024
3 2011.0 2025 NORTH WHCCS_CC1 GENERATOR WOLF HOLLOW GEN 2025.0 10000698379 2011
4 2011.0 2025 HOUSTON SCLP_PUN4 GENERATOR SWEENY COGEN 2025.0 10000698780 2011
Note that iso is a keyword argument, so simply typing lmp.get_all_nodes("ERCOT") won’t work.
The search_nodes() and get_all_nodes() methods can be used together for more complex node filtering. For example, let’s suppose that we want the 3 closest unique nodes to our location that also have at least 3 years of real-time price data:
node_df = node_df.set_index("id") # set the id column as the index
node_search_df = node_search_df.set_index("node/id")
node_df = pd.concat([node_df, node_search_df], axis=1, join="inner") # combine the 2 dfs by id, leaving only the subset in node_search_df
node_df.index.name = "id"
node_df = node_df[node_df["rt_end_year"] - node_df["rt_start_year"] >= 3] # filter for data span
node_df["latlon"] = node_df["node/lng"].astype(str) + node_df["node/lat"].astype(str)
node_df = node_df.reset_index().groupby("latlon").first().set_index("id").sort_values(by="node/distance-meters") # filter for unique lat/lon
print(node_df[["rt_start_year", "rt_end_year", "node/distance-meters"]].head())
#returns
rt_start_year rt_end_year node/distance-meters
id
10016437277 2021 2025 3692.691396
10000698826 2011 2025 6909.874096
10000698744 2011 2024 12913.743104
10016232998 2021 2025 13086.640441
10016437499 2021 2024 14409.202592
Fetching Historical Energy Prices For Multiple Nodes
We can pass multiple node IDs to the lmp.get_prices method. Continuing the example from the previous section, let’s suppose that we are interested in real-time energy price data for the top 3 nodes in node_df and want to generate a real-time price dataframe organized by node name:
node_df = node_df.reset_index()
node_df = node_df.loc[:2, :]
ids_names = dict(zip(node_df["id"], node_df["name"]))
res = lmp.get_prices(
    node_ids=node_df["id"].values.tolist(),
    market=Market.RT,
    start_year=2021,
    end_year=2023,
)
res.raise_for_status()
price_data = res.json()
price_df = pd.concat(
    [pd.Series(index=v["datetimes"], data=v["prices"], name=k) for k, v in price_data.items()],
    axis=1,
)
price_df = price_df.rename(columns=ids_names)
print(price_df.head())
#returns
BRHEIGHT_RN TC_PUN2 PR_PR_G2
2021-01-01T00:00:00 15.4500 15.4375 15.4125
2021-01-01T01:00:00 14.8725 14.8750 14.8700
2021-01-01T02:00:00 14.5350 14.5375 14.5350
2021-01-01T03:00:00 15.8525 15.8475 15.8350
2021-01-01T04:00:00 17.0200 16.9800 16.9175
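Note that, unlike the single-node example, price_df’s index is still made up of ISO-format strings; converting it with pd.to_datetime enables time-based operations such as resampling (a two-row stand-in is used here for illustration):

```python
import pandas as pd

# A two-row stand-in for price_df from above, with its string index
df = pd.DataFrame(
    {"BRHEIGHT_RN": [15.4500, 14.8725], "TC_PUN2": [15.4375, 14.8750]},
    index=["2021-01-01T00:00:00", "2021-01-01T01:00:00"],
)
df.index = pd.to_datetime(df.index)  # convert strings to a DatetimeIndex
annual_mean = df.resample("YS").mean()  # e.g. a yearly average per node
```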
Please consult the LMP docs for more details and the Example Notebooks for more examples.
Fetch Historical Ancillary Prices for a Single Region
Let’s suppose we’re interested in Day-Ahead Regulation Up prices in the SP15 region in CAISO for the year 2020. For this task, we would use the ancillary.get_prices method:
from tyba_client.models import AncillaryService
res = ancillary.get_prices(
    iso="CAISO",
    service=AncillaryService.REGULATION_UP,
    market=Market.DA,
    region="SP15",
    start_year=2020,
    end_year=2020,
)
res.raise_for_status()
as_prices = res.json()
as_prices = pd.Series(index=pd.to_datetime(as_prices["datetimes"]).tz_localize(None), data=as_prices["prices"])
print(as_prices.head())
# returns
2020-01-01 00:00:00 12.27
2020-01-01 01:00:00 6.34
2020-01-01 02:00:00 6.49
2020-01-01 03:00:00 10.49
2020-01-01 04:00:00 8.32
Fetch Ancillary Prices for Multiple Regions, Services, and Markets
Unlike the lmp.get_prices method, ancillary.get_prices only pulls a single price time series. As such, we pull multiple price series by iterating and combining afterwards. For example, let’s fetch Day-Ahead and Real-Time prices for Regulation Up and Down for both the SP15 and NP15 regions in CAISO in 2022:
regions = dict()
for region in ["SP15", "NP15"]:
    service_dfs = dict()  # one df per service within this region
    for service in [AncillaryService.REGULATION_DOWN, AncillaryService.REGULATION_UP]:
        market_df = pd.DataFrame()
        for market in [Market.DA, Market.RT]:
            as_prices = ancillary.get_prices(
                iso="CAISO",
                service=service,
                market=market,
                region=region,
                start_year=2022,
                end_year=2022,
            ).json()
            market_df[market] = pd.Series(
                index=pd.to_datetime(as_prices["datetimes"]).tz_localize(None),
                data=as_prices["prices"],
            )  # df of market prices for each region-service combo
        service_dfs[service] = market_df
    regions[region] = pd.concat(service_dfs, axis=1)  # df for each region
as_df = pd.concat(regions, axis=1)  # df combining regions
print(as_df.head())
# returns
SP15 NP15
Regulation Down Regulation Up Regulation Down Regulation Up
dayahead realtime dayahead realtime dayahead realtime dayahead realtime
2022-01-01 00:00:00 6.35 0.0 1.70 3.30 0.75 0.0 3.00 2.45
2022-01-01 01:00:00 3.00 0.0 1.28 1.50 3.00 0.0 1.88 1.88
2022-01-01 02:00:00 2.66 0.0 1.28 1.70 5.01 0.0 3.42 8.04
2022-01-01 03:00:00 1.44 0.0 1.28 2.56 5.02 0.0 3.42 2.41
2022-01-01 04:00:00 1.44 0.0 1.28 2.00 12.84 0.0 3.42 3.00
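The hierarchical columns of as_df make it easy to slice by region, service, or market using pandas’ MultiIndex tools (illustrated on a one-row stand-in built from the first row above):

```python
import pandas as pd

# A one-row stand-in for as_df's three-level columns (region, service, market)
cols = pd.MultiIndex.from_product(
    [["SP15", "NP15"], ["Regulation Down", "Regulation Up"], ["dayahead", "realtime"]]
)
as_df = pd.DataFrame([[6.35, 0.0, 1.70, 3.30, 0.75, 0.0, 3.00, 2.45]], columns=cols)
sp15 = as_df["SP15"]                        # all SP15 columns
da = as_df.xs("dayahead", axis=1, level=2)  # day-ahead prices for every region/service
```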
Working with Ancillary Region Data
The get_prices() method requires 5 data points to fetch prices: ISO, region, service, market, and date range. In the above examples, we knew which pricing regions we were interested in and, implicitly, that the combinations of region, service, market, and year were available in the dataset. In situations where we don’t know what date ranges or even regions are available, we can use the ancillary.get_pricing_regions method to find out. We specify 3 of the input data points (ISO, service, and market) and fetch the other 2 (available regions and their respective date ranges):
res = ancillary.get_pricing_regions(
    iso="CAISO",
    service=AncillaryService.REGULATION_UP,
    market=Market.DA,
)
res.raise_for_status()
as_regions = pd.DataFrame(res.json())
print(as_regions.head())
# returns
region start_year end_year
0 Pacific Northwest - SP15 2010 2025
1 WAPA 2010 2025
2 WAPA 2010 2025
3 WAPA 2010 2025
4 ZP26 2010 2025
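Note the repeated WAPA rows above: the raw results can currently contain duplicates, which pandas’ drop_duplicates removes (sketched on a minimal stand-in):

```python
import pandas as pd

# A stand-in for as_regions, echoing the repeated WAPA rows above
as_regions = pd.DataFrame({
    "region": ["Pacific Northwest - SP15", "WAPA", "WAPA", "WAPA"],
    "start_year": [2010, 2010, 2010, 2010],
    "end_year": [2025, 2025, 2025, 2025],
})
deduped = as_regions.drop_duplicates().reset_index(drop=True)  # one row per region
```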
More complex analyses can be performed by combining the results of multiple calls to get_pricing_regions(). For example, let’s suppose we want to know the regions and date ranges that are available in CAISO in the real-time market for both regulation up and regulation down:
import numpy as np
dfs = dict()
for service in [AncillaryService.REGULATION_UP, AncillaryService.REGULATION_DOWN]:
    res = ancillary.get_pricing_regions(
        iso="CAISO",
        service=service,
        market=Market.RT,
    )
    res.raise_for_status()
    dfs[service] = pd.DataFrame(res.json()).drop_duplicates()  # currently duplicates returned in results
as_regions = dfs[AncillaryService.REGULATION_UP].merge(
    dfs[AncillaryService.REGULATION_DOWN], on="region", suffixes=["_reg_up", "_reg_down"]
)
# the overlap of the two availability windows: latest start year, earliest end year
as_regions["min_start_year"] = np.maximum(as_regions["start_year_reg_up"], as_regions["start_year_reg_down"])
as_regions["max_end_year"] = np.minimum(as_regions["end_year_reg_up"], as_regions["end_year_reg_down"])
print(as_regions[["region", "min_start_year", "max_end_year"]].head())
# returns
region min_start_year max_end_year
0 Arizona 2011 2025
1 LADWP 2011 2025
2 Mexico 2011 2025
3 Nevada - NP26 2011 2025
4 Nevada - SP15 2011 2025
Please consult the Ancillary docs for more details and the Example Notebooks below for more examples.