Walkthroughs
This page is a collection of walkthroughs that distinguish the usages of API subpackages vs feature subpackages, and a collection of workflows for aggregating data for a subset of companies, industries, and economic data series (depending on the subpackage being used).
Although these walkthroughs will provide you with enough information to use finagg to aggregate data from the implemented APIs, it’s recommended you still explore the respective official API documentations to get a good understanding of how each API is organized and what data is provided through each API.
These walkthroughs assume your environment contains the necessary environment variables to enable using the respective APIs. See the configuration docs for more info on configuring your environment.
Using the FRED API Subpackages
Searching for economic data
Let’s say we’re interested in finding economic data series corresponding
to various US treasuries. The FRED series search API is a good place to start.
The FRED series search API allows us to search through economic data series
that’re provided by the FRED series observations API (the main API used for
retrieving economic data). We can search for economic data series using search
terms/words with finagg.fred.api.Series.search
(the FRED series
search API implementation):
>>> finagg.fred.api.series.search.get(
... "treasury yield",
... order_by="popularity"
... ).head(5)
id ... title observation_start ...
0 T10Y2Y ... 10-Year Treasury Constant Maturity Minus 2-Yea... 1976-06-01 ...
1 BAMLH0A0HYM2 ... ICE BofA US High Yield Index Option-Adjusted S... 1996-12-31 ...
2 T10Y3M ... 10-Year Treasury Constant Maturity Minus 3-Mon... 1982-01-04 ...
3 DGS10 ... Market Yield on U.S. Treasury Securities at 10... 1962-01-02 ...
4 DFII10 ... Market Yield on U.S. Treasury Securities at 10... 2003-01-02 ...
Notice we order by popularity using the order_by
arg to help filter through
irrelevant economic data series. The returned dataframe provides
key info for further exploring and aggregating the economic data series:
the
id
column indicates the economic data series ID corresponding to each returned series (IDs are args for other FRED series API methods)the
title
column indicates the series name (useful for verifying the series are relevant)the
observation_start
column indicates the oldest date with data for each series (we may want to ignore series that don’t have as much history)
Retrieving economic data
Let’s assume we’re only interested in the economic data series corresponding
to economic series ID "DGS10"
under the id
column. We can get the
economic data series observations through the FRED series observations API
(implemented by finagg.fred.api.Series.observations
):
>>> finagg.fred.api.series.observations.get(
... "DGS10",
... output_type=1
... ).head(5)
realtime_start realtime_end date value series_id
0 2023-04-01 2023-04-01 1962-01-02 4.06 DGS10
1 2023-04-01 2023-04-01 1962-01-03 4.03 DGS10
2 2023-04-01 2023-04-01 1962-01-04 3.99 DGS10
3 2023-04-01 2023-04-01 1962-01-05 4.02 DGS10
4 2023-04-01 2023-04-01 1962-01-08 4.03 DGS10
It should be noted some trial-and-error can be required to retrieve data using
the FRED series observations API. Most FRED API methods have realtime_start
,
realtime_end
, and output_type
parameters that control the reporting
period for the returned data. The FRED API docs have good explanations and examples
that clarify the use cases for these parameters. For most FRED API use cases,
these parameters are usually set to 0
, -1
, and 4
, respectively.
These values (when used together) effectively retrieve economic data series
observations as they first occur for their entire history. That is, the
returned values will be the “truth” values from the perspective of economists
and analysts at the time the values were originally published. This is usally
what most people want since we typically want to build models and/or strategies
based on currently known values rather than future values.
However, the "DGS10"
economic data series does not support retrieving
initial release observations. To successfully retrieve data for the "DGS10"
series (and most treasury yield series), the realtime_start
and
realtime_end
parameters must be today’s date (the default) and the
output_type
parameter must be 1
(also the default).
Installing economic data
So now we have an economic data series we’re interested in and can retrieve observation values for. We can download and store (or install) the economic data series observation values in our own SQL table to reduce the number of requests to the FRED API (so we don’t get throttled), and to get a slight speed boost when performing offline analysis (it’ll be slightly faster to get data that’s on our own machine rather than some server).
Unfortunately, to avoid the complexity of reimplementing the entire FRED API
when retrieving data from local SQL tables, finagg’s installation methods
only support economic data series that have initial release observation data
available. Searching for other treasury yield series reveals economic data
series with similar names and values that support initial releases (e.g.,
"GS10"
); we’ll need to use these series in-place of our previously found
series to use finagg’s installation methods.
Installing a treasury yield economic data series is extremely straightforward
with the finagg.fred.feat
subpackage and finagg.fred.feat.series
member. We can also verify the series is installed correctly using the
finagg.fred.feat.Series.get_id_set()
method.
>>> finagg.fred.feat.series.install({"GS10"})
>>> id_set = finagg.fred.feat.series.get_id_set()
>>> "GS10" in id_set
True
We can then retrieve the original, raw economic data series we installed using
the finagg.fred.feat.Series.from_raw()
method.
>>> finagg.fred.feat.series.from_raw("GS10").head(5)
value
date
1996-12-01 6.30
1997-01-01 6.58
1997-02-01 6.42
1997-03-01 6.69
1997-04-01 6.89
All of these steps aren’t exactly obvious when using finagg for the first time.
However, finagg’s purpose is to streamline popular financial data
aggregation, so obviously there are some shortcuts when it comes to popular
economic data series. That’s where finagg.fred.feat.economic
comes
in for the FRED API subpackage. finagg.fred.feat.economic
assumes
a fixed set of popular economic series IDs that also support initial release
observations for all its methods. For example, the
finagg.fred.feat.Economic.install()
method doesn’t allow
specification of economic data series IDs.
finagg.fred.feat.Economic.install()
will, by default, only
install a handful of economic data series.
It’s important to note that once an economic data series is supported by
finagg.fred.feat.economic
, it will never be removed. However,
additional economic data series may be added as a default depending on popularity.
This restriction comes with the benefit of simplifying our download-then-retrieve
workflow. Repeating the download-then-retrieve workflow we used for the treasury
FRED economic data series but with finagg.fred.feat.economic
looks
like the following:
>>> finagg.fred.feat.economic.install()
>>> finagg.fred.feat.economic.from_refined()["FEDFUNDS"].head(5)
date
2014-10-06 0.09
2014-10-08 0.09
2014-10-13 0.09
2014-10-15 0.09
2014-10-20 0.09
Name: FEDFUNDS, dtype: float64
Lastly, it’s useful to mention that any download/installation step in the common download-then-retrieve workflow for raw or refined data with finagg’s Python interface can probably be replicated using finagg’s CLI. For example, the following:
>>> finagg.fred.feat.series.install({"GS10"})
is equivalent to:
finagg fred install -r series -sid GS10
Similarly, the following:
>>> finagg.fred.feat.series.install()
>>> finagg.fred.feat.economic.install()
is equivalent to:
finagg fred install --raw series --refined economic -ss economic
Using the SEC API Subpackages
Retrieving company financial data
Let’s say we’re interested in a specific company. The SEC EDGAR API is a good
place to start accessing a company’s financials. However, not all companies
have all their financial data accessible through the SEC EDGAR API. The best
way to start out and see what financials are available for a particular
company is to look at a company’s facts through
finagg.sec.api.company_facts
.
Let’s assume we’re interested in Microsoft. We can access all the financial publications associated with Microsoft by simply passing Microsoft’s ticker, MSFT, to the company facts API implementation. We can look at the columns to get a good understanding of the API implementation and the returned dataframe:
>>> df = finagg.sec.api.company_facts.get(ticker="MSFT")
>>> df.columns.tolist()
['end', 'value', ..., 'fy', 'fp', 'form', 'filed', ..., 'tag', ..., 'units', ...]
The main columns that most use cases care about are:
fy
,fp
, andfiled
; these are the fiscal year, fiscal period (i.e., quarter), and filing date, respectively for each row of financial dataform
is the type of SEC form the row was submitted with (e.g., 10-Q, 10-K, etc.)tag
is the ID of the financial (e.g.,"EarningsPerShareBasic"
)value
is the financial’s actual valueunits
is the financial’s unit (e.g., USD/shares)
The company’s financials can be further filtered from the company facts
dataframe directly, or a specific financial can be accessed with the
finagg.sec.api.company_concept
API implementation. For example,
we can access all of Microsoft’s earnings per share financial publications
with the following:
>>> df = finagg.sec.api.company_concept.get(
... "EarningsPerShareBasic",
... ticker="MSFT",
... units="USD/shares"
... )
>>> df.head(5)
start end value accn fy fp form filed ...
0 2007-07-01 2007-09-30 0.46 0001193125-10-171791 2010 FY 10-K 2010-07-30 ...
1 2007-10-01 2007-12-31 0.50 0001193125-10-171791 2010 FY 10-K 2010-07-30 ...
2 2008-01-01 2008-03-31 0.47 0001193125-10-171791 2010 FY 10-K 2010-07-30 ...
3 2007-07-01 2008-06-30 1.90 0001193125-10-171791 2010 FY 10-K 2010-07-30 ...
4 2008-04-01 2008-06-30 0.46 0001193125-10-171791 2010 FY 10-K 2010-07-30 ...
Filtering company financial data
However, the SEC EDGAR company concept API implementation returns all the
earnings per share publications for Microsoft, including amendments. We may
not necessarily care about amendments because we may be building strategies
or models that use current data and not future data. Fortunately, finagg
provides finagg.sec.api.filter_original_filings()
to further select
original financial publication data from specific forms:
>>> finagg.sec.api.filter_original_filings(df, form="10-Q").head(5)
fy fp tag start end value ...
0 2010 Q1 EarningsPerShareBasic 2008-07-01 2008-09-30 0.48 ...
1 2010 Q2 EarningsPerShareBasic 2008-07-01 2008-12-31 0.95 ...
2 2010 Q3 EarningsPerShareBasic 2008-07-01 2009-03-31 1.29 ...
3 2011 Q1 EarningsPerShareBasic 2009-07-01 2009-09-30 0.40 ...
4 2011 Q2 EarningsPerShareBasic 2009-07-01 2009-12-31 1.15 ...
Unfortunately, the SEC EDGAR API is still relatively new and a lot of the financial data publications are unaudited, so not all financials are available for all companies through the SEC EDGAR API. I.e., a workflow for retrieving Microsoft’s financial data may not work for retrieving another company’s financial data.
It requires some trial-and-error to find a set of tags that
are popular and available for the majority of companies. In addition, the
workflow for exploring these tags and filtering forms can be cumbersome.
However, finagg provides finagg.sec.api.popular_concepts
for listing
company concepts (combinations of financial data tags and other parameters)
that’re popular and widely available for companies.
Installing company financial data
It’s extremely straightforward to filter and install widely popular and available
quarterly financial data for a set of companies using the finagg.sec.feat
subpackage and finagg.sec.feat.quarterly
member. The
finagg.sec.feat.quarterly
member also goes a step further by somewhat
normalizing the installed financial data (e.g., total asset value is converted
to percent change of total asset value on a quarter-over-quarter basis), making
the process for aggregating company financial data and comparing company
financial data painless.
We can give this streamlined process a try with Microsoft again, and we can
verify Microsoft’s financial data is successfully installed using the
finagg.sec.feat.Quarterly.get_ticker_set()
method.
>>> finagg.sec.feat.quarterly.install({"MSFT"})
>>> ticker_set = finagg.sec.feat.quarterly.get_ticker_set()
>>> "MSFT" in ticker_set
True
We can then retrieve Microsoft’s quarterly financial data using the
finagg.sec.feat.Quarterly.from_refined()
method.
>>> finagg.sec.feat.quarterly.from_refined("MSFT").head(5)
LOG_CHANGE(Assets) LOG_CHANGE(AssetsCurrent) ...
fy fp filed ...
2010 Q1 2010-01-25 0.182629 -0.023676 ...
Q2 2010-04-21 0.000000 0.000000 ...
Q3 2010-07-21 0.000000 0.000000 ...
2011 Q1 2011-01-19 0.459174 0.278241 ...
Q2 2011-04-21 0.000000 0.000000 ...
On top of this simplification, finagg.sec.feat.quarterly
provides
another method and convenience for normalizing quarterly financial data.
finagg.sec.feat.Quarterly.normalized
normalizes
quarterly financial data using quarterly financial data from all the other
companies within the target company’s industry. For example, Lowe’s’
financial data would be used to normalize Home Depot’s financial data such
that all columns have zero mean and unit variance.
finagg.sec.feat.Quarterly.normalized
also has
similar workflow to finagg.sec.feat.quarterly
.
>>> finagg.sec.feat.quarterly.normalized.install({"MSFT"})
>>> finagg.sec.feat.quarterly.normalized.from_refined("MSFT").head(5)
NORM(LOG_CHANGE(Assets)) NORM(LOG_CHANGE(AssetsCurrent)) ...
fy fp filed ...
2010 Q2 2010-04-21 0.000000 0.000000 ...
Q3 2010-07-21 0.000000 0.000000 ...
2011 Q1 2011-01-19 0.978816 0.074032 ...
Q2 2011-04-21 0.000000 0.000000 ...
Q3 2011-07-20 -0.353553 -0.353553 ...
Lastly, it’s useful to mention that any download/installation step in the common download-then-retrieve workflow for financial data with finagg’s Python interface can probably be replicated using finagg’s CLI. For example, the following:
>>> finagg.sec.feat.quarterly.install({"MSFT"})
is equivalent to:
finagg sec install --raw submissions --raw tags --refined quarterly -t MSFT