Walkthroughs
============

This page is a collection of walkthroughs that distinguish the usages of
API subpackages vs feature subpackages, and a collection of workflows for
aggregating data for a subset of companies, industries, and economic data
series (depending on the subpackage being used).

Although these walkthroughs will provide you with enough information to use
**finagg** to aggregate data from the implemented APIs, it's recommended
you still explore the respective official API documentations to get a good
understanding of how each API is organized and what data is provided through
each API.

These walkthroughs assume your environment contains the necessary environment
variables to enable using the respective APIs. See the
:doc:`configuration docs <configuration>` for more info on configuring your
environment.

Using the FRED API Subpackages
------------------------------

Searching for economic data
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Let's say we're interested in finding economic data series corresponding
to various US treasuries. The FRED series search API is a good place to start.
The FRED series search API allows us to search through economic data series
that're provided by the FRED series observations API (the main API used for
retrieving economic data). We can search for economic data series using search
terms/words with :attr:`finagg.fred.api.Series.search` (the FRED series
search API implementation):

>>> finagg.fred.api.series.search.get(
...   "treasury yield",
...   order_by="popularity"
... ).head(5)  # doctest: +SKIP
             id ...                                             title observation_start ...
0        T10Y2Y ... 10-Year Treasury Constant Maturity Minus 2-Yea...        1976-06-01 ...
1  BAMLH0A0HYM2 ... ICE BofA US High Yield Index Option-Adjusted S...        1996-12-31 ...
2        T10Y3M ... 10-Year Treasury Constant Maturity Minus 3-Mon...        1982-01-04 ...
3         DGS10 ... Market Yield on U.S. Treasury Securities at 10...        1962-01-02 ...
4        DFII10 ... Market Yield on U.S. Treasury Securities at 10...        2003-01-02 ...

Notice we order by popularity using the ``order_by`` arg to help filter through
irrelevant economic data series. The returned dataframe provides
key info for further exploring and aggregating the economic data series:

* the ``id`` column indicates the economic data series ID corresponding to
  each returned series (IDs are args for other FRED series API methods)
* the ``title`` column indicates the series name (useful for verifying the
  series are relevant)
* the ``observation_start`` column indicates the oldest date with data for
  each series (we may want to ignore series that don't have as much history)

Retrieving economic data
^^^^^^^^^^^^^^^^^^^^^^^^

Let's assume we're only interested in the economic data series corresponding
to economic series ID ``"DGS10"`` under the ``id`` column. We can get the
economic data series observations through the FRED series observations API
(implemented by :attr:`finagg.fred.api.Series.observations`):

>>> finagg.fred.api.series.observations.get(
...     "DGS10",
...     output_type=1
... ).head(5)  # doctest: +SKIP
  realtime_start realtime_end        date value series_id
0     2023-04-01   2023-04-01  1962-01-02  4.06     DGS10
1     2023-04-01   2023-04-01  1962-01-03  4.03     DGS10
2     2023-04-01   2023-04-01  1962-01-04  3.99     DGS10
3     2023-04-01   2023-04-01  1962-01-05  4.02     DGS10
4     2023-04-01   2023-04-01  1962-01-08  4.03     DGS10

It should be noted some trial-and-error can be required to retrieve data using
the FRED series observations API. Most FRED API methods have ``realtime_start``,
``realtime_end``, and ``output_type`` parameters that control the reporting
period for the returned data. The `FRED API docs have good explanations and examples`_
that clarify the use cases for these parameters. For most FRED API use cases,
these parameters are usually set to ``0``, ``-1``, and ``4``, respectively.
These values (when used together) effectively retrieve economic data series
observations as they first occur for their entire history. That is, the
returned values will be the "truth" values from the perspective of economists
and analysts at the time the values were originally published. This is usally
what most people want since we typically want to build models and/or strategies
based on *currently known* values rather than *future* values.

However, the ``"DGS10"`` economic data series does not support retrieving
initial release observations. To successfully retrieve data for the ``"DGS10"``
series (and most treasury yield series), the ``realtime_start`` and
``realtime_end`` parameters must be today's date (the default) and the
``output_type`` parameter must be ``1`` (also the default).

Installing economic data
^^^^^^^^^^^^^^^^^^^^^^^^

So now we have an economic data series we're interested in and can retrieve
observation values for. We can download and store (or install) the economic
data series observation values in our own SQL table to reduce the number of
requests to the FRED API (so we don't get throttled), and to get a slight
speed boost when performing offline analysis (it'll be slightly faster to
get data that's on our own machine rather than some server).

Unfortunately, to avoid the complexity of reimplementing the entire FRED API
when retrieving data from local SQL tables, **finagg**'s installation methods
only support economic data series that have initial release observation data
available. Searching for other treasury yield series reveals economic data
series with similar names and values that support initial releases (e.g.,
``"GS10"``); we'll need to use these series in-place of our previously found
series to use **finagg**'s installation methods.

Installing a treasury yield economic data series is extremely straightforward
with the :mod:`finagg.fred.feat` subpackage and :data:`finagg.fred.feat.series`
member. We can also verify the series is installed correctly using the
:meth:`finagg.fred.feat.Series.get_id_set` method.

>>> finagg.fred.feat.series.install({"GS10"})  # doctest: +SKIP
>>> id_set = finagg.fred.feat.series.get_id_set()
>>> "GS10" in id_set
True

We can then retrieve the original, raw economic data series we installed using
the :meth:`finagg.fred.feat.Series.from_raw` method.

>>> finagg.fred.feat.series.from_raw("GS10").head(5)  # doctest: +NORMALIZE_WHITESPACE
            value
date
1996-12-01   6.30
1997-01-01   6.58
1997-02-01   6.42
1997-03-01   6.69
1997-04-01   6.89

All of these steps aren't exactly obvious when using **finagg** for the first time.
However, **finagg**'s purpose is to streamline popular financial data
aggregation, so obviously there are some shortcuts when it comes to popular
economic data series. That's where :data:`finagg.fred.feat.economic` comes
in for the FRED API subpackage. :data:`finagg.fred.feat.economic` assumes
a fixed set of popular economic series IDs that also support initial release
observations for all its methods. For example, the
:meth:`finagg.fred.feat.Economic.install` method doesn't allow
specification of economic data series IDs.
:meth:`finagg.fred.feat.Economic.install` will, by default, only
install a handful of economic data series.

It's important to note that once an economic data series is supported by
:data:`finagg.fred.feat.economic`, it will never be removed. However,
additional economic data series may be added as a default depending on popularity.

This restriction comes with the benefit of simplifying our download-then-retrieve
workflow. Repeating the download-then-retrieve workflow we used for the treasury
FRED economic data series but with :data:`finagg.fred.feat.economic` looks
like the following:

>>> finagg.fred.feat.economic.install()  # doctest: +SKIP
>>> finagg.fred.feat.economic.from_refined()["FEDFUNDS"].head(5)  # doctest: +SKIP
date
2014-10-06    0.09
2014-10-08    0.09
2014-10-13    0.09
2014-10-15    0.09
2014-10-20    0.09
Name: FEDFUNDS, dtype: float64

Lastly, it's useful to mention that any download/installation step in the
common download-then-retrieve workflow for raw or refined data
with **finagg**'s Python interface can probably be replicated using
**finagg**'s CLI. For example, the following:

>>> finagg.fred.feat.series.install({"GS10"})  # doctest: +SKIP

is equivalent to:

.. code:: console

    finagg fred install -r series -sid GS10

Similarly, the following:

>>> finagg.fred.feat.series.install()  # doctest: +SKIP
>>> finagg.fred.feat.economic.install()  # doctest: +SKIP

is equivalent to:

.. code:: console

    finagg fred install --raw series --refined economic -ss economic

Using the SEC API Subpackages
-----------------------------

Retrieving company financial data
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Let's say we're interested in a specific company. The SEC EDGAR API is a good
place to start accessing a company's financials. However, not all companies
have all their financial data accessible through the SEC EDGAR API. The best
way to start out and see what financials are available for a particular
company is to look at a company's facts through
:data:`finagg.sec.api.company_facts`.

Let's assume we're interested in Microsoft. We can access all the financial
publications associated with Microsoft by simply passing Microsoft's ticker,
MSFT, to the company facts API implementation. We can look at the columns
to get a good understanding of the API implementation and the returned
dataframe:

>>> df = finagg.sec.api.company_facts.get(ticker="MSFT")
>>> df.columns.tolist()  # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
['end', 'value', ..., 'fy', 'fp', 'form', 'filed', ..., 'tag', ..., 'units', ...]

The main columns that most use cases care about are:

* ``fy``, ``fp``, and ``filed``; these are the fiscal year, fiscal period
  (i.e., quarter), and filing date, respectively for each row of financial data
* ``form`` is the type of SEC form the row was submitted with (e.g., 10-Q,
  10-K, etc.)
* ``tag`` is the ID of the financial (e.g., ``"EarningsPerShareBasic"``)
* ``value`` is the financial's actual value
* ``units`` is the financial's unit (e.g., USD/shares)

The company's financials can be further filtered from the company facts
dataframe directly, or a specific financial can be accessed with the
:data:`finagg.sec.api.company_concept` API implementation. For example,
we can access all of Microsoft's earnings per share financial publications
with the following:

>>> df = finagg.sec.api.company_concept.get(
...     "EarningsPerShareBasic",
...     ticker="MSFT",
...     units="USD/shares"
... )
>>> df.head(5)  # doctest: +SKIP
        start         end  value                  accn    fy  fp  form       filed ...
0  2007-07-01  2007-09-30   0.46  0001193125-10-171791  2010  FY  10-K  2010-07-30 ...
1  2007-10-01  2007-12-31   0.50  0001193125-10-171791  2010  FY  10-K  2010-07-30 ...
2  2008-01-01  2008-03-31   0.47  0001193125-10-171791  2010  FY  10-K  2010-07-30 ...
3  2007-07-01  2008-06-30   1.90  0001193125-10-171791  2010  FY  10-K  2010-07-30 ...
4  2008-04-01  2008-06-30   0.46  0001193125-10-171791  2010  FY  10-K  2010-07-30 ...

Filtering company financial data
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

However, the SEC EDGAR company concept API implementation returns all the
earnings per share publications for Microsoft, including amendments. We may
not necessarily care about amendments because we may be building strategies
or models that use *current* data and not *future* data. Fortunately, **finagg**
provides :meth:`finagg.sec.api.filter_original_filings` to further select
original financial publication data from specific forms:

>>> finagg.sec.api.filter_original_filings(df, form="10-Q").head(5)  # doctest: +SKIP
     fy  fp                    tag       start         end  value ...
0  2010  Q1  EarningsPerShareBasic  2008-07-01  2008-09-30   0.48 ...
1  2010  Q2  EarningsPerShareBasic  2008-07-01  2008-12-31   0.95 ...
2  2010  Q3  EarningsPerShareBasic  2008-07-01  2009-03-31   1.29 ...
3  2011  Q1  EarningsPerShareBasic  2009-07-01  2009-09-30   0.40 ...
4  2011  Q2  EarningsPerShareBasic  2009-07-01  2009-12-31   1.15 ...

Unfortunately, the SEC EDGAR API is still relatively new and a lot of the
financial data publications are unaudited, so not all financials are available
for all companies through the SEC EDGAR API. I.e., a workflow for retrieving
Microsoft's financial data may not work for retrieving another company's
financial data.

It requires some trial-and-error to find a set of tags that
are popular and available for the majority of companies. In addition, the
workflow for exploring these tags and filtering forms can be cumbersome.
However, **finagg** provides :data:`finagg.sec.api.popular_concepts` for listing
company concepts (combinations of financial data tags and other parameters)
that're popular and widely available for companies.

Installing company financial data
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It's extremely straightforward to filter and install widely popular and available
quarterly financial data for a set of companies using the :mod:`finagg.sec.feat`
subpackage and :data:`finagg.sec.feat.quarterly` member. The
:data:`finagg.sec.feat.quarterly` member also goes a step further by somewhat
normalizing the installed financial data (e.g., total asset value is converted
to percent change of total asset value on a quarter-over-quarter basis), making
the process for aggregating company financial data and comparing company
financial data painless.

We can give this streamlined process a try with Microsoft again, and we can
verify Microsoft's financial data is successfully installed using the
:meth:`finagg.sec.feat.Quarterly.get_ticker_set` method.

>>> finagg.sec.feat.quarterly.install({"MSFT"})  # doctest: +SKIP
>>> ticker_set = finagg.sec.feat.quarterly.get_ticker_set()
>>> "MSFT" in ticker_set
True

We can then retrieve Microsoft's quarterly financial data using the
:meth:`finagg.sec.feat.Quarterly.from_refined` method.

>>> finagg.sec.feat.quarterly.from_refined("MSFT").head(5)  # doctest: +SKIP
                    LOG_CHANGE(Assets)  LOG_CHANGE(AssetsCurrent) ...
fy   fp filed                                                     ...
2010 Q1 2010-01-25            0.182629                  -0.023676 ...
     Q2 2010-04-21            0.000000                   0.000000 ...
     Q3 2010-07-21            0.000000                   0.000000 ...
2011 Q1 2011-01-19            0.459174                   0.278241 ...
     Q2 2011-04-21            0.000000                   0.000000 ...

On top of this simplification, :data:`finagg.sec.feat.quarterly` provides
another method and convenience for normalizing quarterly financial data.
:attr:`finagg.sec.feat.Quarterly.normalized` normalizes
quarterly financial data using quarterly financial data from all the other
companies within the target company's industry. For example, Lowe's'
financial data would be used to normalize Home Depot's financial data such
that all columns have zero mean and unit variance.
:attr:`finagg.sec.feat.Quarterly.normalized` also has
similar workflow to :data:`finagg.sec.feat.quarterly`.

>>> finagg.sec.feat.quarterly.normalized.install({"MSFT"})  # doctest: +SKIP
>>> finagg.sec.feat.quarterly.normalized.from_refined("MSFT").head(5)  # doctest: +SKIP
                    NORM(LOG_CHANGE(Assets))  NORM(LOG_CHANGE(AssetsCurrent)) ...
fy   fp filed                                                                 ...
2010 Q2 2010-04-21                  0.000000                         0.000000 ...
     Q3 2010-07-21                  0.000000                         0.000000 ...
2011 Q1 2011-01-19                  0.978816                         0.074032 ...
     Q2 2011-04-21                  0.000000                         0.000000 ...
     Q3 2011-07-20                 -0.353553                        -0.353553 ...

Lastly, it's useful to mention that any download/installation step in the
common download-then-retrieve workflow for financial data with **finagg**'s
Python interface can probably be replicated using **finagg**'s CLI. For
example, the following:

>>> finagg.sec.feat.quarterly.install({"MSFT"})  # doctest: +SKIP

is equivalent to:

.. code:: console

    finagg sec install --raw submissions --raw tags --refined quarterly -t MSFT

.. _`FRED API docs have good explanations and examples`: https://fred.stlouisfed.org/docs/api/fred/