finagg.sec.feat package

Module contents

Features from SEC sources.

finagg.sec.feat.annual

The most popular way for accessing finagg.sec.feat.Annual.

finagg.sec.feat.quarterly

The most popular way for accessing finagg.sec.feat.Quarterly.

finagg.sec.feat.submissions

The most popular way for accessing finagg.sec.feat.Submissions.

finagg.sec.feat.tags

The most popular way for accessing finagg.sec.feat.Tags.

class finagg.sec.feat.Annual[source]

Bases: object

Methods for gathering annual features from SEC EDGAR data.

The module variable finagg.sec.feat.annual is an instance of this feature set implementation and is the most popular interface for calling feature methods.

Examples

It doesn’t matter which data source you use to gather features. They all return equivalent dataframes.

>>> df1 = finagg.sec.feat.annual.from_api("AAPL").head(5)
>>> df2 = finagg.sec.feat.annual.from_raw("AAPL").head(5)
>>> df3 = finagg.sec.feat.annual.from_refined("AAPL").head(5)
>>> pd.testing.assert_frame_equal(df1, df2, rtol=1e-4)
>>> pd.testing.assert_frame_equal(df1, df3, rtol=1e-4)
industry

Annual features aggregated for an entire industry. The most popular way for accessing the finagg.sec.feat.IndustryAnnual feature set.

normalized

A company’s annual features normalized by its industry. The most popular way for accessing the finagg.sec.feat.NormalizedAnnual feature set.

classmethod from_api(ticker: str, /, *, start: None | str = None, end: None | str = None) DataFrame[source]

Get annual features directly from the SEC API.

Parameters:
  • ticker – Company ticker.

  • start – The start date of the observation period. Defaults to the first recorded date.

  • end – The end date of the observation period. Defaults to the last recorded date.

Returns:

Annual data dataframe with each tag as a separate column. Sorted by filing date.

Examples

>>> finagg.sec.feat.annual.from_api("AAPL").head(5)  
                 LOG_CHANGE(Assets)  LOG_CHANGE(AssetsCurrent)  LOG_CHANGE(CommonStockSharesOutstanding) ...
fy   filed                                                                                               ...
2010 2010-10-27           -0.089864                  -0.023676                                  0.012840 ...
2011 2011-10-26            0.272493                   0.278241                                  0.017805 ...
2012 2012-10-31            0.896033                   0.076422                                  0.014423 ...
2013 2013-10-30            0.414064                   0.248046                                  0.010630 ...
2014 2014-10-27            0.161871                   0.239927                                  1.902394 ...
classmethod from_raw(ticker: str, /, *, start: None | str = None, end: None | str = None, engine: None | Engine = None) DataFrame[source]

Get annual features from a local SEC SQL table.

Not all data series are published at the same rate or time. Missing rows for less-frequent annual publications are forward filled.

Parameters:
  • ticker – Company ticker.

  • start – The start date of the observation period. Defaults to the first recorded date.

  • end – The end date of the observation period. Defaults to the last recorded date.

  • engine – Feature store database engine. Defaults to the engine at finagg.backend.engine.

Returns:

Annual data dataframe with each tag as a separate column. Sorted by filing date.

Examples

>>> finagg.sec.feat.annual.from_raw("AAPL").head(5)  
                 LOG_CHANGE(Assets)  LOG_CHANGE(AssetsCurrent)  LOG_CHANGE(CommonStockSharesOutstanding) ...
fy   filed                                                                                               ...
2010 2010-10-27           -0.089864                  -0.023676                                  0.012840 ...
2011 2011-10-26            0.272493                   0.278241                                  0.017805 ...
2012 2012-10-31            0.896033                   0.076422                                  0.014423 ...
2013 2013-10-30            0.414064                   0.248046                                  0.010630 ...
2014 2014-10-27            0.161871                   0.239927                                  1.902394 ...
classmethod from_refined(ticker: str, /, *, start: None | str = None, end: None | str = None, engine: None | Engine = None) DataFrame[source]

Get features from the refined SQL table.

This is the preferred method for accessing features for offline analysis (assuming data in the local SQL table is current).

Parameters:
  • ticker – Company ticker.

  • start – The start date of the observation period. Defaults to the first recorded date.

  • end – The end date of the observation period. Defaults to the last recorded date.

  • engine – Feature store database engine. Defaults to the engine at finagg.backend.engine.

Returns:

Annual data dataframe with each tag as a separate column. Sorted by filing date.

Raises:

NoResultFound – If there are no rows for ticker in the refined SQL table.

Examples

>>> finagg.sec.feat.annual.from_refined("AAPL").head(5)  
                 LOG_CHANGE(Assets)  LOG_CHANGE(AssetsCurrent)  LOG_CHANGE(CommonStockSharesOutstanding) ...
fy   filed                                                                                               ...
2010 2010-10-27           -0.089864                  -0.023676                                  0.012840 ...
2011 2011-10-26            0.272493                   0.278241                                  0.017805 ...
2012 2012-10-31            0.896033                   0.076422                                  0.014423 ...
2013 2013-10-30            0.414064                   0.248046                                  0.010630 ...
2014 2014-10-27            0.161871                   0.239927                                  1.902394 ...
classmethod get_candidate_ticker_set(lb: int = 1, *, start: None | str = None, end: None | str = None, engine: None | Engine = None) set[str][source]

Get all unique tickers in the raw SQL table that MAY BE ELIGIBLE to be in the feature’s SQL table.

Parameters:
  • lb – Minimum number of rows required to include a ticker in the returned set.

  • start – The start date of the observation period to include when searching for tickers. Defaults to the first recorded date.

  • end – The end date of the observation period to include when searching for tickers. Defaults to the last recorded date.

  • engine – Feature store database engine. Defaults to the engine at finagg.backend.engine.

Returns:

All unique tickers that may be valid for creating annual features that also have at least lb rows for each tag used for constructing the features.

Examples

>>> "AAPL" in finagg.sec.feat.annual.get_candidate_ticker_set()  
True
classmethod get_ticker_set(lb: int = 1, *, start: None | str = None, end: None | str = None, engine: None | Engine = None) set[str][source]

Get all unique tickers in the feature’s SQL table.

Parameters:
  • lb – Minimum number of rows required to include a ticker in the returned set.

  • start – The start date of the observation period to include when searching for tickers. Defaults to the first recorded date.

  • end – The end date of the observation period to include when searching for tickers. Defaults to the last recorded date.

  • engine – Feature store database engine. Defaults to the engine at finagg.backend.engine.

Returns:

All unique tickers that contain all the columns for creating annual features that also have at least lb rows.

Examples

>>> "AAPL" in finagg.sec.feat.annual.get_ticker_set()  
True
classmethod install(tickers: None | set[str] = None, *, processes: int = 3, engine: None | Engine = None, recreate_tables: bool = False) int[source]

Install data associated with tickers by pulling data from the raw SQL tables, transforming them into annual features, and then writing to the refined annual SQL table.

Tables associated with this method are created if they don’t already exist.

Parameters:
  • tickers – Set of tickers to install features for. Defaults to all the candidate tickers from Annual.get_candidate_ticker_set().

  • processes – Number of background processes to run in parallel when processing tickers.

  • engine – Feature store database engine. Defaults to the engine at finagg.backend.engine.

  • recreate_tables – Whether to drop and recreate tables, wiping all previously installed data.

Returns:

Number of rows written to the feature’s SQL table.

classmethod to_refined(ticker: str, df: DataFrame, /, *, engine: None | Engine = None) int[source]

Write the given dataframe to the refined feature table while using the ticker ticker.

Parameters:
  • ticker – Company ticker.

  • df – Dataframe to store as rows in a local SQL table

  • engine – Feature store database engine. Defaults to the engine at finagg.backend.engine.

Returns:

Number of rows written to the SQL table.

class finagg.sec.feat.IndustryAnnual[source]

Bases: object

Methods for gathering industry-averaged annual data from SEC features.

The class variable finagg.sec.feat.Annual.industry is an instance of this feature set implementation and is the most popular interface for calling feature methods.

Examples

You can aggregate this feature set using a ticker or an industry code directly.

>>> df1 = finagg.sec.feat.annual.industry.from_refined(ticker="MSFT").head(5)
>>> df2 = finagg.sec.feat.annual.industry.from_refined(code=73).head(5)
>>> pd.testing.assert_frame_equal(df1, df2, rtol=1e-4)
classmethod from_refined(*, ticker: None | str = None, code: None | str = None, level: Literal[2, 3, 4] = 2, start: None | str = None, end: None | str = None, engine: None | Engine = None) DataFrame[source]

Get annual features from the feature store, aggregated for an entire industry.

The industry can be chosen according to a company or by an industry code directly. If a company is provided, then the first level digits of the company’s SIC code is used for the industry code.

Parameters:
  • ticker – Company ticker. Lookup the industry associated with this company. Mutually exclusive with code.

  • code – Industry SIC code to use for industry lookup. Mutually exclusive with ticker.

  • level

    Industry level to aggregate features at. The industry used according to ticker or code is subsampled according to this value. Options include:

    • 2 = major group (e.g., furniture and fixtures)

    • 3 = industry group (e.g., office furnitures)

    • 4 = industry (e.g., wood office furniture)

  • start – The start date of the observation period. Defaults to the first recorded date.

  • end – The end date of the observation period. Defaults to the last recorded date.

  • engine – Feature store database engine. Defaults to the engine at finagg.backend.engine.

Returns:

Annual data dataframe with each tag as a separate column. Sorted by filing date.

Raises:
  • ValueError – If neither a ticker nor code are provided.

  • NoResultFound – If there are no rows for ticker or code in the refined SQL table.

Examples

>>> df = finagg.sec.feat.annual.industry.from_refined(ticker="AAPL").head(5)
>>> df["mean"]  
name             AssetCoverageRatio  DebtEquityRatio  EarningsPerShareBasic ...
fy   filed                                                                  ...
2010 2011-02-24            1.577924         1.112079               3.630000 ...
2011 2012-02-24            1.998633         0.886148               1.851429 ...
2012 2013-02-26            1.997748         1.413677               3.750833 ...
2013 2014-03-03            2.232804         1.176639               4.475000 ...
2014 2015-03-17            2.019007         1.080055               2.742143 ...
>>> df["std"]  
name             AssetCoverageRatio  DebtEquityRatio  EarningsPerShareBasic ...
fy   filed                                                                  ...
2010 2011-02-24            0.320778         0.149843               3.436292 ...
2011 2012-02-24            1.371792         0.331815               3.837792 ...
2012 2013-02-26            1.480323         1.085814               3.975254 ...
2013 2014-03-03            1.787131         0.575039               7.236695 ...
2014 2015-03-17            1.303284         0.458704               3.015323 ...
class finagg.sec.feat.NormalizedAnnual[source]

Bases: object

Annual features from SEC EDGAR data normalized according to industry averages and standard deviations.

The class variable finagg.sec.feat.Annual.normalized is an instance of this feature set implementation and is the most popular interface for calling feature methods.

Examples

It doesn’t matter which data source you use to gather features. They both return equivalent dataframes.

>>> df1 = finagg.sec.feat.annual.normalized.from_other_refined("AAPL").head(5)
>>> df2 = finagg.sec.feat.annual.normalized.from_refined("AAPL").head(5)
>>> pd.testing.assert_frame_equal(df1, df2, rtol=1e-4)
classmethod from_other_refined(ticker: str, /, *, level: Literal[2, 3, 4] = 2, start: None | str = None, end: None | str = None, engine: None | Engine = None) DataFrame[source]

Get features from other feature SQL tables.

Parameters:
  • ticker – Company ticker.

  • level

    Industry level to aggregate relative features at. The industry used according to ticker is subsampled according to this value. Options include:

    • 2 = major group (e.g., furniture and fixtures)

    • 3 = industry group (e.g., office furnitures)

    • 4 = industry (e.g., wood office furniture)

  • start – The start date of the observation period. Defaults to the first recorded date.

  • end – The end date of the observation period. Defaults to the last recorded date.

  • engine – Feature store database engine. Defaults to the engine at finagg.backend.engine.

Returns:

Relative annual data dataframe with each tag as a separate column. Sorted by filing date.

Examples

>>> finagg.sec.feat.annual.normalized.from_other_refined("AAPL").head(5)  
                 NORM(LOG_CHANGE(Assets))  NORM(LOG_CHANGE(AssetsCurrent))  NORM(LOG_CHANGE(CommonStockSharesOutstanding)) ...
fy   filed                                                                                                                 ...
2010 2010-10-27                 -0.707107                         0.707107                                        0.707107 ...
2011 2011-10-26                  0.549435                         0.485644                                        0.574215 ...
2012 2012-10-31                  2.160612                        -0.609935                                        0.802452 ...
2013 2013-10-30                  0.756833                         0.336215                                       -0.148804 ...
2014 2014-10-27                  0.244212                         1.384738                                        2.836717 ...
classmethod from_refined(ticker: str, /, *, start: None | str = None, end: None | str = None, engine: None | Engine = None) DataFrame[source]

Get features from the features SQL table.

This is the preferred method for accessing features for offline analysis (assuming data in the local SQL table is current).

Parameters:
  • ticker – Company ticker.

  • start – The start date of the observation period. Defaults to the first recorded date.

  • end – The end date of the observation period. Defaults to the last recorded date.

  • engine – Feature store database engine. Defaults to the engine at finagg.backend.engine.

Returns:

Annual data dataframe with each tag as a separate column. Sorted by filing date.

Raises:

NoResultFound – If there are no rows for ticker in the refined SQL table.

Examples

>>> finagg.sec.feat.annual.normalized.from_refined("AAPL").head(5)  
                 NORM(LOG_CHANGE(Assets))  NORM(LOG_CHANGE(AssetsCurrent))  NORM(LOG_CHANGE(CommonStockSharesOutstanding)) ...
fy   filed                                                                                                                 ...
2010 2010-10-27                 -0.707107                         0.707107                                        0.707107 ...
2011 2011-10-26                  0.549435                         0.485644                                        0.574215 ...
2012 2012-10-31                  2.160612                        -0.609935                                        0.802452 ...
2013 2013-10-30                  0.756833                         0.336215                                       -0.148804 ...
2014 2014-10-27                  0.244212                         1.384738                                        2.836717 ...
classmethod get_candidate_ticker_set(lb: int = 1, *, start: None | str = None, end: None | str = None, engine: None | Engine = None) set[str][source]

Get all unique tickers in the annual SQL table that MAY BE ELIGIBLE to be in the feature’s SQL table.

This is just an alias for finagg.sec.feat.Annual.get_ticker_set().

Parameters:
  • lb – Minimum number of rows required to include a ticker in the returned set.

  • start – The start date of the observation period to include when searching for tickers. Defaults to the first recorded date.

  • end – The end date of the observation period to include when searching for tickers. Defaults to the last recorded date.

  • engine – Feature store database engine. Defaults to the engine at finagg.backend.engine.

Returns:

All unique tickers that may be valid for creating industry-normalized annual features that also have at least lb rows for each tag used for constructing the features.

Examples

>>> "AAPL" in finagg.sec.feat.annual.normalized.get_candidate_ticker_set()  
True
classmethod get_ticker_set(lb: int = 1, *, start: None | str = None, end: None | str = None, engine: None | Engine = None) set[str][source]

Get all unique tickers in the feature’s SQL table.

Parameters:
  • lb – Minimum number of rows required to include a ticker in the returned set.

  • start – The start date of the observation period to include when searching for tickers. Defaults to the first recorded date.

  • end – The end date of the observation period to include when searching for tickers. Defaults to the last recorded date.

  • engine – Feature store database engine. Defaults to the engine at finagg.backend.engine.

Returns:

All unique tickers that contain all the columns for creating industry-normalized annual features that also have at least lb rows.

Examples

>>> "AAPL" in finagg.sec.feat.annual.normalized.get_ticker_set()  
True
classmethod get_tickers_sorted_by(column: str, /, *, ascending: bool = True, year: int = -1, engine: None | Engine = None) list[str][source]

Get all tickers in the feature’s SQL table sorted by a particular column.

Parameters:
  • column – Feature column to sort by.

  • ascending – Whether to return results in ascending order according to the values in column.

  • year – Year to select from. Defaults to the most recent year that has data available.

  • engine – Feature store database engine. Defaults to the engine at finagg.backend.engine.

Returns:

Tickers sorted by a feature column for a particular year.

Examples

>>> ts = finagg.sec.feat.annual.normalized.get_tickers_sorted_by(
...         "NORM(EarningsPerShareBasic)",
...         year=2020,
... )
>>> "AMD" == ts[0]  
True
classmethod install(tickers: None | set[str] = None, *, processes: int = 3, engine: None | Engine = None, recreate_tables: bool = False) int[source]

Install data associated with tickers by pulling data from the annual SQL tables, transforming them into normalized features, and then writing to the refined annual normalized SQL table.

Tables associated with this method are created if they don’t already exist.

Parameters:
  • tickers – Set of tickers to install features for. Defaults to all the candidate tickers from NormalizedAnnual.get_candidate_ticker_set().

  • processes – Number of background processes to run in parallel when processing tickers.

  • engine – Feature store database engine. Defaults to the engine at finagg.backend.engine.

  • recreate_tables – Whether to drop and recreate tables, wiping all previously installed data.

Returns:

Number of rows written to the feature’s SQL table.

classmethod to_refined(ticker: str, df: DataFrame, /, *, engine: None | Engine = None) int[source]

Write the dataframe to the feature store for ticker.

Parameters:
  • ticker – Company ticker.

  • df – Dataframe to store completely as rows in a local SQL table.

  • engine – Feature store database engine. Defaults to the engine at finagg.backend.engine.

Returns:

Number of rows written to the SQL table.

class finagg.sec.feat.Quarterly[source]

Bases: object

Methods for gathering quarterly features from SEC EDGAR data.

The module variable finagg.sec.feat.quarterly is an instance of this feature set implementation and is the most popular interface for calling feature methods.

Examples

It doesn’t matter which data source you use to gather features. They all return equivalent dataframes.

>>> df1 = finagg.sec.feat.quarterly.from_api("AAPL").head(5)
>>> df2 = finagg.sec.feat.quarterly.from_raw("AAPL").head(5)
>>> df3 = finagg.sec.feat.quarterly.from_refined("AAPL").head(5)
>>> pd.testing.assert_frame_equal(df1, df2, rtol=1e-4)
>>> pd.testing.assert_frame_equal(df1, df3, rtol=1e-4)
industry

Quarterly features aggregated for an entire industry. The most popular way for accessing the IndustryQuarterly feature set.

normalized

A company’s quarterly features normalized by its industry. The most popular way for accessing the NormalizedQuarterly feature set.

classmethod from_api(ticker: str, /, *, start: None | str = None, end: None | str = None) DataFrame[source]

Get quarterly features directly from the SEC API.

Not all data series are published at the same rate or time. Missing rows for less-frequent quarterly publications are forward filled.

Parameters:
  • ticker – Company ticker.

  • start – The start date of the observation period. Defaults to the first recorded date.

  • end – The end date of the observation period. Defaults to the last recorded date.

Returns:

Quarterly data dataframe with each tag as a separate column. Sorted by filing date.

Examples

>>> finagg.sec.feat.quarterly.from_api("AAPL").head(5)  
                    LOG_CHANGE(Assets)  LOG_CHANGE(AssetsCurrent)  LOG_CHANGE(CommonStockSharesOutstanding) ...
fy   fp filed                                                                                               ...
2010 Q1 2010-01-25            0.182629                  -0.023676                                  0.012840 ...
     Q2 2010-04-21            0.000000                   0.000000                                  0.000000 ...
     Q3 2010-07-21            0.000000                   0.000000                                  0.000000 ...
2011 Q1 2011-01-19            0.459174                   0.278241                                  0.017805 ...
     Q2 2011-04-21            0.000000                   0.000000                                  0.000000 ...
classmethod from_raw(ticker: str, /, *, start: None | str = None, end: None | str = None, engine: None | Engine = None) DataFrame[source]

Get quarterly features from a local SEC SQL table.

Not all data series are published at the same rate or time. Missing rows for less-frequent quarterly publications are forward filled.

Parameters:
  • ticker – Company ticker.

  • start – The start date of the observation period. Defaults to the first recorded date.

  • end – The end date of the observation period. Defaults to the last recorded date.

  • engine – Feature store database engine. Defaults to the engine at finagg.backend.engine.

Returns:

Quarterly data dataframe with each tag as a separate column. Sorted by filing date.

Examples

>>> finagg.sec.feat.quarterly.from_raw("AAPL").head(5)  
                    LOG_CHANGE(Assets)  LOG_CHANGE(AssetsCurrent)  LOG_CHANGE(CommonStockSharesOutstanding) ...
fy   fp filed                                                                                               ...
2010 Q1 2010-01-25            0.182629                  -0.023676                                  0.012840 ...
     Q2 2010-04-21            0.000000                   0.000000                                  0.000000 ...
     Q3 2010-07-21            0.000000                   0.000000                                  0.000000 ...
2011 Q1 2011-01-19            0.459174                   0.278241                                  0.017805 ...
     Q2 2011-04-21            0.000000                   0.000000                                  0.000000 ...
classmethod from_refined(ticker: str, /, *, start: None | str = None, end: None | str = None, engine: None | Engine = None) DataFrame[source]

Get features from the refined SQL table.

This is the preferred method for accessing features for offline analysis (assuming data in the local SQL table is current).

Parameters:
  • ticker – Company ticker.

  • start – The start date of the observation period. Defaults to the first recorded date.

  • end – The end date of the observation period. Defaults to the last recorded date.

  • engine – Feature store database engine. Defaults to the engine at finagg.backend.engine.

Returns:

Quarterly data dataframe with each tag as a separate column. Sorted by filing date.

Raises:

NoResultFound – If there are no rows for ticker in the refined SQL table.

Examples

>>> finagg.sec.feat.quarterly.from_refined("AAPL").head(5)  
                    LOG_CHANGE(Assets)  LOG_CHANGE(AssetsCurrent)  LOG_CHANGE(CommonStockSharesOutstanding) ...
fy   fp filed                                                                                               ...
2010 Q1 2010-01-25            0.182629                  -0.023676                                  0.012840 ...
     Q2 2010-04-21            0.000000                   0.000000                                  0.000000 ...
     Q3 2010-07-21            0.000000                   0.000000                                  0.000000 ...
2011 Q1 2011-01-19            0.459174                   0.278241                                  0.017805 ...
     Q2 2011-04-21            0.000000                   0.000000                                  0.000000 ...
classmethod get_candidate_ticker_set(lb: int = 1, *, start: None | str = None, end: None | str = None, engine: None | Engine = None) set[str][source]

Get all unique tickers in the raw SQL table that MAY BE ELIGIBLE to be in the feature’s SQL table.

Parameters:
  • lb – Minimum number of rows required to include a ticker in the returned set.

  • start – The start date of the observation period to include when searching for tickers. Defaults to the first recorded date.

  • end – The end date of the observation period to include when searching for tickers. Defaults to the last recorded date.

  • engine – Feature store database engine. Defaults to the engine at finagg.backend.engine.

Returns:

All unique tickers that may be valid for creating quarterly features that also have at least lb rows for each tag used for constructing the features.

Examples

>>> "AAPL" in finagg.sec.feat.quarterly.get_candidate_ticker_set()  
True
classmethod get_ticker_set(lb: int = 1, *, start: None | str = None, end: None | str = None, engine: None | Engine = None) set[str][source]

Get all unique tickers in the feature’s SQL table.

Parameters:
  • lb – Minimum number of rows required to include a ticker in the returned set.

  • start – The start date of the observation period to include when searching for tickers. Defaults to the first recorded date.

  • end – The end date of the observation period to include when searching for tickers. Defaults to the last recorded date.

  • engine – Feature store database engine. Defaults to the engine at finagg.backend.engine.

Returns:

All unique tickers that contain all the columns for creating quarterly features that also have at least lb rows.

Examples

>>> "AAPL" in finagg.sec.feat.quarterly.get_ticker_set()
True
classmethod install(tickers: None | set[str] = None, *, processes: int = 3, engine: None | Engine = None, recreate_tables: bool = False) int[source]

Install data associated with tickers by pulling data from the raw SQL tables, transforming them into quarterly features, and then writing to the refined quarterly SQL table.

Tables associated with this method are created if they don’t already exist.

Parameters:
  • tickers – Set of tickers to install features for. Defaults to all the candidate tickers from Quarterly.get_candidate_ticker_set().

  • processes – Number of background processes to run in parallel when processing tickers.

  • engine – Feature store database engine. Defaults to the engine at finagg.backend.engine.

  • recreate_tables – Whether to drop and recreate tables, wiping all previously installed data.

Returns:

Number of rows written to the feature’s SQL table.

classmethod to_refined(ticker: str, df: DataFrame, /, *, engine: None | Engine = None) int[source]

Write the given dataframe to the refined feature table while using the ticker ticker.

Parameters:
  • ticker – Company ticker.

  • df – Dataframe to store as rows in a local SQL table

  • engine – Feature store database engine. Defaults to the engine at finagg.backend.engine.

Returns:

Number of rows written to the SQL table.

class finagg.sec.feat.IndustryQuarterly[source]

Bases: object

Methods for gathering industry-averaged quarterly data from SEC features.

The class variable finagg.sec.feat.Quarterly.industry is an instance of this feature set implementation and is the most popular interface for calling feature methods.

Examples

You can aggregate this feature set using a ticker or an industry code directly.

>>> df1 = finagg.sec.feat.quarterly.industry.from_refined(ticker="MSFT").head(5)
>>> df2 = finagg.sec.feat.quarterly.industry.from_refined(code=73).head(5)
>>> pd.testing.assert_frame_equal(df1, df2, rtol=1e-4)
classmethod from_refined(*, ticker: None | str = None, code: None | str = None, level: Literal[2, 3, 4] = 2, start: None | str = None, end: None | str = None, engine: None | Engine = None) DataFrame[source]

Get quarterly features from the feature store, aggregated for an entire industry.

The industry can be chosen according to a company or by an industry code directly. If a company is provided, then the first level digits of the company’s SIC code is used for the industry code.

Parameters:
  • ticker – Company ticker. Lookup the industry associated with this company. Mutually exclusive with code.

  • code – Industry SIC code to use for industry lookup. Mutually exclusive with ticker.

  • level

    Industry level to aggregate features at. The industry used according to ticker or code is subsampled according to this value. Options include:

    • 2 = major group (e.g., furniture and fixtures)

    • 3 = industry group (e.g., office furnitures)

    • 4 = industry (e.g., wood office furniture)

  • start – The start date of the observation period. Defaults to the first recorded date.

  • end – The end date of the observation period. Defaults to the last recorded date.

  • engine – Feature store database engine. Defaults to the engine at finagg.backend.engine.

Returns:

Quarterly data dataframe with each tag as a separate column. Sorted by filing date.

Raises:
  • ValueError – If neither a ticker nor code are provided.

  • NoResultFound – If there are no rows for ticker or code in the refined SQL table.

Examples

>>> df = finagg.sec.feat.quarterly.industry.from_refined(ticker="AAPL").head(5)
>>> df["mean"]  
name                AssetCoverageRatio  DebtEquityRatio  EarningsPerShareBasic ...
fy   fp filed                                                                  ...
2010 Q1 2010-04-29            1.718796         0.956111               0.865000 ...
     Q2 2010-07-30            1.753165         0.874013               0.380000 ...
     Q3 2010-11-04            1.830832         0.842663               1.062857 ...
2011 Q1 2011-05-05            2.164743         0.757230               1.022500 ...
     Q2 2011-08-08            2.222168         0.718292               2.016667 ...
>>> df["std"]  
name                AssetCoverageRatio  DebtEquityRatio  EarningsPerShareBasic ...
fy   fp filed                                                                  ...
2010 Q1 2010-04-29            0.394667         0.327296               1.139079 ...
     Q2 2010-07-30            0.310804         0.285554               2.954901 ...
     Q3 2010-11-04            0.351052         0.273552               3.075227 ...
2011 Q1 2011-05-05            1.166501         0.311317               1.137287 ...
     Q2 2011-08-08            1.104677         0.313766               2.075428 ...
class finagg.sec.feat.NormalizedQuarterly[source]

Bases: object

Quarterly features from SEC EDGAR data normalized according to industry averages and standard deviations.

The class variable finagg.sec.feat.Quarterly.normalized is an instance of this feature set implementation and is the most popular interface for calling feature methods.

Examples

It doesn’t matter which data source you use to gather features. They both return equivalent dataframes.

>>> df1 = finagg.sec.feat.quarterly.normalized.from_other_refined("AAPL").head(5)
>>> df2 = finagg.sec.feat.quarterly.normalized.from_refined("AAPL").head(5)
>>> pd.testing.assert_frame_equal(df1, df2, rtol=1e-4)
classmethod from_other_refined(ticker: str, /, *, level: Literal[2, 3, 4] = 2, start: None | str = None, end: None | str = None, engine: None | Engine = None) DataFrame[source]

Get features from other feature SQL tables.

Parameters:
  • ticker – Company ticker.

  • level

    Industry level to aggregate relative features at. The industry used according to ticker is subsampled according to this value. Options include:

    • 2 = major group (e.g., furniture and fixtures)

    • 3 = industry group (e.g., office furnitures)

    • 4 = industry (e.g., wood office furniture)

  • start – The start date of the observation period. Defaults to the first recorded date.

  • end – The end date of the observation period. Defaults to the last recorded date.

  • engine – Feature store database engine. Defaults to the engine at finagg.backend.engine.

Returns:

Relative quarterly data dataframe with each tag as a separate column. Sorted by filing date.

Examples

>>> finagg.sec.feat.quarterly.normalized.from_other_refined("AAPL").head(5)  
                    NORM(LOG_CHANGE(Assets))  NORM(LOG_CHANGE(AssetsCurrent))  NORM(LOG_CHANGE(CommonStockSharesOutstanding)) ...
fy   fp filed                                                                                                                 ...
2010 Q2 2010-04-21                  0.000000                         0.000000                                        0.000000 ...
     Q3 2010-07-21                  0.000000                         0.000000                                        0.000000 ...
2011 Q1 2011-01-19                  0.978816                         0.074032                                        0.407282 ...
     Q2 2011-04-21                  0.000000                         0.000000                                        0.000000 ...
     Q3 2011-07-20                 -0.353553                        -0.353553                                        0.051602 ...
classmethod from_refined(ticker: str, /, *, start: None | str = None, end: None | str = None, engine: None | Engine = None) DataFrame[source]

Get features from the features SQL table.

This is the preferred method for accessing features for offline analysis (assuming data in the local SQL table is current).

Parameters:
  • ticker – Company ticker.

  • start – The start date of the observation period. Defaults to the first recorded date.

  • end – The end date of the observation period. Defaults to the last recorded date.

  • engine – Feature store database engine. Defaults to the engine at finagg.backend.engine.

Returns:

Quarterly data dataframe with each tag as a separate column. Sorted by filing date.

Raises:

NoResultFound – If there are no rows for ticker in the refined SQL table.

Examples

>>> finagg.sec.feat.quarterly.normalized.from_refined("AAPL").head(5)  
                    NORM(LOG_CHANGE(Assets))  NORM(LOG_CHANGE(AssetsCurrent))  NORM(LOG_CHANGE(CommonStockSharesOutstanding)) ...
fy   fp filed                                                                                                                 ...
2010 Q2 2010-04-21                  0.000000                         0.000000                                        0.000000 ...
     Q3 2010-07-21                  0.000000                         0.000000                                        0.000000 ...
2011 Q1 2011-01-19                  0.978816                         0.074032                                        0.407282 ...
     Q2 2011-04-21                  0.000000                         0.000000                                        0.000000 ...
     Q3 2011-07-20                 -0.353553                        -0.353553                                        0.051602 ...
classmethod get_candidate_ticker_set(lb: int = 1, *, start: None | str = None, end: None | str = None, engine: None | Engine = None) set[str][source]

Get all unique tickers in the quarterly SQL table that MAY BE ELIGIBLE to be in the feature’s SQL table.

This is just an alias for finagg.sec.feat.Quarterly.get_ticker_set().

Parameters:
  • lb – Minimum number of rows required to include a ticker in the returned set.

  • start – The start date of the observation period to include when searching for tickers. Defaults to the first recorded date.

  • end – The end date of the observation period to include when searching for tickers. Defaults to the last recorded date.

  • engine – Feature store database engine. Defaults to the engine at finagg.backend.engine.

Returns:

All unique tickers that may be valid for creating industry-normalized quarterly features that also have at least lb rows for each tag used for constructing the features.

Examples

>>> "AAPL" in finagg.sec.feat.quarterly.normalized.get_candidate_ticker_set()  
True
classmethod get_ticker_set(lb: int = 1, *, start: None | str = None, end: None | str = None, engine: None | Engine = None) set[str][source]

Get all unique tickers in the feature’s SQL table.

Parameters:
  • lb – Minimum number of rows required to include a ticker in the returned set.

  • start – The start date of the observation period to include when searching for tickers. Defaults to the first recorded date.

  • end – The end date of the observation period to include when searching for tickers. Defaults to the last recorded date.

  • engine – Feature store database engine. Defaults to the engine at finagg.backend.engine.

Returns:

All unique tickers that contain all the columns for creating industry-normalized quarterly features that also have at least lb rows.

Examples

>>> "AAPL" in finagg.sec.feat.quarterly.normalized.get_ticker_set()  
True
classmethod get_tickers_sorted_by(column: str, /, *, ascending: bool = True, year: int = -1, quarter: int = -1, engine: None | Engine = None) list[str][source]

Get all tickers in the feature’s SQL table sorted by a particular column.

Parameters:
  • column – Feature column to sort by.

  • ascending – Whether to return results in ascending order according to the values in column.

  • year – Year to select from. Defaults to the most recent year that has data available.

  • quarter – Quarter to select from. Defaults to the most recent quarter that has data available.

  • engine – Feature store database engine. Defaults to the engine at finagg.backend.engine.

Returns:

Tickers sorted by a feature column for a particular year and quarter.

Examples

>>> ts = finagg.sec.feat.quarterly.normalized.get_tickers_sorted_by(
...         "NORM(EarningsPerShareBasic)",
...         year=2020,
...         quarter=3
... )
>>> "AMD" == ts[0]  
True
classmethod install(tickers: None | set[str] = None, *, processes: int = 3, engine: None | Engine = None, recreate_tables: bool = False) int[source]

Install data associated with tickers by pulling data from the quarterly SQL tables, transforming them into normalized features, and then writing to the refined quarterly normalized SQL table.

Tables associated with this method are created if they don’t already exist.

Parameters:
  • tickers – Set of tickers to install features for. Defaults to all the candidate tickers from NormalizedQuarterly.get_candidate_ticker_set().

  • processes – Number of background processes to run in parallel when processing tickers.

  • engine – Feature store database engine. Defaults to the engine at finagg.backend.engine.

  • recreate_tables – Whether to drop and recreate tables, wiping all previously installed data.

Returns:

Number of rows written to the feature’s SQL table.

classmethod to_refined(ticker: str, df: DataFrame, /, *, engine: None | Engine = None) int[source]

Write the dataframe to the feature store for ticker.

Parameters:
  • ticker – Company ticker.

  • df – Dataframe to store completely as rows in a local SQL table.

  • engine – Feature store database engine. Defaults to the engine at finagg.backend.engine.

Returns:

Number of rows written to the SQL table.

class finagg.sec.feat.Submissions[source]

Bases: object

Get a single company’s metadata as-is from raw SEC data.

The module variable finagg.sec.feat.submissions is an instance of this feature set implementation and is the most popular interface for calling feature methods.

classmethod from_raw(ticker: str, /, *, engine: None | Engine = None) DataFrame[source]

Get a single company’s metadata as-is from raw SEC data.

The metadata provided for each company varies and each company’s metadata may be incomplete. Only their SEC CIK and SIC industry code are guaranteed to be provided.

Parameters:
  • ticker – Company ticker.

  • engine – Feature store database engine. Defaults to the engine at finagg.backend.engine.

Returns:

A dataframe containing the company’s metadata.

Raises:

NoResultFound – If there are no rows for ticker in the raw SQL table.

Examples

>>> finagg.sec.feat.submissions.from_raw("AAPL")  
          cik ticker entity_type   sic sic_description ...
0  0000320193   AAPL        None  3571            None ...
classmethod get_ticker_set(*, engine: None | Engine = None) set[str][source]

Get all unique ticker symbols in the raw SQL submissions table.

This method is convenient for accessing the tickers that have raw SQL data associated with them so the data associated with those tickers can be further refined. A common pattern is to use this method and other get_ticker_set methods (such as those found in finagg.sec.feat) to determine which tickers are missing data from other tables or features.

Parameters:

engine – Feature store database engine. Defaults to the engine at finagg.backend.engine.

Examples

>>> "AAPL" in finagg.sec.feat.submissions.get_ticker_set()  
True
classmethod install(tickers: None | set[str] = None, *, engine: None | Engine = None, recreate_tables: bool = False) int[source]

Install data associated with tickers by pulling data from the API, and then writing the data to the raw submissions SQL table.

Tables associated with this method are created if they don’t already exist.

Parameters:
  • tickers – Set of tickers to install features for. Defaults to all the tickers from finagg.indices.api.get_ticker_set().

  • engine – Feature store database engine. Defaults to the engine at finagg.backend.engine.

  • recreate_tables – Whether to drop and recreate tables, wiping all previously installed data.

Returns:

Number of rows written to the feature’s SQL table.

classmethod install_from_zip(tickers: None | set[str] = None, *, engine: None | Engine = None, recreate_tables: bool = False) int[source]

Install all submissions data by downloading the bulk submissions zip file from the API, and then writing the data to the raw submissions SQL table.

Tables associated with this method are created if they don’t already exist.

Parameters:
  • tickers – Set of tickers to install features for. Defaults to all the tickers from finagg.indices.api.get_ticker_set().

  • engine – Feature store database engine. Defaults to the engine at finagg.backend.engine.

  • recreate_tables – Whether to drop and recreate tables, wiping all previously installed data.

Returns:

Number of rows written to the feature’s SQL table.

classmethod to_raw(df: DataFrame, /, *, engine: None | Engine = None) int[source]

Write the given dataframe to the raw feature table.

Parameters:
  • df – Dataframe to store as rows in a local SQL table

  • engine – Feature store database engine. Defaults to the engine at finagg.backend.engine.

Returns:

Number of rows written to the SQL table.

class finagg.sec.feat.Tags[source]

Bases: object

Get a single company concept tag as-is from raw SEC data.

The module variable finagg.sec.feat.tags is an instance of this feature set implementation and is the most popular interface for calling feature methods.

classmethod from_raw(ticker: str, tag: str, /, *, form: str = '10-Q', start: None | str = None, end: None | str = None, engine: None | Engine = None) DataFrame[source]

Get a single company concept tag as-is from raw SEC data.

This is the preferred method for accessing raw SEC data without using the SEC API.

Parameters:
  • ticker – Company ticker.

  • tag – Company concept tag to retreive.

  • form

    SEC filing form to retrieve rows for. Options include:

    • ”10-Q” = quarterly filings

    • ”10-K” = annual filings

  • start – The start date of the observation period. Defaults to the first recorded date.

  • end – The end date of the observation period. Defaults to the last recorded date.

  • engine – Feature store database engine. Defaults to the engine at finagg.backend.engine.

Returns:

A dataframe containing the company concept tag values across the specified period.

Raises:

NoResultFound – If there are no rows for ticker and tag in the raw SQL table.

Examples

>>> finagg.sec.feat.tags.from_raw("AAPL", "EarningsPerShareBasic").head(5)  
                         units  value
fy   fp filed
2009 Q3 2009-07-22  USD/shares   4.20
2010 Q1 2010-01-25  USD/shares   2.54
     Q2 2010-04-21  USD/shares   4.35
     Q3 2010-07-21  USD/shares   6.40
2011 Q1 2011-01-19  USD/shares   3.74
classmethod get_ticker_set(lb: int = 1, *, start: None | str = None, end: None | str = None, engine: None | Engine = None) set[str][source]

Get all unique ticker symbols in the raw SQL tables that have at least lb rows.

This method is convenient for accessing the tickers that have raw SQL data associated with them so the data associated with those tickers can be further refined. A common pattern is to use this method and other get_ticker_set methods (such as those found in finagg.sec.feat) to determine which tickers are missing data from other tables or features.

Parameters:
  • lb – Lower bound number of rows that a company must have for its ticker to be included in the set returned by this method.

  • start – The start date of the observation period to include when searching for tickers. Defaults to the first recorded date.

  • end – The end date of the observation period to include when searching for tickers. Defaults to the last recorded date.

  • engine – Feature store database engine. Defaults to the engine at finagg.backend.engine.

Examples

>>> "AAPL" in finagg.sec.feat.tags.get_ticker_set()  
True
classmethod group_and_pivot_from_raw(ticker: str, tags: list[str], /, *, form: str = '10-Q', start: None | str = None, end: None | str = None, engine: None | Engine = None) DataFrame[source]

Get one or more company concept tags from raw SEC data.

Joins all the tags into one table, pivoting the columns such that each tag is in its own column. Tags are forward-filled to fill gaps.

Parameters:
  • ticker – Company ticker.

  • tags – Company concept tags to retreive.

  • form

    SEC filing form to retrieve rows for. Options include:

    • ”10-Q” = quarterly filings

    • ”10-K” = annual filings

  • start – The start date of the observation period. Defaults to the first recorded date.

  • end – The end date of the observation period. Defaults to the last recorded date.

  • engine – Feature store database engine. Defaults to the engine at finagg.backend.engine.

Returns:

A dataframe containing the company concept tag values across the specified period.

Raises:

NoResultFound – If there are no rows for ticker or any of the tags in tags for ticker in the raw SQL table.

Examples

>>> finagg.sec.feat.tags.group_and_pivot_from_raw(
...     "AAPL",
...     ["Assets", "EarningsPerShareBasic"],
...     form="10-Q"
... ).head(5)  
                          Assets  EarningsPerShareBasic
fy   fp filed
2009 Q3 2009-07-22  3.957200e+10                   4.20
2010 Q1 2010-01-25  4.750100e+10                   2.54
     Q2 2010-04-21  4.750100e+10                   4.35
     Q3 2010-07-21  4.750100e+10                   6.40
2011 Q1 2011-01-19  7.518300e+10                   3.74
classmethod install(tickers: None | set[str] = None, *, engine: None | Engine = None, recreate_tables: bool = False) int[source]

Install data associated with tickers by pulling data from the API, and then writing the data to the raw tags SQL table.

Tables associated with this method are created if they don’t already exist.

Parameters:
  • tickers – Set of tickers to install features for. Defaults to all the tickers from Submissions.get_ticker_set().

  • engine – Feature store database engine. Defaults to the engine at finagg.backend.engine.

  • recreate_tables – Whether to drop and recreate tables, wiping all previously installed data.

Returns:

Number of rows written to the feature’s SQL table.

classmethod install_from_zip(tickers: None | set[str] = None, *, processes: int = 3, engine: None | Engine = None, recreate_tables: bool = False) int[source]

Install all popular tags data by downloading the bulk company facts zip file from the API, and then writing the data to the raw tags SQL table.

Tables associated with this method are created if they don’t already exist.

Parameters:
  • tickers – Set of tickers to install features for. Defaults to all the tickers from Submissions.get_ticker_set().

  • processes – Number of background processes to use when installing data.

  • engine – Feature store database engine. Defaults to the engine at finagg.backend.engine.

  • recreate_tables – Whether to drop and recreate tables, wiping all previously installed data.

Returns:

Number of rows written to the feature’s SQL table.

classmethod to_raw(df: DataFrame, /, *, engine: None | Engine = None) int[source]

Write the given dataframe to the raw feature table.

Parameters:
  • df – Dataframe to store as rows in a local SQL table

  • engine – Feature store database engine. Defaults to the engine at finagg.backend.engine.

Returns:

Number of rows written to the SQL table.