finagg package

Subpackages

Submodules

finagg.config module

finagg configuration and global SQLAlchemy setup. Backend file paths and SQLAlchemy engine database URLs are configured in this module at runtime according to environment variables.

Environment variables should ideally be configured using a .env file in the desired working directory. Running finagg install will automatically set up the .env file for you according to your input values. Environment variables assigned in the .env file are loaded when the finagg module is first imported.

finagg.config.root_path

Parent directory of the findata directory where the backend database and API cache file will be stored (unless otherwise configured according to the relevant environment variables). This can be set with the FINAGG_ROOT_PATH environment variable. It defaults to, and is typically set to, the current working directory. It’s recommended to set this value permanently using the finagg install CLI.

finagg.config.disable_http_cache

Whether to disable the HTTP requests cache. If True, a default, uncached session is used for all requests instead of a cacheable session.

finagg.config.http_cache_path

Path to the API cache file. This can be set with the FINAGG_HTTP_CACHE_PATH environment variable and should NOT include a file extension. All API implementations share the same cache backend.

finagg.config.database_path

Default path to the database file. The FINAGG_DATABASE_URL environment variable will take precedence over this value.

finagg.config.database_url

SQLAlchemy URL to the database. This can be set with the FINAGG_DATABASE_URL environment variable and should include a file extension. This defaults to f"sqlite:///{finagg.config.database_path}".
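
The precedence between FINAGG_DATABASE_URL and the default database path can be sketched in plain Python. This is a hypothetical helper, not finagg’s own code: the function name resolve_database_url and the default filename finagg.sqlite under findata/ are assumptions for illustration.

```python
import os
from collections.abc import Mapping
from pathlib import Path


def resolve_database_url(env: Mapping[str, str]) -> str:
    """Hypothetical sketch of how FINAGG_DATABASE_URL could override the default."""
    # The root directory defaults to the current working directory.
    root_path = Path(env.get("FINAGG_ROOT_PATH", os.getcwd()))
    # Assumed default location; the real database filename may differ.
    database_path = root_path / "findata" / "finagg.sqlite"
    # An explicitly configured URL always wins over the default SQLite file.
    return env.get("FINAGG_DATABASE_URL", f"sqlite:///{database_path}")


# Usage: resolve against the current process environment.
print(resolve_database_url(os.environ))
```

Because any SQLAlchemy URL is accepted, setting FINAGG_DATABASE_URL is the way to point finagg at a non-SQLite backend.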

finagg.config.engine

The default SQLAlchemy engine for the backend database. By default, all feature and SQL submodules use this engine, which connects to the database configured by database_url, for reading from and writing to the database.

finagg.ratelimit module

Customizable rate-limiting for requests-style getters.

The definitions within this submodule are used throughout finagg to respect third-party API rate limits and avoid server-side throttling.

class finagg.ratelimit.RateLimit(limit: float, period: float | timedelta, /, *, buffer: float = 0.0)

Bases: ABC

Interface for defining a rate limit for an external API getter.

You can create a custom rate-limiter by inheriting from this class and implementing a custom eval() method.

Parameters:
  • limit – Max limit within period (e.g., max number of requests, errors, size in memory, etc.).

  • period – Time interval for evaluating limit.

  • buffer – Reduce limit by this fraction. Adds a bit of leeway to ensure limit is not reached. Useful for enforcing response size limits.
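
Reading “reduce limit by this fraction” literally, the effective ceiling is limit × (1 − buffer), and a timedelta period is normalized to seconds (the period attribute is documented in seconds). A minimal sketch of that arithmetic, assuming this interpretation; the helper names effective_limit and period_seconds are hypothetical, not part of finagg.

```python
from __future__ import annotations

from datetime import timedelta


def effective_limit(limit: float, buffer: float = 0.0) -> float:
    """Shrink limit by the fraction buffer (assumed: limit * (1 - buffer))."""
    return limit * (1.0 - buffer)


def period_seconds(period: float | timedelta) -> float:
    """Normalize a period given as seconds or as a timedelta to seconds."""
    if isinstance(period, timedelta):
        return period.total_seconds()
    return float(period)


# 10 requests per minute with a 10% buffer: at most ~9 requests per 60 seconds.
print(effective_limit(10, buffer=0.1), period_seconds(timedelta(minutes=1)))
```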

See also

guard(): For the intended usage of getting a RateLimitGuard instance.

RequestLimit: For an example of a request rate limiter.

limit: float

Max quantity allowed within period. The quantity type being limited is dependent on what’s returned by eval().

period: float

Time interval for evaluating limit (in seconds).

abstract eval(response: Response, /) -> float | dict[str, float]

Evaluate a response and determine how much it contributes to the max limit imposed by this instance.

This is the main method that should be overridden by subclasses to create custom rate-limiters. This method is called with each request’s response to determine how much that request/response contributes to the rate limit.

Parameters:

response – Request response (possibly cached).

Returns:

A number indicating the request/response’s contribution to the rate limit OR a dictionary containing:

  • "limit": a number indicating the request/response’s contribution to the rate limit

  • "wait": time to wait before a new request can be made
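
A custom limiter only needs to translate a response into a contribution. The sketch below is standalone: it defines a stand-in base class and a fake response object instead of importing finagg and requests, so BaseLimit, FakeResponse, and RetryAfterLimit are hypothetical names. It shows both documented return forms, a bare number and a dictionary with "limit" and "wait", and assumes "wait" is in seconds.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class FakeResponse:
    """Hypothetical stand-in for requests.Response (status code and headers only)."""

    status_code: int
    headers: dict[str, str] = field(default_factory=dict)


class BaseLimit(ABC):
    """Hypothetical stand-in mirroring the documented RateLimit.eval() contract."""

    @abstractmethod
    def eval(self, response: FakeResponse, /) -> float | dict[str, float]:
        ...


class RetryAfterLimit(BaseLimit):
    """Count each request once and honor a numeric Retry-After header on HTTP 429."""

    def eval(self, response: FakeResponse, /) -> float | dict[str, float]:
        retry_after = response.headers.get("Retry-After")
        if response.status_code == 429 and retry_after is not None:
            # Dictionary form: contribution plus a wait time (assumed seconds).
            return {"limit": 1.0, "wait": float(retry_after)}
        # Plain-number form: this response counts as one request.
        return 1.0
```

With finagg itself, you would subclass finagg.ratelimit.RateLimit instead, which also takes limit and period in its constructor.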

class finagg.ratelimit.RequestLimit(limit: float, period: float | timedelta, /, *, buffer: float = 0.0)

Bases: RateLimit

Limit the number of requests made by the underlying getter.

eval(response: Response, /) -> float | dict[str, float]

Evaluate a response and determine how much it contributes to the max limit imposed by this instance.

This is the main method that should be overridden by subclasses to create custom rate-limiters. This method is called with each request’s response to determine how much that request/response contributes to the rate limit.

Parameters:

response – Request response (possibly cached).

Returns:

A number indicating the request/response’s contribution to the rate limit OR a dictionary containing:

  • "limit": a number indicating the request/response’s contribution to the rate limit

  • "wait": time to wait before a new request can be made

class finagg.ratelimit.ErrorLimit(limit: float, period: float | timedelta, /, *, buffer: float = 0.0)

Bases: RateLimit

Limit the number of errors that occur when using the underlying getter.

eval(response: Response, /) -> float | dict[str, float]

Evaluate a response and determine how much it contributes to the max limit imposed by this instance.

This is the main method that should be overridden by subclasses to create custom rate-limiters. This method is called with each request’s response to determine how much that request/response contributes to the rate limit.

Parameters:

response – Request response (possibly cached).

Returns:

A number indicating the request/response’s contribution to the rate limit OR a dictionary containing:

  • "limit": a number indicating the request/response’s contribution to the rate limit

  • "wait": time to wait before a new request can be made

class finagg.ratelimit.SizeLimit(limit: float, period: float | timedelta, /, *, buffer: float = 0.0)

Bases: RateLimit

Limit the size of responses when using the underlying getter.

eval(response: Response, /) -> float | dict[str, float]

Evaluate a response and determine how much it contributes to the max limit imposed by this instance.

This is the main method that should be overridden by subclasses to create custom rate-limiters. This method is called with each request’s response to determine how much that request/response contributes to the rate limit.

Parameters:

response – Request response (possibly cached).

Returns:

A number indicating the request/response’s contribution to the rate limit OR a dictionary containing:

  • "limit": a number indicating the request/response’s contribution to the rate limit

  • "wait": time to wait before a new request can be made

class finagg.ratelimit.RateLimitGuard(f: Callable[[_P], Response], limits: tuple[finagg.ratelimit.RateLimit, ...], /, *, warn: bool = False)

Bases: Generic[_P]

Wraps requests-like getters, blocking calls whenever requests come close to violating the imposed rate limits.

Parameters:
  • f – Requests-style getter that’s wrapped and rate-limited.

  • limits – Limits to apply to the requests-style getter.

  • warn – Whether to print a message to stdout whenever client-side throttling is occurring to respect limits.

See also

guard(): For the intended usage of getting a RateLimitGuard instance.

RequestLimit: For an example of a request rate limiter.

f: Callable[[_P], Response]

requests-like getter that returns a response.

limits: tuple[finagg.ratelimit.RateLimit, ...]

Limits to apply to requests/responses.

warn: bool

Whether to print a warning when requests are being temporarily blocked to respect imposed rate limits.
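
The blocking behavior can be illustrated with a simplified sliding-window guard. This is a hypothetical sketch of the idea, not RateLimitGuard’s implementation: it handles a single limit, tracks (timestamp, contribution) pairs, drops entries older than period, and sleeps until enough of the window has expired. The class name SlidingWindowGuard is hypothetical, the request callable returns its own contribution (standing in for “make a request, then eval() it”), and clock and sleep are injectable so the logic can be exercised without real waiting.

```python
from __future__ import annotations

import time
from collections import deque
from collections.abc import Callable


class SlidingWindowGuard:
    """Hypothetical single-limit guard illustrating the idea; not finagg's RateLimitGuard."""

    def __init__(
        self,
        limit: float,
        period: float,
        *,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.limit = limit
        self.period = period
        self.clock = clock
        self.sleep = sleep
        # (timestamp, contribution) pairs still inside the window.
        self.history: deque[tuple[float, float]] = deque()

    def _prune(self, now: float) -> None:
        # Drop contributions that have aged out of the window.
        while self.history and now - self.history[0][0] >= self.period:
            self.history.popleft()

    def _usage(self) -> float:
        return sum(amount for _, amount in self.history)

    def call(self, request: Callable[[], float]) -> None:
        """Block until the window has room, then run request and record its contribution."""
        now = self.clock()
        self._prune(now)
        while self.history and self._usage() >= self.limit:
            # Sleep until the oldest contribution leaves the window.
            self.sleep(self.history[0][0] + self.period - now)
            now = self.clock()
            self._prune(now)
        self.history.append((now, request()))
```

With limit=2 and period=1.0, two immediate calls pass and a third blocks for about one second, which mirrors the client-side throttling that warn reports.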

finagg.ratelimit.guard(limits: Sequence[RateLimit], /, *, warn: bool = False) -> Callable[[Callable[[_P], Response]], RateLimitGuard[_P]]

Apply limits to a requests-style getter.

Parameters:
  • limits – Rate limits to apply to the requests-style getter.

  • warn – Whether to print a message when client-side throttling is occurring.

Returns:

A decorator that wraps the original requests-style getter in a RateLimitGuard to avoid exceeding limits.

Examples

Limit requests to Google to 5 per second.

>>> import requests
>>> from datetime import timedelta
>>> from finagg.ratelimit import RequestLimit, guard
>>> @guard([RequestLimit(5, timedelta(seconds=1))])
... def get() -> requests.Response:
...     return requests.get("https://google.com")

finagg.testing module

Testing utils used for finagg’s own unit tests.

finagg.testing.sqlite_engine(path: str, /, *, metadata: None | MetaData = None, table: None | Table = None) -> Generator[Engine, None, None]

Yield a test database engine that’s cleaned-up after usage.

Parameters:
  • path – Path to SQLite database file.

  • metadata – Optional metadata for creating and dropping tables before and after yielding the engine, respectively.

  • table – Optional table for creating and dropping before and after yielding the engine, respectively.

Returns:

A database engine that’s subsequently disposed of and whose respective database file is deleted after use.

Raises:

ValueError – If both metadata and table are provided.

Examples

Using the testing util as a pytest fixture.

>>> from collections.abc import Generator
>>> import pytest
>>> from sqlalchemy.engine import Engine
>>> import finagg.testing
>>> @pytest.fixture
... def engine() -> Generator[Engine, None, None]:
...     yield from finagg.testing.sqlite_engine("/path/to/db.sqlite")
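
The create, yield, then clean up pattern behind sqlite_engine can be illustrated with the standard library’s sqlite3 module instead of SQLAlchemy. This is an analogous sketch, not finagg’s implementation; the function name sqlite_connection is hypothetical, and table creation via metadata or table is left out.

```python
import os
import sqlite3
from collections.abc import Generator


def sqlite_connection(path: str, /) -> Generator[sqlite3.Connection, None, None]:
    """Yield a connection to a throwaway SQLite file, then close it and delete the file."""
    conn = sqlite3.connect(path)
    try:
        yield conn
    finally:
        # Clean up even if the consumer raised an exception.
        conn.close()
        if os.path.exists(path):
            os.remove(path)
```

In a pytest fixture the shape is the same as above, for example yield from sqlite_connection(str(tmp_path / "test.sqlite")) using pytest’s built-in tmp_path fixture.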

finagg.utils module

Generic utils used by subpackages.

finagg.utils.expand_csv(values: str | list[str], /) -> set[str]

Expand the given list of strings into a set of strings, where each value in the list of strings could be:

  1. Comma-separated values

  2. A path that points to a CSV file containing values

  3. A regular ol’ string

Parameters:

values – List of strings denoting comma-separated values, or CSV files containing comma-separated values.

Returns:

A set of all strings found within the given list.

Examples

>>> import finagg.utils
>>> ts = finagg.utils.expand_csv(["AAPL,MSFT"])
>>> "AAPL" in ts
True
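
The three input forms can be handled roughly as follows. This is a hypothetical sketch, not finagg’s source: the name expand_csv_sketch is made up, and it assumes that an existing file path is read as CSV with every cell collected, that whitespace around values is stripped, and that empty values are dropped.

```python
from __future__ import annotations

import csv
import os


def expand_csv_sketch(values: str | list[str], /) -> set[str]:
    """Hypothetical re-implementation of the expand_csv idea."""
    if isinstance(values, str):
        values = [values]
    out: set[str] = set()
    for value in values:
        if os.path.isfile(value):
            # Case 2: a path to a CSV file; collect every cell.
            with open(value, newline="") as f:
                for row in csv.reader(f):
                    out.update(cell.strip() for cell in row)
        else:
            # Cases 1 and 3: comma-separated values or a plain string.
            out.update(part.strip() for part in value.split(","))
    out.discard("")
    return out
```

For instance, expand_csv_sketch(["AAPL,MSFT", "GOOG"]) yields {"AAPL", "MSFT", "GOOG"}.
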

finagg.utils.get_func_cols(table: Table | DataFrame, /) -> list[str]

Return the column names in table that have the format FUNC(arg0, arg1, ...).

Parameters:

table – SQLAlchemy table or dataframe.

Returns:

List of functional-style column names in table. Returns an empty list if none are found.

Raises:

TypeError – If the given object is not a SQLAlchemy table or dataframe.
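
Detecting functional-style names can be sketched with a regular expression over plain column names. The pattern is an assumption about what counts as FUNC(arg0, arg1, ...) (an identifier followed by a parenthesized argument list), and get_func_cols_sketch is a hypothetical name; the real function accepts a SQLAlchemy table or dataframe and raises TypeError otherwise, which this sketch omits.

```python
import re

# Assumed pattern: identifier, "(", anything except parentheses, ")".
FUNC_COL = re.compile(r"\w+\([^()]*\)")


def get_func_cols_sketch(columns: list[str]) -> list[str]:
    """Return the names in columns that look like FUNC(arg0, arg1, ...)."""
    return [name for name in columns if FUNC_COL.fullmatch(name)]
```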

finagg.utils.parse_func_call(s: str, /) -> None | tuple[str, list[str]]

Parse a function’s name and its arguments’ names from a string of format FUNC(arg0, arg1, ...).

Parameters:

s – Any string of format FUNC(arg0, arg1, ...).

Returns:

A tuple containing the parsed function’s name and its arguments’ names. Returns None if the string doesn’t match the expected format.

Examples

>>> import finagg.utils
>>> finagg.utils.parse_func_call("LOG_CHANGE(high, open)")
('LOG_CHANGE', ['high', 'open'])
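
A regular-expression parser reproduces the documented example. It’s a hypothetical sketch (parse_func_call_sketch is a made-up name); finagg’s actual parsing rules, for instance around nested calls or whitespace, may differ.

```python
from __future__ import annotations

import re

# Assumed format: NAME(arg0, arg1, ...), with arguments separated by commas.
FUNC_CALL = re.compile(r"(\w+)\(([^()]*)\)")


def parse_func_call_sketch(s: str, /) -> None | tuple[str, list[str]]:
    """Return (name, [args]) for strings like FUNC(a, b), else None."""
    match = FUNC_CALL.fullmatch(s.strip())
    if match is None:
        return None
    name, args = match.groups()
    return name, [arg.strip() for arg in args.split(",") if arg.strip()]
```

Like the documented function, it returns None for a plain column name such as "open".
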

finagg.utils.resolve_col_order(table: Table, df: DataFrame, /, *, extra_ignore: None | list[str] = None) -> DataFrame

Reorder the columns in df to match the order of the columns in table.

Parameters:
  • table – SQLAlchemy table that defines the column order. Primary keys are excluded from the column order because they’re assumed to be part of df’s index.

  • df – Dataframe to reorder.

  • extra_ignore – Extra columns to ignore in the reordering. Use this for columns that aren’t primary keys but are still part of df’s index.

Returns:

Dataframe with columns ordered according to the column order in table.
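
The ordering rule can be sketched on plain column-name lists. In this hypothetical helper, table_columns and primary_keys stand in for the SQLAlchemy table’s columns and primary key; the real function returns the reordered DataFrame.

```python
from __future__ import annotations


def column_order_sketch(
    table_columns: list[str],
    primary_keys: list[str],
    extra_ignore: list[str] | None = None,
) -> list[str]:
    """Hypothetical helper: the table's column order minus index-like columns."""
    ignore = set(primary_keys) | set(extra_ignore or [])
    return [name for name in table_columns if name not in ignore]
```

With pandas, the order would then be applied as df[order]. In this sketch, any ordered column missing from df would make that selection raise a KeyError, which is why index-only columns need to be excluded via primary keys or extra_ignore.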

finagg.utils.resolve_func_cols(table: Table, df: DataFrame, /, *, drop: bool = False, inplace: bool = False) -> DataFrame

Inspect table for columns named like FUNC(col0, col1, ...) and, when the argument columns exist in df, apply FUNC to them. Each result is added to df as a new column whose name matches the function call signature.

Parameters:
  • table – SQLAlchemy table that defines a superset of the columns that should exist in df.

  • df – Dataframe containing a subset of table’s columns. It’s updated with the columns that table defines with names like FUNC(col0, col1, ...).

  • drop – Whether to drop all columns from the returned dataframe except for those in table.

  • inplace – Whether to perform operations in-place and use df as the output dataframe.

Returns:

A dataframe with the columns from df plus a column for each column in table named like FUNC(col0, col1, ...) whose argument columns col0 and col1 exist in df.

Raises:

ValueError – If a function parsed from a column name has no corresponding supported function.

finagg.utils.safe_log_change(series: Series, other: None | Series = None) -> Series

Safely compute log change between two columns.

Replaces Inf values with NaN and forward-fills. This function is meant to be used with pd.Series.apply.

Parameters:
  • series – Series of values.

  • other – Reference series to compute change against. Defaults to series shifted forward one index.

Returns:

A series representing log changes of series.
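
Assuming the standard definition, with other defaulting to series shifted forward one index (so the reference value at index t is x_{t-1}), the log change is as follows. For example, going from 100 to 110 gives log(1.1) ≈ 0.0953. A zero in either value yields an infinite result, which is what gets replaced with NaN and forward-filled.

```latex
\Delta \log x_t = \log\left(\frac{x_t}{x_{t-1}}\right) = \log x_t - \log x_{t-1}
```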

finagg.utils.safe_pct_change(series: Series, other: None | Series = None) -> Series

Safely compute percent change between two columns.

Replaces Inf values with NaN and forward-fills. This function is meant to be used with pd.Series.apply.

Parameters:
  • series – Series of values.

  • other – Reference series to compute change against. Defaults to series shifted forward one index.

Returns:

A series representing percent changes of series.
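
Assuming the standard definition, with the reference value x_{t-1} taken from other (by default, series shifted forward one index), the percent change is as follows. For example, going from 100 to 110 gives 0.1. A zero reference value with a nonzero x_t divides by zero and produces the Inf values that are replaced with NaN.

```latex
\%\Delta x_t = \frac{x_t - x_{t-1}}{x_{t-1}} = \frac{x_t}{x_{t-1}} - 1
```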

finagg.utils.setenv(name: str, value: str, /, *, exist_ok: bool = False) -> Path

Set the value of the environment variable name to value.

The environment variable is set permanently, by writing it to a file, and is also set in the current process.

Parameters:
  • name – Environment variable name.

  • value – Environment variable value.

  • exist_ok – Whether it’s okay if an environment variable of the same name already exists. If True, it will be overwritten.

Returns:

Path to the file the environment variable was written to.

Raises:

RuntimeError – If exist_ok is False and an environment variable of the same name already exists.
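
The documented contract (persist the variable to a file, set it in the current process, and refuse to overwrite unless exist_ok is True) can be sketched as follows. This is hypothetical: setenv_sketch is a made-up name, and the target file and its NAME=value line format are assumptions, passed in explicitly here as env_file, whereas finagg decides which file to write to.

```python
from __future__ import annotations

import os
from pathlib import Path


def setenv_sketch(name: str, value: str, env_file: Path, /, *, exist_ok: bool = False) -> Path:
    """Hypothetical: persist NAME=value to env_file and set it in os.environ."""
    lines = env_file.read_text().splitlines() if env_file.exists() else []
    # Keep lines for other variables; any existing line for name is replaced.
    others = [line for line in lines if not line.startswith(f"{name}=")]
    if not exist_ok and (len(others) != len(lines) or name in os.environ):
        raise RuntimeError(f"Environment variable {name!r} already exists.")
    others.append(f"{name}={value}")
    env_file.write_text("\n".join(others) + "\n")
    os.environ[name] = value
    return env_file
```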

finagg.utils.today

Today’s date. Used by a number of submodules as the default end date when getting data from APIs or SQL tables.

Module contents

Main package interface.