rl8.nn.modules package

Submodules

rl8.nn.modules.activations module

Activation function registry for convenience.

class rl8.nn.modules.activations.SquaredReLU(*args, **kwargs)[source]

Bases: Module[(<class 'torch.Tensor'>,), Tensor]

forward(x: Tensor) → Tensor[source]

Subclasses implement this method.

rl8.nn.modules.activations.get_activation(name: str, /, **params: Any) → Module[source]

Return an activation instance by its name.
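
A minimal usage sketch showing both direct instantiation and lookup by name. The registered name "relu" is an assumption inferred from the default activation_fn='relu' used by other modules in this package:

    import torch

    from rl8.nn.modules.activations import SquaredReLU, get_activation

    x = torch.randn(4, 8)

    # Direct instantiation of an activation class from this module.
    act = SquaredReLU()
    y = act(x)  # Same shape as the input.

    # Lookup by name. "relu" is assumed to be a registered name since it is
    # the default activation_fn used by other modules in this package.
    relu = get_activation("relu")
    z = relu(x)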

rl8.nn.modules.attention module

Attention module definitions.

class rl8.nn.modules.attention.PointerNetwork(embed_dim: int, /)[source]

Bases: Module[(<class 'torch.Tensor'>, <class 'torch.Tensor'>, None | torch.Tensor), Tensor]

3D attention applied to sequence encoders and decoders for selecting the next element from the encoder’s sequence to be appended to the decoder’s sequence.

An implementation of Pointer Networks adapted from this blog post, which is itself adapted from this repo.

Parameters:

embed_dim – Feature dimension of the encoders/decoders.

W1: Linear

Weights applied to the encoder’s output.

W2: Linear

Weights applied to the decoder’s output.

VT: Linear

Weights applied to the blended encoder-decoder selection matrix.

forward(decoder_out: Tensor, encoder_out: Tensor, mask: None | Tensor = None) → Tensor[source]

Select valid values from encoder_out as indicated by mask using features from decoder_out.

Parameters:
  • decoder_out – Sequence decoder output with shape [B, D, C].

  • encoder_out – Sequence encoder output with shape [B, E, C].

  • mask – Mask with shape [B, D, E] indicating the sequence elements of encoder_out that can be selected.

Returns:

Logits with shape [B, D, E] indicating the likelihood of selecting an encoded sequence element in E for each decoder sequence element in D. The last item in the D dimension, [:, -1, :], typically indicates the likelihoods of selecting each encoder sequence element for the next decoder sequence element (which is usually the desired output).
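
A minimal sketch of the call convention described above; the boolean mask dtype is an assumption:

    import torch

    from rl8.nn.modules.attention import PointerNetwork

    B, E, D, C = 2, 10, 5, 32  # batch, encoder length, decoder length, features
    net = PointerNetwork(C)

    encoder_out = torch.randn(B, E, C)
    decoder_out = torch.randn(B, D, C)
    # Allow every encoder element to be selected at every decoder step
    # (a boolean mask is an assumption here).
    mask = torch.ones(B, D, E, dtype=torch.bool)

    logits = net(decoder_out, encoder_out, mask)  # [B, D, E]
    # Likelihoods of selecting each encoder element for the next decoder element.
    next_selection = logits[:, -1, :].argmax(dim=-1)  # [B]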

class rl8.nn.modules.attention.CrossAttention(embed_dim: int, /, num_heads: int = 2, hidden_dim: int = 128, activation_fn: str = 'relu', attention_dropout: float = 0.0, hidden_dropout: float = 0.0, skip_kind: None | str = 'cat')[source]

Bases: Module[(<class 'torch.Tensor'>, <class 'torch.Tensor'>, None | torch.Tensor, None | torch.Tensor), Tensor]

Apply multihead attention from keys to a query, mapping keys of sequence length K to a query of sequence length Q.

Parameters:
  • embed_dim – Key and query feature dimension.

  • num_heads – Number of attention heads.

  • hidden_dim – Number of hidden features in the hidden layers of the feedforward network that follows the attention module.

  • activation_fn – Activation function ID.

  • attention_dropout – Sequence dropout in the attention heads.

  • hidden_dropout – Feedforward dropout after performing attention.

  • skip_kind – Kind of residual or skip connection to make between the output of the multihead attention and the feedforward module.

attention: MultiheadAttention

Underlying multihead attention mechanism.

kv_norm: LayerNorm

Norm for the keys.

q_norm: LayerNorm

Norm for the queries.

skip_connection: SequentialSkipConnection

Skip connection for applying special residual connections.

forward(q: Tensor, kv: Tensor, key_padding_mask: None | Tensor = None, attention_mask: None | Tensor = None) → Tensor[source]

Apply multihead attention from keys to a query, mapping keys of sequence length K to a query of sequence length Q.

Parameters:
  • q – Query with shape [B, Q, E].

  • kv – Keys with shape [B, K, E].

  • key_padding_mask – Mask with shape [B, K] indicating sequence elements of kv that are PADDED or INVALID values.

  • attention_mask – Mask with shape [Q, K] that indicates whether elements in Q can attend to elements in K.

Returns:

Values with shape [B, Q, E].
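
A minimal sketch, assuming the key padding mask follows the torch.nn.MultiheadAttention convention (True marks padded positions):

    import torch

    from rl8.nn.modules.attention import CrossAttention

    B, Q, K, E = 2, 4, 16, 32
    attention = CrossAttention(E, num_heads=2, hidden_dim=64, skip_kind="cat")

    q = torch.randn(B, Q, E)
    kv = torch.randn(B, K, E)
    # Mark the last four key positions of each batch element as padding
    # (True = padded, following the torch.nn.MultiheadAttention convention).
    key_padding_mask = torch.zeros(B, K, dtype=torch.bool)
    key_padding_mask[:, -4:] = True

    out = attention(q, kv, key_padding_mask=key_padding_mask)  # [B, Q, E]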

class rl8.nn.modules.attention.SelfAttention(embed_dim: int, /, num_heads: int = 2, hidden_dim: int = 128, activation_fn: str = 'relu', attention_dropout: float = 0.0, hidden_dropout: float = 0.0, skip_kind: None | str = 'cat')[source]

Bases: Module[(<class 'torch.Tensor'>, None | torch.Tensor, None | torch.Tensor), Tensor]

Apply multihead self-attention to a sequence, using it for the queries, keys, and values.

Parameters:
  • embed_dim – Key and query feature dimension.

  • num_heads – Number of attention heads.

  • hidden_dim – Number of hidden features in the hidden layers of the feedforward network that follows the attention module.

  • activation_fn – Activation function ID.

  • attention_dropout – Sequence dropout in the attention heads.

  • hidden_dropout – Feedforward dropout after performing attention.

  • skip_kind – Kind of residual or skip connection to make between the output of the multihead attention and the feedforward module.

attention: MultiheadAttention

Underlying multihead attention mechanism.

skip_connection: SequentialSkipConnection

Skip connection for applying special residual connections.

x_norm: LayerNorm

Norm for the queries/keys/values.

forward(x: Tensor, key_padding_mask: None | Tensor = None, attention_mask: None | Tensor = None) → Tensor[source]

Apply self-attention to x, attending sequence elements to themselves.

Parameters:
  • x – Query with shape [B, X, E].

  • key_padding_mask – Mask with shape [B, X] indicating sequence elements of x that are PADDED or INVALID values.

  • attention_mask – Mask with shape [X, X] that indicates whether elements in X can attend to other elements in X.

Returns:

Values with shape [B, X, E].
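
A minimal sketch, assuming the attention mask follows the torch.nn.MultiheadAttention convention (True marks positions that may not be attended to):

    import torch

    from rl8.nn.modules.attention import SelfAttention

    B, X, E = 2, 8, 32
    attention = SelfAttention(E, num_heads=2, hidden_dim=64)

    x = torch.randn(B, X, E)
    # Causal mask: each element attends only to itself and earlier elements
    # (True = blocked, following the torch.nn.MultiheadAttention convention).
    attention_mask = torch.triu(torch.ones(X, X, dtype=torch.bool), diagonal=1)

    out = attention(x, attention_mask=attention_mask)  # [B, X, E]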

class rl8.nn.modules.attention.SelfAttentionStack(module: SelfAttention, num_layers: int, /, *, share_parameters: bool = False)[source]

Bases: Module[(<class 'torch.Tensor'>, None | torch.Tensor, None | torch.Tensor), Tensor]

A stack of self-attention layers for iteratively attending over a sequence.

Parameters:
  • module – Self-attention module to repeat.

  • num_layers – Number of layers of module to repeat.

  • share_parameters – Whether to use the same module for each layer.

forward(x: Tensor, key_padding_mask: None | Tensor = None, attention_mask: None | Tensor = None) → Tensor[source]

Iteratively apply self-attention to x, attending sequence elements to themselves.

Parameters:
  • x – Query with shape [B, X, E].

  • key_padding_mask – Mask with shape [B, X] indicating sequence elements of x that are PADDED or INVALID values.

  • attention_mask – Mask with shape [X, X] that indicates whether elements in X can attend to other elements in X.

Returns:

Values with shape [B, X, E].
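
A minimal sketch of building a stack from a single self-attention layer:

    import torch

    from rl8.nn.modules.attention import SelfAttention, SelfAttentionStack

    B, X, E = 2, 8, 32
    layer = SelfAttention(E, num_heads=2, hidden_dim=64)
    # With share_parameters=False, each of the four layers presumably gets its
    # own copy of the parameters; with True, the same module is reused.
    stack = SelfAttentionStack(layer, 4, share_parameters=False)

    x = torch.randn(B, X, E)
    out = stack(x)  # [B, X, E]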

rl8.nn.modules.embeddings module

Embeddings for sequences.

class rl8.nn.modules.embeddings.PositionalEmbedding(embed_dim: int, max_len: int, /, *, dropout: float = 0.0)[source]

Bases: Module[(<class 'torch.Tensor'>,), Tensor]

Apply positional embeddings to an input sequence.

Positional embeddings that help distinguish values at different parts of a sequence. Beneficial if an entire sequence is attended to.

Parameters:
  • embed_dim – Input feature dimension.

  • max_len – Max input sequence length.

  • dropout – Dropout on the output of PositionalEmbedding.forward().

pe: Tensor

Positional embedding tensor.

dropout: Dropout

Dropout on the output of PositionalEmbedding.forward().

forward(x: Tensor, /) → Tensor[source]

Add positional embeddings to x.

Parameters:

x – Tensor with shape [B, T, E] where B is the batch dimension, T is the time or sequence dimension, and E is a feature dimension.

Returns:

Tensor with added positional embeddings.
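
A minimal sketch; max_len only needs to cover the longest sequence that will be passed in:

    import torch

    from rl8.nn.modules.embeddings import PositionalEmbedding

    B, T, E = 2, 16, 32
    embedding = PositionalEmbedding(E, 128, dropout=0.1)  # max_len of 128 covers T = 16

    x = torch.randn(B, T, E)
    out = embedding(x)  # [B, T, E] with positional embeddings added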

rl8.nn.modules.mlp module

class rl8.nn.modules.mlp.MLP(input_dim: int, hiddens: Sequence[int], /, *, activation_fn: str = 'relu', norm_layer: None | type[torch.nn.modules.batchnorm.BatchNorm1d | torch.nn.modules.normalization.LayerNorm] = None, bias: bool = True, dropout: float = 0.0, inplace: bool = False)[source]

Bases: Sequential, Module[(<class 'torch.Tensor'>,), Tensor]

Simple implementation of a multi-layer perceptron.

Parameters:
  • input_dim – Input layer dimension.

  • hiddens – Hidden layer dimensions.

  • activation_fn – Hidden activation function that immediately follows the linear layer or the norm layer (if one exists).

  • norm_layer – Optional normalization layer type that immediately follows the linear layer.

  • bias – Whether to include a bias for each layer in the MLP.

  • dropout – Optional dropout applied after the activation function.

  • inplace – Whether activation functions occur in-place.
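
A minimal sketch of constructing and calling an MLP with the parameters listed above; the assumption here is that the output feature size equals the last entry of hiddens:

    import torch
    from torch import nn

    from rl8.nn.modules.mlp import MLP

    # 16 -> 64 -> 64 with layer norm, ReLU, and dropout after each linear layer.
    mlp = MLP(16, [64, 64], activation_fn="relu", norm_layer=nn.LayerNorm, dropout=0.1)

    x = torch.randn(8, 16)
    out = mlp(x)  # Assumed shape [8, 64] (the last hidden dimension).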

rl8.nn.modules.module module

Typing help for torch.nn.Module.

class rl8.nn.modules.module.Module(*args, **kwargs)[source]

Bases: ABC, Generic[_P, _T], Module

Workaround for PyTorch modules with variadic generics.

abstract forward(*args: _P.args, **kwargs: _P.kwargs) → _T[source]

Subclasses implement this method.
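
A minimal sketch of subclassing the typed Module. The subscription mirrors the rendered "Bases" entries above but uses the PEP 612 list form, which is an assumption about how the generic is meant to be parameterized:

    import torch
    from torch import Tensor

    from rl8.nn.modules.module import Module


    class Scale(Module[[Tensor], Tensor]):
        """Toy module typed as taking one tensor and returning one tensor."""

        def __init__(self, factor: float) -> None:
            super().__init__()
            self.factor = factor

        def forward(self, x: Tensor) -> Tensor:
            return self.factor * x


    scale = Scale(2.0)
    y = scale(torch.ones(3))  # tensor([2., 2., 2.])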

rl8.nn.modules.perceiver module

Perceiver definitions.

class rl8.nn.modules.perceiver.PerceiverLayer(embed_dim: int, /, *, num_heads: int = 2, hidden_dim: int = 128, num_layers: int = 2, activation_fn: str = 'relu', attention_dropout: float = 0.0, hidden_dropout: float = 0.0, skip_kind: str = 'cat', share_parameters: bool = False)[source]

Bases: Module[(<class 'torch.Tensor'>, <class 'torch.Tensor'>, None | torch.Tensor, None | torch.Tensor), Tensor]

An implementation of a Perceiver with cross-attention followed by self-attention stacks.

Useful for embedding several, variable-length sequences into a latent array for dimensionality reduction. Allows inputs of different feature sizes to be embedded into a constant size.

Parameters:
  • embed_dim – Feature dimension of the latent array and input sequence. Each sequence is expected to be embedded by its own embedder, which could just be a simple linear transform.

  • num_heads – Number of attention heads in the cross-attention and self-attention modules.

  • hidden_dim – Number of hidden features in the hidden layers of the feedforward networks that follow the attention modules.

  • activation_fn – Activation function ID.

  • attention_dropout – Sequence dropout in the attention heads.

  • hidden_dropout – Feedforward dropout after performing attention.

  • skip_kind – Kind of residual or skip connection to make between outputs of the multihead attentions and the feedforward modules.

  • share_parameters – Whether to use the same parameters for the layers in the self-attention stack.

forward(q: Tensor, kv: Tensor, key_padding_mask: None | Tensor = None, attention_mask: None | Tensor = None) → Tensor[source]

Apply cross-attention from keys to a query, mapping keys of sequence length K to a query of sequence length Q.

Parameters:
  • q – Query with shape [B, Q, E]. Usually the latent array from previous forward passes or perceiver layers.

  • kv – Keys with shape [B, K, E].

  • key_padding_mask – Mask with shape [B, K] indicating sequence elements of kv that are PADDED or INVALID values.

  • attention_mask – Mask with shape [Q, K] that indicates whether elements in Q can attend to elements in K.

Returns:

Values with shape [B, Q, E].
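
A minimal sketch of embedding a long input sequence into a shorter latent array:

    import torch

    from rl8.nn.modules.perceiver import PerceiverLayer

    B, Q, K, E = 2, 8, 64, 32
    layer = PerceiverLayer(E, num_heads=2, hidden_dim=64, num_layers=2)

    # Latent array (queries) and a longer, already-embedded input sequence (keys).
    latent = torch.randn(B, Q, E)
    inputs = torch.randn(B, K, E)

    latent = layer(latent, inputs)  # [B, Q, E]; can be fed back in on the next pass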

class rl8.nn.modules.perceiver.PerceiverIOLayer(embed_dim: int, output_seq_dim: int, /, *, num_heads: int = 2, hidden_dim: int = 128, num_layers: int = 2, activation_fn: str = 'relu', attention_dropout: float = 0.0, hidden_dropout: float = 0.0, skip_kind: str = 'cat', share_parameters: bool = False)[source]

Bases: Module[(<class 'torch.Tensor'>, <class 'torch.Tensor'>, None | torch.Tensor, None | torch.Tensor), Tensor]

An implementation of PerceiverIO with cross-attention, followed by self-attention stacks, followed by cross-attention with a fixed-size output array.

In addition to the benefits of PerceiverLayer, this module attends a latent array to a final output dimensionality to effectively apply weighted averaging of sequences to a different dimension. Useful if the latent array needs to be processed into several, different-sized sequences for separate outputs.

Parameters:
  • embed_dim – Feature dimension of the latent array and input sequence. Each sequence is expected to be embedded by its own embedder, which could just be a simple linear transform.

  • output_seq_dim – Output sequence size to transform the latent array sequence size to.

  • num_heads – Number of attention heads in the cross-attention and self-attention modules.

  • hidden_dim – Number of hidden features in the hidden layers of the feedforward networks that follow the attention modules.

  • activation_fn – Activation function ID.

  • attention_dropout – Sequence dropout in the attention heads.

  • hidden_dropout – Feedforward dropout after performing attention.

  • skip_kind – Kind of residual or skip connection to make between outputs of the multihead attentions and the feedforward modules.

  • share_parameters – Whether to use the same parameters for the layers in the self-attention stack.

forward(q: Tensor, kv: Tensor, key_padding_mask: None | Tensor = None, attention_mask: None | Tensor = None) → Tensor[source]

Apply cross-attention from keys to a query, mapping keys of sequence length K to a query of sequence length Q.

Parameters:
  • q – Query with shape [B, Q, E]. Usually the latent array from previous forward passes or perceiver layers.

  • kv – Keys with shape [B, K, E].

  • key_padding_mask – Mask with shape [B, K] indicating sequence elements of kv that are PADDED or INVALID values.

  • attention_mask – Mask with shape [Q, K] that indicates whether elements in Q can attend to elements in K.

Returns:

Values with shape [B, O, E] where O is the output array sequence size.
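
A minimal sketch, differing from PerceiverLayer only in that the latent array is mapped to the fixed output sequence size O:

    import torch

    from rl8.nn.modules.perceiver import PerceiverIOLayer

    B, Q, K, E, O = 2, 8, 64, 32, 4
    layer = PerceiverIOLayer(E, O, num_heads=2, hidden_dim=64, num_layers=2)

    latent = torch.randn(B, Q, E)
    inputs = torch.randn(B, K, E)

    out = layer(latent, inputs)  # [B, O, E]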

rl8.nn.modules.skip module

Skip connection module definitions.

class rl8.nn.modules.skip.SequentialSkipConnection(embed_dim: int, kind: None | str = 'cat')[source]

Bases: Module[(<class 'torch.Tensor'>, <class 'torch.Tensor'>), Tensor]

Sequential skip connection.

Apply a skip connection to an input and the output of a layer that uses that input.

Parameters:
  • embed_dim – Original input feature size.

  • kind

    Type of skip connection to apply. Options include:

    • "residual" for a standard residual connection (summing outputs)

    • "cat" for concatenating outputs

    • None for no skip connection

kind: None | str

Kind of skip connection. "residual" for a standard residual connection (summing outputs), "cat" for concatenating outputs, and None for no skip connection (reduces to a regular, sequential module).

append(module: Module, /) → int[source]

Append module to the skip connection.

If the skip connection kind is concatenation, then an intermediate layer is also appended to downsample the feature dimension back to the original embedding dimension.

Parameters:

module – Module to append and apply a skip connection to.

Returns:

Number of output features from the sequential skip connection.

forward(x: Tensor, y: Tensor, /) → Tensor[source]

Perform a sequential skip connection, first applying a skip connection to x and y, and then sequentially applying skip connections to the output and the output of the next layer.

Parameters:
  • x – Skip connection seed with shape [B, T, ...].

  • y – Skip connection seed with the same shape as x.

Returns:

A tensor with shape depending on SequentialSkipConnection.kind.

property in_features: int

Return the original number of input features.

property out_features: int

Return the number of output features according to the number of input features, the kind of skip connection, and whether there’s a fan-in layer.
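
A minimal sketch contrasting the two skip connection kinds; the exact output feature sizes are assumptions based on the descriptions above (summation preserves the feature size, concatenation doubles it when no fan-in layer has been appended):

    import torch

    from rl8.nn.modules.skip import SequentialSkipConnection

    E = 32
    x = torch.randn(2, 4, E)
    y = torch.randn(2, 4, E)

    residual = SequentialSkipConnection(E, kind="residual")
    print(residual(x, y).shape)   # Summing outputs is expected to keep E features.
    print(residual.out_features)  # Expected to be E.

    cat = SequentialSkipConnection(E, kind="cat")
    print(cat(x, y).shape)        # Concatenation is expected to yield 2 * E features.
    print(cat.out_features)       # Expected to be 2 * E.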

Module contents

Custom PyTorch modules.