rl8.nn.modules package
Submodules
rl8.nn.modules.activations module
Activation function registry for convenience.
rl8.nn.modules.attention module
Attention module definitions.
- class rl8.nn.modules.attention.PointerNetwork(embed_dim: int, /)[source]
Bases: Module[(Tensor, Tensor, None | Tensor), Tensor]
3D attention applied to sequence encoders and decoders for selecting the next element from the encoder's sequence to be appended to the decoder's sequence.
An implementation of Pointer Networks adapted from this blog post which is adapted from this repo.
- Parameters:
embed_dim – Feature dimension of the encoders/decoders.
- forward(decoder_out: Tensor, encoder_out: Tensor, mask: None | Tensor = None) → Tensor [source]
Select valid values from encoder_out as indicated by mask using features from decoder_out.
- Parameters:
decoder_out – Sequence decoder output with shape [B, D, C].
encoder_out – Sequence encoder output with shape [B, E, C].
mask – Mask with shape [B, D, E] indicating the sequence elements of encoder_out that can be selected.
- Returns:
Logits with shape [B, D, E] indicating the likelihood of selecting an encoded sequence element in E for each decoder sequence element in D. The last item in the D dimension, [:, -1, :], typically indicates the likelihoods of selecting each encoder sequence element for the next decoder sequence element (which is usually the desired output).
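Example: a minimal usage sketch. The tensor sizes are illustrative, and the boolean mask convention (True marking encoder elements that may be selected) is an assumption rather than part of the documented API:
import torch

from rl8.nn.modules.attention import PointerNetwork

B, D, E, C = 4, 3, 5, 32  # batch, decoder length, encoder length, features (illustrative)
net = PointerNetwork(C)
decoder_out = torch.randn(B, D, C)
encoder_out = torch.randn(B, E, C)
mask = torch.ones(B, D, E, dtype=torch.bool)  # assumed: True marks selectable encoder elements
logits = net(decoder_out, encoder_out, mask)  # [B, D, E]
next_choice = logits[:, -1, :].argmax(dim=-1)  # likeliest next encoder element per batch item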
- class rl8.nn.modules.attention.CrossAttention(embed_dim: int, /, num_heads: int = 2, hidden_dim: int = 128, activation_fn: str = 'relu', attention_dropout: float = 0.0, hidden_dropout: float = 0.0, skip_kind: None | str = 'cat')[source]
Bases: Module[(Tensor, Tensor, None | Tensor, None | Tensor), Tensor]
Apply multihead attention to map keys of sequence length K to a query of sequence length Q.
- Parameters:
embed_dim – Key and query feature dimension.
num_heads – Number of attention heads.
hidden_dim – Number of hidden features in the feedforward network applied after the attention operation.
activation_fn – Activation function ID.
attention_dropout – Sequence dropout in the attention heads.
hidden_dropout – Feedforward dropout after performing attention.
skip_kind – Kind of residual or skip connection to make between the output of the multihead attention and the feedforward module.
- attention: MultiheadAttention
Underlying multihead attention mechanism.
- skip_connection: SequentialSkipConnection
Skip connection for applying special residual connections.
- forward(q: Tensor, kv: Tensor, key_padding_mask: None | Tensor = None, attention_mask: None | Tensor = None) → Tensor [source]
Apply multihead attention to map keys of sequence length K to a query of sequence length Q.
- Parameters:
q – Query with shape [B, Q, E].
kv – Keys with shape [B, K, E].
key_padding_mask – Mask with shape [B, K] indicating sequence elements of kv that are PADDED or INVALID values.
attention_mask – Mask with shape [Q, K] that indicates whether elements in Q can attend to elements in K.
- Returns:
Values with shape [B, Q, E].
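Example: a minimal sketch with illustrative sizes. The key padding mask semantics (True marking padded key elements) follow the underlying torch.nn.MultiheadAttention and are assumed here:
import torch

from rl8.nn.modules.attention import CrossAttention

B, Q, K, E = 4, 6, 10, 32  # batch, query length, key length, features (illustrative)
attn = CrossAttention(E, num_heads=4)
q = torch.randn(B, Q, E)
kv = torch.randn(B, K, E)
key_padding_mask = torch.zeros(B, K, dtype=torch.bool)  # assumed: True marks padded elements
out = attn(q, kv, key_padding_mask=key_padding_mask)  # [B, Q, E]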
- class rl8.nn.modules.attention.SelfAttention(embed_dim: int, /, num_heads: int = 2, hidden_dim: int = 128, activation_fn: str = 'relu', attention_dropout: float = 0.0, hidden_dropout: float = 0.0, skip_kind: None | str = 'cat')[source]
Bases: Module[(Tensor, None | Tensor, None | Tensor), Tensor]
Apply multihead attention to a sequence, using it for the queries, keys, and values.
- Parameters:
embed_dim – Key and query feature dimension.
num_heads – Number of attention heads.
hidden_dim – Number of hidden features in the feedforward network applied after the attention operation.
activation_fn – Activation function ID.
attention_dropout – Sequence dropout in the attention heads.
hidden_dropout – Feedforward dropout after performing attention.
skip_kind – Kind of residual or skip connection to make between the output of the multihead attention and the feedforward module.
- attention: MultiheadAttention
Underlying multihead attention mechanism.
- skip_connection: SequentialSkipConnection
Skip connection for applying special residual connections.
- forward(x: Tensor, key_padding_mask: None | Tensor = None, attention_mask: None | Tensor = None) → Tensor [source]
Apply self-attention to x, attending sequence elements to themselves.
- Parameters:
x – Query with shape [B, X, E].
key_padding_mask – Mask with shape [B, X] indicating sequence elements of x that are PADDED or INVALID values.
attention_mask – Mask with shape [X, X] that indicates whether elements in X can attend to other elements in X.
- Returns:
Values with shape [B, X, E].
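Example: a minimal sketch applying a causal attention mask. The mask semantics (True marking positions that may not be attended to) follow torch.nn.MultiheadAttention and are assumed here:
import torch

from rl8.nn.modules.attention import SelfAttention

B, X, E = 4, 10, 32  # batch, sequence length, features (illustrative)
attn = SelfAttention(E, num_heads=4)
x = torch.randn(B, X, E)
# Assumed convention: True blocks attention, so this upper-triangular mask is causal.
causal_mask = torch.triu(torch.ones(X, X, dtype=torch.bool), diagonal=1)
out = attn(x, attention_mask=causal_mask)  # [B, X, E]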
- class rl8.nn.modules.attention.SelfAttentionStack(module: SelfAttention, num_layers: int, /, *, share_parameters: bool = False)[source]
Bases: Module[(Tensor, None | Tensor, None | Tensor), Tensor]
A stack of self-attention layers for iteratively attending over a sequence.
- Parameters:
module – Self-attention module to repeat.
num_layers – Number of layers of module to repeat.
share_parameters – Whether to use the same module for each layer.
- forward(x: Tensor, key_padding_mask: None | Tensor = None, attention_mask: None | Tensor = None) → Tensor [source]
Iteratively apply self-attention to x, attending sequence elements to themselves.
- Parameters:
x – Query with shape [B, X, E].
key_padding_mask – Mask with shape [B, X] indicating sequence elements of x that are PADDED or INVALID values.
attention_mask – Mask with shape [X, X] that indicates whether elements in X can attend to other elements in X.
- Returns:
Values with shape [B, X, E].
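Example: a minimal sketch stacking three layers of a self-attention module (sizes are illustrative):
import torch

from rl8.nn.modules.attention import SelfAttention, SelfAttentionStack

B, X, E = 4, 10, 32  # batch, sequence length, features (illustrative)
stack = SelfAttentionStack(SelfAttention(E, num_heads=4), 3)  # three independent layers
x = torch.randn(B, X, E)
out = stack(x)  # [B, X, E]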
rl8.nn.modules.embeddings module
Embeddings for sequences.
- class rl8.nn.modules.embeddings.PositionalEmbedding(embed_dim: int, max_len: int, /, *, dropout: float = 0.0)[source]
Bases: Module[(Tensor,), Tensor]
Apply positional embeddings to an input sequence.
Positional embeddings that help distinguish values at different parts of a sequence. Beneficial if an entire sequence is attended to.
- Parameters:
embed_dim – Input feature dimension.
max_len – Max input sequence length.
dropout – Dropout on the output of PositionalEmbedding.forward().
- dropout: Dropout
Dropout on the output of PositionalEmbedding.forward().
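Example: a minimal sketch. The [B, T, E] input layout is assumed from the attention modules above rather than documented here:
import torch

from rl8.nn.modules.embeddings import PositionalEmbedding

B, T, E = 4, 16, 32  # batch, sequence length, features (illustrative)
pos = PositionalEmbedding(E, 128, dropout=0.1)  # supports sequences up to length 128
x = torch.randn(B, T, E)  # assumed [B, T, E] layout
out = pos(x)  # same shape as x, with positional information added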
rl8.nn.modules.mlp module
- class rl8.nn.modules.mlp.MLP(input_dim: int, hiddens: Sequence[int], /, *, activation_fn: str = 'relu', norm_layer: None | type[torch.nn.modules.batchnorm.BatchNorm1d | torch.nn.modules.normalization.LayerNorm] = None, bias: bool = True, dropout: float = 0.0, inplace: bool = False)[source]
Bases: Sequential, Module[(Tensor,), Tensor]
Simple implementation of a multi-layer perceptron.
- Parameters:
input_dim – Input layer dimension.
hiddens – Hidden layer dimensions.
activation_fn – Hidden activation function that immediately follows the linear layer or the norm layer (if one exists).
norm_layer – Optional normalization layer type that immediately follows the linear layer.
bias – Whether to include a bias for each layer in the MLP.
dropout – Optional dropout applied after the activation function.
inplace – Whether activation functions occur in-place.
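Example: a minimal sketch. The assumption that the output feature size equals the last entry of hiddens follows from the Sequential construction but is not stated explicitly above:
import torch
from torch import nn

from rl8.nn.modules.mlp import MLP

mlp = MLP(32, [64, 64], activation_fn="relu", norm_layer=nn.LayerNorm, dropout=0.1)
x = torch.randn(8, 32)
out = mlp(x)  # assumed shape [8, 64], matching the last hidden dimension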
rl8.nn.modules.module module
Typing help for torch.nn.Module.
rl8.nn.modules.perceiver module
Perceiver definitions.
- class rl8.nn.modules.perceiver.PerceiverLayer(embed_dim: int, /, *, num_heads: int = 2, hidden_dim: int = 128, num_layers: int = 2, activation_fn: str = 'relu', attention_dropout: float = 0.0, hidden_dropout: float = 0.0, skip_kind: str = 'cat', share_parameters: bool = False)[source]
Bases: Module[(Tensor, Tensor, None | Tensor, None | Tensor), Tensor]
An implementation of a Perceiver with cross-attention followed by self-attention stacks.
Useful for embedding several, variable-length sequences into a latent array for dimensionality reduction. Allows inputs of different feature sizes to be embedded into a constant size.
- Parameters:
embed_dim – Feature dimension of the latent array and input sequence. Each sequence is expected to be embedded by its own embedder, which could just be a simple linear transform.
num_heads – Number of attention heads in the cross-attention and self-attention modules.
hidden_dim – Number of hidden features in the feedforward networks applied after the attention operations.
activation_fn – Activation function ID.
attention_dropout – Sequence dropout in the attention heads.
hidden_dropout – Feedforward dropout after performing attention.
skip_kind – Kind of residual or skip connection to make between outputs of the multihead attentions and the feedforward modules.
share_parameters – Whether to use the same parameters for the layers in the self-attention stack.
- forward(q: Tensor, kv: Tensor, key_padding_mask: None | Tensor = None, attention_mask: None | Tensor = None) → Tensor [source]
Apply cross-attention to map keys of sequence length K to a query of sequence length Q.
- Parameters:
q – Query with shape [B, Q, E]. Usually the latent array from previous forward passes or perceiver layers.
kv – Keys with shape [B, K, E].
key_padding_mask – Mask with shape [B, K] indicating sequence elements of kv that are PADDED or INVALID values.
attention_mask – Mask with shape [Q, K] that indicates whether elements in Q can attend to elements in K.
- Returns:
Values with shape [B, Q, E].
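Example: a minimal sketch where a short latent array attends over a longer, already embedded input sequence (sizes are illustrative):
import torch

from rl8.nn.modules.perceiver import PerceiverLayer

B, Q, K, E = 4, 8, 32, 64  # batch, latent length, input length, features (illustrative)
layer = PerceiverLayer(E, num_heads=4, num_layers=2)
latent = torch.randn(B, Q, E)  # latent array (e.g., learned parameters or a previous output)
inputs = torch.randn(B, K, E)  # input sequence already embedded to E features
out = layer(latent, inputs)  # [B, Q, E]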
- class rl8.nn.modules.perceiver.PerceiverIOLayer(embed_dim: int, output_seq_dim: int, /, *, num_heads: int = 2, hidden_dim: int = 128, num_layers: int = 2, activation_fn: str = 'relu', attention_dropout: float = 0.0, hidden_dropout: float = 0.0, skip_kind: str = 'cat', share_parameters: bool = False)[source]
Bases: Module[(Tensor, Tensor, None | Tensor, None | Tensor), Tensor]
An implementation of PerceiverIO with cross-attention followed by self-attention stacks followed by cross-attention with a fixed-sized output array.
In addition to the benefits of PerceiverLayer, this module attends the latent array to a fixed output sequence size, effectively applying a weighted average of the sequence along a different dimension. Useful if the latent array needs to be processed into several different-sized sequences for separate outputs.
- Parameters:
embed_dim – Feature dimension of the latent array and input sequence. Each sequence is expected to be embedded by its own embedder, which could just be a simple linear transform.
output_seq_dim – Output sequence size to transform the latent array sequence size to.
num_heads – Number of attention heads in the cross-attention and self-attention modules.
hidden_dim – Number of hidden features in the feedforward networks applied after the attention operations.
activation_fn – Activation function ID.
attention_dropout – Sequence dropout in the attention heads.
hidden_dropout – Feedforward dropout after performing attention.
skip_kind – Kind of residual or skip connection to make between outputs of the multihead attentions and the feedforward modules.
share_parameters – Whether to use the same parameters for the layers in the self-attention stack.
- forward(q: Tensor, kv: Tensor, key_padding_mask: None | Tensor = None, attention_mask: None | Tensor = None) → Tensor [source]
Apply cross-attention to map keys of sequence length K to a query of sequence length Q.
- Parameters:
q – Query with shape [B, Q, E]. Usually the latent array from previous forward passes or perceiver layers.
kv – Keys with shape [B, K, E].
key_padding_mask – Mask with shape [B, K] indicating sequence elements of kv that are PADDED or INVALID values.
attention_mask – Mask with shape [Q, K] that indicates whether elements in Q can attend to elements in K.
- Returns:
Values with shape [B, O, E], where O is the output array sequence size.
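Example: a minimal sketch mapping a latent array of length 8 to an output array of length 16 (sizes are illustrative):
import torch

from rl8.nn.modules.perceiver import PerceiverIOLayer

B, Q, K, E, O = 4, 8, 32, 64, 16  # batch, latent, input, features, output length (illustrative)
layer = PerceiverIOLayer(E, O, num_heads=4)
latent = torch.randn(B, Q, E)
inputs = torch.randn(B, K, E)  # input sequence already embedded to E features
out = layer(latent, inputs)  # [B, O, E]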
rl8.nn.modules.skip module
Skip connection module definitions.
- class rl8.nn.modules.skip.SequentialSkipConnection(embed_dim: int, kind: None | str = 'cat')[source]
Bases: Module[(Tensor, Tensor), Tensor]
Sequential skip connection.
Apply a skip connection to an input and the output of a layer that uses that input.
- Parameters:
embed_dim – Original input feature size.
kind – Type of skip connection to apply. Options include:
“residual” for a standard residual connection (summing outputs)
“cat” for concatenating outputs
None for no skip connection
- kind: None | str
Kind of skip connection. “residual” for a standard residual connection (summing outputs), “cat” for concatenating outputs, and None for no skip connection (reduces to a regular, sequential module).
- append(module: Module, /) → int [source]
Append module to the skip connection. If the skip connection kind is concatenation, then an intermediate layer is also appended to downsample the feature dimension back to the original embedding dimension.
- Parameters:
module – Module to append and apply a skip connection to.
- Returns:
Number of output features from the sequential skip connection.
- forward(x: Tensor, y: Tensor, /) → Tensor [source]
Perform a sequential skip connection, first applying a skip connection to x and y, and then sequentially applying skip connections to the output and the output of the next layer.
- Parameters:
x – Skip connection seed with shape [B, T, ...].
y – Skip connection seed with the same shape as x.
- Returns:
A tensor with shape depending on SequentialSkipConnection.kind.
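Example: a minimal sketch. Whether a plain torch.nn.Linear satisfies the Module type expected by append() is an assumption here:
import torch
from torch import nn

from rl8.nn.modules.skip import SequentialSkipConnection

B, T, E = 4, 10, 32  # batch, sequence length, features (illustrative)
skip = SequentialSkipConnection(E, kind="cat")
# With kind="cat", append() also adds a layer that downsamples back to E features.
out_features = skip.append(nn.Linear(E, E))  # assumed: a plain linear layer is acceptable
x = torch.randn(B, T, E)
y = torch.randn(B, T, E)  # e.g., the output of a layer applied to x
out = skip(x, y)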
Module contents
Custom PyTorch modules.