Modules

Dynamic Self Attention Encoder

class pytorch_wrapper.modules.dynamic_self_attention_encoder.DynamicSelfAttentionEncoder(time_step_size, att_scores_nb=1, att_iterations=2, projection_size=100, projection_activation=<sphinx.ext.autodoc.importer._MockObject object>, attended_representation_activation=<sphinx.ext.autodoc.importer._MockObject object>, is_end_padded=True)

Bases: torch.nn.Module

Dynamic Self Attention Encoder (https://arxiv.org/abs/1808.07383).

Parameters:
  • time_step_size – Size of each time step (the last dimension of the input).
  • att_scores_nb – Number of attended representations.
  • att_iterations – Number of iterations of the dynamic self-attention algorithm.
  • projection_size – Size of the projection layer.
  • projection_activation – Callable that creates the activation of the projection layer.
  • attended_representation_activation – Callable that creates the activation used on the attended representations after each iteration.
  • is_end_padded – Whether to mask at the end.
forward(batch_sequences, batch_sequence_lengths)
Parameters:
  • batch_sequences – 3D Tensor (batch_size, sequence_length, time_step_size).
  • batch_sequence_lengths – 1D Tensor (batch_size) containing the lengths of the sequences.
Returns:

2D Tensor (batch_size, projection_size * att_scores_nb) containing the encodings.
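
A minimal usage sketch (the example sizes are arbitrary; the shapes follow the signature above):

    import torch
    from pytorch_wrapper.modules.dynamic_self_attention_encoder import DynamicSelfAttentionEncoder

    encoder = DynamicSelfAttentionEncoder(time_step_size=64, att_scores_nb=2)
    batch_sequences = torch.randn(8, 20, 64)       # (batch_size, sequence_length, time_step_size)
    lengths = torch.randint(1, 21, (8,))           # real length of each sequence
    encodings = encoder(batch_sequences, lengths)  # (8, projection_size * att_scores_nb) = (8, 200)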

Embedding Layer

class pytorch_wrapper.modules.embedding_layer.EmbeddingLayer(vocab_size, emb_size, trainable, padding_idx=None)

Bases: torch.nn.Module

Embedding Layer.

Parameters:
  • vocab_size – Size of the vocabulary.
  • emb_size – Size of the embeddings.
  • trainable – Whether the embeddings should be altered during training.
  • padding_idx – Index of the vector to be initialized with zeros.
forward(x)
load_embeddings(embeddings)

Loads pre-trained embeddings.

Parameters: embeddings – NumPy array of shape (vocab_size, emb_size) containing the pre-trained embeddings.
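
A minimal sketch of loading pre-trained vectors; the random array is a stand-in for real embeddings, and forward is assumed to take a tensor of token indices (the usual contract for an embedding layer, though not stated above):

    import numpy as np
    import torch
    from pytorch_wrapper.modules.embedding_layer import EmbeddingLayer

    emb_layer = EmbeddingLayer(vocab_size=10000, emb_size=50, trainable=False, padding_idx=0)
    pretrained = np.random.rand(10000, 50).astype('float32')  # stand-in for real pre-trained vectors
    emb_layer.load_embeddings(pretrained)
    token_ids = torch.tensor([[4, 17, 2, 0]])  # (batch_size, sequence_length) token indices (assumed input)
    embeddings = emb_layer(token_ids)          # (1, 4, 50)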

Layer Norm

class pytorch_wrapper.modules.layer_norm.LayerNorm(last_dim_size, eps=1e-06)

Bases: torch.nn.Module

Layer Normalization (https://arxiv.org/pdf/1607.06450.pdf).

Parameters:
  • last_dim_size – Size of last dimension.
  • eps – Small number for numerical stability (avoid division by zero).
forward(x)
Parameters: x – Tensor to be layer-normalized.
Returns: Layer-normalized Tensor.
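
A minimal usage sketch:

    import torch
    from pytorch_wrapper.modules.layer_norm import LayerNorm

    ln = LayerNorm(last_dim_size=128)
    x = torch.randn(8, 20, 128)
    y = ln(x)  # same shape as x, normalized over the last dimension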

MLP

class pytorch_wrapper.modules.mlp.MLP(input_size, input_activation=None, input_dp=None, input_pre_activation_bn=False, input_post_activation_bn=False, input_pre_activation_ln=False, input_post_activation_ln=False, num_hidden_layers=1, hidden_layer_size=128, hidden_layer_bias=True, hidden_layer_init=None, hidden_layer_bias_init=None, hidden_activation=<sphinx.ext.autodoc.importer._MockObject object>, hidden_dp=None, hidden_layer_pre_activation_bn=False, hidden_layer_post_activation_bn=False, hidden_layer_pre_activation_ln=False, hidden_layer_post_activation_ln=False, output_layer_init=None, output_layer_bias_init=None, output_size=1, output_layer_bias=True, output_activation=None, output_dp=None, output_layer_pre_activation_bn=False, output_layer_post_activation_bn=False, output_layer_pre_activation_ln=False, output_layer_post_activation_ln=False)

Bases: torch.nn.Module

Multi Layer Perceptron.

Parameters:
  • input_size – Size of the last dimension of the input.
  • input_activation – Callable that creates the activation used on the input.
  • input_dp – Dropout probability for the input.
  • input_pre_activation_bn – Whether to use batch normalization before the activation of the input layer.
  • input_post_activation_bn – Whether to use batch normalization after the activation of the input layer.
  • input_pre_activation_ln – Whether to use layer normalization before the activation of the input layer.
  • input_post_activation_ln – Whether to use layer normalization after the activation of the input layer.
  • num_hidden_layers – Number of hidden layers.
  • hidden_layer_size – Size of hidden layers. It is also possible to provide a list containing a different size for each hidden layer.
  • hidden_layer_bias – Whether to use bias. It is also possible to provide a list containing a different option for each hidden layer.
  • hidden_layer_init – Callable that initializes inplace the weights of the hidden layers.
  • hidden_layer_bias_init – Callable that initializes inplace the bias of the hidden layers.
  • hidden_activation – Callable that creates the activation used after each hidden layer. It is also possible to provide a list containing num_hidden_layers callables.
  • hidden_dp – Dropout probability for the hidden layers. It is also possible to provide a list containing num_hidden_layers probabilities.
  • hidden_layer_pre_activation_bn – Whether to use batch normalization before the activation of each hidden layer.
  • hidden_layer_post_activation_bn – Whether to use batch normalization after the activation of each hidden layer.
  • hidden_layer_pre_activation_ln – Whether to use layer normalization before the activation of each hidden layer.
  • hidden_layer_post_activation_ln – Whether to use layer normalization after the activation of each hidden layer.
  • output_layer_init – Callable that initializes inplace the weights of the output layer.
  • output_layer_bias_init – Callable that initializes inplace the bias of the output layer.
  • output_size – Output size.
  • output_layer_bias – Whether to use bias.
  • output_activation – Callable that creates the activation used after the output layer.
  • output_dp – Dropout probability for the output layer.
  • output_layer_pre_activation_bn – Whether to use batch normalization before the activation of the output layer.
  • output_layer_post_activation_bn – Whether to use batch normalization after the activation of the output layer.
  • output_layer_pre_activation_ln – Whether to use layer normalization before the activation of the output layer.
  • output_layer_post_activation_ln – Whether to use layer normalization after the activation of the output layer.
forward(x)
Parameters: x – Tensor whose last dimension is of size input_size.
Returns: Tensor with the same shape as x, except that the last dimension is of size output_size.
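
A minimal usage sketch; torch.nn.ReLU stands in for any callable that creates an activation, and the list arguments illustrate the per-hidden-layer options described above:

    import torch
    from pytorch_wrapper.modules.mlp import MLP

    mlp = MLP(
        input_size=64,
        num_hidden_layers=2,
        hidden_layer_size=[128, 64],      # a different size for each hidden layer
        hidden_activation=torch.nn.ReLU,  # callable that creates the activation
        hidden_dp=0.2,                    # dropout probability for the hidden layers
        output_size=10,
    )
    x = torch.randn(32, 64)
    logits = mlp(x)  # (32, 10)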

Multi-Head Attention

class pytorch_wrapper.modules.multi_head_attention.MultiHeadAttention(q_time_step_size, k_time_step_size, v_time_step_size, heads, attention_type='dot', dp=0, is_end_padded=True)

Bases: torch.nn.Module

Multi Head Attention (https://arxiv.org/pdf/1706.03762.pdf).

Parameters:
  • q_time_step_size – Query time step size.
  • k_time_step_size – Key time step size.
  • v_time_step_size – Value time step size.
  • heads – Number of attention heads.
  • attention_type – Attention type [‘dot’, ‘multiplicative’, ‘additive’].
  • dp – Dropout probability.
  • is_end_padded – Whether to mask at the end.
forward(q, k, v, q_sequence_lengths, k_sequence_lengths)
Parameters:
  • q – 3D Tensor (batch_size, q_sequence_length, q_time_step_size) containing the queries.
  • k – 3D Tensor (batch_size, k_sequence_length, k_time_step_size) containing the keys.
  • v – 3D Tensor (batch_size, k_sequence_length, v_time_step_size) containing the values.
  • q_sequence_lengths – 1D Tensor (batch_size) containing the lengths of the query sequences.
  • k_sequence_lengths – 1D Tensor (batch_size) containing the lengths of the key sequences.
Returns:

3D Tensor (batch_size, q_sequence_length, time_step_size).
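
A minimal usage sketch with equal query/key/value time step sizes (an arbitrary choice; they may differ):

    import torch
    from pytorch_wrapper.modules.multi_head_attention import MultiHeadAttention

    mha = MultiHeadAttention(q_time_step_size=64, k_time_step_size=64, v_time_step_size=64, heads=8)
    q = torch.randn(4, 10, 64)  # queries
    k = torch.randn(4, 15, 64)  # keys
    v = torch.randn(4, 15, 64)  # values (same sequence length as the keys)
    q_lengths = torch.full((4,), 10, dtype=torch.long)
    k_lengths = torch.full((4,), 15, dtype=torch.long)
    out = mha(q, k, v, q_lengths, k_lengths)  # (4, 10, 64)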

Residual

class pytorch_wrapper.modules.residual.Residual(module, residual_index=None, model_output_key=None)

Bases: torch.nn.Module

Adds the input of a module to its output.

Parameters:
  • module – The module to wrap.
  • residual_index – The index of the input to be added. Leave None if it is not a multi-input module.
  • model_output_key – The key of the wrapped module's output to be added. Leave None if it is not a multi-output module.
forward(*x)
Parameters: x – The input of the wrapped module.
Returns: The output of the wrapped module added to its input.
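
A minimal sketch wrapping an MLP; the inner module's output size is chosen to match its input size so that the addition is well defined:

    import torch
    from pytorch_wrapper.modules.mlp import MLP
    from pytorch_wrapper.modules.residual import Residual

    inner = MLP(input_size=64, output_size=64)  # output size matches input size
    block = Residual(inner)
    x = torch.randn(8, 64)
    y = block(x)  # inner(x) + x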

Sequence Basic CNN Block

class pytorch_wrapper.modules.sequence_basic_cnn_block.SequenceBasicCNNBlock(time_step_size, kernel_height=3, out_channels=300, activation=<sphinx.ext.autodoc.importer._MockObject object>, dp=0)

Bases: torch.nn.Module

Sequence Basic CNN Block.

Parameters:
  • time_step_size – Time step size.
  • kernel_height – Filter height.
  • out_channels – Number of filters.
  • activation – Callable that creates the activation function.
  • dp – Dropout probability.
forward(batch_sequences)
Parameters: batch_sequences – 3D Tensor (batch_size, sequence_length, time_step_size) containing the sequences.
Returns: 3D Tensor (batch_size, sequence_length, out_channels) containing the encodings.
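
A minimal usage sketch:

    import torch
    from pytorch_wrapper.modules.sequence_basic_cnn_block import SequenceBasicCNNBlock

    block = SequenceBasicCNNBlock(time_step_size=50, kernel_height=3, out_channels=100)
    batch_sequences = torch.randn(8, 30, 50)  # (batch_size, sequence_length, time_step_size)
    out = block(batch_sequences)              # (8, 30, 100)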

Sequence Basic CNN Encoder

class pytorch_wrapper.modules.sequence_basic_cnn_encoder.SequenceBasicCNNEncoder(time_step_size, input_activation=None, kernel_heights=(1, 2, 3, 4, 5), out_channels=300, pre_pooling_activation=<sphinx.ext.autodoc.importer._MockObject object>, pooling_function=<sphinx.ext.autodoc.importer._MockObject object>, post_pooling_activation=None, post_pooling_dp=0)

Bases: torch.nn.Module

Basic CNN Encoder for sequences (https://arxiv.org/abs/1408.5882).

Parameters:
  • time_step_size – Time step size.
  • input_activation – Callable that creates the activation used on the input.
  • kernel_heights – Tuple containing filter heights.
  • out_channels – Number of filters for each filter height.
  • pre_pooling_activation – Callable that creates the activation used before pooling.
  • pooling_function – Callable that performs pooling over the sequence dimension, applied after the pre-pooling activation.
  • post_pooling_activation – Callable that creates the activation used after pooling.
  • post_pooling_dp – Dropout probability after pooling.
forward(batch_sequences)
Parameters: batch_sequences – 3D Tensor (batch_size, sequence_length, time_step_size) containing the sequences.
Returns: 2D Tensor (batch_size, len(kernel_heights) * out_channels) containing the encodings.
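
A minimal usage sketch:

    import torch
    from pytorch_wrapper.modules.sequence_basic_cnn_encoder import SequenceBasicCNNEncoder

    encoder = SequenceBasicCNNEncoder(time_step_size=50, kernel_heights=(2, 3), out_channels=100)
    batch_sequences = torch.randn(8, 30, 50)
    encodings = encoder(batch_sequences)  # (8, len(kernel_heights) * out_channels) = (8, 200)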

Sequence Dense CNN

class pytorch_wrapper.modules.sequence_dense_cnn.SequenceDenseCNN(input_size, projection_layer_size=150, kernel_heights=(3, 5), feature_map_increase=75, cnn_depth=3, output_projection_layer_size=300, activation=<sphinx.ext.autodoc.importer._MockObject object>, dp=0, normalize_output=True)

Bases: torch.nn.Module

Dense CNN for sequences (https://arxiv.org/abs/1808.07383).

Parameters:
  • input_size – Size of each time step of the input sequences.
  • projection_layer_size – Size of the projection layer.
  • kernel_heights – Tuple containing the kernel heights of the filters.
  • feature_map_increase – Number of filters in each convolutional layer (with dense connections, the number of feature maps grows by this amount per layer).
  • cnn_depth – Number of convolutional layers per kernel height.
  • output_projection_layer_size – Size of each output time step (the last dimension of the output).
  • activation – Callable that creates the activation used after each layer.
  • dp – Dropout probability.
  • normalize_output – Whether to perform l2 normalization on the output.
forward(batch_sequences)
Parameters: batch_sequences – 3D Tensor (batch_size, sequence_length, time_step_size).
Returns: 3D Tensor (batch_size, sequence_length, output_projection_layer_size).
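
A minimal usage sketch:

    import torch
    from pytorch_wrapper.modules.sequence_dense_cnn import SequenceDenseCNN

    encoder = SequenceDenseCNN(input_size=50, output_projection_layer_size=128)
    batch_sequences = torch.randn(8, 30, 50)
    out = encoder(batch_sequences)  # (8, 30, 128)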

Sinusoidal Positional Embedding Layer

class pytorch_wrapper.modules.sinusoidal_positional_embedding_layer.SinusoidalPositionalEmbeddingLayer(emb_size, pad_at_end=True, init_max_sentence_length=1024)

Bases: torch.nn.Module

Sinusoidal Positional Embeddings (https://arxiv.org/pdf/1706.03762.pdf).

Parameters:
  • emb_size – Size of the positional embeddings.
  • pad_at_end – Whether to pad at the end.
  • init_max_sentence_length – Initial maximum sentence length.
create_embeddings(num_embeddings)
forward(length_tensor, max_sequence_length)
Parameters:
  • length_tensor – ND Tensor containing the real lengths.
  • max_sequence_length – Int corresponding to the size of the (N+1)th dimension (the maximum sequence length).
Returns:

(N+2)D Tensor with the positional embeddings.
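
A minimal usage sketch with a 1D lengths tensor (N = 1), so the result is 3D:

    import torch
    from pytorch_wrapper.modules.sinusoidal_positional_embedding_layer import SinusoidalPositionalEmbeddingLayer

    pos = SinusoidalPositionalEmbeddingLayer(emb_size=64)
    lengths = torch.tensor([10, 7, 12])  # 1D tensor of real lengths
    pos_embs = pos(lengths, 12)          # (3, 12, 64) positional embeddings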

Softmax Attention Encoder

class pytorch_wrapper.modules.softmax_attention_encoder.SoftmaxAttentionEncoder(attention_mlp, is_end_padded=True)

Bases: torch.nn.Module

Encodes a sequence using context-based soft-max attention.

Parameters:
  • attention_mlp – MLP object used to generate unnormalized attention score(s). If the last dimension of the tensor returned by the MLP is larger than 1 then multi-attention is applied.
  • is_end_padded – Whether to mask at the end.
forward(batch_sequences, batch_context_vector, batch_sequence_lengths)
Parameters:
  • batch_sequences – 3D Tensor (batch_size, sequence_length, time_step_size).
  • batch_context_vector – 2D Tensor (batch_size, context_vector_size).
  • batch_sequence_lengths – 1D Tensor (batch_size) containing the lengths of the sequences.
Returns:

Dict with two entries:
  • `output` – 2D Tensor (batch_size, time_step_size) containing the encodings, or a 3D Tensor (batch_size, nb_attentions, time_step_size) in the case of multi-attention.
  • `att_scores` – 2D Tensor (batch_size, sequence_length) containing the attention scores, or a 3D Tensor (batch_size, sequence_length, nb_attentions) in the case of multi-attention.
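
A minimal usage sketch. It assumes the attention MLP receives each time step concatenated with the context vector (hence input_size = time_step_size + context_vector_size); check the implementation for the exact contract:

    import torch
    from pytorch_wrapper.modules.mlp import MLP
    from pytorch_wrapper.modules.softmax_attention_encoder import SoftmaxAttentionEncoder

    attention_mlp = MLP(input_size=64 + 32, output_size=1)  # assumed: scores [time step; context]
    encoder = SoftmaxAttentionEncoder(attention_mlp)
    batch_sequences = torch.randn(8, 20, 64)
    batch_context_vector = torch.randn(8, 32)
    lengths = torch.randint(1, 21, (8,))
    res = encoder(batch_sequences, batch_context_vector, lengths)
    encodings = res['output']       # (8, 64)
    att_scores = res['att_scores']  # (8, 20)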

Softmax Self Attention Encoder

class pytorch_wrapper.modules.softmax_self_attention_encoder.SoftmaxSelfAttentionEncoder(attention_mlp, is_end_padded=True)

Bases: torch.nn.Module

Encodes a sequence using soft-max self-attention.

Parameters:
  • attention_mlp – MLP object used to generate unnormalized attention score(s). If the last dimension of the tensor returned by the MLP is larger than 1 then multi-attention is applied.
  • is_end_padded – Whether to mask at the end.
forward(batch_sequences, batch_sequence_lengths)
Parameters:
  • batch_sequences – 3D Tensor (batch_size, sequence_length, time_step_size).
  • batch_sequence_lengths – 1D Tensor (batch_size) containing the lengths of the sequences.
Returns:

Dict with two entries:
  • `output` – 2D Tensor (batch_size, time_step_size) containing the encodings, or a 3D Tensor (batch_size, nb_attentions, time_step_size) in the case of multi-attention.
  • `att_scores` – 2D Tensor (batch_size, sequence_length) containing the attention scores, or a 3D Tensor (batch_size, sequence_length, nb_attentions) in the case of multi-attention.
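
A minimal usage sketch with multi-attention (the MLP's output size is larger than 1):

    import torch
    from pytorch_wrapper.modules.mlp import MLP
    from pytorch_wrapper.modules.softmax_self_attention_encoder import SoftmaxSelfAttentionEncoder

    attention_mlp = MLP(input_size=64, output_size=2)  # 2 unnormalized scores per time step
    encoder = SoftmaxSelfAttentionEncoder(attention_mlp)
    batch_sequences = torch.randn(8, 20, 64)
    lengths = torch.randint(1, 21, (8,))
    res = encoder(batch_sequences, lengths)
    encodings = res['output']       # (8, 2, 64), nb_attentions = 2
    att_scores = res['att_scores']  # (8, 20, 2)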

Transformer Encoder

class pytorch_wrapper.modules.transformer_encoder.TransformerEncoder(time_step_size, heads, depth, dp=0, use_positional_embeddings=True, is_end_padded=True)

Bases: torch.nn.Module

Transformer Encoder (https://arxiv.org/pdf/1706.03762.pdf).

Parameters:
  • time_step_size – Time step size.
  • heads – Number of attention heads.
  • depth – Number of transformer blocks.
  • dp – Dropout probability.
  • use_positional_embeddings – Whether to use positional embeddings.
  • is_end_padded – Whether to mask at the end.
forward(batch_sequences, batch_sequence_lengths)
Parameters:
  • batch_sequences – 3D Tensor (batch_size, sequence_length, time_step_size).
  • batch_sequence_lengths – 1D Tensor (batch_size) containing the lengths of the sequences.
Returns:

3D Tensor (batch_size, sequence_length, time_step_size).
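
A minimal usage sketch (time_step_size is chosen divisible by heads, as is standard for multi-head attention):

    import torch
    from pytorch_wrapper.modules.transformer_encoder import TransformerEncoder

    encoder = TransformerEncoder(time_step_size=64, heads=8, depth=2, dp=0.1)
    batch_sequences = torch.randn(8, 20, 64)
    lengths = torch.randint(1, 21, (8,))
    out = encoder(batch_sequences, lengths)  # (8, 20, 64)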

Transformer Encoder Block

class pytorch_wrapper.modules.transformer_encoder_block.TransformerEncoderBlock(time_step_size, heads, out_mlp, dp=0, is_end_padded=True)

Bases: torch.nn.Module

Transformer Encoder Block (https://arxiv.org/pdf/1706.03762.pdf).

Parameters:
  • time_step_size – Time step size.
  • heads – Number of attention heads.
  • out_mlp – MLP applied after the attended sequence is generated.
  • dp – Dropout probability.
  • is_end_padded – Whether to mask at the end.
forward(batch_sequences, batch_sequence_lengths)
Parameters:
  • batch_sequences – 3D Tensor (batch_size, sequence_length, time_step_size).
  • batch_sequence_lengths – 1D Tensor (batch_size) containing the lengths of the sequences.
Returns:

3D Tensor (batch_size, sequence_length, time_step_size).
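
A minimal sketch; the out_mlp is assumed to map each time step back to time_step_size so that the block's output keeps the documented shape:

    import torch
    from pytorch_wrapper.modules.mlp import MLP
    from pytorch_wrapper.modules.transformer_encoder_block import TransformerEncoderBlock

    out_mlp = MLP(input_size=64, hidden_layer_size=256, output_size=64)  # position-wise MLP (assumed sizes)
    block = TransformerEncoderBlock(time_step_size=64, heads=8, out_mlp=out_mlp)
    batch_sequences = torch.randn(8, 20, 64)
    lengths = torch.randint(1, 21, (8,))
    out = block(batch_sequences, lengths)  # (8, 20, 64)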