pytext.models.embeddings package
Submodules
pytext.models.embeddings.char_embedding module
-
class pytext.models.embeddings.char_embedding.CharacterEmbedding(num_embeddings: int, embed_dim: int, out_channels: int, kernel_sizes: List[int], highway_layers: int, projection_dim: Optional[int], *args, **kwargs)
Bases: pytext.models.embeddings.embedding_base.EmbeddingBase
Module for character-aware CNN embeddings of tokens. It applies convolutions followed by max-pooling over character embeddings to obtain an embedding vector for each token.
Implementation is loosely based on https://arxiv.org/abs/1508.06615.
Parameters: - num_embeddings (int) – Total number of characters (vocabulary size).
- embed_dim (int) – Size of character embeddings to be passed to convolutions.
- out_channels (int) – Number of output channels of each convolution layer.
- kernel_sizes (List[int]) – Kernel sizes (widths) of the convolution layers; one convolution is applied per kernel size.
- highway_layers (int) – Number of highway layers applied to the pooled output.
- projection_dim (Optional[int]) – If specified, the size of the output token embedding, obtained via a linear projection of the convolution output.
-
char_embed
Character embedding table.
Type: nn.Embedding
-
convs
Convolution layers that operate on character embeddings.
Type: nn.ModuleList
-
highway_layers
Highway layers on top of convolution output.
Type: nn.Module
-
projection
Final linear layer to token embedding.
Type: nn.Module
-
embedding_dim
Dimension of the final token embedding produced.
Type: int
-
forward
(chars: torch.Tensor) → torch.Tensor
Given a batch of sentences in which tokens are broken into character ids, produce token embedding vectors for each sentence in the batch.
Parameters: - chars (torch.Tensor) – Batch of sentences where each token is broken into characters. Dimension: batch size X maximum sentence length X maximum word length.
Returns: Embedded batch of sentences. Dimension: batch size X maximum sentence length X token embedding size. Token embedding size = out_channels * len(self.convs), or projection_dim if a projection is specified.
Return type: torch.Tensor
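The convolution-and-pooling step described above can be sketched in plain PyTorch as follows. This is a minimal illustration, not the pytext implementation: the function name char_cnn_embed is hypothetical, the ReLU after each convolution is an assumption, and the optional highway layers and projection are omitted, so the output size is out_channels * len(convs).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def char_cnn_embed(chars: torch.Tensor, char_embed: nn.Embedding, convs: nn.ModuleList) -> torch.Tensor:
    """Hypothetical sketch: [bsz, max_sent_len, max_word_len] ids -> [bsz, max_sent_len, out_channels * len(convs)]."""
    bsz, max_sent_len, max_word_len = chars.shape
    # Treat every token as an independent character sequence.
    flat = chars.view(bsz * max_sent_len, max_word_len)
    # Conv1d expects [N, channels, length], so move the embedding dim to the channel axis.
    x = char_embed(flat).transpose(1, 2)
    # One convolution per kernel size, then max-pool over character positions.
    pooled = [F.relu(conv(x)).max(dim=2).values for conv in convs]
    # Concatenate the pooled features and restore the sentence dimension.
    return torch.cat(pooled, dim=1).view(bsz, max_sent_len, -1)


# Usage: 100 characters, 16-dim character embeddings, 8 output channels, kernel sizes 2 and 3.
char_embed = nn.Embedding(100, 16)
convs = nn.ModuleList([nn.Conv1d(16, 8, kernel_size=k) for k in (2, 3)])
chars = torch.randint(0, 100, (2, 5, 7))  # 2 sentences, 5 tokens, 7 characters per token
out = char_cnn_embed(chars, char_embed, convs)
print(out.shape)  # torch.Size([2, 5, 16])
```

Note that the maximum word length must be at least the largest kernel size, otherwise the convolution has no valid positions to pool over.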
-
classmethod
from_config
(config: pytext.config.field_config.CharFeatConfig, metadata: Optional[pytext.fields.field.FieldMeta] = None, vocab_size: Optional[int] = None)
Factory method to construct an instance of CharacterEmbedding from the module’s config object and the field’s metadata object.
Parameters: - config (CharFeatConfig) – Configuration object specifying all the parameters of CharacterEmbedding.
- metadata (FieldMeta) – Object containing this field’s metadata.
Returns: An instance of CharacterEmbedding.
Return type: CharacterEmbedding
-
class pytext.models.embeddings.char_embedding.Highway(input_dim: int, num_layers: int = 1)
Bases: torch.nn.modules.module.Module
A Highway layer (https://arxiv.org/abs/1505.00387), adapted from the AllenNLP implementation.
-
forward
(x: torch.Tensor)
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
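A highway layer mixes a nonlinear transform of its input with the input itself through a learned gate. The sketch below follows the standard formulation used in the AllenNLP implementation as we understand it (one linear layer per highway layer producing both the transform and the gate, ReLU activation, gate bias initialized to 1). The class name HighwaySketch is hypothetical and this is not the pytext source.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HighwaySketch(nn.Module):
    """Hypothetical highway stack: y = g * x + (1 - g) * relu(H(x)), with g = sigmoid(T(x))."""

    def __init__(self, input_dim: int, num_layers: int = 1):
        super().__init__()
        # Each Linear produces [H(x), T(x)] side by side, hence 2 * input_dim outputs.
        self.layers = nn.ModuleList(nn.Linear(input_dim, 2 * input_dim) for _ in range(num_layers))
        with torch.no_grad():
            for layer in self.layers:
                # A positive gate bias makes each layer initially favor carrying x through.
                layer.bias[input_dim:].fill_(1.0)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            nonlinear, gate = layer(x).chunk(2, dim=-1)
            gate = torch.sigmoid(gate)
            x = gate * x + (1 - gate) * F.relu(nonlinear)
        return x


highway = HighwaySketch(input_dim=16, num_layers=2)
x = torch.randn(10, 16)
print(highway(x).shape)  # torch.Size([10, 16])
```

Because the output has the same size as the input, highway layers can be stacked on the pooled character features without changing the token embedding size.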
-
pytext.models.embeddings.contextual_token_embedding module
-
class pytext.models.embeddings.contextual_token_embedding.ContextualTokenEmbedding(embed_dim: int, downsample_dim: Optional[int] = None)
Bases: pytext.models.embeddings.embedding_base.EmbeddingBase
Module for providing token embeddings from a pretrained model.
-
forward
(embedding: torch.Tensor) → torch.Tensor
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
-
pytext.models.embeddings.dict_embedding module
-
class pytext.models.embeddings.dict_embedding.DictEmbedding(num_embeddings: int, embed_dim: int, pooling_type: pytext.config.module_config.PoolingType, pad_index: int = 1, unk_index: int = 0, mobile: bool = False)
Bases: pytext.models.embeddings.embedding_base.EmbeddingBase
Module for dictionary feature embeddings for tokens. Dictionary features are also known as gazetteer features. These are per-token discrete features that the module learns embeddings for. Example: for the utterance "Order coffee from Starbucks", the dictionary features could be
[ {"tokenIdx": 1, "features": {"drink/beverage": 0.8, "music/song": 0.2}}, {"tokenIdx": 3, "features": {"store/coffee_shop": 1.0}} ]
Thus, a given token can have more than one dictionary feature, each of which has a confidence score. The dictionary embeddings of a token are weighted by these scores and then combined by a pooling operation, so that the module produces one embedding vector per token.
Parameters: - num_embeddings (int) – Total number of dictionary features (vocabulary size).
- embed_dim (int) – Size of embedding vector.
- pooling_type (PoolingType) – Type of pooling for combining the dictionary feature embeddings.
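To make the expected inputs concrete, the sketch below flattens the example annotation for "Order coffee from Starbucks" into the per-sentence feats, weights and lengths lists whose shapes are documented for forward. The feature vocabulary, the helper name flatten_dict_feats and the padding convention (tokens without features get length 0 and are padded with pad_index and weight 0.0) are assumptions for illustration; pytext's tensorizers may encode this differently.

```python
from typing import Dict, List, Tuple


def flatten_dict_feats(
    num_tokens: int,
    annotations: List[dict],
    vocab: Dict[str, int],
    pad_index: int = 1,
) -> Tuple[List[int], List[float], List[int]]:
    """Hypothetical helper: one sentence -> feats/weights of length num_tokens * max_feat_per_token."""
    by_token = {a["tokenIdx"]: a["features"] for a in annotations}
    max_feats = max([len(f) for f in by_token.values()] + [1])
    feats, weights, lengths = [], [], []
    for idx in range(num_tokens):
        features = by_token.get(idx, {})
        ids = [vocab[name] for name in features]
        scores = list(features.values())
        pad = max_feats - len(ids)
        feats.extend(ids + [pad_index] * pad)  # pad feature ids
        weights.extend(scores + [0.0] * pad)   # padded slots carry zero weight
        lengths.append(len(ids))               # number of real features for this token
    return feats, weights, lengths


annotations = [
    {"tokenIdx": 1, "features": {"drink/beverage": 0.8, "music/song": 0.2}},
    {"tokenIdx": 3, "features": {"store/coffee_shop": 1.0}},
]
vocab = {"drink/beverage": 2, "music/song": 3, "store/coffee_shop": 4}  # 0 = unk, 1 = pad
feats, weights, lengths = flatten_dict_feats(4, annotations, vocab)
print(feats)    # [1, 1, 2, 3, 1, 1, 4, 1]
print(weights)  # [0.0, 0.0, 0.8, 0.2, 0.0, 0.0, 1.0, 0.0]
print(lengths)  # [0, 2, 0, 1]
```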
-
pooling_type
Type of pooling for combining the dictionary feature embeddings.
Type: PoolingType
-
find_and_replace
(tensor: torch.Tensor, find_val: int, replace_val: int) → torch.Tensor
Replaces every occurrence of find_val in tensor with replace_val. torch.where is not supported for mobile ONNX export, so this workaround provides a mobile-exportable equivalent of torch.where, at a higher computational cost.
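One way to express this replacement without torch.where is plain elementwise arithmetic on a mask, as sketched below. This illustrates the idea and is not necessarily pytext's exact code; the function name find_and_replace_no_where is hypothetical.

```python
import torch


def find_and_replace_no_where(tensor: torch.Tensor, find_val: int, replace_val: int) -> torch.Tensor:
    """Replace every element equal to find_val with replace_val using only arithmetic ops."""
    mask = (tensor == find_val).to(tensor.dtype)  # 1 where the value matches, 0 elsewhere
    return tensor * (1 - mask) + replace_val * mask


ids = torch.tensor([[0, 5, 0, 7]])
print(find_and_replace_no_where(ids, find_val=0, replace_val=1))  # tensor([[1, 5, 1, 7]])
# Same result as the torch.where formulation it stands in for:
print(torch.where(ids == 0, torch.ones_like(ids), ids))           # tensor([[1, 5, 1, 7]])
```

The extra multiply and add per element is the "higher computational cost" traded for exportability.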
-
forward
(feats: torch.Tensor, weights: torch.Tensor, lengths: torch.Tensor) → torch.Tensor
Given a batch of sentences containing dictionary feature ids per token, produce token embedding vectors for each sentence in the batch.
Parameters: - feats (torch.Tensor) – Batch of sentences with dictionary feature ids. shape: [bsz, seq_len * max_feat_per_token]
- weights (torch.Tensor) – Batch of sentences with dictionary feature weights for the dictionary features. shape: [bsz, seq_len * max_feat_per_token]
- lengths (torch.Tensor) – Batch of sentences with the number of dictionary features per token. shape: [bsz, seq_len]
Returns: Embedded batch of sentences. Dimension: batch size X maximum sentence length X token embedding size. Token embedding size = embed_dim passed to the constructor.
Return type: torch.Tensor
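For the mean-pooling case, the computation can be sketched as: look up each feature id, scale it by its weight, regroup the flattened feature axis per token, and divide the weighted sum by the number of features per token. The divisor, the clamp that avoids dividing by zero for featureless tokens, and the function name dict_embed_mean are assumptions of this sketch; other PoolingType values (such as max pooling) would replace the final reduction.

```python
import torch
import torch.nn as nn


def dict_embed_mean(emb: nn.Embedding, feats: torch.Tensor, weights: torch.Tensor, lengths: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch: [bsz, seq_len * max_feat] ids/weights -> [bsz, seq_len, embed_dim]."""
    bsz, seq_len = lengths.shape
    weighted = emb(feats) * weights.unsqueeze(-1)                  # [bsz, seq_len * max_feat, dim]
    weighted = weighted.view(bsz, seq_len, -1, emb.embedding_dim)  # [bsz, seq_len, max_feat, dim]
    denom = lengths.clamp(min=1).unsqueeze(-1).to(weighted.dtype)  # avoid dividing by zero
    return weighted.sum(dim=2) / denom


# Padded inputs for "Order coffee from Starbucks" (feature ids 2-4, pad id 1).
emb = nn.Embedding(5, 4)
feats = torch.tensor([[1, 1, 2, 3, 1, 1, 4, 1]])
weights = torch.tensor([[0.0, 0.0, 0.8, 0.2, 0.0, 0.0, 1.0, 0.0]])
lengths = torch.tensor([[0, 2, 0, 1]])
out = dict_embed_mean(emb, feats, weights, lengths)
print(out.shape)  # torch.Size([1, 4, 4])
```

Padded slots contribute nothing because their weight is zero, so tokens without dictionary features end up with an all-zero embedding in this sketch.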
-
classmethod
from_config
(config: pytext.config.field_config.DictFeatConfig, metadata: Optional[pytext.fields.field.FieldMeta] = None, labels: Optional[pytext.data.utils.Vocabulary] = None, tensorizer: Optional[pytext.data.tensorizers.Tensorizer] = None)
Factory method to construct an instance of DictEmbedding from the module’s config object and the field’s metadata object.
Parameters: - config (DictFeatConfig) – Configuration object specifying all the parameters of DictEmbedding.
- metadata (FieldMeta) – Object containing this field’s metadata.
Returns: An instance of DictEmbedding.
Return type: DictEmbedding
pytext.models.embeddings.embedding_base module
-
class pytext.models.embeddings.embedding_base.EmbeddingBase(embedding_dim: int)
Bases: pytext.models.module.Module
Base class for token-level embedding modules.
Parameters: embedding_dim (int) – Size of embedding vector.
-
num_emb_modules
Number of ways to embed a token.
Type: int
-
embedding_dim
Size of embedding vector.
Type: int
-
pytext.models.embeddings.embedding_list module
-
class pytext.models.embeddings.embedding_list.EmbeddingList(embeddings: Iterable[pytext.models.embeddings.embedding_base.EmbeddingBase], concat: bool)
Bases: pytext.models.embeddings.embedding_base.EmbeddingBase, torch.nn.modules.container.ModuleList
There is more than one way to embed a token. This module holds a list of sub-embeddings and either concatenates their embedding tensors into a single Tensor or returns them as a tuple of Tensors that downstream modules can consume.
Parameters: - embeddings (Iterable[EmbeddingBase]) – A sequence of embedding modules to embed a token.
- concat (bool) – Whether to concatenate the embedding vectors emitted from the embeddings modules.
-
num_emb_modules
Number of flattened embeddings in embeddings; e.g. ((e1, e2), e3) has 3 in total.
Type: int
-
input_start_indices
List of indices of the sub-embeddings in the embedding list.
Type: List[int]
-
concat
Whether to concatenate the embedding vectors emitted from the embeddings modules.
Type: bool
-
embedding_dim
Total embedding size: a single int or a tuple of ints, depending on the concat setting.
-
forward
(*emb_input) → Union[torch.Tensor, Tuple[torch.Tensor]]
Get embeddings from all sub-embeddings and either concatenate them into one Tensor or return them in a tuple.
Parameters: *emb_input – Sequence of token-level embedding inputs to combine. The inputs should match the configured embeddings (one input per flattened embedding module); each of them is either a Tensor or a tuple of Tensors.
Returns: If concat is True, a Tensor obtained by concatenating all embeddings; otherwise, all embeddings are returned in a tuple.
Return type: Union[torch.Tensor, Tuple[torch.Tensor]]
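The bookkeeping behind num_emb_modules and input_start_indices can be sketched in plain Python: each child consumes as many consecutive positional inputs as it has flattened embedding modules, so the list's own num_emb_modules is the sum over its children. The helper split_emb_inputs and the string placeholders standing in for tensors are hypothetical; this illustrates the documented ((e1, e2), e3) example as we read the attribute descriptions, rather than reproducing pytext's code.

```python
from typing import List, Sequence, Tuple


def split_emb_inputs(child_num_modules: Sequence[int], emb_input: Sequence) -> Tuple[List[int], List[tuple]]:
    """Hypothetical helper: assign consecutive inputs to each child embedding."""
    expected = sum(child_num_modules)  # the list's flattened num_emb_modules
    if len(emb_input) != expected:
        raise ValueError(f"expected {expected} inputs, got {len(emb_input)}")
    start_indices, groups, start = [], [], 0
    for n in child_num_modules:
        start_indices.append(start)  # where this child's inputs begin
        groups.append(tuple(emb_input[start:start + n]))
        start += n
    return start_indices, groups


# ((e1, e2), e3): the nested list holds 2 modules and e3 holds 1, so 3 inputs in total.
starts, groups = split_emb_inputs([2, 1], ["word_ids", "char_ids", "dict_feats"])
print(starts)  # [0, 2]
print(groups)  # [('word_ids', 'char_ids'), ('dict_feats',)]
```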
pytext.models.embeddings.mlp_embedding module
-
class pytext.models.embeddings.mlp_embedding.MLPEmbedding(embedding_dim: int = 300, embeddings_weight: Optional[torch.Tensor] = None, init_range: Optional[List[int]] = None, init_std: Optional[float] = None, mlp_layer_dims: List[int] = ())
Bases: pytext.models.embeddings.embedding_base.EmbeddingBase
An MLP embedding wrapper module around torch.nn.Embedding to add transformations for float tensors.
Parameters: - embedding_dim (int) – Size of embedding vector.
- embeddings_weight (torch.Tensor) – Pretrained weights to initialize the embedding table with.
- init_range (List[int]) – Range of uniform distribution to initialize the weights with if embeddings_weight is None.
- mlp_layer_dims (List[int]) – List of layer dimensions (if any) to add on top of the embedding lookup.
-
forward
(input)
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
-
classmethod
from_config
(config: pytext.config.field_config.MLPFeatConfig, metadata: Optional[pytext.fields.field.FieldMeta] = None, tensorizer: Optional[pytext.data.tensorizers.Tensorizer] = None, init_from_saved_state: Optional[bool] = False)
Factory method to construct an instance of MLPEmbedding from the module’s config object and the field’s metadata object.
Parameters: - config (MLPFeatConfig) – Configuration object specifying all the parameters of MLPEmbedding.
- metadata (FieldMeta) – Object containing this field’s metadata.
Returns: An instance of MLPEmbedding.
Return type: MLPEmbedding
pytext.models.embeddings.scriptable_embedding_list module
-
class pytext.models.embeddings.scriptable_embedding_list.ScriptableEmbeddingList(embeddings: Iterable[pytext.models.embeddings.embedding_base.EmbeddingBase])
Bases: pytext.models.embeddings.embedding_base.EmbeddingBase
This class is a TorchScript-friendly version of pytext.models.embeddings.EmbeddingList. The main differences are that it requires input arguments to be passed in as a list of Tensors, since TorchScript does not allow variable arguments, and that it only supports concat mode, since TorchScript does not support return-type variance.
-
class Wrapper1(embedding: pytext.models.embeddings.embedding_base.EmbeddingBase)
Bases: torch.nn.modules.module.Module
-
forward
(xs: List[torch.Tensor])
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
-
-
class Wrapper3(embedding: pytext.models.embeddings.embedding_base.EmbeddingBase)
Bases: torch.nn.modules.module.Module
-
forward
(xs: List[torch.Tensor])
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
-
-
forward
(emb_input: List[List[torch.Tensor]]) → torch.Tensor
Get embeddings from all sub-embeddings and concatenate them into one Tensor.
Parameters: emb_input (List[List[torch.Tensor]]) – Sequence of token-level embedding inputs to combine. The inputs should match the configured embeddings; each of them is a List of Tensors.
Returns: A Tensor obtained by concatenating all embeddings.
Return type: torch.Tensor
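Why a List[List[torch.Tensor]] argument helps can be seen in a small TorchScript example. In the sketch below, hypothetical wrappers adapt a one-input and a three-input embedding to a common forward(xs: List[Tensor]) signature, and a concat-only container scripts cleanly. The names Unpack1, Unpack3, WeightedLookup and ConcatEmbeddings are illustrative; reading Wrapper1 and Wrapper3 as adapters for one- and three-input embeddings is an assumption based on their names, not something this page states.

```python
from typing import List

import torch
import torch.nn as nn


class Unpack1(nn.Module):
    """Adapts an embedding whose forward takes one tensor."""

    def __init__(self, embedding: nn.Module):
        super().__init__()
        self.embedding = embedding

    def forward(self, xs: List[torch.Tensor]) -> torch.Tensor:
        return self.embedding(xs[0])


class WeightedLookup(nn.Module):
    """Toy three-input embedding: (ids, weights, lengths) -> scaled lookups."""

    def __init__(self, num: int, dim: int):
        super().__init__()
        self.emb = nn.Embedding(num, dim)

    def forward(self, ids: torch.Tensor, weights: torch.Tensor, lengths: torch.Tensor) -> torch.Tensor:
        scale = weights / lengths.clamp(min=1).to(weights.dtype)
        return self.emb(ids) * scale.unsqueeze(-1)


class Unpack3(nn.Module):
    """Adapts an embedding whose forward takes three tensors."""

    def __init__(self, embedding: nn.Module):
        super().__init__()
        self.embedding = embedding

    def forward(self, xs: List[torch.Tensor]) -> torch.Tensor:
        return self.embedding(xs[0], xs[1], xs[2])


class ConcatEmbeddings(nn.Module):
    """Concat-only container: one List[Tensor] of inputs per wrapped embedding."""

    def __init__(self, wrappers: List[nn.Module]):
        super().__init__()
        self.wrappers = nn.ModuleList(wrappers)

    def forward(self, emb_input: List[List[torch.Tensor]]) -> torch.Tensor:
        outputs: List[torch.Tensor] = []
        i = 0
        for wrapper in self.wrappers:  # TorchScript unrolls loops over a ModuleList
            outputs.append(wrapper(emb_input[i]))
            i += 1
        return torch.cat(outputs, dim=-1)


model = torch.jit.script(
    ConcatEmbeddings([Unpack1(nn.Embedding(10, 4)), Unpack3(WeightedLookup(6, 3))])
)
word_ids = torch.tensor([[1, 2, 3]])
dict_inputs = [torch.tensor([[0, 1, 2]]), torch.tensor([[1.0, 0.5, 0.0]]), torch.tensor([[1, 2, 0]])]
out = model([[word_ids], dict_inputs])
print(out.shape)  # torch.Size([1, 3, 7])
```

Because every wrapper exposes the same signature and the container always returns a single concatenated Tensor, the scripted module has one fixed input type and one fixed return type.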
-
pytext.models.embeddings.word_embedding module
-
class pytext.models.embeddings.word_embedding.WordEmbedding(num_embeddings: int, embedding_dim: int = 300, embeddings_weight: Optional[torch.Tensor] = None, init_range: Optional[List[int]] = None, init_std: Optional[float] = None, unk_token_idx: int = 0, mlp_layer_dims: List[int] = (), padding_idx: Optional[int] = None, vocab: Optional[List[str]] = None)
Bases: pytext.models.embeddings.embedding_base.EmbeddingBase
A word embedding wrapper module around torch.nn.Embedding with options to initialize the word embedding weights and add MLP layers acting on each word.
Note: Embedding weights for UNK token are always initialized to zeros.
Parameters: - num_embeddings (int) – Total number of words/tokens (vocabulary size).
- embedding_dim (int) – Size of embedding vector.
- embeddings_weight (torch.Tensor) – Pretrained weights to initialize the embedding table with.
- init_range (List[int]) – Range of uniform distribution to initialize the weights with if embeddings_weight is None.
- unk_token_idx (int) – Index of UNK token in the word vocabulary.
- mlp_layer_dims (List[int]) – List of layer dimensions (if any) to add on top of the embedding lookup.
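The initialization options listed above can be sketched with plain torch.nn modules: copy pretrained weights if given, otherwise draw from a uniform init_range, zero the UNK row as the note above states, and stack optional MLP layers on the lookup. The function name build_word_embedding and the ReLU after each MLP layer are assumptions of this sketch, not pytext's API; init_std, padding_idx and vocab are omitted.

```python
from typing import Optional, Sequence

import torch
import torch.nn as nn


def build_word_embedding(
    num_embeddings: int,
    embedding_dim: int = 300,
    embeddings_weight: Optional[torch.Tensor] = None,
    init_range: Optional[Sequence[float]] = None,
    unk_token_idx: int = 0,
    mlp_layer_dims: Sequence[int] = (),
) -> nn.Sequential:
    """Hypothetical sketch of an embedding lookup with optional init and MLP layers."""
    emb = nn.Embedding(num_embeddings, embedding_dim)
    with torch.no_grad():
        if embeddings_weight is not None:
            emb.weight.copy_(embeddings_weight)  # pretrained table
        elif init_range is not None:
            nn.init.uniform_(emb.weight, init_range[0], init_range[1])
        emb.weight[unk_token_idx].zero_()  # UNK row always starts at zero
    layers = [emb]
    in_dim = embedding_dim
    for out_dim in mlp_layer_dims:  # optional per-token MLP on top of the lookup
        layers += [nn.Linear(in_dim, out_dim), nn.ReLU()]
        in_dim = out_dim
    return nn.Sequential(*layers)


model = build_word_embedding(10, embedding_dim=4, init_range=[-0.1, 0.1], mlp_layer_dims=[3])
tokens = torch.tensor([[0, 5, 7]])  # index 0 is UNK here
print(model(tokens).shape)  # torch.Size([1, 3, 3])
```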
-
forward
(input)
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
-
classmethod
from_config
(config: pytext.config.field_config.WordFeatConfig, metadata: Optional[pytext.fields.field.FieldMeta] = None, tensorizer: Optional[pytext.data.tensorizers.Tensorizer] = None, init_from_saved_state: Optional[bool] = False)
Factory method to construct an instance of WordEmbedding from the module’s config object and the field’s metadata object.
Parameters: - config (WordFeatConfig) – Configuration object specifying all the parameters of WordEmbedding.
- metadata (FieldMeta) – Object containing this field’s metadata.
Returns: An instance of WordEmbedding.
Return type: WordEmbedding
pytext.models.embeddings.word_seq_embedding module
-
class pytext.models.embeddings.word_seq_embedding.WordSeqEmbedding(lstm_config: pytext.models.representations.bilstm.BiLSTM.Config, num_embeddings: int, word_embed_dim: int = 300, embeddings_weight: Optional[torch.Tensor] = None, init_range: Optional[List[int]] = None, init_std: Optional[float] = None, unk_token_idx: int = 0, padding_idx: Optional[int] = None, vocab: Optional[List[str]] = None)
Bases: pytext.models.embeddings.embedding_base.EmbeddingBase
An embedding module that represents a sequence of sentences, producing one embedding vector per sentence.
Parameters: - lstm_config (BiLSTM.Config) – Configuration of the LSTM layer.
- num_embeddings (int) – Total number of words/tokens (vocabulary size).
- word_embed_dim (int) – Size of the word embedding vector.
- embeddings_weight (torch.Tensor) – Pretrained weights to initialize the embedding table with.
- init_range (List[int]) – Range of uniform distribution to initialize the weights with if embeddings_weight is None.
- unk_token_idx (int) – Index of UNK token in the word vocabulary.
-
forward
(seq_token_idx, seq_token_count)
Parameters: - seq_token_idx – Token ids, shape [batch_size * max_seq_len * max_token_count]
- seq_token_count – Number of tokens in each sentence, shape [batch_size * max_seq_len]
Returns: The embedding, of shape (batch_size * max_seq_len * output_dim)
Return type: torch.Tensor
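A rough sketch of this computation: embed the tokens of every sentence, run a bidirectional LSTM over each sentence using seq_token_count as its length, and use the final hidden states as that sentence's embedding. Using the final hidden states, the single-layer bidirectional LSTM, and the class name WordSeqEmbeddingSketch are assumptions for illustration; the real module is configured through lstm_config.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence


class WordSeqEmbeddingSketch(nn.Module):
    """Hypothetical: [bsz, max_seq_len, max_token_count] ids -> [bsz, max_seq_len, 2 * hidden_dim]."""

    def __init__(self, num_embeddings: int, word_embed_dim: int, hidden_dim: int):
        super().__init__()
        self.word_embedding = nn.Embedding(num_embeddings, word_embed_dim)
        self.lstm = nn.LSTM(word_embed_dim, hidden_dim, batch_first=True, bidirectional=True)

    def forward(self, seq_token_idx: torch.Tensor, seq_token_count: torch.Tensor) -> torch.Tensor:
        bsz, max_seq_len, max_token_count = seq_token_idx.shape
        # Treat each sentence as an independent sequence of tokens.
        tokens = self.word_embedding(seq_token_idx.view(bsz * max_seq_len, max_token_count))
        # Padding sentences have 0 tokens; packing needs lengths >= 1.
        lengths = seq_token_count.view(-1).clamp(min=1).cpu()
        packed = pack_padded_sequence(tokens, lengths, batch_first=True, enforce_sorted=False)
        _, (h_n, _) = self.lstm(packed)  # h_n: [2, bsz * max_seq_len, hidden_dim]
        sentence = torch.cat([h_n[0], h_n[1]], dim=-1)  # forward and backward final states
        return sentence.view(bsz, max_seq_len, -1)


model = WordSeqEmbeddingSketch(num_embeddings=50, word_embed_dim=8, hidden_dim=6)
seq_token_idx = torch.randint(0, 50, (2, 3, 5))  # 2 examples, 3 sentences, 5 tokens each
seq_token_count = torch.tensor([[5, 2, 0], [3, 5, 1]])
print(model(seq_token_idx, seq_token_count).shape)  # torch.Size([2, 3, 12])
```

Here output_dim is 2 * hidden_dim because the forward and backward final states are concatenated.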
-
classmethod
from_config
(config: pytext.models.embeddings.word_seq_embedding.WordSeqEmbedding.Config, tensorizer: pytext.data.tensorizers.Tensorizer = None, init_from_saved_state: Optional[bool] = False)
Factory method to construct an instance of WordSeqEmbedding from the module’s config object and the tensorizer.
Parameters: - config (WordSeqEmbedding.Config) – Configuration object specifying all the parameters of WordSeqEmbedding.
Returns: An instance of WordSeqEmbedding.
Return type: WordSeqEmbedding
Module contents
The package namespace re-exports the following classes, so they can also be imported directly from pytext.models.embeddings (for example, pytext.models.embeddings.WordEmbedding). Their documentation is identical to the submodule entries above.
- EmbeddingBase (from pytext.models.embeddings.embedding_base)
- EmbeddingList (from pytext.models.embeddings.embedding_list)
- WordEmbedding (from pytext.models.embeddings.word_embedding)
- DictEmbedding (from pytext.models.embeddings.dict_embedding)
- CharacterEmbedding (from pytext.models.embeddings.char_embedding)
- ContextualTokenEmbedding (from pytext.models.embeddings.contextual_token_embedding)
- WordSeqEmbedding (from pytext.models.embeddings.word_seq_embedding)
- MLPEmbedding (from pytext.models.embeddings.mlp_embedding)