pytext.models package

Submodules

pytext.models.bert_classification_models module

class pytext.models.bert_classification_models.BertPairwiseModel(encoder1, encoder2, decoder, output_layer, encode_relations, shared_encoder)[source]

Bases: pytext.models.bert_classification_models._EncoderPairwiseModel

BERT pairwise classification model.

The model takes two sets of tokens (left and right) and computes their representations separately using a shared BERT encoder. The final prediction can be the cosine similarity of the two embeddings, or, if encode_relations is specified, the concatenation of the embeddings, their absolute difference, and their elementwise product.
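As a minimal sketch of the two prediction modes described above (variable names are illustrative, not PyText API):

import torch
import torch.nn.functional as F

def pairwise_prediction_sketch(h1: torch.Tensor, h2: torch.Tensor, encode_relations: bool):
    # h1, h2: sentence embeddings from the shared BERT encoder (batch_size x dim)
    if encode_relations:
        # concatenation, absolute difference, and elementwise product,
        # to be consumed by the decoder downstream
        return torch.cat([h1, h2, torch.abs(h1 - h2), h1 * h2], dim=-1)
    # otherwise, score the pair directly by cosine similarity
    return F.cosine_similarity(h1, h2, dim=-1)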

class pytext.models.bert_classification_models.NewBertModel(encoder, decoder, output_layer, stage=<Stage.TRAIN: 'Training'>)[source]

Bases: pytext.models.bert_classification_models._EncoderBaseModel

BERT single sentence classification.

pytext.models.bert_regression_model module

class pytext.models.bert_regression_model.BertPairwiseRegressionModel(encoder1, encoder2, decoder, output_layer, encode_relations, shared_encoder)[source]

Bases: pytext.models.bert_classification_models.BertPairwiseModel

Two-tower model for regression. Encodes two texts separately and uses the cosine similarity between the sentence embeddings to predict the regression label.

class pytext.models.bert_regression_model.NewBertRegressionModel(encoder, decoder, output_layer)[source]

Bases: pytext.models.bert_classification_models.NewBertModel

BERT single sentence (or concatenated sentences) regression.

classmethod from_config(config: pytext.models.bert_regression_model.NewBertRegressionModel.Config, tensorizers: Dict[str, pytext.data.tensorizers.Tensorizer])[source]

pytext.models.crf module

class pytext.models.crf.CRF(num_tags: int, ignore_index: int, default_label_pad_index: int)[source]

Bases: torch.nn.modules.module.Module

Compute the log-likelihood of the input assuming a conditional random field model.

Parameters:num_tags – The number of tags
decode(emissions: torch.Tensor, seq_lens: torch.Tensor) → torch.Tensor[source]

Given a set of emission probabilities, return the predicted tags.

Parameters:
  • emissions – Emission probabilities with expected shape of batch_size * seq_len * num_labels
  • seq_lens – Length of each input.
export_to_caffe2(workspace, init_net, predict_net, logits_output_name)[source]

Exports the CRF layer to Caffe2 by manually adding the necessary operators to the init_net and predict_net.

Parameters:
  • init_net – caffe2 init net created by the current graph
  • predict_net – caffe2 net created by the current graph
  • workspace – caffe2 current workspace
  • output_names – current output names of the caffe2 net
  • py_model – original pytorch model object
Returns:

The updated predictions blob name

Return type:

string

forward(emissions: torch.Tensor, tags: torch.Tensor, reduce: bool = True) → torch.Tensor[source]

Compute log-likelihood of input.

Parameters:
  • emissions – Emission values for different tags for each input. The expected shape is batch_size * seq_len * num_labels. Padding should be on the right side of the input.
  • tags – Actual tags for each token in the input. Expected shape is batch_size * seq_len.
get_transitions()[source]
reset_parameters() → None[source]
set_transitions(transitions: torch.Tensor = None)[source]
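
A hypothetical usage sketch (the tag count, lengths, and padding indices below are assumptions, not defaults):

import torch
from pytext.models.crf import CRF

crf = CRF(num_tags=5, ignore_index=-1, default_label_pad_index=-1)
emissions = torch.randn(2, 7, 5)       # batch_size x seq_len x num_labels
tags = torch.randint(0, 5, (2, 7))     # gold tags, padded on the right
seq_lens = torch.tensor([7, 4])

log_likelihood = crf(emissions, tags)        # forward() returns the log-likelihood
best_tags = crf.decode(emissions, seq_lens)  # best tag sequence per input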

pytext.models.disjoint_multitask_model module

class pytext.models.disjoint_multitask_model.DisjointMultitaskModel(models, loss_weights)[source]

Bases: pytext.models.model.Model

Wrapper model to train multiple PyText models that share parameters. Designed to be used for multi-tasking when the tasks have disjoint datasets.

Modules which have the same shared_module_key and type share parameters. Only the first such module needs to be configured in full in each case.

Parameters:models (type) – Dictionary of sub-task models.
current_model

Current model to route the input batch to.

Type:type
contextualize(context)[source]

Add additional context into the model. context can be anything that helps maintain/update state. For example, it is used by DisjointMultitaskModel for changing the task that should be trained with a given iterator.

current_model
forward(*inputs) → List[torch.Tensor][source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

get_loss(logits, targets, context)[source]
get_pred(logits, targets=None, context=None, *args)[source]
save_modules(base_path, suffix='')[source]

Save each sub-module in separate files for reusing later.

class pytext.models.disjoint_multitask_model.NewDisjointMultitaskModel(models, loss_weights)[source]

Bases: pytext.models.disjoint_multitask_model.DisjointMultitaskModel

arrange_model_context(tensor_dict)[source]
arrange_model_inputs(tensor_dict)[source]
arrange_targets(tensor_dict)[source]

pytext.models.distributed_model module

class pytext.models.distributed_model.DistributedModel(*args, **kwargs)[source]

Bases: torch.nn.parallel.distributed.DistributedDataParallel

Wrapper model class to train models in a distributed data parallel manner. The way to use this class to train your module in a distributed manner is:

distributed_model = DistributedModel(
    module=model,
    device_ids=[device_id0, device_id1],
    output_device=device_id0,
    broadcast_buffers=False,
)

where model is an instance of the actual model class you want to train in a distributed manner.

cpu()[source]

Moves all model parameters and buffers to the CPU.

Returns:self
Return type:Module
eval(stage=<Stage.TEST: 'Test'>)[source]

Override to set stage

load_state_dict(*args, **kwargs)[source]

Copies parameters and buffers from state_dict into this module and its descendants. If strict is True, then the keys of state_dict must exactly match the keys returned by this module’s state_dict() function.

Parameters:
  • state_dict (dict) – a dict containing parameters and persistent buffers.
  • strict (bool, optional) – whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict() function. Default: True
Returns:

  • missing_keys is a list of str containing the missing keys
  • unexpected_keys is a list of str containing the unexpected keys

Return type:

NamedTuple with missing_keys and unexpected_keys fields

state_dict(*args, **kwargs)[source]

Returns a dictionary containing a whole state of the module.

Both parameters and persistent buffers (e.g. running averages) are included. Keys are corresponding parameter and buffer names.

Returns:a dictionary containing a whole state of the module
Return type:dict

Example:

>>> module.state_dict().keys()
['bias', 'weight']
train(mode=True)[source]

Override to set stage

pytext.models.doc_model module

class pytext.models.doc_model.ByteTokensDocumentModel(embedding: pytext.models.embeddings.embedding_base.EmbeddingBase, representation: pytext.models.representations.representation_base.RepresentationBase, decoder: pytext.models.decoders.decoder_base.DecoderBase, output_layer: pytext.models.output_layers.output_layer_base.OutputLayerBase)[source]

Bases: pytext.models.doc_model.DocModel

DocModel that receives both word IDs and byte IDs as inputs (concatenating word and byte-token embeddings to represent input tokens).

arrange_model_inputs(tensor_dict)[source]
classmethod create_embedding(config, tensorizers: Dict[str, pytext.data.tensorizers.Tensorizer])[source]
get_export_input_names(tensorizers)[source]
torchscriptify(tensorizers, traced_model)[source]
class pytext.models.doc_model.DocModel(embedding: pytext.models.embeddings.embedding_base.EmbeddingBase, representation: pytext.models.representations.representation_base.RepresentationBase, decoder: pytext.models.decoders.decoder_base.DecoderBase, output_layer: pytext.models.output_layers.output_layer_base.OutputLayerBase)[source]

Bases: pytext.models.model.Model

DocModel that’s compatible with the new Model abstraction, which is responsible for describing which inputs it expects and arranging its input tensors.

arrange_model_inputs(tensor_dict)[source]
arrange_targets(tensor_dict)[source]
caffe2_export(tensorizers, tensor_dict, path, export_onnx_path=None)[source]
classmethod create_decoder(config: pytext.models.doc_model.DocModel.Config, representation_dim: int, num_labels: int)[source]
classmethod create_embedding(config: pytext.models.doc_model.DocModel.Config, tensorizers: Dict[str, pytext.data.tensorizers.Tensorizer])[source]
classmethod create_output_layer(config: pytext.models.doc_model.DocModel.Config, labels: pytext.data.tensorizers.VocabConfig)[source]
classmethod from_config(config: pytext.models.doc_model.DocModel.Config, tensorizers: Dict[str, pytext.data.tensorizers.Tensorizer])[source]
get_export_input_names(tensorizers)[source]
get_export_output_names(tensorizers)[source]
get_num_examples_from_batch(tensor_dict)[source]
torchscriptify(tensorizers, traced_model)[source]
vocab_to_export(tensorizers)[source]
class pytext.models.doc_model.DocRegressionModel(embedding: pytext.models.embeddings.embedding_base.EmbeddingBase, representation: pytext.models.representations.representation_base.RepresentationBase, decoder: pytext.models.decoders.decoder_base.DecoderBase, output_layer: pytext.models.output_layers.output_layer_base.OutputLayerBase)[source]

Bases: pytext.models.doc_model.DocModel

Model that’s compatible with the new Model abstraction, and is configured for regression tasks (specifically for labels, predictions, and loss).

classmethod from_config(config: pytext.models.doc_model.DocRegressionModel.Config, tensorizers: Dict[str, pytext.data.tensorizers.Tensorizer])[source]
class pytext.models.doc_model.PersonalizedDocModel(embedding: pytext.models.embeddings.embedding_base.EmbeddingBase, representation: pytext.models.representations.representation_base.RepresentationBase, decoder: pytext.models.decoders.decoder_base.DecoderBase, output_layer: pytext.models.output_layers.output_layer_base.OutputLayerBase, user_embedding: Optional[pytext.models.embeddings.embedding_base.EmbeddingBase] = None)[source]

Bases: pytext.models.doc_model.DocModel

DocModel that includes a user embedding which learns user features to produce personalized predictions. In this class, the user embedding is fed directly to the decoder (i.e., it does not go through the encoders).
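
A conceptual sketch of that wiring (the exact concatenation point is an assumption based on the description above):

import torch

def personalized_decode_sketch(rep: torch.Tensor, user_emb: torch.Tensor, decoder):
    # The user embedding bypasses the encoders and joins at the decoder input.
    return decoder(torch.cat([rep, user_emb], dim=-1))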

arrange_model_inputs(tensor_dict)[source]
forward(*inputs) → List[torch.Tensor][source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

classmethod from_config(config, tensorizers: Dict[str, pytext.data.tensorizers.Tensorizer])[source]
get_export_input_names(tensorizers)[source]
torchscriptify(tensorizers, traced_model)[source]
vocab_to_export(tensorizers)[source]

pytext.models.joint_model module

class pytext.models.joint_model.IntentSlotModel(default_doc_loss_weight, default_word_loss_weight, *args, **kwargs)[source]

Bases: pytext.models.model.Model

A joint intent-slot model. This is framed as a model that performs document classification and word tagging tasks, where the embedding and text representation layers are shared between both tasks.

The supported representation layers are based on bidirectional LSTM or CNN.

It can be instantiated just like any other Model.

This model uses the new data-handling design involving tensorizers; that is the difference between this and JointModel.

arrange_model_context(tensor_dict)[source]
arrange_model_inputs(tensor_dict)[source]
arrange_targets(tensor_dict)[source]
caffe2_export(tensorizers, tensor_dict, path, export_onnx_path=None)[source]
classmethod create_embedding(config, tensorizers)[source]
classmethod from_config(config, tensorizers)[source]
get_export_input_names(tensorizers)[source]
get_export_output_names(tensorizers)[source]
get_weights_context(tensor_dict)[source]
vocab_to_export(tensorizers)[source]

pytext.models.masked_lm module

class pytext.models.masked_lm.MaskedLanguageModel(encoder: pytext.models.representations.transformer_sentence_encoder_base.TransformerSentenceEncoderBase, decoder: pytext.models.decoders.mlp_decoder.MLPDecoder, output_layer: pytext.models.output_layers.lm_output_layer.LMOutputLayer, token_tensorizer: pytext.data.bert_tensorizer.BERTTensorizerBase, vocab: pytext.data.utils.Vocabulary, mask_prob: float = 0.15, mask_bos: bool = False, masking_strategy: pytext.models.masking_utils.MaskingStrategy = <MaskingStrategy.RANDOM: 'random'>, stage: pytext.common.constants.Stage = <Stage.TRAIN: 'Training'>)[source]

Bases: pytext.models.model.BaseModel

Masked language model for BERT style pre-training.

SUPPORT_FP16_OPTIMIZER = True
arrange_model_inputs(tensor_dict)[source]
arrange_targets(tensor_dict)[source]
forward(*inputs) → List[torch.Tensor][source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

classmethod from_config(config: pytext.models.masked_lm.MaskedLanguageModel.Config, tensorizers: Dict[str, pytext.data.tensorizers.Tensorizer])[source]

pytext.models.masking_utils module

class pytext.models.masking_utils.MaskingStrategy[source]

Bases: enum.Enum

An enumeration.

FREQUENCY = 'frequency_based'
RANDOM = 'random'
pytext.models.masking_utils.frequency_based_masking(tokens: torch.Tensor, token_sampling_weights: numpy.ndarray, mask_prob: float) → torch.Tensor[source]

Function to mask tokens based on frequency.

Inputs:
  1. tokens: Tensor with token ids of shape (batch_size x seq_len)
  2. token_sampling_weights: numpy array of shape (batch_size x seq_len), with each element representing the sampling weight associated with the corresponding token in tokens
  3. mask_prob: Probability of masking a particular token
Outputs:
mask: Tensor with the same shape as the input tokens (batch_size x seq_len), with masked tokens represented by a 1 and everything else as 0.
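
A rough sketch of this contract, assuming positions are sampled without replacement in proportion to their weights (an illustration, not the exact implementation):

import torch

def frequency_based_masking_sketch(tokens, token_sampling_weights, mask_prob):
    num_to_mask = int(mask_prob * tokens.numel())
    weights = torch.from_numpy(token_sampling_weights).float().reshape(-1)
    masked_positions = torch.multinomial(weights, num_to_mask, replacement=False)
    mask = torch.zeros(tokens.numel(), dtype=torch.long)
    mask[masked_positions] = 1
    return mask.view_as(tokens)
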
pytext.models.masking_utils.random_masking(tokens: torch.Tensor, mask_prob: float) → torch.Tensor[source]

Function to mask tokens randomly.

Inputs:
  1. tokens: Tensor with token ids of shape (batch_size x seq_len)
  2. mask_prob: Probability of masking a particular token
Outputs:
mask: Tensor with the same shape as the input tokens (batch_size x seq_len), with masked tokens represented by a 1 and everything else as 0.
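
And the random strategy, as a minimal sketch of the same contract:

import torch

def random_masking_sketch(tokens: torch.Tensor, mask_prob: float) -> torch.Tensor:
    # Each position is masked independently with probability mask_prob.
    return (torch.rand(tokens.shape) < mask_prob).long()

tokens = torch.randint(0, 100, (2, 5))      # batch_size x seq_len
mask = random_masking_sketch(tokens, 0.15)  # 1 = masked, 0 = kept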

pytext.models.model module

class pytext.models.model.BaseModel(stage: pytext.common.constants.Stage = <Stage.TRAIN: 'Training'>)[source]

Bases: torch.nn.modules.module.Module, pytext.config.component.Component

Base model class which inherits from nn.Module. It also has a stage flag to indicate whether it is in the train, eval, or test stage. This is needed because the built-in train/eval flag in PyTorch can't distinguish between eval and test, which is required to support some use cases.
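
For example (a hypothetical usage sketch; Stage lives in pytext.common.constants):

from pytext.common.constants import Stage

model.train()           # training mode, Stage.TRAIN
model.eval()            # defaults to Stage.TEST, per eval() below
model.eval(Stage.EVAL)  # distinguishes evaluation from final testing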

SUPPORT_FP16_OPTIMIZER = False
arrange_caffe2_model_inputs(tensor_dict)[source]

Generate inputs for the exported Caffe2 model; the default behavior is to flatten the input tuples.

arrange_model_context(tensor_dict)[source]
arrange_model_inputs(tensor_dict)[source]
arrange_targets(tensor_dict)[source]
caffe2_export(tensorizers, tensor_dict, path, export_onnx_path=None)[source]
contextualize(context)[source]

Add additional context into the model. context can be anything that helps maintain/update state. For example, it is used by DisjointMultitaskModel for changing the task that should be trained with a given iterator.

eval(stage=<Stage.TEST: 'Test'>)[source]

Override to explicitly maintain the stage (train, eval, test).

get_loss(logit, target, context)[source]
get_num_examples_from_batch(batch)[source]
get_pred(logit, target=None, context=None, *args)[source]
onnx_trace_input(tensor_dict)[source]
prepare_for_onnx_export_(**kwargs)[source]

Make model exportable via ONNX trace.

quantize()[source]

Quantize the model during export.

save_modules(base_path: str = '', suffix: str = '')[source]

Save each sub-module in separate files for reusing later.

trace(inputs)[source]
train(mode=True)[source]

Override to explicitly maintain the stage (train, eval, test).

classmethod train_batch(model, batch, state=None)[source]
class pytext.models.model.Model(embedding: pytext.models.embeddings.embedding_base.EmbeddingBase, representation: pytext.models.representations.representation_base.RepresentationBase, decoder: pytext.models.decoders.decoder_base.DecoderBase, output_layer: pytext.models.output_layers.output_layer_base.OutputLayerBase)[source]

Bases: pytext.models.model.BaseModel

Generic single-task model class that expects four components:

  1. Embedding
  2. Representation
  3. Decoder
  4. Output Layer

Forward pass: embedding -> representation -> decoder -> output_layer
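
Conceptually, as a sketch rather than the actual implementation:

def forward(self, *inputs):
    token_emb = self.embedding(*inputs)    # embed each input token
    rep = self.representation(token_emb)   # encode the whole input text
    logits = self.decoder(rep)             # decode to logits
    return logits                          # output_layer consumes the logits
                                           # for loss and predictions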

These four components have specific responsibilities as described below.

Embedding layer should implement the way to represent each token in the input text. It can be as simple as just token/word embedding or can be composed of multiple ways to represent a token, e.g., word embedding, character embedding, etc.

Representation layer should implement the way to encode the entire input text such that the output vector(s) can be used by the decoder to produce logits. There is no restriction on the number of inputs it should encode. There is also no restriction on the number of ways to encode the input.

Decoder layer should implement the way to consume the output of the model's representation and produce logits that can be used by the output layer to compute loss or generate predictions (and prediction scores/confidence).

Output layer should implement the way loss computation is done as well as the logic to generate predictions from the logits.

Let us discuss the joint intent-slot model as a case to go over these layers. The model predicts intent of input utterance and the slots in the utterance. (Refer to Train Intent-Slot model on ATIS Dataset for details about intent-slot model.)

  1. EmbeddingList layer is tasked with representing tokens. To do so we can use a learnable word embedding table in conjunction with a learnable character embedding table that is distilled to a token-level representation using a CNN and pooling. Note: This class is meant to be reused by all models. It acts as a container for all the different ways of representing a token/word.
  2. BiLSTMDocSlotAttention is tasked with encoding the embedded input string for intent classification and slot filling. In order to do that it has a shared bidirectional LSTM layer followed by separate attention layers for document-level attention and word-level attention. Finally, it produces two vectors per utterance.
  3. IntentSlotModelDecoder accepts the two input vectors from BiLSTMDocSlotAttention and produces logits for intent classification and slot filling. Conditioned on a flag, it can also use the probabilities from intent classification for slot filling.
  4. IntentSlotOutputLayer implements the logic behind computing loss and prediction, as well as how to export this layer to Caffe2. This is used by the model exporter as a post-processing Caffe2 operator.
Parameters:
  • embedding (EmbeddingBase) – The embedding layer of the model.
  • representation (RepresentationBase) – The representation layer of the model.
  • decoder (DecoderBase) – The decoder layer of the model.
  • output_layer (OutputLayerBase) – The output layer of the model.
embedding
representation
decoder
output_layer
classmethod compose_embedding(sub_emb_module_dict: Dict[str, pytext.models.embeddings.embedding_base.EmbeddingBase], metadata) → pytext.models.embeddings.embedding_list.EmbeddingList[source]

Default implementation is to compose an instance of EmbeddingList with all the sub-embedding modules. You should override this class method if you want to implement a specific way to embed tokens/words.

Parameters:sub_emb_module_dict (Dict[str, EmbeddingBase]) – Named dictionary of embedding modules, each of which implements a way to embed/encode a token.
Returns:An instance of EmbeddingList.
Return type:EmbeddingList
classmethod create_embedding(feat_config: pytext.config.field_config.FeatureConfig, metadata: pytext.data.data_handler.CommonMetadata)[source]
classmethod create_sub_embs(emb_config: pytext.config.field_config.FeatureConfig, metadata: pytext.data.data_handler.CommonMetadata) → Dict[str, pytext.models.embeddings.embedding_base.EmbeddingBase][source]

Creates the embedding modules defined in the emb_config.

Parameters:
  • emb_config (FeatureConfig) – Object containing all the sub-embedding configurations.
  • metadata (CommonMetadata) – Object containing features and label metadata.
Returns:

Named dictionary of embedding modules.

Return type:

Dict[str, EmbeddingBase]

forward(*inputs) → List[torch.Tensor][source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

classmethod from_config(config: pytext.models.model.Model.Config, feat_config: pytext.config.field_config.FeatureConfig, metadata: pytext.data.data_handler.CommonMetadata)[source]
class pytext.models.model.ModelInputBase(**kwargs)[source]

Bases: pytext.config.pytext_config.ConfigBase

Base class for model inputs.

class pytext.models.model.ModelInputMeta[source]

Bases: pytext.config.pytext_config.ConfigBaseMeta

pytext.models.module module

class pytext.models.module.Module(config=None)[source]

Bases: torch.nn.modules.module.Module, pytext.config.component.Component

Generic module class that serves as base class for all PyText modules.

Parameters:config (type) – Module's config object. The specific contents of this object depend on the module. Defaults to None.
freeze() → None[source]
pytext.models.module.create_module(module_config, *args, create_fn=<function _create_module_from_registry>, **kwargs)[source]

Create a module object given the module's config object. Creation depends on the global shared module registry, so your module must be available in the registry. This means your module must be imported somewhere in the code path during module creation (ideally in your model class) for the module to be visible to the registry. A usage sketch follows the parameter list below.

Parameters:
  • module_config (type) – Module config object.
  • create_fn (type) – The function to use for creating the module. Use this parameter if your module creation requires custom code and pass your function here. Defaults to _create_module_from_registry().
Returns:

The created module object.

Return type:

type
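
A hypothetical call (the config attribute and keyword arguments here are illustrative assumptions, not PyText defaults):

from pytext.models.module import create_module

# Positional/keyword arguments are forwarded to the module's creation function.
decoder = create_module(config.decoder, in_dim=rep_dim, out_dim=num_labels)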

pytext.models.pair_classification_model module

class pytext.models.pair_classification_model.BasePairwiseModel(decoder: pytext.models.decoders.decoder_base.DecoderBase, output_layer: pytext.models.output_layers.output_layer_base.OutputLayerBase, encode_relations: bool)[source]

Bases: pytext.models.model.BaseModel

A base classification model that scores a pair of texts.

Subclasses need to implement from_config, forward, and save_modules.

forward(input1: Tuple[torch.Tensor, ...], input2: Tuple[torch.Tensor, ...])[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

classmethod from_config(config: pytext.models.pair_classification_model.BasePairwiseModel.Config, tensorizers: Dict[str, pytext.data.tensorizers.Tensorizer])[source]
save_modules(base_path: str = '', suffix: str = '')[source]

Save each sub-module in separate files for reusing later.

class pytext.models.pair_classification_model.PairwiseModel(embeddings: torch.nn.modules.container.ModuleList, representations: torch.nn.modules.container.ModuleList, decoder: pytext.models.decoders.mlp_decoder.MLPDecoder, output_layer: pytext.models.output_layers.doc_classification_output_layer.ClassificationOutputLayer, encode_relations: bool, shared_representations: bool)[source]

Bases: pytext.models.pair_classification_model.BasePairwiseModel

A classification model that scores a pair of texts, for example, a model for natural language inference.

The model shares the embedding space (so it doesn't support pairs of texts where left and right are in different languages). It uses a bidirectional LSTM or CNN to represent the two documents, and concatenates them along with their absolute difference and elementwise product. This concatenated pair representation is passed to a multi-layer perceptron to decode to label/target space.

See https://arxiv.org/pdf/1705.02364.pdf for more details.

It can be instantiated just like any other Model.

EMBEDDINGS = ['embedding']
INPUTS_PAIR = [['tokens1'], ['tokens2']]
arrange_model_inputs(tensor_dict)[source]
arrange_targets(tensor_dict)[source]
forward(input1: Tuple[torch.Tensor, ...], input2: Tuple[torch.Tensor, ...]) → torch.Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

classmethod from_config(config: pytext.models.pair_classification_model.PairwiseModel.Config, tensorizers: Dict[str, pytext.data.tensorizers.Tensorizer])[source]
save_modules(base_path: str = '', suffix: str = '')[source]

Save each sub-module in separate files for reusing later.

pytext.models.query_document_pairwise_ranking_model module

class pytext.models.query_document_pairwise_ranking_model.QueryDocPairwiseRankingModel(embeddings: torch.nn.modules.container.ModuleList, representations: torch.nn.modules.container.ModuleList, decoder: pytext.models.decoders.mlp_decoder.MLPDecoder, output_layer: pytext.models.output_layers.doc_classification_output_layer.ClassificationOutputLayer, encode_relations: bool, shared_representations: bool)[source]

Bases: pytext.models.pair_classification_model.PairwiseModel

Pairwise ranking model. This model takes in a query and two responses (pos_response and neg_response). It passes representations of the query and the two responses to a decoder. pos_response should be ranked higher than neg_response; this is ensured by training with a ranking hinge loss function, sketched below.
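
A minimal sketch of such a ranking hinge loss (the margin value is an assumption):

import torch

def ranking_hinge_loss(pos_score: torch.Tensor, neg_score: torch.Tensor, margin: float = 1.0):
    # Penalize whenever pos_response fails to outscore neg_response by at least margin.
    return torch.clamp(margin - pos_score + neg_score, min=0).mean()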

arrange_model_inputs(tensor_dict)[source]
arrange_targets(tensor_dict)[source]
forward(pos_response: Tuple[torch.Tensor, torch.Tensor], neg_response: Tuple[torch.Tensor, torch.Tensor], query: Tuple[torch.Tensor, torch.Tensor]) → List[torch.Tensor][source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

classmethod from_config(config: pytext.models.query_document_pairwise_ranking_model.QueryDocPairwiseRankingModel.Config, tensorizers: Dict[str, pytext.data.tensorizers.Tensorizer])[source]
get_num_examples_from_batch(tensor_dict)[source]

pytext.models.r3f_models module

class pytext.models.r3f_models.R3FConfigOptions(**kwargs)[source]

Bases: pytext.config.pytext_config.ConfigBase

Configuration options for models using R3F

eps = 1e-05
noise_type = 'uniform'
r3f_default_lambda = 0.5
r3f_lambda_by_loss = {}
class pytext.models.r3f_models.R3FNoiseContextManager(context)[source]

Bases: contextlib.AbstractContextManager

Context manager that adds a forward hook to the embedding module to insert noise into the model and detach the embedding during this pass.

class pytext.models.r3f_models.R3FNoiseType[source]

Bases: enum.Enum

An enumeration.

NORMAL = 'normal'
UNIFORM = 'uniform'
class pytext.models.r3f_models.R3FPyTextMixin(config: pytext.models.r3f_models.R3FConfigOptions)[source]

Bases: object

Mixin class for applying the R3F method. To apply R3F to any model, inherit from this class and implement the abstract functions.

For more details: https://arxiv.org/abs/2008.03156

forward(*args, use_r3f: bool = False, **kwargs)[source]
forward_with_noise(*args, **kwargs)[source]
get_embedding_module(*args, **kwargs)[source]

Given the core model outputs, this returns the embedding module that is used for the R3F loss; in particular, noise will be injected into this module.

get_r3f_loss_terms(model_outputs, noise_model_outputs, sample_size: int) → torch.Tensor[source]

Computes the auxiliary loss for R3F; in particular, it computes a symmetric KL divergence between the result from the input embedding and the noised input embedding.

get_r3f_model_output(model_output)[source]

Extracts the output from the model.forward() call that is used for the R3F loss term.

get_sample_size(model_inputs, targets)[source]

Gets the sample size of the model, which is used as a regularization factor for the model itself.

original_forward(*args, **kwargs)[source]

Runs the traditional forward of this model

classmethod train_batch(model, batch, state=None)[source]

Runs training over a batch with the R3F method; training will use R3F while eval and test do not.

pytext.models.r3f_models.build_noise_sampler(noise_type: pytext.models.r3f_models.R3FNoiseType, eps: float)[source]

Given a noise_type (R3FNoiseType), builds a torch.distributions object capable of generating noise within the passed-in eps (float).
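
A sketch of what such a builder might look like, using torch.distributions (an illustration, not the exact implementation):

import torch

def build_noise_sampler_sketch(noise_type: str, eps: float):
    if noise_type == "uniform":
        return torch.distributions.uniform.Uniform(low=-eps, high=eps)
    if noise_type == "normal":
        return torch.distributions.normal.Normal(loc=0.0, scale=eps)
    raise ValueError(f"unsupported noise type: {noise_type}")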

pytext.models.r3f_models.compute_symmetric_kl(noised_logits, input_logits)[source]

Computes a symmetric KL loss by taking the KL divergence in both directions between the input logits and the noised logits and summing the two.
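
A minimal sketch of a symmetric KL term of this kind (the reduction choice is an assumption):

import torch.nn.functional as F

def symmetric_kl_sketch(noised_logits, input_logits):
    p = F.log_softmax(noised_logits, dim=-1)
    q = F.log_softmax(input_logits, dim=-1)
    # F.kl_div(p, q, log_target=True) computes KL(q || p);
    # summing both directions gives the symmetric term.
    return (F.kl_div(p, q, reduction="sum", log_target=True)
            + F.kl_div(q, p, reduction="sum", log_target=True))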

pytext.models.roberta module

class pytext.models.roberta.RoBERTa(encoder, decoder, output_layer, stage=<Stage.TRAIN: 'Training'>)[source]

Bases: pytext.models.bert_classification_models.NewBertModel

graph_mode_quantize(inputs, data_loader, calibration_num_batches=64, qconfig_dict=None, force_quantize=False)[source]

Quantize the model during export with graph mode quantization.

torchscriptify(tensorizers, traced_model)[source]

Using the traced model, create a ScriptModule which has a nicer API that includes generating tensors from simple data types, and returns classified values according to the output layer (e.g., as a dict mapping class name to score).

trace(inputs)[source]
class pytext.models.roberta.RoBERTaEncoder(config: pytext.models.roberta.RoBERTaEncoder.Config, output_encoded_layers: bool, **kwarg)[source]

Bases: pytext.models.roberta.RoBERTaEncoderBase

A PyTorch RoBERTa implementation

forward(input_tuple: Tuple[torch.Tensor, ...], *args) → Tuple[torch.Tensor, ...][source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class pytext.models.roberta.RoBERTaEncoderBase(config: pytext.models.representations.transformer_sentence_encoder_base.TransformerSentenceEncoderBase.Config, output_encoded_layers=False, *args, **kwargs)[source]

Bases: pytext.models.representations.transformer_sentence_encoder_base.TransformerSentenceEncoderBase

class pytext.models.roberta.RoBERTaEncoderJit(config: pytext.models.roberta.RoBERTaEncoderJit.Config, output_encoded_layers: bool, **kwarg)[source]

Bases: pytext.models.roberta.RoBERTaEncoderBase

A TorchScript RoBERTa implementation

class pytext.models.roberta.RoBERTaR3F(encoder, decoder, output_layer, r3f_options, stage=<Stage.TRAIN: 'Training'>)[source]

Bases: pytext.models.roberta.RoBERTa, pytext.models.r3f_models.R3FPyTextMixin

forward(*args, use_r3f: bool = False, **kwargs)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

get_embedding_module(*args, **kwargs)[source]

Given the core model outputs, this returns the embedding module that is used for the R3F loss; in particular, noise will be injected into this module.

get_sample_size(model_inputs, targets)[source]

Gets the sample size of the model, which is used as a regularization factor for the model itself.

original_forward(*args, **kwargs)[source]

Runs the traditional forward of this model

classmethod train_batch(model, batch, state=None)[source]

Runs training over a batch with the R3F method; training will use R3F while eval and test do not.

class pytext.models.roberta.RoBERTaRegression(encoder, decoder, output_layer)[source]

Bases: pytext.models.bert_regression_model.NewBertRegressionModel

torchscriptify(tensorizers, traced_model)[source]

Using the traced model, create a ScriptModule which has a nicer API that includes generating tensors from simple data types, and returns classified values according to the output layer (e.g., as a dict mapping class name to score).

class pytext.models.roberta.RoBERTaWordTaggingModel(encoder, decoder, output_layer, stage=<Stage.TRAIN: 'Training'>)[source]

Bases: pytext.models.model.BaseModel

Single Sentence Token-level Classification Model using XLM.

arrange_model_inputs(tensor_dict)[source]
arrange_targets(tensor_dict)[source]
forward(encoder_inputs: Tuple[torch.Tensor, ...], *args) → torch.Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

classmethod from_config(config: pytext.models.roberta.RoBERTaWordTaggingModel.Config, tensorizers: Dict[str, pytext.data.tensorizers.Tensorizer])[source]
class pytext.models.roberta.SELFIE(encoder, decoder, output_layer, stage=<Stage.TRAIN: 'Training'>)[source]

Bases: pytext.models.roberta.RoBERTa

forward(encoder_inputs: Tuple[torch.Tensor, ...], *args) → List[torch.Tensor][source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

pytext.models.roberta.init_params(module)[source]

Initialize the RoBERTa weights for pre-training from scratch.

pytext.models.two_tower_classification_model module

class pytext.models.two_tower_classification_model.TwoTowerClassificationModel(right_encoder, left_encoder, decoder, output_layer, stage=<Stage.TRAIN: 'Training'>)[source]

Bases: pytext.models.model.BaseModel

SUPPORT_FP16_OPTIMIZER = True
arrange_model_inputs(tensor_dict)[source]
arrange_targets(tensor_dict)[source]
caffe2_export(tensorizers, tensor_dict, path, export_onnx_path=None)[source]
forward(right_encoder_inputs: Tuple[torch.Tensor, ...], left_encoder_inputs: Tuple[torch.Tensor, ...], *args) → List[torch.Tensor][source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

classmethod from_config(config: pytext.models.two_tower_classification_model.TwoTowerClassificationModel.Config, tensorizers: Dict[str, pytext.data.tensorizers.Tensorizer])[source]
graph_mode_quantize(inputs, data_loader, calibration_num_batches=64)[source]

Quantize the model during export with graph mode quantization for linformer encoder.

torchscriptify(tensorizers, traced_model)[source]

Using the traced model, create a ScriptModule which has a nicer API that includes generating tensors from simple data types, and returns classified values according to the output layer (e.g., as a dict mapping class name to score).

trace(inputs)[source]

pytext.models.utils module

pytext.models.utils.normalize_embeddings(embeddings: torch.Tensor)[source]

pytext.models.word_model module

class pytext.models.word_model.WordTaggingLiteModel(*args, **kwargs)[source]

Bases: pytext.models.word_model.WordTaggingModel

Also a word tagging model, but one that uses bytes as inputs to the model. By using bytes instead of words, the model does not need to store a word embedding table mapping words in the vocab to their embedding vector representations, but instead computes them on the fly using CharacterEmbedding. This produces an exported/serialized model that requires much less storage space and less memory at run/inference time.

arrange_model_context(tensor_dict)[source]
arrange_model_inputs(tensor_dict)[source]
classmethod create_embedding(config, tensorizers)[source]
get_export_input_names(tensorizers)[source]
torchscriptify(tensorizers, traced_model)[source]
vocab_to_export(tensorizers)[source]
class pytext.models.word_model.WordTaggingModel(*args, **kwargs)[source]

Bases: pytext.models.model.Model

Word tagging model. It can be used for any task that requires predicting a tag for a word/token. For example, the following tasks can be modeled as word tagging tasks (this is not an exhaustive list):

  1. Part-of-speech tagging.
  2. Named entity recognition.
  3. Slot filling for task-oriented dialog.

It can be instantiated just like any other Model.

arrange_model_context(tensor_dict)[source]
arrange_model_inputs(tensor_dict)[source]
arrange_targets(tensor_dict)[source]
classmethod create_embedding(config, tensorizers)[source]
classmethod from_config(config, tensorizers)[source]
get_export_input_names(tensorizers)[source]
get_export_output_names(tensorizers)[source]
torchscriptify(tensorizers, traced_model)[source]
vocab_to_export(tensorizers)[source]

Module contents

class pytext.models.Model(embedding: pytext.models.embeddings.embedding_base.EmbeddingBase, representation: pytext.models.representations.representation_base.RepresentationBase, decoder: pytext.models.decoders.decoder_base.DecoderBase, output_layer: pytext.models.output_layers.output_layer_base.OutputLayerBase)[source]

Bases: pytext.models.model.BaseModel

Generic single-task model class that expects four components:

  1. Embedding
  2. Representation
  3. Decoder
  4. Output Layer

Forward pass: embedding -> representation -> decoder -> output_layer

These four components have specific responsibilities as described below.

Embedding layer should implement the way to represent each token in the input text. It can be as simple as just token/word embedding or can be composed of multiple ways to represent a token, e.g., word embedding, character embedding, etc.

Representation layer should implement the way to encode the entire input text such that the output vector(s) can be used by the decoder to produce logits. There is no restriction on the number of inputs it should encode. There is also no restriction on the number of ways to encode the input.

Decoder layer should implement the way to consume the output of the model's representation and produce logits that can be used by the output layer to compute loss or generate predictions (and prediction scores/confidence).

Output layer should implement the way loss computation is done as well as the logic to generate predictions from the logits.

Let us discuss the joint intent-slot model as a case to go over these layers. The model predicts intent of input utterance and the slots in the utterance. (Refer to Train Intent-Slot model on ATIS Dataset for details about intent-slot model.)

  1. EmbeddingList layer is tasked with representing tokens. To do so we can use a learnable word embedding table in conjunction with a learnable character embedding table that is distilled to a token-level representation using a CNN and pooling. Note: This class is meant to be reused by all models. It acts as a container for all the different ways of representing a token/word.
  2. BiLSTMDocSlotAttention is tasked with encoding the embedded input string for intent classification and slot filling. In order to do that it has a shared bidirectional LSTM layer followed by separate attention layers for document-level attention and word-level attention. Finally, it produces two vectors per utterance.
  3. IntentSlotModelDecoder accepts the two input vectors from BiLSTMDocSlotAttention and produces logits for intent classification and slot filling. Conditioned on a flag, it can also use the probabilities from intent classification for slot filling.
  4. IntentSlotOutputLayer implements the logic behind computing loss and prediction, as well as how to export this layer to Caffe2. This is used by the model exporter as a post-processing Caffe2 operator.
Parameters:
  • embedding (EmbeddingBase) – The embedding layer of the model.
  • representation (RepresentationBase) – The representation layer of the model.
  • decoder (DecoderBase) – The decoder layer of the model.
  • output_layer (OutputLayerBase) – The output layer of the model.
embedding
representation
decoder
output_layer
classmethod compose_embedding(sub_emb_module_dict: Dict[str, pytext.models.embeddings.embedding_base.EmbeddingBase], metadata) → pytext.models.embeddings.embedding_list.EmbeddingList[source]

Default implementation is to compose an instance of EmbeddingList with all the sub-embedding modules. You should override this class method if you want to implement a specific way to embed tokens/words.

Parameters:sub_emb_module_dict (Dict[str, EmbeddingBase]) – Named dictionary of embedding modules, each of which implements a way to embed/encode a token.
Returns:An instance of EmbeddingList.
Return type:EmbeddingList
classmethod create_embedding(feat_config: pytext.config.field_config.FeatureConfig, metadata: pytext.data.data_handler.CommonMetadata)[source]
classmethod create_sub_embs(emb_config: pytext.config.field_config.FeatureConfig, metadata: pytext.data.data_handler.CommonMetadata) → Dict[str, pytext.models.embeddings.embedding_base.EmbeddingBase][source]

Creates the embedding modules defined in the emb_config.

Parameters:
  • emb_config (FeatureConfig) – Object containing all the sub-embedding configurations.
  • metadata (CommonMetadata) – Object containing features and label metadata.
Returns:

Named dictionary of embedding modules.

Return type:

Dict[str, EmbeddingBase]

forward(*inputs) → List[torch.Tensor][source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

classmethod from_config(config: pytext.models.model.Model.Config, feat_config: pytext.config.field_config.FeatureConfig, metadata: pytext.data.data_handler.CommonMetadata)[source]
class pytext.models.BaseModel(stage: pytext.common.constants.Stage = <Stage.TRAIN: 'Training'>)[source]

Bases: torch.nn.modules.module.Module, pytext.config.component.Component

Base model class which inherits from nn.Module. It also has a stage flag to indicate whether it is in the train, eval, or test stage. This is needed because the built-in train/eval flag in PyTorch can't distinguish between eval and test, which is required to support some use cases.

SUPPORT_FP16_OPTIMIZER = False
arrange_caffe2_model_inputs(tensor_dict)[source]

Generate inputs for the exported Caffe2 model; the default behavior is to flatten the input tuples.

arrange_model_context(tensor_dict)[source]
arrange_model_inputs(tensor_dict)[source]
arrange_targets(tensor_dict)[source]
caffe2_export(tensorizers, tensor_dict, path, export_onnx_path=None)[source]
contextualize(context)[source]

Add additional context into the model. context can be anything that helps maintain/update state. For example, it is used by DisjointMultitaskModel for changing the task that should be trained with a given iterator.

eval(stage=<Stage.TEST: 'Test'>)[source]

Override to explicitly maintain the stage (train, eval, test).

get_loss(logit, target, context)[source]
get_num_examples_from_batch(batch)[source]
get_pred(logit, target=None, context=None, *args)[source]
onnx_trace_input(tensor_dict)[source]
prepare_for_onnx_export_(**kwargs)[source]

Make model exportable via ONNX trace.

quantize()[source]

Quantize the model during export.

save_modules(base_path: str = '', suffix: str = '')[source]

Save each sub-module in separate files for reusing later.

trace(inputs)[source]
train(mode=True)[source]

Override to explicitly maintain the stage (train, eval, test).

classmethod train_batch(model, batch, state=None)[source]
class pytext.models.TwoTowerClassificationModel(right_encoder, left_encoder, decoder, output_layer, stage=<Stage.TRAIN: 'Training'>)[source]

Bases: pytext.models.model.BaseModel

SUPPORT_FP16_OPTIMIZER = True
arrange_model_inputs(tensor_dict)[source]
arrange_targets(tensor_dict)[source]
caffe2_export(tensorizers, tensor_dict, path, export_onnx_path=None)[source]
forward(right_encoder_inputs: Tuple[torch.Tensor, ...], left_encoder_inputs: Tuple[torch.Tensor, ...], *args) → List[torch.Tensor][source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

classmethod from_config(config: pytext.models.two_tower_classification_model.TwoTowerClassificationModel.Config, tensorizers: Dict[str, pytext.data.tensorizers.Tensorizer])[source]
graph_mode_quantize(inputs, data_loader, calibration_num_batches=64)[source]

Quantize the model during export with graph mode quantization for linformer encoder.

torchscriptify(tensorizers, traced_model)[source]

Using the traced model, create a ScriptModule which has a nicer API that includes generating tensors from simple data types, and returns classified values according to the output layer (e.g., as a dict mapping class name to score).

trace(inputs)[source]