pytext.task package

Submodules

pytext.task.disjoint_multitask module

class pytext.task.disjoint_multitask.DisjointMultitask(target_task_name, exporters, **kwargs)[source]

Bases: pytext.task.task.TaskBase

Modules which have the same shared_module_key and type share parameters. Only the first instance of such module should be configured in tasks list.

export(multitask_model, export_path, metric_channels, export_onnx_path=None)[source]

Wrapper method to export PyTorch model to Caffe2 model using Exporter.

Parameters:
  • export_path (str) – file path of exported caffe2 model
  • metric_channels – output the PyTorch model’s execution graph to
  • export_onnx_path (str) – file path of exported onnx model
classmethod from_config(task_config: pytext.task.disjoint_multitask.DisjointMultitask.Config, metadata=None, model_state=None, tensorizers=None, rank=0, world_size=1)[source]

Create the task from config, and optionally load metadata/model_state This function will create components including DataHandler, Trainer, MetricReporter, Exporter, and wire them up.

Parameters:
  • task_config (Task.Config) – the config of the current task
  • metadata – saved global context of this task, e.g: vocabulary, will be generated by DataHandler if it’s None
  • model_state – saved model parameters, will be loaded into model when given
class pytext.task.disjoint_multitask.NewDisjointMultitask(data: pytext.data.data.Data, model: pytext.models.model.BaseModel, metric_reporter: Optional[pytext.metric_reporters.metric_reporter.MetricReporter] = None, trainer: Optional[pytext.trainers.trainer.TaskTrainer] = None)[source]

Bases: pytext.task.new_task._NewTask

Multitask training based on underlying subtasks. To share parameters between modules from different tasks, specify the same shared_module_key. Only the first instance of each shared module should be configured in tasks list. Only the multitask trainer (not the per-task trainers) is used.

export(model, export_path, metric_channels=None, export_onnx_path=None)[source]

Wrapper method to export PyTorch model to Caffe2 model using Exporter.

Parameters:
  • export_path (str) – file path of exported caffe2 model
  • metric_channels (List[Channel]) – outputs of model’s execution graph
  • export_onnx_path (str) – file path of exported onnx model
classmethod from_config(task_config: pytext.task.disjoint_multitask.NewDisjointMultitask.Config, unused_metadata=None, model_state=None, tensorizers=None, rank=0, world_size=1)[source]

Create the task from config, and optionally load metadata/model_state This function will create components including DataHandler, Trainer, MetricReporter, Exporter, and wire them up.

Parameters:
  • task_config (Task.Config) – the config of the current task
  • metadata – saved global context of this task, e.g: vocabulary, will be generated by DataHandler if it’s None
  • model_state – saved model parameters, will be loaded into model when given
torchscript_export(model, export_path, quantize=False)[source]

pytext.task.new_task module

class pytext.task.new_task.NewTask(data: pytext.data.data.Data, model: pytext.models.model.BaseModel, metric_reporter: Optional[pytext.metric_reporters.metric_reporter.MetricReporter] = None, trainer: Optional[pytext.trainers.trainer.TaskTrainer] = None)[source]

Bases: pytext.task.new_task._NewTask

pytext.task.new_task.create_schema(tensorizers: Dict[str, pytext.data.tensorizers.Tensorizer], extra_schema: Optional[Dict[str, Type[CT_co]]] = None) → Dict[str, Type[CT_co]][source]
pytext.task.new_task.create_tensorizers(model_inputs: Union[pytext.models.model.BaseModel.Config.ModelInput, Dict[str, pytext.data.tensorizers.Tensorizer.Config]]) → Dict[str, pytext.data.tensorizers.Tensorizer][source]

pytext.task.serialize module

class pytext.task.serialize.CheckpointManager[source]

Bases: object

CheckpointManager is class abstraction to manage training job’s checkpoints with different IO and storage, using two functions: save() and load().

DELIMITER = '-'
generate_checkpoint_path(config: pytext.config.pytext_config.PyTextConfig, identifier: str)[source]
get_latest_checkpoint_path() → str[source]

Return most recent saved checkpoint path in str Returns: checkpoint_path (str)

get_post_training_snapshot_path() → str[source]
list() → List[str][source]

Return all existing checkpoint path in str Returns: checkpoint_path_list (List[str]), list elements are in the same order of checkpoint saving

load(load_path: str, overwrite_config=None)[source]

Loads a checkpoint from disk. :param load_path: the file path to load for checkpoint :type load_path: str

Returns: task (Task), config (PyTextConfig) and training_state (TrainingState)

save(config: pytext.config.pytext_config.PyTextConfig, model: pytext.models.model.Model, meta: Optional[pytext.data.data_handler.CommonMetadata], tensorizers: Dict[str, pytext.data.tensorizers.Tensorizer], training_state: Optional[pytext.trainers.training_state.TrainingState] = None, identifier: str = None) → str[source]

save a checkpoint to given path, config, model and training_state together represent the checkpoint. When identifier is None, this function is used to save post-training snapshot

pytext.task.serialize.get_latest_checkpoint_path(dir_path: Optional[str] = None) → str[source]

Get the latest checkpoint path :param dir_path: the dir to scan for existing checkpoint files. Default: if None, :param the latest checkpoint path saved in momery will be returned:

Returns: checkpoint_path

pytext.task.serialize.get_post_training_snapshot_path() → str[source]
pytext.task.serialize.load(load_path: str, overwrite_config=None)[source]

Load task, config and training state from a saved snapshot by default, it will construct the task using the saved config then load metadata and model state.

if overwrite_task is specified, it will construct the task using overwrite_task then load metadata and model state.

pytext.task.serialize.load_checkpoint(f: io.IOBase, overwrite_config=None)[source]
pytext.task.serialize.load_v1(state)[source]
pytext.task.serialize.load_v2(state)[source]
pytext.task.serialize.load_v3(state, overwrite_config=None)[source]
pytext.task.serialize.register_snapshot_loader(version)[source]
pytext.task.serialize.save(config: pytext.config.pytext_config.PyTextConfig, model: pytext.models.model.Model, meta: Optional[pytext.data.data_handler.CommonMetadata], tensorizers: Dict[str, pytext.data.tensorizers.Tensorizer], training_state: Optional[pytext.trainers.training_state.TrainingState] = None, identifier: Optional[str] = None) → str[source]

Save all stateful information of a training task to a specified file-like object, will save the original config, model state, metadata, training state if training is not completed Args: identifier (str): used to identify a checkpoint within a training job, used as a suffix for save path config (PytextConfig): contains all raw parameter/hyper-parameters for training task model (Model): actual model in training training_state (TrainingState): stateful infomation during training Returns: identifier (str): if identifier is not specified, will save to config.save_snapshot_path to be consistent to post-training snapshot; if specified, will be used to save checkpoint during training, identifier is used to identify checkpoints in the same training

pytext.task.serialize.save_checkpoint(f: io.IOBase, config: pytext.config.pytext_config.PyTextConfig, model: pytext.models.model.Model, meta: Optional[pytext.data.data_handler.CommonMetadata], tensorizers: Dict[str, pytext.data.tensorizers.Tensorizer], training_state: Optional[pytext.trainers.training_state.TrainingState] = None) → str[source]
pytext.task.serialize.set_checkpoint_manager(manager: pytext.task.serialize.CheckpointManager) → None[source]

pytext.task.task module

class pytext.task.task.TaskBase(trainer: pytext.trainers.trainer.Trainer, data_handler: pytext.data.data_handler.DataHandler, model: pytext.models.model.Model, metric_reporter: pytext.metric_reporters.metric_reporter.MetricReporter, exporter: Optional[pytext.exporters.exporter.ModelExporter])[source]

Bases: pytext.config.component.Component

Task is the central place to define and wire up components for data processing, model training, metric reporting, etc. Task class has a Config class containing the config of each component in a descriptive way.

export(model, export_path, metric_channels=None, export_onnx_path=None)[source]

Wrapper method to export PyTorch model to Caffe2 model using Exporter.

Parameters:
  • export_path (str) – file path of exported caffe2 model
  • metric_channels (List[Channel]) – outputs of model’s execution graph
  • export_onnx_path (str) – file path of exported onnx model
classmethod format_prediction(predictions, scores, context, target_meta)[source]

Format the prediction and score from model output, by default just return them in a dict

classmethod from_config(task_config, metadata=None, model_state=None, tensorizers=None, rank=1, world_size=0)[source]

Create the task from config, and optionally load metadata/model_state This function will create components including DataHandler, Trainer, MetricReporter, Exporter, and wire them up.

Parameters:
  • task_config (Task.Config) – the config of the current task
  • metadata – saved global context of this task, e.g: vocabulary, will be generated by DataHandler if it’s None
  • model_state – saved model parameters, will be loaded into model when given
predict(examples)[source]

Generates predictions using PyTorch model. The difference with test() is that this should be used when the the examples do not have any true label/target.

Parameters:examples – json format examples, input names should match the names specified in this task’s features config
test(test_path)[source]

Wrapper method to compute test metrics on holdout blind test dataset.

Parameters:test_path (str) – test data file path
train(train_config, rank=0, world_size=1, training_state=None)[source]

Wrapper method to train the model using Trainer object.

Parameters:
  • train_config (PyTextConfig) – config for training
  • rank (int) – for distributed training only, rank of the gpu, default is 0
  • world_size (int) – for distributed training only, total gpu to use, default is 1
class pytext.task.task.Task_Deprecated(trainer: pytext.trainers.trainer.Trainer, data_handler: pytext.data.data_handler.DataHandler, model: pytext.models.model.Model, metric_reporter: pytext.metric_reporters.metric_reporter.MetricReporter, exporter: Optional[pytext.exporters.exporter.ModelExporter])[source]

Bases: pytext.task.task.TaskBase

pytext.task.task.create_task(task_config, metadata=None, model_state=None, tensorizers=None, rank=0, world_size=1)[source]

Create a task by finding task class in registry and invoking the from_config function of the class, see from_config() for more details

pytext.task.tasks module

class pytext.task.tasks.BertPairRegressionTask(data: pytext.data.data.Data, model: pytext.models.model.BaseModel, metric_reporter: Optional[pytext.metric_reporters.metric_reporter.MetricReporter] = None, trainer: Optional[pytext.trainers.trainer.TaskTrainer] = None)[source]

Bases: pytext.task.tasks.DocumentRegressionTask

class pytext.task.tasks.DocumentClassificationTask(data: pytext.data.data.Data, model: pytext.models.model.BaseModel, metric_reporter: Optional[pytext.metric_reporters.metric_reporter.MetricReporter] = None, trainer: Optional[pytext.trainers.trainer.TaskTrainer] = None)[source]

Bases: pytext.task.new_task.NewTask

classmethod format_prediction(predictions, scores, context, target_names)[source]

Format the prediction and score from model output, by default just return them in a dict

class pytext.task.tasks.DocumentRegressionTask(data: pytext.data.data.Data, model: pytext.models.model.BaseModel, metric_reporter: Optional[pytext.metric_reporters.metric_reporter.MetricReporter] = None, trainer: Optional[pytext.trainers.trainer.TaskTrainer] = None)[source]

Bases: pytext.task.new_task.NewTask

class pytext.task.tasks.EnsembleTask(data: pytext.data.data.Data, model: pytext.models.model.BaseModel, metric_reporter: Optional[pytext.metric_reporters.metric_reporter.MetricReporter] = None, trainer: Optional[pytext.trainers.trainer.TaskTrainer] = None)[source]

Bases: pytext.task.new_task.NewTask

classmethod example_config()[source]
train_single_model(train_config, model_id, rank=0, world_size=1)[source]
class pytext.task.tasks.IntentSlotTask(data: pytext.data.data.Data, model: pytext.models.model.BaseModel, metric_reporter: Optional[pytext.metric_reporters.metric_reporter.MetricReporter] = None, trainer: Optional[pytext.trainers.trainer.TaskTrainer] = None)[source]

Bases: pytext.task.new_task.NewTask

class pytext.task.tasks.LMTask(data: pytext.data.data.Data, model: pytext.models.model.BaseModel, metric_reporter: Optional[pytext.metric_reporters.metric_reporter.MetricReporter] = None, trainer: Optional[pytext.trainers.trainer.TaskTrainer] = None)[source]

Bases: pytext.task.new_task.NewTask

class pytext.task.tasks.MaskedLMTask(data: pytext.data.data.Data, model: pytext.models.model.BaseModel, metric_reporter: Optional[pytext.metric_reporters.metric_reporter.MetricReporter] = None, trainer: Optional[pytext.trainers.trainer.TaskTrainer] = None)[source]

Bases: pytext.task.new_task.NewTask

class pytext.task.tasks.NewBertClassificationTask(data: pytext.data.data.Data, model: pytext.models.model.BaseModel, metric_reporter: Optional[pytext.metric_reporters.metric_reporter.MetricReporter] = None, trainer: Optional[pytext.trainers.trainer.TaskTrainer] = None)[source]

Bases: pytext.task.tasks.DocumentClassificationTask

class pytext.task.tasks.NewBertPairClassificationTask(data: pytext.data.data.Data, model: pytext.models.model.BaseModel, metric_reporter: Optional[pytext.metric_reporters.metric_reporter.MetricReporter] = None, trainer: Optional[pytext.trainers.trainer.TaskTrainer] = None)[source]

Bases: pytext.task.tasks.DocumentClassificationTask

class pytext.task.tasks.PairwiseClassificationTask(data: pytext.data.data.Data, model: pytext.models.model.BaseModel, metric_reporter: Optional[pytext.metric_reporters.metric_reporter.MetricReporter] = None, trainer: Optional[pytext.trainers.trainer.TaskTrainer] = None)[source]

Bases: pytext.task.new_task.NewTask

class pytext.task.tasks.QueryDocumentPairwiseRankingTask(data: pytext.data.data.Data, model: pytext.models.model.BaseModel, metric_reporter: Optional[pytext.metric_reporters.metric_reporter.MetricReporter] = None, trainer: Optional[pytext.trainers.trainer.TaskTrainer] = None)[source]

Bases: pytext.task.new_task.NewTask

class pytext.task.tasks.RoBERTaNERTask(data: pytext.data.data.Data, model: pytext.models.model.BaseModel, metric_reporter: Optional[pytext.metric_reporters.metric_reporter.MetricReporter] = None, trainer: Optional[pytext.trainers.trainer.TaskTrainer] = None)[source]

Bases: pytext.task.new_task.NewTask

classmethod create_metric_reporter(config: pytext.task.tasks.RoBERTaNERTask.Config, tensorizers: Dict[str, pytext.data.tensorizers.Tensorizer])[source]
class pytext.task.tasks.SemanticParsingTask(data: pytext.data.data.Data, model: pytext.models.semantic_parsers.rnng.rnng_parser.RNNGParser, metric_reporter: pytext.metric_reporters.compositional_metric_reporter.CompositionalMetricReporter, trainer: pytext.trainers.hogwild_trainer.HogwildTrainer)[source]

Bases: pytext.task.new_task.NewTask

class pytext.task.tasks.SeqNNTask(data: pytext.data.data.Data, model: pytext.models.model.BaseModel, metric_reporter: Optional[pytext.metric_reporters.metric_reporter.MetricReporter] = None, trainer: Optional[pytext.trainers.trainer.TaskTrainer] = None)[source]

Bases: pytext.task.new_task.NewTask

class pytext.task.tasks.SquadQATask(data: pytext.data.data.Data, model: pytext.models.model.BaseModel, metric_reporter: Optional[pytext.metric_reporters.metric_reporter.MetricReporter] = None, trainer: Optional[pytext.trainers.trainer.TaskTrainer] = None)[source]

Bases: pytext.task.new_task.NewTask

class pytext.task.tasks.WordTaggingTask(data: pytext.data.data.Data, model: pytext.models.model.BaseModel, metric_reporter: Optional[pytext.metric_reporters.metric_reporter.MetricReporter] = None, trainer: Optional[pytext.trainers.trainer.TaskTrainer] = None)[source]

Bases: pytext.task.new_task.NewTask

classmethod create_metric_reporter(config: pytext.task.tasks.WordTaggingTask.Config, tensorizers: Dict[str, pytext.data.tensorizers.Tensorizer])[source]

Module contents

class pytext.task.NewTask(data: pytext.data.data.Data, model: pytext.models.model.BaseModel, metric_reporter: Optional[pytext.metric_reporters.metric_reporter.MetricReporter] = None, trainer: Optional[pytext.trainers.trainer.TaskTrainer] = None)[source]

Bases: pytext.task.new_task._NewTask

class pytext.task.Task_Deprecated(trainer: pytext.trainers.trainer.Trainer, data_handler: pytext.data.data_handler.DataHandler, model: pytext.models.model.Model, metric_reporter: pytext.metric_reporters.metric_reporter.MetricReporter, exporter: Optional[pytext.exporters.exporter.ModelExporter])[source]

Bases: pytext.task.task.TaskBase

class pytext.task.TaskBase(trainer: pytext.trainers.trainer.Trainer, data_handler: pytext.data.data_handler.DataHandler, model: pytext.models.model.Model, metric_reporter: pytext.metric_reporters.metric_reporter.MetricReporter, exporter: Optional[pytext.exporters.exporter.ModelExporter])[source]

Bases: pytext.config.component.Component

Task is the central place to define and wire up components for data processing, model training, metric reporting, etc. Task class has a Config class containing the config of each component in a descriptive way.

export(model, export_path, metric_channels=None, export_onnx_path=None)[source]

Wrapper method to export PyTorch model to Caffe2 model using Exporter.

Parameters:
  • export_path (str) – file path of exported caffe2 model
  • metric_channels (List[Channel]) – outputs of model’s execution graph
  • export_onnx_path (str) – file path of exported onnx model
classmethod format_prediction(predictions, scores, context, target_meta)[source]

Format the prediction and score from model output, by default just return them in a dict

classmethod from_config(task_config, metadata=None, model_state=None, tensorizers=None, rank=1, world_size=0)[source]

Create the task from config, and optionally load metadata/model_state This function will create components including DataHandler, Trainer, MetricReporter, Exporter, and wire them up.

Parameters:
  • task_config (Task.Config) – the config of the current task
  • metadata – saved global context of this task, e.g: vocabulary, will be generated by DataHandler if it’s None
  • model_state – saved model parameters, will be loaded into model when given
predict(examples)[source]

Generates predictions using PyTorch model. The difference with test() is that this should be used when the the examples do not have any true label/target.

Parameters:examples – json format examples, input names should match the names specified in this task’s features config
test(test_path)[source]

Wrapper method to compute test metrics on holdout blind test dataset.

Parameters:test_path (str) – test data file path
train(train_config, rank=0, world_size=1, training_state=None)[source]

Wrapper method to train the model using Trainer object.

Parameters:
  • train_config (PyTextConfig) – config for training
  • rank (int) – for distributed training only, rank of the gpu, default is 0
  • world_size (int) – for distributed training only, total gpu to use, default is 1
pytext.task.save(config: pytext.config.pytext_config.PyTextConfig, model: pytext.models.model.Model, meta: Optional[pytext.data.data_handler.CommonMetadata], tensorizers: Dict[str, pytext.data.tensorizers.Tensorizer], training_state: Optional[pytext.trainers.training_state.TrainingState] = None, identifier: Optional[str] = None) → str[source]

Save all stateful information of a training task to a specified file-like object, will save the original config, model state, metadata, training state if training is not completed Args: identifier (str): used to identify a checkpoint within a training job, used as a suffix for save path config (PytextConfig): contains all raw parameter/hyper-parameters for training task model (Model): actual model in training training_state (TrainingState): stateful infomation during training Returns: identifier (str): if identifier is not specified, will save to config.save_snapshot_path to be consistent to post-training snapshot; if specified, will be used to save checkpoint during training, identifier is used to identify checkpoints in the same training

pytext.task.load(load_path: str, overwrite_config=None)[source]

Load task, config and training state from a saved snapshot by default, it will construct the task using the saved config then load metadata and model state.

if overwrite_task is specified, it will construct the task using overwrite_task then load metadata and model state.

pytext.task.create_task(task_config, metadata=None, model_state=None, tensorizers=None, rank=0, world_size=1)[source]

Create a task by finding task class in registry and invoking the from_config function of the class, see from_config() for more details

pytext.task.get_latest_checkpoint_path(dir_path: Optional[str] = None) → str[source]

Get the latest checkpoint path :param dir_path: the dir to scan for existing checkpoint files. Default: if None, :param the latest checkpoint path saved in momery will be returned:

Returns: checkpoint_path