pytext package

Submodules

pytext.builtin_task module

pytext.builtin_task.add_include(path)[source]

Import tasks (and associated components) from the given folder path.
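
For example, custom tasks kept in a local folder can be pulled in before a config that references them is loaded; the folder name below is a hypothetical placeholder:

from pytext.builtin_task import add_include

add_include("my_project/custom_tasks")  # hypothetical folder containing custom Task components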

pytext.builtin_task.register_builtin_tasks()[source]

pytext.main module

class pytext.main.Attrs[source]

Bases: object

pytext.main.gen_config_impl(task_name, *args, **kwargs)[source]
pytext.main.run_single(rank: int, config_json: str, world_size: int, dist_init_method: Optional[str], metadata: Union[Dict[str, pytext.data.data_handler.CommonMetadata], pytext.data.data_handler.CommonMetadata, None], metric_channels: Optional[List[pytext.metric_reporters.channel.Channel]])[source]
pytext.main.train_model_distributed(config, metric_channels: Optional[List[pytext.metric_reporters.channel.Channel]])[source]

pytext.workflow module

class pytext.workflow.LogitsWriter(results: multiprocessing.context.BaseContext.Queue, output_path: str, use_gzip: bool, ndigits_precision: int)[source]

Bases: object

Writes model logits to a file.

The class is designed for use in an asynchronous process spawned by torch.multiprocessing.spawn, e.g.

logits_writer = LogitsWriter(…)
logits_writer_ctx = torch.multiprocessing.spawn(logits_writer.run, join=False)
logits_writer_ctx.join()
run(process_index)[source]
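
A minimal sketch expanding the example above, assuming the results queue comes from a torch.multiprocessing "spawn" context and using placeholder output settings; the producer side and shutdown signalling are omitted:

import torch.multiprocessing

mp = torch.multiprocessing.get_context("spawn")
queue = mp.Queue()  # the writer drains this queue in its own process
logits_writer = LogitsWriter(queue, "logits.tsv", use_gzip=False, ndigits_precision=4)
logits_writer_ctx = torch.multiprocessing.spawn(logits_writer.run, join=False)
# ... producer puts logits batches on `queue` and signals completion (omitted) ...
logits_writer_ctx.join()
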
pytext.workflow.batch_predict(model_file: str, examples: List[Dict[str, Any]])[source]
pytext.workflow.dict_zip(*dicts, value_only=False)[source]
pytext.workflow.export_saved_model_to_caffe2(saved_model_path: str, export_caffe2_path: str, output_onnx_path: str = None) → None[source]
pytext.workflow.export_saved_model_to_torchscript(saved_model_path: str, path: str, export_config: pytext.config.pytext_config.ExportConfig) → None[source]
pytext.workflow.get_logits(snapshot_path: str, use_cuda_if_available: bool, output_path: Optional[str] = None, test_path: Optional[str] = None, field_names: Optional[List[str]] = None, dump_raw_input: bool = False, batch_size: int = 16, ndigits_precision: int = 0, output_columns: Optional[List[int]] = None, use_gzip: bool = False, device_id: int = 0, fp16: bool = False)[source]
pytext.workflow.prepare_task(config: pytext.config.pytext_config.PyTextConfig, dist_init_url: str = None, device_id: int = 0, rank: int = 0, world_size: int = 1, metric_channels: Optional[List[pytext.metric_reporters.channel.Channel]] = None, metadata: pytext.data.data_handler.CommonMetadata = None) → Tuple[pytext.task.task.Task_Deprecated, pytext.trainers.training_state.TrainingState][source]
pytext.workflow.prepare_task_metadata(config: pytext.config.pytext_config.PyTextConfig) → pytext.data.data_handler.CommonMetadata[source]

Loading the whole dataset into CPU memory in every single process can cause OOMs for data-parallel distributed training. To avoid this, we move the operations that require loading the whole dataset out of the spawned processes and pass the resulting context to every process.
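
A sketch of the intended pattern, with a placeholder config path: the metadata is built once in the parent process and then handed to the per-process training entry points (shown here via train_model's metadata argument rather than an explicit spawn):

import pytext
from pytext import workflow

config = pytext.load_config("my_config.json")             # hypothetical config file
metadata = workflow.prepare_task_metadata(config)         # dataset-heavy work happens once, here
result = workflow.train_model(config, metadata=metadata)  # workers reuse the prebuilt metadata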

pytext.workflow.reload_model_for_multi_export(config: pytext.config.pytext_config.PyTextConfig)[source]
pytext.workflow.save_and_export(config: pytext.config.pytext_config.PyTextConfig, task: pytext.task.task.Task_Deprecated, metric_channels: Optional[List[pytext.metric_reporters.channel.Channel]] = None) → None[source]
pytext.workflow.save_pytext_snapshot(config: pytext.config.pytext_config.PyTextConfig) → None[source]
pytext.workflow.test_model(test_config: pytext.config.pytext_config.TestConfig, metric_channels: Optional[List[pytext.metric_reporters.channel.Channel]], test_out_path: str) → Any[source]
pytext.workflow.test_model_from_snapshot_path(snapshot_path: str, use_cuda_if_available: bool, test_path: Optional[str] = None, metric_channels: Optional[List[pytext.metric_reporters.channel.Channel]] = None, test_out_path: str = '', field_names: Optional[List[str]] = None)[source]
pytext.workflow.train_model(config: pytext.config.pytext_config.PyTextConfig, dist_init_url: str = None, device_id: int = 0, rank: int = 0, world_size: int = 1, metric_channels: Optional[List[pytext.metric_reporters.channel.Channel]] = None, metadata: pytext.data.data_handler.CommonMetadata = None) → Tuple[source]

Module contents

pytext.batch_predict_caffe2_model(pytext_model_file: str, caffe2_model_file: str, db_type: str = 'minidb', data_source: Optional[pytext.data.sources.data_source.DataSource] = None, use_cuda=False, task: Optional[pytext.task.new_task.NewTask] = None, train_config: Optional[pytext.config.pytext_config.PyTextConfig] = None, cache_size: int = 0)[source]

Gets predictions from a Caffe2 model for a batch of examples (see the example after the parameter list).

Parameters:
  • pytext_model_file – Path to the PyText model file (required if task and train_config are not specified)
  • caffe2_model_file – Path to caffe2 model file
  • db_type – DB type to use for caffe2
  • data_source – Data source for test examples
  • use_cuda – Whether to turn on CUDA processing
  • task – The pytext task object
  • train_config – The pytext training config
  • cache_size – The LRU cache size to use for prediction. 0 = no cache, -1 = unbounded cache, [1, inf) = cache size
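
A hedged example with placeholder file paths; only the two model files are supplied and the remaining arguments keep the defaults shown above:

import pytext

results = pytext.batch_predict_caffe2_model(
    pytext_model_file="/tmp/model_snapshot.pt",       # hypothetical PyText snapshot
    caffe2_model_file="/tmp/model.caffe2.predictor",  # hypothetical exported Caffe2 model
)
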
pytext.create_predictor(config: pytext.config.pytext_config.PyTextConfig, model_file: Optional[str] = None, db_type: str = 'minidb', task: Optional[pytext.task.new_task.NewTask] = None, cache_size: int = 0) → Callable[[Mapping[str, str]], Mapping[str, numpy.array]][source]

Create a simple prediction API from a training config and an exported Caffe2 model file. This model file should be created by calling export on a trained model snapshot.
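
For example, assuming a config and exported model produced by a prior training run (the paths and the input field name below are placeholders that depend on the model's configuration):

import pytext

config = pytext.load_config("my_config.json")
predictor = pytext.create_predictor(config, "model.caffe2.predictor")
result = predictor({"text": "set an alarm for 7 am"})  # field names must match the model's input fields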

pytext.load_config(filename: str) → pytext.config.pytext_config.PyTextConfig[source]

Load a PyText configuration file from a file path. See pytext.config.pytext_config for more info on configs.
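
For example, with a hypothetical config file on disk (the loaded PyTextConfig exposes the configured task and training settings as attributes):

import pytext

config = pytext.load_config("my_config.json")  # hypothetical path
print(config.task)  # inspect the task section of the loaded PyTextConfig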