pytext package

Submodules

pytext.builtin_task module

pytext.builtin_task.add_include(path)[source]

Import tasks (and associated components) from the given folder path.
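
For example, custom tasks kept in a local folder can be pulled in before a config that references them is loaded; the folder name below is a hypothetical placeholder:

from pytext.builtin_task import add_include

add_include("my_project/custom_tasks")  # hypothetical folder containing custom Task components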

pytext.builtin_task.register_builtin_tasks()[source]

pytext.main module

class pytext.main.Attrs[source]

Bases: object

pytext.main.gen_config_impl(task_name, *args, **kwargs)[source]
pytext.main.run_single(rank: int, config_json: str, world_size: int, dist_init_method: Optional[str], metadata: Union[Dict[str, pytext.data.data_handler.CommonMetadata], pytext.data.data_handler.CommonMetadata, None], metric_channels: Optional[List[pytext.metric_reporters.channel.Channel]])[source]
pytext.main.train_model_distributed(config, metric_channels: Optional[List[pytext.metric_reporters.channel.Channel]])[source]

pytext.workflow module

class pytext.workflow.LogitsWriter(results: multiprocessing.context.BaseContext.Queue, output_path: str, use_gzip: bool, ndigits_precision: int)[source]

Bases: object

Writes model logits to a file.

The class is designed for use in an asynchronous process spawned by torch.multiprocessing.spawn, e.g.

logits_writer = LogitsWriter(…)
logits_writer_ctx = torch.multiprocessing.spawn(logits_writer.run, join=False)
logits_writer_ctx.join()
run(process_index)[source]
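
A minimal sketch expanding the example above, assuming the results queue comes from a torch.multiprocessing "spawn" context and using placeholder output settings; the producer side and shutdown signalling are omitted:

import torch.multiprocessing

mp = torch.multiprocessing.get_context("spawn")
queue = mp.Queue()  # the writer drains this queue in its own process
logits_writer = LogitsWriter(queue, "logits.tsv", use_gzip=False, ndigits_precision=4)
logits_writer_ctx = torch.multiprocessing.spawn(logits_writer.run, join=False)
# ... producer puts logits batches on `queue` and signals completion (omitted) ...
logits_writer_ctx.join()
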
pytext.workflow.batch_predict(model_file: str, examples: List[Dict[str, Any]])[source]
pytext.workflow.dict_zip(*dicts, value_only=False)[source]
pytext.workflow.export_saved_model_to_caffe2(saved_model_path: str, export_caffe2_path: str, output_onnx_path: str = None) → None[source]
pytext.workflow.export_saved_model_to_torchscript(saved_model_path: str, path: str, export_config: pytext.config.pytext_config.ExportConfig) → None[source]
pytext.workflow.get_logits(snapshot_path: str, use_cuda_if_available: bool, output_path: Optional[str] = None, test_path: Optional[str] = None, field_names: Optional[List[str]] = None, dump_raw_input: bool = False, batch_size: int = 16, ndigits_precision: int = 0, output_columns: Optional[List[int]] = None, use_gzip: bool = False, device_id: int = 0, fp16: bool = False)[source]
pytext.workflow.prepare_task(config: pytext.config.pytext_config.PyTextConfig, dist_init_url: str = None, device_id: int = 0, rank: int = 0, world_size: int = 1, metric_channels: Optional[List[pytext.metric_reporters.channel.Channel]] = None, metadata: pytext.data.data_handler.CommonMetadata = None) → Tuple[pytext.task.task.Task_Deprecated, pytext.trainers.training_state.TrainingState][source]
pytext.workflow.prepare_task_metadata(config: pytext.config.pytext_config.PyTextConfig) → pytext.data.data_handler.CommonMetadata[source]

Loading the whole dataset into CPU memory in every single process can cause OOMs for data-parallel distributed training. To avoid this, we move the operations that require loading the whole dataset out of the spawned processes and pass the resulting context to every process.
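
A sketch of the intended pattern, with a placeholder config path: the metadata is built once in the parent process and then handed to the per-process training entry points (shown here via train_model's metadata argument rather than an explicit spawn):

import pytext
from pytext import workflow

config = pytext.load_config("my_config.json")             # hypothetical config file
metadata = workflow.prepare_task_metadata(config)         # dataset-heavy work happens once, here
result = workflow.train_model(config, metadata=metadata)  # workers reuse the prebuilt metadata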

pytext.workflow.reload_model_for_multi_export(config: pytext.config.pytext_config.PyTextConfig)[source]
pytext.workflow.save_and_export(config: pytext.config.pytext_config.PyTextConfig, task: pytext.task.task.Task_Deprecated, metric_channels: Optional[List[pytext.metric_reporters.channel.Channel]] = None) → None[source]
pytext.workflow.save_pytext_snapshot(config: pytext.config.pytext_config.PyTextConfig) → None[source]
pytext.workflow.test_model(test_config: pytext.config.pytext_config.TestConfig, metric_channels: Optional[List[pytext.metric_reporters.channel.Channel]], test_out_path: str) → Any[source]
pytext.workflow.test_model_from_snapshot_path(snapshot_path: str, use_cuda_if_available: bool, test_path: Optional[str] = None, metric_channels: Optional[List[pytext.metric_reporters.channel.Channel]] = None, test_out_path: str = '', field_names: Optional[List[str]] = None)[source]
pytext.workflow.train_model(config: pytext.config.pytext_config.PyTextConfig, dist_init_url: str = None, device_id: int = 0, rank: int = 0, world_size: int = 1, metric_channels: Optional[List[pytext.metric_reporters.channel.Channel]] = None, metadata: pytext.data.data_handler.CommonMetadata = None) → Tuple[source]

Module contents

pytext.batch_predict_caffe2_model(pytext_model_file: str, caffe2_model_file: str, db_type: str = 'minidb', data_source: Optional[pytext.data.sources.data_source.DataSource] = None, use_cuda=False, task: Optional[pytext.task.new_task.NewTask] = None, train_config: Optional[pytext.config.pytext_config.PyTextConfig] = None, cache_size: int = 0)[source]

Gets predictions from a Caffe2 model for a batch of examples (see the example after the parameter list).

Parameters:
  • pytext_model_file – Path to the PyText model file (required if task and train_config are not specified)
  • caffe2_model_file – Path to caffe2 model file
  • db_type – DB type to use for caffe2
  • data_source – Data source for test examples
  • use_cuda – Whether to turn on CUDA processing
  • task – The pytext task object
  • train_config – The pytext training config
  • cache_size – The LRU cache size to use for prediction. 0 = no cache, -1 = unbounded cache, [1, inf) = cache size
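
A hedged example with placeholder file paths; only the two model files are supplied and the remaining arguments keep the defaults shown above:

import pytext

results = pytext.batch_predict_caffe2_model(
    pytext_model_file="/tmp/model_snapshot.pt",       # hypothetical PyText snapshot
    caffe2_model_file="/tmp/model.caffe2.predictor",  # hypothetical exported Caffe2 model
)
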
pytext.create_predictor(config: pytext.config.pytext_config.PyTextConfig, model_file: Optional[str] = None, db_type: str = 'minidb', task: Optional[pytext.task.new_task.NewTask] = None, cache_size: int = 0) → Callable[[Mapping[str, str]], Mapping[str, numpy.array]][source]

Create a simple prediction API from a training config and an exported Caffe2 model file. This model file should be created by calling export on a trained model snapshot.
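
For example, assuming a config and exported model produced by a prior training run (the paths and the input field name below are placeholders that depend on the model's configuration):

import pytext

config = pytext.load_config("my_config.json")
predictor = pytext.create_predictor(config, "model.caffe2.predictor")
result = predictor({"text": "set an alarm for 7 am"})  # field names must match the model's input fields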

pytext.load_config(filename: str) → pytext.config.pytext_config.PyTextConfig[source]

Load a PyText configuration file from a file path. See pytext.config.pytext_config for more info on configs.
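
For example, with a hypothetical config file on disk (the loaded PyTextConfig exposes the configured task and training settings as attributes):

import pytext

config = pytext.load_config("my_config.json")  # hypothetical path
print(config.task)  # inspect the task section of the loaded PyTextConfig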