pytext.data.featurizer package
Submodules
pytext.data.featurizer.featurizer module
class pytext.data.featurizer.featurizer.Featurizer(config, feature_config: pytext.config.field_config.FeatureConfig)
    Bases: pytext.config.component.Component
    Featurizer performs the data preprocessing that must be shared between training and inference, namely tokenization and gazetteer feature alignment.
    This is an interface: subclasses must implement featurize() so that they can be used with the appropriate data handler.

    featurize(input_record: pytext.data.featurizer.featurizer.InputRecord) → pytext.data.featurizer.featurizer.OutputRecord
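The interface contract described above can be illustrated without importing PyText. The following is a minimal sketch, not the library's implementation: `FeaturizerSketch`, `WhitespaceFeaturizer`, the `None` defaults, and the `(start, end)` representation of `token_ranges` are all assumptions made for illustration. Only the record field names and their order come from this page.

```python
import re
from abc import ABC, abstractmethod
from collections import namedtuple

# Stand-ins that mirror the documented field order; NOT imported from pytext.
InputRecord = namedtuple("InputRecord", ["raw_text", "raw_gazetteer_feats", "locale"])
OutputRecord = namedtuple(
    "OutputRecord",
    [
        "tokens", "token_ranges", "gazetteer_feats", "gazetteer_feat_lengths",
        "gazetteer_feat_weights", "characters", "contextual_token_embedding",
        "dense_feats",
    ],
    defaults=(None,) * 8,  # hypothetical defaults, used here only for brevity
)


class FeaturizerSketch(ABC):
    """Hypothetical stand-in for the Featurizer interface."""

    @abstractmethod
    def featurize(self, input_record: InputRecord) -> OutputRecord:
        ...


class WhitespaceFeaturizer(FeaturizerSketch):
    """Hypothetical implementation: lowercased whitespace tokens."""

    def featurize(self, input_record: InputRecord) -> OutputRecord:
        matches = list(re.finditer(r"\S+", input_record.raw_text))
        return OutputRecord(
            tokens=[m.group().lower() for m in matches],
            # Assumed representation: (start, end) character offsets.
            token_ranges=[m.span() for m in matches],
        )


out = WhitespaceFeaturizer().featurize(InputRecord("Play some Jazz", "", "en_US"))
print(out.tokens)        # ['play', 'some', 'jazz']
print(out.token_ranges)  # [(0, 4), (5, 9), (10, 14)]
```

Because the same `featurize()` call is used for training and inference, any normalization placed there (lowercasing in this sketch) is applied identically in both settings.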
class pytext.data.featurizer.featurizer.InputRecord
    Bases: tuple
    Input data contract between Featurizer and DataHandler.

    locale
        Alias for field number 2
    raw_gazetteer_feats
        Alias for field number 1
    raw_text
        Alias for field number 0
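"Alias for field number N" is the standard wording for named-tuple fields: each attribute is a name for a fixed tuple position. The sketch below rebuilds an equivalent tuple with the standard library rather than importing PyText. The example values, including the empty string used for `raw_gazetteer_feats`, are hypothetical placeholders, since this page does not specify the expected formats.

```python
from collections import namedtuple

# Stand-in with the documented field order (0: raw_text,
# 1: raw_gazetteer_feats, 2: locale); NOT imported from pytext.
InputRecord = namedtuple("InputRecord", ["raw_text", "raw_gazetteer_feats", "locale"])

# Hypothetical values, for illustration only.
record = InputRecord(raw_text="play some jazz", raw_gazetteer_feats="", locale="en_US")

# Attribute access and positional access refer to the same slot.
print(record.raw_text == record[0])  # True
print(record.locale == record[2])    # True

# Being a tuple, the record also unpacks positionally.
raw_text, raw_gazetteer_feats, locale = record
```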
class pytext.data.featurizer.featurizer.OutputRecord
    Bases: tuple
    Output data contract between Featurizer and DataHandler.

    characters
        Alias for field number 5
    contextual_token_embedding
        Alias for field number 6
    dense_feats
        Alias for field number 7
    gazetteer_feat_lengths
        Alias for field number 3
    gazetteer_feat_weights
        Alias for field number 4
    gazetteer_feats
        Alias for field number 2
    token_ranges
        Alias for field number 1
    tokens
        Alias for field number 0
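The attributes above are listed alphabetically, so the positional order of the tuple has to be read from the field numbers. As a rough illustration, the stand-in below (built with the standard library, not imported from PyText) lays the fields out in positional order. The `None` placeholders and the sample token values are assumptions made for the example.

```python
from collections import namedtuple

# Field names in positional order, as given by the "field number" entries.
OUTPUT_FIELDS = [
    "tokens",                      # 0
    "token_ranges",                # 1
    "gazetteer_feats",             # 2
    "gazetteer_feat_lengths",      # 3
    "gazetteer_feat_weights",      # 4
    "characters",                  # 5
    "contextual_token_embedding",  # 6
    "dense_feats",                 # 7
]
OutputRecord = namedtuple("OutputRecord", OUTPUT_FIELDS)  # stand-in, not pytext's class

# Print the number-to-name mapping.
for number, name in enumerate(OutputRecord._fields):
    print(number, name)

# Hypothetical record: only the token fields are filled in.
empty = OutputRecord(*[None] * len(OUTPUT_FIELDS))
out = empty._replace(tokens=["play", "jazz"], token_ranges=[(0, 4), (5, 9)])
```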
pytext.data.featurizer.simple_featurizer module
class pytext.data.featurizer.simple_featurizer.SimpleFeaturizer(config, feature_config: pytext.config.field_config.FeatureConfig)
    Bases: pytext.data.featurizer.featurizer.Featurizer
    Simple featurizer for basic tokenization and gazetteer feature alignment.

    featurize(input_record: pytext.data.featurizer.featurizer.InputRecord) → pytext.data.featurizer.featurizer.OutputRecord
        Featurize one instance/example only.

    featurize_batch(input_records: Sequence[pytext.data.featurizer.featurizer.InputRecord]) → Sequence[pytext.data.featurizer.featurizer.OutputRecord]
        Featurize a batch of instances/examples.
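This page does not say how `featurize_batch` relates to `featurize`. One plausible reading, sketched below under that assumption, is that the batch method applies the single-record method to each input and preserves order. `BatchFeaturizerSketch`, its whitespace tokenization, and the `None` defaults are hypothetical and are not taken from PyText.

```python
from collections import namedtuple
from typing import List, Sequence

# Stand-ins with the documented field order; NOT imported from pytext.
InputRecord = namedtuple("InputRecord", ["raw_text", "raw_gazetteer_feats", "locale"])
OutputRecord = namedtuple(
    "OutputRecord",
    [
        "tokens", "token_ranges", "gazetteer_feats", "gazetteer_feat_lengths",
        "gazetteer_feat_weights", "characters", "contextual_token_embedding",
        "dense_feats",
    ],
    defaults=(None,) * 8,  # hypothetical defaults, used here only for brevity
)


class BatchFeaturizerSketch:
    """Hypothetical featurizer with single-record and batch entry points."""

    def featurize(self, input_record: InputRecord) -> OutputRecord:
        # Whitespace splitting is an assumption, not PyText's tokenizer.
        return OutputRecord(tokens=input_record.raw_text.split())

    def featurize_batch(self, input_records: Sequence[InputRecord]) -> List[OutputRecord]:
        # Assumed semantics: one output per input, in the same order.
        return [self.featurize(record) for record in input_records]


featurizer = BatchFeaturizerSketch()
records = [
    InputRecord("play some jazz", "", "en_US"),
    InputRecord("stop", "", "en_US"),
]
batch = featurizer.featurize_batch(records)
print([out.tokens for out in batch])  # [['play', 'some', 'jazz'], ['stop']]
```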
Module contents
class pytext.data.featurizer.Featurizer(config, feature_config: pytext.config.field_config.FeatureConfig)
    Bases: pytext.config.component.Component
    Featurizer performs the data preprocessing that must be shared between training and inference, namely tokenization and gazetteer feature alignment.
    This is an interface: subclasses must implement featurize() so that they can be used with the appropriate data handler.

    featurize(input_record: pytext.data.featurizer.featurizer.InputRecord) → pytext.data.featurizer.featurizer.OutputRecord
class pytext.data.featurizer.InputRecord
    Bases: tuple
    Input data contract between Featurizer and DataHandler.

    locale
        Alias for field number 2
    raw_gazetteer_feats
        Alias for field number 1
    raw_text
        Alias for field number 0
class pytext.data.featurizer.OutputRecord
    Bases: tuple
    Output data contract between Featurizer and DataHandler.

    characters
        Alias for field number 5
    contextual_token_embedding
        Alias for field number 6
    dense_feats
        Alias for field number 7
    gazetteer_feat_lengths
        Alias for field number 3
    gazetteer_feat_weights
        Alias for field number 4
    gazetteer_feats
        Alias for field number 2
    token_ranges
        Alias for field number 1
    tokens
        Alias for field number 0
class pytext.data.featurizer.SimpleFeaturizer(config, feature_config: pytext.config.field_config.FeatureConfig)
    Bases: pytext.data.featurizer.featurizer.Featurizer
    Simple featurizer for basic tokenization and gazetteer feature alignment.

    featurize(input_record: pytext.data.featurizer.featurizer.InputRecord) → pytext.data.featurizer.featurizer.OutputRecord
        Featurize one instance/example only.

    featurize_batch(input_records: Sequence[pytext.data.featurizer.featurizer.InputRecord]) → Sequence[pytext.data.featurizer.featurizer.OutputRecord]
        Featurize a batch of instances/examples.