VocabConfig

Component: Component

class pytext.data.tensorizers.VocabConfig

Bases: Component.Config

All Attributes (including base classes)

build_from_data: bool = True
Whether to add tokens from the training data to the vocab.
size_from_data: int = 0
Add the size_from_data most frequent tokens in the training data to the vocab. If this is 0, add all tokens from the training data.
min_counts: int = 0
Filter out tokens in the training data whose count is smaller than min_counts.
vocab_files: list[VocabFileConfig] = []
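The interaction between build_from_data, size_from_data, and min_counts can be sketched in plain Python. This is a minimal, hypothetical helper (build_vocab_from_data is not part of PyText). It assumes the top-size_from_data cut is applied before the min_counts filter; PyText's actual ordering and tie-breaking may differ.

```python
from collections import Counter
from typing import Iterable, List


def build_vocab_from_data(
    token_streams: Iterable[Iterable[str]],
    build_from_data: bool = True,
    size_from_data: int = 0,
    min_counts: int = 0,
) -> List[str]:
    """Hypothetical sketch of how the VocabConfig fields could combine.

    Assumption: the size cut happens before the min_counts filter.
    """
    # build_from_data=False: contribute nothing from the training data.
    if not build_from_data:
        return []

    # Count every token occurrence across the training data.
    counts: Counter = Counter()
    for tokens in token_streams:
        counts.update(tokens)

    # size_from_data == 0 means "keep all tokens"; otherwise keep the top N.
    if size_from_data > 0:
        ranked = counts.most_common(size_from_data)
    else:
        ranked = counts.most_common()

    # Drop tokens seen fewer than min_counts times (0 keeps everything).
    return [token for token, count in ranked if count >= min_counts]
```

For example, with the training data [["a", "b", "a"], ["a", "c", "b"]], the defaults keep all three tokens, size_from_data=2 keeps the two most frequent ("a" and "b"), and min_counts=3 keeps only "a".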

Default JSON

{
    "build_from_data": true,
    "size_from_data": 0,
    "min_counts": 0,
    "vocab_files": []
}
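To override the defaults, a config can start from the Default JSON and change only the fields it needs. The helper below is a hypothetical sketch that uses only the standard json module; it is not a PyText API, and where the resulting object sits inside a larger PyText config depends on the tensorizer that owns it.

```python
import json

# Defaults mirrored from the Default JSON for VocabConfig.
DEFAULT_VOCAB_CONFIG = {
    "build_from_data": True,
    "size_from_data": 0,
    "min_counts": 0,
    "vocab_files": [],
}


def vocab_config_json(**overrides) -> str:
    """Hypothetical helper: merge overrides into the defaults as JSON."""
    # Reject misspelled field names instead of silently adding them.
    unknown = set(overrides) - set(DEFAULT_VOCAB_CONFIG)
    if unknown:
        raise KeyError(f"unknown VocabConfig fields: {sorted(unknown)}")
    merged = {**DEFAULT_VOCAB_CONFIG, **overrides}
    return json.dumps(merged, indent=4)


print(vocab_config_json(size_from_data=50000, min_counts=2))
```

Here vocab_config_json(size_from_data=50000, min_counts=2) requests at most the 50,000 most frequent training tokens and drops tokens seen fewer than twice, while leaving build_from_data and vocab_files at their defaults.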