WordPieceTokenizer.Config
Component: WordPieceTokenizer
class WordPieceTokenizer.Config
Bases: ConfigBase
All Attributes (including base classes)
- basic_tokenizer: BERTInitialTokenizer.Config = BERTInitialTokenizer.Config()
- wordpiece_vocab_path: str = 'manifold://nlp_technologies/tree/huggingface-models/bert-base-uncased/vocab.txt'
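To illustrate how these two attributes might work together, here is a minimal sketch, not the library's implementation. It assumes the basic tokenizer splits on a regex and optionally lowercases, and that the WordPiece step then breaks each token into vocabulary pieces by greedy longest-match-first. The function names (`basic_tokenize`, `wordpiece`, `tokenize`), the `[UNK]` marker, the `##` continuation prefix, and the toy vocabulary are all assumptions for illustration; a real setup would read the vocabulary from the file at `wordpiece_vocab_path`.

```python
import re
from typing import Dict, List


def basic_tokenize(text: str, split_regex: str = r"\s+", lowercase: bool = True) -> List[str]:
    """Split text on split_regex and optionally lowercase it (hypothetical helper)."""
    if lowercase:
        text = text.lower()
    # Drop empty strings produced by leading or trailing separators.
    return [tok for tok in re.split(split_regex, text) if tok]


def wordpiece(token: str, vocab: Dict[str, int], unk: str = "[UNK]") -> List[str]:
    """Greedy longest-match-first split of one token into vocabulary pieces."""
    pieces: List[str] = []
    start = 0
    while start < len(token):
        end = len(token)
        match = None
        # Shrink the candidate from the right until it is found in the vocabulary.
        while start < end:
            candidate = token[start:end]
            if start > 0:
                candidate = "##" + candidate  # assumed continuation-piece prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece matched at this position: treat the whole token as unknown.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Toy vocabulary standing in for the file at wordpiece_vocab_path (assumption).
toy_vocab = {
    piece: index
    for index, piece in enumerate(["[UNK]", "word", "##piece", "token", "##izer", "##s"])
}


def tokenize(text: str) -> List[str]:
    """Run the basic tokenizer, then split every token into word pieces."""
    return [piece for tok in basic_tokenize(text) for piece in wordpiece(tok, toy_vocab)]


print(tokenize("WordPiece Tokenizers"))
```

With the toy vocabulary above, any token that cannot be fully covered by vocabulary pieces collapses to the single unknown marker rather than a partial split.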
Default JSON
{
    "basic_tokenizer": {
        "split_regex": "\\s+",
        "lowercase": true,
        "use_byte_offsets": false
    },
    "wordpiece_vocab_path": "manifold://nlp_technologies/tree/huggingface-models/bert-base-uncased/vocab.txt"
}
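As a rough sketch of how the default JSON could be adapted, the snippet below parses it with Python's standard `json` module and overrides `wordpiece_vocab_path`. The helper `with_vocab_path` and the replacement path `/path/to/vocab.txt` are hypothetical placeholders; how the resulting dictionary is passed to the tokenizer depends on the surrounding framework and is not shown here.

```python
import copy
import json

# The default JSON from this page, kept as a raw string so the
# escaped backslash in "\\s+" reaches the JSON parser unchanged.
DEFAULT_JSON = r"""
{
    "basic_tokenizer": {
        "split_regex": "\\s+",
        "lowercase": true,
        "use_byte_offsets": false
    },
    "wordpiece_vocab_path": "manifold://nlp_technologies/tree/huggingface-models/bert-base-uncased/vocab.txt"
}
"""


def with_vocab_path(config: dict, vocab_path: str) -> dict:
    """Return a copy of config pointing at a different vocabulary file (hypothetical helper)."""
    updated = copy.deepcopy(config)  # leave the defaults untouched
    updated["wordpiece_vocab_path"] = vocab_path
    return updated


default_config = json.loads(DEFAULT_JSON)
# "/path/to/vocab.txt" is a placeholder, not a real location.
local_config = with_vocab_path(default_config, "/path/to/vocab.txt")

print(json.dumps(local_config, indent=4))
```

Copying before modifying keeps the parsed defaults available for comparison or reuse.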