BERTTensorizer.Config
Component: BERTTensorizer
class BERTTensorizer.Config
Bases: BERTTensorizerBase.Config
All Attributes (including base classes)
- is_input: bool = True
- columns: list[str] = ['text']
- tokenizer: Tokenizer.Config = WordPieceTokenizer.Config()
- base_tokenizer: Optional[Tokenizer.Config] = None
- vocab_file: str = 'manifold://nlp_technologies/tree/huggingface-models/bert-base-uncased/vocab.txt'
- max_seq_len: int = 256
Subclasses
- BERTContextTensorizerForDenseRetrieval.Config
- SquadForBERTTensorizer.Config
- SquadForBERTTensorizerForKD.Config
Default JSON
{
    "is_input": true,
    "columns": [
        "text"
    ],
    "tokenizer": {
        "WordPieceTokenizer": {
            "basic_tokenizer": {
                "split_regex": "\\s+",
                "lowercase": true,
                "use_byte_offsets": false
            },
            "wordpiece_vocab_path": "manifold://nlp_technologies/tree/huggingface-models/bert-base-uncased/vocab.txt"
        }
    },
    "base_tokenizer": null,
    "vocab_file": "manifold://nlp_technologies/tree/huggingface-models/bert-base-uncased/vocab.txt",
    "max_seq_len": 256
}
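
The default JSON above can serve as a starting point for a customized config. The following is a minimal sketch using only the Python standard library: it parses the defaults and applies a few overrides while rejecting field names that do not appear in the defaults. The helper `override_config`, the override values (`max_seq_len` of 128, a `question` column), and the variable names are illustrative assumptions, not part of the library's API.

```python
import copy
import json

# Default config for BERTTensorizer.Config, copied from the JSON above.
# A raw string keeps the JSON escape "\\s+" intact for json.loads.
DEFAULT_CONFIG = json.loads(r"""
{
    "is_input": true,
    "columns": ["text"],
    "tokenizer": {
        "WordPieceTokenizer": {
            "basic_tokenizer": {
                "split_regex": "\\s+",
                "lowercase": true,
                "use_byte_offsets": false
            },
            "wordpiece_vocab_path": "manifold://nlp_technologies/tree/huggingface-models/bert-base-uncased/vocab.txt"
        }
    },
    "base_tokenizer": null,
    "vocab_file": "manifold://nlp_technologies/tree/huggingface-models/bert-base-uncased/vocab.txt",
    "max_seq_len": 256
}
""")


def override_config(defaults, overrides):
    """Hypothetical helper: return a copy of `defaults` with `overrides` applied.

    Nested dicts are merged recursively. A key that is absent from the
    defaults at the same nesting level raises KeyError, which catches typos
    such as "max_seq_length".
    """
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if key not in merged:
            raise KeyError(f"unknown config field: {key}")
        if isinstance(value, dict) and isinstance(merged[key], dict):
            merged[key] = override_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# Illustrative overrides: a shorter sequence length and a different input column.
custom = override_config(
    DEFAULT_CONFIG,
    {"max_seq_len": 128, "columns": ["question"]},
)

print(json.dumps(custom, indent=4))
```

Because the helper works on a deep copy, `DEFAULT_CONFIG` is left unchanged and can be reused for several variants. Fields that are not overridden, including the nested tokenizer settings, keep the default values listed above.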