BERTTensorizerBase.Config
Component: BERTTensorizerBase
class BERTTensorizerBase.Config
Bases: Tensorizer.Config
All Attributes (including base classes)
- is_input: bool = True
- columns: list[str] = ['text']
- tokenizer: Tokenizer.Config = Tokenizer.Config()
- base_tokenizer: Optional[Tokenizer.Config] = None
- vocab_file: str = ''
- max_seq_len: int = 256
Subclasses
- BERTTensorizer.Config
- BERTContextTensorizerForDenseRetrieval.Config
- RoBERTaContextTensorizerForDenseRetrieval.Config
- RoBERTaTensorizer.Config
- RoBERTaTokenLevelTensorizer.Config
- SquadForBERTTensorizer.Config
- SquadForBERTTensorizerForKD.Config
- SquadForRoBERTaTensorizer.Config
- SquadForRoBERTaTensorizerForKD.Config
Default JSON
{
    "is_input": true,
    "columns": [
        "text"
    ],
    "tokenizer": {
        "Tokenizer": {
            "split_regex": "\\s+",
            "lowercase": true,
            "use_byte_offsets": false
        }
    },
    "base_tokenizer": null,
    "vocab_file": "",
    "max_seq_len": 256
}
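
As an illustration of how the defaults listed under Default JSON might be overridden, here is a minimal sketch that uses only the Python standard library and does not call PyText itself. The helper name with_overrides and the vocabulary path are hypothetical and chosen for illustration. The sketch only edits a plain dictionary that mirrors the default values and rejects keys that are not among the documented attributes.

```python
import json

# Default values copied from the Default JSON listing for this config.
DEFAULT_CONFIG = {
    "is_input": True,
    "columns": ["text"],
    "tokenizer": {
        "Tokenizer": {
            "split_regex": "\\s+",
            "lowercase": True,
            "use_byte_offsets": False,
        }
    },
    "base_tokenizer": None,
    "vocab_file": "",
    "max_seq_len": 256,
}


def with_overrides(defaults, overrides):
    """Hypothetical helper: copy `defaults` and apply top-level overrides."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    # A JSON round trip gives a deep copy, so the defaults stay untouched.
    merged = json.loads(json.dumps(defaults))
    merged.update(overrides)
    return merged


config = with_overrides(
    DEFAULT_CONFIG,
    # "/path/to/vocab.txt" is a placeholder, not a real file.
    {"max_seq_len": 128, "vocab_file": "/path/to/vocab.txt"},
)
print(json.dumps(config, indent=4))
```

Rejecting unknown keys up front is a design choice of this sketch: a misspelled attribute name such as max_seq_length fails immediately instead of being silently carried along.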