BERTTensorizerBase.Config

Component: BERTTensorizerBase

class BERTTensorizerBase.Config[source]

Bases: Tensorizer.Config

All Attributes (including base classes)

is_input: bool = True
columns: list[str] = ['text']
tokenizer: Tokenizer.Config = Tokenizer.Config()
base_tokenizer: Optional[Tokenizer.Config] = None
vocab_file: str = ''
max_seq_len: int = 256
Subclasses
  • BERTTensorizer.Config
  • BERTContextTensorizerForDenseRetrieval.Config
  • RoBERTaContextTensorizerForDenseRetrieval.Config
  • RoBERTaTensorizer.Config
  • RoBERTaTokenLevelTensorizer.Config
  • SquadForBERTTensorizer.Config
  • SquadForBERTTensorizerForKD.Config
  • SquadForRoBERTaTensorizer.Config
  • SquadForRoBERTaTensorizerForKD.Config

Default JSON

{
    "is_input": true,
    "columns": [
        "text"
    ],
    "tokenizer": {
        "Tokenizer": {
            "split_regex": "\\s+",
            "lowercase": true,
            "use_byte_offsets": false
        }
    },
    "base_tokenizer": null,
    "vocab_file": "",
    "max_seq_len": 256
}
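
To use non-default values, a config JSON only needs to specify the fields that differ from the defaults above. As an illustrative sketch, the fragment below shortens `max_seq_len` and points `vocab_file` at a local vocabulary file; the enclosing `"tensorizer"` key and the file path are hypothetical and depend on where this tensorizer sits in the surrounding task/model config:

```
{
    "tensorizer": {
        "columns": ["text"],
        "vocab_file": "vocab.txt",
        "max_seq_len": 128,
        "tokenizer": {
            "Tokenizer": {
                "split_regex": "\\s+",
                "lowercase": true,
                "use_byte_offsets": false
            }
        }
    }
}
```

Fields left out (here `is_input` and `base_tokenizer`) fall back to the defaults shown in the Default JSON.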