SeqTokenTensorizer.Config
Component: SeqTokenTensorizer
class SeqTokenTensorizer.Config
Bases: Tensorizer.Config
All Attributes (including base classes)
- is_input: bool = True
- column: str = 'text_seq'
- max_seq_len: Optional[int] = None
- add_bos_token: bool = False
  - sentence markers
- add_eos_token: bool = False
- use_eos_token_for_bos: bool = False
- add_bol_token: bool = False
  - list markers
- add_eol_token: bool = False
- use_eol_token_for_bol: bool = False
- tokenizer: Tokenizer.Config = Tokenizer.Config()
  - The tokenizer to use to split input text into tokens.
- max_turn: int = 50
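The attribute names suggest how the marker flags combine, and a small sketch may make that concrete. The following is not the library's implementation: `MarkerConfig`, `tokenize`, `mark_turns`, and the marker strings `<bos>`, `<eos>`, `<bol>`, `<eol>` are hypothetical names introduced for illustration. It also assumes that `max_seq_len` truncates each sequence before markers are added and that `max_turn` caps the number of sequences, neither of which this page states.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Placeholder marker strings (hypothetical, not taken from the library).
BOS, EOS, BOL, EOL = "<bos>", "<eos>", "<bol>", "<eol>"


@dataclass
class MarkerConfig:
    """Hypothetical stand-in that mirrors the documented field names."""

    max_seq_len: Optional[int] = None
    add_bos_token: bool = False
    add_eos_token: bool = False
    use_eos_token_for_bos: bool = False
    add_bol_token: bool = False
    add_eol_token: bool = False
    use_eol_token_for_bol: bool = False
    max_turn: int = 50


def tokenize(text: str) -> List[str]:
    # Mirrors the default tokenizer settings: lowercase, split on whitespace.
    return [tok for tok in re.split(r"\s+", text.lower()) if tok]


def mark_turns(turns: List[str], cfg: MarkerConfig) -> List[List[str]]:
    # Assumption: the "use_*_for_*" flags substitute the closing marker
    # for the opening one.
    bos = EOS if cfg.use_eos_token_for_bos else BOS
    bol = EOL if cfg.use_eol_token_for_bol else BOL

    result = []
    for text in turns[: cfg.max_turn]:  # assumption: max_turn caps the list
        tokens = tokenize(text)
        if cfg.max_seq_len is not None:
            tokens = tokens[: cfg.max_seq_len]  # assumption: truncate first
        if cfg.add_bos_token:
            tokens = [bos] + tokens  # sentence markers wrap each sequence
        if cfg.add_eos_token:
            tokens = tokens + [EOS]
        result.append(tokens)

    if cfg.add_bol_token:
        result = [[bol]] + result  # list markers wrap the whole list
    if cfg.add_eol_token:
        result = result + [[EOL]]
    return result


cfg = MarkerConfig(
    add_bos_token=True, add_eos_token=True, add_eol_token=True, max_seq_len=2
)
print(mark_turns(["Hello there world", "Bye"], cfg))
```

In this sketch the sentence markers are applied per sequence, while the list markers are added once around the whole list as single-token sequences.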
Default JSON
{
    "is_input": true,
    "column": "text_seq",
    "max_seq_len": null,
    "add_bos_token": false,
    "add_eos_token": false,
    "use_eos_token_for_bos": false,
    "add_bol_token": false,
    "add_eol_token": false,
    "use_eol_token_for_bol": false,
    "tokenizer": {
        "Tokenizer": {
            "split_regex": "\\s+",
            "lowercase": true,
            "use_byte_offsets": false
        }
    },
    "max_turn": 50
}
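One way to work with the default JSON is to load it with the standard `json` module and overlay only the fields that differ. The sketch below uses plain Python and does not call any library loader; `merge_config` is a hypothetical helper, and rejecting unknown top-level keys is a design choice made here, not documented behavior.

```python
import copy
import json

# Raw string so the JSON escape "\\s+" reaches json.loads unchanged.
DEFAULT_JSON = r"""
{
    "is_input": true,
    "column": "text_seq",
    "max_seq_len": null,
    "add_bos_token": false,
    "add_eos_token": false,
    "use_eos_token_for_bos": false,
    "add_bol_token": false,
    "add_eol_token": false,
    "use_eol_token_for_bol": false,
    "tokenizer": {
        "Tokenizer": {
            "split_regex": "\\s+",
            "lowercase": true,
            "use_byte_offsets": false
        }
    },
    "max_turn": 50
}
"""

DEFAULTS = json.loads(DEFAULT_JSON)


def merge_config(defaults: dict, overrides: dict, strict: bool = True) -> dict:
    """Return a copy of `defaults` with `overrides` applied recursively.

    With strict=True, unknown top-level keys raise KeyError so that a
    misspelled field name is caught early. Nested dicts are merged leniently.
    """
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if strict and key not in merged:
            raise KeyError(f"unknown config field: {key}")
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value, strict=False)
        else:
            merged[key] = value
    return merged


config = merge_config(
    DEFAULTS,
    {"add_bos_token": True, "tokenizer": {"Tokenizer": {"lowercase": False}}},
)
print(json.dumps(config, indent=4))
```

Because the helper deep-copies its input, `DEFAULTS` stays unchanged and can be reused for several variants.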