SeqTokenTensorizer.Config

Component: SeqTokenTensorizer

class SeqTokenTensorizer.Config

Bases: Tensorizer.Config

All Attributes (including base classes)

is_input: bool = True
column: str = 'text_seq'
max_seq_len: Optional[int] = None
add_bos_token: bool = False
Sentence markers (this flag and the two that follow): beginning-of-sentence (BOS) and end-of-sentence (EOS) tokens.
add_eos_token: bool = False
use_eos_token_for_bos: bool = False
add_bol_token: bool = False
List markers (this flag and the two that follow): beginning-of-list (BOL) and end-of-list (EOL) tokens.
add_eol_token: bool = False
use_eol_token_for_bol: bool = False
tokenizer: Tokenizer.Config = Tokenizer.Config()
The tokenizer to use to split input text into tokens.
max_turn: int = 50

Default JSON

{
    "is_input": true,
    "column": "text_seq",
    "max_seq_len": null,
    "add_bos_token": false,
    "add_eos_token": false,
    "use_eos_token_for_bos": false,
    "add_bol_token": false,
    "add_eol_token": false,
    "use_eol_token_for_bol": false,
    "tokenizer": {
        "Tokenizer": {
            "split_regex": "\\s+",
            "lowercase": true,
            "use_byte_offsets": false
        }
    },
    "max_turn": 50
}
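
The defaults above can be used as a base to which a few overrides are applied. The following is a minimal sketch in plain Python that does not import the library. The helper `with_overrides` and the chosen override values are hypothetical and only illustrate the shape of the resulting JSON. Rejecting unknown keys is an assumption made for this example, not documented library behavior.

```python
import copy
import json

# Defaults copied from the "Default JSON" listing above.
DEFAULT_CONFIG = {
    "is_input": True,
    "column": "text_seq",
    "max_seq_len": None,
    "add_bos_token": False,
    "add_eos_token": False,
    "use_eos_token_for_bos": False,
    "add_bol_token": False,
    "add_eol_token": False,
    "use_eol_token_for_bol": False,
    "tokenizer": {
        "Tokenizer": {
            "split_regex": r"\s+",
            "lowercase": True,
            "use_byte_offsets": False,
        }
    },
    "max_turn": 50,
}


def with_overrides(defaults, overrides):
    """Hypothetical helper: copy `defaults` and replace top-level keys.

    Raises KeyError for keys that are not in the documented defaults,
    so a misspelled attribute name is caught early.
    """
    unknown = sorted(set(overrides) - set(defaults))
    if unknown:
        raise KeyError(f"unknown config keys: {unknown}")
    merged = copy.deepcopy(defaults)  # leave the defaults untouched
    merged.update(overrides)
    return merged


# Example overrides (illustrative values only): cap the sequence length
# and turn on both sentence markers.
config = with_overrides(
    DEFAULT_CONFIG,
    {"max_seq_len": 128, "add_bos_token": True, "add_eos_token": True},
)

# Serialize back to JSON in the same layout as the default listing.
print(json.dumps(config, indent=4))
```

Only the three overridden keys differ from the default listing. The nested `tokenizer` entry is deep-copied, so later changes to `config` do not affect `DEFAULT_CONFIG`.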