GazetteerTensorizer.Config

Component: GazetteerTensorizer

class GazetteerTensorizer.Config

Bases: Tensorizer.Config

All Attributes (including base classes)

is_input: bool = True
text_column: str = 'text'
dict_column: str = 'dict'
tokenizer: Tokenizer.Config = Tokenizer.Config()
Tokenizer used to split the text and create dict tensors of the same size.

Default JSON

{
    "is_input": true,
    "text_column": "text",
    "dict_column": "dict",
    "tokenizer": {
        "Tokenizer": {
            "split_regex": "\\s+",
            "lowercase": true,
            "use_byte_offsets": false
        }
    }
}
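
The following is a minimal sketch of how the default values above could be read and what the tokenizer settings imply for a row of input. It uses only the Python standard library and does not call the library itself; the helper `sketch_tokenize` and the sample row are hypothetical, and the real tokenizer may differ in details such as offset handling.

```python
import json
import re

# The default JSON from above, parsed with the standard library.
DEFAULT_CONFIG = json.loads(r"""
{
    "is_input": true,
    "text_column": "text",
    "dict_column": "dict",
    "tokenizer": {
        "Tokenizer": {
            "split_regex": "\\s+",
            "lowercase": true,
            "use_byte_offsets": false
        }
    }
}
""")


def sketch_tokenize(text, tokenizer_config):
    """Hypothetical helper: approximates the configured split.

    This is an illustration of the settings, not the library's own code.
    """
    if tokenizer_config["lowercase"]:
        text = text.lower()
    # Split on the configured regex and drop empty pieces.
    return [tok for tok in re.split(tokenizer_config["split_regex"], text) if tok]


tok_cfg = DEFAULT_CONFIG["tokenizer"]["Tokenizer"]

# Hypothetical input row; the text is looked up by the configured column name.
row = {"text": "Play Some  Jazz"}
tokens = sketch_tokenize(row[DEFAULT_CONFIG["text_column"]], tok_cfg)
print(tokens)  # ['play', 'some', 'jazz']
```

With the defaults, the text is lowercased and split on runs of whitespace, so the sample row yields three tokens. Per the attribute description, the dict tensors are created to match this tokenization.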