BertModelInput

class pytext.models.bert_classification_models.BertModelInput

Bases: EncoderModelInput

All Attributes (including base classes)

tokens: BERTTensorizer.Config = BERTTensorizer.Config(max_seq_len=128)
dense: Optional[FloatListTensorizer.Config] = None
labels: LabelTensorizer.Config = LabelTensorizer.Config()
num_tokens: NtokensTensorizer.Config = NtokensTensorizer.Config(names=['tokens'], indexes=[2])

Default JSON

{
    "tokens": {
        "BERTTensorizer": {
            "is_input": true,
            "columns": [
                "text"
            ],
            "tokenizer": {
                "WordPieceTokenizer": {
                    "basic_tokenizer": {
                        "split_regex": "\\s+",
                        "lowercase": true,
                        "use_byte_offsets": false
                    },
                    "wordpiece_vocab_path": "manifold://nlp_technologies/tree/huggingface-models/bert-base-uncased/vocab.txt"
                }
            },
            "base_tokenizer": null,
            "vocab_file": "manifold://nlp_technologies/tree/huggingface-models/bert-base-uncased/vocab.txt",
            "max_seq_len": 128
        }
    },
    "dense": null,
    "labels": {
        "LabelTensorizer": {
            "is_input": false,
            "column": "label",
            "allow_unknown": false,
            "pad_in_vocab": false,
            "label_vocab": null,
            "label_vocab_file": null,
            "add_labels": null
        }
    },
    "num_tokens": {
        "is_input": false,
        "names": [
            "tokens"
        ],
        "indexes": [
            2
        ]
    }
}
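The default JSON above is often edited by hand, for example inside a larger config file, to change the sequence length or the input columns. The sketch below treats the config as a plain Python dictionary and does not call any PyText API. The helper `override_inputs`, the column names `sentence` and `intent`, and the value 256 are hypothetical. The JSON string is an abbreviated copy of the default above that keeps only the fields this example touches. In that JSON, `tokens` and `labels` nest their settings under the tensorizer class name (`BERTTensorizer`, `LabelTensorizer`), while `num_tokens` does not, so the lookups below follow the same nesting.

```python
import copy
import json

# Abbreviated copy of the default BertModelInput JSON (only the fields used here).
DEFAULT_INPUT_JSON = """
{
    "tokens": {
        "BERTTensorizer": {
            "is_input": true,
            "columns": ["text"],
            "max_seq_len": 128
        }
    },
    "dense": null,
    "labels": {
        "LabelTensorizer": {
            "is_input": false,
            "column": "label"
        }
    },
    "num_tokens": {
        "is_input": false,
        "names": ["tokens"],
        "indexes": [2]
    }
}
"""


def override_inputs(config, max_seq_len, text_column, label_column):
    """Return a copy of the input config with a few fields overridden.

    Hypothetical helper: it only edits the JSON structure and leaves the
    original dictionary unchanged.
    """
    new_config = copy.deepcopy(config)
    # "tokens" and "labels" wrap their settings in a key naming the tensorizer class.
    bert = new_config["tokens"]["BERTTensorizer"]
    bert["max_seq_len"] = max_seq_len
    bert["columns"] = [text_column]
    new_config["labels"]["LabelTensorizer"]["column"] = label_column
    # "num_tokens" has no class-name wrapper, so it is left as-is here.
    return new_config


default_config = json.loads(DEFAULT_INPUT_JSON)
custom_config = override_inputs(default_config, 256, "sentence", "intent")
print(json.dumps(custom_config["tokens"], indent=2))
```

Copying with `copy.deepcopy` keeps the default dictionary intact, so the same defaults can be reused to build several variants.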