pytext.metrics package

Submodules

pytext.metrics.calibration_metrics module

class pytext.metrics.calibration_metrics.AllCalibrationMetrics(calibration_metrics)[source]

Bases: tuple

calibration_metrics

Alias for field number 0

print_metrics(report_pep=False) → None[source]
class pytext.metrics.calibration_metrics.CalibrationMetrics(expected_error, max_error, total_error)[source]

Bases: tuple

expected_error

Alias for field number 0

max_error

Alias for field number 1

print_metrics(report_pep=False) → None[source]
total_error

Alias for field number 2

pytext.metrics.calibration_metrics.calculate_error(n_samples: int, bucket_values: List[List[float]], bucket_confidence: List[List[float]], bucket_accuracy: List[List[float]]) → Tuple[float, float, float][source]

Computes several metrics used to measure calibration error, including expected calibration error (ECE), maximum calibration error (MCE), and total calibration error (TCE).

pytext.metrics.calibration_metrics.compute_calibration(label_predictions: List[pytext.metrics.LabelPrediction]) → Tuple[float, float, float][source]
pytext.metrics.calibration_metrics.get_bucket_accuracy(bucket_values: List[List[float]], y_true: List[float], y_pred: List[float]) → List[float][source]

Computes accuracy for each bucket. If a bucket does not have any predictions, uses -1 as a placeholder.

pytext.metrics.calibration_metrics.get_bucket_confidence(bucket_values: List[List[float]]) → List[float][source]

Computes average confidence for each bucket. If a bucket does not have any predictions, uses -1 as a placeholder.

pytext.metrics.calibration_metrics.get_bucket_scores(y_score: List[float], buckets: int = 10) → Tuple[List[List[float]], List[int]][source]

Organizes real-valued posterior probabilities into buckets. For example, if we have 10 buckets, the probabilities 0.0, 0.1, 0.2 are placed into buckets 0 (0.0 <= p < 0.1), 1 (0.1 <= p < 0.2), and 2 (0.2 <= p < 0.3), respectively.
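
The documented pipeline (get_bucket_scores, get_bucket_confidence, get_bucket_accuracy, calculate_error, or compute_calibration end to end) produces these quantities. The self-contained sketch below only illustrates the bucketing scheme and the standard ECE/MCE definitions; it is not the library implementation, and the sample confidences are made up.

  from typing import List, Tuple

  def bucket_index(p: float, buckets: int = 10) -> int:
      # 0.0 <= p < 0.1 -> bucket 0, 0.1 <= p < 0.2 -> bucket 1, ..., p == 1.0 -> last bucket
      return min(int(p * buckets), buckets - 1)

  def calibration_errors(
      confidences: List[float], correct: List[bool], buckets: int = 10
  ) -> Tuple[float, float]:
      conf_sums = [0.0] * buckets
      acc_sums = [0.0] * buckets
      counts = [0] * buckets
      for p, c in zip(confidences, correct):
          b = bucket_index(p, buckets)
          conf_sums[b] += p
          acc_sums[b] += float(c)
          counts[b] += 1
      n = len(confidences)
      ece, mce = 0.0, 0.0
      for b in range(buckets):
          if counts[b] == 0:
              continue  # empty buckets contribute nothing (the library uses -1 placeholders)
          gap = abs(conf_sums[b] / counts[b] - acc_sums[b] / counts[b])
          ece += counts[b] / n * gap  # ECE: per-bucket gap weighted by bucket size
          mce = max(mce, gap)         # MCE: largest per-bucket gap
      return ece, mce

  calibration_errors([0.95, 0.62, 0.33, 0.81], [True, True, False, False])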

pytext.metrics.dense_retrieval_metrics module

class pytext.metrics.dense_retrieval_metrics.DenseRetrievalMetrics[source]

Bases: tuple

Metric class for dense passage retrieval.

num_examples

number of samples

Type:int
accuracy

fraction of examples for which the positive doc was retrieved from the list of docs

Type:float
average_rank

average rank of positive passage

Type:float
mean_reciprocal_rank

average 1/rank of positive passage

Type:float
accuracy

Alias for field number 1

average_rank

Alias for field number 2

mean_reciprocal_rank

Alias for field number 3

num_examples

Alias for field number 0

print_metrics() → None[source]
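
As a NamedTuple, DenseRetrievalMetrics can be built directly from aggregated values; the numbers below are hypothetical.

  from pytext.metrics.dense_retrieval_metrics import DenseRetrievalMetrics

  metrics = DenseRetrievalMetrics(
      num_examples=1000,         # hypothetical evaluation size
      accuracy=0.82,             # fraction of examples where the positive doc was retrieved
      average_rank=2.4,          # mean rank of the positive passage
      mean_reciprocal_rank=0.71,
  )
  metrics.print_metrics()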

pytext.metrics.intent_slot_metrics module

class pytext.metrics.intent_slot_metrics.AllMetrics[source]

Bases: tuple

Aggregated class for intent-slot related metrics.

top_intent_accuracy

Accuracy of the top-level intent.

frame_accuracy

Frame accuracy.

frame_accuracies_by_depth

Frame accuracies bucketized by depth of the gold tree.

bracket_metrics

Bracket metrics for intents and slots. For details, see the function compute_intent_slot_metrics().

tree_metrics

Tree metrics for intents and slots. For details, see the function compute_intent_slot_metrics().

loss

Cross entropy loss.

bracket_metrics

Alias for field number 4

frame_accuracies_by_depth

Alias for field number 3

frame_accuracy

Alias for field number 1

frame_accuracy_top_k

Alias for field number 2

loss

Alias for field number 6

print_metrics() → None[source]
top_intent_accuracy

Alias for field number 0

tree_metrics

Alias for field number 5

pytext.metrics.intent_slot_metrics.FrameAccuraciesByDepth = typing.Dict[int, pytext.metrics.intent_slot_metrics.FrameAccuracy]

Frame accuracies bucketized by depth of the gold tree.

class pytext.metrics.intent_slot_metrics.FrameAccuracy[source]

Bases: tuple

Frame accuracy for a collection of intent frame predictions.

Frame accuracy means the entire tree structure of the predicted frame matches that of the gold frame.

frame_accuracy

Alias for field number 1

num_samples

Alias for field number 0

class pytext.metrics.intent_slot_metrics.FramePredictionPair[source]

Bases: tuple

Pair of predicted and gold intent frames.

expected_frame

Alias for field number 1

predicted_frame

Alias for field number 0

class pytext.metrics.intent_slot_metrics.IntentSlotConfusions[source]

Bases: tuple

Aggregated class for intent and slot confusions.

intent_confusions

Confusion counts for intents.

slot_confusions

Confusion counts for slots.

intent_confusions

Alias for field number 0

slot_confusions

Alias for field number 1

class pytext.metrics.intent_slot_metrics.IntentSlotMetrics[source]

Bases: tuple

Precision/recall/F1 metrics for intents and slots.

intent_metrics

Precision/recall/F1 metrics for intents.

slot_metrics

Precision/recall/F1 metrics for slots.

overall_metrics

Combined precision/recall/F1 metrics for all nodes (merging intents and slots).

intent_metrics

Alias for field number 0

overall_metrics

Alias for field number 2

print_metrics() → None[source]
slot_metrics

Alias for field number 1

class pytext.metrics.intent_slot_metrics.IntentsAndSlots[source]

Bases: tuple

Collection of intents and slots in an intent frame.

intents

Alias for field number 0

slots

Alias for field number 1

class pytext.metrics.intent_slot_metrics.Node(label: str, span: pytext.data.data_structures.node.Span, children: Optional[AbstractSet[Node]] = None, text: str = None)[source]

Bases: pytext.data.data_structures.node.Node

Subclass of the base Node class, used for metric purposes. It is immutable so that instances can be hashed.

label

Label of the node.

Type:str
span

Span of the node.

Type:Span
children

frozenset of the node’s children, left empty when computing bracketing metrics.

Type:frozenset of Node
text

Text the node covers (=utterance[span.start:span.end])

Type:str
class pytext.metrics.intent_slot_metrics.NodesPredictionPair[source]

Bases: tuple

Pair of predicted and expected sets of nodes.

expected_nodes

Alias for field number 1

predicted_nodes

Alias for field number 0

pytext.metrics.intent_slot_metrics.compare_frames(predicted_frame: pytext.metrics.intent_slot_metrics.Node, expected_frame: pytext.metrics.intent_slot_metrics.Node, tree_based: bool, intent_per_label_confusions: Optional[pytext.metrics.PerLabelConfusions] = None, slot_per_label_confusions: Optional[pytext.metrics.PerLabelConfusions] = None) → pytext.metrics.intent_slot_metrics.IntentSlotConfusions[source]

Compares two intent frames and returns TP, FP, FN counts for intents and slots. Optionally collects the per label TP, FP, FN counts.

Parameters:
  • predicted_frame – Predicted intent frame.
  • expected_frame – Gold intent frame.
  • tree_based – Whether to get the tree-based confusions (if True) or bracket-based confusions (if False). For details, see the function compute_intent_slot_metrics().
  • intent_per_label_confusions – If provided, update the per label confusions for intents as well. Defaults to None.
  • slot_per_label_confusions – If provided, update the per label confusions for slots as well. Defaults to None.
Returns:

IntentSlotConfusions, containing confusion counts for intents and slots.

pytext.metrics.intent_slot_metrics.compute_all_metrics(frame_pairs: Sequence[pytext.metrics.intent_slot_metrics.FramePredictionPair], top_intent_accuracy: bool = True, frame_accuracy: bool = True, frame_accuracies_by_depth: bool = True, bracket_metrics: bool = True, tree_metrics: bool = True, overall_metrics: bool = False, all_predicted_frames: List[List[pytext.metrics.intent_slot_metrics.Node]] = None, calculated_loss: float = None, length_metrics: Dict[KT, VT] = None) → pytext.metrics.intent_slot_metrics.AllMetrics[source]

Given a list of predicted and gold intent frames, computes intent-slot related metrics.

Parameters:
  • frame_pairs – List of predicted and gold intent frames.
  • top_intent_accuracy – Whether to compute top intent accuracy or not. Defaults to True.
  • frame_accuracy – Whether to compute frame accuracy or not. Defaults to True.
  • frame_accuracies_by_depth – Whether to compute frame accuracies by depth or not. Defaults to True.
  • bracket_metrics – Whether to compute bracket metrics or not. Defaults to True.
  • tree_metrics – Whether to compute tree metrics or not. Defaults to True.
  • overall_metrics – If bracket_metrics or tree_metrics is true, decides whether to compute overall (merging intents and slots) metrics for them. Defaults to False.
Returns:

AllMetrics which contains intent-slot related metrics.

pytext.metrics.intent_slot_metrics.compute_frame_accuracies_by_depth(frame_pairs: Sequence[pytext.metrics.intent_slot_metrics.FramePredictionPair]) → Dict[int, pytext.metrics.intent_slot_metrics.FrameAccuracy][source]

Given a list of predicted and gold intent frames, splits the predictions into buckets according to the depth of the gold trees, and computes frame accuracy for each bucket.

Parameters:frame_pairs – List of predicted and gold intent frames.
Returns:FrameAccuraciesByDepth, a map from depths to their corresponding frame accuracies.
pytext.metrics.intent_slot_metrics.compute_frame_accuracy(frame_pairs: Sequence[pytext.metrics.intent_slot_metrics.FramePredictionPair]) → float[source]

Computes frame accuracy given a list of predicted and gold intent frames.

Parameters:frame_pairs – List of predicted and gold intent frames.
Returns:Frame accuracy. For a prediction, frame accuracy is achieved if the entire tree structure of the predicted frame matches that of the gold frame.
pytext.metrics.intent_slot_metrics.compute_frame_accuracy_top_k(frame_pairs: List[pytext.metrics.intent_slot_metrics.FramePredictionPair], all_frames: List[List[pytext.metrics.intent_slot_metrics.Node]]) → float[source]
pytext.metrics.intent_slot_metrics.compute_intent_slot_metrics(frame_pairs: Sequence[pytext.metrics.intent_slot_metrics.FramePredictionPair], tree_based: bool, overall_metrics: bool = True) → pytext.metrics.intent_slot_metrics.IntentSlotMetrics[source]

Given a list of predicted and gold intent frames, computes precision, recall and F1 metrics for intents and slots, either in tree-based or bracket-based manner.

Two assumptions are made about intent frames: (1) the root node is an intent, and (2) children of intents are always slots, while children of slots are always intents.

For tree-based metrics, a node (an intent or slot) in the predicted frame is considered a true positive only if the subtree rooted at this node has an exact copy in the gold frame, otherwise it is considered a false positive. A false negative is a node in the gold frame that does not have an exact subtree match in the predicted frame.

For bracket-based metrics, a node in the predicted frame is considered a true positive if there is a node in the gold frame having the same label and span (but not necessarily the same children). The definitions of false positives and false negatives are similar to the above.

Parameters:
  • frame_pairs – List of predicted and gold intent frames.
  • tree_based – Whether to compute tree-based metrics (if True) or bracket-based metrics (if False).
  • overall_metrics – Whether to compute overall (merging intents and slots) metrics or not. Defaults to True.
Returns:

IntentSlotMetrics, containing precision/recall/F1 metrics for intents and slots.
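
A minimal sketch of the tree-based versus bracket-based distinction, using the signatures documented above; the frame labels are made up, and Span(start, end) positional construction is assumed. Here the predicted slot label differs from the gold one, so the slot is a false positive under both schemes, while the intent counts as a true positive only under bracket-based scoring (its subtree does not match exactly).

  from pytext.data.data_structures.node import Span
  from pytext.metrics.intent_slot_metrics import (
      FramePredictionPair,
      Node,
      compute_intent_slot_metrics,
  )

  gold = Node(
      label="IN:CREATE_REMINDER",
      span=Span(0, 24),
      children={Node(label="SL:TODO", span=Span(12, 24))},
  )
  pred = Node(
      label="IN:CREATE_REMINDER",
      span=Span(0, 24),
      children={Node(label="SL:DATE_TIME", span=Span(12, 24))},  # wrong slot label
  )

  pairs = [FramePredictionPair(predicted_frame=pred, expected_frame=gold)]
  bracket_metrics = compute_intent_slot_metrics(pairs, tree_based=False)
  tree_metrics = compute_intent_slot_metrics(pairs, tree_based=True)
  bracket_metrics.print_metrics()
  tree_metrics.print_metrics()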

pytext.metrics.intent_slot_metrics.compute_metric_at_k(references: List[pytext.metrics.intent_slot_metrics.Node], hypothesis: List[List[pytext.metrics.intent_slot_metrics.Node]], metric_fn: Callable[[pytext.metrics.intent_slot_metrics.Node, pytext.metrics.intent_slot_metrics.Node], bool] = <function <lambda>>) → List[float][source]

Computes a boolean metric at each position in the ranked list of hypotheses, and returns the average at each position over all examples. By default, metric_fn checks whether two frames are equal.

pytext.metrics.intent_slot_metrics.compute_prf1_metrics(nodes_pairs: Sequence[pytext.metrics.intent_slot_metrics.NodesPredictionPair]) → Tuple[pytext.metrics.AllConfusions, pytext.metrics.PRF1Metrics][source]

Computes precision/recall/F1 metrics given a list of predicted and expected sets of nodes.

Parameters:nodes_pairs – List of predicted and expected node sets.
Returns:A tuple, of which the first member contains the confusion information, and the second member contains the computed precision/recall/F1 metrics.
pytext.metrics.intent_slot_metrics.compute_top_intent_accuracy(frame_pairs: Sequence[pytext.metrics.intent_slot_metrics.FramePredictionPair]) → float[source]

Computes accuracy of the top-level intent.

Parameters:frame_pairs – List of predicted and gold intent frames.
Returns:Prediction accuracy of the top-level intent.

pytext.metrics.language_model_metrics module

class pytext.metrics.language_model_metrics.LanguageModelMetric[source]

Bases: tuple

Class for language model metrics.

perplexity_per_word

Average perplexity per word of the dataset.

perplexity_per_word

Alias for field number 0

print_metrics()[source]
pytext.metrics.language_model_metrics.compute_language_model_metric(loss_per_word: float) → pytext.metrics.language_model_metrics.LanguageModelMetric[source]

pytext.metrics.mask_metrics module

pytext.metrics.seq2seq_metrics module

class pytext.metrics.seq2seq_metrics.Seq2SeqMetrics(loss, exact_match, f1, bleu)[source]

Bases: tuple

bleu

Alias for field number 3

exact_match

Alias for field number 1

f1

Alias for field number 2

loss

Alias for field number 0

print_metrics() → None[source]
class pytext.metrics.seq2seq_metrics.Seq2SeqTopKMetrics[source]

Bases: pytext.metrics.seq2seq_metrics.Seq2SeqMetrics

print_metrics() → None[source]
pytext.metrics.seq2seq_metrics.compute_f1(hypothesis_list, reference_list, eps=1e-08)[source]

Computes token F1 given a hypothesis and reference. This is defined as F1 = 2 * ((P * R) / (P + R + eps)) where P = precision, R = recall, and eps = epsilon for smoothing zero denominators. By default, eps = 1e-8.
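
A short sketch of the token-F1 definition above, not the library implementation (compute_f1 operates on hypothesis/reference lists whose exact format is not shown here):

  from collections import Counter

  def token_f1(hypothesis: str, reference: str, eps: float = 1e-8) -> float:
      # Token-level overlap between a single hypothesis and reference.
      hyp_tokens = hypothesis.split()
      ref_tokens = reference.split()
      if not hyp_tokens or not ref_tokens:
          return float(hyp_tokens == ref_tokens)
      overlap = sum((Counter(hyp_tokens) & Counter(ref_tokens)).values())
      precision = overlap / len(hyp_tokens)
      recall = overlap / len(ref_tokens)
      return 2 * precision * recall / (precision + recall + eps)

  token_f1("set an alarm for 7 am", "set alarm for 7 am")  # ~0.909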

pytext.metrics.squad_metrics module

class pytext.metrics.squad_metrics.SquadMetrics(classification_metrics, num_examples, exact_matches, f1_score)[source]

Bases: tuple

classification_metrics

Alias for field number 0

exact_matches

Alias for field number 2

f1_score

Alias for field number 3

num_examples

Alias for field number 1

print_metrics() → None[source]

Module contents

class pytext.metrics.AllConfusions[source]

Bases: object

Aggregated class for per label confusions.

per_label_confusions

Per label confusion information.

confusions

Overall TP, FP and FN counts across the labels in per_label_confusions.

compute_metrics() → pytext.metrics.PRF1Metrics[source]
confusions
per_label_confusions
class pytext.metrics.ClassificationMetrics[source]

Bases: tuple

Metric class for various classification metrics.

accuracy

Overall accuracy of predictions.

macro_prf1_metrics

Macro precision/recall/F1 scores.

per_label_soft_scores

Per label soft metrics.

mcc

Matthews correlation coefficient.

roc_auc

Area under the Receiver Operating Characteristic curve.

loss

Training loss (only used for selecting best model, no need to print).

accuracy

Alias for field number 0

loss

Alias for field number 5

macro_prf1_metrics

Alias for field number 1

mcc

Alias for field number 3

per_label_soft_scores

Alias for field number 2

print_metrics(report_pep=False) → None[source]
print_pep()[source]
roc_auc

Alias for field number 4

class pytext.metrics.Confusions(TP: int = 0, FP: int = 0, FN: int = 0)[source]

Bases: object

Confusion information for a collection of predictions.

TP

Number of true positives.

FP

Number of false positives.

FN

Number of false negatives.

FN
FP
TP
compute_metrics() → pytext.metrics.PRF1Scores[source]
class pytext.metrics.LabelListPrediction[source]

Bases: tuple

Label list predictions of an example.

label_scores

Confidence scores that each label receives.

predicted_label

List of indices of the predicted labels.

expected_label

List of indices of the true labels.

expected_label

Alias for field number 2

label_scores

Alias for field number 0

predicted_label

Alias for field number 1

class pytext.metrics.LabelPrediction[source]

Bases: tuple

Label predictions of an example.

label_scores

Confidence scores that each label receives.

predicted_label

Index of the predicted label. This is usually the label with the highest confidence score in label_scores.

expected_label

Index of the true label.

expected_label

Alias for field number 2

label_scores

Alias for field number 0

predicted_label

Alias for field number 1

class pytext.metrics.MacroPRF1Metrics[source]

Bases: tuple

Aggregated metric class for macro precision/recall/F1 scores.

per_label_scores

Mapping from label string to the corresponding precision/recall/F1 scores.

macro_scores

Macro precision/recall/F1 scores across the labels in per_label_scores.

macro_scores

Alias for field number 1

per_label_scores

Alias for field number 0

print_metrics(indentation='') → None[source]
class pytext.metrics.MacroPRF1Scores[source]

Bases: tuple

Macro precision/recall/F1 scores (averages across each label).

num_labels

Number of distinct labels.

precision

Equally weighted average of precisions for each label.

recall

Equally weighted average of recalls for each label.

f1

Equally weighted average of F1 scores for each label.

f1

Alias for field number 3

num_labels

Alias for field number 0

precision

Alias for field number 1

recall

Alias for field number 2

class pytext.metrics.MultiLabelSoftClassificationMetrics[source]

Bases: tuple

Classification scores that are independent of thresholds.

average_label_precision

Alias for field number 0

average_label_recall

Alias for field number 2

average_overall_accuracy

Alias for field number 11

average_overall_auc

Alias for field number 9

average_overall_precision

Alias for field number 1

average_overall_recall

Alias for field number 3

decision_thresh_at_precision

Alias for field number 5

decision_thresh_at_recall

Alias for field number 7

label_accuracy

Alias for field number 10

precision_at_recall

Alias for field number 6

recall_at_precision

Alias for field number 4

roc_auc

Alias for field number 8

pytext.metrics.PRECISION_AT_RECALL_THRESHOLDS = [0.2, 0.4, 0.6, 0.8, 0.9]

Basic metric classes and functions for single-label prediction problems, with ongoing extension to multi-label support.

class pytext.metrics.PRF1Metrics[source]

Bases: tuple

Metric class for all types of precision/recall/F1 scores.

per_label_scores

Map from label string to the corresponding precision/recall/F1 scores.

macro_scores

Macro precision/recall/F1 scores across the labels in per_label_scores.

micro_scores

Micro (regular) precision/recall/F1 scores for the same collection of predictions.

macro_scores

Alias for field number 1

micro_scores

Alias for field number 2

per_label_scores

Alias for field number 0

print_metrics() → None[source]
class pytext.metrics.PRF1Scores[source]

Bases: tuple

Precision/recall/F1 scores for a collection of predictions.

true_positives

Number of true positives.

false_positives

Number of false positives.

false_negatives

Number of false negatives.

precision

TP / (TP + FP).

recall

TP / (TP + FN).

f1

2 * TP / (2 * TP + FP + FN).

f1

Alias for field number 5

false_negatives

Alias for field number 2

false_positives

Alias for field number 1

precision

Alias for field number 3

recall

Alias for field number 4

true_positives

Alias for field number 0
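
For concreteness, a worked instance of the precision/recall/F1 definitions above:

  tp, fp, fn = 8, 2, 4
  precision = tp / (tp + fp)        # 8 / 10 = 0.8
  recall = tp / (tp + fn)           # 8 / 12 ~= 0.667
  f1 = 2 * tp / (2 * tp + fp + fn)  # 16 / 22 ~= 0.727, equal to 2PR / (P + R)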

class pytext.metrics.PairwiseRankingMetrics[source]

Bases: tuple

Metric class for pairwise ranking

num_examples

number of samples

Type:int
accuracy

fraction of pairs ranked in the correct order

Type:float
average_score_difference

average of score(higherRank) - score(lowerRank)

Type:float
accuracy

Alias for field number 1

average_score_difference

Alias for field number 2

num_examples

Alias for field number 0

print_metrics() → None[source]
class pytext.metrics.PerLabelConfusions[source]

Bases: object

Per label confusion information.

label_confusions_map

Map from label string to the corresponding confusion counts.

compute_metrics() → pytext.metrics.MacroPRF1Metrics[source]
label_confusions_map
update(label: str, item: str, count: int) → None[source]

Increases one of the TP, FP, or FN counts for a label by a certain amount.

Parameters:
  • label – Label to be modified.
  • item – Type of count to be modified, should be one of “TP”, “FP” or “FN”.
  • count – Amount to be added to the count.
Returns:

None
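
Illustrative usage, assuming the no-argument constructor implied by the class signature above; the labels and counts are made up:

  from pytext.metrics import PerLabelConfusions

  confusions = PerLabelConfusions()
  confusions.update("SL:DATE_TIME", "TP", 3)
  confusions.update("SL:DATE_TIME", "FP", 1)
  confusions.update("SL:LOCATION", "FN", 2)
  macro_metrics = confusions.compute_metrics()  # MacroPRF1Metrics
  macro_metrics.print_metrics()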

class pytext.metrics.RealtimeMetrics[source]

Bases: tuple

Realtime Metrics for tracking training progress and performance.

samples

number of samples

Type:int
tps

tokens per second

Type:float
ups

updates per second

Type:float
samples

Alias for field number 0

tps

Alias for field number 1

ups

Alias for field number 2

class pytext.metrics.RegressionMetrics[source]

Bases: tuple

Metrics for regression tasks.

num_examples

number of examples

Type:int
pearson_correlation

correlation between predictions and labels

Type:float
mse

mean-squared error between predictions and labels

Type:float
mse

Alias for field number 2

num_examples

Alias for field number 0

pearson_correlation

Alias for field number 1

print_metrics()[source]
class pytext.metrics.SoftClassificationMetrics[source]

Bases: tuple

Classification scores that are independent of thresholds.

average_precision

Alias for field number 0

decision_thresh_at_precision

Alias for field number 2

decision_thresh_at_recall

Alias for field number 4

precision_at_recall

Alias for field number 3

recall_at_precision

Alias for field number 1

roc_auc

Alias for field number 5

pytext.metrics.average_precision_score(y_true_sorted: numpy.ndarray, y_score_sorted: numpy.ndarray) → float[source]

Computes average precision, which summarizes the precision-recall curve as the precisions achieved at each threshold weighted by the increase in recall since the previous threshold.

Parameters:
  • y_true_sorted – Numpy array sorted according to decreasing confidence scores indicating whether each prediction is correct.
  • y_score_sorted – Numpy array of confidence scores for the predictions in decreasing order.
Returns:

Average precision score.

TODO: This is too slow, improve the performance
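
Illustrative usage of the threshold-free helpers; it is assumed here that sort_by_score() (documented at the end of this module) returns the (y_true_sorted, y_score_sorted) pair these functions expect, and the inputs are made up:

  from pytext.metrics import (
      average_precision_score,
      precision_at_recall,
      recall_at_precision,
      sort_by_score,
  )

  y_true = [True, False, True, True, False]   # whether each prediction was correct
  y_score = [0.31, 0.92, 0.64, 0.77, 0.48]    # confidence of each prediction

  y_true_sorted, y_score_sorted = sort_by_score(y_true, y_score)
  ap = average_precision_score(y_true_sorted, y_score_sorted)
  recall_at_p = recall_at_precision(y_true_sorted, y_score_sorted, thresholds=[0.8, 0.9])
  precision_at_r, decision_thresh = precision_at_recall(
      y_true_sorted, y_score_sorted, thresholds=[0.8, 0.9]
  )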

pytext.metrics.compute_average_recall(predictions: Sequence[pytext.metrics.LabelPrediction], label_names: Sequence[str], average_precisions: Dict[str, float]) → float[source]
pytext.metrics.compute_classification_metrics(predictions: Sequence[pytext.metrics.LabelPrediction], label_names: Sequence[str], loss: float, average_precisions: bool = True, recall_at_precision_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9], precision_at_recall_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9]) → pytext.metrics.ClassificationMetrics[source]

A general function that computes classification metrics given a list of label predictions.

Parameters:
  • predictions – Label predictions, including the confidence score for each label.
  • label_names – Indexed label names.
  • average_precisions – Whether to compute average precisions for labels or not. Defaults to True.
  • recall_at_precision_thresholds – precision thresholds at which to calculate recall
  • precision_at_recall_thresholds – recall thresholds at which to calculate precision
Returns:

ClassificationMetrics which contains various classification metrics.
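
Minimal usage sketch built from the documented NamedTuple fields; the scores and loss are hypothetical:

  from pytext.metrics import LabelPrediction, compute_classification_metrics

  label_names = ["negative", "positive"]
  predictions = [
      LabelPrediction(label_scores=[0.2, 0.8], predicted_label=1, expected_label=1),
      LabelPrediction(label_scores=[0.7, 0.3], predicted_label=0, expected_label=1),
      LabelPrediction(label_scores=[0.6, 0.4], predicted_label=0, expected_label=0),
  ]

  metrics = compute_classification_metrics(predictions, label_names, loss=0.42)
  metrics.print_metrics()
  print(metrics.accuracy)  # 2 of 3 predictions are correct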

pytext.metrics.compute_macro_avg(soft_metrics: Dict[str, pytext.metrics.SoftClassificationMetrics], metric: str)[source]
pytext.metrics.compute_matthews_correlation_coefficients(TP: int, FP: int, FN: int, TN: int) → float[source]

Computes Matthews correlation coefficient, a way to summarize all four counts (TP, FP, FN, TN) in the confusion matrix of binary classification.

Parameters:
  • TP – Number of true positives.
  • FP – Number of false positives.
  • FN – Number of false negatives.
  • TN – Number of true negatives.
Returns:

Matthews correlation coefficient, which is (TP * TN - FP * FN) / sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)).
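
A sketch of the standard definition, for reference (the library function may guard the zero-denominator case differently):

  import math

  def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
      # Standard Matthews correlation coefficient.
      denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
      return (tp * tn - fp * fn) / denom if denom else 0.0

  mcc(tp=50, fp=10, fn=5, tn=35)  # ~0.70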

pytext.metrics.compute_multi_label_classification_metrics(predictions: Sequence[pytext.metrics.LabelListPrediction], label_names: Sequence[str], loss: float, average_precisions: bool = True, recall_at_precision_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9], precision_at_recall_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9]) → pytext.metrics.ClassificationMetrics[source]

A general function that computes classification metrics given a list of multi-label predictions.

Parameters:
  • predictions – multi-label predictions, including the confidence score for each label.
  • label_names – Indexed label names.
  • average_precisions – Whether to compute average precisions for labels or not. Defaults to True.
  • recall_at_precision_thresholds – precision thresholds at which to calculate recall
  • precision_at_recall_thresholds – recall thresholds at which to calculate precision
Returns:

ClassificationMetrics which contains various classification metrics.

pytext.metrics.compute_multi_label_full_vector_classification_metrics(predictions: Sequence[pytext.metrics.LabelListPrediction], label_names: Sequence[str], loss: float, average_precisions: bool = True, recall_at_precision_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9], precision_at_recall_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9]) → pytext.metrics.ClassificationMetrics[source]

A general function that computes classification metrics given a list of multi-label predictions.

Parameters:
  • predictions – multi-label predictions, including the confidence score for each label.
  • label_names – Indexed label names.
  • average_precisions – Whether to compute average precisions for labels or not. Defaults to True.
  • recall_at_precision_thresholds – precision thresholds at which to calculate recall
  • precision_at_recall_thresholds – recall thresholds at which to calculate precision
Returns:

ClassificationMetrics which contains various classification metrics.

pytext.metrics.compute_multi_label_multi_class_soft_metrics(predictions: Sequence[Sequence[pytext.metrics.LabelPrediction]], label_names: Sequence[str], label_vocabs: Sequence[Sequence[str]], recall_at_precision_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9], precision_at_recall_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9]) → pytext.metrics.MultiLabelSoftClassificationMetrics[source]

Computes multi-label soft classification metrics with multi-class accommodation

Parameters:
  • predictions – multi-label predictions, including the confidence score for each label.
  • label_names – Indexed label names.
  • recall_at_precision_thresholds – precision thresholds at which to calculate recall
  • precision_at_recall_thresholds – recall thresholds at which to calculate precision
Returns:

MultiLabelSoftClassificationMetrics containing the aggregated soft metrics.

pytext.metrics.compute_multi_label_soft_full_vector_metrics(predictions: Sequence[pytext.metrics.LabelListPrediction], label_names: Sequence[str], recall_at_precision_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9], precision_at_recall_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9]) → Dict[str, pytext.metrics.SoftClassificationMetrics][source]

Computes multi-label soft classification metrics

Parameters:
  • predictions – multi-label predictions, including the confidence score for each label.
  • label_names – Indexed label names. May contain duplicate label names.
  • recall_at_precision_thresholds – precision thresholds at which to calculate recall
  • precision_at_recall_thresholds – recall thresholds at which to calculate precision
Returns:

Dict from label strings to their corresponding soft metrics.

pytext.metrics.compute_multi_label_soft_metrics(predictions: Sequence[pytext.metrics.LabelListPrediction], label_names: Sequence[str], recall_at_precision_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9], precision_at_recall_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9]) → Dict[str, pytext.metrics.SoftClassificationMetrics][source]

Computes multi-label soft classification metrics

Parameters:
  • predictions – multi-label predictions, including the confidence score for each label.
  • label_names – Indexed label names.
  • recall_at_precision_thresholds – precision thresholds at which to calculate recall
  • precision_at_recall_thresholds – recall thresholds at which to calculate precision
Returns:

Dict from label strings to their corresponding soft metrics.

pytext.metrics.compute_pairwise_ranking_metrics(predictions: Sequence[int], scores: Sequence[float]) → pytext.metrics.PairwiseRankingMetrics[source]

Computes metrics for pairwise ranking given sequences of predictions and scores

Parameters:
  • predictions – 1 if ranking was correct, 0 if ranking was incorrect
  • scores – score(higher-ranked-sample) - score(lower-ranked-sample)
Returns:

PairwiseRankingMetrics object
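
Usage sketch with made-up inputs; per the attribute definitions above, these inputs would correspond to accuracy 0.75 and an average score difference of 0.275:

  from pytext.metrics import compute_pairwise_ranking_metrics

  predictions = [1, 1, 0, 1]       # 1 = pair ranked correctly, 0 = not
  scores = [0.6, 0.2, -0.1, 0.4]   # score(higher-ranked sample) - score(lower-ranked sample)
  metrics = compute_pairwise_ranking_metrics(predictions, scores)
  metrics.print_metrics()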

pytext.metrics.compute_prf1(tp: int, fp: int, fn: int) → Tuple[float, float, float][source]
pytext.metrics.compute_regression_metrics(predictions: Sequence[float], targets: Sequence[float]) → pytext.metrics.RegressionMetrics[source]

Computes metrics for regression tasks.

Parameters:
  • predictions – 1-D sequence of float predictions
  • targets – 1-D sequence of float labels
Returns:

RegressionMetrics object
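
Usage sketch with made-up predictions and targets:

  from pytext.metrics import compute_regression_metrics

  metrics = compute_regression_metrics(
      predictions=[1.1, 2.4, 3.0, 4.8],
      targets=[1.0, 2.5, 3.2, 5.0],
  )
  metrics.print_metrics()  # reports num_examples, Pearson correlation, and MSE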

pytext.metrics.compute_roc_auc(predictions: Sequence[pytext.metrics.LabelPrediction], target_class: int = 0) → Optional[float][source]

Computes area under the Receiver Operating Characteristic curve, for binary classification. Implementation based off of (and explained at) https://www.ibm.com/developerworks/community/blogs/jfp/entry/Fast_Computation_of_AUC_ROC_score?lang=en.

pytext.metrics.compute_roc_auc_given_sorted_positives(y_true_sorted: numpy.ndarray) → Optional[float][source]
pytext.metrics.compute_soft_metrics(predictions: Sequence[pytext.metrics.LabelPrediction], label_names: Sequence[str], recall_at_precision_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9], precision_at_recall_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9]) → Dict[str, pytext.metrics.SoftClassificationMetrics][source]

Computes soft classification metrics given a list of label predictions.

Parameters:
  • predictions – Label predictions, including the confidence score for each label.
  • label_names – Indexed label names.
  • recall_at_precision_thresholds – precision thresholds at which to calculate recall
  • precision_at_recall_thresholds – recall thresholds at which to calculate precision
Returns:

Dict from label strings to their corresponding soft metrics.

pytext.metrics.precision_at_recall(y_true_sorted: numpy.ndarray, y_score_sorted: numpy.ndarray, thresholds: Sequence[float]) → Tuple[Dict[float, float], Dict[float, float]][source]

Computes precision at various recall levels

Parameters:
  • y_true_sorted – Numpy array sorted according to decreasing confidence scores indicating whether each prediction is correct.
  • y_score_sorted – Numpy array of confidence scores for the predictions in decreasing order.
  • thresholds – Sequence of floats indicating the requested recall thresholds
Returns:

A tuple of two dictionaries: the maximum precision at each requested recall threshold, and the decision thresholds that achieve those maximum precisions.

pytext.metrics.recall_at_precision(y_true_sorted: numpy.ndarray, y_score_sorted: numpy.ndarray, thresholds: Sequence[float]) → Dict[float, float][source]

Computes recall at various precision levels

Parameters:
  • y_true_sorted – Numpy array sorted according to decreasing confidence scores indicating whether each prediction is correct.
  • y_score_sorted – Numpy array of confidence scores for the predictions in decreasing order.
  • thresholds – Sequence of floats indicating the requested precision thresholds
Returns:

Dictionary of maximum recall at requested precision thresholds.

pytext.metrics.safe_division(n: Union[int, float], d: int) → float[source]
pytext.metrics.sort_by_score(y_true_list: Sequence[bool], y_score_list: Sequence[float])[source]