pytext.metrics package

Submodules

pytext.metrics.calibration_metrics module

class pytext.metrics.calibration_metrics.AllCalibrationMetrics(calibration_metrics)[source]

Bases: tuple

calibration_metrics

Alias for field number 0

print_metrics(report_pep=False) → None[source]
class pytext.metrics.calibration_metrics.CalibrationMetrics(expected_error, max_error, total_error)[source]

Bases: tuple

expected_error

Alias for field number 0

max_error

Alias for field number 1

print_metrics(report_pep=False) → None[source]
total_error

Alias for field number 2

pytext.metrics.calibration_metrics.calculate_error(n_samples: int, bucket_values: List[List[float]], bucket_confidence: List[List[float]], bucket_accuracy: List[List[float]]) → Tuple[float, float, float][source]

Computes several metrics used to measure calibration error, including expected calibration error (ECE), maximum calibration error (MCE), and total calibration error (TCE).

pytext.metrics.calibration_metrics.compute_calibration(label_predictions: List[pytext.metrics.LabelPrediction]) → Tuple[float, float, float][source]
pytext.metrics.calibration_metrics.get_bucket_accuracy(bucket_values: List[List[float]], y_true: List[float], y_pred: List[float]) → List[float][source]

Computes accuracy for each bucket. If a bucket does not have any predictions, uses -1 as a placeholder.

pytext.metrics.calibration_metrics.get_bucket_confidence(bucket_values: List[List[float]]) → List[float][source]

Computes average confidence for each bucket. If a bucket does not have any predictions, uses -1 as a placeholder.

pytext.metrics.calibration_metrics.get_bucket_scores(y_score: List[float], buckets: int = 10) → Tuple[List[List[float]], List[int]][source]

Organizes real-valued posterior probabilities into buckets. For example, if we have 10 buckets, the probabilities 0.0, 0.1, 0.2 are placed into buckets 0 (0.0 <= p < 0.1), 1 (0.1 <= p < 0.2), and 2 (0.2 <= p < 0.3), respectively.
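
The documented pipeline (get_bucket_scores, get_bucket_confidence, get_bucket_accuracy, calculate_error, or compute_calibration end to end) produces these quantities. The self-contained sketch below only illustrates the bucketing scheme and the standard ECE/MCE definitions; it is not the library implementation, and the sample confidences are made up.

  from typing import List, Tuple

  def bucket_index(p: float, buckets: int = 10) -> int:
      # 0.0 <= p < 0.1 -> bucket 0, 0.1 <= p < 0.2 -> bucket 1, ..., p == 1.0 -> last bucket
      return min(int(p * buckets), buckets - 1)

  def calibration_errors(
      confidences: List[float], correct: List[bool], buckets: int = 10
  ) -> Tuple[float, float]:
      conf_sums = [0.0] * buckets
      acc_sums = [0.0] * buckets
      counts = [0] * buckets
      for p, c in zip(confidences, correct):
          b = bucket_index(p, buckets)
          conf_sums[b] += p
          acc_sums[b] += float(c)
          counts[b] += 1
      n = len(confidences)
      ece, mce = 0.0, 0.0
      for b in range(buckets):
          if counts[b] == 0:
              continue  # empty buckets contribute nothing (the library uses -1 placeholders)
          gap = abs(conf_sums[b] / counts[b] - acc_sums[b] / counts[b])
          ece += counts[b] / n * gap  # ECE: per-bucket gap weighted by bucket size
          mce = max(mce, gap)         # MCE: largest per-bucket gap
      return ece, mce

  calibration_errors([0.95, 0.62, 0.33, 0.81], [True, True, False, False])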

pytext.metrics.dense_retrieval_metrics module

class pytext.metrics.dense_retrieval_metrics.DenseRetrievalMetrics[source]

Bases: tuple

Metric class for dense passage retrieval.

num_examples

number of samples

Type:int
accuracy

fraction of examples for which the positive doc was retrieved from the list of docs

Type:float
average_rank

average rank of positive passage

Type:float
mean_reciprocal_rank

average 1/rank of positive passage

Type:float
accuracy

Alias for field number 1

average_rank

Alias for field number 2

mean_reciprocal_rank

Alias for field number 3

num_examples

Alias for field number 0

print_metrics() → None[source]
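
As a NamedTuple, DenseRetrievalMetrics can be built directly from aggregated values; the numbers below are hypothetical.

  from pytext.metrics.dense_retrieval_metrics import DenseRetrievalMetrics

  metrics = DenseRetrievalMetrics(
      num_examples=1000,         # hypothetical evaluation size
      accuracy=0.82,             # fraction of examples where the positive doc was retrieved
      average_rank=2.4,          # mean rank of the positive passage
      mean_reciprocal_rank=0.71,
  )
  metrics.print_metrics()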

pytext.metrics.intent_slot_metrics module

class pytext.metrics.intent_slot_metrics.AllMetrics[source]

Bases: tuple

Aggregated class for intent-slot related metrics.

top_intent_accuracy

Accuracy of the top-level intent.

frame_accuracy

Frame accuracy.

frame_accuracies_by_depth

Frame accuracies bucketized by depth of the gold tree.

bracket_metrics

Bracket metrics for intents and slots. For details, see the function compute_intent_slot_metrics().

tree_metrics

Tree metrics for intents and slots. For details, see the function compute_intent_slot_metrics().

loss

Cross entropy loss.

bracket_metrics

Alias for field number 4

frame_accuracies_by_depth

Alias for field number 3

frame_accuracy

Alias for field number 1

frame_accuracy_top_k

Alias for field number 2

loss

Alias for field number 6

print_metrics() → None[source]
top_intent_accuracy

Alias for field number 0

tree_metrics

Alias for field number 5

pytext.metrics.intent_slot_metrics.FrameAccuraciesByDepth = typing.Dict[int, pytext.metrics.intent_slot_metrics.FrameAccuracy]

Frame accuracies bucketized by depth of the gold tree.

class pytext.metrics.intent_slot_metrics.FrameAccuracy[source]

Bases: tuple

Frame accuracy for a collection of intent frame predictions.

Frame accuracy means the entire tree structure of the predicted frame matches that of the gold frame.

frame_accuracy

Alias for field number 1

num_samples

Alias for field number 0

class pytext.metrics.intent_slot_metrics.FramePredictionPair[source]

Bases: tuple

Pair of predicted and gold intent frames.

expected_frame

Alias for field number 1

predicted_frame

Alias for field number 0

class pytext.metrics.intent_slot_metrics.IntentSlotConfusions[source]

Bases: tuple

Aggregated class for intent and slot confusions.

intent_confusions

Confusion counts for intents.

slot_confusions

Confusion counts for slots.

intent_confusions

Alias for field number 0

slot_confusions

Alias for field number 1

class pytext.metrics.intent_slot_metrics.IntentSlotMetrics[source]

Bases: tuple

Precision/recall/F1 metrics for intents and slots.

intent_metrics

Precision/recall/F1 metrics for intents.

slot_metrics

Precision/recall/F1 metrics for slots.

overall_metrics

Combined precision/recall/F1 metrics for all nodes (merging intents and slots).

intent_metrics

Alias for field number 0

overall_metrics

Alias for field number 2

print_metrics() → None[source]
slot_metrics

Alias for field number 1

class pytext.metrics.intent_slot_metrics.IntentsAndSlots[source]

Bases: tuple

Collection of intents and slots in an intent frame.

intents

Alias for field number 0

slots

Alias for field number 1

class pytext.metrics.intent_slot_metrics.Node(label: str, span: pytext.data.data_structures.node.Span, children: Optional[AbstractSet[Node]] = None, text: str = None)[source]

Bases: pytext.data.data_structures.node.Node

Subclass of the base Node class, used for metric purposes. It is immutable so that instances can be hashed.

label

Label of the node.

Type:str
span

Span of the node.

Type:Span
children

frozenset of the node’s children, left empty when computing bracketing metrics.

Type:frozenset of Node
text

Text the node covers (=utterance[span.start:span.end])

Type:str
class pytext.metrics.intent_slot_metrics.NodesPredictionPair[source]

Bases: tuple

Pair of predicted and expected sets of nodes.

expected_nodes

Alias for field number 1

predicted_nodes

Alias for field number 0

pytext.metrics.intent_slot_metrics.compare_frames(predicted_frame: pytext.metrics.intent_slot_metrics.Node, expected_frame: pytext.metrics.intent_slot_metrics.Node, tree_based: bool, intent_per_label_confusions: Optional[pytext.metrics.PerLabelConfusions] = None, slot_per_label_confusions: Optional[pytext.metrics.PerLabelConfusions] = None) → pytext.metrics.intent_slot_metrics.IntentSlotConfusions[source]

Compares two intent frames and returns TP, FP, FN counts for intents and slots. Optionally collects the per label TP, FP, FN counts.

Parameters:
  • predicted_frame – Predicted intent frame.
  • expected_frame – Gold intent frame.
  • tree_based – Whether to get the tree-based confusions (if True) or bracket-based confusions (if False). For details, see the function compute_intent_slot_metrics().
  • intent_per_label_confusions – If provided, update the per label confusions for intents as well. Defaults to None.
  • slot_per_label_confusions – If provided, update the per label confusions for slots as well. Defaults to None.
Returns:

IntentSlotConfusions, containing confusion counts for intents and slots.

pytext.metrics.intent_slot_metrics.compute_all_metrics(frame_pairs: Sequence[pytext.metrics.intent_slot_metrics.FramePredictionPair], top_intent_accuracy: bool = True, frame_accuracy: bool = True, frame_accuracies_by_depth: bool = True, bracket_metrics: bool = True, tree_metrics: bool = True, overall_metrics: bool = False, all_predicted_frames: List[List[pytext.metrics.intent_slot_metrics.Node]] = None, calculated_loss: float = None, length_metrics: Dict[KT, VT] = None) → pytext.metrics.intent_slot_metrics.AllMetrics[source]

Given a list of predicted and gold intent frames, computes intent-slot related metrics.

Parameters:
  • frame_pairs – List of predicted and gold intent frames.
  • top_intent_accuracy – Whether to compute top intent accuracy or not. Defaults to True.
  • frame_accuracy – Whether to compute frame accuracy or not. Defaults to True.
  • frame_accuracies_by_depth – Whether to compute frame accuracies by depth or not. Defaults to True.
  • bracket_metrics – Whether to compute bracket metrics or not. Defaults to True.
  • tree_metrics – Whether to compute tree metrics or not. Defaults to True.
  • overall_metrics – If bracket_metrics or tree_metrics is true, decides whether to compute overall (merging intents and slots) metrics for them. Defaults to False.
Returns:

AllMetrics which contains intent-slot related metrics.

pytext.metrics.intent_slot_metrics.compute_frame_accuracies_by_depth(frame_pairs: Sequence[pytext.metrics.intent_slot_metrics.FramePredictionPair]) → Dict[int, pytext.metrics.intent_slot_metrics.FrameAccuracy][source]

Given a list of predicted and gold intent frames, splits the predictions into buckets according to the depth of the gold trees, and computes frame accuracy for each bucket.

Parameters:frame_pairs – List of predicted and gold intent frames.
Returns:FrameAccuraciesByDepth, a map from depths to their corresponding frame accuracies.
pytext.metrics.intent_slot_metrics.compute_frame_accuracy(frame_pairs: Sequence[pytext.metrics.intent_slot_metrics.FramePredictionPair]) → float[source]

Computes frame accuracy given a list of predicted and gold intent frames.

Parameters:frame_pairs – List of predicted and gold intent frames.
Returns:Frame accuracy. For a prediction, frame accuracy is achieved if the entire tree structure of the predicted frame matches that of the gold frame.
pytext.metrics.intent_slot_metrics.compute_frame_accuracy_top_k(frame_pairs: List[pytext.metrics.intent_slot_metrics.FramePredictionPair], all_frames: List[List[pytext.metrics.intent_slot_metrics.Node]]) → float[source]
pytext.metrics.intent_slot_metrics.compute_intent_slot_metrics(frame_pairs: Sequence[pytext.metrics.intent_slot_metrics.FramePredictionPair], tree_based: bool, overall_metrics: bool = True) → pytext.metrics.intent_slot_metrics.IntentSlotMetrics[source]

Given a list of predicted and gold intent frames, computes precision, recall and F1 metrics for intents and slots, either in tree-based or bracket-based manner.

Two assumptions are made about intent frames: (1) the root node is an intent, and (2) children of intents are always slots, while children of slots are always intents.

For tree-based metrics, a node (an intent or slot) in the predicted frame is considered a true positive only if the subtree rooted at this node has an exact copy in the gold frame, otherwise it is considered a false positive. A false negative is a node in the gold frame that does not have an exact subtree match in the predicted frame.

For bracket-based metrics, a node in the predicted frame is considered a true positive if there is a node in the gold frame having the same label and span (but not necessarily the same children). The definitions of false positives and false negatives are similar to the above.

Parameters:
  • frame_pairs – List of predicted and gold intent frames.
  • tree_based – Whether to compute tree-based metrics (if True) or bracket-based metrics (if False).
  • overall_metrics – Whether to compute overall (merging intents and slots) metrics or not. Defaults to True.
Returns:

IntentSlotMetrics, containing precision/recall/F1 metrics for intents and slots.
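
A minimal sketch of the tree-based versus bracket-based distinction, using the signatures documented above; the frame labels are made up, and Span(start, end) positional construction is assumed. Here the predicted slot label differs from the gold one, so the slot is a false positive under both schemes, while the intent counts as a true positive only under bracket-based scoring (its subtree does not match exactly).

  from pytext.data.data_structures.node import Span
  from pytext.metrics.intent_slot_metrics import (
      FramePredictionPair,
      Node,
      compute_intent_slot_metrics,
  )

  gold = Node(
      label="IN:CREATE_REMINDER",
      span=Span(0, 24),
      children={Node(label="SL:TODO", span=Span(12, 24))},
  )
  pred = Node(
      label="IN:CREATE_REMINDER",
      span=Span(0, 24),
      children={Node(label="SL:DATE_TIME", span=Span(12, 24))},  # wrong slot label
  )

  pairs = [FramePredictionPair(predicted_frame=pred, expected_frame=gold)]
  bracket_metrics = compute_intent_slot_metrics(pairs, tree_based=False)
  tree_metrics = compute_intent_slot_metrics(pairs, tree_based=True)
  bracket_metrics.print_metrics()
  tree_metrics.print_metrics()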

pytext.metrics.intent_slot_metrics.compute_metric_at_k(references: List[pytext.metrics.intent_slot_metrics.Node], hypothesis: List[List[pytext.metrics.intent_slot_metrics.Node]], metric_fn: Callable[[pytext.metrics.intent_slot_metrics.Node, pytext.metrics.intent_slot_metrics.Node], bool] = <function <lambda>>) → List[float][source]

Computes a boolean metric at each position in the ranked list of hypotheses, and returns the average at each position over all examples. By default, metric_fn checks whether two frames are equal.

pytext.metrics.intent_slot_metrics.compute_prf1_metrics(nodes_pairs: Sequence[pytext.metrics.intent_slot_metrics.NodesPredictionPair]) → Tuple[pytext.metrics.AllConfusions, pytext.metrics.PRF1Metrics][source]

Computes precision/recall/F1 metrics given a list of predicted and expected sets of nodes.

Parameters:nodes_pairs – List of predicted and expected node sets.
Returns:A tuple, of which the first member contains the confusion information, and the second member contains the computed precision/recall/F1 metrics.
pytext.metrics.intent_slot_metrics.compute_top_intent_accuracy(frame_pairs: Sequence[pytext.metrics.intent_slot_metrics.FramePredictionPair]) → float[source]

Computes accuracy of the top-level intent.

Parameters:frame_pairs – List of predicted and gold intent frames.
Returns:Prediction accuracy of the top-level intent.

pytext.metrics.language_model_metrics module

class pytext.metrics.language_model_metrics.LanguageModelMetric[source]

Bases: tuple

Class for language model metrics.

perplexity_per_word

Average perplexity per word of the dataset.

perplexity_per_word

Alias for field number 0

print_metrics()[source]
pytext.metrics.language_model_metrics.compute_language_model_metric(loss_per_word: float) → pytext.metrics.language_model_metrics.LanguageModelMetric[source]

pytext.metrics.mask_metrics module

pytext.metrics.seq2seq_metrics module

class pytext.metrics.seq2seq_metrics.Seq2SeqMetrics(loss, exact_match, f1, bleu)[source]

Bases: tuple

bleu

Alias for field number 3

exact_match

Alias for field number 1

f1

Alias for field number 2

loss

Alias for field number 0

print_metrics() → None[source]
class pytext.metrics.seq2seq_metrics.Seq2SeqTopKMetrics[source]

Bases: pytext.metrics.seq2seq_metrics.Seq2SeqMetrics

print_metrics() → None[source]
pytext.metrics.seq2seq_metrics.compute_f1(hypothesis_list, reference_list, eps=1e-08)[source]

Computes token F1 given a hypothesis and reference. This is defined as F1 = 2 * ((P * R) / (P + R + eps)) where P = precision, R = recall, and eps = epsilon for smoothing zero denominators. By default, eps = 1e-8.
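
A short sketch of the token-F1 definition above, not the library implementation (compute_f1 operates on hypothesis/reference lists whose exact format is not shown here):

  from collections import Counter

  def token_f1(hypothesis: str, reference: str, eps: float = 1e-8) -> float:
      # Token-level overlap between a single hypothesis and reference.
      hyp_tokens = hypothesis.split()
      ref_tokens = reference.split()
      if not hyp_tokens or not ref_tokens:
          return float(hyp_tokens == ref_tokens)
      overlap = sum((Counter(hyp_tokens) & Counter(ref_tokens)).values())
      precision = overlap / len(hyp_tokens)
      recall = overlap / len(ref_tokens)
      return 2 * precision * recall / (precision + recall + eps)

  token_f1("set an alarm for 7 am", "set alarm for 7 am")  # ~0.909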

pytext.metrics.squad_metrics module

class pytext.metrics.squad_metrics.SquadMetrics(classification_metrics, num_examples, exact_matches, f1_score)[source]

Bases: tuple

classification_metrics

Alias for field number 0

exact_matches

Alias for field number 2

f1_score

Alias for field number 3

num_examples

Alias for field number 1

print_metrics() → None[source]

Module contents

class pytext.metrics.AllConfusions[source]

Bases: object

Aggregated class for per label confusions.

per_label_confusions

Per label confusion information.

confusions

Overall TP, FP and FN counts across the labels in per_label_confusions.

compute_metrics() → pytext.metrics.PRF1Metrics[source]
confusions
per_label_confusions
class pytext.metrics.ClassificationMetrics[source]

Bases: tuple

Metric class for various classification metrics.

accuracy

Overall accuracy of predictions.

macro_prf1_metrics

Macro precision/recall/F1 scores.

per_label_soft_scores

Per label soft metrics.

mcc

Matthews correlation coefficient.

roc_auc

Area under the Receiver Operating Characteristic curve.

loss

Training loss (only used for selecting best model, no need to print).

accuracy

Alias for field number 0

loss

Alias for field number 5

macro_prf1_metrics

Alias for field number 1

mcc

Alias for field number 3

per_label_soft_scores

Alias for field number 2

print_metrics(report_pep=False) → None[source]
print_pep()[source]
roc_auc

Alias for field number 4

class pytext.metrics.Confusions(TP: int = 0, FP: int = 0, FN: int = 0)[source]

Bases: object

Confusion information for a collection of predictions.

TP

Number of true positives.

FP

Number of false positives.

FN

Number of false negatives.

FN
FP
TP
compute_metrics() → pytext.metrics.PRF1Scores[source]
class pytext.metrics.LabelListPrediction[source]

Bases: tuple

Label list predictions of an example.

label_scores

Confidence scores that each label receives.

predicted_label

List of indices of the predicted labels.

expected_label

List of indices of the true labels.

expected_label

Alias for field number 2

label_scores

Alias for field number 0

predicted_label

Alias for field number 1

class pytext.metrics.LabelPrediction[source]

Bases: tuple

Label predictions of an example.

label_scores

Confidence scores that each label receives.

predicted_label

Index of the predicted label. This is usually the label with the highest confidence score in label_scores.

expected_label

Index of the true label.

expected_label

Alias for field number 2

label_scores

Alias for field number 0

predicted_label

Alias for field number 1

class pytext.metrics.MacroPRF1Metrics[source]

Bases: tuple

Aggregated metric class for macro precision/recall/F1 scores.

per_label_scores

Mapping from label string to the corresponding precision/recall/F1 scores.

macro_scores

Macro precision/recall/F1 scores across the labels in per_label_scores.

macro_scores

Alias for field number 1

per_label_scores

Alias for field number 0

print_metrics(indentation='') → None[source]
class pytext.metrics.MacroPRF1Scores[source]

Bases: tuple

Macro precision/recall/F1 scores (averages across each label).

num_labels

Number of distinct labels.

precision

Equally weighted average of precisions for each label.

recall

Equally weighted average of recalls for each label.

f1

Equally weighted average of F1 scores for each label.

f1

Alias for field number 3

num_labels

Alias for field number 0

precision

Alias for field number 1

recall

Alias for field number 2

class pytext.metrics.MultiLabelSoftClassificationMetrics[source]

Bases: tuple

Classification scores that are independent of thresholds.

average_label_precision

Alias for field number 0

average_label_recall

Alias for field number 2

average_overall_accuracy

Alias for field number 11

average_overall_auc

Alias for field number 9

average_overall_precision

Alias for field number 1

average_overall_recall

Alias for field number 3

decision_thresh_at_precision

Alias for field number 5

decision_thresh_at_recall

Alias for field number 7

label_accuracy

Alias for field number 10

precision_at_recall

Alias for field number 6

recall_at_precision

Alias for field number 4

roc_auc

Alias for field number 8

pytext.metrics.PRECISION_AT_RECALL_THRESHOLDS = [0.2, 0.4, 0.6, 0.8, 0.9]

Basic metric classes and functions for single-label prediction problems, with ongoing extension to multi-label support.

class pytext.metrics.PRF1Metrics[source]

Bases: tuple

Metric class for all types of precision/recall/F1 scores.

per_label_scores

Map from label string to the corresponding precision/recall/F1 scores.

macro_scores

Macro precision/recall/F1 scores across the labels in per_label_scores.

micro_scores

Micro (regular) precision/recall/F1 scores for the same collection of predictions.

macro_scores

Alias for field number 1

micro_scores

Alias for field number 2

per_label_scores

Alias for field number 0

print_metrics() → None[source]
class pytext.metrics.PRF1Scores[source]

Bases: tuple

Precision/recall/F1 scores for a collection of predictions.

true_positives

Number of true positives.

false_positives

Number of false positives.

false_negatives

Number of false negatives.

precision

TP / (TP + FP).

recall

TP / (TP + FN).

f1

2 * TP / (2 * TP + FP + FN).

f1

Alias for field number 5

false_negatives

Alias for field number 2

false_positives

Alias for field number 1

precision

Alias for field number 3

recall

Alias for field number 4

true_positives

Alias for field number 0
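
For concreteness, a worked instance of the precision/recall/F1 definitions above:

  tp, fp, fn = 8, 2, 4
  precision = tp / (tp + fp)        # 8 / 10 = 0.8
  recall = tp / (tp + fn)           # 8 / 12 ~= 0.667
  f1 = 2 * tp / (2 * tp + fp + fn)  # 16 / 22 ~= 0.727, equal to 2PR / (P + R)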

class pytext.metrics.PairwiseRankingMetrics[source]

Bases: tuple

Metric class for pairwise ranking

num_examples

number of samples

Type:int
accuracy

fraction of pairs ranked in the correct order

Type:float
average_score_difference

average of score(higherRank) - score(lowerRank)

Type:float
accuracy

Alias for field number 1

average_score_difference

Alias for field number 2

num_examples

Alias for field number 0

print_metrics() → None[source]
class pytext.metrics.PerLabelConfusions[source]

Bases: object

Per label confusion information.

label_confusions_map

Map from label string to the corresponding confusion counts.

compute_metrics() → pytext.metrics.MacroPRF1Metrics[source]
label_confusions_map
update(label: str, item: str, count: int) → None[source]

Increases one of the TP, FP, or FN counts for a label by a certain amount.

Parameters:
  • label – Label to be modified.
  • item – Type of count to be modified, should be one of “TP”, “FP” or “FN”.
  • count – Amount to be added to the count.
Returns:

None
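
Illustrative usage, assuming the no-argument constructor implied by the class signature above; the labels and counts are made up:

  from pytext.metrics import PerLabelConfusions

  confusions = PerLabelConfusions()
  confusions.update("SL:DATE_TIME", "TP", 3)
  confusions.update("SL:DATE_TIME", "FP", 1)
  confusions.update("SL:LOCATION", "FN", 2)
  macro_metrics = confusions.compute_metrics()  # MacroPRF1Metrics
  macro_metrics.print_metrics()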

class pytext.metrics.RealtimeMetrics[source]

Bases: tuple

Realtime Metrics for tracking training progress and performance.

samples

number of samples

Type:int
tps

tokens per second

Type:float
ups

updates per second

Type:float
samples

Alias for field number 0

tps

Alias for field number 1

ups

Alias for field number 2

class pytext.metrics.RegressionMetrics[source]

Bases: tuple

Metrics for regression tasks.

num_examples

number of examples

Type:int
pearson_correlation

correlation between predictions and labels

Type:float
mse

mean-squared error between predictions and labels

Type:float
mse

Alias for field number 2

num_examples

Alias for field number 0

pearson_correlation

Alias for field number 1

print_metrics()[source]
class pytext.metrics.SoftClassificationMetrics[source]

Bases: tuple

Classification scores that are independent of thresholds.

average_precision

Alias for field number 0

decision_thresh_at_precision

Alias for field number 2

decision_thresh_at_recall

Alias for field number 4

precision_at_recall

Alias for field number 3

recall_at_precision

Alias for field number 1

roc_auc

Alias for field number 5

pytext.metrics.average_precision_score(y_true_sorted: numpy.ndarray, y_score_sorted: numpy.ndarray) → float[source]

Computes average precision, which summarizes the precision-recall curve as the precisions achieved at each threshold weighted by the increase in recall since the previous threshold.

Parameters:
  • y_true_sorted – Numpy array sorted according to decreasing confidence scores indicating whether each prediction is correct.
  • y_score_sorted – Numpy array of confidence scores for the predictions in decreasing order.
Returns:

Average precision score.

TODO: This is too slow, improve the performance
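
Illustrative usage of the threshold-free helpers; it is assumed here that sort_by_score() (documented at the end of this module) returns the (y_true_sorted, y_score_sorted) pair these functions expect, and the inputs are made up:

  from pytext.metrics import (
      average_precision_score,
      precision_at_recall,
      recall_at_precision,
      sort_by_score,
  )

  y_true = [True, False, True, True, False]   # whether each prediction was correct
  y_score = [0.31, 0.92, 0.64, 0.77, 0.48]    # confidence of each prediction

  y_true_sorted, y_score_sorted = sort_by_score(y_true, y_score)
  ap = average_precision_score(y_true_sorted, y_score_sorted)
  recall_at_p = recall_at_precision(y_true_sorted, y_score_sorted, thresholds=[0.8, 0.9])
  precision_at_r, decision_thresh = precision_at_recall(
      y_true_sorted, y_score_sorted, thresholds=[0.8, 0.9]
  )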

pytext.metrics.compute_average_recall(predictions: Sequence[pytext.metrics.LabelPrediction], label_names: Sequence[str], average_precisions: Dict[str, float]) → float[source]
pytext.metrics.compute_classification_metrics(predictions: Sequence[pytext.metrics.LabelPrediction], label_names: Sequence[str], loss: float, average_precisions: bool = True, recall_at_precision_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9], precision_at_recall_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9]) → pytext.metrics.ClassificationMetrics[source]

A general function that computes classification metrics given a list of label predictions.

Parameters:
  • predictions – Label predictions, including the confidence score for each label.
  • label_names – Indexed label names.
  • average_precisions – Whether to compute average precisions for labels or not. Defaults to True.
  • recall_at_precision_thresholds – precision thresholds at which to calculate recall
  • precision_at_recall_thresholds – recall thresholds at which to calculate precision
Returns:

ClassificationMetrics which contains various classification metrics.
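
Minimal usage sketch built from the documented NamedTuple fields; the scores and loss are hypothetical:

  from pytext.metrics import LabelPrediction, compute_classification_metrics

  label_names = ["negative", "positive"]
  predictions = [
      LabelPrediction(label_scores=[0.2, 0.8], predicted_label=1, expected_label=1),
      LabelPrediction(label_scores=[0.7, 0.3], predicted_label=0, expected_label=1),
      LabelPrediction(label_scores=[0.6, 0.4], predicted_label=0, expected_label=0),
  ]

  metrics = compute_classification_metrics(predictions, label_names, loss=0.42)
  metrics.print_metrics()
  print(metrics.accuracy)  # 2 of 3 predictions are correct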

pytext.metrics.compute_macro_avg(soft_metrics: Dict[str, pytext.metrics.SoftClassificationMetrics], metric: str)[source]
pytext.metrics.compute_matthews_correlation_coefficients(TP: int, FP: int, FN: int, TN: int) → float[source]

Computes Matthews correlation coefficient, a way to summarize all four counts (TP, FP, FN, TN) in the confusion matrix of binary classification.

Parameters:
  • TP – Number of true positives.
  • FP – Number of false positives.
  • FN – Number of false negatives.
  • TN – Number of true negatives.
Returns:

Matthews correlation coefficient, which is (TP * TN - FP * FN) / sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)).
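
A sketch of the standard definition, for reference (the library function may guard the zero-denominator case differently):

  import math

  def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
      # Standard Matthews correlation coefficient.
      denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
      return (tp * tn - fp * fn) / denom if denom else 0.0

  mcc(tp=50, fp=10, fn=5, tn=35)  # ~0.70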

pytext.metrics.compute_multi_label_classification_metrics(predictions: Sequence[pytext.metrics.LabelListPrediction], label_names: Sequence[str], loss: float, average_precisions: bool = True, recall_at_precision_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9], precision_at_recall_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9]) → pytext.metrics.ClassificationMetrics[source]

A general function that computes classification metrics given a list of multi-label predictions.

Parameters:
  • predictions – multi-label predictions, including the confidence score for each label.
  • label_names – Indexed label names.
  • average_precisions – Whether to compute average precisions for labels or not. Defaults to True.
  • recall_at_precision_thresholds – precision thresholds at which to calculate recall
  • precision_at_recall_thresholds – recall thresholds at which to calculate precision
Returns:

ClassificationMetrics which contains various classification metrics.

pytext.metrics.compute_multi_label_full_vector_classification_metrics(predictions: Sequence[pytext.metrics.LabelListPrediction], label_names: Sequence[str], loss: float, average_precisions: bool = True, recall_at_precision_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9], precision_at_recall_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9]) → pytext.metrics.ClassificationMetrics[source]

A general function that computes classification metrics given a list of multi-label predictions.

Parameters:
  • predictions – multi-label predictions, including the confidence score for each label.
  • label_names – Indexed label names.
  • average_precisions – Whether to compute average precisions for labels or not. Defaults to True.
  • recall_at_precision_thresholds – precision thresholds at which to calculate recall
  • precision_at_recall_thresholds – recall thresholds at which to calculate precision
Returns:

ClassificationMetrics which contains various classification metrics.

pytext.metrics.compute_multi_label_multi_class_soft_metrics(predictions: Sequence[Sequence[pytext.metrics.LabelPrediction]], label_names: Sequence[str], label_vocabs: Sequence[Sequence[str]], recall_at_precision_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9], precision_at_recall_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9]) → pytext.metrics.MultiLabelSoftClassificationMetrics[source]

Computes multi-label soft classification metrics with multi-class accommodation

Parameters:
  • predictions – multi-label predictions, including the confidence score for each label.
  • label_names – Indexed label names.
  • recall_at_precision_thresholds – precision thresholds at which to calculate recall
  • precision_at_recall_thresholds – recall thresholds at which to calculate precision
Returns:

MultiLabelSoftClassificationMetrics containing the aggregated soft metrics.

pytext.metrics.compute_multi_label_soft_full_vector_metrics(predictions: Sequence[pytext.metrics.LabelListPrediction], label_names: Sequence[str], recall_at_precision_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9], precision_at_recall_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9]) → Dict[str, pytext.metrics.SoftClassificationMetrics][source]

Computes multi-label soft classification metrics

Parameters:
  • predictions – multi-label predictions, including the confidence score for each label.
  • label_names – Indexed label names. May contain duplicate label names.
  • recall_at_precision_thresholds – precision thresholds at which to calculate recall
  • precision_at_recall_thresholds – recall thresholds at which to calculate precision
Returns:

Dict from label strings to their corresponding soft metrics.

pytext.metrics.compute_multi_label_soft_metrics(predictions: Sequence[pytext.metrics.LabelListPrediction], label_names: Sequence[str], recall_at_precision_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9], precision_at_recall_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9]) → Dict[str, pytext.metrics.SoftClassificationMetrics][source]

Computes multi-label soft classification metrics

Parameters:
  • predictions – multi-label predictions, including the confidence score for each label.
  • label_names – Indexed label names.
  • recall_at_precision_thresholds – precision thresholds at which to calculate recall
  • precision_at_recall_thresholds – recall thresholds at which to calculate precision
Returns:

Dict from label strings to their corresponding soft metrics.

pytext.metrics.compute_pairwise_ranking_metrics(predictions: Sequence[int], scores: Sequence[float]) → pytext.metrics.PairwiseRankingMetrics[source]

Computes metrics for pairwise ranking given sequences of predictions and scores

Parameters:
  • predictions – 1 if ranking was correct, 0 if ranking was incorrect
  • scores – score(higher-ranked-sample) - score(lower-ranked-sample)
Returns:

PairwiseRankingMetrics object
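
Usage sketch with made-up inputs; per the attribute definitions above, these inputs would correspond to accuracy 0.75 and an average score difference of 0.275:

  from pytext.metrics import compute_pairwise_ranking_metrics

  predictions = [1, 1, 0, 1]       # 1 = pair ranked correctly, 0 = not
  scores = [0.6, 0.2, -0.1, 0.4]   # score(higher-ranked sample) - score(lower-ranked sample)
  metrics = compute_pairwise_ranking_metrics(predictions, scores)
  metrics.print_metrics()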

pytext.metrics.compute_prf1(tp: int, fp: int, fn: int) → Tuple[float, float, float][source]
pytext.metrics.compute_regression_metrics(predictions: Sequence[float], targets: Sequence[float]) → pytext.metrics.RegressionMetrics[source]

Computes metrics for regression tasks.

Parameters:
  • predictions – 1-D sequence of float predictions
  • targets – 1-D sequence of float labels
Returns:

RegressionMetrics object
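
Usage sketch with made-up predictions and targets:

  from pytext.metrics import compute_regression_metrics

  metrics = compute_regression_metrics(
      predictions=[1.1, 2.4, 3.0, 4.8],
      targets=[1.0, 2.5, 3.2, 5.0],
  )
  metrics.print_metrics()  # reports num_examples, Pearson correlation, and MSE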

pytext.metrics.compute_roc_auc(predictions: Sequence[pytext.metrics.LabelPrediction], target_class: int = 0) → Optional[float][source]

Computes area under the Receiver Operating Characteristic curve, for binary classification. Implementation based off of (and explained at) https://www.ibm.com/developerworks/community/blogs/jfp/entry/Fast_Computation_of_AUC_ROC_score?lang=en.

pytext.metrics.compute_roc_auc_given_sorted_positives(y_true_sorted: numpy.ndarray) → Optional[float][source]
pytext.metrics.compute_soft_metrics(predictions: Sequence[pytext.metrics.LabelPrediction], label_names: Sequence[str], recall_at_precision_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9], precision_at_recall_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9]) → Dict[str, pytext.metrics.SoftClassificationMetrics][source]

Computes soft classification metrics given a list of label predictions.

Parameters:
  • predictions – Label predictions, including the confidence score for each label.
  • label_names – Indexed label names.
  • recall_at_precision_thresholds – precision thresholds at which to calculate recall
  • precision_at_recall_thresholds – recall thresholds at which to calculate precision
Returns:

Dict from label strings to their corresponding soft metrics.

pytext.metrics.precision_at_recall(y_true_sorted: numpy.ndarray, y_score_sorted: numpy.ndarray, thresholds: Sequence[float]) → Tuple[Dict[float, float], Dict[float, float]][source]

Computes precision at various recall levels

Parameters:
  • y_true_sorted – Numpy array sorted according to decreasing confidence scores indicating whether each prediction is correct.
  • y_score_sorted – Numpy array of confidence scores for the predictions in decreasing order.
  • thresholds – Sequence of floats indicating the requested recall thresholds
Returns:

A tuple of two dictionaries: the maximum precision at each requested recall threshold, and the decision thresholds that achieve those maximum precisions.

pytext.metrics.recall_at_precision(y_true_sorted: numpy.ndarray, y_score_sorted: numpy.ndarray, thresholds: Sequence[float]) → Dict[float, float][source]

Computes recall at various precision levels

Parameters:
  • y_true_sorted – Numpy array sorted according to decreasing confidence scores indicating whether each prediction is correct.
  • y_score_sorted – Numpy array of confidence scores for the predictions in decreasing order.
  • thresholds – Sequence of floats indicating the requested precision thresholds
Returns:

Dictionary of maximum recall at requested precision thresholds.

pytext.metrics.safe_division(n: Union[int, float], d: int) → float[source]
pytext.metrics.sort_by_score(y_true_list: Sequence[bool], y_score_list: Sequence[float])[source]