pytext.metrics package¶
Submodules¶
pytext.metrics.calibration_metrics module¶
-
class pytext.metrics.calibration_metrics.AllCalibrationMetrics(calibration_metrics)
Bases: tuple
-
calibration_metrics¶ Alias for field number 0
-
-
class pytext.metrics.calibration_metrics.CalibrationMetrics(expected_error, max_error, total_error)
Bases: tuple
-
expected_error¶ Alias for field number 0
-
max_error¶ Alias for field number 1
-
total_error¶ Alias for field number 2
-
-
pytext.metrics.calibration_metrics.calculate_error(n_samples: int, bucket_values: List[List[float]], bucket_confidence: List[List[float]], bucket_accuracy: List[List[float]]) → Tuple[float, float, float][source]¶ Computes several metrics used to measure calibration error, including expected calibration error (ECE), maximum calibration error (MCE), and total calibration error (TCE).
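As an illustration of the first two quantities, here is a minimal sketch using the commonly cited definitions: ECE is the sample-weighted average of |accuracy - confidence| over buckets, and MCE is the largest such gap. The function name `ece_and_mce` and its simplified per-bucket inputs are assumptions made for this example and do not mirror the signature of `calculate_error`; TCE is left out because its formula is not given here.

```python
from typing import List, Tuple


def ece_and_mce(
    bucket_sizes: List[int],
    bucket_confidence: List[float],
    bucket_accuracy: List[float],
) -> Tuple[float, float]:
    """Return (ECE, MCE) from per-bucket statistics (hypothetical helper)."""
    n_samples = sum(bucket_sizes)
    if n_samples == 0:
        return 0.0, 0.0
    expected_error = 0.0
    max_error = 0.0
    for size, conf, acc in zip(bucket_sizes, bucket_confidence, bucket_accuracy):
        if size == 0:
            # Empty buckets carry a -1 placeholder and are skipped.
            continue
        gap = abs(acc - conf)
        expected_error += (size / n_samples) * gap  # weight by bucket share
        max_error = max(max_error, gap)
    return expected_error, max_error


# Three buckets; the middle one is empty and uses the -1 placeholder.
ece, mce = ece_and_mce([2, 0, 2], [0.25, -1.0, 0.875], [0.5, -1.0, 1.0])
```

A perfectly calibrated model has equal accuracy and confidence in every bucket, so both values would be 0.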
-
pytext.metrics.calibration_metrics.compute_calibration(label_predictions: List[pytext.metrics.LabelPrediction]) → Tuple[float, float, float][source]¶
-
pytext.metrics.calibration_metrics.get_bucket_accuracy(bucket_values: List[List[float]], y_true: List[float], y_pred: List[float]) → List[float][source]¶ Computes accuracy for each bucket. If a bucket does not have any predictions, uses -1 as a placeholder.
-
pytext.metrics.calibration_metrics.get_bucket_confidence(bucket_values: List[List[float]]) → List[float][source]¶ Computes average confidence for each bucket. If a bucket does not have any predictions, uses -1 as a placeholder.
-
pytext.metrics.calibration_metrics.get_bucket_scores(y_score: List[float], buckets: int = 10) → Tuple[List[List[float]], List[int]][source]¶ Organizes real-valued posterior probabilities into buckets. For example, if we have 10 buckets, the probabilities 0.0, 0.1, 0.2 are placed into buckets 0 (0.0 <= p < 0.1), 1 (0.1 <= p < 0.2), and 2 (0.2 <= p < 0.3), respectively.
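The bucketing rule above can be sketched as follows. This is an assumption-laden illustration, not the library's code: the name `bucket_scores` is hypothetical, the second return value is assumed to be the bucket index of each score, and a probability of exactly 1.0 is assumed to fall into the last bucket.

```python
from typing import List, Tuple


def bucket_scores(
    y_score: List[float], buckets: int = 10
) -> Tuple[List[List[float]], List[int]]:
    """Group probabilities into equal-width buckets over [0, 1]."""
    bucket_values: List[List[float]] = [[] for _ in range(buckets)]
    bucket_indices: List[int] = []
    for score in y_score:
        # Bucket i covers i / buckets <= p < (i + 1) / buckets;
        # p == 1.0 is clamped into the last bucket (an assumption).
        index = min(int(score * buckets), buckets - 1)
        bucket_values[index].append(score)
        bucket_indices.append(index)
    return bucket_values, bucket_indices


values, indices = bucket_scores([0.0, 0.1, 0.2, 1.0])
```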
pytext.metrics.dense_retrieval_metrics module¶
-
class pytext.metrics.dense_retrieval_metrics.DenseRetrievalMetrics
Bases: tuple
Metric class for dense passage retrieval.
-
num_examples¶ number of samples
Type: int
-
accuracy¶ How often the positive document was selected from the list of candidate documents.
Type: float
-
average_rank¶ average rank of positive passage
Type: float
-
mean_reciprocal_rank¶ average 1/rank of positive passage
Type: float
-
accuracy Alias for field number 1
-
average_rank Alias for field number 2
-
mean_reciprocal_rank Alias for field number 3
-
num_examples Alias for field number 0
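To make the rank-based fields concrete, the sketch below derives them from the 1-based rank of the positive passage in each example. `RetrievalSummary` and `summarize_ranks` are hypothetical names, and treating accuracy as "the positive passage is ranked first" is an assumption of this example.

```python
from typing import List, NamedTuple


class RetrievalSummary(NamedTuple):
    """Stand-in with the same field order as DenseRetrievalMetrics."""

    num_examples: int
    accuracy: float
    average_rank: float
    mean_reciprocal_rank: float


def summarize_ranks(positive_ranks: List[int]) -> RetrievalSummary:
    """Summarize 1-based ranks of the positive passage, one per example."""
    n = len(positive_ranks)
    if n == 0:
        raise ValueError("at least one example is required")
    accuracy = sum(1 for rank in positive_ranks if rank == 1) / n
    average_rank = sum(positive_ranks) / n
    mean_reciprocal_rank = sum(1.0 / rank for rank in positive_ranks) / n
    return RetrievalSummary(n, accuracy, average_rank, mean_reciprocal_rank)


summary = summarize_ranks([1, 2, 4])
```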
-
pytext.metrics.intent_slot_metrics module¶
-
class pytext.metrics.intent_slot_metrics.AllMetrics
Bases: tuple
Aggregated class for intent-slot related metrics.
-
top_intent_accuracy¶ Accuracy of the top-level intent.
-
frame_accuracy¶ Frame accuracy.
-
frame_accuracies_by_depth¶ Frame accuracies bucketized by depth of the gold tree.
-
bracket_metrics¶ Bracket metrics for intents and slots. For details, see the function compute_intent_slot_metrics().
-
tree_metrics¶ Tree metrics for intents and slots. For details, see the function compute_intent_slot_metrics().
-
loss¶ Cross entropy loss.
-
bracket_metrics Alias for field number 4
-
frame_accuracies_by_depth Alias for field number 3
-
frame_accuracy Alias for field number 1
-
frame_accuracy_top_k¶ Alias for field number 2
-
loss Alias for field number 6
-
top_intent_accuracy Alias for field number 0
-
tree_metrics Alias for field number 5
-
-
pytext.metrics.intent_slot_metrics.FrameAccuraciesByDepth = typing.Dict[int, pytext.metrics.intent_slot_metrics.FrameAccuracy]¶ Frame accuracies bucketized by depth of the gold tree.
-
class pytext.metrics.intent_slot_metrics.FrameAccuracy
Bases: tuple
Frame accuracy for a collection of intent frame predictions.
Frame accuracy means the entire tree structure of the predicted frame matches that of the gold frame.
-
frame_accuracy¶ Alias for field number 1
-
num_samples¶ Alias for field number 0
-
-
class pytext.metrics.intent_slot_metrics.FramePredictionPair
Bases: tuple
Pair of predicted and gold intent frames.
-
expected_frame¶ Alias for field number 1
-
predicted_frame¶ Alias for field number 0
-
-
class pytext.metrics.intent_slot_metrics.IntentSlotConfusions
Bases: tuple
Aggregated class for intent and slot confusions.
-
intent_confusions¶ Confusion counts for intents.
-
slot_confusions¶ Confusion counts for slots.
-
intent_confusions Alias for field number 0
-
slot_confusions Alias for field number 1
-
-
class pytext.metrics.intent_slot_metrics.IntentSlotMetrics
Bases: tuple
Precision/recall/F1 metrics for intents and slots.
-
intent_metrics¶ Precision/recall/F1 metrics for intents.
-
slot_metrics¶ Precision/recall/F1 metrics for slots.
-
overall_metrics¶ Combined precision/recall/F1 metrics for all nodes (merging intents and slots).
-
intent_metrics Alias for field number 0
-
overall_metrics Alias for field number 2
-
slot_metrics Alias for field number 1
-
-
class pytext.metrics.intent_slot_metrics.IntentsAndSlots
Bases: tuple
Collection of intents and slots in an intent frame.
-
intents¶ Alias for field number 0
-
slots¶ Alias for field number 1
-
-
class pytext.metrics.intent_slot_metrics.Node(label: str, span: pytext.data.data_structures.node.Span, children: Optional[AbstractSet[Node]] = None, text: str = None)
Bases: pytext.data.data_structures.node.Node
Subclass of the base Node class, used for metric purposes. It is immutable so that hashing can be done on the class.
-
label¶ Label of the node.
Type: str
-
span¶ Span of the node.
Type: Span
-
children¶ frozenset of the node’s children, left empty when computing bracketing metrics.
Type: frozenset of Node
-
text¶ Text the node covers (=utterance[span.start:span.end])
Type: str
-
-
class pytext.metrics.intent_slot_metrics.NodesPredictionPair
Bases: tuple
Pair of predicted and expected sets of nodes.
-
expected_nodes¶ Alias for field number 1
-
predicted_nodes¶ Alias for field number 0
-
-
pytext.metrics.intent_slot_metrics.compare_frames(predicted_frame: pytext.metrics.intent_slot_metrics.Node, expected_frame: pytext.metrics.intent_slot_metrics.Node, tree_based: bool, intent_per_label_confusions: Optional[pytext.metrics.PerLabelConfusions] = None, slot_per_label_confusions: Optional[pytext.metrics.PerLabelConfusions] = None) → pytext.metrics.intent_slot_metrics.IntentSlotConfusions[source]¶ Compares two intent frames and returns TP, FP, FN counts for intents and slots. Optionally collects the per label TP, FP, FN counts.
Parameters: - predicted_frame – Predicted intent frame.
- expected_frame – Gold intent frame.
- tree_based – Whether to get the tree-based confusions (if True) or bracket-based confusions (if False). For details, see the function compute_intent_slot_metrics().
- intent_per_label_confusions – If provided, update the per label confusions for intents as well. Defaults to None.
- slot_per_label_confusions – If provided, update the per label confusions for slots as well. Defaults to None.
Returns: IntentSlotConfusions, containing confusion counts for intents and slots.
-
pytext.metrics.intent_slot_metrics.compute_all_metrics(frame_pairs: Sequence[pytext.metrics.intent_slot_metrics.FramePredictionPair], top_intent_accuracy: bool = True, frame_accuracy: bool = True, frame_accuracies_by_depth: bool = True, bracket_metrics: bool = True, tree_metrics: bool = True, overall_metrics: bool = False, all_predicted_frames: List[List[pytext.metrics.intent_slot_metrics.Node]] = None, calculated_loss: float = None, length_metrics: Dict[KT, VT] = None) → pytext.metrics.intent_slot_metrics.AllMetrics[source]¶ Given a list of predicted and gold intent frames, computes intent-slot related metrics.
Parameters: - frame_pairs – List of predicted and gold intent frames.
- top_intent_accuracy – Whether to compute top intent accuracy or not. Defaults to True.
- frame_accuracy – Whether to compute frame accuracy or not. Defaults to True.
- frame_accuracies_by_depth – Whether to compute frame accuracies by depth or not. Defaults to True.
- bracket_metrics – Whether to compute bracket metrics or not. Defaults to True.
- tree_metrics – Whether to compute tree metrics or not. Defaults to True.
- overall_metrics – If bracket_metrics or tree_metrics is true, decides whether to compute overall (merging intents and slots) metrics for them. Defaults to False.
Returns: AllMetrics which contains intent-slot related metrics.
-
pytext.metrics.intent_slot_metrics.compute_frame_accuracies_by_depth(frame_pairs: Sequence[pytext.metrics.intent_slot_metrics.FramePredictionPair]) → Dict[int, pytext.metrics.intent_slot_metrics.FrameAccuracy][source]¶ Given a list of predicted and gold intent frames, splits the predictions into buckets according to the depth of the gold trees, and computes frame accuracy for each bucket.
Parameters: frame_pairs – List of predicted and gold intent frames. Returns: FrameAccuraciesByDepth, a map from depths to their corresponding frame accuracies.
-
pytext.metrics.intent_slot_metrics.compute_frame_accuracy(frame_pairs: Sequence[pytext.metrics.intent_slot_metrics.FramePredictionPair]) → float[source]¶ Computes frame accuracy given a list of predicted and gold intent frames.
Parameters: frame_pairs – List of predicted and gold intent frames. Returns: Frame accuracy. For a prediction, frame accuracy is achieved if the entire tree structure of the predicted frame matches that of the gold frame.
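A minimal sketch of this definition, assuming frames are represented as nested tuples of `(label, start, end, children)` rather than PyText `Node` objects: a prediction counts only when the whole structure equals the gold frame. The name `frame_accuracy_sketch` is hypothetical.

```python
from typing import Hashable, Sequence, Tuple


def frame_accuracy_sketch(
    frame_pairs: Sequence[Tuple[Hashable, Hashable]]
) -> float:
    """Fraction of (predicted, gold) pairs whose frames are exactly equal."""
    if not frame_pairs:
        return 0.0
    matches = sum(1 for predicted, gold in frame_pairs if predicted == gold)
    return matches / len(frame_pairs)


# Frames as nested tuples: (label, span start, span end, children).
gold = ("IN:get_weather", 0, 20, (("SL:location", 15, 20, ()),))
missing_slot = ("IN:get_weather", 0, 20, ())
accuracy = frame_accuracy_sketch([(gold, gold), (missing_slot, gold)])
```

The second prediction has the correct top-level intent but lacks the slot, so it does not count toward frame accuracy.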
-
pytext.metrics.intent_slot_metrics.compute_frame_accuracy_top_k(frame_pairs: List[pytext.metrics.intent_slot_metrics.FramePredictionPair], all_frames: List[List[pytext.metrics.intent_slot_metrics.Node]]) → float[source]¶
-
pytext.metrics.intent_slot_metrics.compute_intent_slot_metrics(frame_pairs: Sequence[pytext.metrics.intent_slot_metrics.FramePredictionPair], tree_based: bool, overall_metrics: bool = True) → pytext.metrics.intent_slot_metrics.IntentSlotMetrics[source]¶ Given a list of predicted and gold intent frames, computes precision, recall and F1 metrics for intents and slots, either in tree-based or bracket-based manner.
The following assumptions are taken on intent frames: 1. The root node is an intent, 2. Children of intents are always slots, and children of slots are always intents.
For tree-based metrics, a node (an intent or slot) in the predicted frame is considered a true positive only if the subtree rooted at this node has an exact copy in the gold frame, otherwise it is considered a false positive. A false negative is a node in the gold frame that does not have an exact subtree match in the predicted frame.
For bracket-based metrics, a node in the predicted frame is considered a true positive if there is a node in the gold frame having the same label and span (but not necessarily the same children). The definitions of false positives and false negatives are similar to the above.
Parameters: - frame_pairs – List of predicted and gold intent frames.
- tree_based – Whether to compute tree-based metrics (if True) or bracket-based metrics (if False).
- overall_metrics – Whether to compute overall (merging intents and slots) metrics or not. Defaults to True.
Returns: IntentSlotMetrics, containing precision/recall/F1 metrics for intents and slots.
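The difference between the two matching modes can be seen in a small sketch. `SketchNode` is a hypothetical stand-in for the immutable `Node` class (it stores the span as two integers), and `count_true_positives` is an illustrative helper, not part of PyText.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set, Tuple


@dataclass(frozen=True)
class SketchNode:
    """Immutable, hashable tree node: equality includes the whole subtree."""

    label: str
    start: int
    end: int
    children: FrozenSet["SketchNode"] = frozenset()


def all_nodes(root: SketchNode) -> Set[SketchNode]:
    """Collect the root and every descendant."""
    nodes = {root}
    for child in root.children:
        nodes |= all_nodes(child)
    return nodes


def bracket(node: SketchNode) -> Tuple[str, int, int]:
    """Label and span only, ignoring children."""
    return (node.label, node.start, node.end)


def count_true_positives(
    predicted: SketchNode, gold: SketchNode, tree_based: bool
) -> int:
    predicted_nodes = all_nodes(predicted)
    gold_nodes = all_nodes(gold)
    if tree_based:
        # Exact subtree match: node equality compares children recursively.
        return len(predicted_nodes & gold_nodes)
    # Bracket match: same label and span is enough.
    return len({bracket(n) for n in predicted_nodes}
               & {bracket(n) for n in gold_nodes})


gold_frame = SketchNode(
    "IN:get_weather", 0, 20, frozenset({SketchNode("SL:location", 15, 20)})
)
predicted_frame = SketchNode(
    "IN:get_weather", 0, 20, frozenset({SketchNode("SL:location", 14, 20)})
)
tree_tp = count_true_positives(predicted_frame, gold_frame, tree_based=True)
bracket_tp = count_true_positives(predicted_frame, gold_frame, tree_based=False)
```

Here the predicted slot span is off by one token. Under bracket matching the intent still counts as a true positive, while under tree matching it does not, because its subtree differs from the gold subtree.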
-
pytext.metrics.intent_slot_metrics.compute_metric_at_k(references: List[pytext.metrics.intent_slot_metrics.Node], hypothesis: List[List[pytext.metrics.intent_slot_metrics.Node]], metric_fn: Callable[[pytext.metrics.intent_slot_metrics.Node, pytext.metrics.intent_slot_metrics.Node], bool] = <function <lambda>>) → List[float][source]¶ Computes a boolean metric at each position in the ranked list of hypothesis, and returns an average for each position over all examples. By default metric_fn is comparing if frames are equal.
-
pytext.metrics.intent_slot_metrics.compute_prf1_metrics(nodes_pairs: Sequence[pytext.metrics.intent_slot_metrics.NodesPredictionPair]) → Tuple[pytext.metrics.AllConfusions, pytext.metrics.PRF1Metrics][source]¶ Computes precision/recall/F1 metrics given a list of predicted and expected sets of nodes.
Parameters: nodes_pairs – List of predicted and expected node sets. Returns: A tuple, of which the first member contains the confusion information, and the second member contains the computed precision/recall/F1 metrics.
-
pytext.metrics.intent_slot_metrics.compute_top_intent_accuracy(frame_pairs: Sequence[pytext.metrics.intent_slot_metrics.FramePredictionPair]) → float[source]¶ Computes accuracy of the top-level intent.
Parameters: frame_pairs – List of predicted and gold intent frames. Returns: Prediction accuracy of the top-level intent.
pytext.metrics.language_model_metrics module¶
pytext.metrics.mask_metrics module¶
pytext.metrics.seq2seq_metrics module¶
-
class pytext.metrics.seq2seq_metrics.Seq2SeqMetrics(loss, exact_match, f1, bleu)
Bases: tuple
-
bleu¶ Alias for field number 3
-
exact_match¶ Alias for field number 1
-
f1¶ Alias for field number 2
-
loss¶ Alias for field number 0
-
-
pytext.metrics.seq2seq_metrics.compute_f1(hypothesis_list, reference_list, eps=1e-08)[source]¶ Computes token F1 given a hypothesis and reference. This is defined as F1 = 2 * ((P * R) / (P + R + eps)) where P = precision, R = recall, and eps = epsilon for smoothing zero denominators. By default, eps = 1e-8.
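The formula can be sketched for a single hypothesis/reference pair as follows. How precision and recall are counted is an assumption of this example (multiset token overlap); the name `token_f1` is hypothetical and this is not the library's implementation.

```python
from collections import Counter
from typing import Sequence


def token_f1(
    hypothesis: Sequence[str], reference: Sequence[str], eps: float = 1e-8
) -> float:
    """Token-level F1 with eps guarding against a zero denominator."""
    # Multiset intersection: each token counts at most as often as it
    # appears in both sequences.
    overlap = sum((Counter(hypothesis) & Counter(reference)).values())
    precision = overlap / len(hypothesis) if hypothesis else 0.0
    recall = overlap / len(reference) if reference else 0.0
    return 2 * ((precision * recall) / (precision + recall + eps))


f1 = token_f1(["a", "b", "c"], ["a", "b", "d"])
no_overlap = token_f1(["x"], ["y"])
```

With no overlapping tokens, P and R are both 0 and the eps term keeps the division defined, giving an F1 of 0.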
pytext.metrics.squad_metrics module¶
Module contents¶
-
class pytext.metrics.AllConfusions
Bases: object
Aggregated class for per label confusions.
-
per_label_confusions¶ Per label confusion information.
-
confusions¶ Overall TP, FP and FN counts across the labels in per_label_confusions.
-
confusions
-
per_label_confusions
-
-
class pytext.metrics.ClassificationMetrics
Bases: tuple
Metric class for various classification metrics.
-
accuracy¶ Overall accuracy of predictions.
-
macro_prf1_metrics¶ Macro precision/recall/F1 scores.
-
per_label_soft_scores¶ Per label soft metrics.
-
mcc¶ Matthews correlation coefficient.
-
roc_auc¶ Area under the Receiver Operating Characteristic curve.
-
loss¶ Training loss (only used for selecting best model, no need to print).
-
accuracy Alias for field number 0
-
loss Alias for field number 5
-
macro_prf1_metrics Alias for field number 1
-
mcc Alias for field number 3
-
per_label_soft_scores Alias for field number 2
-
roc_auc Alias for field number 4
-
-
class pytext.metrics.Confusions(TP: int = 0, FP: int = 0, FN: int = 0)
Bases: object
Confusion information for a collection of predictions.
-
TP¶ Number of true positives.
-
FP¶ Number of false positives.
-
FN¶ Number of false negatives.
-
FN
-
FP
-
TP
-
-
class pytext.metrics.LabelListPrediction
Bases: tuple
Label list predictions of an example.
-
label_scores¶ Confidence scores that each label receives.
-
predicted_label¶ List of indices of the predicted label.
-
expected_label¶ List of indices of the true label.
-
expected_label Alias for field number 2
-
label_scores Alias for field number 0
-
predicted_label Alias for field number 1
-
-
class pytext.metrics.LabelPrediction
Bases: tuple
Label predictions of an example.
-
label_scores¶ Confidence scores that each label receives.
-
predicted_label¶ Index of the predicted label. This is usually the label with the highest confidence score in label_scores.
-
expected_label¶ Index of the true label.
-
expected_label Alias for field number 2
-
label_scores Alias for field number 0
-
predicted_label Alias for field number 1
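The "Alias for field number N" entries mean each attribute is also reachable by tuple position. The sketch below uses a hypothetical stand-in, `SketchLabelPrediction`, with the same field order; it is not imported from PyText.

```python
from typing import List, NamedTuple


class SketchLabelPrediction(NamedTuple):
    """Stand-in mirroring the field order documented for LabelPrediction."""

    label_scores: List[float]  # field number 0
    predicted_label: int       # field number 1
    expected_label: int        # field number 2


scores = [0.1, 0.7, 0.2]
# The predicted label is usually the index of the highest score.
prediction = SketchLabelPrediction(scores, scores.index(max(scores)), 1)
```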
-
-
class pytext.metrics.MacroPRF1Metrics
Bases: tuple
Aggregated metric class for macro precision/recall/F1 scores.
-
per_label_scores¶ Mapping from label string to the corresponding precision/recall/F1 scores.
-
macro_scores¶ Macro precision/recall/F1 scores across the labels in per_label_scores.
-
macro_scores Alias for field number 1
-
per_label_scores Alias for field number 0
-
-
class pytext.metrics.MacroPRF1Scores
Bases: tuple
Macro precision/recall/F1 scores (averages across each label).
-
num_labels¶ Number of distinct labels.
-
precision¶ Equally weighted average of precisions for each label.
-
recall¶ Equally weighted average of recalls for each label.
-
f1¶ Equally weighted average of F1 scores for each label.
-
f1 Alias for field number 3
-
num_labels¶ Alias for field number 0
-
precision Alias for field number 1
-
recall Alias for field number 2
-
-
class pytext.metrics.MultiLabelSoftClassificationMetrics
Bases: tuple
Classification scores that are independent of thresholds.
-
average_label_precision¶ Alias for field number 0
-
average_label_recall¶ Alias for field number 2
-
average_overall_accuracy¶ Alias for field number 11
-
average_overall_auc¶ Alias for field number 9
-
average_overall_precision¶ Alias for field number 1
-
average_overall_recall¶ Alias for field number 3
-
decision_thresh_at_precision¶ Alias for field number 5
-
decision_thresh_at_recall¶ Alias for field number 7
-
label_accuracy¶ Alias for field number 10
-
precision_at_recall¶ Alias for field number 6
-
recall_at_precision¶ Alias for field number 4
-
roc_auc¶ Alias for field number 8
-
-
pytext.metrics.PRECISION_AT_RECALL_THRESHOLDS = [0.2, 0.4, 0.6, 0.8, 0.9]¶ Basic metric classes and functions for single-label prediction problems, with extensions for multi-label support.
-
class pytext.metrics.PRF1Metrics
Bases: tuple
Metric class for all types of precision/recall/F1 scores.
-
per_label_scores¶ Map from label string to the corresponding precision/recall/F1 scores.
-
macro_scores¶ Macro precision/recall/F1 scores across the labels in per_label_scores.
-
micro_scores¶ Micro (regular) precision/recall/F1 scores for the same collection of predictions.
-
macro_scores Alias for field number 1
-
micro_scores Alias for field number 2
-
per_label_scores Alias for field number 0
-
-
class pytext.metrics.PRF1Scores
Bases: tuple
Precision/recall/F1 scores for a collection of predictions.
-
true_positives¶ Number of true positives.
-
false_positives¶ Number of false positives.
-
false_negatives¶ Number of false negatives.
-
precision¶ TP / (TP + FP).
-
recall¶ TP / (TP + FN).
-
f1¶ 2 * TP / (2 * TP + FP + FN).
-
f1 Alias for field number 5
-
false_negatives Alias for field number 2
-
false_positives Alias for field number 1
-
precision Alias for field number 3
-
recall Alias for field number 4
-
true_positives Alias for field number 0
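The three formulas, together with the macro/micro distinction used by PRF1Metrics, can be sketched as below. The helper names are hypothetical, and returning 0.0 when a denominator is zero is an assumption of this example.

```python
from typing import Dict, Tuple

Counts = Tuple[int, int, int]  # (TP, FP, FN)


def prf1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from confusion counts (0.0 if undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, f1


def macro_and_micro_f1(per_label: Dict[str, Counts]) -> Tuple[float, float]:
    """Macro F1 averages per-label F1; micro F1 pools the counts first."""
    macro_f1 = sum(prf1(*c)[2] for c in per_label.values()) / len(per_label)
    total_tp = sum(c[0] for c in per_label.values())
    total_fp = sum(c[1] for c in per_label.values())
    total_fn = sum(c[2] for c in per_label.values())
    micro_f1 = prf1(total_tp, total_fp, total_fn)[2]
    return macro_f1, micro_f1


precision, recall, f1 = prf1(8, 2, 2)
macro_f1, micro_f1 = macro_and_micro_f1({"a": (8, 2, 2), "b": (1, 1, 3)})
```

Macro averaging weights the rare label "b" as heavily as the frequent label "a", which is why the macro F1 is lower than the micro F1 in this example.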
-
-
class pytext.metrics.PairwiseRankingMetrics
Bases: tuple
Metric class for pairwise ranking.
-
num_examples¶ number of samples
Type: int
-
accuracy¶ How often a pair was ranked in the correct order.
Type: float
-
average_score_difference¶ average score(higherRank) - score(lowerRank)
Type: float
-
accuracy Alias for field number 1
-
average_score_difference Alias for field number 2
-
num_examples Alias for field number 0
-
-
class pytext.metrics.PerLabelConfusions
Bases: object
Per label confusion information.
-
label_confusions_map¶ Map from label string to the corresponding confusion counts.
-
label_confusions_map
-
-
class pytext.metrics.RealtimeMetrics
Bases: tuple
Realtime Metrics for tracking training progress and performance.
-
samples¶ number of samples
Type: int
-
tps¶ tokens per second
Type: float
-
ups¶ updates per second
Type: float
-
samples Alias for field number 0
-
tps Alias for field number 1
-
ups Alias for field number 2
-
-
class pytext.metrics.RegressionMetrics
Bases: tuple
Metrics for regression tasks.
-
num_examples¶ number of examples
Type: int
-
pearson_correlation¶ correlation between predictions and labels
Type: float
-
mse¶ mean-squared error between predictions and labels
Type: float
-
mse Alias for field number 2
-
num_examples Alias for field number 0
-
pearson_correlation Alias for field number 1
-
-
class pytext.metrics.SoftClassificationMetrics
Bases: tuple
Classification scores that are independent of thresholds.
-
average_precision¶ Alias for field number 0
-
decision_thresh_at_precision¶ Alias for field number 2
-
decision_thresh_at_recall¶ Alias for field number 4
-
precision_at_recall¶ Alias for field number 3
-
recall_at_precision¶ Alias for field number 1
-
roc_auc¶ Alias for field number 5
-
-
pytext.metrics.average_precision_score(y_true_sorted: numpy.ndarray, y_score_sorted: numpy.ndarray) → float[source]¶ Computes average precision, which summarizes the precision-recall curve as the precisions achieved at each threshold weighted by the increase in recall since the previous threshold.
Parameters: - y_true_sorted – Numpy array sorted according to decreasing confidence scores indicating whether each prediction is correct.
- y_score_sorted – Numpy array of confidence scores for the predictions in decreasing order.
Returns: Average precision score.
TODO: This is too slow, improve the performance
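For intuition, here is a minimal pure-Python sketch of average precision over predictions already sorted by decreasing confidence. It assumes every prediction is its own threshold (tied scores are ignored), so each correct prediction raises recall by 1 / (number of correct predictions). The name `average_precision_sketch` is hypothetical and the code is not the library's implementation.

```python
from typing import Sequence


def average_precision_sketch(y_true_sorted: Sequence[int]) -> float:
    """Average precision for 0/1 correctness flags sorted by confidence."""
    num_positives = sum(y_true_sorted)
    if num_positives == 0:
        return 0.0
    true_positives = 0
    weighted_precision = 0.0
    for rank, is_correct in enumerate(y_true_sorted, start=1):
        if is_correct:
            true_positives += 1
            # Precision at this cut-off, weighted by the recall step
            # of 1 / num_positives (applied after the loop).
            weighted_precision += true_positives / rank
    return weighted_precision / num_positives


ap = average_precision_sketch([1, 0, 1])
```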
-
pytext.metrics.compute_average_recall(predictions: Sequence[pytext.metrics.LabelPrediction], label_names: Sequence[str], average_precisions: Dict[str, float]) → float[source]¶
-
pytext.metrics.compute_classification_metrics(predictions: Sequence[pytext.metrics.LabelPrediction], label_names: Sequence[str], loss: float, average_precisions: bool = True, recall_at_precision_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9], precision_at_recall_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9]) → pytext.metrics.ClassificationMetrics[source]¶ A general function that computes classification metrics given a list of label predictions.
Parameters: - predictions – Label predictions, including the confidence score for each label.
- label_names – Indexed label names.
- average_precisions – Whether to compute average precisions for labels or not. Defaults to True.
- recall_at_precision_thresholds – precision thresholds at which to calculate recall
- precision_at_recall_thresholds – recall thresholds at which to calculate precision
Returns: ClassificationMetrics which contains various classification metrics.
-
pytext.metrics.compute_macro_avg(soft_metrics: Dict[str, pytext.metrics.SoftClassificationMetrics], metric: str)[source]¶
-
pytext.metrics.compute_matthews_correlation_coefficients(TP: int, FP: int, FN: int, TN: int) → float[source]¶ Computes Matthews correlation coefficient, a way to summarize all four counts (TP, FP, FN, TN) in the confusion matrix of binary classification.
Parameters: - TP – Number of true positives.
- FP – Number of false positives.
- FN – Number of false negatives.
- TN – Number of true negatives.
Returns: Matthews correlation coefficient, which is (TP * TN - FP * FN) / sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)).
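A minimal sketch of the standard Matthews correlation coefficient, (TP * TN - FP * FN) / sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)). The name `mcc_sketch` is hypothetical, and returning 0.0 when the denominator is zero is an assumption of this example rather than documented behavior.

```python
import math


def mcc_sketch(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # One row or column of the confusion matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denominator


perfect = mcc_sketch(tp=5, fp=0, fn=0, tn=5)
inverted = mcc_sketch(tp=0, fp=5, fn=5, tn=0)
```

The value ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (every prediction right).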
-
pytext.metrics.compute_multi_label_classification_metrics(predictions: Sequence[pytext.metrics.LabelListPrediction], label_names: Sequence[str], loss: float, average_precisions: bool = True, recall_at_precision_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9], precision_at_recall_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9]) → pytext.metrics.ClassificationMetrics[source]¶ A general function that computes classification metrics given a list of multi-label predictions.
Parameters: - predictions – multi-label predictions, including the confidence score for each label.
- label_names – Indexed label names.
- average_precisions – Whether to compute average precisions for labels or not. Defaults to True.
- recall_at_precision_thresholds – precision thresholds at which to calculate recall
- precision_at_recall_thresholds – recall thresholds at which to calculate precision
Returns: ClassificationMetrics which contains various classification metrics.
-
pytext.metrics.compute_multi_label_full_vector_classification_metrics(predictions: Sequence[pytext.metrics.LabelListPrediction], label_names: Sequence[str], loss: float, average_precisions: bool = True, recall_at_precision_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9], precision_at_recall_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9]) → pytext.metrics.ClassificationMetrics[source]¶ A general function that computes classification metrics given a list of multi-label predictions.
Parameters: - predictions – multi-label predictions, including the confidence score for each label.
- label_names – Indexed label names.
- average_precisions – Whether to compute average precisions for labels or not. Defaults to True.
- recall_at_precision_thresholds – precision thresholds at which to calculate recall
- precision_at_recall_thresholds – recall thresholds at which to calculate precision
Returns: ClassificationMetrics which contains various classification metrics.
-
pytext.metrics.compute_multi_label_multi_class_soft_metrics(predictions: Sequence[Sequence[pytext.metrics.LabelPrediction]], label_names: Sequence[str], label_vocabs: Sequence[Sequence[str]], recall_at_precision_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9], precision_at_recall_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9]) → pytext.metrics.MultiLabelSoftClassificationMetrics[source]¶ Computes multi-label soft classification metrics with multi-class accommodation
Parameters: - predictions – multi-label predictions, including the confidence score for each label.
- label_names – Indexed label names.
- recall_at_precision_thresholds – precision thresholds at which to calculate recall
- precision_at_recall_thresholds – recall thresholds at which to calculate precision
Returns: MultiLabelSoftClassificationMetrics containing the computed soft metrics.
-
pytext.metrics.compute_multi_label_soft_full_vector_metrics(predictions: Sequence[pytext.metrics.LabelListPrediction], label_names: Sequence[str], recall_at_precision_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9], precision_at_recall_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9]) → Dict[str, pytext.metrics.SoftClassificationMetrics][source]¶ Computes multi-label soft classification metrics
Parameters: - predictions – multi-label predictions, including the confidence score for each label.
- label_names – Indexed label names. May contain duplicate label names.
- recall_at_precision_thresholds – precision thresholds at which to calculate recall
- precision_at_recall_thresholds – recall thresholds at which to calculate precision
Returns: Dict from label strings to their corresponding soft metrics.
-
pytext.metrics.compute_multi_label_soft_metrics(predictions: Sequence[pytext.metrics.LabelListPrediction], label_names: Sequence[str], recall_at_precision_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9], precision_at_recall_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9]) → Dict[str, pytext.metrics.SoftClassificationMetrics][source]¶ Computes multi-label soft classification metrics
Parameters: - predictions – multi-label predictions, including the confidence score for each label.
- label_names – Indexed label names.
- recall_at_precision_thresholds – precision thresholds at which to calculate recall
- precision_at_recall_thresholds – recall thresholds at which to calculate precision
Returns: Dict from label strings to their corresponding soft metrics.
-
pytext.metrics.compute_pairwise_ranking_metrics(predictions: Sequence[int], scores: Sequence[float]) → pytext.metrics.PairwiseRankingMetrics[source]¶ Computes metrics for pairwise ranking given sequences of predictions and scores
Parameters: - predictions – 1 if ranking was correct, 0 if ranking was incorrect
- scores – score(higher-ranked-sample) - score(lower-ranked-sample)
Returns: PairwiseRankingMetrics object
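Given the input conventions above, the three fields can be sketched as simple averages. `pairwise_ranking_summary` is a hypothetical name, and the plain tuple it returns only mirrors the field order of PairwiseRankingMetrics.

```python
from typing import Sequence, Tuple


def pairwise_ranking_summary(
    predictions: Sequence[int], scores: Sequence[float]
) -> Tuple[int, float, float]:
    """Return (num_examples, accuracy, average_score_difference)."""
    n = len(predictions)
    if n == 0 or n != len(scores):
        raise ValueError("inputs must be non-empty and of equal length")
    accuracy = sum(predictions) / n  # share of correctly ordered pairs
    average_score_difference = sum(scores) / n
    return n, accuracy, average_score_difference


num_examples, accuracy, avg_diff = pairwise_ranking_summary(
    [1, 1, 0, 1], [0.5, 0.25, -0.25, 1.0]
)
```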
-
pytext.metrics.compute_regression_metrics(predictions: Sequence[float], targets: Sequence[float]) → pytext.metrics.RegressionMetrics[source]¶ Computes metrics for regression tasks.
Parameters: - predictions – 1-D sequence of float predictions
- targets – 1-D sequence of float labels
Returns: RegressionMetrics object
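The two statistics can be sketched in pure Python as follows. `regression_summary` is a hypothetical name, and returning a correlation of 0.0 for constant inputs is an assumption of this example.

```python
import math
from typing import Sequence, Tuple


def regression_summary(
    predictions: Sequence[float], targets: Sequence[float]
) -> Tuple[int, float, float]:
    """Return (num_examples, pearson_correlation, mse)."""
    n = len(predictions)
    if n == 0 or n != len(targets):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_p = sum(predictions) / n
    mean_t = sum(targets) / n
    covariance = sum(
        (p - mean_p) * (t - mean_t) for p, t in zip(predictions, targets)
    )
    spread_p = math.sqrt(sum((p - mean_p) ** 2 for p in predictions))
    spread_t = math.sqrt(sum((t - mean_t) ** 2 for t in targets))
    # Pearson correlation is undefined for constant inputs; use 0.0 here.
    pearson = covariance / (spread_p * spread_t) if spread_p and spread_t else 0.0
    mse = sum((p - t) ** 2 for p, t in zip(predictions, targets)) / n
    return n, pearson, mse


n, pearson, mse = regression_summary([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

The example shows why both numbers are reported: the predictions are perfectly correlated with the targets, yet the mean-squared error is far from zero.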
-
pytext.metrics.compute_roc_auc(predictions: Sequence[pytext.metrics.LabelPrediction], target_class: int = 0) → Optional[float][source]¶ Computes area under the Receiver Operating Characteristic curve, for binary classification. Implementation based off of (and explained at) https://www.ibm.com/developerworks/community/blogs/jfp/entry/Fast_Computation_of_AUC_ROC_score?lang=en.
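One way to picture the computation is the pair-counting view of AUC: the fraction of (positive, negative) pairs in which the positive is scored higher. The sketch below takes 0/1 labels already sorted by decreasing score and ignores tied scores; `roc_auc_from_sorted` is a hypothetical name, and returning None when only one class is present is an assumption suggested by the Optional return type.

```python
from typing import Optional, Sequence


def roc_auc_from_sorted(y_true_sorted: Sequence[int]) -> Optional[float]:
    """AUC from 0/1 labels sorted by decreasing confidence score."""
    num_positives = sum(y_true_sorted)
    num_negatives = len(y_true_sorted) - num_positives
    if num_positives == 0 or num_negatives == 0:
        return None  # AUC is undefined with a single class
    positives_seen = 0
    correctly_ordered_pairs = 0
    for is_positive in y_true_sorted:
        if is_positive:
            positives_seen += 1
        else:
            # Every positive seen so far outranks this negative.
            correctly_ordered_pairs += positives_seen
    return correctly_ordered_pairs / (num_positives * num_negatives)


auc = roc_auc_from_sorted([1, 0, 1, 0])
```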
-
pytext.metrics.compute_roc_auc_given_sorted_positives(y_true_sorted: numpy.ndarray) → Optional[float][source]¶
-
pytext.metrics.compute_soft_metrics(predictions: Sequence[pytext.metrics.LabelPrediction], label_names: Sequence[str], recall_at_precision_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9], precision_at_recall_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9]) → Dict[str, pytext.metrics.SoftClassificationMetrics][source]¶ Computes soft classification metrics given a list of label predictions.
Parameters: - predictions – Label predictions, including the confidence score for each label.
- label_names – Indexed label names.
- recall_at_precision_thresholds – precision thresholds at which to calculate recall
- precision_at_recall_thresholds – recall thresholds at which to calculate precision
Returns: Dict from label strings to their corresponding soft metrics.
-
pytext.metrics.precision_at_recall(y_true_sorted: numpy.ndarray, y_score_sorted: numpy.ndarray, thresholds: Sequence[float]) → Tuple[Dict[float, float], Dict[float, float]][source]¶ Computes precision at various recall levels
Parameters: - y_true_sorted – Numpy array sorted according to decreasing confidence scores indicating whether each prediction is correct.
- y_score_sorted – Numpy array of confidence scores for the predictions in decreasing order.
- thresholds – Sequence of floats indicating the requested recall thresholds
Returns: Dictionary of maximum precision at requested recall thresholds. Dictionary of decision thresholds resulting in max precision at requested recall thresholds.
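The two returned dictionaries can be sketched with a single pass over the sorted predictions. This illustration uses plain lists instead of NumPy arrays, treats every prediction as its own cut-off (tied scores are ignored), and reports 0.0 for thresholds that are never reached; all of these are assumptions, and `precision_at_recall_sketch` is a hypothetical name.

```python
from typing import Dict, Sequence, Tuple


def precision_at_recall_sketch(
    y_true_sorted: Sequence[int],
    y_score_sorted: Sequence[float],
    thresholds: Sequence[float],
) -> Tuple[Dict[float, float], Dict[float, float]]:
    """Max precision, and the score cut-off achieving it, per recall level."""
    best_precision = {t: 0.0 for t in thresholds}
    decision_threshold = {t: 0.0 for t in thresholds}
    num_positives = sum(y_true_sorted)
    if num_positives == 0:
        return best_precision, decision_threshold
    true_positives = 0
    pairs = zip(y_true_sorted, y_score_sorted)
    for rank, (is_correct, score) in enumerate(pairs, start=1):
        true_positives += is_correct
        precision = true_positives / rank
        recall = true_positives / num_positives
        for t in thresholds:
            # Keep the best precision among cut-offs reaching recall t.
            if recall >= t and precision > best_precision[t]:
                best_precision[t] = precision
                decision_threshold[t] = score
    return best_precision, decision_threshold


precisions, cutoffs = precision_at_recall_sketch(
    [1, 0, 1, 1, 0], [0.9, 0.8, 0.7, 0.6, 0.5], [0.5, 1.0]
)
```

In this example, accepting predictions with a score of at least 0.6 recovers all three correct predictions at a precision of 0.75, which is the best precision available at both requested recall levels.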
-
pytext.metrics.recall_at_precision(y_true_sorted: numpy.ndarray, y_score_sorted: numpy.ndarray, thresholds: Sequence[float]) → Dict[float, float][source]¶ Computes recall at various precision levels
Parameters: - y_true_sorted – Numpy array sorted according to decreasing confidence scores indicating whether each prediction is correct.
- y_score_sorted – Numpy array of confidence scores for the predictions in decreasing order.
- thresholds – Sequence of floats indicating the requested precision thresholds
Returns: Dictionary of maximum recall at requested precision thresholds.