pytext.metrics package
Submodules
pytext.metrics.calibration_metrics module
class pytext.metrics.calibration_metrics.AllCalibrationMetrics(calibration_metrics)
Bases: tuple
- calibration_metrics: Alias for field number 0
class pytext.metrics.calibration_metrics.CalibrationMetrics(expected_error, max_error, total_error)
Bases: tuple
- expected_error: Alias for field number 0
- max_error: Alias for field number 1
- total_error: Alias for field number 2
pytext.metrics.calibration_metrics.calculate_error(n_samples: int, bucket_values: List[List[float]], bucket_confidence: List[List[float]], bucket_accuracy: List[List[float]]) → Tuple[float, float, float]
Computes several metrics used to measure calibration error, including expected calibration error (ECE), maximum calibration error (MCE), and total calibration error (TCE).
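To make the three error terms concrete, here is a minimal sketch using the standard definitions (illustrative only, not the PyText implementation; the treatment of the total error and of empty buckets is an assumption):

```python
from typing import List, Tuple

def calibration_errors_sketch(
    n_samples: int,
    bucket_values: List[List[float]],
    bucket_confidence: List[float],
    bucket_accuracy: List[float],
) -> Tuple[float, float, float]:
    # ECE weights each bucket's |accuracy - confidence| gap by its share of
    # samples, MCE is the largest gap, and the "total" error here simply sums
    # the gaps (assumed definition). Empty buckets (placeholder -1) are skipped.
    expected_error, max_error, total_error = 0.0, 0.0, 0.0
    for values, confidence, accuracy in zip(bucket_values, bucket_confidence, bucket_accuracy):
        if not values:
            continue
        gap = abs(accuracy - confidence)
        expected_error += (len(values) / n_samples) * gap
        max_error = max(max_error, gap)
        total_error += gap
    return expected_error, max_error, total_error
```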
pytext.metrics.calibration_metrics.compute_calibration(label_predictions: List[pytext.metrics.LabelPrediction]) → Tuple[float, float, float]
pytext.metrics.calibration_metrics.get_bucket_accuracy(bucket_values: List[List[float]], y_true: List[float], y_pred: List[float]) → List[float]
Computes accuracy for each bucket. If a bucket does not have any predictions, uses -1 as a placeholder.
pytext.metrics.calibration_metrics.get_bucket_confidence(bucket_values: List[List[float]]) → List[float]
Computes average confidence for each bucket. If a bucket does not have any predictions, uses -1 as a placeholder.
pytext.metrics.calibration_metrics.get_bucket_scores(y_score: List[float], buckets: int = 10) → Tuple[List[List[float]], List[int]]
Organizes real-valued posterior probabilities into buckets. For example, if we have 10 buckets, the probabilities 0.0, 0.1, and 0.2 are placed into bucket 0 (0.0 <= p < 0.1), bucket 1 (0.1 <= p < 0.2), and bucket 2 (0.2 <= p < 0.3), respectively.
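A minimal sketch of the bucketing rule described above (illustrative only; the second return value here is the per-score bucket index, which is an assumption about what the returned List[int] holds):

```python
from typing import List, Tuple

def bucket_scores_sketch(y_score: List[float], buckets: int = 10) -> Tuple[List[List[float]], List[int]]:
    # Equal-width buckets over [0, 1): bucket i covers i/buckets <= p < (i+1)/buckets,
    # with p = 1.0 clamped into the last bucket.
    bucket_values: List[List[float]] = [[] for _ in range(buckets)]
    bucket_indices: List[int] = []
    for score in y_score:
        index = min(int(score * buckets), buckets - 1)
        bucket_values[index].append(score)
        bucket_indices.append(index)
    return bucket_values, bucket_indices
```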
pytext.metrics.dense_retrieval_metrics module
class pytext.metrics.dense_retrieval_metrics.DenseRetrievalMetrics
Bases: tuple
Metric class for dense passage retrieval.
- num_examples (int): number of samples. Alias for field number 0.
- accuracy (float): how often the positive document was retrieved from the list of documents. Alias for field number 1.
- average_rank (float): average rank of the positive passage. Alias for field number 2.
- mean_reciprocal_rank (float): average 1/rank of the positive passage. Alias for field number 3.
pytext.metrics.intent_slot_metrics module
class pytext.metrics.intent_slot_metrics.AllMetrics
Bases: tuple
Aggregated class for intent-slot related metrics.
- top_intent_accuracy: Accuracy of the top-level intent. Alias for field number 0.
- frame_accuracy: Frame accuracy. Alias for field number 1.
- frame_accuracy_top_k: Alias for field number 2.
- frame_accuracies_by_depth: Frame accuracies bucketized by depth of the gold tree. Alias for field number 3.
- bracket_metrics: Bracket metrics for intents and slots. For details, see the function compute_intent_slot_metrics(). Alias for field number 4.
- tree_metrics: Tree metrics for intents and slots. For details, see the function compute_intent_slot_metrics(). Alias for field number 5.
- loss: Cross entropy loss. Alias for field number 6.
pytext.metrics.intent_slot_metrics.FrameAccuraciesByDepth = typing.Dict[int, pytext.metrics.intent_slot_metrics.FrameAccuracy]
Frame accuracies bucketized by depth of the gold tree.
class pytext.metrics.intent_slot_metrics.FrameAccuracy
Bases: tuple
Frame accuracy for a collection of intent frame predictions. Frame accuracy means the entire tree structure of the predicted frame matches that of the gold frame.
- num_samples: Alias for field number 0
- frame_accuracy: Alias for field number 1
class pytext.metrics.intent_slot_metrics.FramePredictionPair
Bases: tuple
Pair of predicted and gold intent frames.
- predicted_frame: Alias for field number 0
- expected_frame: Alias for field number 1
class pytext.metrics.intent_slot_metrics.IntentSlotConfusions
Bases: tuple
Aggregated class for intent and slot confusions.
- intent_confusions: Confusion counts for intents. Alias for field number 0.
- slot_confusions: Confusion counts for slots. Alias for field number 1.
class pytext.metrics.intent_slot_metrics.IntentSlotMetrics
Bases: tuple
Precision/recall/F1 metrics for intents and slots.
- intent_metrics: Precision/recall/F1 metrics for intents. Alias for field number 0.
- slot_metrics: Precision/recall/F1 metrics for slots. Alias for field number 1.
- overall_metrics: Combined precision/recall/F1 metrics for all nodes (merging intents and slots). Alias for field number 2.
class pytext.metrics.intent_slot_metrics.IntentsAndSlots
Bases: tuple
Collection of intents and slots in an intent frame.
- intents: Alias for field number 0
- slots: Alias for field number 1
class pytext.metrics.intent_slot_metrics.Node(label: str, span: pytext.data.data_structures.node.Span, children: Optional[AbstractSet[Node]] = None, text: str = None)
Bases: pytext.data.data_structures.node.Node
Subclass of the base Node class, used for metric purposes. It is immutable so that hashing can be done on the class.
- label (str): Label of the node.
- span (Span): Span of the node.
- children (frozenset of Node): frozenset of the node’s children, left empty when computing bracketing metrics.
- text (str): Text the node covers (= utterance[span.start:span.end]).
class pytext.metrics.intent_slot_metrics.NodesPredictionPair
Bases: tuple
Pair of predicted and expected sets of nodes.
- predicted_nodes: Alias for field number 0
- expected_nodes: Alias for field number 1
pytext.metrics.intent_slot_metrics.compare_frames(predicted_frame: pytext.metrics.intent_slot_metrics.Node, expected_frame: pytext.metrics.intent_slot_metrics.Node, tree_based: bool, intent_per_label_confusions: Optional[pytext.metrics.PerLabelConfusions] = None, slot_per_label_confusions: Optional[pytext.metrics.PerLabelConfusions] = None) → pytext.metrics.intent_slot_metrics.IntentSlotConfusions
Compares two intent frames and returns TP, FP, FN counts for intents and slots. Optionally collects the per-label TP, FP, FN counts.
Parameters:
- predicted_frame: Predicted intent frame.
- expected_frame: Gold intent frame.
- tree_based: Whether to get the tree-based confusions (if True) or bracket-based confusions (if False). For details, see the function compute_intent_slot_metrics().
- intent_per_label_confusions: If provided, also update the per-label confusions for intents. Defaults to None.
- slot_per_label_confusions: If provided, also update the per-label confusions for slots. Defaults to None.
Returns: IntentSlotConfusions, containing confusion counts for intents and slots.
pytext.metrics.intent_slot_metrics.compute_all_metrics(frame_pairs: Sequence[pytext.metrics.intent_slot_metrics.FramePredictionPair], top_intent_accuracy: bool = True, frame_accuracy: bool = True, frame_accuracies_by_depth: bool = True, bracket_metrics: bool = True, tree_metrics: bool = True, overall_metrics: bool = False, all_predicted_frames: List[List[pytext.metrics.intent_slot_metrics.Node]] = None, calculated_loss: float = None, length_metrics: Dict[KT, VT] = None) → pytext.metrics.intent_slot_metrics.AllMetrics
Given a list of predicted and gold intent frames, computes intent-slot related metrics.
Parameters:
- frame_pairs: List of predicted and gold intent frames.
- top_intent_accuracy: Whether to compute top intent accuracy. Defaults to True.
- frame_accuracy: Whether to compute frame accuracy. Defaults to True.
- frame_accuracies_by_depth: Whether to compute frame accuracies by depth. Defaults to True.
- bracket_metrics: Whether to compute bracket metrics. Defaults to True.
- tree_metrics: Whether to compute tree metrics. Defaults to True.
- overall_metrics: If bracket_metrics or tree_metrics is True, decides whether to also compute overall (merging intents and slots) metrics for them. Defaults to False.
Returns: AllMetrics, which contains the intent-slot related metrics.
pytext.metrics.intent_slot_metrics.compute_frame_accuracies_by_depth(frame_pairs: Sequence[pytext.metrics.intent_slot_metrics.FramePredictionPair]) → Dict[int, pytext.metrics.intent_slot_metrics.FrameAccuracy]
Given a list of predicted and gold intent frames, splits the predictions into buckets according to the depth of the gold trees, and computes frame accuracy for each bucket.
Parameters:
- frame_pairs: List of predicted and gold intent frames.
Returns: FrameAccuraciesByDepth, a map from depths to their corresponding frame accuracies.
pytext.metrics.intent_slot_metrics.compute_frame_accuracy(frame_pairs: Sequence[pytext.metrics.intent_slot_metrics.FramePredictionPair]) → float
Computes frame accuracy given a list of predicted and gold intent frames.
Parameters:
- frame_pairs: List of predicted and gold intent frames.
Returns: Frame accuracy. For a prediction, frame accuracy is achieved if the entire tree structure of the predicted frame matches that of the gold frame.
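A minimal sketch of that definition (illustrative only; it relies on frame equality, which the metric Node class supports through its immutability and hashing):

```python
def frame_accuracy_sketch(frame_pairs) -> float:
    # Fraction of examples whose predicted frame exactly matches the gold frame.
    if not frame_pairs:
        return 0.0
    matches = sum(1 for predicted, expected in frame_pairs if predicted == expected)
    return matches / len(frame_pairs)
```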
pytext.metrics.intent_slot_metrics.compute_frame_accuracy_top_k(frame_pairs: List[pytext.metrics.intent_slot_metrics.FramePredictionPair], all_frames: List[List[pytext.metrics.intent_slot_metrics.Node]]) → float
pytext.metrics.intent_slot_metrics.compute_intent_slot_metrics(frame_pairs: Sequence[pytext.metrics.intent_slot_metrics.FramePredictionPair], tree_based: bool, overall_metrics: bool = True) → pytext.metrics.intent_slot_metrics.IntentSlotMetrics
Given a list of predicted and gold intent frames, computes precision, recall, and F1 metrics for intents and slots, in either a tree-based or a bracket-based manner.
The following assumptions are made about intent frames: (1) the root node is an intent; (2) children of intents are always slots, and children of slots are always intents.
For tree-based metrics, a node (an intent or slot) in the predicted frame is considered a true positive only if the subtree rooted at this node has an exact copy in the gold frame; otherwise it is considered a false positive. A false negative is a node in the gold frame that does not have an exact subtree match in the predicted frame.
For bracket-based metrics, a node in the predicted frame is considered a true positive if there is a node in the gold frame with the same label and span (but not necessarily the same children). The definitions of false positives and false negatives are analogous.
Parameters:
- frame_pairs: List of predicted and gold intent frames.
- tree_based: Whether to compute tree-based metrics (if True) or bracket-based metrics (if False).
- overall_metrics: Whether to compute overall (merging intents and slots) metrics. Defaults to True.
Returns: IntentSlotMetrics, containing precision/recall/F1 metrics for intents and slots.
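For intuition, bracket-based confusion counting can be sketched as follows (illustrative only, not the library code; it assumes each node exposes label and a span with start/end, as documented for the Node class above, and deliberately ignores children):

```python
from collections import Counter

def bracket_confusions_sketch(predicted_nodes, expected_nodes):
    # Reduce each node to (label, span); a predicted node is a true positive
    # if a gold node with the same label and span exists.
    predicted = Counter((n.label, (n.span.start, n.span.end)) for n in predicted_nodes)
    expected = Counter((n.label, (n.span.start, n.span.end)) for n in expected_nodes)
    true_positives = sum((predicted & expected).values())
    false_positives = sum(predicted.values()) - true_positives
    false_negatives = sum(expected.values()) - true_positives
    return true_positives, false_positives, false_negatives
```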
pytext.metrics.intent_slot_metrics.compute_metric_at_k(references: List[pytext.metrics.intent_slot_metrics.Node], hypothesis: List[List[pytext.metrics.intent_slot_metrics.Node]], metric_fn: Callable[[pytext.metrics.intent_slot_metrics.Node, pytext.metrics.intent_slot_metrics.Node], bool] = <function <lambda>>) → List[float]
Computes a boolean metric at each position in the ranked list of hypotheses, and returns an average for each position over all examples. By default, metric_fn checks whether the frames are equal.
pytext.metrics.intent_slot_metrics.compute_prf1_metrics(nodes_pairs: Sequence[pytext.metrics.intent_slot_metrics.NodesPredictionPair]) → Tuple[pytext.metrics.AllConfusions, pytext.metrics.PRF1Metrics]
Computes precision/recall/F1 metrics given a list of predicted and expected sets of nodes.
Parameters:
- nodes_pairs: List of predicted and expected node sets.
Returns: A tuple whose first member contains the confusion information and whose second member contains the computed precision/recall/F1 metrics.
pytext.metrics.intent_slot_metrics.compute_top_intent_accuracy(frame_pairs: Sequence[pytext.metrics.intent_slot_metrics.FramePredictionPair]) → float
Computes accuracy of the top-level intent.
Parameters:
- frame_pairs: List of predicted and gold intent frames.
Returns: Prediction accuracy of the top-level intent.
pytext.metrics.language_model_metrics module
pytext.metrics.mask_metrics module
pytext.metrics.seq2seq_metrics module
class pytext.metrics.seq2seq_metrics.Seq2SeqMetrics(loss, exact_match, f1, bleu)
Bases: tuple
- loss: Alias for field number 0
- exact_match: Alias for field number 1
- f1: Alias for field number 2
- bleu: Alias for field number 3
pytext.metrics.seq2seq_metrics.compute_f1(hypothesis_list, reference_list, eps=1e-08)
Computes token F1 given a hypothesis and reference. This is defined as F1 = 2 * ((P * R) / (P + R + eps)), where P = precision, R = recall, and eps = epsilon for smoothing zero denominators. By default, eps = 1e-8.
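A minimal sketch of that formula for a single hypothesis/reference pair of token lists (illustrative only; the library function operates on lists of hypotheses and references):

```python
from collections import Counter

def token_f1_sketch(hypothesis_tokens, reference_tokens, eps=1e-08):
    # P and R are computed from the multiset overlap of tokens.
    overlap = sum((Counter(hypothesis_tokens) & Counter(reference_tokens)).values())
    precision = overlap / len(hypothesis_tokens) if hypothesis_tokens else 0.0
    recall = overlap / len(reference_tokens) if reference_tokens else 0.0
    return 2 * ((precision * recall) / (precision + recall + eps))
```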
pytext.metrics.squad_metrics module
Module contents
Basic metric classes and functions for single-label prediction problems; extending to multi-label support.
class pytext.metrics.AllConfusions
Bases: object
Aggregated class for per-label confusions.
- per_label_confusions: Per-label confusion information.
- confusions: Overall TP, FP, and FN counts across the labels in per_label_confusions.
class pytext.metrics.ClassificationMetrics
Bases: tuple
Metric class for various classification metrics.
- accuracy: Overall accuracy of predictions. Alias for field number 0.
- macro_prf1_metrics: Macro precision/recall/F1 scores. Alias for field number 1.
- per_label_soft_scores: Per-label soft metrics. Alias for field number 2.
- mcc: Matthews correlation coefficient. Alias for field number 3.
- roc_auc: Area under the Receiver Operating Characteristic curve. Alias for field number 4.
- loss: Training loss (only used for selecting best model, no need to print). Alias for field number 5.
class pytext.metrics.Confusions(TP: int = 0, FP: int = 0, FN: int = 0)
Bases: object
Confusion information for a collection of predictions.
- TP: Number of true positives.
- FP: Number of false positives.
- FN: Number of false negatives.
class pytext.metrics.LabelListPrediction
Bases: tuple
Label list predictions of an example.
- label_scores: Confidence scores that each label receives. Alias for field number 0.
- predicted_label: List of indices of the predicted label. Alias for field number 1.
- expected_label: List of indices of the true label. Alias for field number 2.
class pytext.metrics.LabelPrediction
Bases: tuple
Label predictions of an example.
- label_scores: Confidence scores that each label receives. Alias for field number 0.
- predicted_label: Index of the predicted label. This is usually the label with the highest confidence score in label_scores. Alias for field number 1.
- expected_label: Index of the true label. Alias for field number 2.
class pytext.metrics.MacroPRF1Metrics
Bases: tuple
Aggregated metric class for macro precision/recall/F1 scores.
- per_label_scores: Mapping from label string to the corresponding precision/recall/F1 scores. Alias for field number 0.
- macro_scores: Macro precision/recall/F1 scores across the labels in per_label_scores. Alias for field number 1.
class pytext.metrics.MacroPRF1Scores
Bases: tuple
Macro precision/recall/F1 scores (averages across each label).
- num_labels: Number of distinct labels. Alias for field number 0.
- precision: Equally weighted average of precisions for each label. Alias for field number 1.
- recall: Equally weighted average of recalls for each label. Alias for field number 2.
- f1: Equally weighted average of F1 scores for each label. Alias for field number 3.
class pytext.metrics.MultiLabelSoftClassificationMetrics
Bases: tuple
Classification scores that are independent of thresholds.
- average_label_precision: Alias for field number 0
- average_overall_precision: Alias for field number 1
- average_label_recall: Alias for field number 2
- average_overall_recall: Alias for field number 3
- recall_at_precision: Alias for field number 4
- decision_thresh_at_precision: Alias for field number 5
- precision_at_recall: Alias for field number 6
- decision_thresh_at_recall: Alias for field number 7
- roc_auc: Alias for field number 8
- average_overall_auc: Alias for field number 9
- label_accuracy: Alias for field number 10
- average_overall_accuracy: Alias for field number 11
pytext.metrics.PRECISION_AT_RECALL_THRESHOLDS = [0.2, 0.4, 0.6, 0.8, 0.9]
class pytext.metrics.PRF1Metrics
Bases: tuple
Metric class for all types of precision/recall/F1 scores.
- per_label_scores: Map from label string to the corresponding precision/recall/F1 scores. Alias for field number 0.
- macro_scores: Macro precision/recall/F1 scores across the labels in per_label_scores. Alias for field number 1.
- micro_scores: Micro (regular) precision/recall/F1 scores for the same collection of predictions. Alias for field number 2.
class pytext.metrics.PRF1Scores
Bases: tuple
Precision/recall/F1 scores for a collection of predictions.
- true_positives: Number of true positives. Alias for field number 0.
- false_positives: Number of false positives. Alias for field number 1.
- false_negatives: Number of false negatives. Alias for field number 2.
- precision: TP / (TP + FP). Alias for field number 3.
- recall: TP / (TP + FN). Alias for field number 4.
- f1: 2 * TP / (2 * TP + FP + FN). Alias for field number 5.
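The three derived scores follow directly from the counts; a minimal sketch of those formulas (hypothetical helper, not the library's constructor):

```python
def prf1_sketch(tp: int, fp: int, fn: int):
    # Guard against zero denominators; the library may handle these cases differently.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 0.0
    return precision, recall, f1
```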
class pytext.metrics.PairwiseRankingMetrics
Bases: tuple
Metric class for pairwise ranking.
- num_examples (int): number of samples. Alias for field number 0.
- accuracy (float): how often the pair was ranked in the correct order. Alias for field number 1.
- average_score_difference (float): average of score(higherRank) - score(lowerRank). Alias for field number 2.
class pytext.metrics.PerLabelConfusions
Bases: object
Per-label confusion information.
- label_confusions_map: Map from label string to the corresponding confusion counts.
class pytext.metrics.RealtimeMetrics
Bases: tuple
Realtime metrics for tracking training progress and performance.
- samples (int): number of samples. Alias for field number 0.
- tps (float): tokens per second. Alias for field number 1.
- ups (float): updates per second. Alias for field number 2.
class pytext.metrics.RegressionMetrics
Bases: tuple
Metrics for regression tasks.
- num_examples (int): number of examples. Alias for field number 0.
- pearson_correlation (float): correlation between predictions and labels. Alias for field number 1.
- mse (float): mean squared error between predictions and labels. Alias for field number 2.
class pytext.metrics.SoftClassificationMetrics
Bases: tuple
Classification scores that are independent of thresholds.
- average_precision: Alias for field number 0
- recall_at_precision: Alias for field number 1
- decision_thresh_at_precision: Alias for field number 2
- precision_at_recall: Alias for field number 3
- decision_thresh_at_recall: Alias for field number 4
- roc_auc: Alias for field number 5
pytext.metrics.average_precision_score(y_true_sorted: numpy.ndarray, y_score_sorted: numpy.ndarray) → float
Computes average precision, which summarizes the precision-recall curve as the precision achieved at each threshold, weighted by the increase in recall since the previous threshold.
Parameters:
- y_true_sorted: Numpy array, sorted by decreasing confidence score, indicating whether each prediction is correct.
- y_score_sorted: Numpy array of confidence scores for the predictions, in decreasing order.
Returns: Average precision score.
TODO: This is too slow; improve the performance.
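A minimal sketch of that weighting (illustrative only, not the library implementation; it assumes the labels are already sorted by decreasing score and ignores tied scores):

```python
import numpy as np

def average_precision_sketch(y_true_sorted: np.ndarray) -> float:
    y = y_true_sorted.astype(float)
    total_positives = y.sum()
    if total_positives == 0:
        return 0.0
    ranks = np.arange(1, len(y) + 1)
    precisions = np.cumsum(y) / ranks
    # Each correct prediction increases recall by 1 / total_positives, so AP is
    # the mean of the precisions measured at the correct predictions.
    return float((precisions * y).sum() / total_positives)
```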
pytext.metrics.compute_average_recall(predictions: Sequence[pytext.metrics.LabelPrediction], label_names: Sequence[str], average_precisions: Dict[str, float]) → float
pytext.metrics.compute_classification_metrics(predictions: Sequence[pytext.metrics.LabelPrediction], label_names: Sequence[str], loss: float, average_precisions: bool = True, recall_at_precision_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9], precision_at_recall_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9]) → pytext.metrics.ClassificationMetrics
A general function that computes classification metrics given a list of label predictions.
Parameters:
- predictions: Label predictions, including the confidence score for each label.
- label_names: Indexed label names.
- average_precisions: Whether to compute average precisions for labels. Defaults to True.
- recall_at_precision_thresholds: Precision thresholds at which to calculate recall.
- precision_at_recall_thresholds: Recall thresholds at which to calculate precision.
Returns: ClassificationMetrics, which contains various classification metrics.
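A hedged usage sketch (illustrative values only; LabelPrediction is constructed positionally as (label_scores, predicted_label, expected_label) per its field documentation above):

```python
from pytext.metrics import LabelPrediction, compute_classification_metrics

predictions = [
    LabelPrediction([0.9, 0.1], 0, 0),  # correctly predicted label 0
    LabelPrediction([0.3, 0.7], 1, 0),  # incorrectly predicted label 1
]
metrics = compute_classification_metrics(predictions, ["negative", "positive"], loss=0.5)
print(metrics.accuracy, metrics.mcc)
```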
pytext.metrics.compute_macro_avg(soft_metrics: Dict[str, pytext.metrics.SoftClassificationMetrics], metric: str)
pytext.metrics.compute_matthews_correlation_coefficients(TP: int, FP: int, FN: int, TN: int) → float
Computes the Matthews correlation coefficient, a way to summarize all four counts (TP, FP, FN, TN) in the confusion matrix of binary classification.
Parameters:
- TP: Number of true positives.
- FP: Number of false positives.
- FN: Number of false negatives.
- TN: Number of true negatives.
Returns: Matthews correlation coefficient, which is (TP * TN - FP * FN) / sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)).
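A minimal sketch of that formula (hypothetical helper, not the library function; the zero-denominator convention is an assumption):

```python
import math

def mcc_sketch(tp: int, fp: int, fn: int, tn: int) -> float:
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # conventional fallback when any margin of the confusion matrix is empty
    return (tp * tn - fp * fn) / denominator
```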
pytext.metrics.compute_multi_label_classification_metrics(predictions: Sequence[pytext.metrics.LabelListPrediction], label_names: Sequence[str], loss: float, average_precisions: bool = True, recall_at_precision_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9], precision_at_recall_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9]) → pytext.metrics.ClassificationMetrics
A general function that computes classification metrics given a list of multi-label predictions.
Parameters:
- predictions: Multi-label predictions, including the confidence score for each label.
- label_names: Indexed label names.
- average_precisions: Whether to compute average precisions for labels. Defaults to True.
- recall_at_precision_thresholds: Precision thresholds at which to calculate recall.
- precision_at_recall_thresholds: Recall thresholds at which to calculate precision.
Returns: ClassificationMetrics, which contains various classification metrics.
pytext.metrics.compute_multi_label_full_vector_classification_metrics(predictions: Sequence[pytext.metrics.LabelListPrediction], label_names: Sequence[str], loss: float, average_precisions: bool = True, recall_at_precision_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9], precision_at_recall_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9]) → pytext.metrics.ClassificationMetrics
A general function that computes classification metrics given a list of multi-label predictions.
Parameters:
- predictions: Multi-label predictions, including the confidence score for each label.
- label_names: Indexed label names.
- average_precisions: Whether to compute average precisions for labels. Defaults to True.
- recall_at_precision_thresholds: Precision thresholds at which to calculate recall.
- precision_at_recall_thresholds: Recall thresholds at which to calculate precision.
Returns: ClassificationMetrics, which contains various classification metrics.
pytext.metrics.compute_multi_label_multi_class_soft_metrics(predictions: Sequence[Sequence[pytext.metrics.LabelPrediction]], label_names: Sequence[str], label_vocabs: Sequence[Sequence[str]], recall_at_precision_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9], precision_at_recall_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9]) → pytext.metrics.MultiLabelSoftClassificationMetrics
Computes multi-label soft classification metrics with multi-class accommodation.
Parameters:
- predictions: Multi-label predictions, including the confidence score for each label.
- label_names: Indexed label names.
- recall_at_precision_thresholds: Precision thresholds at which to calculate recall.
- precision_at_recall_thresholds: Recall thresholds at which to calculate precision.
Returns: Dict from label strings to their corresponding soft metrics.
pytext.metrics.compute_multi_label_soft_full_vector_metrics(predictions: Sequence[pytext.metrics.LabelListPrediction], label_names: Sequence[str], recall_at_precision_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9], precision_at_recall_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9]) → Dict[str, pytext.metrics.SoftClassificationMetrics]
Computes multi-label soft classification metrics.
Parameters:
- predictions: Multi-label predictions, including the confidence score for each label.
- label_names: Indexed label names. May contain duplicate label names.
- recall_at_precision_thresholds: Precision thresholds at which to calculate recall.
- precision_at_recall_thresholds: Recall thresholds at which to calculate precision.
Returns: Dict from label strings to their corresponding soft metrics.
pytext.metrics.compute_multi_label_soft_metrics(predictions: Sequence[pytext.metrics.LabelListPrediction], label_names: Sequence[str], recall_at_precision_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9], precision_at_recall_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9]) → Dict[str, pytext.metrics.SoftClassificationMetrics]
Computes multi-label soft classification metrics.
Parameters:
- predictions: Multi-label predictions, including the confidence score for each label.
- label_names: Indexed label names.
- recall_at_precision_thresholds: Precision thresholds at which to calculate recall.
- precision_at_recall_thresholds: Recall thresholds at which to calculate precision.
Returns: Dict from label strings to their corresponding soft metrics.
pytext.metrics.compute_pairwise_ranking_metrics(predictions: Sequence[int], scores: Sequence[float]) → pytext.metrics.PairwiseRankingMetrics
Computes metrics for pairwise ranking given sequences of predictions and scores.
Parameters:
- predictions: 1 if the ranking was correct, 0 if the ranking was incorrect.
- scores: score(higher-ranked sample) - score(lower-ranked sample).
Returns: PairwiseRankingMetrics object.
pytext.metrics.compute_regression_metrics(predictions: Sequence[float], targets: Sequence[float]) → pytext.metrics.RegressionMetrics
Computes metrics for regression tasks.
Parameters:
- predictions: 1-D sequence of float predictions.
- targets: 1-D sequence of float labels.
Returns: RegressionMetrics object.
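A minimal sketch of the two reported quantities (illustrative only, not the library implementation):

```python
import numpy as np

def regression_metrics_sketch(predictions, targets):
    predictions = np.asarray(predictions, dtype=float)
    targets = np.asarray(targets, dtype=float)
    pearson_correlation = float(np.corrcoef(predictions, targets)[0, 1])
    mse = float(np.mean((predictions - targets) ** 2))
    return pearson_correlation, mse
```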
pytext.metrics.compute_roc_auc(predictions: Sequence[pytext.metrics.LabelPrediction], target_class: int = 0) → Optional[float]
Computes the area under the Receiver Operating Characteristic curve for binary classification. Implementation based on (and explained at) https://www.ibm.com/developerworks/community/blogs/jfp/entry/Fast_Computation_of_AUC_ROC_score?lang=en.
pytext.metrics.compute_roc_auc_given_sorted_positives(y_true_sorted: numpy.ndarray) → Optional[float]
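A minimal rank-based sketch in the spirit of the linked post (illustrative only, not the library code; it assumes no tied scores and returns None when only one class is present):

```python
import numpy as np

def roc_auc_sketch(y_true_sorted: np.ndarray):
    y = y_true_sorted.astype(bool)
    n_pos, n_neg = int(y.sum()), int((~y).sum())
    if n_pos == 0 or n_neg == 0:
        return None
    # The input is sorted by decreasing score, so the first entry gets the
    # highest rank; AUC is then the normalized Mann-Whitney U statistic.
    ranks = np.arange(len(y), 0, -1)
    rank_sum = float(ranks[y].sum())
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```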
pytext.metrics.compute_soft_metrics(predictions: Sequence[pytext.metrics.LabelPrediction], label_names: Sequence[str], recall_at_precision_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9], precision_at_recall_thresholds: Sequence[float] = [0.2, 0.4, 0.6, 0.8, 0.9]) → Dict[str, pytext.metrics.SoftClassificationMetrics]
Computes soft classification metrics given a list of label predictions.
Parameters:
- predictions: Label predictions, including the confidence score for each label.
- label_names: Indexed label names.
- recall_at_precision_thresholds: Precision thresholds at which to calculate recall.
- precision_at_recall_thresholds: Recall thresholds at which to calculate precision.
Returns: Dict from label strings to their corresponding soft metrics.
pytext.metrics.precision_at_recall(y_true_sorted: numpy.ndarray, y_score_sorted: numpy.ndarray, thresholds: Sequence[float]) → Tuple[Dict[float, float], Dict[float, float]]
Computes precision at various recall levels.
Parameters:
- y_true_sorted: Numpy array, sorted by decreasing confidence score, indicating whether each prediction is correct.
- y_score_sorted: Numpy array of confidence scores for the predictions, in decreasing order.
- thresholds: Sequence of floats indicating the requested recall thresholds.
Returns: A dictionary of the maximum precision at each requested recall threshold, and a dictionary of the decision thresholds that achieve those maximum precisions.
pytext.metrics.recall_at_precision(y_true_sorted: numpy.ndarray, y_score_sorted: numpy.ndarray, thresholds: Sequence[float]) → Dict[float, float]
Computes recall at various precision levels.
Parameters:
- y_true_sorted: Numpy array, sorted by decreasing confidence score, indicating whether each prediction is correct.
- y_score_sorted: Numpy array of confidence scores for the predictions, in decreasing order.
- thresholds: Sequence of floats indicating the requested precision thresholds.
Returns: Dictionary of maximum recall at the requested precision thresholds.
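A minimal sketch of the recall-at-precision computation (illustrative only, not the library implementation; it scans operating points in decreasing score order and keeps the best recall among points whose precision meets each requested threshold):

```python
import numpy as np

def recall_at_precision_sketch(y_true_sorted: np.ndarray, thresholds):
    y = y_true_sorted.astype(float)
    total_positives = y.sum()
    if total_positives == 0:
        return {t: 0.0 for t in thresholds}
    ranks = np.arange(1, len(y) + 1)
    precisions = np.cumsum(y) / ranks
    recalls = np.cumsum(y) / total_positives
    return {
        t: float(recalls[precisions >= t].max()) if (precisions >= t).any() else 0.0
        for t in thresholds
    }
```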