handyspark.extensions package
Submodules
handyspark.extensions.common module
handyspark.extensions.evaluation module
-
handyspark.extensions.evaluation.confusionMatrix(self, threshold=0.5)
Returns the confusion matrix. Predicted classes are in columns, ordered by class label ascending, as in “labels”.
Predicted classes are computed according to the given threshold.
Parameters: threshold (double, optional) – Threshold probability for the positive class. Default is 0.5.
Returns: confusionMatrix
Return type: DenseMatrix
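To make the row and column layout concrete, the following plain-Python sketch counts the four cells directly from (score, label) pairs. It is not handyspark's implementation: `confusion_matrix` is a hypothetical helper, the rule that a score at or above the threshold counts as positive is an assumption, and the sample pairs are the ones from the BinaryClassificationMetrics doctest.

```python
# Hypothetical helper for illustration only; not part of handyspark.
def confusion_matrix(score_and_labels, threshold=0.5):
    """Return [[TN, FP], [FN, TP]]: rows are actual classes, columns are predicted."""
    matrix = [[0, 0], [0, 0]]
    for score, label in score_and_labels:
        # Assumption: a score at or above the threshold is predicted positive.
        predicted = 1 if score >= threshold else 0
        matrix[int(label)][predicted] += 1
    return matrix


pairs = [(0.1, 0.0), (0.1, 1.0), (0.4, 0.0), (0.6, 0.0),
         (0.6, 1.0), (0.6, 1.0), (0.8, 1.0)]
print(confusion_matrix(pairs, threshold=0.5))  # [[2, 1], [1, 3]]
```

Both classes are ordered by label ascending, so the negative class (0.0) occupies the first row and the first column.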
-
handyspark.extensions.evaluation.fMeasureByThreshold(self, beta=1.0)
Calls the fMeasureByThreshold method from the Java class.
Returns the (threshold, F-Measure) curve.
Parameters: beta – The beta factor in the F-Measure computation. Default is 1.0.
Returns: an RDD of (threshold, F-Measure) pairs.
See also: F1 score (Wikipedia), http://en.wikipedia.org/wiki/F1_score
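As a rough illustration of what a (threshold, F-Measure) curve contains, the sketch below evaluates the standard F-beta formula, (1 + beta^2) * P * R / (beta^2 * P + R), at every distinct score. It is plain Python rather than a call into Spark; `f_measure_by_threshold` is a hypothetical name, and treating each distinct score as a threshold with "score >= threshold" meaning positive is an assumption.

```python
# Hypothetical helper for illustration only; not part of handyspark.
def f_measure_by_threshold(score_and_labels, beta=1.0):
    """Return [(threshold, F-beta)] with thresholds in descending order."""
    # Assumption: every distinct score acts as a candidate threshold.
    thresholds = sorted({s for s, _ in score_and_labels}, reverse=True)
    positives = sum(1 for _, y in score_and_labels if y == 1.0)
    b2 = beta * beta
    curve = []
    for t in thresholds:
        tp = sum(1 for s, y in score_and_labels if s >= t and y == 1.0)
        fp = sum(1 for s, y in score_and_labels if s >= t and y != 1.0)
        precision = tp / (tp + fp)  # tp + fp >= 1 because t is an observed score
        recall = tp / positives     # assumes at least one positive label
        denom = b2 * precision + recall
        f = (1 + b2) * precision * recall / denom if denom > 0 else 0.0
        curve.append((t, f))
    return curve


pairs = [(0.1, 0.0), (0.1, 1.0), (0.4, 0.0), (0.6, 0.0),
         (0.6, 1.0), (0.6, 1.0), (0.8, 1.0)]
print([(t, round(f, 4)) for t, f in f_measure_by_threshold(pairs)])
# [(0.8, 0.4), (0.6, 0.75), (0.4, 0.6667), (0.1, 0.7273)]
```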
-
handyspark.extensions.evaluation.pr(self)
Calls the pr method from the Java class.
Returns the precision-recall curve, which is an RDD of (recall, precision), NOT (precision, recall), with (0.0, p) prepended to it, where p is the precision associated with the lowest recall on the curve.
See also: Precision and recall (Wikipedia), http://en.wikipedia.org/wiki/Precision_and_recall
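The point ordering and the prepended (0.0, p) point are easy to get wrong, so here is a small plain-Python sketch of the curve's shape. It is only an illustration under assumptions: `pr_curve` is a hypothetical helper, each distinct score is taken as a threshold, and a score at or above the threshold counts as positive.

```python
# Hypothetical helper for illustration only; not part of handyspark.
def pr_curve(score_and_labels):
    """Return [(recall, precision)] with (0.0, p) prepended."""
    thresholds = sorted({s for s, _ in score_and_labels}, reverse=True)
    positives = sum(1 for _, y in score_and_labels if y == 1.0)
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in score_and_labels if s >= t and y == 1.0)
        fp = sum(1 for s, y in score_and_labels if s >= t and y != 1.0)
        # Recall comes first in each pair, then precision.
        points.append((tp / positives, tp / (tp + fp)))
    # p is the precision at the lowest recall, i.e. at the highest threshold.
    return [(0.0, points[0][1])] + points


pairs = [(0.1, 0.0), (0.1, 1.0), (0.4, 0.0), (0.6, 0.0),
         (0.6, 1.0), (0.6, 1.0), (0.8, 1.0)]
print([(round(r, 4), round(p, 4)) for r, p in pr_curve(pairs)])
# [(0.0, 1.0), (0.25, 1.0), (0.75, 0.75), (0.75, 0.6), (1.0, 0.5714)]
```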
-
handyspark.extensions.evaluation.precisionByThreshold(self)
Calls the precisionByThreshold method from the Java class.
Returns the (threshold, precision) curve.
-
handyspark.extensions.evaluation.recallByThreshold(self)
Calls the recallByThreshold method from the Java class.
Returns the (threshold, recall) curve.
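The (threshold, precision) and (threshold, recall) curves can both be read off a single pass over the pairs sorted by descending score, keeping running counts of true and false positives. The sketch below shows that idea in plain Python; `precision_recall_by_threshold` is a hypothetical name, and the cumulative "score >= threshold" rule is an assumption rather than a statement about the Java implementation.

```python
from itertools import groupby


# Hypothetical helper for illustration only; not part of handyspark.
def precision_recall_by_threshold(score_and_labels):
    """Return ([(threshold, precision)], [(threshold, recall)])."""
    ordered = sorted(score_and_labels, key=lambda p: p[0], reverse=True)
    positives = sum(1 for _, y in ordered if y == 1.0)
    tp = fp = 0
    precision, recall = [], []
    # Each group holds all pairs sharing one score; counts accumulate downward.
    for t, group in groupby(ordered, key=lambda p: p[0]):
        for _, label in group:
            if label == 1.0:
                tp += 1
            else:
                fp += 1
        precision.append((t, tp / (tp + fp)))
        recall.append((t, tp / positives))  # assumes at least one positive label
    return precision, recall


pairs = [(0.1, 0.0), (0.1, 1.0), (0.4, 0.0), (0.6, 0.0),
         (0.6, 1.0), (0.6, 1.0), (0.8, 1.0)]
precision, recall = precision_recall_by_threshold(pairs)
print([(t, round(p, 4)) for t, p in precision])
# [(0.8, 1.0), (0.6, 0.75), (0.4, 0.6), (0.1, 0.5714)]
print(recall)
# [(0.8, 0.25), (0.6, 0.75), (0.4, 0.75), (0.1, 1.0)]
```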
-
handyspark.extensions.evaluation.roc(self)
Calls the roc method from the Java class.
Returns the receiver operating characteristic (ROC) curve, which is an RDD of (false positive rate, true positive rate) with (0.0, 0.0) prepended and (1.0, 1.0) appended to it.
See also: Receiver operating characteristic (Wikipedia), http://en.wikipedia.org/wiki/Receiver_operating_characteristic
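A short plain-Python sketch may help show what the prepended and appended points look like in practice. It is not the Java implementation: `roc_curve` is a hypothetical helper, each distinct score is assumed to act as a threshold, and a score at or above the threshold is assumed to be predicted positive.

```python
# Hypothetical helper for illustration only; not part of handyspark.
def roc_curve(score_and_labels):
    """Return [(false positive rate, true positive rate)] with end points added."""
    thresholds = sorted({s for s, _ in score_and_labels}, reverse=True)
    positives = sum(1 for _, y in score_and_labels if y == 1.0)
    negatives = len(score_and_labels) - positives
    points = [(0.0, 0.0)]  # prepended point
    for t in thresholds:
        tp = sum(1 for s, y in score_and_labels if s >= t and y == 1.0)
        fp = sum(1 for s, y in score_and_labels if s >= t and y != 1.0)
        # Assumes both classes are present, so neither denominator is zero.
        points.append((fp / negatives, tp / positives))
    points.append((1.0, 1.0))  # appended point
    return points


pairs = [(0.1, 0.0), (0.1, 1.0), (0.4, 0.0), (0.6, 0.0),
         (0.6, 1.0), (0.6, 1.0), (0.8, 1.0)]
print([(round(x, 4), round(y, 4)) for x, y in roc_curve(pairs)])
# [(0.0, 0.0), (0.0, 0.25), (0.3333, 0.75), (0.6667, 0.75), (1.0, 1.0), (1.0, 1.0)]
```

Note that the lowest threshold already yields (1.0, 1.0), so the appended point duplicates it in this example.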
handyspark.extensions.types module
Module contents
-
class handyspark.extensions.BinaryClassificationMetrics(scoreAndLabels)
Bases: pyspark.mllib.common.JavaModelWrapper
Evaluator for binary classification.
Parameters: scoreAndLabels – an RDD of (score, label) pairs
>>> scoreAndLabels = sc.parallelize([
...     (0.1, 0.0), (0.1, 1.0), (0.4, 0.0), (0.6, 0.0), (0.6, 1.0), (0.6, 1.0), (0.8, 1.0)], 2)
>>> metrics = BinaryClassificationMetrics(scoreAndLabels)
>>> metrics.areaUnderROC
0.70...
>>> metrics.areaUnderPR
0.83...
>>> metrics.unpersist()
New in version 1.4.0.
-
areaUnderPR
Computes the area under the precision-recall curve.
New in version 1.4.0.
-
areaUnderROC
Computes the area under the receiver operating characteristic (ROC) curve.
New in version 1.4.0.
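The 0.70... and 0.83... values in the class doctest can be reproduced by hand, assuming the areas are obtained by trapezoidal integration over the curve points. The sketch below is plain Python, not a handyspark or Spark call: `trapezoid_area` is a hypothetical helper, and the curve points were worked out manually from the doctest's seven (score, label) pairs under the assumption that a score at or above each threshold is predicted positive.

```python
# Hypothetical helper for illustration only; not part of handyspark.
def trapezoid_area(points):
    """Integrate a piecewise-linear curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# (false positive rate, true positive rate), with (0, 0) and (1, 1) added.
roc_points = [(0.0, 0.0), (0.0, 0.25), (1 / 3, 0.75),
              (2 / 3, 0.75), (1.0, 1.0), (1.0, 1.0)]
# (recall, precision), with (0.0, p) prepended.
pr_points = [(0.0, 1.0), (0.25, 1.0), (0.75, 0.75), (0.75, 0.6), (1.0, 4 / 7)]

print(round(trapezoid_area(roc_points), 4))  # 0.7083
print(round(trapezoid_area(pr_points), 4))   # 0.8339
```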
-
confusionMatrix(threshold=0.5)
Returns the confusion matrix. Predicted classes are in columns, ordered by class label ascending, as in “labels”.
Predicted classes are computed according to the given threshold.
Parameters: threshold (double, optional) – Threshold probability for the positive class. Default is 0.5.
Returns: confusionMatrix
Return type: DenseMatrix
-
fMeasureByThreshold(beta=1.0)
Calls the fMeasureByThreshold method from the Java class.
Returns the (threshold, F-Measure) curve.
Parameters: beta – The beta factor in the F-Measure computation. Default is 1.0.
Returns: an RDD of (threshold, F-Measure) pairs.
See also: F1 score (Wikipedia), http://en.wikipedia.org/wiki/F1_score
-
getMetricsByThreshold()
-
pr()
Calls the pr method from the Java class.
Returns the precision-recall curve, which is an RDD of (recall, precision), NOT (precision, recall), with (0.0, p) prepended to it, where p is the precision associated with the lowest recall on the curve.
See also: Precision and recall (Wikipedia), http://en.wikipedia.org/wiki/Precision_and_recall
-
precisionByThreshold()
Calls the precisionByThreshold method from the Java class.
Returns the (threshold, precision) curve.
-
recallByThreshold()
Calls the recallByThreshold method from the Java class.
Returns the (threshold, recall) curve.
-
roc()
Calls the roc method from the Java class.
Returns the receiver operating characteristic (ROC) curve, which is an RDD of (false positive rate, true positive rate) with (0.0, 0.0) prepended and (1.0, 1.0) appended to it.
See also: Receiver operating characteristic (Wikipedia), http://en.wikipedia.org/wiki/Receiver_operating_characteristic
-
thresholds()
Returns thresholds in descending order.
-