handyspark.extensions package
Submodules
handyspark.extensions.common module
handyspark.extensions.evaluation module
-
handyspark.extensions.evaluation.confusionMatrix(self, threshold=0.5)
Returns the confusion matrix. Predicted classes are in columns, ordered by class label ascending, as in “labels”.
Predicted classes are computed according to the given threshold.
Parameters: threshold (double, optional) – Threshold probability for the positive class. Default is 0.5.
Returns: confusionMatrix
Return type: DenseMatrix
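The thresholding described above can be sketched in plain Python. This is an illustration only, not the handyspark implementation: the helper name `confusion_matrix` is hypothetical, rows are assumed to hold the actual classes, and a score equal to the threshold is assumed to count as positive. The sample data is the one used in the BinaryClassificationMetrics doctest.

```python
def confusion_matrix(score_and_labels, threshold=0.5):
    """Return [[TN, FP], [FN, TP]] for binary labels 0.0 and 1.0.

    Rows are actual classes and columns are predicted classes,
    both ordered by class label ascending (0 first, then 1).
    """
    matrix = [[0, 0], [0, 0]]
    for score, label in score_and_labels:
        # Assumption: a score equal to the threshold is predicted positive.
        predicted = 1 if score >= threshold else 0
        matrix[int(label)][predicted] += 1
    return matrix


score_and_labels = [
    (0.1, 0.0), (0.1, 1.0), (0.4, 0.0), (0.6, 0.0),
    (0.6, 1.0), (0.6, 1.0), (0.8, 1.0),
]
matrix = confusion_matrix(score_and_labels, threshold=0.5)
print(matrix)
```

Raising the threshold moves counts from the right column (predicted positive) to the left column (predicted negative), while each row total stays fixed.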
-
handyspark.extensions.evaluation.fMeasureByThreshold(self, beta=1.0)
Calls the fMeasureByThreshold method of the Java class.
Returns the (threshold, F-Measure) curve.
Parameters: beta – the beta factor in the F-Measure computation.
Returns: an RDD of (threshold, F-Measure) pairs.
See: F1 score (Wikipedia), http://en.wikipedia.org/wiki/F1_score
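For reference, the role of the beta factor can be shown with the standard F-beta formula, F = (1 + beta²) · P · R / (beta² · P + R). The sketch below is plain Python, not the Java method this function wraps; the name `f_measure` and the zero-denominator convention are assumptions made for the illustration.

```python
def f_measure(precision, recall, beta=1.0):
    """Standard F-beta score: a weighted harmonic mean of precision and recall."""
    beta_sq = beta * beta
    denominator = beta_sq * precision + recall
    if denominator == 0.0:
        # Convention chosen for this sketch: report 0.0 when both inputs are 0.
        return 0.0
    return (1.0 + beta_sq) * precision * recall / denominator


# beta = 1.0 weighs precision and recall equally (the F1 score).
f1 = f_measure(0.5, 1.0)
# beta = 2.0 weighs recall more heavily than precision.
f2 = f_measure(0.5, 1.0, beta=2.0)
print(f1, f2)
```

With beta above 1.0 the score moves toward recall; with beta below 1.0 it moves toward precision.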
-
handyspark.extensions.evaluation.pr(self)
Calls the pr method of the Java class.
Returns the precision-recall curve, which is an RDD of (recall, precision), NOT (precision, recall), with (0.0, p) prepended to it, where p is the precision associated with the lowest recall on the curve.
See: Precision and recall (Wikipedia), http://en.wikipedia.org/wiki/Precision_and_recall
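The point ordering is easy to get wrong, so here is a small plain-Python sketch of a curve in the (recall, precision) layout described above. It is not the Java implementation: the name `pr_curve` is hypothetical, and using every distinct score as a threshold (with scores at or above the threshold predicted positive) is an assumption. The input must contain at least one positive label.

```python
def pr_curve(score_and_labels):
    """Return (recall, precision) points, with (0.0, p) prepended."""
    positives = sum(1 for _, label in score_and_labels if label == 1.0)
    # Assumption: each distinct score is a threshold, visited in descending order.
    thresholds = sorted({score for score, _ in score_and_labels}, reverse=True)
    points = []
    for t in thresholds:
        tp = sum(1 for s, l in score_and_labels if s >= t and l == 1.0)
        fp = sum(1 for s, l in score_and_labels if s >= t and l != 1.0)
        points.append((tp / positives, tp / (tp + fp)))
    # The highest threshold yields the lowest recall; reuse its precision as p.
    lowest_recall_precision = points[0][1]
    return [(0.0, lowest_recall_precision)] + points


score_and_labels = [
    (0.1, 0.0), (0.1, 1.0), (0.4, 0.0), (0.6, 0.0),
    (0.6, 1.0), (0.6, 1.0), (0.8, 1.0),
]
pr_points = pr_curve(score_and_labels)
for recall, precision in pr_points:
    print(recall, precision)
```

Each tuple carries recall first, so it can be plotted directly with recall on the x-axis.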
-
handyspark.extensions.evaluation.precisionByThreshold(self)
Calls the precisionByThreshold method of the Java class.
Returns the (threshold, precision) curve.
-
handyspark.extensions.evaluation.recallByThreshold(self)
Calls the recallByThreshold method of the Java class.
Returns the (threshold, recall) curve.
-
handyspark.extensions.evaluation.roc(self)
Calls the roc method of the Java class.
Returns the receiver operating characteristic (ROC) curve, which is an RDD of (false positive rate, true positive rate) with (0.0, 0.0) prepended and (1.0, 1.0) appended to it.
See: Receiver operating characteristic (Wikipedia), http://en.wikipedia.org/wiki/Receiver_operating_characteristic
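The shape of the ROC output, including the prepended and appended endpoints, can be illustrated with a plain-Python sketch. It does not call the Java class: the name `roc_curve` is hypothetical, and treating every distinct score as a threshold is an assumption. The input must contain both positive and negative labels.

```python
def roc_curve(score_and_labels):
    """Return (false positive rate, true positive rate) points with fixed endpoints."""
    positives = sum(1 for _, label in score_and_labels if label == 1.0)
    negatives = len(score_and_labels) - positives
    # Assumption: each distinct score is a threshold, visited in descending order.
    thresholds = sorted({score for score, _ in score_and_labels}, reverse=True)
    points = [(0.0, 0.0)]  # prepended endpoint
    for t in thresholds:
        tp = sum(1 for s, l in score_and_labels if s >= t and l == 1.0)
        fp = sum(1 for s, l in score_and_labels if s >= t and l != 1.0)
        points.append((fp / negatives, tp / positives))
    points.append((1.0, 1.0))  # appended endpoint
    return points


score_and_labels = [
    (0.1, 0.0), (0.1, 1.0), (0.4, 0.0), (0.6, 0.0),
    (0.6, 1.0), (0.6, 1.0), (0.8, 1.0),
]
roc_points = roc_curve(score_and_labels)
for fpr, tpr in roc_points:
    print(fpr, tpr)
```

Because the lowest threshold already classifies every example as positive, the appended (1.0, 1.0) duplicates the last computed point in this sample.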
handyspark.extensions.types module
Module contents
-
class handyspark.extensions.BinaryClassificationMetrics(scoreAndLabels)
Bases: pyspark.mllib.common.JavaModelWrapper
Evaluator for binary classification.
Parameters: scoreAndLabels – an RDD of (score, label) pairs
>>> scoreAndLabels = sc.parallelize([
...     (0.1, 0.0), (0.1, 1.0), (0.4, 0.0), (0.6, 0.0), (0.6, 1.0), (0.6, 1.0), (0.8, 1.0)], 2)
>>> metrics = BinaryClassificationMetrics(scoreAndLabels)
>>> metrics.areaUnderROC
0.70...
>>> metrics.areaUnderPR
0.83...
>>> metrics.unpersist()
New in version 1.4.0.
-
areaUnderPR
Computes the area under the precision-recall curve.
New in version 1.4.0.
-
areaUnderROC
Computes the area under the receiver operating characteristic (ROC) curve.
New in version 1.4.0.
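As a rough cross-check of the two area properties, the sketch below integrates a curve with the trapezoidal rule. The integration rule is an assumption about how the areas are computed, and `area_under_curve` is a hypothetical helper. The hard-coded points are hand-derived from the doctest sample, assuming every distinct score is a threshold and scores at or above it are predicted positive.

```python
def area_under_curve(points):
    """Trapezoidal-rule area under a curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# (false positive rate, true positive rate), with (0, 0) and (1, 1) endpoints.
roc_points = [
    (0.0, 0.0), (0.0, 0.25), (1 / 3, 0.75), (2 / 3, 0.75), (1.0, 1.0), (1.0, 1.0),
]
# (recall, precision), with (0.0, p) prepended.
pr_points = [(0.0, 1.0), (0.25, 1.0), (0.75, 0.75), (0.75, 0.6), (1.0, 4 / 7)]

auc_roc = area_under_curve(roc_points)
auc_pr = area_under_curve(pr_points)
print(round(auc_roc, 4), round(auc_pr, 4))
```

Under these assumptions the two areas come out near 0.708 and 0.834, consistent with the 0.70... and 0.83... shown in the class doctest.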
-
confusionMatrix(threshold=0.5)
Returns the confusion matrix. Predicted classes are in columns, ordered by class label ascending, as in “labels”.
Predicted classes are computed according to the given threshold.
Parameters: threshold (double, optional) – Threshold probability for the positive class. Default is 0.5.
Returns: confusionMatrix
Return type: DenseMatrix
-
fMeasureByThreshold(beta=1.0)
Calls the fMeasureByThreshold method of the Java class.
Returns the (threshold, F-Measure) curve.
Parameters: beta – the beta factor in the F-Measure computation.
Returns: an RDD of (threshold, F-Measure) pairs.
See: F1 score (Wikipedia), http://en.wikipedia.org/wiki/F1_score
-
getMetricsByThreshold()
-
pr()
Calls the pr method of the Java class.
Returns the precision-recall curve, which is an RDD of (recall, precision), NOT (precision, recall), with (0.0, p) prepended to it, where p is the precision associated with the lowest recall on the curve.
See: Precision and recall (Wikipedia), http://en.wikipedia.org/wiki/Precision_and_recall
-
precisionByThreshold()
Calls the precisionByThreshold method of the Java class.
Returns the (threshold, precision) curve.
-
recallByThreshold()
Calls the recallByThreshold method of the Java class.
Returns the (threshold, recall) curve.
-
roc()
Calls the roc method of the Java class.
Returns the receiver operating characteristic (ROC) curve, which is an RDD of (false positive rate, true positive rate) with (0.0, 0.0) prepended and (1.0, 1.0) appended to it.
See: Receiver operating characteristic (Wikipedia), http://en.wikipedia.org/wiki/Receiver_operating_characteristic
-
thresholds()
Returns thresholds in descending order.
-