handyspark.ml package

Submodules

handyspark.ml.base module

class handyspark.ml.base.HandyFencer[source]

Bases: pyspark.ml.base.Transformer, handyspark.ml.base.HasDict, pyspark.ml.util.DefaultParamsReadable, pyspark.ml.util.DefaultParamsWritable

Fencer transformer for capping outliers according to lower and upper fences.

fences

dict – The lower and upper fence values for each feature. When the data is stratified, the first-level keys are the filter clauses that define each stratum.
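The capping step can be illustrated in plain Python. This is only a sketch of the idea, not HandySpark's implementation: the helper `fence_row`, the column names, and the fence values are hypothetical, and the dictionary layout (feature name mapped to a lower/upper pair, non-stratified) is an assumption.

```python
def fence_row(row, fences):
    """Cap each fenced feature of `row` to its [lower, upper] interval.

    `fences` maps a feature name to a (lower, upper) pair; this layout is
    an assumption made for illustration.
    """
    capped = dict(row)
    for feature, (lower, upper) in fences.items():
        value = capped.get(feature)
        if value is not None:  # missing values are left untouched
            capped[feature] = min(max(value, lower), upper)
    return capped


# Hypothetical fence values and rows.
fences = {"Fare": (0.0, 65.0), "Age": (1.0, 70.0)}
rows = [
    {"Fare": 512.3, "Age": 35.0},  # Fare above the upper fence
    {"Fare": 7.25, "Age": 0.4},    # Age below the lower fence
    {"Fare": 30.0, "Age": None},   # missing Age stays missing
]
fenced = [fence_row(r, fences) for r in rows]
```

Values already inside the fences pass through unchanged, so applying the step twice gives the same result as applying it once.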

class handyspark.ml.base.HandyImputer[source]

Bases: pyspark.ml.base.Transformer, handyspark.ml.base.HasDict, pyspark.ml.util.DefaultParamsReadable, pyspark.ml.util.DefaultParamsWritable

Imputation transformer for completing missing values.

statistics

dict – The imputation fill value for each feature. When the data is stratified, the first-level keys are the filter clauses that define each stratum.
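A stratified dictionary of this shape can be read as "for rows matching this clause, fill these features with these values". The plain-Python sketch below illustrates that reading only. The helpers `matches` and `impute_row`, the `column == "value"` clause syntax, and all values are assumptions for illustration; the real transformer evaluates its clauses in Spark.

```python
import re

# Only clauses of the form: column == "value" are supported in this sketch.
_CLAUSE = re.compile(r'^\s*(\w+)\s*==\s*"([^"]*)"\s*$')


def matches(row, clause):
    """Return True if `row` satisfies a single `column == "value"` clause."""
    m = _CLAUSE.match(clause)
    if m is None:
        raise ValueError(f"unsupported clause: {clause!r}")
    column, expected = m.groups()
    return str(row.get(column)) == expected


def impute_row(row, statistics):
    """Fill missing features using the fill values of every matching stratum."""
    filled = dict(row)
    for clause, fill_values in statistics.items():
        if matches(row, clause):
            for feature, fill in fill_values.items():
                if filled.get(feature) is None:
                    filled[feature] = fill
    return filled


# Hypothetical stratified statistics: first-level keys are filter clauses.
statistics = {
    'Pclass == "1"': {"Age": 38.0},
    'Pclass == "3"': {"Age": 25.0},
}
rows = [
    {"Pclass": 1, "Age": None},
    {"Pclass": 3, "Age": None},
    {"Pclass": 3, "Age": 40.0},  # already present, left unchanged
]
filled = [impute_row(r, statistics) for r in rows]
```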

class handyspark.ml.base.HandyTransformers(df)[source]

Bases: object

Generates transformers to be used in pipelines.

Available transformers:

imputer: Transformer – Imputation transformer for completing missing values.

fencer: Transformer – Fencer transformer for capping outliers according to lower and upper fences.

fencer()[source]

Generates a transformer that fences outliers, using the fence values computed from the HandyFrame.

imputer()[source]

Generates a transformer that imputes missing values, using the fill values computed from the HandyFrame.

class handyspark.ml.base.HasDict[source]

Bases: pyspark.ml.param.Params

Mixin for a dictionary-valued parameter. It serializes the dictionary to a JSON string for storage and reloads it whenever needed.

dictValues = Param(parent='undefined', name='dictValues', doc='Dictionary values')

getDictValues()[source]

Gets the value of dictValues or its default value.

setDictValues(value)[source]

Sets the value of dictValues.
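The store-as-JSON pattern described for this mixin can be sketched without Spark. The class `DictHolder` and its method names below are hypothetical stand-ins, not the HandySpark or PySpark API; the sketch only shows the round trip between a dictionary and its JSON string.

```python
import json


class DictHolder:
    """Hypothetical stand-in: keeps a dictionary as a JSON string."""

    def __init__(self):
        self._stored = "{}"  # the persisted form is always a string

    def set_dict_values(self, value):
        # Serialize on write so the stored value is a plain string.
        self._stored = json.dumps(value)
        return self

    def get_dict_values(self):
        # Deserialize on read to recover a dictionary.
        return json.loads(self._stored)


holder = DictHolder().set_dict_values({"Fare": (0.0, 65.0)})
restored = holder.get_dict_values()
```

One consequence of this design is that the round trip is not type-preserving: JSON has no tuple type, so the `(0.0, 65.0)` pair comes back as a list, and non-string dictionary keys come back as strings.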

Module contents

class handyspark.ml.HandyFencer[source]

Bases: pyspark.ml.base.Transformer, handyspark.ml.base.HasDict, pyspark.ml.util.DefaultParamsReadable, pyspark.ml.util.DefaultParamsWritable

Fencer transformer for capping outliers according to lower and upper fences.

fences

dict – The lower and upper fence values for each feature. When the data is stratified, the first-level keys are the filter clauses that define each stratum.

class handyspark.ml.HandyImputer[source]

Bases: pyspark.ml.base.Transformer, handyspark.ml.base.HasDict, pyspark.ml.util.DefaultParamsReadable, pyspark.ml.util.DefaultParamsWritable

Imputation transformer for completing missing values.

statistics

dict – The imputation fill value for each feature. When the data is stratified, the first-level keys are the filter clauses that define each stratum.
