Data preprocessing made easy

DataPrePrep

Better Missing Values

Missing Values

Treat missing values using the mean, median, mode, KNN, end-of-distribution, random sample imputation, or capture-NaN method.

Manipulate your dataframe with ease.

import datapreprep as dp

import pandas as pd

# importing dataframe
df=pd.read_excel('Book1.xlsx')

# Treating missing values using Mean
dp.missing_values(df,'mean')   # see, it's that easy

# Treating missing values using Median
dp.missing_values(df,'median')

# Treating missing values using Mode
dp.missing_values(df,'mode')

# Treating missing values using the KNN method
dp.missing_values(df,1,'knn')  # the integer 1 is the number of nearest neighbours;
                               # it can also be 2 or 3

# Treating missing values using End of Distribution
dp.missing_values(df,'eod')

# Treating missing values using random sample imputation
dp.missing_values(df,'randomsampleimputation')

# Treating missing values using Capture NaN
dp.missing_values(df,'capturenan')
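
For readers who want to see what these strategy names usually mean, here is a minimal sketch in plain pandas. It is not datapreprep's implementation: the toy column, the variable names, and the choice of mean + 3 * std for end of distribution are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Toy column with two gaps (hypothetical data, not taken from Book1.xlsx)
marks = pd.Series([45.0, np.nan, 70.0, 85.0, np.nan, 70.0], name="Marks_10")

# Mean / median / mode: replace every NaN with a single summary statistic
mean_filled = marks.fillna(marks.mean())
median_filled = marks.fillna(marks.median())
mode_filled = marks.fillna(marks.mode().iloc[0])

# End of distribution: fill with a value in the far tail
# (assumed here to be mean + 3 * std)
eod_filled = marks.fillna(marks.mean() + 3 * marks.std())

# Random sample imputation: draw replacements from the observed values
missing = marks.isna()
draws = marks.dropna().sample(n=int(missing.sum()), replace=True, random_state=0)
random_filled = marks.copy()
random_filled[missing] = draws.to_numpy()

# Capture NaN: keep a 0/1 indicator of where values were missing
was_missing = missing.astype(int)

print(pd.DataFrame({
    "mean": mean_filled,
    "median": median_filled,
    "mode": mode_filled,
    "eod": eod_filled,
    "random": random_filled,
    "was_missing": was_missing,
}))
```

The single-statistic fills keep the column's centre but shrink its variance, while random sampling preserves the observed distribution at the cost of adding noise; the indicator column lets a model learn from the fact that a value was missing.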


Better Outlier treatment

Treating Outliers

Treat outliers using the IQR or Z-score method.

# Treat outliers using IQR
# 'column_name' is the column in which outliers will be identified
dp.outlier_treatment(df,'column_name','iqr')

# Treat outliers using Z-score
dp.outlier_treatment(df,'column_name','zscore')
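
The two rules named above can be sketched in plain pandas as follows. This is an illustration of the standard IQR and Z-score rules, not datapreprep's own code; the sample data, the 1.5 * IQR fences, the threshold of 3, and the choice to cap rather than drop outliers are all assumptions.

```python
import pandas as pd

# Hypothetical salary column with one obvious outlier
salary = pd.Series([10, 12, 11, 13, 12, 11, 10, 13, 12, 11, 12, 95], name="Salary")

# IQR rule: flag points more than 1.5 * IQR outside the quartiles
q1, q3 = salary.quantile(0.25), salary.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
iqr_outliers = salary[(salary < lower) | (salary > upper)]

# Z-score rule: flag points more than 3 standard deviations from the mean
z = (salary - salary.mean()) / salary.std()
z_outliers = salary[z.abs() > 3]

# One possible treatment: cap values at the IQR fences instead of dropping rows
capped = salary.clip(lower=lower, upper=upper)

print(iqr_outliers.tolist(), z_outliers.tolist(), capped.max())
```

The IQR rule relies on quartiles, so it is less affected by the outliers it is trying to find; the Z-score rule uses the mean and standard deviation, which a large outlier inflates, so it can miss points in small samples.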

Better Feature Scaling

Feature Scaling

Feature scaling using the Standard Scaler, Min-Max Scaler, Robust Scaler, and Max Absolute Scaler.

# Note: the method strings below are spelled 'scalar'

# Standard Scaler
dp.feature_scaling(df,'standard_scalar')

# Robust Scaler
dp.feature_scaling(df,'robust_scalar') # see, it's one line of code

# Min-Max Scaler
dp.feature_scaling(df,'minmax_scalar')

# Max Absolute Scaler
dp.feature_scaling(df,'maxabs_scalar')

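
To make the difference between the four scalers concrete, the sketch below writes out the usual formulas in plain pandas. It is an illustration under stated assumptions, not datapreprep's implementation: the sample values are made up, and the population standard deviation (ddof=0) and the 25th to 75th percentile range are the conventional choices assumed here.

```python
import pandas as pd

# Hypothetical feature with one large value
x = pd.Series([1.0, 2.0, 3.0, 4.0, 10.0])

# Standard scaling: zero mean, unit variance
standard = (x - x.mean()) / x.std(ddof=0)

# Min-max scaling: map the range onto [0, 1]
minmax = (x - x.min()) / (x.max() - x.min())

# Robust scaling: centre on the median, divide by the interquartile range
robust = (x - x.median()) / (x.quantile(0.75) - x.quantile(0.25))

# Max-abs scaling: divide by the largest absolute value
maxabs = x / x.abs().max()

print(pd.DataFrame({"standard": standard, "minmax": minmax,
                    "robust": robust, "maxabs": maxabs}))
```

Robust scaling is the usual pick when a column contains outliers, since the median and interquartile range barely move when an extreme value is added.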
Get critical information about your data easily

Information

dp.info(df)
The Percenatge of Value Missing in Given Data is : 0.000%

The Percenatge of Value Missing  in each column of  Given Data is :
Student_Id    0.0
Marks_10      0.0
Marks_12      0.0
Marks_Grad    0.0
State         0.0
Salary        0.0
z-score       0.0
dtype: float64
Data description :
        Student_Id   Marks_10   Marks_12  Marks_Grad     State     Salary  \
count    5.000000   5.000000   5.000000    5.000000  5.000000   5.000000   
mean     3.000000  66.200000  75.200000   69.600000  0.800000  15.000000   
std      1.581139  21.135279  11.300442   13.831124  0.447214   4.472136   
min      1.000000  45.000000  65.000000   56.000000  0.000000  10.000000   
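
A report like the one above can be approximated with plain pandas, which may help when reading the numbers. This is a rough sketch, not the code behind dp.info: the small dataframe and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing cell out of eight
df = pd.DataFrame({
    "Marks_10": [45.0, np.nan, 70.0, 85.0],
    "Salary": [10.0, 15.0, 20.0, 15.0],
})

# Share of missing cells across the whole dataframe, in percent
total_missing_pct = df.isna().to_numpy().mean() * 100

# Share of missing cells per column, in percent
per_column_pct = df.isna().mean() * 100

# Count, mean, std, min, quartiles, and max for each numeric column
summary = df.describe()

print(f"Missing overall: {total_missing_pct:.3f}%")
print(per_column_pct)
print(summary)
```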




Better Data Visualization

Data Visualization

Visualize your data with a bar chart, matrix, heat map, and much more.

Just pass a dataframe and it's done.

import datapreprep as dp
import pandas as pd

# importing dataframe
df=pd.read_excel('Book1.xlsx')

# Visualize using Bar Chart
dp.bar(df)   # see, it's that easy

# Using Matrix
dp.matrix(df)

# Using Heat map
dp.heatmap(df)





Getting Started

pip install datapreprep

Documentation

Use GUI Version