Treat Missing Values using Mean, Median, Mode, and KNN Methods
Manipulate your dataframe with ease.
import datapreprep as dp
import pandas as pd
# importing dataframe
df=pd.read_excel('Book1.xlsx')
# Treating missing values using Mean
dp.missing_values(df,'mean') # see, it's that easy
# Treating missing values using Median
dp.missing_values(df,'median')
# Treating missing values using Mode
dp.missing_values(df,'mode')
# Treating missing values using the KNN method
dp.missing_values(df,1,'knn') # the integer is the number of nearest neighbours;
# here it is 1, but it can also be 2 or 3
# Treating missing values using End of Distribution
dp.missing_values(df,'eod')
# Treating missing values using random sample imputation
dp.missing_values(df,'randomsampleimputation')
# Treating missing values using Capture NaN
dp.missing_values(df,'capturenan')
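To make the mean, median, and mode strategies concrete, here is a minimal sketch of what such an imputation could look like in plain pandas. It is not datapreprep's actual implementation; the helper `impute_simple` and the sample data are hypothetical and only illustrate the idea.

```python
import numpy as np
import pandas as pd


def impute_simple(df: pd.DataFrame, strategy: str) -> pd.DataFrame:
    """Hypothetical helper: fill NaNs in numeric columns with a column statistic."""
    out = df.copy()  # leave the caller's dataframe untouched
    for col in out.select_dtypes(include="number").columns:
        if strategy == "mean":
            fill = out[col].mean()
        elif strategy == "median":
            fill = out[col].median()
        elif strategy == "mode":
            # mode() can return several values; take the first one
            fill = out[col].mode().iloc[0]
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        out[col] = out[col].fillna(fill)
    return out


# Made-up sample data with one missing value
marks = pd.DataFrame({"Marks_10": [45.0, 60.0, np.nan, 60.0, 95.0]})
print(impute_simple(marks, "mean"))
```

With this sample, the mean strategy fills the gap with 65.0, while the median and mode strategies both fill it with 60.0.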
Treat Outliers using the IQR and Z-Score Methods.
# Treat Outliers using IQR
dp.outlier_treatment(df,'column_name','iqr')
# 'column_name' is the column in which outliers will be identified
# Treat Outliers using Z-Score
dp.outlier_treatment(df, 'column_name','zscore')
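For readers curious about what the two methods check, the rules can be sketched in plain pandas as follows. This is an illustration under common conventions (1.5 x IQR fences, an absolute z-score above 3), not necessarily what datapreprep does internally; the function names, thresholds, and sample data are assumptions.

```python
import pandas as pd


def iqr_outlier_mask(s: pd.Series, k: float = 1.5) -> pd.Series:
    """True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)


def zscore_outlier_mask(s: pd.Series, threshold: float = 3.0) -> pd.Series:
    """True where the absolute z-score exceeds the threshold."""
    z = (s - s.mean()) / s.std(ddof=0)  # population standard deviation
    return z.abs() > threshold


# Made-up salaries with one obviously extreme value
salary = pd.Series(
    [10, 12, 11, 13, 12, 11, 10, 13, 12, 11, 12, 10, 13, 11, 12, 100],
    name="Salary",
)
print(salary[iqr_outlier_mask(salary)])
print(salary[zscore_outlier_mask(salary)])
```

Both masks flag only the value 100 in this sample. Note that with very few rows the z-score rule may never fire, because a single point's z-score is bounded by the sample size.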
Feature Scaling using Standard Scaler, Min-Max Scaler, Robust Scaler, etc.
# Standard Scaler
dp.feature_scaling(df,'standard_scalar')
# Robust Scaler
dp.feature_scaling(df,'robust_scalar') # just one line of code
# Min-Max Scaler
dp.feature_scaling(df,'minmax_scalar')
# Max Absolute Scaler
dp.feature_scaling(df,'maxabs_scalar')
dp.feature_scaling(df,'maxabs_scalar')
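The four scalers differ only in which statistics they subtract and divide by. The sketch below writes out the usual formulas in plain pandas; they match the defaults of the scikit-learn scalers of the same names, but whether datapreprep uses exactly these is an assumption. The `scale` helper and its method names are hypothetical.

```python
import pandas as pd


def scale(df: pd.DataFrame, method: str) -> pd.DataFrame:
    """Hypothetical helper: scale every numeric column with the chosen formula."""
    num = df.select_dtypes(include="number")
    if method == "standard":
        # zero mean, unit (population) standard deviation
        return (num - num.mean()) / num.std(ddof=0)
    if method == "minmax":
        # map the column range onto [0, 1]
        return (num - num.min()) / (num.max() - num.min())
    if method == "robust":
        # center on the median, divide by the interquartile range
        iqr = num.quantile(0.75) - num.quantile(0.25)
        return (num - num.median()) / iqr
    if method == "maxabs":
        # divide by the largest absolute value, giving values in [-1, 1]
        return num / num.abs().max()
    raise ValueError(f"unknown method: {method}")


# Made-up sample data
data = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0]})
print(scale(data, "minmax"))
print(scale(data, "robust"))
```

The robust variant is usually the safer choice when a column contains outliers, since the median and IQR are barely affected by a few extreme values.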
# Print a summary of the dataframe: missing-value percentages and descriptive statistics
dp.info(df)
The Percenatge of Value Missing in Given Data is : 0.000%
The Percenatge of Value Missing in each column of Given Data is :
Student_Id 0.0
Marks_10 0.0
Marks_12 0.0
Marks_Grad 0.0
State 0.0
Salary 0.0
z-score 0.0
dtype: float64
Data description :
       Student_Id   Marks_10   Marks_12  Marks_Grad     State     Salary  \
count    5.000000   5.000000   5.000000    5.000000  5.000000   5.000000
mean     3.000000  66.200000  75.200000   69.600000  0.800000  15.000000
std      1.581139  21.135279  11.300442   13.831124  0.447214   4.472136
min      1.000000  45.000000  65.000000   56.000000  0.000000  10.000000
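A report like the one above can be approximated with a few pandas calls. The following is a rough sketch of how the percentages might be computed, not the library's own code; `missing_summary` and the sample dataframe are hypothetical.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame):
    """Return the overall and per-column percentage of missing values."""
    total_pct = df.isna().sum().sum() / df.size * 100
    per_column_pct = df.isna().mean() * 100
    return total_pct, per_column_pct


# Made-up sample data with one missing value per column
sample = pd.DataFrame(
    {
        "Marks_10": [45.0, np.nan, 70.0, 80.0],
        "State": ["A", "B", None, "A"],
    }
)
total, per_column = missing_summary(sample)
print(f"Overall missing: {total:.3f}%")
print(per_column)
print(sample.describe())  # descriptive statistics for the numeric columns
```

Here 2 of the 8 cells are missing, so the overall figure is 25%, and each column is also 25% missing.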
Visualize your data with a bar chart, heat map, and much more
Just pass a dataframe and it's done.
import datapreprep as dp
import pandas as pd
# importing dataframe
df=pd.read_excel('Book1.xlsx')
# Visualize using Bar Chart
dp.bar(df) # see, it's that easy
# Using Matrix
dp.matrix(df)
# Using Heat map
dp.heatmap(df)