TechAE Blogs - Explore now for new leading-edge technologies

Introduction To Pandas - Part 2

Continuing from the previous part of this series, we will now work on detecting and filling in missing data using Pandas. Most real-world datasets come with missing values, and you have to handle them yourself before the analysis can produce reliable results.

Table of Contents

  • What is Data Wrangling?
  • What is a Missing Value?
  • Operations on Missing Values
  • Conclusion

What is Data Wrangling?

Data Wrangling is the process of taking raw data from initial understanding through to analysis, as can be observed below:

[Figure: Data Wrangling. Source: Data Wrangling with Open Refine by Emily Esten]

The main steps for Data Wrangling:

1. Data Structuring

The first step is to organize the relevant data into separate columns so that later analysis can group rows by their common values.

2. Data Cleaning

In this step, the data is cleaned up by handling Null values and standardizing the data format.

[Figure: Data Cleaning. Photo by GeeksforGeeks]

3. Data Enriching

After cleaning, the data is enriched by adding derived variables or by bringing in new sources, which prepares it for the subsequent stages of processing. This is sometimes called Data Augmentation.
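The three steps above can be sketched on a tiny example. This is only an illustration: the raw records, the ";" separator, and the 500 threshold are all made up for the sketch.

```python
import pandas as pd

# Hypothetical raw records: one string per row, fields separated by ";".
raw = pd.DataFrame({"record": ["Mercedes;420", " Maserati MC20 ;530", "Audi;"]})

# 1. Structuring: split each record into separate columns.
cars = raw["record"].str.split(";", expand=True)
cars.columns = ["car", "speed"]

# 2. Cleaning: trim stray whitespace and turn the empty speed into a real missing value.
cars["car"] = cars["car"].str.strip()
cars["speed"] = pd.to_numeric(cars["speed"], errors="coerce")

# 3. Enriching: add a derived variable (the 500 threshold is arbitrary).
cars["is_fast"] = cars["speed"] > 500

print(cars)
```

Note that the comparison in step 3 evaluates to False for the row with a missing speed, which is one reason missing values deserve explicit handling.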

In this post, we'll go through how to deal with missing values, which is a critical step in data cleaning.

What is a Missing Value?

Missing data (or missing values) are values that were not recorded for some column in some row. There are three types of missing data:

💠 Missing completely at random (MCAR): occurs when the fact that data is missing is unrelated to both the observed and the unobserved data.

💠 Missing at random (MAR): occurs when the absence of data is statistically related to the observed data but not to the unobserved data.

💠 Missing not at random (MNAR): occurs when the absence of data depends on the unobserved data, including the missing value itself, for example because of causes that the researcher did not measure.
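To make the three mechanisms concrete, here is a small simulation sketch. The "age" and "income" columns, the probabilities, and the thresholds are all invented for illustration; only the way each mask is built matters.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
n = 1000
people = pd.DataFrame({
    "age": rng.integers(20, 70, size=n),
    "income": rng.normal(50_000, 15_000, size=n),
})

# MCAR: every income has the same 20% chance of being missing.
mcar_mask = rng.random(n) < 0.2

# MAR: the chance of a missing income depends only on the observed age.
mar_mask = rng.random(n) < np.where(people["age"] > 50, 0.4, 0.1)

# MNAR: the chance of a missing income depends on the income itself.
mnar_mask = rng.random(n) < np.where(people["income"] > 60_000, 0.5, 0.05)

# Series.mask replaces the entries where the mask is True with NaN.
income_mcar = people["income"].mask(mcar_mask)
income_mar = people["income"].mask(mar_mask)
income_mnar = people["income"].mask(mnar_mask)

# Share of missing values under each mechanism.
print(income_mcar.isna().mean(), income_mar.isna().mean(), income_mnar.isna().mean())
```

In the MNAR case the high incomes are the ones most likely to disappear, so statistics computed from the remaining values tend to be biased.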

Operations on Missing Values

There are some useful methods for detecting, removing, and replacing missing values in Pandas:

DataFrame.isnull()/DataFrame.isna():

Both methods do the same thing: they return an object of the same shape filled with boolean values, where True marks a missing value.


import pandas as pd

# Small example frame: the third car name is missing
df = pd.DataFrame({
    "car": ['Mercedes', 'Maserati MC20', None],
    "speed": [420, 530, 450]
}, index=['a', 'b', 'c'])

df.isnull()
     car     speed
a    False   False
b    False   False
c    True*   False

* True represents a missing value.
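The boolean result is usually summarized rather than read cell by cell. A minimal sketch, reusing the same example frame (the variable names are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "car": ["Mercedes", "Maserati MC20", None],
    "speed": [420, 530, 450],
}, index=["a", "b", "c"])

# Number of missing values in each column (True counts as 1).
missing_per_column = df.isna().sum()

# Rows that contain at least one missing value.
rows_with_missing = df[df.isna().any(axis=1)]

print(missing_per_column)
print(rows_with_missing)
```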

DataFrame.fillna():

This method returns the data with missing values filled in (imputed) according to the strategy you choose. More generally, missing or otherwise unwanted data can be handled in one of the following four ways:

1st way: Ignore the missing or undesirable data in some columns when other columns of the same rows hold data that is vital or relevant to the study.

2nd way: Replace missing or undesirable data with a placeholder value that signals the value was absent (a nullity indicator).

3rd way: Replace missing, nonexistent, or undesired data with interpolated values that follow the trend of the remaining data.

4th way: Delete the missing data, provided you are confident that no vital information will be lost for the analysis.


df['car'].fillna('unknown')   # fill with a constant value
df.ffill()                    # forward fill (older code: df.fillna(method='ffill'))
df.bfill()                    # backward fill (older code: df.fillna(method='bfill'))
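The 2nd and 3rd ways can be sketched as follows. The "readings" frame and its values are hypothetical, and filling with the column mean is shown only as one common alternative, not as a recommendation.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with gaps, made up for this illustration.
readings = pd.DataFrame({
    "car": ["Mercedes", None, "Audi", "Audi"],
    "speed": [400.0, np.nan, 500.0, 600.0],
})

# 2nd way: record where the value was missing, then fill with a placeholder.
readings["car_was_missing"] = readings["car"].isna()
readings["car"] = readings["car"].fillna("unknown")

# 3rd way: estimate numeric gaps from neighbouring values (linear interpolation).
readings["speed_interpolated"] = readings["speed"].interpolate()

# A simpler alternative: fill numeric gaps with the column mean.
readings["speed_mean_filled"] = readings["speed"].fillna(readings["speed"].mean())

print(readings)
```

Interpolation only makes sense when the row order carries meaning (for example, time-ordered measurements); otherwise a constant or a summary statistic is the safer choice.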

DataFrame.dropna():

It returns a filtered copy of the data from which missing values are removed. On a DataFrame, this drops every row that contains at least one missing value by default.


df['car'].dropna()
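On a whole DataFrame, dropna() takes a few parameters that control what gets removed. A short sketch on a made-up frame (the "owner" column is hypothetical and deliberately empty):

```python
import numpy as np
import pandas as pd

cars = pd.DataFrame({
    "car": ["Mercedes", None, "Audi"],
    "speed": [420.0, 530.0, np.nan],
    "owner": [None, None, None],
})

# Default: drop every row that has at least one missing value.
any_missing_dropped = cars.dropna()

# Drop only the columns in which every value is missing.
empty_columns_dropped = cars.dropna(axis=1, how="all")

# Only consider the "car" column when deciding which rows to drop.
car_required = cars.dropna(subset=["car"])

print(len(any_missing_dropped), list(empty_columns_dropped.columns), list(car_required.index))
```

Because the "owner" column is entirely empty here, the default call removes all rows, which shows why dropping should be done with care.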

That is how you can handle missing values. Lastly, here are some common commands you will need while pre-processing datasets.

How to select columns having "object" data type?

df.select_dtypes(['object']).columns
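The same method can also split a frame into numeric and non-numeric columns, which is handy when each group needs a different fill strategy. A small sketch on a made-up frame, using the "number" selector rather than "object":

```python
import pandas as pd

df = pd.DataFrame({"car": ["Mercedes", None], "speed": [420, 530]})

# Column labels grouped by kind of dtype.
numeric_cols = df.select_dtypes(include="number").columns
other_cols = df.select_dtypes(exclude="number").columns

# Fill gaps in the non-numeric columns with a placeholder.
df[other_cols] = df[other_cols].fillna("unknown")

print(list(numeric_cols), list(other_cols))
```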

How to convert columns from object to DateTime data type?

df['column_name'].astype('datetime64[ns]')
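When some strings may not be valid dates, pd.to_datetime is a common alternative because it can turn bad entries into missing values instead of raising an error. A minimal sketch; the "events" frame and its "when" column are hypothetical:

```python
import pandas as pd

# Hypothetical column of date strings; the last entry is not a valid date.
events = pd.DataFrame({"when": ["2023-01-15", "2023-02-01", "not a date"]})

# errors="coerce" turns unparseable entries into NaT, the datetime missing value.
events["when"] = pd.to_datetime(events["when"], format="%Y-%m-%d", errors="coerce")

print(events["when"].isna().sum())
```

The resulting NaT entries are picked up by isna(), fillna(), and dropna() like any other missing value.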

Conclusion

To sum up, we have studied the essentials of the pandas data analysis package, which allows us to efficiently run operations on stored data and manage missing data. We also went over some of the most important features of Pandas objects, as well as how to perform Data Wrangling.

See you next time,

@TechAE
