AkiMadi16/DataManipulation_withPandas

Welcome to my Coding Journey on Data Manipulation with pandas

Hello there! 👋🏽

What You'll Find Here 🦾

In this repository, I'm documenting my journey of learning and experimenting with code.

Power BI Dashboard

Dashboard

Data visualization by creating DataFrames. To display plots, call plt.show().

df plot

Df 2

Let's start exploring data

When you get a new DataFrame to work with, the first thing you need to do is explore it and see what it contains. The following code cells show useful methods and attributes for this.
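To try the snippets in this README without the original dataset, a small stand-in DataFrame can be built by hand. This is only a minimal sketch: the column names match the ones used here (date, type, year, avg_price, size, nb_sold), but every value below is made up for illustration and is not taken from the real avocado data.

```python
import pandas as pd

# Hypothetical rows: the values are invented, only the column names follow this README.
avocado = pd.DataFrame({
    "date": pd.to_datetime(["2017-12-31", "2018-01-07", "2018-01-14", "2018-01-21"]),
    "type": ["conventional", "organic", "conventional", "organic"],
    "year": [2017, 2018, 2018, 2018],
    "avg_price": [1.10, 1.55, 1.20, 1.60],
    "size": ["small", "small", "large", "small"],
    "nb_sold": [1000.0, 250.0, 800.0, 300.0],
})

print(avocado.shape)   # (number of rows, number of columns)
print(avocado.dtypes)  # data type of each column
```

With a stand-in like this, the exploration calls in this section should run as written, although their output will of course differ from the screenshots.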

Return the first few rows of a DataFrame

+ print(avocado.head())

first Rows

Compute some summary statistics for numerical columns

+ print(avocado.describe())

Summary stats

Print the names of columns, the data types they contain, and whether they have any missing values

+ avocado.info()

missing values

Return the data values in a 2D NumPy array

+ print(avocado.values)

Sort by multiple variables by passing lists of column names and booleans

+ print(avocado.sort_values(["nb_sold", "date", "type"], ascending=[False, False, True]))

sorting
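To see how each boolean in ascending pairs up with the column at the same position, here is the same call on a tiny made-up DataFrame. The name toy and all of its values are hypothetical, chosen so that every sort key breaks at least one tie.

```python
import pandas as pd

# Hypothetical data, picked so that each sort key matters.
toy = pd.DataFrame({
    "nb_sold": [100, 300, 300, 100],
    "date": pd.to_datetime(["2018-01-07", "2018-01-14", "2018-01-14", "2018-01-14"]),
    "type": ["organic", "organic", "conventional", "conventional"],
})

# nb_sold descending; ties broken by date descending; remaining ties by type ascending.
sorted_toy = toy.sort_values(
    ["nb_sold", "date", "type"], ascending=[False, False, True]
)
print(sorted_toy)
```

The two rows with nb_sold of 300 share a date, so type decides their order. The two rows with nb_sold of 100 differ by date, so the later date comes first.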

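Subset rows on multiple conditions by combining boolean masks with &, wrapping each condition in parentheses

+ print(avocado[(avocado["size"] == "small") & (avocado["date"] > "2018-01-01")])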

filter by size & date
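The filter above can be easier to read when each condition gets its own name. The sketch below assumes the date column is stored as datetime64 (here via pd.to_datetime), in which case pandas compares it against the date string directly. The toy DataFrame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration only.
toy = pd.DataFrame({
    "date": pd.to_datetime(["2017-12-31", "2018-01-07", "2018-01-14"]),
    "size": ["small", "small", "large"],
    "nb_sold": [1000, 250, 800],
})

# Each condition is a boolean Series with one True/False per row.
is_small = toy["size"] == "small"
after_new_year = toy["date"] > "2018-01-01"

# & keeps only the rows where both conditions are True.
recent_small = toy[is_small & after_new_year]
print(recent_small)
```

Naming the masks also avoids the operator-precedence problem that makes the parentheses necessary in the one-line version.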

+ print(avocado)

data

The .agg() method allows you to apply your own custom functions to a DataFrame, as well as apply functions to more than one column of a DataFrame at once.

Define a custom function: this one computes the inter-quartile range (IQR)

  def iqr(column):
    return column.quantile(0.75) - column.quantile(0.25)

Apply multiple functions

+ print(avocado[["avg_price", "nb_sold"]].agg([iqr, "median"]))

agg method
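As a worked example of .agg() with a custom function, the sketch below applies iqr and the median to two columns of a small made-up DataFrame. The built-in median is passed by name as the string "median", which pandas accepts alongside callables. The name toy and its values are hypothetical.

```python
import pandas as pd

def iqr(column):
    """Inter-quartile range: 75th percentile minus 25th percentile."""
    return column.quantile(0.75) - column.quantile(0.25)

# Hypothetical data for illustration only.
toy = pd.DataFrame({
    "avg_price": [1.0, 2.0, 3.0, 4.0, 5.0],
    "nb_sold": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# One row per function, one column per selected column.
summary = toy[["avg_price", "nb_sold"]].agg([iqr, "median"])
print(summary)
```

The result is itself a DataFrame, with the function names as row labels, so individual values can be read with summary.loc["iqr", "avg_price"].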

Pivot tables are another way of calculating grouped summary statistics. The .pivot_table() method has:

- a values argument, which takes the name of the column you want to summarize.
- an index argument, which takes the name of the column you want to group by.
- an aggfunc argument, which takes a function or a list of functions to summarize the values. By default, .pivot_table() uses the mean.
- a columns argument, which takes the name of any other column you want to group by.
- a fill_value argument to define what should replace missing values.
- a margins argument. Setting this to True adds an "All" row and column with summary statistics across all groups.

print(avocado.pivot_table(
    values="avg_price",
    index="year",
    aggfunc=["mean", "max"],
    columns="size",
    margins=True,
))

pivot table
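The fill_value and margins arguments are easiest to understand on a table with a gap. In the sketch below, the made-up data has no "large" rows for 2018, so that cell would be missing without fill_value. The name toy and all values are hypothetical.

```python
import pandas as pd

# Hypothetical data: note there is no "large" row for 2018.
toy = pd.DataFrame({
    "year": [2017, 2017, 2018, 2018],
    "size": ["small", "large", "small", "small"],
    "avg_price": [1.0, 2.0, 3.0, 5.0],
})

table = toy.pivot_table(
    values="avg_price",
    index="year",
    columns="size",
    aggfunc="mean",
    fill_value=0,   # replaces the missing 2018/large cell
    margins=True,   # adds an "All" row and an "All" column
)
print(table)
```

The "All" entries are computed from the underlying rows, not from the filled cells, so the 0 placed in the 2018/large cell does not pull the overall means down.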
