Welcome to my Coding Journey on Data Manipulation with pandas

Hello there! 👋🏽

What You'll Find Here 🦾

In this repository, I'm documenting my journey of learning and experimenting with code.

Power BI Dashboard

Data visualization by creating DataFrames. To display plots, call plt.show().

Let's start exploring data

When you get a new DataFrame to work with, the first thing you need to do is explore it and see what it contains. The following code cells show useful methods and attributes for this.

Return the first few rows of a DataFrame

+ print(avocado.head())

Compute some summary statistics for numerical columns

+ print(avocado.describe())

Print the names of columns, the data types they contain, and whether they have any missing values

+ avocado.info()  # .info() prints its report itself and returns None, so wrapping it in print() adds a stray "None"

Return the data values in a 2D NumPy array

+ print(avocado.values)
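The exploration calls above can be tried without the real dataset. The sketch below is only an illustration: the tiny `avocado` DataFrame is a hypothetical stand-in with made-up values, and the column names are assumed from the snippets in this README.

```python
import pandas as pd

# Hypothetical stand-in for the avocado data; all values are made up.
avocado = pd.DataFrame({
    "date": pd.to_datetime(["2017-12-31", "2018-01-07", "2018-01-14", "2018-01-21"]),
    "type": ["conventional", "organic", "conventional", "organic"],
    "size": ["small", "small", "large", "large"],
    "avg_price": [1.10, 1.60, 1.20, None],  # one missing value on purpose
    "nb_sold": [1000.0, 250.0, 900.0, 300.0],
})

print(avocado.head(2))        # first two rows
print(avocado.describe())     # summary statistics
avocado.info()                # column names, dtypes, non-null counts
print(avocado.shape)          # (number of rows, number of columns)

missing_per_column = avocado.isna().sum()  # count of missing values per column
values = avocado.to_numpy()                # same data as a 2D NumPy array
print(missing_per_column)
```

`.to_numpy()` is used here in place of `.values`; both return the underlying data as an array.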

Sort by multiple variables by passing lists of column names and booleans

+ print(avocado.sort_values(["nb_sold", "date", "type"], ascending=[False, False, True]))

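Subset rows on multiple conditions by combining boolean expressions with &, each wrapped in parentheses

+ print(avocado[(avocado["size"] == "small") & (avocado["date"] > "2018-01-01")])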

+ print(avocado)
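As a rough, self-contained sketch of the sorting and subsetting calls above: the small DataFrame is hypothetical (made-up values, column names assumed from this README), and the intermediate names `is_small` and `is_2018` are introduced here only to make the two conditions easier to read.

```python
import pandas as pd

# Hypothetical stand-in for the avocado data; all values are made up.
avocado = pd.DataFrame({
    "date": pd.to_datetime(["2017-12-31", "2018-01-07", "2018-01-14", "2018-01-07"]),
    "type": ["conventional", "organic", "conventional", "conventional"],
    "size": ["small", "small", "large", "small"],
    "nb_sold": [1000, 250, 900, 250],
})

# Sort by nb_sold (descending), then date (descending), then type (ascending).
# The ascending list must have one boolean per sort column.
sorted_avocado = avocado.sort_values(
    ["nb_sold", "date", "type"], ascending=[False, False, True]
)
print(sorted_avocado)

# Build each condition as a boolean Series, then combine them with &.
is_small = avocado["size"] == "small"
is_2018 = avocado["date"] > "2018-01-01"
small_2018 = avocado[is_small & is_2018]
print(small_2018)
```

Naming the conditions first avoids the nested parentheses that the one-line form needs, since `&` binds more tightly than `==` and `>`.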

The .agg() method allows you to apply your own custom functions to a DataFrame, as well as apply functions to more than one column of a DataFrame at once.

Define a custom function: this one computes the inter-quartile range (IQR)

  def iqr(column):
    return column.quantile(0.75) - column.quantile(0.25)

Apply multiple functions

+ print(avocado[["avg_price", "nb_sold"]].agg([iqr, np.median]))
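A minimal runnable version of the `.agg()` call above, using a hypothetical five-row DataFrame with made-up values. One deliberate change: the string `"median"` replaces `np.median`, since newer pandas releases may emit a warning when NumPy functions are passed to `.agg()`.

```python
import pandas as pd

# Hypothetical data; values are made up so the result is easy to check by hand.
avocado = pd.DataFrame({
    "avg_price": [1.0, 1.2, 1.4, 1.6, 1.8],
    "nb_sold": [100.0, 200.0, 300.0, 400.0, 500.0],
})

def iqr(column):
    """Inter-quartile range: 75th percentile minus 25th percentile."""
    return column.quantile(0.75) - column.quantile(0.25)

# A list of functions yields one row per function and one column per input column.
summary = avocado[["avg_price", "nb_sold"]].agg([iqr, "median"])
print(summary)
```

The rows of `summary` are labelled with the function names (`iqr` and `median`), so individual results can be looked up with `.loc`, for example `summary.loc["iqr", "nb_sold"]`.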

Pivot tables are another way of calculating grouped summary statistics. The .pivot_table() method has:

+ a values argument, which takes the name of the column you want to summarize.
+ an index argument, which takes the name of the column you want to group by.
+ an aggfunc argument, which takes in a list of functions to summarize the values. By default, .pivot_table() uses the mean.
+ a columns argument, which takes in the name of any other columns you want to group by.
+ a fill_value argument, to define what should replace missing values.
+ a margins argument. Setting this to True enables summary statistics for multiple levels of the dataset.

  print(avocado.pivot_table(
    values="avg_price",
    index="year",
    aggfunc=[np.mean, np.max],
    columns="size",
    margins=True,
))
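The call above does not use `fill_value`, so here is a small sketch that shows it together with `margins`. The data is hypothetical (made-up values, column names assumed from this README) and deliberately has no large avocados in 2018, which leaves an empty cell for `fill_value` to replace.

```python
import pandas as pd

# Hypothetical data: there is no "large" row for 2018, so that cell would be NaN.
avocado = pd.DataFrame({
    "year": [2017, 2017, 2018, 2018],
    "size": ["small", "large", "small", "small"],
    "avg_price": [1.0, 2.0, 1.5, 2.5],
})

pivot = avocado.pivot_table(
    values="avg_price",  # column to summarize
    index="year",        # one row per year
    columns="size",      # one column per size
    aggfunc="mean",      # the default, written out for clarity
    fill_value=0,        # replaces the missing 2018/large cell
    margins=True,        # adds an "All" row and an "All" column
)
print(pivot)
```

The "All" entries are computed from the underlying rows rather than from the filled table, so the 0 inserted by `fill_value` does not pull the overall means down.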
