Hello there! 👋🏽
In this repository, I'm documenting my journey of learning and experimenting with code.
# Data visualization and creating DataFrames
To display a plot, call plt.show().
When you get a new DataFrame to work with, the first thing you need to do is explore it and see what it contains. The following code cells show useful methods and attributes for this.
+ print(avocado.head())
+ print(avocado.describe())
+ avocado.info()  # .info() prints its summary itself and returns None, so no print() is needed
+ print(avocado.values)
+ print(avocado.sort_values(["nb_sold", "date", "type"], ascending=[False, False, True]))
+ print(avocado[(avocado["size"] == "small") & (avocado["date"] > "2018-01-01")])
+ print(avocado)
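
The exploration steps above can be tried on a small stand-in DataFrame. This is only a sketch: the column names mirror the ones used above, but the rows are invented for illustration and are not the real avocado data.

```python
import pandas as pd

# Hypothetical sample data; the values are made up for illustration only.
avocado = pd.DataFrame({
    "date": pd.to_datetime(["2017-12-31", "2018-01-07", "2018-01-14", "2018-01-21"]),
    "type": ["conventional", "organic", "conventional", "organic"],
    "size": ["small", "small", "large", "small"],
    "nb_sold": [1200.0, 300.0, 950.0, 410.0],
    "avg_price": [1.05, 1.60, 1.10, 1.55],
})

# Quick look at the dimensions and column names.
print(avocado.shape)
print(avocado.columns)

# Sort by units sold, then by date, both in descending order.
sorted_sales = avocado.sort_values(["nb_sold", "date"], ascending=[False, False])
print(sorted_sales)

# Keep only small avocados sold after 1 January 2018.
small_2018 = avocado[(avocado["size"] == "small") & (avocado["date"] > "2018-01-01")]
print(small_2018)
```

Each condition in the filter is wrapped in parentheses because & binds more tightly than == and >.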
The .agg() method allows you to apply your own custom functions to a DataFrame, as well as apply functions to more than one column of a DataFrame at once.
def iqr(column):
    return column.quantile(0.75) - column.quantile(0.25)
+ print(avocado[["avg_price", "nb_sold"]].agg([iqr, np.median]))
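
A self-contained sketch of the same idea, assuming a tiny made-up DataFrame called sales (a hypothetical name, not part of the original data). It passes the custom iqr function together with the string "median", which pandas accepts in place of a NumPy function.

```python
import pandas as pd

# Hypothetical data, invented for illustration.
sales = pd.DataFrame({
    "avg_price": [1.0, 1.2, 1.4, 1.6, 1.8],
    "nb_sold": [10, 20, 30, 40, 50],
})

def iqr(column):
    # Interquartile range: 75th percentile minus 25th percentile.
    return column.quantile(0.75) - column.quantile(0.25)

# Apply a custom function and a built-in aggregation to two columns at once.
summary = sales[["avg_price", "nb_sold"]].agg([iqr, "median"])
print(summary)
```

The result has one row per aggregation (iqr and median) and one column per summarized column.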
Pivot tables are another way of calculating grouped summary statistics. The .pivot_table() method has:
+ a values argument, which takes the name of the column you want to summarize.
+ an index argument, which takes the name of the column you want to group by.
+ an aggfunc argument, which takes a function or a list of functions used to summarize the values. By default, .pivot_table() uses the mean.
+ a columns argument, which takes the name of any other column you want to group by.
+ a fill_value argument, which defines what should replace missing values.
+ a margins argument. Setting this to True adds an "All" row and column containing summary statistics across all groups.
print(avocado.pivot_table(
    values="avg_price",
    index="year",
    aggfunc=[np.mean, np.max],
    columns="size",
    margins=True,
))
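
The call above does not use fill_value, so here is a minimal sketch that does. It assumes a small invented DataFrame named sales (hypothetical) in which one year/size combination is missing, to show how fill_value and margins behave together.

```python
import pandas as pd

# Hypothetical data, invented for illustration; there is no "large" row for 2017.
sales = pd.DataFrame({
    "year": [2017, 2018, 2018, 2018],
    "size": ["small", "small", "small", "large"],
    "avg_price": [1.0, 1.2, 1.4, 2.2],
})

table = sales.pivot_table(
    values="avg_price",
    index="year",       # one row per year
    columns="size",     # one column per size
    aggfunc="mean",     # the default, written out for clarity
    fill_value=0,       # replaces the missing 2017/large cell
    margins=True,       # adds the "All" row and column
)
print(table)
```

The "All" entries are computed from the underlying rows, so the filled-in 0 does not pull down the overall mean.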