kuntala-c/Statistical-Analysis-of-Iris-Dataset

 
 

Statistical-Analysis-of-Iris-Dataset

Problem Statement: Perform statistical analysis on the Iris flower dataset.

Description: The Iris flower dataset consists of 150 samples, 50 from each of three iris species: setosa, versicolor, and virginica. It has four numerical input features (sepal length, sepal width, petal length, and petal width) and one categorical target variable, species.
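As a quick check of the description above, the following minimal sketch loads the data and counts the samples per species. It assumes the copy of the Iris data bundled with scikit-learn (`load_iris`), whose column names carry a `(cm)` suffix; the project itself may load the data from a different source, and the names `df` and `code_to_name` are illustrative only.

```python
from sklearn.datasets import load_iris

# Assumption: use the Iris copy bundled with scikit-learn, as a pandas DataFrame.
iris = load_iris(as_frame=True)
df = iris.frame.copy()

# Replace the integer target codes (0, 1, 2) with the species names.
code_to_name = {code: str(name) for code, name in enumerate(iris.target_names)}
df["species"] = df["target"].map(code_to_name)
df = df.drop(columns="target")

print(df.shape)                      # (rows, columns)
print(df.dtypes)                     # four float columns plus the species label
print(df["species"].value_counts())  # number of samples per species
```

The per-species counts show directly whether the dataset is balanced.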

Libraries Used: NumPy, pandas, SciPy, Matplotlib, scikit-learn, statsmodels, seaborn

What we have learned so far from this project:

  • The dataset has four numerical columns and one categorical column, species, which is the target
  • The dataset is balanced: every species has the same number of samples
  • Petal length and petal width are very highly correlated
  • The setosa species is the easiest to distinguish because its measurements are tightly clustered and well separated from those of the other two species
  • The versicolor and virginica species are harder to distinguish because their feature values overlap
  • All input features (sepal length, sepal width, petal length and petal width) are statistically significant in distinguishing the species of iris flower
  • The three species (setosa, versicolor, and virginica) have different petal lengths; only the versicolor and virginica ranges partially overlap
  • We have verified that the species’ means are significantly different for all four input features
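
Several of the findings above can be reproduced in a few lines. The sketch below is only an illustration under stated assumptions: it uses the Iris copy bundled with scikit-learn, the Pearson correlation for the petal measurements, and a one-way ANOVA (`scipy.stats.f_oneway`) for the comparison of species means. The project does not state here which test it used, so the ANOVA is an assumed choice, not necessarily the author's method.

```python
from scipy.stats import f_oneway
from sklearn.datasets import load_iris

# Assumption: the Iris copy bundled with scikit-learn.
iris = load_iris(as_frame=True)
X = iris.data    # four numerical feature columns
y = iris.target  # integer species codes: 0 = setosa, 1 = versicolor, 2 = virginica

# Pearson correlation between petal length and petal width.
petal_corr = X["petal length (cm)"].corr(X["petal width (cm)"])
print(f"petal length vs. petal width: r = {petal_corr:.3f}")

# Minimum and maximum petal length per species, to inspect the overlap.
ranges = X.groupby(y)["petal length (cm)"].agg(["min", "max"])
print(ranges)

# One-way ANOVA per feature: do the three species share the same mean?
anova = {}
for feature in X.columns:
    groups = [X.loc[y == code, feature] for code in sorted(y.unique())]
    anova[feature] = f_oneway(*groups)
    print(f"{feature}: F = {anova[feature].statistic:.1f}, "
          f"p = {anova[feature].pvalue:.3g}")
```

A small p-value for a feature indicates that at least one species mean differs from the others. It does not say which pairs differ, so a post-hoc comparison would be needed for that.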