Jimoh1993/UM6P-SCI-Data-Science-Lab-Project-Data-Distribution-Stat-Tests

UM6P-SCI-Data-Science-Lab-Project-Data-Distribution-Stat-Tests

Data Distribution

A data distribution is a function that describes all the values the data can take and how often they occur. It can be continuous or discrete. Several standard probability distributions give the probabilities of the different possible outcomes of an experiment. The challenge is that real-world data may not follow any well-known probability distribution. In that case, we can approximate the most likely probability distribution and check its goodness of fit.
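As a rough illustration of checking goodness of fit, the sketch below fits a normal distribution to two synthetic samples and computes the Kolmogorov-Smirnov statistic, the largest gap between the empirical CDF and the fitted CDF. The helper `ks_statistic` and the generated samples are assumptions made for this example, not code from the project. Only the statistic is computed; a proper p-value would need a statistics library (for example `scipy.stats.kstest`), and the usual critical values do not strictly apply when the parameters are estimated from the same data.

```python
import random
from statistics import NormalDist


def ks_statistic(sample, dist):
    """Largest gap between the empirical CDF of `sample` and `dist.cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = dist.cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


random.seed(0)

# Synthetic data (hypothetical): one roughly normal sample, one skewed sample.
normal_sample = [random.gauss(10.0, 2.0) for _ in range(500)]
skewed_sample = [random.expovariate(1.0) for _ in range(500)]

# Approximate each sample with a normal distribution, then measure the fit.
d_normal = ks_statistic(normal_sample, NormalDist.from_samples(normal_sample))
d_skewed = ks_statistic(skewed_sample, NormalDist.from_samples(skewed_sample))

print(f"KS statistic, normal sample: {d_normal:.3f}")
print(f"KS statistic, skewed sample: {d_skewed:.3f}")
```

A smaller statistic means the fitted distribution tracks the data more closely, so the skewed sample should score noticeably worse under a normal fit.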

Advantages of knowing the underlying Probability Distribution of Data

Good practice: Many algorithms, such as linear regression, assume that the variables or error terms follow a particular distribution. The cost of not meeting these assumptions can be high.

Attaching a confidence interval: Knowing the underlying probability distribution, we can find its probability density function. This helps us attach confidence intervals to the range of values the data is likely to take.

Keeping track of how the distribution has changed over time or during special events/seasons: A distribution has parameters. With these parameters, we can keep track of how the distribution has changed over time or during a particular season or event.

Well-known statistical properties: The standard probability distributions have well-known statistical properties that simplify the job for us. We can explain the data and its behavior with just a few parameters.
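The points about intervals and parameter tracking can be sketched with the standard library alone. The example below assumes the data is approximately normal, fits a `NormalDist` to two small made-up samples, and reports the fitted parameters together with the central range expected to hold about 95% of values under that fit. The sample values, the season labels, and the helper `central_interval` are hypothetical and exist only for illustration.

```python
from statistics import NormalDist

# Made-up measurements for two periods (illustrative values only).
samples = {
    "summer": [21.3, 22.8, 24.1, 23.5, 22.0, 25.2, 23.9, 24.4],
    "winter": [5.1, 3.8, 6.4, 4.9, 2.7, 5.5, 4.2, 6.0],
}


def central_interval(dist, coverage=0.95):
    """Range that holds `coverage` of the probability mass of `dist`."""
    tail = (1.0 - coverage) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


summary = {}
for name, data in samples.items():
    fitted = NormalDist.from_samples(data)  # estimates mean and stdev
    low, high = central_interval(fitted)
    summary[name] = (fitted.mean, fitted.stdev, low, high)
    print(f"{name}: mean={fitted.mean:.2f} stdev={fitted.stdev:.2f} "
          f"95% range=({low:.2f}, {high:.2f})")
```

Comparing the fitted mean and standard deviation between periods is one simple way to see how a distribution has shifted.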

The following data distributions and statistical tests are implemented in this project:

  1. Normal Distribution
  2. Shapiro-Wilk Normality Test
  3. Pearson's Correlation Test
  4. Chi-Squared Test
  5. Student's t-test
  6. Bernoulli Distribution
  7. Binomial Distribution
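To make a few of the listed items concrete, here is a minimal standard-library sketch of the Bernoulli and binomial probability mass functions and of Pearson's correlation coefficient. The function names and the toy inputs are assumptions for this example and are not taken from the project's code; the correlation function returns only the coefficient, not the p-value that a full test would report.

```python
import math


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success prob. p)."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def bernoulli_pmf(k, p):
    """A Bernoulli trial is a binomial with a single trial (k is 0 or 1)."""
    return binomial_pmf(k, 1, p)


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy inputs, chosen only to exercise the functions.
print(bernoulli_pmf(1, 0.3))
print(binomial_pmf(3, 10, 0.3))
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```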
