XeroGraph

Description

XeroGraph is a Python package developed for researchers and data scientists to analyze and visualize missing data in datasets. It incorporates Little's MCAR test, among other statistical tools, to help users understand the mechanisms behind missing data. This package is particularly optimized for small to medium-sized datasets and offers extensive visualization options to elucidate data characteristics and integrity.

Key Features

Little's MCAR Test: Determines if the missing data in a dataset is missing completely at random.
Statistical Tests: Perform normality checks and Kolmogorov-Smirnov tests to evaluate the distribution of data.
Advanced Visualization: Generate histograms, density plots, box plots, Q-Q plots, and more to visualize data distributions and missing data patterns.
Missing Data Analysis: Tools to visualize and quantify the extent and patterns of missing data within your dataset.
Missing Value Imputation: Several options to perform missing value imputation.
Compare Missing Value Imputation Methods: Tools to compare different imputation methods.
Compare Distribution of Imputed Data: Tools to compare distribution of imputed data with original data.

Installation

Prerequisites

Ensure you have Python 3.9 or later installed. XeroGraph depends on the following Python libraries:

pandas
numpy
matplotlib
statsmodels
scikit-learn
xgboost
seaborn
torch
nimfa
optuna
tqdm
ipywidgets

These dependencies will be automatically installed during XeroGraph's installation process.
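If an import fails after installation, it can help to see which of the listed dependencies are actually present. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of XeroGraph, and the distribution names are copied from the list above.

```python
from importlib import metadata

# Distribution names as listed in the prerequisites above.
REQUIRED = [
    "pandas", "numpy", "matplotlib", "statsmodels", "scikit-learn",
    "xgboost", "seaborn", "torch", "nimfa", "optuna", "tqdm", "ipywidgets",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name:15s} {version or 'NOT INSTALLED'}")
```

Run it with the same interpreter you use for XeroGraph, so that it inspects the right environment.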

Setting Up a Virtual Environment

It is recommended to install XeroGraph within a virtual environment to manage dependencies effectively:

Create a virtual environment

python -m venv xeroenv

Activate the virtual environment

On Linux/Mac:

source xeroenv/bin/activate

On Windows:

xeroenv\Scripts\activate
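To confirm that the activation worked, you can ask the interpreter itself. This is a small standard-library sketch; the function name `in_virtualenv` is chosen here for illustration.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```

When the environment created above is active, the printed prefix should end in `xeroenv`.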

Installing XeroGraph

You can install XeroGraph directly from PyPI using pip:

pip install XeroGraph

Alternatively, if you have access to the source code, navigate to the root directory of the source code and run:

pip install .

Getting Started

Quick Example

Here's a quick example to get you started with performing Little's MCAR test, visualizing the data, and imputing missing values. We use the XeroAnalyzer application provided in XeroGraph.

# XeroAnalyzer can be imported as XA, xa, xeroanalyzer, xero_analyzer or XeroAnalyzer
from XeroGraph import xa
import pandas as pd

Example data

data = pd.DataFrame({
    'feature1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 3, 4, None, 6, 4, 5, 1, 2, 3, 4, 5, 6, 7, 8, 9, 3, 4, 1, 6, 4, 5],
    'feature2': [4, 6, 2, 4, 5, 6, 7, 8, 9, 2, 4, 3, 2, 2, 6, 4, 6, 2, 4, 5, 6, 7, 8, 9, 2, 4, 3, 2, 2, 6],
    'feature3': [1, 2, 4, 3, 6, 2, 6, 6, None, 1, 5, 0, 3, 2, 1, 1, 2, 4, 3, None, 2, 6, 6, 1, 1, 5, 0, 3, 2, 1],
    'feature4': [4, 3, 1, 2, 4, 5, 6, 7, 8, 9, 2, None, 3, 2, 1, 4, 3, 1, 2, 4, 5, 6, 7, 8, 9, 2, 1, 3, 2, 1],
    'feature5': [4, 3, 4, 2, None, 6, 2, 4, 5, 6, 7, 8, 9, 2, 4, 4, 3, 4, 2, 1, 6, 2, 4, 5, None, 7, 8, 9, 2, 4]
})
print(data.shape)

Initialize the XeroGraph analyzer

# Optional arguments:
# To save plots: save_files=True, save_path='save path'
xg_test = xa(data, save_files=False, save_path="")

Perform a normality test for each feature

xg_test.normality()
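This README does not say which normality test `normality()` runs. To illustrate what such a test computes, here is a dependency-free sketch of the Jarque-Bera test, which measures how far the sample skewness and kurtosis are from those of a normal distribution. The function name is hypothetical, and this is not necessarily the test XeroGraph uses.

```python
import math


def jarque_bera(values):
    """Return (statistic, p_value) of the Jarque-Bera normality test.

    Missing entries (None) are dropped before computing the statistic.
    """
    x = [v for v in values if v is not None]
    n = len(x)
    mean = sum(x) / n
    # Central moments (biased, i.e. divided by n).
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    statistic = n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)
    # Under normality the statistic is asymptotically chi-square with 2
    # degrees of freedom, whose survival function is exp(-x / 2).
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value
```

A small p-value speaks against normality. The chi-square approximation is asymptotic, so treat the p-value with caution for samples as small as the 30-row example above.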

Perform a Kolmogorov-Smirnov test for each feature

xg_test.ks()
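The Kolmogorov-Smirnov statistic is the largest vertical gap between the empirical distribution of a sample and a reference distribution. The sketch below assumes the reference is a normal distribution fitted to the sample, which is an assumption made here; this README does not state what `ks()` compares against. The function name is hypothetical.

```python
import statistics


def ks_statistic_normal(values):
    """One-sample KS statistic against a normal fitted to the sample."""
    x = sorted(v for v in values if v is not None)
    n = len(x)
    dist = statistics.NormalDist(statistics.fmean(x), statistics.stdev(x))
    d = 0.0
    for i, v in enumerate(x, start=1):
        cdf = dist.cdf(v)
        # The empirical CDF steps from (i - 1) / n up to i / n at the
        # i-th smallest value, so both sides of the step are checked.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d
```

Because the mean and standard deviation are estimated from the same sample, standard KS p-value tables are too lenient in this setting; the sketch therefore returns only the statistic.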

Visualize histograms for each feature

xg_test.histograms()

Visualize density plots for each feature

xg_test.density_plots()

Visualize box plots for each feature

xg_test.box_plots()

Visualize Q-Q plots for each feature

xg_test.qq_plots()
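A normal Q-Q plot pairs each sorted observation with the quantile a standard normal distribution would place at the same rank. The following sketch computes those pairs without plotting them. The plotting positions `(i - 0.5) / n` are one common convention, chosen here as an assumption; XeroGraph may use a different one.

```python
import statistics


def qq_points(values):
    """Return (theoretical, observed) quantile pairs for a normal Q-Q plot."""
    observed = sorted(v for v in values if v is not None)
    n = len(observed)
    standard_normal = statistics.NormalDist()
    # Plotting positions (i - 0.5) / n keep probabilities strictly in (0, 1).
    theoretical = [
        standard_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)
    ]
    return list(zip(theoretical, observed))
```

If the points fall close to a straight line when plotted, the feature is roughly normally distributed; systematic curvature at the ends points to heavy or light tails.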

Visualize missing data patterns

xg_test.missing_data()

Visualize missing percentages for both features and samples

xg_test.missing_percentage()
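The numbers behind such a plot are simple to compute by hand. The sketch below works on a plain dictionary of lists shaped like the example data above; the function name is hypothetical.

```python
def missing_percentages(columns):
    """Return (per_feature, per_sample) percentages of missing values.

    `columns` maps feature names to equally long lists; None marks a gap.
    """
    names = list(columns)
    n_rows = len(columns[names[0]])
    per_feature = {
        name: 100.0 * sum(v is None for v in columns[name]) / n_rows
        for name in names
    }
    per_sample = [
        100.0 * sum(columns[name][i] is None for name in names) / len(names)
        for i in range(n_rows)
    ]
    return per_feature, per_sample
```

With a pandas DataFrame, the same figures are available as `data.isna().mean() * 100` per feature and `data.isna().mean(axis=1) * 100` per sample.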

Perform Little's MCAR test

mcar_result = xg_test.mcar()
print(f"MCAR Test Result: {mcar_result}")
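The null hypothesis of Little's test is that the data are missing completely at random, so a small p-value is evidence against MCAR. This README does not specify what `mcar()` returns; the sketch below assumes you have a p-value in hand, and the helper name `interpret_mcar` is hypothetical.

```python
def interpret_mcar(p_value, alpha=0.05):
    """Describe the outcome of Little's MCAR test for a given p-value."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if p_value < alpha:
        return "reject MCAR: missingness appears related to the data"
    return "no evidence against MCAR at this significance level"
```

Failing to reject the null hypothesis does not prove that the data are MCAR, particularly for small datasets where the test has little power.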

Perform imputation of continuous data

Some of the following tools can also be used to impute categorical data, but we focus mainly on continuous data.

Mean Imputation

imp_data_mean = xg_test.mean_imputation()

Median Imputation

imp_data_median = xg_test.median_imputation()

Most Frequent Imputation

imp_data_most_frequent = xg_test.most_frequent_imputation()
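The three simple strategies above each replace every gap in a column with one summary value of the observed entries. The sketch below shows the idea for a single column using the standard library; it illustrates the technique and is not XeroGraph's implementation.

```python
import statistics


def simple_impute(values, strategy="mean"):
    """Fill None entries of one column with a statistic of the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = statistics.fmean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "most_frequent":
        fill = statistics.mode(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]
```

These methods are fast, but they shrink the variance of the imputed feature because every gap receives the same value.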

KNN Imputation

imp_data_knn = xg_test.knn_imputation()
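KNN imputation fills a gap with the average of that feature in the most similar rows. The sketch below is a minimal standard-library version under stated assumptions: similarity is measured only on features that both rows have observed, and `k` defaults to 2. The function name and defaults are chosen here for illustration and may differ from what `knn_imputation()` does internally.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries using the mean of the k nearest rows.

    Distance is the root mean squared difference over the features that
    both rows have observed; rows sharing no features are skipped.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return None
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors are other rows where feature j is observed.
            donors = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                d = distance(row, other)
                if d is not None:
                    donors.append((d, other[j]))
            donors.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed
```

Distances are always computed from the original rows, so one imputed value never influences another. Features on very different scales should be standardized first, otherwise the largest-scale feature dominates the distance.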

Iterative Imputation

imp_data_ii = xg_test.iterative_imputation(plot_convergence=False) # Optional: plot_convergence=True

Imputation by Random Forest

imp_data_rf = xg_test.random_forest_imputation()

Imputation by LASSO CV

imp_data_lc = xg_test.lasso_cv_imputation()

Imputation by XGBoost

imp_data_xb = xg_test.xgboost_imputation()

Imputation by Xputer

imp_data_xp = xg_test.xputer_imputation()

Multiple Imputation by MICE

imp_data_mice = xg_test.mice_imp()

Check after imputation

Check Plausibility

xg_test.check_plausibility(imp_data_rf)
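What `check_plausibility` inspects is not described in this README. One common plausibility check, shown here purely as an illustration, is whether imputed values stay inside the range of the values that were actually observed. The function name is hypothetical.

```python
def out_of_range_imputations(original, imputed):
    """Return (index, value) pairs where an imputed value leaves the observed range."""
    observed = [v for v in original if v is not None]
    low, high = min(observed), max(observed)
    return [
        (i, new)
        for i, (old, new) in enumerate(zip(original, imputed))
        if old is None and not low <= new <= high
    ]
```

An empty result means every filled-in value lies between the smallest and largest observed value of that feature.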

Compare with T-test and plot

xg_test.compare_with_ttest_and_plot(imp_data_ii)
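A t-test between the observed values of a feature and the same feature after imputation asks whether imputation shifted the mean. This README does not say which variant of the test is used; the sketch below assumes Welch's unequal-variance form and returns only the statistic and degrees of freedom, since the standard library has no t-distribution for the p-value.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t_statistic, degrees_of_freedom) for Welch's unequal-variance t-test."""
    a = [v for v in sample_a if v is not None]
    b = [v for v in sample_b if v is not None]
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    se_a, se_b = var_a / len(a), var_b / len(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1)
    )
    return t, df
```

Pass the original column (gaps are dropped) as `sample_a` and the imputed column as `sample_b`. A statistic close to zero suggests that imputation left the feature's mean largely unchanged.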

Visualize feature combination plots

xg_test.feature_combinations()

Perform a test to check which imputation method fits your data

We use the XeroCompare application provided in XeroGraph to compare different imputation methods. For this analysis, provide a dataset with as few missing values as possible, because XeroCompare removes rows that contain missing values.

Run with XeroAnalyzer

# MICE imputation is slow; pass run_mice=True to include it.
summary = xg_test.compare_imputers(run_mice=False)()
print(summary)

Run independently as XeroCompare

# XeroCompare can be imported as XC, xc, xerocompare, xero_compare or XeroCompare
from XeroGraph import xc
# MICE imputation is slow; pass run_mice=True to include it.
compare_imp = xc(data, run_mice=False)
summary = compare_imp.compare()
print(summary)
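One common way to compare imputers on complete data is to hide some known values, impute them, and score each method by how close it gets to the truth. Whether XeroCompare works this way internally is an assumption; the sketch below only illustrates the idea for a single column and three simple strategies, with a hypothetical function name and parameters.

```python
import math
import random
import statistics


def score_imputers(column, mask_fraction=0.2, seed=0):
    """Hide some known values, impute them, and return RMSE per strategy."""
    complete = [v for v in column if v is not None]  # drop existing gaps
    rng = random.Random(seed)
    n_hidden = max(1, int(len(complete) * mask_fraction))
    hidden = set(rng.sample(range(len(complete)), n_hidden))
    visible = [v for i, v in enumerate(complete) if i not in hidden]
    fills = {
        "mean": statistics.fmean(visible),
        "median": statistics.median(visible),
        "most_frequent": statistics.mode(visible),
    }
    return {
        name: math.sqrt(
            sum((complete[i] - fill) ** 2 for i in hidden) / n_hidden
        )
        for name, fill in fills.items()
    }
```

A lower RMSE means the strategy recovered the hidden values more closely. Results depend on which values happen to be hidden, so repeating the procedure over several seeds gives a steadier picture.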

Documentation

For more detailed information on all the features and usage instructions, refer to the full documentation on Read the Docs (https://xerograph.readthedocs.io).

Contributing

Contributions to XeroGraph are welcome! Please refer to the CONTRIBUTING.md file for guidelines on how to make a contribution, including bug fixes, adding new features, and improving the documentation.

License

XeroGraph is released under the Apache License 2.0. For more details, see the LICENSE file included with the source code.

Contact

For help and support, please open an issue in the GitHub repository or contact the development team at [email protected]
