
Cryptocurrency Price Prediction

Introduction

This repository contains a project focused on predicting cryptocurrency prices using various machine learning models. The goal is to analyze historical cryptocurrency data and build predictive models to estimate future prices. The dataset used in this project is crypto-markets.csv, taken from Kaggle (https://www.kaggle.com/datasets/jessevent/all-crypto-currencies/data), and contains features such as open, high, low, and close prices, volume, and rank.

The project includes the following steps:

  1. Data Preprocessing
  2. Data Visualization
  3. Feature Selection
  4. Model Training and Evaluation
  5. Results Visualization

Code Explanation

Importing Libraries

import numpy as np
import pandas as pd
import seaborn as sns
from scipy import stats
import plotly.express as px
import plotly.graph_objects as go
import matplotlib.pyplot as plt
from xgboost import XGBRegressor
from sklearn.svm import SVR
from sklearn.linear_model import LinearRegression, Lasso, BayesianRidge
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

Data Preprocessing

  1. Loading Data:

    df = pd.read_csv('crypto-markets.csv')
    df.head()
  2. Basic Data Analysis:

    df.info()
    df.describe()
    df.isna().sum()
  3. Handling Outliers:

    # keep only rows whose numeric features are all within 3 standard deviations
    z_scores = stats.zscore(df.select_dtypes(include=['float64', 'int64']))
    abs_z_scores = np.abs(z_scores)
    filtered_entries = (abs_z_scores < 3).all(axis=1)
    df = df[filtered_entries]
  4. Scaling Numerical Features:

    scaler = StandardScaler()
    num_cols = df.select_dtypes(include=['float64', 'int64']).columns
    df[num_cols] = scaler.fit_transform(df[num_cols])

Data Visualization

  1. Close Price Scatter Plot:

    fig = go.Figure(data=[go.Scatter(y=df['close'])])
    fig.update_layout(title='Crypto Close Price', yaxis_title='Price')
    fig.show()
  2. Closing Price Trend:

    fig = go.Figure(data=[go.Scatter(x=df['name'], y=df['close'], name='Closing Price')])
    fig.update_layout(title='Closing Price Trend', xaxis_title='Name', yaxis_title='Closing Price')
    fig.show()
  3. Closing Price Over Time:

    fig = go.Figure(data=[go.Scatter(x=df['date'], y=df['close'], name='Closing Price')])
    fig.update_layout(title='Closing Price Trend Over Time', xaxis_title='Date', yaxis_title='Closing Price')
    fig.show()
  4. Distribution of Closing Prices:

    plt.figure(figsize=(10, 6))
    sns.histplot(df['close'], bins=50, kde=True)
    plt.title('Distribution of Closing Prices')
    plt.xlabel('Closing Price')
    plt.ylabel('Frequency')
    plt.show()
  5. Proportion of Rank Now by Name:

    df_positive_ranknow = df[df['ranknow'] > 0]
    ranknow_name = df_positive_ranknow.groupby('name')['ranknow'].sum()
    sample_data = ranknow_name.sample(20)  # random sample of 20 coins for readability
    colors = sns.color_palette('magma', len(sample_data)).as_hex()  # palette for the pie slices
    fig = go.Figure(data=[go.Pie(labels=sample_data.index, values=sample_data.values, hole=0.5, marker=dict(colors=colors))])
    fig.update_layout(title='Proportion of Rank Now by Name')
    fig.show()

Feature Selection

  1. Correlation Matrix:

    numerical_features = df.select_dtypes(include=['float64', 'int64']).columns
    corr_matrix = df[numerical_features].corr()
    plt.figure(figsize=(12, 8))
    sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')
    plt.title('Correlation Matrix')
    plt.show()
  2. Lasso Regression:

    numerical_features = df.select_dtypes(include=['float64', 'int64']).columns
    X = df[numerical_features].drop(columns=['close'])
    y = df['close']
    lasso = Lasso(alpha=0.01)
    lasso.fit(X, y)
    model = SelectFromModel(lasso, prefit=True)
    selected_features = X.columns[model.get_support()]
    print('Selected features by Lasso:', selected_features)
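The models in the next section use X_train, X_test, y_train, and y_test. These are assumed to come from a standard train/test split on the Lasso-selected features; a minimal sketch of that step (the 80/20 split and the use of selected_features are assumptions, not necessarily the repository's exact code):

    # assumed: train on the Lasso-selected features, holding out 20% of the rows for testing
    X_selected = X[selected_features]
    X_train, X_test, y_train, y_test = train_test_split(
        X_selected, y, test_size=0.2, random_state=42)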

Models

The following machine learning models were used to predict cryptocurrency prices:

  1. Linear Regression:

    model = LinearRegression()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
  2. Decision Tree:

    tree_model = DecisionTreeRegressor(random_state=42)
    tree_model.fit(X_train, y_train)
    y_pred = tree_model.predict(X_test)
  3. Random Forest:

    rf_model = RandomForestRegressor(n_estimators=100, random_state=42)
    rf_model.fit(X_train, y_train)
    y_pred = rf_model.predict(X_test)
  4. Bayesian Ridge Regression:

    bayesian_model = BayesianRidge()
    bayesian_model.fit(X_train, y_train)
    y_pred = bayesian_model.predict(X_test)
  5. Support Vector Regression (SVR):

    svr_model = SVR(kernel='rbf')
    svr_model.fit(X_train, y_train)
    y_pred = svr_model.predict(X_test)
  6. Gradient Boosting:

    gb_model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, random_state=42)
    gb_model.fit(X_train, y_train)
    y_pred = gb_model.predict(X_test)
  7. XGBoost Regressor:

    xgb_model = XGBRegressor(n_estimators=100, learning_rate=0.1, random_state=42)
    xgb_model.fit(X_train, y_train)
    y_pred = xgb_model.predict(X_test)

Results

The performance of each model was evaluated using Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R-squared (R2) scores. The results were plotted to compare the models.
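
The per-model metric variables referenced below (LRmae, DTmse, and so on) are assumed to be filled in from each model's test-set predictions with a small helper along these lines (a sketch under that assumption, not the repository's exact code):

    def evaluate(y_test, y_pred):
        # compute the four evaluation metrics for one model's predictions
        mae = mean_absolute_error(y_test, y_pred)
        mse = mean_squared_error(y_test, y_pred)
        rmse = np.sqrt(mse)  # RMSE is the square root of MSE
        r2 = r2_score(y_test, y_pred)
        return mae, mse, rmse, r2

    # example: metrics for the linear regression model
    LRmae, LRmse, LRrmse, LRr2 = evaluate(y_test, model.predict(X_test))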

metrics = {
    'Model': ['Linear Regression', 'Decision Tree', 'Random Forest', 'Bayesian Regression', 'Gradient Boosting', 'XGBoost'],
    'MAE': [LRmae, DTmae, RFmae, BRmae, GBmae, XGmae],
    'MSE': [LRmse, DTmse, RFmse, BRmse, GBmse, XGmse],
    'RMSE': [LRrmse, DTrmse, RFrmse, BRrmse, GBrmse, XGrmse],
    'R-squared': [LRr2, DTr2, RFr2, BRr2, GBr2, XGr2]
}

plt.figure(figsize=(14, 10))

# MAE plot
plt.subplot(2, 2, 1)
plt.plot(metrics['Model'], metrics['MAE'], marker='o', linestyle='-', color='b', label='MAE')
plt.title('Mean Absolute Error (MAE) Comparison')
plt.xlabel('Models')
plt.ylabel('MAE')
plt.xticks(rotation=45)
plt.grid(True)
plt.legend()

# MSE plot
plt.subplot(2, 2, 2)
plt.plot(metrics['Model'], metrics['MSE'], marker='o', linestyle='-', color='r', label='MSE')
plt.title('Mean Squared Error (MSE) Comparison')
plt.xlabel('Models')
plt.ylabel('MSE')
plt.xticks(rotation=45)
plt.grid(True)
plt.legend()

# RMSE plot
plt.subplot(2, 2, 3)
plt.plot(metrics['Model'], metrics['RMSE'], marker='o', linestyle='-', color='g', label='RMSE')
plt.title('Root Mean Squared Error (RMSE) Comparison')
plt.xlabel('Models')
plt.ylabel('RMSE')
plt.xticks(rotation=45)
plt.grid(True)
plt.legend()

# R-squared plot
plt.subplot(2, 2, 4)
plt.plot(metrics['Model'], metrics['R-squared'], marker='o', linestyle='-', color='m', label='R-squared')
plt.title('R-squared Comparison')
plt.xlabel('Models')
plt.ylabel('R-squared')
plt.xticks(rotation=45)
plt.grid(True)
plt.legend()

plt.tight_layout()
plt.show()

Conclusion

This project demonstrates the process of predicting cryptocurrency prices using various machine learning models. By preprocessing the data, visualizing trends, selecting relevant features, and applying multiple regression techniques, I was able to build predictive models that estimate future cryptocurrency prices.