Skip to content

Each script and notebook in the repository is well-documented, with extensive logging for transparency and reproducibility. The repository is designed for researchers, data scientists, and conservationists interested in applying data-driven methods to marine ecology and species management.

Notifications You must be signed in to change notification settings

sabdulrazzaque/Tuna_Gillnet_Fisheries_Bycatch_Cetaceans_NIO

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Project Overview This project presents a detailed and comprehensive analysis of geospatial and temporal data concerning Dolphin species in the Northern Indian Ocean. The primary aim is to uncover significant trends, patterns, and anomalies within the data and to leverage these insights for predictive modeling and future forecasting. The project employs a combination of advanced statistical methods, machine learning techniques, and geospatial analysis to build a robust framework for understanding the temporal and spatial dynamics of dolphin populations in the region. By integrating multiple approaches, the project delivers a holistic view of the data, which can be used to inform conservation efforts, manage fisheries, and contribute to ecological research.

Data Loading and Preprocessing The analysis begins with the meticulous loading of data from an Excel file, which serves as the foundation for all subsequent analyses. The dataset is carefully selected and loaded into a Pandas DataFrame, where preprocessing steps are undertaken to ensure data quality and consistency. This includes the handling of missing values, conversion of key columns such as latitude, longitude, Dolphin Total Number (DTN), and Dolphin Total Weight (DTW) to appropriate numeric formats, and the correction of date formats to ensure accurate temporal analysis. The dataset is then filtered to include data from 2012 to 2017, allowing for a focused analysis period. Additional temporal features, such as month and year, are extracted from the date column to facilitate further exploration of seasonal and yearly trends.

Exploratory Data Analysis (EDA) Exploratory Data Analysis (EDA) is a critical step in understanding the underlying structure of the data. This project undertakes a comprehensive EDA to explore both the temporal and spatial aspects of the dataset.

Temporal Analysis: The project first examines how Dolphin Total Number (DTN) and Dolphin Total Weight (DTW) have changed over time. Line plots are used to visualize these trends, providing a clear picture of how dolphin populations and their corresponding weights fluctuate annually. The analysis also calculates the mean values of DTN and DTW for each year, enabling a deeper understanding of inter-annual variations.

Correlation Analysis: To understand the relationship between DTN and DTW, a Pearson correlation analysis is conducted. The results are visualized through scatter plots, which reveal the strength and direction of the correlation between the number of dolphins and their total weight.

Density Mapping: Geographic density maps are created using Kernel Density Estimation (KDE) to visualize the spatial distribution of DTN and DTW. These maps highlight areas with high concentrations of dolphins, offering insights into their preferred habitats and migration patterns.

Statistical Summary: Descriptive statistics for DTN and DTW are calculated, including measures such as mean, median, standard deviation, and range. These statistics provide a summary of the central tendencies and variability within the data.

Box Plot Analysis: Box plots are generated to visualize the distribution of DTN and DTW across different months. This helps in identifying seasonal patterns and outliers, providing further insights into how dolphin populations vary throughout the year.

Heatmap of Correlations: A heatmap is generated to visualize the correlations between DTN, DTW, latitude, and longitude. This visual representation helps identify any spatial patterns and relationships among these variables.

Advanced Analytical Techniques To gain deeper insights, the project applies several advanced analytical techniques:

Cluster Analysis: Using KMeans clustering, the project identifies distinct geographical clusters based on latitude and longitude. This clustering helps in segmenting the data into different regions, each characterized by unique patterns in DTN and DTW. The characteristics of each cluster, such as mean DTN and DTW, are calculated and analyzed.

Principal Component Analysis (PCA): PCA is employed to reduce the dimensionality of the data, making complex relationships more interpretable. By focusing on the principal components, the project simplifies the data while retaining the most critical information, aiding in the identification of major factors influencing dolphin populations.

Interaction Effects: Linear regression models are used to explore the interaction effects between latitude and longitude on DTN and DTW. This analysis reveals how the geographical location influences dolphin numbers and their total weight, highlighting key areas of interaction that might be important for understanding dolphin behavior and distribution.

Time Series Analysis and Forecasting The project applies a range of time series analysis techniques to decompose and forecast trends in DTN and DTW:

Time Series Decomposition: The project breaks down the time series data for DTN and DTW into its underlying components: trend, seasonal, and residual. This decomposition helps in understanding the long-term trends, periodic seasonal patterns, and irregular fluctuations in the data.

STL Decomposition: Seasonal-Trend decomposition using LOESS (STL) is performed to further analyze the seasonal components of the data. This method provides a more flexible approach to separating seasonal effects from the trend and residual components.

Trend Analysis: Linear regression is used to analyze trends over time in DTN and DTW. By fitting linear models to the data, the project identifies significant upward or downward trends, providing valuable insights into the overall direction of dolphin population changes.

Anomaly Detection: Anomalies in the time series data are detected using Z-score-based methods, which identify significant deviations from the norm. These anomalies are visualized on scatter plots, highlighting periods where dolphin numbers or weights significantly differed from expectations.

Seasonal Analysis: Using ANOVA tests, the project investigates whether there are significant differences in DTN and DTW across different seasons. This analysis is crucial for understanding seasonal variations and their impact on dolphin populations.

Regression and Advanced Regression: Simple linear regression models are built to predict DTN and DTW based on geographic coordinates (latitude and longitude). The project also implements more advanced regression models, such as Random Forest, to capture non-linear relationships and improve predictive accuracy.

Time Series Forecasting: The project applies various forecasting techniques, including ARIMA, Prophet, and LSTM neural networks, to predict future values of DTN and DTW. These models take into account past data to provide reliable forecasts, which are essential for planning conservation efforts and managing fisheries.

Spatial and Geospatial Analysis Spatial analysis plays a vital role in understanding the distribution and movement of dolphin populations:

Spatial Autocorrelation: The project uses Moran's I statistic to assess the degree of spatial autocorrelation in DTN and DTW. This analysis helps in understanding whether similar values of DTN and DTW are clustered together geographically, indicating potential habitat preferences or migratory routes.

Spatial Lag Models: To further explore spatial dependencies, the project implements Maximum Likelihood Spatial Lag models. These models take into account the influence of neighboring regions on dolphin populations, providing insights into how local factors and interactions between regions affect DTN and DTW.

Geospatial Clustering: Geospatial clustering methods, including DBSCAN and Hierarchical Clustering, are used to identify clusters of high or low dolphin activity. These methods help in understanding how dolphin populations are distributed across different regions and how they may be affected by environmental or human-induced factors.

Bayesian Structural Time Series (BSTS) and Advanced Modeling The project incorporates Bayesian Structural Time Series (BSTS) models to capture and forecast trends and seasonality in DTN and DTW:

BSTS Modeling: Using TensorFlow Probability, the project constructs BSTS models that combine seasonal effects and trends to predict future dolphin numbers and weights. These models are particularly powerful for capturing complex temporal dynamics and providing probabilistic forecasts, which are useful for risk assessment and decision-making in conservation efforts.

Model Interpretation: The project also emphasizes model interpretability by generating partial dependence plots and calculating SHAP values. These tools help explain the importance of different features in the models, providing transparency and insights into how predictions are made.

Anomaly and Change Point Detection Anomaly detection and change point detection are critical for identifying significant shifts in dolphin populations:

Prophet Anomaly Detection: The project uses the Prophet model to detect anomalies in the time series data for DTN and DTW. Prophet is particularly well-suited for handling seasonality and trends, making it an effective tool for identifying periods where dolphin numbers or weights deviate from expected patterns.

Change Point Detection: The Binseg algorithm is applied to detect change points in the DTN time series. These change points indicate moments when there were significant shifts in the data, which could correspond to environmental changes, shifts in fishing practices, or other impactful events.

Logging and Debugging Throughout the project, logging is extensively used to track the progress of the analysis and to capture any errors or issues that arise. This logging framework is essential for debugging and ensuring that the analysis is transparent and reproducible. By documenting each step of the process, the project maintains a clear record of all actions taken, facilitating easier troubleshooting and improving the overall robustness of the analysis.

Conclusion This project provides a robust and comprehensive analytical framework for understanding and forecasting dolphin-related data in the Northern Indian Ocean. By integrating a wide range of advanced statistical methods, machine learning techniques, and geospatial analyses, the project offers valuable insights into the temporal and spatial dynamics of dolphin populations. These insights have significant applications in conservation, fisheries management, and ecological research, contributing to the sustainable management of marine resources and the protection of dolphin species.

About

Each script and notebook in the repository is well-documented, with extensive logging for transparency and reproducibility. The repository is designed for researchers, data scientists, and conservationists interested in applying data-driven methods to marine ecology and species management.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published