


sydeaka/workflow-automation


workflow-automation

A data scientist's workflow generally consists of a series of small tasks and mundane decisions that often consume a lot of time and energy. To begin a predictive modeling workflow, for example, one might log into a database, download the data to a local file system, and then move the data to a different system where the analytic tools are installed. After loading the dataset and applying data transformations, the data scientist creates descriptive summaries and builds a predictive model. Those results might be stored and manually organized into an annotated report that summarizes the analysis. Finally, the report is emailed to team members and stakeholders and possibly published to the web. In practice, the data scientist might have to re-execute the entire workflow several times to incorporate stakeholder feedback or to include additional analyses as the project's scope expands.
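To make the repeated re-execution concrete, here is a minimal Bash sketch of a workflow driver that runs the steps in order, stops at the first failure, and skips any step whose output already exists. It is an illustration under stated assumptions, not the method used in this repository: the step functions (`extract`, `transform`, `summarise`, `report`), the `run_step` helper, the `WORKDIR` variable, and the toy CSV data are all hypothetical stand-ins. In a real project each step might instead call a database client or run an R script with `Rscript`.

```shell
#!/usr/bin/env bash
# Minimal sketch of a fail-fast workflow driver.
# All step names and data below are hypothetical placeholders.
set -eu   # abort on the first failing command or unset variable

# Scratch directory for intermediate outputs (hypothetical; override via WORKDIR).
WORKDIR="${WORKDIR:-$(mktemp -d)}"
cd "$WORKDIR"

log() { printf '[%s] %s\n' "$(date '+%H:%M:%S')" "$*"; }

# Hypothetical step: stands in for exporting raw data from a database.
extract() { printf 'id,value\n1,10\n2,20\n3,30\n' > raw.csv; }

# Hypothetical step: a data transformation (doubles the value column).
transform() { awk -F, 'NR == 1 { print; next } { print $1 "," $2 * 2 }' raw.csv > clean.csv; }

# Hypothetical step: a descriptive summary (mean of the value column).
summarise() { awk -F, 'NR > 1 { s += $2; n++ } END { printf "mean=%g\n", s / n }' clean.csv > summary.txt; }

# Hypothetical step: assemble a plain-text report from the summary.
report() { { echo "Workflow report"; cat summary.txt; } > report.txt; }

# Run a step only if its output file is missing or empty, so re-running
# the whole workflow skips work that has already been completed.
run_step() {
  local name=$1 output=$2
  if [ -s "$output" ]; then
    log "skip $name ($output exists)"
  else
    log "run  $name"
    "$name"
  fi
}

run_step extract   raw.csv
run_step transform clean.csv
run_step summarise summary.txt
run_step report    report.txt

log "done: $WORKDIR/report.txt"
```

Saved as, say, `run_workflow.sh` (a hypothetical name), the script can be run repeatedly; on the second run every step is skipped. Checking only whether an output exists is a deliberate simplification: it does not notice when an upstream input changes. A tool such as `make`, which compares file timestamps between targets and prerequisites, handles that case, and a scheduler such as cron could invoke the driver unattended.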

The promise of automation is a future in which data scientists hand off these burdensome tasks to programmable robots that do the work for us. I am happy to report that the future has arrived! A number of open source and proprietary tools are currently available to create, organize, and execute automated data science workflows. It is also quite possible to build powerful workflows with a handful of easy-to-use R packages and Linux/Bash commands.

A data-driven company that adopts this framework can produce, publish, and distribute insights more rapidly within its organization. Data scientists who embrace the automation mindset are more productive and have more time and energy available to inject creativity and innovation into their work.

In this example, I highlight R functions and system commands that could be used to engineer efficient data science workflows. A sample end-to-end solution similar to the complex workflow described above is used to demonstrate these principles.

For details, review the PowerPoint slides in the main directory of the repository.
