dato is an open-source library that provides a rapid, declarative ecosystem for reproducible data science in Python.
dato accomplishes this by (1) enabling piping with >> and (2) unifying common data science libraries under a common syntax.
df >> GroupBy('country') >> Sum >> Hist('revenue', col='age')
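One way the `>>` syntax above could work is through Python's reflected right-shift hook, `__rrshift__`. The sketch below is a minimal illustration under that assumption, not dato's actual implementation: the class name `PipeStep` and the helper functions are hypothetical, and plain lists stand in for data frames so the example has no dependencies.

```python
class PipeStep:
    """Hypothetical wrapper: makes `data >> step` call `func(data, ...)`."""

    def __init__(self, func, *args, **kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def __call__(self, *args, **kwargs):
        # Bind extra arguments now; the piped-in data arrives later via `>>`.
        return PipeStep(self.func, *args, **kwargs)

    def __rrshift__(self, data):
        # Python falls back to this for `data >> self` when the left
        # operand does not handle `>>` with this type itself.
        return self.func(data, *self.args, **self.kwargs)


@PipeStep
def keep_positive(values):
    # Drop zero and negative entries.
    return [v for v in values if v > 0]


@PipeStep
def scale(values, factor=1):
    # Multiply every entry by `factor`.
    return [v * factor for v in values]


# A step can be used bare (like Sum above) or called with arguments (like Hist).
result = [-1, 2, 3] >> keep_positive >> scale(10)
```

Because `>>` is left-associative, each step receives the output of the step before it, so the expression reads in the same order in which it executes.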
Dato has four major components:
Decorator that enables piping with >>.
Sub-module with pipe-compatible
Sub-module with pipe-compatible plotting operations, following a consistent pandas-inspired syntax with seaborn-esque extended functionality.
Simplifies and standardizes syntax across popular ML libraries.
pip install dato
Although piping has some downsides as a general programming paradigm (in particular, it can obscure errors and is harder to debug), we argue that these are outweighed by the concision and maintainability it lends to data workflows. In development environments that contain hidden state (such as Jupyter or R Markdown), reproducibility can be difficult to achieve consistently. Piping mitigates this danger by (1) enforcing a consistent order of operations and (2) disallowing hidden state. Consequently, code written in the piping paradigm is reproducible, production-ready, and stable as soon as it is written, properties that are of paramount importance in data work.
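To make the hidden-state argument concrete, here is a small sketch in plain Python. It does not use dato: the `run_pipeline` helper and the step functions are hypothetical stand-ins, and a function that mutates a shared list plays the role of a notebook cell.

```python
from functools import reduce

raw = [3, -1, 2]

# Notebook style: a "cell" mutates shared state in place, so the result
# depends on how many times (and in what order) cells were run.
state = list(raw)


def cell_double():
    for i, v in enumerate(state):
        state[i] = v * 2


cell_double()
cell_double()  # an accidental re-run silently doubles the values again


# Piped style: each step returns a new value and the order is fixed
# by the expression itself, so every run gives the same answer.
def double(values):
    return [v * 2 for v in values]


def drop_negative(values):
    return [v for v in values if v >= 0]


def run_pipeline(data, *steps):
    # Feed the output of each step into the next, left to right.
    return reduce(lambda acc, step: step(acc), steps, data)


first = run_pipeline(raw, double, drop_negative)
second = run_pipeline(raw, double, drop_negative)
```

Here `state` ends up depending on the accidental re-run, while `first` and `second` are identical and `raw` is left untouched.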