Merge pull request #34 from fomorians/2.0

Pyoneer w/ 2.0 support
fomorians-oss · Jun 27, 2019 · ac2a00e
2 parents 86e980c + 38959e6
commit ac2a00e
Showing 73 changed files with 1,698 additions and 1,522 deletions.
diff --git a/.gitignore b/.gitignore
@@ -1,6 +1,6 @@
-/pyoneer.egg-info/
 /build/
 /dist/
+/*.egg-info/
 *.pyc
 *.swp
 .DS_Store

diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -3,4 +3,4 @@ repos:
     rev: master
     hooks:
     - id: black
-      language_version: python3.6
+      language_version: python3.6
diff --git a/Pipfile b/Pipfile
@@ -5,12 +5,13 @@ name = "pypi"
 
 [packages]
 gym = "*"
-tensorflow = "*"
-tensorflow-probability = "*"
+tensorflow = "==2.0.0-beta1"
+tfp-nightly = "*"
 
 [requires]
 python_version = "3.6.5"
 
 [dev-packages]
 ipdb = "*"
 pre-commit = "*"
+twine = "*"
diff --git a/README.md b/README.md
@@ -9,34 +9,44 @@ For the top-level utilities, import like so:
     import pyoneer as pynr
     pynr.math.rescale(...)
 
-For the large sub-modules, such as reinforcement learning, we recommend:
+For the larger sub-modules, such as reinforcement learning, we recommend:
 
     import pyoneer.rl as pyrl
-    pyrl.losses.policy_gradient_loss(...)
+    loss_fn = pyrl.losses.PolicyGradient(...)
 
-In general the API tries to adhere to TensorFlow 2.0's API.
+In general, the Pyoneer API tries to adhere to the TensorFlow 2.0 API.
 
 ### Examples
 
-- [Eager Proximal Policy Optimization](https://github.com/fomorians/ppo)
+- [TF 2.0 Proximal Policy Optimization](https://github.com/fomorians/ppo)
 
 ## API
 
+### Activations ([`pynr.activations`](pyoneer/activations))
+
+- `pynr.activations.swish`
+
+### Debugging ([`pynr.debugging`](pyoneer/debugging))
+
+- `pynr.debugging.Stopwatch`
+
+### Distributions ([`pynr.distributions`](pyoneer/distributions))
+
+- `pynr.distributions.MultiCategorical`
+
 ### Initializers ([`pynr.initializers`](pyoneer/initializers))
 
 - `pynr.initializers.SoftplusInverse`
 
 ### Layers ([`pynr.layers`](pyoneer/layers))
 
-- `pynr.layers.Normalizer`
+- `pynr.layers.Swish`
 - `pynr.layers.OneHotEncoder`
 - `pynr.layers.AngleEncoder`
-- `pynr.layers.DictFeaturizer`
-- `pynr.layers.ListFeaturizer`
-- `pynr.layers.VecFeaturizer`
 
 ### Tensor Manipulation ([`pynr.manip`](pyoneer/manip))
 
+- `pynr.manip.flatten`
 - `pynr.manip.batched_index`
 - `pynr.manip.pad_or_truncate`
 - `pynr.manip.shift`
@@ -62,80 +72,76 @@ In general the API tries to adhere to TensorFlow 2.0's API.
 - `pynr.metrics.MAPE`
 - `pynr.metrics.SMAPE`
 
-### Neural Networks ([`pynr.nn`](pyoneer/nn))
+### Moments ([`pynr.moments`](pyoneer/moments))
 
-- `pynr.nn.swish`
-- `pynr.nn.moments_from_range`
-- `pynr.nn.StreamingMoments`
-- `pynr.nn.ExponentialMovingMoments`
+- `pynr.moments.range_moments`
+- `pynr.moments.StaticMoments`
+- `pynr.moments.StreamingMoments`
+- `pynr.moments.ExponentialMovingMoments`
 
-### Reinforcement Learning ([`pynr.rl`](pyoneer/rl))
+### Learning Rate Schedules ([`pynr.schedules`](pyoneer/schedules))
 
-Utilities for reinforcement learning.
+- `pynr.schedules.CyclicSchedule`
 
-#### Environments ([`pynr.rl.envs`](pyoneer/rl/envs))
+### Reinforcement Learning ([`pynr.rl`](pyoneer/rl))
 
-- `pynr.rl.envs.BatchEnv`
-- `pynr.rl.envs.ProcessEnv`
+Utilities for reinforcement learning.
 
 #### Losses ([`pynr.rl.losses`](pyoneer/rl/losses))
 
-- `pynr.rl.losses.policy_gradient_loss`
-- `pynr.rl.losses.clipped_policy_gradient_loss`
+- `pynr.rl.losses.policy_gradient`
+- `pynr.rl.losses.policy_entropy`
+- `pynr.rl.losses.clipped_policy_gradient`
+- `pynr.rl.losses.PolicyGradient`
+- `pynr.rl.losses.PolicyEntropy`
+- `pynr.rl.losses.ClippedPolicyGradient`
 
 #### Targets ([`pynr.rl.targets`](pyoneer/rl/targets))
 
-- `pynr.rl.targets.discounted_rewards`
-- `pynr.rl.targets.generalized_advantages`
+- `pynr.rl.targets.DiscountedReturns`
+- `pynr.rl.targets.GeneralizedAdvantages`
 
 #### Strategies ([`pynr.rl.strategies`](pyoneer/rl/strategies))
 
-- `pynr.rl.strategies.EpsilonGreedyStrategy`
-- `pynr.rl.strategies.ModeStrategy`
-- `pynr.rl.strategies.SampleStrategy`
+- `pynr.rl.strategies.EpsilonGreedy`
+- `pynr.rl.strategies.Mode`
+- `pynr.rl.strategies.Sample`
 
-### Training ([`pynr.training`](pyoneer/training))
+#### Wrappers ([`pynr.rl.wrappers`](pyoneer/rl/wrappers))
 
-- `pynr.training.CyclicSchedule`
-- `pynr.training.update_target_variables`
+- `pynr.rl.wrappers.ObservationCoordinates`
+- `pynr.rl.wrappers.ObservationNormalization`
+- `pynr.rl.wrappers.Batch`
+- `pynr.rl.wrappers.BatchProcess`
+- `pynr.rl.wrappers.Process`
 
 ## Installation
 
-There are a few options of installing:
-
-1. Install with `pipenv`:
-
-        pipenv install pyoneer
+There are a few options for installation:
 
-2. Install with `pip`:
+1. (Recommended) Install with `pipenv`:
 
-        pip install pyoneer
+        pipenv install fomoro-pyoneer
 
-3. Install locally for development with `pipenv`:
+2. Install locally for development with `pipenv`:
 
         git clone https://github.com/fomorians/pyoneer.git
         cd pyoneer
         pipenv install
         pipenv shell
 
-4. Install locally for development with `pip`:
-
-        git clone https://github.com/fomorians/pyoneer.git
-        cd pyoneer
-        pip install -e .
-
 ## Testing
 
 There are a few options for testing:
 
 1. Run all tests:
 
-        python -m unittest discover -p '*_test.py'
+        python -m unittest discover -bfp '*_test.py'
 
 2. Run specific tests:
 
         python -m pyoneer.math.logical_ops_test
 
 ## Contributing
 
-File an issue following the `ISSUE_TEMPLATE`, then submit a pull request from a branch describing the feature. This will eventually get merged into `master`.
+File an issue following the `ISSUE_TEMPLATE`. If the issue discussion warrants implementation, then submit a pull request from a branch describing the feature. This will eventually get merged into `master` after a few rounds of code review.
diff --git a/pyoneer/__init__.py b/pyoneer/__init__.py
@@ -2,11 +2,28 @@
 from __future__ import division
 from __future__ import print_function
 
+from pyoneer import activations
+from pyoneer import debugging
+from pyoneer import distributions
 from pyoneer import initializers
 from pyoneer import layers
 from pyoneer import manip
 from pyoneer import math
 from pyoneer import metrics
-from pyoneer import nn
+from pyoneer import moments
 from pyoneer import rl
-from pyoneer import training
+from pyoneer import schedules
+
+__all__ = [
+    "activations",
+    "debugging",
+    "distributions",
+    "initializers",
+    "layers",
+    "manip",
+    "math",
+    "metrics",
+    "moments",
+    "rl",
+    "schedules",
+]
diff --git a/pyoneer/rl/envs/__init__.py b/pyoneer/activations/__init__.py
@@ -2,5 +2,6 @@
 from __future__ import division
 from __future__ import print_function
 
-from pyoneer.rl.envs.batch_env_impl import BatchEnv
-from pyoneer.rl.envs.process_env_impl import ProcessEnv
+from pyoneer.activations.activations_impl import swish
+
+__all__ = ["swish"]
diff --git a/pyoneer/nn/activation_ops.py b/pyoneer/activations/activations_impl.py
@@ -7,7 +7,7 @@
 
 def swish(x):
     """
-    Compute the Swish, self-gating, activation function: `x * sigmoid(x)`.
+    Compute Swish, self-gating, activation function: `x * sigmoid(x)`.
 
     Args:
         x: Tensor
@@ -16,5 +16,5 @@ def swish(x):
         Tensor of same dimension as `x`.
     """
     y = x * tf.sigmoid(x)
-    y = tf.check_numerics(y, "swish")
+    y = tf.debugging.check_numerics(y, "swish")
     return y
diff --git a/pyoneer/activations/activations_test.py b/pyoneer/activations/activations_test.py
@@ -0,0 +1,19 @@
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import tensorflow as tf
+
+from pyoneer.activations.activations_impl import swish
+
+
+class ActivationsTest(tf.test.TestCase):
+    def test_swish(self):
+        x = tf.constant([-1.0, 0.0, +1.0])
+        actual = swish(x)
+        expected = tf.constant([-0.268941, 0.0, 0.731059])
+        self.assertAllClose(actual, expected)
+
+
+if __name__ == "__main__":
+    tf.test.main()
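
Note on `test_swish`: the expected values follow directly from `x * sigmoid(x)`. A minimal pure-Python sketch (no TensorFlow; `swish_scalar` is a hypothetical helper for illustration, not part of Pyoneer) reproduces them to six decimal places:

```python
import math


def swish_scalar(x):
    """Scalar Swish: x * sigmoid(x). Hypothetical helper, illustration only."""
    return x * (1.0 / (1.0 + math.exp(-x)))


# Same inputs as the test above: -1.0, 0.0, +1.0
values = [round(swish_scalar(x), 6) for x in (-1.0, 0.0, 1.0)]
print(values)
```

This only checks the arithmetic; it does not exercise `tf.debugging.check_numerics`, which the real `swish` applies to its output.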
diff --git a/pyoneer/debugging/__init__.py b/pyoneer/debugging/__init__.py
@@ -0,0 +1,7 @@
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+from pyoneer.debugging.debugging_impl import Stopwatch
+
+__all__ = ["Stopwatch"]
diff --git a/pyoneer/debugging/debugging_impl.py b/pyoneer/debugging/debugging_impl.py
@@ -0,0 +1,38 @@
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import tensorflow as tf
+
+
+class Stopwatch(object):
+    """
+    Stopwatch for measuring how long operations take. Great for fast and easy profiling.
+
+    Example:
+    >>> x = tf.constant(1.0)
+    >>> y = tf.constant(2.0)
+    >>> with Stopwatch() as watch:
+    ...     z = x + y
+    >>> tf.print(watch.duration)
+    >>> # 0.00021505355834960938
+    """
+
+    def __init__(self):
+        self.start_time = None
+        self.end_time = None
+        self.duration = None
+
+    def start(self):
+        self.start_time = tf.timestamp()
+
+    def stop(self):
+        self.end_time = tf.timestamp()
+        self.duration = self.end_time - self.start_time
+
+    def __enter__(self):
+        self.start()
+        return self
+
+    def __exit__(self, exc_type, exc_value, traceback):
+        self.stop()
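
Note on `Stopwatch`: the context-manager pattern can be sketched without TensorFlow. The version below is a stand-in under stated assumptions: it swaps `tf.timestamp()` for `time.perf_counter()`, and the class name `PlainStopwatch` is hypothetical. It only illustrates that `duration` is populated once the `with` block exits.

```python
import time


class PlainStopwatch(object):
    """Stand-in for Stopwatch using time.perf_counter() instead of tf.timestamp()."""

    def __init__(self):
        self.start_time = None
        self.end_time = None
        self.duration = None

    def __enter__(self):
        self.start_time = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Returning None (falsy) lets any exception from the block propagate.
        self.end_time = time.perf_counter()
        self.duration = self.end_time - self.start_time


with PlainStopwatch() as watch:
    total = sum(range(1000))

print(watch.duration >= 0.0)
```

As in the original class, `__exit__` does not suppress exceptions, so a failure inside the timed block still surfaces while the duration is recorded.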
diff --git a/pyoneer/debugging/debugging_test.py b/pyoneer/debugging/debugging_test.py
@@ -0,0 +1,20 @@
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import tensorflow as tf
+
+from pyoneer.debugging.debugging_impl import Stopwatch
+
+
+class DebuggingTest(tf.test.TestCase):
+    def test_stopwatch(self):
+        with Stopwatch() as stopwatch:
+            pass
+        self.assertIsNotNone(stopwatch.start_time)
+        self.assertIsNotNone(stopwatch.end_time)
+        self.assertIsNotNone(stopwatch.duration)
+
+
+if __name__ == "__main__":
+    tf.test.main()
diff --git a/pyoneer/distributions/__init__.py b/pyoneer/distributions/__init__.py
@@ -0,0 +1,7 @@
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+from pyoneer.distributions.distributions_impl import MultiCategorical
+
+__all__ = ["MultiCategorical"]
diff --git a/pyoneer/distributions/distributions_impl.py b/pyoneer/distributions/distributions_impl.py
@@ -0,0 +1,35 @@
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import tensorflow as tf
+
+
+class MultiCategorical(object):
+    """
+    Distribution composed of multiple distributions.
+
+    Useful for representing `gym.spaces.MultiDiscrete`.
+
+    Args:
+        distributions: list of distributions.
+    """
+
+    def __init__(self, distributions):
+        self.distributions = distributions
+
+    def log_prob(self, value):
+        values = tf.split(value, len(self.distributions), axis=-1)
+        log_probs = [
+            dist.log_prob(val[..., 0]) for dist, val in zip(self.distributions, values)
+        ]
+        return tf.math.add_n(log_probs)
+
+    def entropy(self):
+        return tf.math.add_n([dist.entropy() for dist in self.distributions])
+
+    def sample(self):
+        return tf.stack([dist.sample() for dist in self.distributions], axis=-1)
+
+    def mode(self):
+        return tf.stack([dist.mode() for dist in self.distributions], axis=-1)
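
Note on `MultiCategorical`: `log_prob` and `entropy` sum over the component distributions, which amounts to treating the components as independent. A rough pure-Python sketch of that arithmetic, using explicit probability tables in place of distribution objects (the `probs` tables and both helper names are hypothetical):

```python
import math

# Two independent categorical components, e.g. for a MultiDiscrete([2, 3]) space.
probs = [
    [0.25, 0.75],
    [0.5, 0.25, 0.25],
]


def joint_log_prob(value):
    """Sum of per-component log-probabilities for one multi-discrete value."""
    return sum(math.log(p[v]) for p, v in zip(probs, value))


def joint_entropy():
    """Sum of per-component entropies (in nats)."""
    return sum(-sum(q * math.log(q) for q in p) for p in probs)


# The joint probability of (1, 0) is 0.75 * 0.5 = 0.375.
print(math.isclose(math.exp(joint_log_prob((1, 0))), 0.375))
```

If the components were correlated, summing log-probabilities would no longer give the joint log-probability, so this factorization is a modeling assumption rather than a general property.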