Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added doc on simulator API. #565

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

Added doc on simulator API. #565

wants to merge 4 commits into from

Conversation

mwhittaker
Copy link
Member

No description provided.

@mwhittaker mwhittaker self-assigned this Aug 25, 2023
@mwhittaker mwhittaker marked this pull request as ready for review August 25, 2023 17:15
Gen: func(*rand.Rand) int { ... },
Func: func(ctx context.Context, replies map[int]int, x int, a A) error {
if y, err := a.A(ctx, x); err == nil {
replies[x] = y
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this option because it's difficult to explain what instance of replies will be passed to the function. In reality, for each "workload", you'll be creating a state, and running a series of ops on it. But this isn't as encapsulated as it is in Option 3.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree.

return nil
}

func (*Workload) GenCallA(r *rand.Rand) int {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed possibly defining a generator as a different type, like we did for routing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replied to your other comment with an example of this proposal.


### Decision

NOTE(mwhittaker): I don't love any of these options. I have a slight preference
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vote for Option 3 or a tweaked version of it, because it encapsulates a "run" or a "workload" really well, along with its state etc.

Copy link
Member Author

@mwhittaker mwhittaker Aug 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are a few cosmetic tweaks to Option 3 that might make it more appealing. First, we can move the generator to a separate struct, as you suggested. Second, we can change the Init method to return all fakes using the existing weavertest.Fake function. Note that I'm showing the simulator in a sim package, but it will probably end up in the weavertest package as well.

type Workload struct {
    a weaver.Ref[A]
    b weaver.Ref[B]
    replies map[int]int
}

func (w *Workload) Init(context.Context) ([]weavertest.FakeComponent, error) {
    w.replies = map[int]int{}
    return []weavertest.FakeComponent{weavertest.Fake[B](&fakeb{})}, nil
}

func (w *Workload) CallA(ctx context.Context, x int) error {
    if y, err := w.a.Get().A(ctx, x); err == nil {
        w.replies[x] = y
    }
    return nil
}

func (w *Workload) CallB(ctx context.Context) error {
    w.b.Get().B(ctx)
    return nil
}

type Generator struct {}

func (Generator) CallA(r *rand.Rand) int {
    ...
}

func TestWorkload(t *testing.T) {
    s, err := sim.New[Workload, Generator](sim.Options{...})
    if err != nil {
        t.Fatal(err)
    }
    results, err := s.Simulate(10 * time.Second)
    if err != nil {
        t.Fatal(err)
    }
}

@mwhittaker
Copy link
Member Author

@ghemawat @spetrovic77 Added the options we discussed today to the doc.

Copy link
Collaborator

@ghemawat ghemawat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clear writeup of the options and the pros and cons. Very helpful.

docs/randomized_testing.md Outdated Show resolved Hide resolved
docs/randomized_testing.md Outdated Show resolved Hide resolved
docs/randomized_testing.md Show resolved Hide resolved

**Cons.**
Magic struct tags. Also, it is awkward to fake a component to which we don't
need a direct reference.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we change to something like:

type Workload struct {
    a weaver.Ref[A]
    b weavertest.Fake[B,fakeb]
    replies map[int]int
}

Would Fake[I,F] be something applicable to weavertest as well, which might make it more palatable.

E.g., chain_test.go's fake could instead by supplied as follows:

runner.Test(t, func(t *testing.T, a weavertest.Fake[A, fake]) {
    f := a.Fake()  // type *fake
    i := a.Component()  // type A
    ...
})

We can then get rid of weavertest.Fake and weavertest.FakeComponent. We should try writing some user documentation for this and see how it looks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this option to the doc. I can work on writing user docs.

docs/randomized_testing.md Show resolved Hide resolved
docs/randomized_testing.md Outdated Show resolved Hide resolved
- The `Generator[T]` interface is clunky. A value of type `T` generates a value
of type `T`, which is weird. It means that op arguments are doubling as
generators and generated values. Plus, the user sometimes has to typecast
these arguments. We saw that in the `CallA` method above.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of type-casting, we should perhaps have a method. That will provide two benefits:

  1. The representation of Generator[T] can hold more state than T.
  2. It will reduce the chance of typos like the user saying byte(x) instead of int(x).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that typecasting is both un-ergonomic and error-prone. I described one approach to a Get method in Option 8 which separates the Generate method and Get method across two different interfaces, but we can maybe merge them into a single interface.

- If a workload has many ops that don't re-use types much, this approach is more
onerous that defining generator methods.
- The `Generator[T]` interface is clunky. A value of type `T` generates a value
of type `T`, which is weird. It means that op arguments are doubling as
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we renamed the Generate() method to Init() and have it modify the receiver. E.g.,

func (i *myInt) Init(r *rand.Rand) {
     *i = myInt(r.Intn(100)) }
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still working on addressing the other comments, but I've been thinking more about how to specify generators. The approach I like the most is a Frankenstein combination of the existing proposals. First, we introduce a handful of interfaces:

type Generator[T any] interface {
    Generate(r *rand.Rand) T
}

type Shrinker[T any] interface {
    Shrink(T) []T
}

type Formatter[T any] interface {
    Format(T) string
}
  • Generator[T] generates values of type T.
  • Shrinker[T] shrinks values, which is used for minimization.
  • Formatter[T] pretty prints values, which is used for visualization.

I think these interfaces are as simple as possible, without any subtleties or oddities of some of the other proposals.

Next, workload methods have plain argument types:

type Workload struct { ... }
func (w *Workload) Foo(ctx context.Context, x int, y string) error { ... }
func (w *Workload) Bar(ctx context.Context, z bool) error { ... }

There is no casting, no Get method, and no confusion about a type doubling as its generator. The user then has to define a separate struct with one method for every op in their workload struct. Methods return the Generators used to generate arguments to the ops.

type WorkloadGenerator struct {}
func (WorkloadGenerator) Foo() (Generator[int], Generator[string]) { ... }
func (WorkloadGenerator) Bar() (Generator[bool]) { ... }

To make implementing these methods easier, we provide a family of general purpose Generators. For example, sim.Int, sim.String, sim.ReadableString, and so on. We can also provide basic functions to build Generators. For example, Pick[T any](xs ...T) Generator[T] returns a Generator that randomly returns the provided elements. Here's how we might implement WorkloadGenerator:

type WorkloadGenerator struct {}
func (WorkloadGenerator) Foo() (Generator[int], Generator[string]) {
    return sim.NegativeInt, sim.Pick("a", "b", "c")
}
func (WorkloadGenerator) Bar() (Generator[bool]) {
    return sim.BiasedFlip(0.75)
}

Of course, users are free to implement their own custom instances of Generator[T] as well. Moreover, a returned Generator[T] may optionally implement the Shrinker[T] or Formatter[T] interfaces. The general purpose Generators we provide all implement the Shrinker and Formatter interfaces automatically.

If a user finds themselves re-using the same Generator across a bunch of methods, they can use helper functions to avoid some of the repetition. Imagine a key-value store based workload, for example:

func key() Generator[string] { return sim.Pick("a", "b", "c") }
func value() Generator[string] { return sim.ReadableString }

type StoreGenerator struct {}
func (StoreGenerator) Get() Generator[string] { return key() }
func (StoreGenerator) Set() (Generator[string], Generator[string]) { return key(), value() }
func (StoreGenerator) Swap() (Generator[string], Generator[string]) { return key(), key() }
func (StoreGenerator) Delete() Generator[string] { return key() }

The main downside of this approach is that it requires by far the most typing out of any approach. The more I thought about this though, the less it bothered me. I started to think of it similar to how Go implements errors and how code is littered with if err != nil { ... } code. It's more typing, but there's not really any subtleties or gotchas. I started to feel that this approach is the easiest to understand, but the most annoying to type out.

Finally, a small but related point. We've discussed the idea of generators having state. I started to think this is a bad idea and muddies the otherwise simple abstraction of a generator. I think all state should be in the workload, and the generator is left as a stateless and relatively independent entity.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formatter

can we use the normal "func (...) String() string" API instead?

Copy link
Member Author

@mwhittaker mwhittaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Sanjay!

- If a workload has many ops that don't re-use types much, this approach is more
onerous that defining generator methods.
- The `Generator[T]` interface is clunky. A value of type `T` generates a value
of type `T`, which is weird. It means that op arguments are doubling as
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still working on addressing the other comments, but I've been thinking more about how to specify generators. The approach I like the most is a Frankenstein combination of the existing proposals. First, we introduce a handful of interfaces:

type Generator[T any] interface {
    Generate(r *rand.Rand) T
}

type Shrinker[T any] interface {
    Shrink(T) []T
}

type Formatter[T any] interface {
    Format(T) string
}
  • Generator[T] generates values of type T.
  • Shrinker[T] shrinks values, which is used for minimization.
  • Formatter[T] pretty prints values, which is used for visualization.

I think these interfaces are as simple as possible, without any subtleties or oddities of some of the other proposals.

Next, workload methods have plain argument types:

type Workload struct { ... }
func (w *Workload) Foo(ctx context.Context, x int, y string) error { ... }
func (w *Workload) Bar(ctx context.Context, z bool) error { ... }

There is no casting, no Get method, and no confusion about a type doubling as its generator. The user then has to define a separate struct with one method for every op in their workload struct. Methods return the Generators used to generate arguments to the ops.

type WorkloadGenerator struct {}
func (WorkloadGenerator) Foo() (Generator[int], Generator[string]) { ... }
func (WorkloadGenerator) Bar() (Generator[bool]) { ... }

To make implementing these methods easier, we provide a family of general purpose Generators. For example, sim.Int, sim.String, sim.ReadableString, and so on. We can also provide basic functions to build Generators. For example, Pick[T any](xs ...T) Generator[T] returns a Generator that randomly returns the provided elements. Here's how we might implement WorkloadGenerator:

type WorkloadGenerator struct {}
func (WorkloadGenerator) Foo() (Generator[int], Generator[string]) {
    return sim.NegativeInt, sim.Pick("a", "b", "c")
}
func (WorkloadGenerator) Bar() (Generator[bool]) {
    return sim.BiasedFlip(0.75)
}

Of course, users are free to implement their own custom instances of Generator[T] as well. Moreover, a returned Generator[T] may optionally implement the Shrinker[T] or Formatter[T] interfaces. The general purpose Generators we provide all implement the Shrinker and Formatter interfaces automatically.

If a user finds themselves re-using the same Generator across a bunch of methods, they can use helper functions to avoid some of the repetition. Imagine a key-value store based workload, for example:

func key() Generator[string] { return sim.Pick("a", "b", "c") }
func value() Generator[string] { return sim.ReadableString }

type StoreGenerator struct {}
func (StoreGenerator) Get() Generator[string] { return key() }
func (StoreGenerator) Set() (Generator[string], Generator[string]) { return key(), value() }
func (StoreGenerator) Swap() (Generator[string], Generator[string]) { return key(), key() }
func (StoreGenerator) Delete() Generator[string] { return key() }

The main downside of this approach is that it requires by far the most typing out of any approach. The more I thought about this though, the less it bothered me. I started to think of it similar to how Go implements errors and how code is littered with if err != nil { ... } code. It's more typing, but there's not really any subtleties or gotchas. I started to feel that this approach is the easiest to understand, but the most annoying to type out.

Finally, a small but related point. We've discussed the idea of generators having state. I started to think this is a bad idea and muddies the otherwise simple abstraction of a generator. I think all state should be in the workload, and the generator is left as a stateless and relatively independent entity.

Copy link
Member Author

@mwhittaker mwhittaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Sanjay! I addressed the rest of your comments.

docs/randomized_testing.md Outdated Show resolved Hide resolved
Gen: func(*rand.Rand) int { ... },
Func: func(ctx context.Context, replies map[int]int, x int, a A) error {
if y, err := a.A(ctx, x); err == nil {
replies[x] = y
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree.

return nil
}

func (*Workload) GenCallA(r *rand.Rand) int {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replied to your other comment with an example of this proposal.

docs/randomized_testing.md Outdated Show resolved Hide resolved
docs/randomized_testing.md Show resolved Hide resolved
docs/randomized_testing.md Show resolved Hide resolved
docs/randomized_testing.md Outdated Show resolved Hide resolved
- The `Generator[T]` interface is clunky. A value of type `T` generates a value
of type `T`, which is weird. It means that op arguments are doubling as
generators and generated values. Plus, the user sometimes has to typecast
these arguments. We saw that in the `CallA` method above.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that typecasting is both un-ergonomic and error-prone. I described one approach to a Get method in Option 8 which separates the Generate method and Get method across two different interfaces, but we can maybe merge them into a single interface.


**Cons.**
Magic struct tags. Also, it is awkward to fake a component to which we don't
need a direct reference.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this option to the doc. I can work on writing user docs.

- People can easily forget to register a generator for every type.
- People can register two generators for the same type.

### Decision
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the decision we discussed on offline. I'll start implementing things, and we can tweak stuff as we go.

mwhittaker added a commit that referenced this pull request Sep 6, 2023
mwhittaker added a commit that referenced this pull request Sep 6, 2023
mwhittaker added a commit that referenced this pull request Sep 8, 2023
mwhittaker added a commit that referenced this pull request Sep 8, 2023
htiennv pushed a commit to htiennv/weaver that referenced this pull request Sep 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants