Record and play back Source data for testing #22

Open
elliottwilliams opened this issue Aug 20, 2016 · 4 comments
Comments

@elliottwilliams
Member

After thinking about this last night, I (re-)discovered VCR. It can be configured to record all HTTP requests made in a given session to a cassette, and then can be told to play back responses from that cassette in sequence.

This is obviously useful for unit testing, since you can call actual source endpoints, store their responses for later use, and easily refresh test cassettes over time. I think it could also be useful for functional testing of Shark's data quality. We can examine and re-examine problem timeframes where Shark is behaving weirdly. We can test sporadic events like route updates and (de)activations. And we can code late at night when no vehicles are out. Here's how it would work:

Shark's sources would have three modes of operation:

  • ephemeral (default): Shark refreshes sources and uses live source data to update and manage objects, as usual. Data from sources are discarded after being parsed.
  • recording: Shark continues to use live source data, but records each response to a given cassette. These responses are stored in sequence with request and timestamp metadata. (VCR record mode :all)
  • playback: Shark loads a given cassette and all requests to sources must hit the cassette (VCR record mode :none). Additionally, Shark can be given a timestamp to start from, and we can write a simple request matcher that only selects cassette responses if the URIs match and the time recorded is at least the given timestamp. Since VCR by default will never repeat a response playback, Shark proceeds with time-travelled data until the cassette runs out.
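
To make the playback rule concrete, here is a minimal sketch in plain Ruby. It does not use VCR's API; `Recorded`, `Playback`, and the example URLs are hypothetical names made up for illustration, and wiring the same rule into a real VCR request matcher is left open. It only shows the selection logic: match on URI, skip anything recorded before the start timestamp, and never serve the same response twice.

```ruby
# Hypothetical stand-in for one recorded response in a cassette.
Recorded = Struct.new(:uri, :recorded_at, :body)

# Hypothetical playback helper: serves each recorded response at most once,
# ignoring anything recorded before `start_at`.
class Playback
  def initialize(interactions, start_at:)
    # Keep only interactions at or after the start time, oldest first.
    @remaining = interactions
      .select { |i| i.recorded_at >= start_at }
      .sort_by(&:recorded_at)
  end

  # Returns the next unplayed response for `uri`, or nil once the
  # cassette has run out for that URI.
  def next_response(uri)
    index = @remaining.index { |i| i.uri == uri }
    index && @remaining.delete_at(index)
  end
end

cassette = [
  Recorded.new("http://example.test/vehicles", Time.utc(2016, 8, 20, 8, 0), "early"),
  Recorded.new("http://example.test/vehicles", Time.utc(2016, 8, 20, 9, 0), "first"),
  Recorded.new("http://example.test/routes",   Time.utc(2016, 8, 20, 9, 5), "routes"),
  Recorded.new("http://example.test/vehicles", Time.utc(2016, 8, 20, 9, 30), "second"),
]

# Start playback at 09:00, so the 08:00 response is never served.
playback = Playback.new(cassette, start_at: Time.utc(2016, 8, 20, 9, 0))
```

Repeated calls to `playback.next_response(uri)` walk forward through the recorded responses for that URI and return `nil` when none are left, which is the "cassette runs out" case.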

Potential complications include the need to adjust Timetable's clock to be in sync with a time-travelling Shark, and operating on and storing the enormous cassettes that, say, 12 hours of transit data would produce. Regardless, I think this would be worth a shot!

@faultyserver
Member

That's fantastic. I've never seen that library before. I'll definitely be taking a look into this as I start trying to write tests.

I'm not sure that Sources themselves need to have these three modes of operation, though. It seems like VCR is only meant to be run during tests (e.g., it hooks into WebMock, not Net::HTTP directly), which would make it rather difficult to configure for realtime usage. I do think caching the past 3-4 hours of source traffic would be useful in capturing those problem timeframes, but I don't think this is the right solution for that.

Maybe we could set up a separate instance for functional testing of everything after sourcing, with a single Source that just loops a given set of data, rather than making requests.

In terms of storage, as long as the tests are well-structured, make repeated use of the same data, and only make necessary requests, I'm not too concerned about storage size. Additionally, a lot of the unit tests in particular will probably end up using factories to generate data, just because that gives absolute control over the data, rather than having to find a specific case somewhere in a cassette.

With regard to Timetable, we might as well implement it with this in mind, allowing an optional timestamp parameter that will be used as the current time, rather than always assuming the current time.
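
A rough sketch of what that optional timestamp could look like. Timetable's real interface isn't shown in this thread, so the constructor, `next_departure`, and the `now:` keyword are all assumptions made for illustration.

```ruby
# Hypothetical Timetable sketch; the real class's interface is not
# described in this thread.
class Timetable
  # `departures` is an array of Time objects.
  def initialize(departures)
    @departures = departures.sort
  end

  # `now` defaults to the wall clock, but a replaying Shark can pass
  # the timestamp of the data it is currently playing back.
  def next_departure(now: Time.now)
    @departures.find { |t| t >= now }
  end
end

timetable = Timetable.new([
  Time.utc(2016, 8, 20, 9, 0),
  Time.utc(2016, 8, 20, 9, 30),
])

# Pretend it is 09:10 on the recorded day, regardless of the real clock.
replayed_now = Time.utc(2016, 8, 20, 9, 10)
upcoming = timetable.next_departure(now: replayed_now)
```

Keeping the default as `Time.now` means existing callers would not need to change; only the playback path passes an explicit time.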

@elliottwilliams
Member Author

Sure. Maybe recording/playback for downstream functional testing could be done with (1) a middleware that can save timestamped objects to disk as they are updated, and (2) a source that can read those saved objects over time.
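
Something like the following, as a minimal sketch. The class names, the `call(event, object, at:)` signature, and the `vehicles.update` event name are hypothetical and not taken from Shark's actual middleware or source interfaces. It writes one JSON line per update and reads them back in order; a `StringIO` stands in for a file on disk.

```ruby
require "json"
require "stringio"

# Hypothetical middleware: appends each updated object to an IO as one
# JSON line, tagged with the time it was seen.
class RecordingMiddleware
  def initialize(io)
    @io = io
  end

  def call(event, object, at: Time.now)
    @io.puts(JSON.generate("at" => at.to_f, "event" => event, "object" => object))
    object # pass the object through unchanged
  end
end

# Hypothetical source: yields recorded objects in the order they were written.
class ReplaySource
  def initialize(io)
    @io = io
  end

  def each_event
    return enum_for(:each_event) unless block_given?

    @io.each_line do |line|
      record = JSON.parse(line)
      yield Time.at(record["at"]).utc, record["event"], record["object"]
    end
  end
end

log = StringIO.new # stands in for a file opened with File.open(path, "a")
recorder = RecordingMiddleware.new(log)
recorder.call("vehicles.update", { "id" => "bus-1", "lat" => 40.42 }, at: Time.utc(2016, 8, 20, 9, 0))
recorder.call("vehicles.update", { "id" => "bus-1", "lat" => 40.43 }, at: Time.utc(2016, 8, 20, 9, 1))

log.rewind
replayed = ReplaySource.new(log).each_event.to_a
```

Each element of `replayed` is a `[time, event, object]` triple, so a replaying source could sleep between events or feed the recorded time to whatever needs a clock.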

Though (correct me if I'm wrong) I was under the impression that WebMock can be enabled outside of a testing environment, and can be set to forward all requests through VCR by not stubbing any responses. So it could be used to record sessions of data, although perhaps not very elegantly.

@faultyserver
Member

You're right. I've never needed it elsewhere, so I was assuming it was built for RSpec, but it looks like WebMock can work anywhere. Reading the docs, though, it seems like getting a good proxy through to VCR would be a bit awkward. But, I can't say anything definitively, since I haven't tried it.

I'm more keen on the middleware/source idea, though, since it can implement a rolling window, which I couldn't figure out how to do with VCR in my hour or so of looking around.
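
For reference, the rolling window part could be as small as this. It is a sketch with a hypothetical `RollingWindow` class, assuming events arrive in time order and are held in memory.

```ruby
# Hypothetical rolling buffer: keeps only entries from the last `span` seconds.
class RollingWindow
  def initialize(span)
    @span = span
    @entries = []
  end

  def push(item, at: Time.now)
    @entries << [at, item]
    cutoff = at - @span
    # Drop everything older than the window. Entries arrive in time order,
    # and the entry just pushed is never older than the cutoff.
    @entries.shift while @entries.first[0] < cutoff
    self
  end

  def items
    @entries.map { |_, item| item }
  end
end

window = RollingWindow.new(3 * 60 * 60) # three hours of traffic
start = Time.utc(2016, 8, 20, 6, 0)
window.push("a", at: start)
window.push("b", at: start + 2 * 3600)
window.push("c", at: start + 4 * 3600) # "a" is now more than three hours old
```

After the third push, only the entries from the last three hours remain, so dumping `window.items` at any moment would capture the most recent problem timeframe.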

@faultyserver
Member

I'm starting work on this now, and my plan of action at this point is to create a data set from the events that get fired through Shark::Agency (i.e., all of the default events), which should be sufficient for testing middlewares.

Sources will probably end up getting their own repos at some point, so they'll have their own test suite, and pretty much every part of the main system (Shark::ObjectManager and Shark::Agency) can be tested with factories, since they're primarily concerned with collections, rather than the values inside of each object.

faultyserver added a commit that referenced this issue Aug 23, 2016
See #22 for discussion. The idea is to record a set of events that can be used to functionally test everything past `Shark::Agency` in the stack (i.e., middlewares).

This will probably be replaced in favor of factories before too long, though, since they give better control over every detail of an object, making unit tests more granular without having to scan for a desired series of events.