Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tag data retrieval specification #818

Open
achilleas-k opened this issue May 4, 2020 · 2 comments
Open

Tag data retrieval specification #818

achilleas-k opened this issue May 4, 2020 · 2 comments

Comments

@achilleas-k
Copy link
Member

achilleas-k commented May 4, 2020

We need to clarify and specify unambiguously how data retrieval works for Tag and MultiTag referenced data. I'm going to ignore Untagged and Indexed features for now, because I think these are simple and clear. So this refers to referenced data and the Tagged features which behave equivalently.

In all the examples below, I'll be specifying positions, extents, and timestamps. Assume for simplicity that timestamps is a range dimension on the referenced data (or a data array with alias range dimension) with the same unit as the positions and extents.

Currently there's a bug where data is returned even when the tag region should be empty. Consider the following example:

positions = [0]
extents = [10]
timestamps = [22, 23, 31]

The first tag index should return an empty DataView but instead returns [22]. This seems to be because we first check if there's an index/tick that's greater or equal than the positions, to set it as a start index. This is fine when there's at least one value to return (the first value returned should be at an index that's >= to the start position), but should not apply when the end position (position + extent) is less than the first available index or tick.

I also want to clarify that at some point we decided that the end position should be inclusive. Is this correct? For example:

positions = [10]
extents = [10]
timestamps = [10, 12, 19, 20, 23]
  • Inclusive returns: [10, 12, 19, 20]
  • Non-inclusive returns: [10, 12, 19]

Currently we decided on inclusive but I'm wondering if we would like to revisit this. I think the inclusive slicing is a little unexpected, at least in certain cases. For instance:

positions = [0, 10, 20, 30]
extents = [10, 10, 10, 10]
timestamps = [0, 1, 2, 3, 4, ..., 39]

I think many would expect that retrieving the slices defined by each tag would have no overlap.

@Seafad
Copy link

Seafad commented May 5, 2020

@achilleas-k,

I found a bug "on the other side" of MultiTag retrieval. The last timestamp of Range (or AliasRange) dimension is returned for all following positions when they should be empty. For example, 8 will be returned for all positions when using the following code:
positions = [0 10 20] extents = [10 10 10] timestamps = [8]

Non-inclusive extents make more sense to me. From my perspective, it seems intuitive to expect that an extent of 1 should return a segment of length 1. Another point supporting non-inclusive extents in python was mentioned in #802. It is equally important is to have a consistent behavior for both MATLAB and Python APIs. Currently, due to different implementation between the APIs, reading a MATLAB-created NIX file with Python API causes OutOfBounds error.

@jgrewe
Copy link
Member

jgrewe commented May 6, 2020

the behaviour you describe @Seafad can be safely regarded as a bug... sorry.
A workaround for the moment is to read the ticks and apply position and extent in Matlab/Numpy via logical indexing.
Resolving this issue might involve some API changes which is not nice... still thinking about it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants