Unified tagging of DataFrames and DataSets #717

Open
gicmo opened this issue Jan 10, 2018 · 1 comment

gicmo commented Jan 10, 2018

I will write more about how, eventually.

gicmo commented Oct 15, 2019

Currently we have:

  • DataSet: abstract base class/interface of a rectangular, homogeneous n-dimensional data container, i.e. it has dataExtent() → NDSize (n-dim rectangular) and dataType() → DataType (homogeneous).
  • DataArray: is a DataSet and is the frontend class for the actual backing store of DataSet-like data. DataArrays have dimension descriptors and a unit for the data itself.
  • DataView: is a DataSet and represents a view of a subset of the data in a DataArray, i.e. it is a hyperrectangle of size count (NDSize) starting at offset (NDSize).
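
As a rough illustration of how these three classes relate, here is a minimal Python sketch (NIX itself is C++). The snake_case names, the `NDSize` tuple alias and the constructor arguments are assumptions made for the example, not the library's API; dimension descriptors and the actual backing store are left out.

```python
from abc import ABC, abstractmethod
from typing import Tuple

NDSize = Tuple[int, ...]  # assumption: a plain tuple standing in for NDSize


class DataSet(ABC):
    """Rectangular, homogeneous n-dimensional data container (interface)."""

    @abstractmethod
    def data_extent(self) -> NDSize:
        """Size along each dimension."""

    @abstractmethod
    def data_type(self) -> type:
        """Single element type shared by all values."""


class DataArray(DataSet):
    """Frontend for the backing store; also carries a unit for the data."""

    def __init__(self, extent: NDSize, dtype: type, unit: str = "") -> None:
        self._extent = tuple(extent)
        self._dtype = dtype
        self.unit = unit

    def data_extent(self) -> NDSize:
        return self._extent

    def data_type(self) -> type:
        return self._dtype


class DataView(DataSet):
    """Hyperrectangle of size `count` starting at `offset` in a DataArray."""

    def __init__(self, array: DataArray, offset: NDSize, count: NDSize) -> None:
        extent = array.data_extent()
        if not (len(offset) == len(count) == len(extent)):
            raise ValueError("offset and count must match the array's rank")
        for o, c, e in zip(offset, count, extent):
            if o < 0 or c < 0 or o + c > e:
                raise ValueError("view exceeds the bounds of the array")
        self.array = array
        self.offset = tuple(offset)
        self.count = tuple(count)

    def data_extent(self) -> NDSize:
        return self.count

    def data_type(self) -> type:
        return self.array.data_type()


array = DataArray((10, 4), float, unit="mV")
view = DataView(array, offset=(2, 1), count=(5, 2))
```

Note that the view in this sketch carries neither dimensions nor a unit of its own, which is the gap the rest of this comment is concerned with.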

Additionally, we have the new DataFrame, a rectangular data container consisting of n columns (name, unit, DataType) by m rows.

Tagging is currently done by giving the tag one or more position+extent pairs and pointers to (referenced) DataArrays, whose dimensionality must match that of the positions and extents.
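
The dimensionality constraint can be made concrete with a small sketch. It assumes, for simplicity, that position and extent are already expressed as integer indices (in practice they would first have to be mapped to indices via the dimension descriptors); `tagged_region` is a hypothetical helper, not an existing function.

```python
from typing import Sequence, Tuple

Region = Tuple[Tuple[int, ...], Tuple[int, ...]]


def tagged_region(position: Sequence[int], extent: Sequence[int],
                  data_extent: Sequence[int]) -> Region:
    """Return (offset, count) of the tagged hyperrectangle.

    Raises ValueError if position/extent do not match the rank of the
    referenced data, or if the region does not fit inside it.
    """
    if not (len(position) == len(extent) == len(data_extent)):
        raise ValueError("position, extent and data must have the same rank")
    for p, e, size in zip(position, extent, data_extent):
        if p < 0 or e < 0 or p + e > size:
            raise ValueError("tagged region lies outside the data")
    return tuple(position), tuple(extent)


# One tag may carry several position+extent pairs for the same reference.
pairs = [((10, 0), (20, 3)), ((50, 1), (5, 1))]
regions = [tagged_region(p, e, (100, 3)) for p, e in pairs]
```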

To allow unified tagging, i.e. of both DataArray and DataFrame, the references must be changed to point to either:

  • a common base object that DataArray and DataFrame derive from
  • a (new) intermediate object that would in turn point to a DataArray or DataFrame, maybe with additional specifications of how position & extent are applied.

The latter is the more complicated but more flexible solution, while the former is more straightforward and easier to implement.
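
To make the second option more tangible, here is one way such an intermediate object might look. Everything in it is hypothetical: the `TagReference` name, the `axis_map` field and the stand-in target classes are illustrative assumptions about what "additional specifications" could mean, not a proposed design.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass
class ArrayTarget:
    """Stand-in for a DataArray."""
    extent: Tuple[int, ...]


@dataclass
class FrameTarget:
    """Stand-in for a DataFrame (m rows by n columns)."""
    rows: int
    columns: Tuple[str, ...]


@dataclass
class TagReference:
    """Hypothetical object sitting between a tag and the tagged data."""
    target: Union[ArrayTarget, FrameTarget]
    # Maps each tag dimension to the axis of the target it applies to.
    axis_map: Dict[int, int]

    def target_rank(self) -> int:
        if isinstance(self.target, ArrayTarget):
            return len(self.target.extent)
        return 2  # assumption: a frame is treated as rows x columns

    def validate(self, tag_rank: int) -> None:
        """Check that the mapping covers the tag and stays inside the target."""
        if sorted(self.axis_map) != list(range(tag_rank)):
            raise ValueError("every tag dimension needs exactly one mapping")
        rank = self.target_rank()
        if any(not 0 <= axis < rank for axis in self.axis_map.values()):
            raise ValueError("mapping points outside the target's axes")


# A one-dimensional tag applied along the rows of a frame.
frame_ref = TagReference(FrameTarget(100, ("time", "value")), axis_map={0: 0})
frame_ref.validate(tag_rank=1)
```

The flexibility comes at the cost of one more object to store and validate per reference, which is the complication mentioned above.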

The common base object could be the existing DataSet, if it were extended to include dimensions and units. The DataView would then need to be amended to include those as well. The tricky bit would be the dimensions: each would have to represent a view (offset+count) applied to the corresponding dimension of the underlying DataArray. The Tags would need to be changed to work only with DataSets for references and retrieveData.
A further new object would be needed to represent a view of a DataFrame, much like DataView does for DataArray: a FrameView (name subject to change) that is a DataSet. The reference in the file format would need to be amended (attributes in HDF5) to specify everything needed to re-create that FrameView.
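
To illustrate what applying a view to a dimension could mean, consider two kinds of axis description: one that is regularly sampled and one that lists explicit ticks. The sketch below is loosely modeled on those two kinds; the class layouts and the `view` method are assumptions made for the example, not existing API.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SampledDimension:
    """Regularly sampled axis: position = offset + index * sampling_interval."""
    sampling_interval: float
    offset: float = 0.0
    unit: str = ""

    def position_at(self, index: int) -> float:
        return self.offset + index * self.sampling_interval

    def view(self, offset: int, count: int) -> "SampledDimension":
        # Only the origin shifts; count limits the data, not the description.
        if offset < 0 or count < 0:
            raise ValueError("offset and count must be non-negative")
        return SampledDimension(self.sampling_interval,
                                self.position_at(offset), self.unit)


@dataclass(frozen=True)
class RangeDimension:
    """Irregularly sampled axis described by explicit ticks."""
    ticks: Tuple[float, ...]
    unit: str = ""

    def view(self, offset: int, count: int) -> "RangeDimension":
        if offset < 0 or count < 0 or offset + count > len(self.ticks):
            raise ValueError("view exceeds the available ticks")
        return RangeDimension(self.ticks[offset:offset + count], self.unit)


time_dim = SampledDimension(sampling_interval=0.5, offset=1.0, unit="s")
time_view = time_dim.view(offset=4, count=10)

depth_dim = RangeDimension(ticks=(0.0, 1.0, 2.5, 7.0), unit="mm")
depth_view = depth_dim.view(offset=1, count=2)
```

In both cases index 0 of the viewed dimension describes the same physical position as index `offset` of the original one.
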
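
As a sketch of what the stored reference might have to contain, the example below models a FrameView as a row range plus a selection of columns and round-trips it through a plain dictionary standing in for HDF5 attributes. The attribute names, the restriction to columns sharing one data type (so that the view stays homogeneous like any DataSet) and the classes themselves are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class DataFrame:
    """Stand-in: n columns of (name, unit, data type) by m rows."""
    columns: List[Tuple[str, str, type]]
    rows: int


@dataclass
class FrameView:
    """Hypothetical view of a row range and column subset of a DataFrame."""
    frame: DataFrame
    row_offset: int
    row_count: int
    column_names: Tuple[str, ...]

    def __post_init__(self) -> None:
        if (self.row_offset < 0 or self.row_count < 0
                or self.row_offset + self.row_count > self.frame.rows):
            raise ValueError("row range lies outside the frame")
        known = {name: dtype for name, _unit, dtype in self.frame.columns}
        missing = [n for n in self.column_names if n not in known]
        if missing:
            raise ValueError(f"unknown columns: {missing}")
        types = {known[n] for n in self.column_names}
        if len(types) != 1:
            raise ValueError("selected columns must share one data type")
        self._dtype = types.pop()

    def data_extent(self) -> Tuple[int, int]:
        return (self.row_count, len(self.column_names))

    def data_type(self) -> type:
        return self._dtype

    def to_attrs(self) -> Dict[str, Any]:
        """Everything needed to re-create the view, given the frame."""
        return {"row_offset": self.row_offset,
                "row_count": self.row_count,
                "columns": list(self.column_names)}

    @classmethod
    def from_attrs(cls, frame: DataFrame, attrs: Dict[str, Any]) -> "FrameView":
        return cls(frame, attrs["row_offset"], attrs["row_count"],
                   tuple(attrs["columns"]))


frame = DataFrame([("time", "s", float), ("voltage", "mV", float),
                   ("label", "", str)], rows=100)
frame_view = FrameView(frame, row_offset=10, row_count=20,
                       column_names=("time", "voltage"))
restored = FrameView.from_attrs(frame, frame_view.to_attrs())
```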
