
Commit

Import methods use `detect_types` to detect if autoconversion from `str` to `int`, `float` and `bool` is needed
mezantrop committed Jul 2, 2024
1 parent 4a13dc6 commit 8b189ad
Showing 4 changed files with 88 additions and 51 deletions.
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,8 @@
# CHANGELOG

* **2024.07.02 tSQLike-1.1.0**
* Import methods use `detect_types` to detect if autoconversion from `str` to `int`, `float` and `bool` is needed

* **2024.06.28 tSQLike-1.0.4**
* `select_lt()` respects empty arguments
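The 1.1.0 entry matters because Python's `csv.reader` returns every field as a `str`, even one that looks like a number. A minimal standalone sketch of that effect and of a per-cell conversion; `convert_cell` is a hypothetical helper written for illustration, not tsqlike's `to_type()`, and it leaves booleans alone.

```python
import csv
import io


def convert_cell(value):
    """Hypothetical helper: try int, then float; otherwise keep the value."""
    if not isinstance(value, str):
        return value  # only strings are candidates for conversion
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


# csv.reader yields every field as a str, whatever it looks like
raw = io.StringIO("name,age,score\nAlice,30,4.5\nBob,25,3.75\n")
rows = list(csv.reader(raw))
print(rows[1])   # ['Alice', '30', '4.5']

# Keep the header row as-is and convert the data rows
typed = [rows[0]] + [[convert_cell(v) for v in r] for r in rows[1:]]
print(typed[1])  # ['Alice', 30, 4.5]
```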

48 changes: 30 additions & 18 deletions README.md
@@ -1,6 +1,8 @@
# tSQLike

[![Python package](https://github.com/mezantrop/tSQLike/actions/workflows/python-package.yml/badge.svg)](https://github.com/mezantrop/tSQLike/actions/workflows/python-package.yml)
[![CodeQL](https://github.com/mezantrop/tSQLike/actions/workflows/codeql.yml/badge.svg)](https://github.com/mezantrop/tSQLike/actions/workflows/codeql.yml)

## SQL-like interface to tabular structured data

**Not that early stage, but still in development: may contain bugs**
@@ -9,7 +11,7 @@

## Description

**tSQLike** is a Python3 module that is written with a hope to make tabular data process easier using SQL-like primitives.
**tSQLike** is a Python3 module that is written with a hope to make tabular data process easier using SQL-like primitives.

## Usage

@@ -34,16 +36,18 @@ t3.write_csv(dialect='unix')

## Installation

```
```sh
pip install tsqlike
```

## Functionality

### Table class
The main class of the module

The main class of the module

#### Data processing methods

| Name | Status | Description |
|-------------|---------|--------------------------------------------------------------------------|
| `join` | ☑ | Join two Tables (`self` and `table`) on an expression [*](#Warning) |
@@ -54,20 +58,23 @@ The main class of the module
| `group_by` | ☑ | GROUP BY primitive of SQL SELECT to apply aggregate function on a column |

#### Import methods

| Name | Status | Description |
|---------------------|---------|-------------------------------------------------------------------------|
| `import_dict_lists` | ☑ | Import a dictionary of lists into Table object |
| `import_dict_lists` | ☑ | Import a dictionary of lists into Table object |
| `import_list_dicts` | ☑ | Import a list of horizontal arranged dictionaries into the `Table` |
| `import_list_lists` | ☑ | Import `list(list_1(), list_n())` with optional first row as the header |

#### Export methods

| Name | Status | Description |
|---------------------|---------|-------------------------------------------------------------------------|
| `export_dict_lists` | ☑ | Export a dictionary of lists |
| `export_list_dicts` | ☑ | Export list of dictionaries |
| `export_list_lists` | ☑ | Export `list(list_1(), list_n())` with optional first row as the header |

#### Write methods

| Name | Status | Description |
|-----------------|---------|---------------------------------------------------------------------|
| `write_csv` | ☑ | Make `CSV` from the `Table` object and write it to a file or stdout |
@@ -76,12 +83,13 @@ The main class of the module
| `write_xml` | ☐ | Write `XML`. NB: Do we need this? |

#### Private methods

| Name | Status | Description |
|----------------|---------|-------------------------------------------|
| `_redimension` | ☑ | Recalculate dimensions of the Table.table |


### EvalCtrl class

Controls what arguments are available to `eval()` function

| Name | Status | Description |
@@ -91,23 +99,27 @@ Controls what arguments are available to `eval()` function
| `blacklist_remove` | ☑ | Remove the word from the blacklist |

### Standalone functions
| Name | Status | Description |
|--------------|---------|-------------------------------------------|
| `open_file` | ☑ | Open a file |
| `close_file` | ☑ | Close a file |
| `read_json` | ☑ | Read `JSON` file |
| `read_csv` | ☑ | Read `CSV` file |
| `read_xml` | ☐ | Read `XML`. NB: Do we need XML support? |

#### WARNING!
Methods `Table.join(on=)`, `Table.select(where=)` and `Table.write_json(export_f=)`, use `eval()` function
to run specified expressions within the program. **ANY** expression, including one that is potentially **DANGEROUS**

| Name | Status | Description |
|--------------|---------|----------------------------------------------------------|
| `open_file` | ☑ | Open a file |
| `close_file` | ☑ | Close a file |
| `read_json` | ☑ | Read `JSON` file |
| `read_csv` | ☑ | Read `CSV` file |
| `read_xml` | ☐ | Read `XML`. NB: Do we need XML support? |
| `to_type` | ☑ | Convert a string to a proper type: int, float or boolean |

#### WARNING

Methods `Table.join(on=)`, `Table.select(where=)` and `Table.write_json(export_f=)`, use `eval()` function
to run specified expressions within the program. **ANY** expression, including one that is potentially **DANGEROUS**
from security point of view, can be passed as the values of the above arguments. It is your duty to ensure correctness
and safety of these arguments and `EvalCtrl` helps to block potentially dangerous function/method names.
and safety of these arguments and `EvalCtrl` helps to block potentially dangerous function/method names.

Alternatively you can use `Table.join_lt()`, `Table.select_lt()` and `Table.write_json()`. They are significantly less
powerful, but do not use `eval()`.

## Contacts
If you have an idea, a question, or have found a problem, do not hesitate to open an issue or mail me directly:

If you have an idea, a question, or have found a problem, do not hesitate to open an issue or mail me directly:
Mikhail Zakharov <[email protected]>
2 changes: 1 addition & 1 deletion tsqlike/__about__.py
@@ -1,3 +1,3 @@
""" Version number in a single place """

__version__ = "1.0.4"
__version__ = "1.1.0"
86 changes: 54 additions & 32 deletions tsqlike/tsqlike.py
@@ -92,16 +92,29 @@ def close_file(file):
if file and file is not sys.stdout and file is not sys.stdin:
file.close()

# ------------------------------------------------------------------------------------------------ #
def to_type(s):
""" Convert string s to a proper type: int, float or boolean """

if s in ('True', 'true', 'False', 'false'): # to boolean
return bool(s)

try:
return float(s) if '.' in s or ',' in s else int(s) # to float and int
except (ValueError, TypeError):
return s # no conversion possible -> string

# ------------------------------------------------------------------------------------------------ #
def read_csv(in_file=None, encoding=None, newline='', name='', dialect='excel', **fmtparams):
def read_csv(in_file=None, encoding=None, newline='', name='', detect_types=False,
dialect='excel', **fmtparams):
"""
Read CSV from a file and import into a Table object
:param in_file: Filename to read CSV from
:param encoding: Character encoding
:param newline: UNIX/Windows/Mac style line ending
:param name: Table name to assign
:param detect_types: Detect and correct types of data, default - False
:param dialect: CSV dialect, e.g: excel, unix
:**fmtparams: Various optional CSV parameters:
:param delimiter: CSV field delimiter
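The `to_type()` added in this commit returns `bool(s)` for `'True'`, `'true'`, `'False'` and `'false'`. In Python, `bool()` of any non-empty string is `True`, so `'False'` converts to `True`. Below is a sketch of a variant, under the hypothetical name `to_type_sketch`, that maps the two spellings explicitly. It drops the `','` test because `float()` does not accept a comma as a decimal separator, so `'1,5'` remains a string either way.

```python
def to_type_sketch(s):
    """Hypothetical variant of to_type(): str -> bool, float or int, else unchanged."""
    if s in ('True', 'true'):
        return True
    if s in ('False', 'false'):
        return False
    try:
        # A '.' marks a float candidate; everything else is tried as int
        return float(s) if '.' in s else int(s)
    except (ValueError, TypeError):
        return s  # not convertible, or not a string at all: keep as-is


print(bool('False'))            # True: any non-empty string is truthy
print(to_type_sketch('False'))  # False
print(to_type_sketch('42'), to_type_sketch('3.14'), to_type_sketch('abc'))  # 42 3.14 abc
```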
@@ -112,18 +125,19 @@ def read_csv(in_file=None, encoding=None, newline='', name='', dialect='excel',

f = open_file(in_file, file_mode='r', encoding=encoding, newline=newline)
_data = csv.reader(f, dialect=dialect, **fmtparams)
t = Table(data=list(_data), name=name)
t = Table(data=list(_data), name=name, detect_types=detect_types)
close_file(f)
return t


# -------------------------------------------------------------------------------------------- #
def read_json(in_file=None, name=''):
def read_json(in_file=None, name='', detect_types=False):
""" Read JSON data from file
:param in_file: Filename to read JSON from
:param name: Table name to assign
:return Table
:param in_file: Filename to read JSON from
:param name: Table name to assign
:param detect_types: Detect and correct types of data, default - False
:return Table
"""

_data = {}
@@ -132,7 +146,7 @@ def read_json(in_file=None, name=''):
_data = json.load(f)
except (IOError, OSError) as _err:
print(f'[email protected]_json(): Unable to load JSON structure: {_err}')
t = Table(data=_data, name=name)
t = Table(data=_data, name=name, detect_types=detect_types)
close_file(f)
return t

@@ -201,7 +215,7 @@ class Table:
"""

# -------------------------------------------------------------------------------------------- #
def __init__(self, data=None, name=None):
def __init__(self, data=None, name=None, detect_types=False):
self.timestamp = int(time.time())
self.name = name or str(self.timestamp)

@@ -212,13 +226,13 @@ def __init__(self, data=None, name=None):
self.cols = 0
elif isinstance(data, list) and len(data):
if isinstance(data[0], dict): # list(dicts())
self.import_list_dicts(data)
self.import_list_dicts(data, detect_types=detect_types)
if isinstance(data[0], list): # list(lists())
self.import_list_lists(data)
self.import_list_lists(data, detect_types=detect_types)
elif isinstance(data, dict) and len(data):
print(type(next(iter(data))))
if isinstance(data[next(iter(data))], list): # dict(lists()):
self.import_dict_lists(data)
self.import_dict_lists(data, detect_types=detect_types)
else:
raise ValueError('FATAL@Table.__init__: Unexpected data format')
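`Table.__init__()` chooses an import method from the shape of `data`: a list whose first element is a dict, a list whose first element is a list, or a dict whose first value is a list. The hypothetical `detect_shape()` below sketches that dispatch. It reads the first value with `next(iter(data.values()))`, which reaches the same value as the committed `data[next(iter(data))]`, and it raises `ValueError` for any shape it does not recognise.

```python
def detect_shape(data):
    """Hypothetical helper sketching the format checks in Table.__init__()."""
    if isinstance(data, list) and data:
        if isinstance(data[0], dict):
            return 'list_dicts'  # e.g. [{'a': 1}, {'a': 2}]
        if isinstance(data[0], list):
            return 'list_lists'  # e.g. [['a'], [1], [2]]
    elif isinstance(data, dict) and data:
        # Only the first value is inspected
        if isinstance(next(iter(data.values())), list):
            return 'dict_lists'  # e.g. {'a': [1, 2]}
    raise ValueError('Unexpected data format')


print(detect_shape({'a': [1, 2]}))  # dict_lists
```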

@@ -244,14 +258,15 @@ def _redimension(self):
self.cols = self.rows and len(self.table[0]) or 0

# -- Import methods -------------------------------------------------------------------------- #
def import_list_dicts(self, data, name=None):
def import_list_dicts(self, data, name=None, detect_types=False):
"""
Import a list of dictionaries
:alias: import_thashes()
:param data: Data to import formatted as list of dictionaries
:param name: If not None, set it as the Table name
:return: self
:alias: import_thashes()
:param data: Data to import formatted as list of dictionaries
:param name: If not None, set it as the Table name
:param detect_types: Detect and correct types of data, default - False
:return: self
"""

# Set a new Table name if requested
@@ -262,7 +277,8 @@ def import_list_dicts(self, data, name=None):
self.header = [self.name + TNAME_COLUMN_DELIMITER + str(f)
if TNAME_COLUMN_DELIMITER not in str(f) else f for f in (data[0].keys())]

self.table = [list(r.values()) for r in data]
self.table = [list(r.values()) for r in data] if not detect_types else [[to_type(v) for v in r.values()] for r in data]

else:
raise ValueError('[email protected]_list_dicts: Unexpected data format')
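`import_list_dicts()` builds the header from the first dict's keys, prefixing each bare key with the table name, and with `detect_types` set it converts every value through `to_type()`. A standalone sketch of the same steps follows. The value of `TNAME_COLUMN_DELIMITER` is not shown in this diff, so the sketch assumes `'.'`, matching the "name.column" form named in the `import_list_lists()` comment. `to_number` and `list_dicts_to_table` are hypothetical names.

```python
def to_number(v):
    """Hypothetical converter: only strings are tried as int, then float."""
    if isinstance(v, str):
        for cast in (int, float):
            try:
                return cast(v)
            except ValueError:
                pass
    return v


def list_dicts_to_table(data, name, detect_types=False):
    """Sketch: header from the first dict's keys, then one row per dict."""
    # Prefix bare keys with the table name; '.' is an assumed delimiter
    header = [f if '.' in str(f) else f'{name}.{f}' for f in data[0]]
    if detect_types:
        rows = [[to_number(v) for v in r.values()] for r in data]
    else:
        rows = [list(r.values()) for r in data]
    return header, rows


header, rows = list_dicts_to_table(
    [{'name': 'Alice', 'age': '30'}, {'name': 'Bob', 'age': '25'}],
    name='people', detect_types=True)
print(header)  # ['people.name', 'people.age']
print(rows)    # [['Alice', 30], ['Bob', 25]]
```

Like the committed code, rows come from each dict's `.values()`, so every dict is assumed to list its keys in the same order as the first one.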

@@ -272,7 +288,7 @@ def import_list_dicts(self, data, name=None):
return self

# -------------------------------------------------------------------------------------------- #
def import_dict_lists(self, data, name=None):
def import_dict_lists(self, data, name=None, detect_types=False):
"""
Import a dictionary of lists
"""
@@ -290,7 +306,7 @@ def import_dict_lists(self, data, name=None):

for c, f in enumerate(data.keys()):
for r, v in enumerate(data[f]):
self.table[r][c] = v
self.table[r][c] = v if not detect_types else to_type(v)
self._redimension()
else:
raise ValueError('[email protected]_dict_lists: Unexpected data format')
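`import_dict_lists()` fills `self.table[r][c]` one column at a time and converts each value when `detect_types` is set. The same column-to-row transposition can be sketched with `zip()`. `dict_lists_to_rows()` and its inline converter are hypothetical.

```python
def dict_lists_to_rows(data, detect_types=False):
    """Sketch: turn {'col': [v1, v2, ...], ...} into rows [[v1, ...], [v2, ...]]."""
    def convert(v):
        # Hypothetical converter: strings are tried as int, then float
        if detect_types and isinstance(v, str):
            for cast in (int, float):
                try:
                    return cast(v)
                except ValueError:
                    pass
        return v

    # zip(*columns) transposes the column lists into row tuples
    return [[convert(v) for v in row] for row in zip(*data.values())]


print(dict_lists_to_rows({'name': ['Alice', 'Bob'], 'age': ['30', '25']}, detect_types=True))
# [['Alice', 30], ['Bob', 25]]
```

`zip()` stops at the shortest iterable, so this sketch assumes all columns have the same length and silently truncates them otherwise.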
@@ -299,14 +315,15 @@ def import_dict_lists(self, data, name=None):
return self

# -------------------------------------------------------------------------------------------- #
def import_list_lists(self, data, header=True, name=None):
def import_list_lists(self, data, header=True, name=None, detect_types=False):
"""
Import list(list_1(), list_n()) with optional first row as the header
:param data: Data to import formatted as list of lists
:param header: If true, data to import HAS a header
:param name: If not None, set it as the Table name
:return: self
:param data: Data to import formatted as list of lists
:param header: If true, data to import HAS a header
:param name: If not None, set it as the Table name
:param detect_types: Detect and correct types of data, default - False
:return: self
"""

# Set a new Table name if requested
@@ -315,7 +332,11 @@ def import_list_lists(self, data, header=True, name=None):

if isinstance(data, list) and len(data) and isinstance(data[0], list):
# TODO: Check all rows to be equal length
self.table = data[1:] if header else data
if not detect_types:
self.table = data[1:] if header else data
else:
self.table = [[to_type(v) for v in r] for r in data[1:]]

self._redimension()

# If table header is not properly initiated, make each column: "name.column"
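In the new `import_list_lists()` branch, the `detect_types` path always slices `data[1:]`, while the plain path honours `header`. As a result, calling it with `header=False, detect_types=True` drops the first data row. The hypothetical `list_lists_to_rows()` sketch below picks the body once and then converts it:

```python
def list_lists_to_rows(data, header=True, detect_types=False):
    """Sketch: drop the first row only when header=True, in both branches."""
    def convert(v):
        # Hypothetical converter: strings are tried as int, then float
        if isinstance(v, str):
            for cast in (int, float):
                try:
                    return cast(v)
                except ValueError:
                    pass
        return v

    body = data[1:] if header else data
    if not detect_types:
        return body
    return [[convert(v) for v in r] for r in body]


print(list_lists_to_rows([['1', '2'], ['3', '4']], header=False, detect_types=True))
# [[1, 2], [3, 4]]
```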
@@ -634,17 +655,18 @@ def select_lt(self, columns='*', where='', comp='==', val='', new_tname=''):
data=r_table + [[r[c] for c in r_columns] for r in self.table])

scol_idx = self.header.index(where)
_type = type(val)
return Table(name=new_tname if new_tname else
self.name + TNAME_TNAME_DELIMITER + str(self.timestamp),
data=r_table + [[r[c] for c in r_columns]
for r in self.table
if comp == '==' and r[scol_idx] == val or
comp == '!=' and r[scol_idx] != val or
comp == '>' and r[scol_idx] > val or
comp == '>=' and r[scol_idx] >= val or
comp == '<=' and r[scol_idx] <= val or
comp == 'in' and r[scol_idx] in val or
comp == 'not in' and r[scol_idx] not in val])
if comp == '==' and _type(r[scol_idx]) == val or
comp == '!=' and _type(r[scol_idx]) != val or
comp == '>' and _type(r[scol_idx]) > val or
comp == '>=' and _type(r[scol_idx]) >= val or
comp == '<=' and _type(r[scol_idx]) <= val or
comp == 'in' and _type(r[scol_idx]) in val or
comp == 'not in' and _type(r[scol_idx]) not in val])

# -------------------------------------------------------------------------------------------- #
def order_by(self, column='', direction=ORDER_BY_INC, new_tname=''):
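This `select_lt()` hunk casts each cell with `_type = type(val)` before comparing, so a numeric `val` compares numbers rather than strings. For `in` and `not in` with a list `val`, however, `_type` is `list`: `list('a') in ['a', 'b']` tests `['a'] in ['a', 'b']`, which is `False`. The operator chain also has no `'<'` case. The hypothetical `filter_rows()` sketch below looks up operators in a dict and casts cells only for equality and ordering comparisons, or to `str` for substring tests:

```python
import operator

# Hypothetical comparison table; '<' is included, although the committed
# select_lt() does not test for it
OPS = {
    '==': operator.eq, '!=': operator.ne,
    '>': operator.gt, '>=': operator.ge,
    '<': operator.lt, '<=': operator.le,
    'in': lambda cell, val: cell in val,
    'not in': lambda cell, val: cell not in val,
}


def filter_rows(rows, col_idx, comp, val):
    """Sketch: keep rows where `cell <comp> val` holds for column col_idx."""
    if comp in ('in', 'not in'):
        # val is a container: cast cells to str only for substring tests
        cast = str if isinstance(val, str) else (lambda cell: cell)
    else:
        cast = type(val)  # e.g. int('30') before comparing with 28
    op = OPS[comp]
    return [r for r in rows if op(cast(r[col_idx]), val)]


rows = [['a', '30'], ['b', '25'], ['c', '40']]
print(filter_rows(rows, 1, '>', 28))           # [['a', '30'], ['c', '40']]
print(filter_rows(rows, 0, 'in', ['a', 'c']))  # [['a', '30'], ['c', '40']]
```

As in the committed code, `type(val)(cell)` raises `ValueError` when a cell cannot be cast (for example `int('n/a')`), and a `bool` `val` turns every non-empty cell into `True`.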
