Import methods use `detect_types` to detect if autoconversion from `str` to `int`, `float` and `bool` is needed
mezantrop · Jul 2, 2024 · 8b189ad
1 parent 4a13dc6
commit 8b189ad
Showing 4 changed files with 88 additions and 51 deletions.
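
As a rough illustration of what the commit title describes, the sketch below converts string cells to `bool`, `int` or `float` and leaves everything else untouched. It is self-contained and is not the module's code: `convert_value` and `convert_rows` are hypothetical names chosen for this example, and the conversion rules are an assumption modelled on the `to_type` helper added in `tsqlike/tsqlike.py`.

```python
def convert_value(s):
    """Hypothetical stand-in for the conversion step: str -> bool, int or float."""
    if not isinstance(s, str):
        return s                            # already typed, leave untouched
    if s in ('True', 'true'):
        return True
    if s in ('False', 'false'):
        return False                        # note: bool('False') would be True
    try:
        return float(s) if '.' in s else int(s)
    except ValueError:
        return s                            # not numeric, keep the string


def convert_rows(rows):
    """Apply convert_value to every cell of a list of lists."""
    return [[convert_value(v) for v in row] for row in rows]


rows = [['1', '2.5', 'true'], ['x', 'False', '-3']]
typed = convert_rows(rows)
print(typed)
```

Booleans are matched by explicit comparison because any non-empty string, including `'False'`, is truthy in Python.
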
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,5 +1,8 @@
 # CHANGELOG
 
+* **2024.07.02    tSQLike-1.1.0**
+  * Import methods use `detect_types` to detect if autoconversion from `str` to `int`, `float` and `bool` is needed
+
 * **2024.06.28    tSQLike-1.0.4**
   * `select_lt()` respects empty arguments
 

diff --git a/README.md b/README.md
@@ -1,6 +1,8 @@
 # tSQLike
+
 [![Python package](https://github.com/mezantrop/tSQLike/actions/workflows/python-package.yml/badge.svg)](https://github.com/mezantrop/tSQLike/actions/workflows/python-package.yml)
 [![CodeQL](https://github.com/mezantrop/tSQLike/actions/workflows/codeql.yml/badge.svg)](https://github.com/mezantrop/tSQLike/actions/workflows/codeql.yml)
+
 ## SQL-like interface to tabular structured data
 
 **Not that early stage, but still in development: may contain bugs**
@@ -9,7 +11,7 @@
 
 ## Description
 
-**tSQLike** is a Python3 module that is written with a hope to make tabular data process easier using SQL-like primitives. 
+**tSQLike** is a Python3 module that is written with a hope to make tabular data process easier using SQL-like primitives.
 
 ## Usage
 
@@ -34,16 +36,18 @@ t3.write_csv(dialect='unix')
 
 ## Installation
 
-```
+```sh
 pip install tsqlike
 ```
 
 ## Functionality
 
 ### Table class
-The main class of the module 
+
+The main class of the module
 
 #### Data processing methods
+
 | Name        | Status  | Description                                                              |
 |-------------|---------|--------------------------------------------------------------------------|
 | `join`      | &#9745; | Join two Tables (`self` and `table`) on an expression [*](#Warning)      |
@@ -54,20 +58,23 @@ The main class of the module
 | `group_by`  | &#9745; | GROUP BY primitive of SQL SELECT to apply aggregate function on a column |
 
 #### Import methods
+
 | Name                | Status  | Description                                                             |
 |---------------------|---------|-------------------------------------------------------------------------|
-| `import_dict_lists` | &#9745; | Import a dictionary of lists into Table object                          | 
+| `import_dict_lists` | &#9745; | Import a dictionary of lists into Table object                          |
 | `import_list_dicts` | &#9745; | Import a list of horizontal arranged dictionaries into the `Table`      |
 | `import_list_lists` | &#9745; | Import `list(list_1(), list_n())` with optional first row as the header |
 
 #### Export methods
+
 | Name                | Status  | Description                                                             |
 |---------------------|---------|-------------------------------------------------------------------------|
 | `export_dict_lists` | &#9745; | Export a dictionary of lists                                            |
 | `export_list_dicts` | &#9745; | Export list of dictionaries                                             |
 | `export_list_lists` | &#9745; | Export `list(list_1(), list_n())` with optional first row as the header |
 
 #### Write methods
+
 | Name            | Status  | Description                                                         |
 |-----------------|---------|---------------------------------------------------------------------|
 | `write_csv`     | &#9745; | Make `CSV` from the `Table` object and write it to a file or stdout |
@@ -76,12 +83,13 @@ The main class of the module
 | `write_xml`     | &#9744; | Write `XML`. NB: Do we need this?                                   |
 
 #### Private methods
+
 | Name           | Status  | Description                               |
 |----------------|---------|-------------------------------------------|
 | `_redimension` | &#9745; | Recalculate dimensions of the Table.table |
 
-
 ### EvalCtrl class
+
 Controls what arguments are available to `eval()` function
 
 | Name               | Status  | Description                                              |
@@ -91,23 +99,27 @@ Controls what arguments are available to `eval()` function
 | `blacklist_remove` | &#9745; | Remove the word from the blacklist                       |
 
 ### Standalone functions
-| Name         | Status  | Description                               |
-|--------------|---------|-------------------------------------------|
-| `open_file`  | &#9745; | Open a file                               |
-| `close_file` | &#9745; | Close a file                              |
-| `read_json`  | &#9745; | Read `JSON` file                          |
-| `read_csv`   | &#9745; | Read `CSV` file                           |
-| `read_xml`   | &#9744; | Read `XML`. NB: Do we need XML support?   |
-
-#### WARNING!
-Methods `Table.join(on=)`, `Table.select(where=)` and `Table.write_json(export_f=)`, use `eval()` function 
-to run specified expressions within the program. **ANY** expression, including one that is potentially **DANGEROUS** 
+
+| Name         | Status  | Description                                              |
+|--------------|---------|----------------------------------------------------------|
+| `open_file`  | &#9745; | Open a file                                              |
+| `close_file` | &#9745; | Close a file                                             |
+| `read_json`  | &#9745; | Read `JSON` file                                         |
+| `read_csv`   | &#9745; | Read `CSV` file                                          |
+| `read_xml`   | &#9744; | Read `XML`. NB: Do we need XML support?                  |
+| `to_type`    | &#9745; | Convert a string to a proper type: int, float or boolean |
+
+#### WARNING
+
+Methods `Table.join(on=)`, `Table.select(where=)` and `Table.write_json(export_f=)`, use `eval()` function
+to run specified expressions within the program. **ANY** expression, including one that is potentially **DANGEROUS**
 from security point of view, can be passed as the values of the above arguments. It is your duty to ensure correctness
-and safety of these arguments and `EvalCtrl` helps to block potentially dangerous function/method names. 
+and safety of these arguments and `EvalCtrl` helps to block potentially dangerous function/method names.
 
 Alternatively you can use `Table.join_lt()`, `Table.select_lt()` and `Table.write_json()`. They are significantly less
 powerful, but do not use `eval()`.
 
 ## Contacts
-If you have an idea, a question, or have found a problem, do not hesitate to open an issue or mail me directly: 
+
+If you have an idea, a question, or have found a problem, do not hesitate to open an issue or mail me directly:
 Mikhail Zakharov <[email protected]>
diff --git a/tsqlike/__about__.py b/tsqlike/__about__.py
@@ -1,3 +1,3 @@
 """ Version number in a single place """
 
-__version__ = "1.0.4"
+__version__ = "1.1.0"
diff --git a/tsqlike/tsqlike.py b/tsqlike/tsqlike.py
@@ -92,16 +92,29 @@ def close_file(file):
     if file and file is not sys.stdout and file is not sys.stdin:
         file.close()
 
+# ------------------------------------------------------------------------------------------------ #
+def to_type(s):
+    """ Convert string s to a proper type: int, float or boolean """
+
+    if s in ('True', 'true', 'False', 'false'):                 # to boolean
+        return s in ('True', 'true')                            # bool('False') is True, so compare
+
+    try:
+        return float(s) if '.' in s or ',' in s else int(s)     # to float and int
+    except (ValueError, TypeError):
+        return s                                                # no conversion possible -> string
 
 # ------------------------------------------------------------------------------------------------ #
-def read_csv(in_file=None, encoding=None, newline='', name='', dialect='excel', **fmtparams):
+def read_csv(in_file=None, encoding=None, newline='', name='', detect_types=False,
+             dialect='excel', **fmtparams):
     """
     Read CSV from a file and import into a Table object
 
     :param in_file:         Filename to read CSV from
     :param encoding:        Character encoding
     :param newline:         UNIX/Windows/Mac style line ending
     :param name:            Table name to assign
+    :param detect_types:    Detect and correct types of data, default - False
     :param dialect:         CSV dialect, e.g: excel, unix
     :**fmtparams:           Various optional CSV parameters:
         :param delimiter:   CSV field delimiter
@@ -112,18 +125,19 @@ def read_csv(in_file=None, encoding=None, newline='', name='', dialect='excel',
 
     f = open_file(in_file, file_mode='r', encoding=encoding, newline=newline)
     _data = csv.reader(f, dialect=dialect, **fmtparams)
-    t = Table(data=list(_data), name=name)
+    t = Table(data=list(_data), name=name, detect_types=detect_types)
     close_file(f)
     return t
 
 
 # -------------------------------------------------------------------------------------------- #
-def read_json(in_file=None, name=''):
+def read_json(in_file=None, name='', detect_types=False):
     """ Read JSON data from file
 
-    :param in_file: Filename to read JSON from
-    :param name:    Table name to assign
-    :return         Table
+    :param in_file:         Filename to read JSON from
+    :param name:            Table name to assign
+    :param detect_types:    Detect and correct types of data, default - False
+    :return                 Table
     """
 
     _data = {}
@@ -132,7 +146,7 @@ def read_json(in_file=None, name=''):
         _data = json.load(f)
     except (IOError, OSError) as _err:
         print(f'[email protected]_json(): Unable to load JSON structure: {_err}')
-    t = Table(data=_data, name=name)
+    t = Table(data=_data, name=name, detect_types=detect_types)
     close_file(f)
     return t
 
@@ -201,7 +215,7 @@ class Table:
     """
 
     # -------------------------------------------------------------------------------------------- #
-    def __init__(self, data=None, name=None):
+    def __init__(self, data=None, name=None, detect_types=False):
         self.timestamp = int(time.time())
         self.name = name or str(self.timestamp)
 
@@ -212,13 +226,13 @@ def __init__(self, data=None, name=None):
             self.cols = 0
         elif isinstance(data, list) and len(data):
             if isinstance(data[0], dict):                   # list(dicts())
-                self.import_list_dicts(data)
+                self.import_list_dicts(data, detect_types=detect_types)
             if isinstance(data[0], list):                   # list(lists())
-                self.import_list_lists(data)
+                self.import_list_lists(data, detect_types=detect_types)
         elif isinstance(data, dict) and len(data):
             print(type(next(iter(data))))
             if isinstance(data[next(iter(data))], list):    # dict(lists()):
-                self.import_dict_lists(data)
+                self.import_dict_lists(data, detect_types=detect_types)
         else:
             raise ValueError('FATAL@Table.__init__: Unexpected data format')
 
@@ -244,14 +258,15 @@ def _redimension(self):
         self.cols = self.rows and len(self.table[0]) or 0
 
     # -- Import methods -------------------------------------------------------------------------- #
-    def import_list_dicts(self, data, name=None):
+    def import_list_dicts(self, data, name=None, detect_types=False):
         """
         Import a list of dictionaries
 
-        :alias:         import_thashes()
-        :param data:    Data to import formatted as list of dictionaries
-        :param name:    If not None, set it as the Table name
-        :return:        self
+        :alias:                 import_thashes()
+        :param data:            Data to import formatted as list of dictionaries
+        :param name:            If not None, set it as the Table name
+        :param detect_types:    Detect and correct types of data, default - False
+        :return:                self
         """
 
         # Set a new Table name if requested
@@ -262,7 +277,8 @@ def import_list_dicts(self, data, name=None):
             self.header = [self.name + TNAME_COLUMN_DELIMITER + str(f)
                            if TNAME_COLUMN_DELIMITER not in str(f) else f for f in (data[0].keys())]
 
-            self.table = [list(r.values()) for r in data]
+            self.table = [list(r.values()) for r in data] if not detect_types else [[to_type(v) for v in r.values()] for r in data]
+
         else:
             raise ValueError('FATAL@Table.import_list_dicts: Unexpected data format')
 
@@ -272,7 +288,7 @@ def import_list_dicts(self, data, name=None):
         return self
 
     # -------------------------------------------------------------------------------------------- #
-    def import_dict_lists(self, data, name=None):
+    def import_dict_lists(self, data, name=None, detect_types=False):
         """
         Import a dictionary of lists
         """
@@ -290,7 +306,7 @@ def import_dict_lists(self, data, name=None):
 
             for c, f in enumerate(data.keys()):
                 for r, v in enumerate(data[f]):
-                    self.table[r][c] = v
+                    self.table[r][c] = v if not detect_types else to_type(v)
             self._redimension()
         else:
             raise ValueError('FATAL@Table.import_dict_lists: Unexpected data format')
@@ -299,14 +315,15 @@ def import_dict_lists(self, data, name=None):
         return self
 
     # -------------------------------------------------------------------------------------------- #
-    def import_list_lists(self, data, header=True, name=None):
+    def import_list_lists(self, data, header=True, name=None, detect_types=False):
         """
         Import list(list_1(), list_n()) with optional first row as the header
 
-        :param data:    Data to import formatted as list of lists
-        :param header:  If true, data to import HAS a header
-        :param name:    If not None, set it as the Table name
-        :return:        self
+        :param data:            Data to import formatted as list of lists
+        :param header:          If true, data to import HAS a header
+        :param name:            If not None, set it as the Table name
+        :param detect_types:    Detect and correct types of data, default - False
+        :return:                self
         """
 
         # Set a new Table name if requested
@@ -315,7 +332,11 @@ def import_list_lists(self, data, header=True, name=None):
 
         if isinstance(data, list) and len(data) and isinstance(data[0], list):
             # TODO: Check all rows to be equal length
-            self.table = data[1:] if header else data
+            if not detect_types:
+                self.table = data[1:] if header else data
+            else:
+                self.table = [[to_type(v) for v in r] for r in (data[1:] if header else data)]
+
             self._redimension()
 
             # If table header is not properly initiated, make each column: "name.column"
@@ -634,17 +655,18 @@ def select_lt(self, columns='*', where='', comp='==', val='', new_tname=''):
                          data=r_table + [[r[c] for c in r_columns] for r in self.table])
 
         scol_idx = self.header.index(where)
+        _type = type(val)
         return Table(name=new_tname if new_tname else
                      self.name + TNAME_TNAME_DELIMITER + str(self.timestamp),
                      data=r_table + [[r[c] for c in r_columns]
                                      for r in self.table
-                                     if comp == '==' and r[scol_idx] == val or
-                                     comp == '!=' and r[scol_idx] != val or
-                                     comp == '>' and r[scol_idx] > val or
-                                     comp == '>=' and r[scol_idx] >= val or
-                                     comp == '<=' and r[scol_idx] <= val or
-                                     comp == 'in' and r[scol_idx] in val or
-                                     comp == 'not in' and r[scol_idx] not in val])
+                                     if comp == '==' and _type(r[scol_idx]) == val or
+                                     comp == '!=' and _type(r[scol_idx]) != val or
+                                     comp == '>' and _type(r[scol_idx]) > val or
+                                     comp == '>=' and _type(r[scol_idx]) >= val or
+                                     comp == '<=' and _type(r[scol_idx]) <= val or
+                                     comp == 'in' and _type(r[scol_idx]) in val or
+                                     comp == 'not in' and _type(r[scol_idx]) not in val])
 
     # -------------------------------------------------------------------------------------------- #
     def order_by(self, column='', direction=ORDER_BY_INC, new_tname=''):
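
The `select_lt()` hunk above coerces each cell to `type(val)` before comparing. A minimal sketch of the same idea, written as a lookup table over the `operator` module, might look like the following. `COMPARATORS` and `compare_cell` are hypothetical names, not part of tSQLike, and two choices here are assumptions rather than the module's behaviour: a cell that cannot be coerced is treated as a non-match, and cells are left uncoerced for `in` / `not in`, where `val` is a container.

```python
import operator

# Comparison names as they appear in the select_lt() hunk, mapped to functions.
COMPARATORS = {
    '==': operator.eq,
    '!=': operator.ne,
    '>': operator.gt,
    '>=': operator.ge,
    '<=': operator.le,
    'in': lambda cell, val: cell in val,
    'not in': lambda cell, val: cell not in val,
}


def compare_cell(cell, comp, val):
    """Hypothetical helper: compare a table cell with val, coercing the cell to type(val)."""
    if comp not in COMPARATORS:
        raise ValueError(f'Unsupported comparison: {comp!r}')
    if comp not in ('in', 'not in'):
        try:
            cell = type(val)(cell)      # e.g. '42' -> 42 when val is an int
        except (ValueError, TypeError):
            return False                # cell cannot be coerced, so it does not match
    return COMPARATORS[comp](cell, val)


rows = [['alice', '42'], ['bob', '7'], ['carol', 'n/a']]
adults = [r for r in rows if compare_cell(r[1], '>=', 18)]
print(adults)
```

A dispatch table keeps the comparison logic in one place and makes an unsupported `comp` fail loudly instead of silently selecting no rows.
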