helpers
Helper functions for schema generation and other utilities.
Functions
init_opt_batch_size(file_size)
Initialize the optimal batch size for the given file size.
Source code in src/db2ixf/helpers.py
get_pyarrow_schema(cols)
Create a pyarrow schema from the column descriptors extracted from an IXF file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
cols | List[OrderedDict] | List of column descriptors extracted from IXF file. | required |
Returns:
Type | Description |
---|---|
Schema | Pyarrow Schema extracted from columns description. |
get_ccsid_from_column(column)
Get the coded character set identifiers (CCSIDs) of a column, that is, the code pages used for its single-byte and double-byte data.
deltalake_fix_ns_timestamps(pyarrow_schema)
Fix issue with timestamps in deltalake.
Deltalake does not yet support timestamps in nanoseconds, so this function changes pyarrow timestamp datatypes with the nanosecond unit to the microsecond unit, which deltalake supports.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pyarrow_schema | Schema | Pyarrow schema | required |
Returns:
Name | Type | Description |
---|---|---|
Schema | Schema | Pyarrow schema with fix |
deltalake_fix_time(pyarrow_schema)
Fix issue with time in deltalake.
Deltalake does not support the time datatype, so this function uses strings as a temporary workaround. Pyarrow has time32 and time64 datatypes, but casting them to timestamp, which deltalake does support, is complicated for now. This function therefore replaces pyarrow time datatypes with the pyarrow string datatype until deltalake either allows casting a pyarrow time datatype to a datetime or supports the pyarrow time datatype directly.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pyarrow_schema | Schema | Pyarrow schema | required |
Returns:
Type | Description |
---|---|
Schema | Pyarrow schema with the fix. |
apply_schema_fixes(pyarrow_schema)
Apply all fixes on pyarrow schema to adapt to deltalake.
Works around deltalake's lack of support for the nanosecond unit in time and timestamp datatypes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pyarrow_schema | Schema | Pyarrow schema | required |
Returns:
Name | Type | Description |
---|---|---|
Schema | Schema | Pyarrow schema with all fixes |
to_pyarrow_record_batch(batch, pyarrow_schema)
Transform a batch of column values into a pyarrow record batch.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
batch | DefaultOrderedDict | Dictionary of type Dict[str, list] | required |
pyarrow_schema | Schema | Pyarrow schema | required |
Returns:
Type | Description |
---|---|
RecordBatch | Pyarrow record batch |
decode_cell(cell, cp, cpt='s')
Try to decode the cell using the provided codepage.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
cell | str | Field containing data | required |
cp | int | IBM code page | required |
cpt | Literal['s', 'd'] | Defaults to 's'. | 's' |
Returns:
Type | Description |
---|---|
str | Decoded cell |
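
The idea of decoding with a numeric code page can be sketched with Python's built-in codecs. This is an assumption-laden illustration rather than the library's logic: the function name `decode_with_codepage` is hypothetical, the raw cell is assumed to be available as `bytes`, the code page number is assumed to map onto a Python codec named `cpNNN`, and the UTF-8 fallback is a choice made for this example only. The `cpt` parameter is not modelled here.

```python
def decode_with_codepage(raw: bytes, cp: int) -> str:
    """Decode `raw` using the Python codec that matches code page `cp`.

    Hypothetical helper: falls back to UTF-8 with replacement characters
    when the codec is unknown or the bytes do not fit the code page.
    """
    # Python names its code page codecs "cp037", "cp850", "cp1252", ...
    codec = f"cp{cp:03d}"
    try:
        return raw.decode(codec)
    except (LookupError, UnicodeDecodeError):
        return raw.decode("utf-8", errors="replace")


# EBCDIC (code page 37) bytes for the text "Hello".
ebcdic_hello = "Hello".encode("cp037")
decoded = decode_with_codepage(ebcdic_hello, 37)
```

Not every IBM code page has a matching Python codec, so a real implementation would likely need an explicit mapping for the code pages it has to support.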