Getting Started
To begin using the IXF Parser package, follow the installation instructions below.
Installation
Ensure that you have set up and activated a Python virtual environment, then install the package into it.
Examples
Below are examples demonstrating how to use the IXF Parser package:
Parsing an IXF File
You can parse an IXF file by providing a file-like object or a path to the file. When the IXFParser is initialized with a file-like object f, the get_row method retrieves the parsed rows as dictionaries. get_row is a Python generator, which helps when you deal with big files. get_all_rows loads all rows into memory as a Python list, so use it only when you have small files.
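The difference between a generator and an in-memory list can be sketched with plain Python. The two functions below are hypothetical stand-ins written for this illustration, not the package's own get_row and get_all_rows, and the row contents are made up.

```python
from typing import Dict, Iterator, List


def get_row_sketch(n_rows: int) -> Iterator[Dict[str, int]]:
    """Hypothetical stand-in for a generator-style reader: yields one row at a time."""
    for i in range(n_rows):
        yield {"id": i, "value": i * 10}


def get_all_rows_sketch(n_rows: int) -> List[Dict[str, int]]:
    """Hypothetical stand-in for an eager reader: builds the full list in memory."""
    return list(get_row_sketch(n_rows))


# Streaming: only one row is held at a time, so this scales to large inputs.
total = 0
for row in get_row_sketch(1_000):
    total += row["value"]

# Eager: convenient for small inputs because the list supports len() and indexing.
rows = get_all_rows_sketch(3)
print(total, len(rows), rows[-1])
```

The streaming loop never materializes more than one dictionary, which is why the generator form is the safer default for large files.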
Converting to JSON
You can convert the parsed data to JSON format and save it to a file. The to_json method converts the parsed data to JSON and saves it to the specified output file.
Converting to JSONLINE
You can convert the parsed data to JSONLINE format and save it to a file. The to_jsonline method converts the parsed data to JSONLINE and saves it to the specified output file.
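To make the difference between the two output formats concrete, here is a minimal sketch using only the standard library. It assumes the usual meaning of each format (one JSON document versus one JSON object per line); the sample rows are invented and the code does not call the package's to_json or to_jsonline methods.

```python
import io
import json

rows = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Linus"}]

# JSON: the whole dataset is one document (an array of objects here).
json_buffer = io.StringIO()
json.dump(rows, json_buffer)

# JSON Lines: one self-contained JSON object per line.
jsonl_buffer = io.StringIO()
for row in rows:
    jsonl_buffer.write(json.dumps(row) + "\n")

json_text = json_buffer.getvalue()
jsonl_text = jsonl_buffer.getvalue()
print(json_text)
print(jsonl_text, end="")
```

A line-oriented file can be read back row by row, which pairs naturally with generator-style parsing of large inputs.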
Converting to CSV
You can also convert the parsed data to CSV format and save it to a file. The to_csv method converts the parsed data to CSV and saves it to the specified output file. The sep parameter specifies the separator/delimiter to be used in the CSV file.
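The effect of a separator choice can be illustrated with the standard csv module. This is only a sketch of what a delimiter does to the output; the rows and the "|" separator are arbitrary examples, and the code does not use the package's to_csv method.

```python
import csv
import io

rows = [{"id": 1, "city": "Paris"}, {"id": 2, "city": "Rabat"}]
sep = "|"  # example separator; any single character works

buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["id", "city"], delimiter=sep, lineterminator="\n"
)
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text, end="")
```

Picking a separator that cannot appear in the data avoids the need for quoting in the output file.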
Converting to Parquet
If you prefer to store the parsed data in Parquet format, the to_parquet method converts the parsed data to Parquet and saves it to the specified output file.
Converting to Deltalake
If you prefer to store the parsed data in Deltalake format, the to_deltalake method converts the parsed data to Deltalake and saves it to the specified output path.
The output path can also be a string, but a Path is better when you work on a local filesystem. A string is typically used for remote storage; in that case you can either pass the filesystem argument or let the deltalake package infer the filesystem from the URI.
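The distinction between a local Path and a URI string can be sketched as follows. The helper describe_target is hypothetical and exists only for this illustration; it is not how the package or deltalake resolves a filesystem, and the bucket name is made up.

```python
from pathlib import Path
from typing import Union
from urllib.parse import urlparse


def describe_target(target: Union[str, Path]) -> str:
    """Hypothetical helper: classify an output target as local or remote."""
    if isinstance(target, Path):
        return "local"
    scheme = urlparse(target).scheme
    # A URI scheme such as "s3" hints at remote storage;
    # no scheme (or "file") means the local filesystem.
    return "local" if scheme in ("", "file") else f"remote ({scheme})"


print(describe_target(Path("out/table")))
print(describe_target("s3://bucket/table"))
```

The scheme is the piece of information from which a storage backend can be inferred when no explicit filesystem is given.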
The IXF Parser package provides flexibility in terms of input and output options, allowing you to easily parse and process IXF files according to your needs.
Precautions
There are cases where parsing can fail, and some of them can lead to data loss:
- Completely corrupted ixf file: this is usually an extraction issue.
- Partially corrupted ixf file: it contains some corrupted rows/lines that the parser cannot parse.
  - The parser calculates the rate of corrupted rows and compares it to an accepted rate, which you can set with the environment variable DB2IXF_ACCEPTED_CORRUPTION_RATE (an integer percentage, default 1).
  - If the rate of corrupted rows is bigger than the accepted rate, the parser raises an exception.
- Unsupported data type: please contact the owners/maintainers/contributors to get help; any PR is also welcome.
- Encoding issues: parsing can lead to data loss when the found or detected encoding is not able to decode some extracted fields/columns.
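The corruption-rate check described above can be sketched as follows. Only the environment variable name and its default of 1 percent come from this page; the function, the exception class, and the exact comparison are assumptions made for illustration and may differ from the package's implementation.

```python
import os


class CorruptionRateExceeded(Exception):
    """Hypothetical exception type used only in this sketch."""


def check_corruption_rate(corrupted: int, total: int) -> float:
    """Return the corruption rate in percent, or raise if it is too high."""
    # Accepted rate in percent, read from the environment; defaults to 1.
    accepted = int(os.environ.get("DB2IXF_ACCEPTED_CORRUPTION_RATE", "1"))
    rate = 100 * corrupted / total if total else 0.0
    if rate > accepted:
        raise CorruptionRateExceeded(
            f"{rate:.2f}% corrupted rows > accepted {accepted}%"
        )
    return rate


print(check_corruption_rate(corrupted=1, total=200))
```

Raising the accepted rate makes the parser more tolerant, at the cost of silently dropping more rows, so it is worth choosing the value deliberately.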
When decoding, the parser tries several encodings in turn.
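Trying several encodings in turn, and the way data loss appears when none of them fits, can be sketched like this. The candidate list is hypothetical and chosen only for illustration; it is not the package's actual list or order, and decode_with_fallback is not part of its API.

```python
from typing import Sequence, Tuple

# Hypothetical candidate list for illustration; not the package's actual order.
CANDIDATES = ("utf-8", "cp1252", "latin-1")


def decode_with_fallback(
    raw: bytes, encodings: Sequence[str] = CANDIDATES
) -> Tuple[str, str]:
    """Return the decoded text and the encoding that was used."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Last resort: undecodable bytes become U+FFFD, which is where data is lost.
    return raw.decode(encodings[0], errors="replace"), encodings[0]


print(decode_with_fallback("café".encode("utf-8")))
print(decode_with_fallback(b"caf\xe9"))
```

A replacement character in the output is the visible sign that bytes were dropped, which is what a debug run should look for.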
Before using the package in production, test it in debug mode so that you can detect data loss.
CLI
Start by displaying the help of the command-line tool with db2ixf --help.
The db2ixf command-line tool (CLI) is used for parsing and converting IXF (IBM DB2 Import/Export Format) files to various formats such as JSON, JSONLINE, CSV and Parquet. It provides an easy way to parse and convert IXF files to meet your data processing needs.
Options:
- --version or -v: Show the version of the CLI.
- --install-completion: Install completion for the current shell.
- --show-completion: Show completion for the current shell, to copy it or customize the installation.
- --help: Show this message and exit.
Commands:
- csv: Parse the specified ixf FILE and convert it to a CSV OUTPUT.
- json: Parse the specified ixf FILE and convert it to a JSON OUTPUT.
- jsonline: Parse the specified ixf FILE and convert it to a JSONLINE OUTPUT.
- parquet: Parse the specified ixf FILE and convert it to a Parquet OUTPUT.
This CLI tool is made with love ! ❤️
Examples
There are 4 commands, each one related to an output format. db2ixf supports only json, jsonline, csv and parquet.
Note
In the example above, the output file is created in the directory where you launch the command, and it has the same name as the ixf file.
Each of the four commands takes the ixf FILE to parse and writes the converted OUTPUT in its own format.
Tip
Before using one of the examples, please try db2ixf <command> --help to get details on how to use the command.
Info
The CLI does not support the Deltalake format. If you need this support, please create an issue on GitHub.