同步操作将从 Apache SeaTunnel/SeaTunnel 强制同步,此操作会覆盖自 Fork 仓库以来所做的任何修改,且无法恢复!!!
确定后同步将在后台操作,完成时将刷新页面,请耐心等待。
Apache Iceberg source connector
Spark
Flink
SeaTunnel Zeta
Source connector for Apache Iceberg. It can support batch and stream mode.
Datasource | Dependent | Maven |
---|---|---|
Iceberg | flink-shaded-hadoop | Download |
Iceberg | hive-exec | Download |
Iceberg | libfb303 | Download |
In order to be compatible with different versions of Hadoop and Hive, the scope of hive-exec and flink-shaded-hadoop-2 in the project pom file are provided, so if you use the Flink engine, first you may need to add the following Jar packages to <FLINK_HOME>/lib directory, if you are using the Spark engine and integrated with Hadoop, then you do not need to add the following Jar packages.
flink-shaded-hadoop-x-xxx.jar
hive-exec-xxx.jar
libfb303-xxx.jar
Some versions of the hive-exec package do not have libfb303-xxx.jar, so you also need to manually import the Jar package.
Iceberg Data type | SeaTunnel Data type |
---|---|
BOOLEAN | BOOLEAN |
INTEGER | INT |
LONG | BIGINT |
FLOAT | FLOAT |
DOUBLE | DOUBLE |
DATE | DATE |
TIME | TIME |
TIMESTAMP | TIMESTAMP |
STRING | STRING |
FIXED BINARY |
BYTES |
DECIMAL | DECIMAL |
STRUCT | ROW |
LIST | ARRAY |
MAP | MAP |
Name | Type | Required | Default | Description |
---|---|---|---|---|
catalog_name | string | yes | - | User-specified catalog name. |
catalog_type | string | yes | - | The optional values are: hive(The hive metastore catalog),hadoop(The hadoop catalog) |
uri | string | no | - | The Hive metastore’s thrift URI. |
warehouse | string | yes | - | The location to store metadata files and data files. |
namespace | string | yes | - | The iceberg database name in the backend catalog. |
table | string | yes | - | The iceberg table name in the backend catalog. |
schema | config | no | - | Use projection to select data columns and columns order. |
case_sensitive | boolean | no | false | If data columns where selected via schema [config], controls whether the match to the schema will be done with case sensitivity. |
start_snapshot_timestamp | long | no | - | Instructs this scan to look for changes starting from the most recent snapshot for the table as of the timestamp. timestamp – the timestamp in millis since the Unix epoch |
start_snapshot_id | long | no | - | Instructs this scan to look for changes starting from a particular snapshot (exclusive). |
end_snapshot_id | long | no | - | Instructs this scan to look for changes up to a particular snapshot (inclusive). |
use_snapshot_id | long | no | - | Instructs this scan to look for use the given snapshot ID. |
use_snapshot_timestamp | long | no | - | Instructs this scan to look for use the most recent snapshot as of the given time in milliseconds. timestamp – the timestamp in millis since the Unix epoch |
stream_scan_strategy | enum | no | FROM_LATEST_SNAPSHOT | Starting strategy for stream mode execution, Default to use FROM_LATEST_SNAPSHOT if don’t specify any value,The optional values are:TABLE_SCAN_THEN_INCREMENTAL: Do a regular table scan then switch to the incremental mode. FROM_LATEST_SNAPSHOT: Start incremental mode from the latest snapshot inclusive. FROM_EARLIEST_SNAPSHOT: Start incremental mode from the earliest snapshot inclusive. FROM_SNAPSHOT_ID: Start incremental mode from a snapshot with a specific id inclusive. FROM_SNAPSHOT_TIMESTAMP: Start incremental mode from a snapshot with a specific timestamp inclusive. |
common-options | no | - | Source plugin common parameters, please refer to Source Common Options for details. |
env {
execution.parallelism = 2
job.mode = "BATCH"
}
source {
Iceberg {
schema {
fields {
f2 = "boolean"
f1 = "bigint"
f3 = "int"
f4 = "bigint"
f5 = "float"
f6 = "double"
f7 = "date"
f9 = "timestamp"
f10 = "timestamp"
f11 = "string"
f12 = "bytes"
f13 = "bytes"
f14 = "decimal(19,9)"
f15 = "array<int>"
f16 = "map<string, int>"
}
}
catalog_name = "seatunnel"
catalog_type = "hadoop"
warehouse = "file:///tmp/seatunnel/iceberg/hadoop/"
namespace = "database1"
table = "source"
result_table_name = "iceberg"
}
}
transform {
}
sink {
Console {
source_table_name = "iceberg"
}
}
source {
Iceberg {
catalog_name = "seatunnel"
catalog_type = "hive"
uri = "thrift://localhost:9083"
warehouse = "hdfs://your_cluster//tmp/seatunnel/iceberg/"
namespace = "your_iceberg_database"
table = "your_iceberg_table"
}
}
source {
Iceberg {
catalog_name = "seatunnel"
catalog_type = "hadoop"
warehouse = "hdfs://your_cluster/tmp/seatunnel/iceberg/"
namespace = "your_iceberg_database"
table = "your_iceberg_table"
schema {
fields {
f2 = "boolean"
f1 = "bigint"
f3 = "int"
f4 = "bigint"
}
}
}
}
此处可能存在不合适展示的内容,页面不予展示。您可通过相关编辑功能自查并修改。
如您确认内容无涉及 不当用语 / 纯广告导流 / 暴力 / 低俗色情 / 侵权 / 盗版 / 虚假 / 无价值内容或违法国家有关法律法规的内容,可点击提交进行申诉,我们将尽快为您处理。