同步操作将从 Apache SeaTunnel/SeaTunnel 强制同步,此操作会覆盖自 Fork 仓库以来所做的任何修改,且无法恢复!!!
确定后同步将在后台操作,完成时将刷新页面,请耐心等待。
Clickhouse sink connector
Spark
Flink
SeaTunnel Zeta
The Clickhouse sink plug-in can achieve accuracy once by implementing idempotent writing, and needs to cooperate with aggregatingmergetree and other engines that support deduplication.
Used to write data to Clickhouse.
In order to use the Clickhouse connector, the following dependencies are required. They can be downloaded via install-plugin.sh or from the Maven central repository.
Datasource | Supported Versions | Dependency |
---|---|---|
Clickhouse | universal | Download |
SeaTunnel Data type | Clickhouse Data type |
---|---|
STRING | String / Int128 / UInt128 / Int256 / UInt256 / Point / Ring / Polygon MultiPolygon |
INT | Int8 / UInt8 / Int16 / UInt16 / Int32 |
BIGINT | UInt64 / Int64 / IntervalYear / IntervalQuarter / IntervalMonth / IntervalWeek / IntervalDay / IntervalHour / IntervalMinute / IntervalSecond |
DOUBLE | Float64 |
DECIMAL | Decimal |
FLOAT | Float32 |
DATE | Date |
TIME | DateTime |
ARRAY | Array |
MAP | Map |
Name | Type | Required | Default | Description |
---|---|---|---|---|
host | String | Yes | - |
ClickHouse cluster address, the format is host:port , allowing multiple hosts to be specified. Such as "host1:8123,host2:8123" . |
database | String | Yes | - | The ClickHouse database. |
table | String | Yes | - | The table name. |
username | String | Yes | - |
ClickHouse user username. |
password | String | Yes | - |
ClickHouse user password. |
clickhouse.config | Map | No | In addition to the above mandatory parameters that must be specified by clickhouse-jdbc , users can also specify multiple optional parameters, which cover all the parameters provided by clickhouse-jdbc . |
|
bulk_size | String | No | 20000 | The number of rows written through Clickhouse-jdbc each time, the default is 20000 . |
split_mode | String | No | false | This mode only support clickhouse table which engine is 'Distributed'.And internal_replication option-should be true .They will split distributed table data in seatunnel and perform write directly on each shard. The shard weight define is clickhouse will counted. |
sharding_key | String | No | - | When use split_mode, which node to send data to is a problem, the default is random selection, but the 'sharding_key' parameter can be used to specify the field for the sharding algorithm. This option only worked when 'split_mode' is true. |
primary_key | String | No | - | Mark the primary key column from clickhouse table, and based on primary key execute INSERT/UPDATE/DELETE to clickhouse table. |
support_upsert | Boolean | No | false | Support upsert row by query primary key. |
allow_experimental_lightweight_delete | Boolean | No | false | Allow experimental lightweight delete based on *MergeTree table engine. |
common-options | No | - | Sink plugin common parameters, please refer to Sink Common Options for details. |
The following example demonstrates how to create a data synchronization job that writes randomly generated data to a Clickhouse database:
# Set the basic configuration of the task to be performed
env {
execution.parallelism = 1
job.mode = "BATCH"
checkpoint.interval = 1000
}
source {
FakeSource {
row.num = 2
bigint.min = 0
bigint.max = 10000000
split.num = 1
split.read-interval = 300
schema {
fields {
c_bigint = bigint
}
}
}
}
sink {
Clickhouse {
host = "127.0.0.1:9092"
database = "default"
table = "test"
username = "xxxxx"
password = "xxxxx"
}
}
1.SeaTunnel Deployment Document.
2.The table to be written to needs to be created in advance before synchronization.
3.When sink is writing to the ClickHouse table, you don't need to set its schema because the connector will query ClickHouse for the current table's schema information before writing.
sink {
Clickhouse {
host = "localhost:8123"
database = "default"
table = "fake_all"
username = "xxxxx"
password = "xxxxx"
clickhouse.config = {
max_rows_to_read = "100"
read_overflow_mode = "throw"
}
}
}
sink {
Clickhouse {
host = "localhost:8123"
database = "default"
table = "fake_all"
username = "xxxxx"
password = "xxxxx"
# split mode options
split_mode = true
sharding_key = "age"
}
}
sink {
Clickhouse {
host = "localhost:8123"
database = "default"
table = "fake_all"
username = "xxxxx"
password = "xxxxx"
# cdc options
primary_key = "id"
support_upsert = true
}
}
sink {
Clickhouse {
host = "localhost:8123"
database = "default"
table = "fake_all"
username = "xxxxx"
password = "xxxxx"
# cdc options
primary_key = "id"
support_upsert = true
allow_experimental_lightweight_delete = true
}
}
此处可能存在不合适展示的内容,页面不予展示。您可通过相关编辑功能自查并修改。
如您确认内容无涉及 不当用语 / 纯广告导流 / 暴力 / 低俗色情 / 侵权 / 盗版 / 虚假 / 无价值内容或违法国家有关法律法规的内容,可点击提交进行申诉,我们将尽快为您处理。