Data Pipeline

Definition

Since events in Flink CDC flow from upstream to downstream in a pipeline manner, the whole ETL task is referred to as a Data Pipeline.

Parameters

A pipeline corresponds to a chain of operators in Flink.
To describe a Data Pipeline, the following parts are required:

  • [source]({{< ref "docs/core-concept/data-source" >}})
  • [sink]({{< ref "docs/core-concept/data-sink" >}})
  • pipeline

The following parts are optional:

  • [route]({{< ref "docs/core-concept/route" >}})
  • [transform]({{< ref "docs/core-concept/transform" >}})

Example

Required parts only

The following YAML file defines a concise Data Pipeline that synchronizes all tables in the MySQL app_db database to Doris:

   source:
     type: mysql
     hostname: localhost
     port: 3306
     username: root
     password: 123456
     tables: app_db.\.*

   sink:
     type: doris
     fenodes: 127.0.0.1:8030
     username: root
     password: ""

   pipeline:
     name: Sync MySQL Database to Doris
     parallelism: 2

With optional parts

The following YAML file defines a more elaborate Data Pipeline that synchronizes all tables in the MySQL app_db database to Doris. Its route rules send the tables to the target database ods_db and add the prefix ods_ to each target table name:

   source:
     type: mysql
     hostname: localhost
     port: 3306
     username: root
     password: 123456
     tables: app_db.\.*

   sink:
     type: doris
     fenodes: 127.0.0.1:8030
     username: root
     password: ""

   route:
     - source-table: app_db.orders
       sink-table: ods_db.ods_orders
     - source-table: app_db.shipments
       sink-table: ods_db.ods_shipments
     - source-table: app_db.products
       sink-table: ods_db.ods_products

   pipeline:
     name: Sync MySQL Database to Doris
     parallelism: 2
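
Each route rule maps one source table to one sink table, so a table without a rule is not renamed. As a hedged sketch, a single rule from the example above could also carry a short note about its purpose. The `description` key is an assumption here (it does not appear on this page), so check the route page before relying on it:

```yaml
route:
  - source-table: app_db.orders
    sink-table: ods_db.ods_orders
    # Assumption: `description` is an optional free-text field on a route rule;
    # it is not documented on this page.
    description: route orders into ods_db with the ods_ prefix
```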

Pipeline Configurations

The following pipeline-level configuration options are supported:

| parameter | meaning | optional/required |
| --- | --- | --- |
| name | The name of the pipeline, which is submitted to the Flink cluster as the job name. | optional |
| parallelism | The global parallelism of the pipeline. | required |
| local-time-zone | The time zone ID of the current session. | optional |
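
To illustrate the options listed above, here is a minimal sketch of a `pipeline` block that sets all three. The name and parallelism are taken from the earlier examples. The time zone value `Asia/Shanghai` is only an assumption about the accepted format (a full zone name), chosen for illustration:

```yaml
pipeline:
  # Optional: shown as the job name on the Flink cluster.
  name: Sync MySQL Database to Doris
  # Required: global parallelism of the pipeline.
  parallelism: 2
  # Optional: session time zone ID. The value format is assumed, not confirmed on this page.
  local-time-zone: Asia/Shanghai
```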