Below are the instructions for connecting a PostgreSQL server. The installation steps are the same for all kinds of servers; different servers only require different configurations in the .yaml or DAG files. See https://docs.open-metadata.org/integrations/connectors for your connector's configuration.
Note: This procedure does not support Windows, because "signal.SIGALRM" is not implemented on Windows. If you are on Windows, it is highly recommended to use WSL 2.
See the "Requirements" section of https://docs.open-metadata.org/overview/run-openmetadata-with-prefect
Clone the OpenMetadata GitHub repo:
git clone https://github.com/open-metadata/OpenMetadata.git
Change into the ~/.../openmetadata/docker/metadata directory:
Start the OpenMetadata containers. This allows you to run OpenMetadata in Docker:
docker compose up -d
docker compose ps
(Optional but highly recommended) Before installing the packages below, create and activate a virtual environment:
python -m venv env
source env/bin/activate
To install the OpenMetadata ingestion package (pin the release version to ensure compatibility):
pip install --upgrade "openmetadata-ingestion[docker]==0.10.3"
pip3 install "openmetadata-ingestion[airflow-container]==0.10.3"
pip3 install "openmetadata-ingestion[postgres]==0.10.3"
pip3 install "openmetadata-airflow-managed-apis==0.10.3"
Copy the DAG templates shipped with the plugin and create a directory for the generated configs (replace {AIRFLOW_HOME} with your Airflow home directory):
cp -r {AIRFLOW_HOME}/plugins/dag_templates {AIRFLOW_HOME}
mkdir -p {AIRFLOW_HOME}/dag_generated_configs
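If you prefer to script this setup, the two commands above can be mirrored with Python's standard library. This is only a sketch against a throwaway directory; the template file name dag_runner.j2 is an assumption for illustration:

```python
import shutil
import tempfile
from pathlib import Path

# Stand-in for $AIRFLOW_HOME; in a real setup point this at your Airflow home.
airflow_home = Path(tempfile.mkdtemp())

# Pretend the plugin shipped a dag_templates directory with one template file
# (dag_runner.j2 is a hypothetical name used only for this demo).
templates_src = airflow_home / "plugins" / "dag_templates"
templates_src.mkdir(parents=True)
(templates_src / "dag_runner.j2").write_text("# template placeholder\n")

# Equivalent of: cp -r {AIRFLOW_HOME}/plugins/dag_templates {AIRFLOW_HOME}
shutil.copytree(templates_src, airflow_home / "dag_templates")

# Equivalent of: mkdir -p {AIRFLOW_HOME}/dag_generated_configs
(airflow_home / "dag_generated_configs").mkdir(parents=True, exist_ok=True)

print(sorted(p.name for p in airflow_home.iterdir()))
# → ['dag_generated_configs', 'dag_templates', 'plugins']
```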
import pathlib
import json
from datetime import timedelta
from airflow import DAG
try:
    from airflow.operators.python import PythonOperator
except ModuleNotFoundError:
    from airflow.operators.python_operator import PythonOperator
from metadata.config.common import load_config_file
from metadata.ingestion.api.workflow import Workflow
from airflow.utils.dates import days_ago
default_args = {
    "owner": "user_name",
    "email": ["username@org.com"],
    "email_on_failure": False,
    "retries": 3,
    "retry_delay": timedelta(minutes=5),
    "execution_timeout": timedelta(minutes=60),
}
# Change username, password, hostPort, and database to match your server.
# The five sourceConfig flags each accept "true" or "false".
# Note: inline notes are not valid JSON, so keep comments outside the string.
config = """
{
    "source": {
        "type": "postgres",
        "serviceName": "postgres_demo",
        "serviceConnection": {
            "config": {
                "type": "Postgres",
                "username": "postgres",
                "password": "postgres",
                "hostPort": "192.168.1.55:5432",
                "database": "surveillance_hub"
            }
        },
        "sourceConfig": {
            "config": {
                "enableDataProfiler": "true",
                "markDeletedTables": "true",
                "includeTables": "true",
                "includeViews": "true",
                "generateSampleData": "true"
            }
        }
    },
    "sink": {
        "type": "metadata-rest",
        "config": {}
    },
    "workflowConfig": {
        "openMetadataServerConfig": {
            "hostPort": "http://localhost:8585/api",
            "authProvider": "no-auth"
        }
    }
}
"""
def metadata_ingestion_workflow():
    workflow_config = json.loads(config)
    workflow = Workflow.create(workflow_config)
    workflow.execute()
    workflow.raise_from_status()
    workflow.print_status()
    workflow.stop()
with DAG(
    "sample_data",
    default_args=default_args,
    description="An example DAG which runs an OpenMetadata ingestion workflow",
    start_date=days_ago(1),
    is_paused_upon_creation=False,
    schedule_interval="*/5 * * * *",
    catchup=False,
) as dag:
    ingest_task = PythonOperator(
        task_id="ingest_using_recipe",
        python_callable=metadata_ingestion_workflow,
    )
if __name__ == "__main__":
    metadata_ingestion_workflow()
To test the workflow without Airflow, run the DAG file directly:
python postgres_demo.py
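A stray comma or an inline note inside the triple-quoted config will make json.loads raise at DAG parse time, so it is worth validating the string on its own first. A minimal standalone check, using a trimmed copy of the config above (paste in your full config):

```python
import json

# Trimmed copy of the DAG's config string; replace with your full config.
config = """
{
    "source": {
        "type": "postgres",
        "serviceName": "postgres_demo",
        "serviceConnection": {
            "config": {
                "type": "Postgres",
                "username": "postgres",
                "password": "postgres",
                "hostPort": "192.168.1.55:5432",
                "database": "surveillance_hub"
            }
        },
        "sourceConfig": {"config": {"enableDataProfiler": "false"}}
    },
    "sink": {"type": "metadata-rest", "config": {}},
    "workflowConfig": {
        "openMetadataServerConfig": {
            "hostPort": "http://localhost:8585/api",
            "authProvider": "no-auth"
        }
    }
}
"""

# json.loads raises a ValueError (JSONDecodeError) on any syntax problem.
workflow_config = json.loads(config)
for key in ("source", "sink", "workflowConfig"):
    assert key in workflow_config, f"missing top-level key: {key}"
print("config OK:", workflow_config["source"]["serviceName"])
```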
You can also run the ingestion from the command line:
metadata ingest -c /Your_Path_To_Json/.json
The JSON configuration is exactly the same as the one embedded in the DAG. A YAML configuration works as well:
metadata ingest -c /Your_Path_To_Yaml/.yaml
The YAML configuration is the exact same content, just without the curly brackets and double quotes. Example YAML I was using:
source:
  type: postgres
  serviceName: your_service_name
  serviceConnection:
    config:
      type: Postgres
      username: your_username
      password: your_password
      hostPort:
      database: your_database
  sourceConfig:
    config:
      type: Profiler
processor:
  type: orm-profiler
  config:
    test_suite:
      name: demo_test
      tests:
        - table: your_table_name (FQN)
          column_tests:
            - columnName: id
              testCase:
                columnTestType: columnValuesToBeBetween
                config:
                  minValue: 0
                  maxValue: 10
sink:
  type: metadata-rest
  config: {}
workflowConfig:
  openMetadataServerConfig:
    hostPort: http://localhost:8585/api
    authProvider: no-auth
Note that the table name must be the fully qualified name (FQN) and must match the table path shown in the OpenMetadata UI exactly.
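For intuition, the columnValuesToBeBetween test passes when every non-null value of the column falls within [minValue, maxValue]. The following is only an illustrative re-implementation of that rule, not OpenMetadata's actual profiler code:

```python
# Illustrative sketch of the columnValuesToBeBetween semantics;
# the function name and result shape are assumptions for this demo.
def column_values_to_be_between(values, min_value, max_value):
    """Pass when every non-null value lies within [min_value, max_value]."""
    failed = [
        v for v in values
        if v is not None and not (min_value <= v <= max_value)
    ]
    return {"success": not failed, "failedValues": failed}

result = column_values_to_be_between([0, 3, 7, 10], min_value=0, max_value=10)
print(result["success"])  # → True

result = column_values_to_be_between([2, 15], min_value=0, max_value=10)
print(result["failedValues"])  # → [15]
```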
Run the profiler workflow with:
metadata profile -c /path_to_yaml/.yaml
Make sure to refresh the OpenMetadata UI and click on the Data Quality tab to see the results.