Jiayu Liu committed on 2021-06-03 20:33: use prettier check in CI (#453)

Developer's guide

This section describes how to get started developing DataFusion.

For information on developing with Ballista, see the Ballista developer documentation.

Bootstrap environment

DataFusion is written in Rust and uses the standard Rust toolchain:

  • cargo build
  • cargo fmt to format the code
  • cargo test to test
  • etc.

How to add a new scalar function

Below is a checklist of what you need to do to add a new scalar function to DataFusion:

  • Add the actual implementation of the function:
    • here for string functions
    • here for math functions
    • here for datetime functions
    • create a new module here for other functions
  • In src/physical_plan/functions, add:
    • a new variant to BuiltinScalarFunction
    • a new entry to FromStr with the name of the function as called by SQL
    • a new line in return_type with the expected return type of the function, given an incoming type
    • a new line in signature with the signature of the function (number and types of its arguments)
    • a new line in create_physical_expr mapping the built-in to the implementation
    • tests for the function.
  • In tests/sql.rs, add a new test where the function is called through SQL against well-known data and returns the expected result.
  • In src/logical_plan/expr, add:
    • a new entry of the unary_scalar_expr! macro for the new function.
  • In src/logical_plan/mod, add:
    • a new entry in the pub use expr::{} set.
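
The first three items under src/physical_plan/functions follow a common Rust pattern: an enum variant, a name lookup through FromStr, and a match on argument types. The sketch below shows only that pattern with simplified stand-in types. The DataType enum, the char_length function name and the return_type signature here are assumptions made for illustration, not DataFusion's actual definitions.

```rust
use std::str::FromStr;

// Simplified stand-ins for illustration; these are NOT the real DataFusion types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Float64,
    Utf8,
    Int32,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum BuiltinScalarFunction {
    Sqrt,
    Upper,
    // Step 1: the new variant (hypothetical function).
    CharLength,
}

// Step 2: map the name used in SQL to the variant.
impl FromStr for BuiltinScalarFunction {
    type Err = String;

    fn from_str(name: &str) -> Result<Self, Self::Err> {
        match name {
            "sqrt" => Ok(Self::Sqrt),
            "upper" => Ok(Self::Upper),
            "char_length" => Ok(Self::CharLength),
            other => Err(format!("There is no built-in function named {}", other)),
        }
    }
}

// Step 3: declare the return type for the given argument types.
fn return_type(
    fun: BuiltinScalarFunction,
    arg_types: &[DataType],
) -> Result<DataType, String> {
    match (fun, arg_types) {
        (BuiltinScalarFunction::Sqrt, [DataType::Float64]) => Ok(DataType::Float64),
        (BuiltinScalarFunction::Upper, [DataType::Utf8]) => Ok(DataType::Utf8),
        (BuiltinScalarFunction::CharLength, [DataType::Utf8]) => Ok(DataType::Int32),
        (fun, args) => Err(format!("{:?} does not support argument types {:?}", fun, args)),
    }
}

fn main() {
    let fun: BuiltinScalarFunction = "char_length".parse().unwrap();
    println!("{:?} -> {:?}", fun, return_type(fun, &[DataType::Utf8]));
}
```

In the real code base the signature and create_physical_expr entries extend the same kind of match, so forgetting one of them usually surfaces as a non-exhaustive match error at compile time.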

How to add a new aggregate function

Below is a checklist of what you need to do to add a new aggregate function to DataFusion:

  • Add the actual implementation of an Accumulator and AggregateExpr:
    • here for string functions
    • here for math functions
    • here for datetime functions
    • create a new module here for other functions
  • In src/physical_plan/aggregates, add:
    • a new variant to BuiltinAggregateFunction
    • a new entry to FromStr with the name of the function as called by SQL
    • a new line in return_type with the expected return type of the function, given an incoming type
    • a new line in signature with the signature of the function (number and types of its arguments)
    • a new line in create_aggregate_expr mapping the built-in to the implementation
    • tests for the function.
  • In tests/sql.rs, add a new test where the function is called through SQL against well-known data and returns the expected result.
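
To give a feel for what an accumulator has to do, here is a minimal sketch of an average computed in three phases: update from input, merge of partial states, and final evaluation. The trait below is a simplified stand-in written for this example. DataFusion's real Accumulator works on Arrow arrays and scalar values and has different method signatures, so treat every name and signature here as an assumption.

```rust
// Simplified stand-in for illustration; NOT the real DataFusion `Accumulator` trait.
trait Accumulator {
    /// Fold one batch of input values into the running state.
    fn update_batch(&mut self, values: &[Option<f64>]);
    /// Merge the partial state of another accumulator (e.g. from another partition).
    fn merge(&mut self, other: &Self);
    /// Produce the final value; `None` plays the role of SQL NULL.
    fn evaluate(&self) -> Option<f64>;
}

#[derive(Debug, Default)]
struct AvgAccumulator {
    sum: f64,
    count: u64,
}

impl Accumulator for AvgAccumulator {
    fn update_batch(&mut self, values: &[Option<f64>]) {
        // NULL inputs (`None`) are skipped, as SQL aggregates usually do.
        for v in values.iter().flatten() {
            self.sum += v;
            self.count += 1;
        }
    }

    fn merge(&mut self, other: &Self) {
        self.sum += other.sum;
        self.count += other.count;
    }

    fn evaluate(&self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some(self.sum / self.count as f64)
        }
    }
}

fn main() {
    // Two partitions are aggregated independently and then merged.
    let mut left = AvgAccumulator::default();
    left.update_batch(&[Some(1.0), None, Some(3.0)]);
    let mut right = AvgAccumulator::default();
    right.update_batch(&[Some(5.0)]);
    left.merge(&right);
    println!("{:?}", left.evaluate());
}
```

Keeping sum and count as separate state, rather than a running average, is what makes the merge step exact when partial results come from different partitions.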

How to display plans graphically

The query plans represented by LogicalPlan nodes can be graphically rendered using Graphviz.

To do so, save the output of the display_graphviz function to a file:

use std::fs::File;
use std::io::Write;

// Create plan somehow...
let mut output = File::create("/tmp/plan.dot")?;
write!(output, "{}", plan.display_graphviz())?;

Then, use the dot command-line tool to render it into a file that can be displayed. For example, the following command creates a /tmp/plan.pdf file:

dot -Tpdf < /tmp/plan.dot > /tmp/plan.pdf
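
If you want to try the rendering step without building a plan first, a self-contained sketch like the following writes a small DOT file that dot can process. The graph text is a hand-written placeholder standing in for the output of display_graphviz, and the node labels and helper names (example_dot, write_plan) are hypothetical. Real plan output is more detailed.

```rust
use std::fs::File;
use std::io::Write;
use std::path::Path;

// Hypothetical DOT text standing in for the output of `plan.display_graphviz()`.
fn example_dot() -> String {
    let mut dot = String::from("digraph plan {\n");
    dot.push_str("  1 [shape=box, label=\"Projection\"];\n");
    dot.push_str("  2 [shape=box, label=\"Filter\"];\n");
    dot.push_str("  3 [shape=box, label=\"TableScan\"];\n");
    dot.push_str("  1 -> 2;\n");
    dot.push_str("  2 -> 3;\n");
    dot.push_str("}\n");
    dot
}

// Write the DOT text to `path`, propagating any I/O error to the caller.
fn write_plan(path: &Path, dot: &str) -> std::io::Result<()> {
    let mut output = File::create(path)?;
    write!(output, "{}", dot)?;
    Ok(())
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("plan.dot");
    write_plan(&path, &example_dot())?;
    println!("wrote {}", path.display());
    Ok(())
}
```

The resulting file can then be passed to dot in the same way as a real plan.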

Specification

We formalize DataFusion semantics and behaviors through specification documents. These specifications serve as references that help resolve ambiguities during development or code review.

You are also welcome to propose changes to existing specifications or create new specifications as you see fit.

Here is the list of currently active specifications:

How to format .md documents

We are using prettier to format .md files.

You can either use npm i -g prettier to install it globally or use npx to run it as a standalone binary. Using npx requires a working Node environment. Upgrading to the latest prettier is recommended (by adding --upgrade to the npm command).

$ prettier --version
2.3.0

After you've confirmed your prettier version, you can format all the .md files:

prettier -w {ballista,datafusion,datafusion-examples,dev,docs,python}/**/*.md