Scriptis mainly provides the following features, each of which is described in detail below.
Workspace is a file directory to which a user has full permissions; here a user can perform various operations such as managing files. The recommended directory structure is script, data, log and res, since a clear layout makes it easy for users to check and manage their files. The major functions of the workspace are listed below:
sql: Corresponds to SparkSQL in the Spark engine. Syntax guide: https://docs.databricks.com/spark/latest/spark-sql/index.html
hql: Corresponds to the Hive engine. Syntax guide: https://cwiki.apache.org/confluence/display/Hive/LanguageManual
Scala: Corresponds to scala in the Spark engine (see the sketch after this list). Syntax guide: https://docs.databricks.com/spark/latest/dataframes-datasets/introduction-to-dataframes-scala.html
JDBC: Standard SQL syntax; not supported yet.
Python: Standalone Python engine, compatible with Python syntax.
PythonSpark: Corresponds to python in the Spark engine. Syntax guide: https://docs.databricks.com/spark/latest/dataframes-datasets/introduction-to-dataframes-python.html
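As a quick illustration, a minimal scala script in Scriptis might look like the following. This is only a sketch: it assumes the Spark engine pre-creates a SparkSession named spark (as in spark-shell), and the data and column names are made up.

// Minimal Scriptis scala script (sketch); assumes a pre-created SparkSession `spark`.
val df = spark.createDataFrame(Seq((1, "a"), (2, "b"))).toDF("id", "label")
df.show()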
The dataset module has the following functions.
This module not only makes it easy for users to classify and display UDFs, but also enables users to manage and share UDFs. The major functions are listed below:
Default top-level directory:
BDAP function: Provided by the platform; can be used in sql, pyspark, scala and hive (written with sql) scripts.
System function: Functions provided by the system and loaded by default; can be used in sql, pyspark, scala and hive (written with sql) scripts.
Individual function: Self-defined functions, including both general functions and Spark-exclusive functions.
Sharing function: Functions created by an administrator and shared with other users.
Apart from system functions, the other types of functions must be loaded before use, and a user must kill the started session after checking functions. In addition, once a function is checked and loaded, it shows up in the auto-completion options.
It is quite easy to create a new UDF as long as you've finished the code. The steps are as follows:
To create a general UDF, a user first needs to compile the corresponding Jar package; general means the UDF applies to both hql in Hive and sql in Spark (see the sketch after these steps).
To create a Spark-exclusive UDF, a user needs to create the corresponding python or scala script. In addition, to ensure correctness, it is better to test the script first.
Add this UDF to Scriptis:
General UDF: Choose general, then select the workspace path of its Jar package. Next, fill in the full class path of the UDF and add the format as well as a description.
Spark exclusive UDF -- written in scala: Check Spark, then select the corresponding scala script and fill in the registration format (the function name in the script).
Spark exclusive UDF -- written in python: Check Spark, then select the corresponding python script and fill in the registration format (the function name in the script).
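For the general case, the following is a hedged sketch of what the UDF class inside the Jar package might look like. It assumes the classic Hive UDF base class (org.apache.hadoop.hive.ql.exec.UDF, with hive-exec on the compile classpath); the package, class and argument names are illustrative.

// Sketch of a general UDF compiled into a Jar; usable from both hql and sql.
// Hive discovers the `evaluate` method by reflection.
package com.example.udf

import org.apache.hadoop.hive.ql.exec.UDF

class Hello extends UDF {
  def evaluate(id: String): String = id + ":hello"
}

The full class path to fill in would then be com.example.udf.Hello.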
For a Python UDF, a user only needs to define a function, and the script has to correspond to this function:
def hello(id):
    return str(id) + ":hello"
The way to create a Scala UDF is quite similar to creating a Python UDF: a user only needs to define a function:
def helloWord(str: String): String = "hello, " + str
Note: Python UDFs and Scala UDFs can only be applied in scripts corresponding to the Spark engine.
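As an illustration, once a UDF such as helloWord above has been checked and loaded, it should be callable from a Spark-engine script like a built-in function. This is a sketch: the pre-created SparkSession spark and the table my_table are assumptions.

// Sketch: invoking the loaded UDF from a scala (Spark engine) script.
spark.sql("SELECT helloWord(name) AS greeting FROM my_table").show()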
The function module is similar to the UDF module; the only difference is that one manages UDFs while the other manages self-defined functions. Also note that functions defined in python can only be used in python and pyspark scripts, and similarly, functions defined in scala can only be used in scala scripts.
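For example, a self-defined scala function for the function module might be as simple as the following; per the note above, it would then be usable only in scala scripts. The function name is illustrative, and the direct call after loading is an assumption.

// Hypothetical self-defined function for the function module.
def addPrefix(s: String): String = "user_" + s

// Once loaded, a scala script can call it directly (assumption):
println(addPrefix("alice"))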
The functions of this module are mainly integrated in the script edit box:
Script editing: Supports basic keyword highlighting, code formatting, code folding, auto-completion, shortcuts, etc.
Running and stopping: Users can choose to run only a selected segment of code or the entire script, and can terminate a running script at any time by clicking the stop button.
The script edit box also has configuration options for defining user-defined functions that take effect within the script.
This module has the following functions:
For now, it supports showing results in a table, clicking a column header to sort, and double-clicking to copy a field name; all of these functions are restricted to showing at most 5000 rows of records. More functions, such as displaying only selected columns and showing field types, will be supported in the future.
Visual analysis: Click the visual analysis button to visualize the result through VSBI. (Soon to be released)
Downloading: Users can directly download results to the local machine as csv or excel files through the browser. Only 5000 rows can be downloaded for now.
Exporting: Results can be exported to the workspace (the shared directory of BDAP) in either csv or excel format, and are not restricted to 5000 rows if you choose full export at first. To use full export, add a comment at the top of the sql script: --set wds.linkis.engine.no.limit.allow=true
Go to Console--Configuration--Pipeline--Import and Export settings--Result export type to choose whether to export results in csv or excel format.
Script history shows all the running information of a script. A user can quickly find the logs and results of earlier runs of a script and thereby avoid running the same script repeatedly.
Console has the following functions:
Global variables: A global variable is a custom variable that can be used in all scripts (see the sketch after this list). If a script defines a variable with the same name, the variable in the script takes effect.
Other functions: Global history, resource manager, FAQs.
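As a sketch of how a global variable might be used: assuming Scriptis substitutes ${variable} placeholders before a script is submitted (the substitution syntax here is an assumption), a global variable named my_db could be referenced from a scala script as follows; the table name is made up.

// Sketch: using a global variable `my_db` defined in the console.
// Scriptis is assumed to replace ${my_db} before execution.
spark.sql("SELECT COUNT(*) FROM ${my_db}.orders").show()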
Similar to the Windows task manager, this view lets users quickly view and manage tasks, engines and queue resources.