npm install @ohos/jsoup --save
For OpenHarmony npm environment setup and related topics, see How to Install OpenHarmony npm Packages.
import { Jsoup, SanitizeHtml, Parser, DomHandler, Document, DomUtils } from '@ohos/jsoup'
const html = `
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Document</title>
</head>
<style>
.tagh1{
background-color: aquamarine;
color:'blue';
}
.one-div{
line-height: 30px;
}
</style>
<body>
<h1 class="tagh1">
kkkk
<p>hhhhh</p>
</h1>
<div style="color:red; height:100px;" class="one-div">cshi</div>
<img src="https:baidu.com" alt="wwww"/>
<p>wjdwekfe>>>>></p>
<em>dsjfw<<<<<p
<div>dksfmjk</div>
owqkdo</em>
</body>
</html>
`
Parsing method 1:
const parser = new Parser.Parser({
onopentag(name, attributes) {
console.info(`jsoup onopentag name --> ${name} attributes --> ${JSON.stringify(attributes)}`)
},
ontext(text) {
console.info("jsoup text -->", text);
},
onopentagname(name) {
console.info("jsoup tagName -->", name);
},
onattribute(name, value) {
console.info(`jsoup attribName name --> ${name} value --> ${value}`)
},
onclosetag(tagname) {
console.info("jsoup closeTag --> ", tagname);
},
});
parser.write(html);
parser.end();
Or:
const handler = new DomHandler((error, dom) => {
if (error) {
// Handle error
} else {
// Parsing completed, do something
}
});
const parser = new Parser.Parser(handler, { decodeEntities: true });
parser.write(html);
parser.end();
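As a rough illustration of the callback style above, the sketch below gathers tag names and text into arrays instead of logging them. It does not call `@ohos/jsoup`: `feedEvents`, `SketchHandler`, and `SketchEvent` are hypothetical stand-ins that replay a hand-written event list (loosely modeled on the `h1` element of the sample HTML), so only the handler shape (`onopentag`, `ontext`, `onclosetag`) mirrors the usage shown above.

```typescript
// Handler shape mirroring a subset of the callbacks used above.
interface SketchHandler {
  onopentag?(name: string, attributes: { [s: string]: string }): void;
  ontext?(text: string): void;
  onclosetag?(name: string): void;
}

// Hypothetical event record used only to drive the handler in this sketch.
type SketchEvent =
  | { kind: 'open'; name: string; attributes: { [s: string]: string } }
  | { kind: 'text'; text: string }
  | { kind: 'close'; name: string };

// Hypothetical stand-in for parser.write()/parser.end(): replays events in order.
function feedEvents(handler: SketchHandler, events: SketchEvent[]): void {
  for (const ev of events) {
    if (ev.kind === 'open') {
      if (handler.onopentag) handler.onopentag(ev.name, ev.attributes);
    } else if (ev.kind === 'text') {
      if (handler.ontext) handler.ontext(ev.text);
    } else {
      if (handler.onclosetag) handler.onclosetag(ev.name);
    }
  }
}

const openedTags: string[] = [];
const classNames: string[] = [];
const textChunks: string[] = [];
let depth = 0;
let maxDepth = 0;

const collector: SketchHandler = {
  onopentag(name, attributes) {
    openedTags.push(name);
    if (attributes['class'] !== undefined) classNames.push(attributes['class']);
    depth += 1;
    maxDepth = Math.max(maxDepth, depth);
  },
  ontext(text) {
    // Skip whitespace-only chunks between tags.
    const trimmed = text.trim();
    if (trimmed.length > 0) textChunks.push(trimmed);
  },
  onclosetag() {
    depth -= 1;
  },
};

feedEvents(collector, [
  { kind: 'open', name: 'h1', attributes: { class: 'tagh1' } },
  { kind: 'text', text: '\nkkkk\n' },
  { kind: 'open', name: 'p', attributes: {} },
  { kind: 'text', text: 'hhhhh' },
  { kind: 'close', name: 'p' },
  { kind: 'close', name: 'h1' },
]);
```

With the real parser, the same `collector` object could presumably be passed as the first constructor argument in place of the logging callbacks.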
Parsing method 2:
let dom: Document = Parser.parseDocument(html)
// Requires: import http from '@ohos.net.http'
let httpRequest = http.createHttp()
httpRequest.request('http://106.15.92.248/share/html.txt')
.then((data) => {
console.log("jsoup url html=" + JSON.stringify(data))
if (data.result && typeof data.result === 'string') {
parser.write(data.result);
parser.end();
}
})
.catch((err) => {
console.error('jsoup connect error:' + JSON.stringify(err));
})
var dom = Jsoup.parseHtmlFromFile(stream, html.length)
// Note: this variable must first be assigned in MainAbility: globalThis.Context = this.context;
if (!globalThis.Context) {
console.log('jsoup global Context is undefined');
return;
}
var filePath = globalThis.Context.filesDir + '/testHtml.html';
globalThis.Context.resourceManager.getRawFile(filePath)
.then((data) => {
var textDecoder = new util.TextDecoder("utf-8", {
ignoreBOM: true
})
var result: string = textDecoder.decode(data, {
stream: false
})
console.log("jsoup getHtmlFromRawFile text=" + result);
this.createFile(filePath);
this.writeFile(filePath, result);
})
.catch((err) => {
console.log("jsoup getHtmlFromRawFile err=" + err)
})
if (!globalThis.Context) {
console.log('jsoup global Context is undefined');
return;
}
var filePath = globalThis.Context.filesDir + '/testHtml.html';
fileio.readText(filePath)
.then((data) => {
console.log("jsoup getHtmlFromFilePath text=" + data);
parser.write(data);
parser.end();
})
.catch((err) => {
console.log("jsoup getHtmlFromFilePath err=" + err)
})
// Extract CSS
Jsoup.parseCSS(html)
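Extracting data from a parsed DOM object:
// Get elements by tag name
let element = DomUtils.getElementsByTagName('style', dom)
// Get the text content
let text = DomUtils.getText(element)
// Check whether the element is a tag
let isTag = DomUtils.isTag(element[0])
// Check whether the element is CDATA
let isCDATA = DomUtils.isCDATA(element[0])
// Check whether the element is a text node
let isText = DomUtils.isText(element[0])
// Check whether the element is a comment
let isComment = DomUtils.isComment(element[0])
// Get the children of a given element
let body = DomUtils.getElementsByTagName('body', dom)
let childrens = DomUtils.getChildren(body[0])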
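To make the tag lookup and text extraction steps more concrete, here is a minimal sketch of the same two ideas on a simplified tree. The node types (`SketchText`, `SketchElement`) and the helpers `findByTagName` and `collectText` are hypothetical and written only for this example; they are not the library's `Document` types or the `DomUtils` implementation.

```typescript
// Simplified node types for this sketch only.
interface SketchText { type: 'text'; data: string; }
interface SketchElement { type: 'tag'; name: string; children: SketchNode[]; }
type SketchNode = SketchText | SketchElement;

// Depth-first search for every element with the given tag name.
function findByTagName(name: string, nodes: SketchNode[]): SketchElement[] {
  const found: SketchElement[] = [];
  for (const node of nodes) {
    if (node.type === 'tag') {
      if (node.name === name) found.push(node);
      found.push(...findByTagName(name, node.children));
    }
  }
  return found;
}

// Concatenate the text of all descendant text nodes, in document order.
function collectText(nodes: SketchNode[]): string {
  let out = '';
  for (const node of nodes) {
    out += node.type === 'text' ? node.data : collectText(node.children);
  }
  return out;
}

// Hand-built tree loosely based on the sample HTML's body.
const sketchTree: SketchNode[] = [
  {
    type: 'tag', name: 'body', children: [
      {
        type: 'tag', name: 'h1', children: [
          { type: 'text', data: 'kkkk' },
          { type: 'tag', name: 'p', children: [{ type: 'text', data: 'hhhhh' }] },
        ],
      },
      { type: 'tag', name: 'div', children: [{ type: 'text', data: 'cshi' }] },
    ],
  },
];

const paragraphs = findByTagName('p', sketchTree);
const bodyText = collectText(findByTagName('body', sketchTree));
console.log(paragraphs.length, bodyText);
```

Both helpers accept an array of nodes, which is why the result of one lookup can be fed straight into the text collector.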
const clean = SanitizeHtml('before <img src="test.png" /> after', {
disallowedTagsMode: 'escape',
allowedTags: [],
allowedAttributes: false
})
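In the call above, `allowedTags: []` disallows every tag, and `disallowedTagsMode: 'escape'` (assuming it behaves as in the upstream sanitize-html library) keeps disallowed tags as escaped text instead of dropping them. The sketch below shows only the escaping idea. `escapeMarkup` is a hypothetical helper, not part of `@ohos/jsoup`, and it is deliberately blunt: it escapes every `<`, `>`, and `&`, whereas a real sanitizer escapes only the disallowed tags.

```typescript
// Hypothetical helper: escape markup characters so tags render as literal text.
function escapeMarkup(input: string): string {
  return input
    .replace(/&/g, '&amp;') // must run first so later entities are not double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

const dirty = 'before <img src="test.png" /> after';
const escaped = escapeMarkup(dirty);
console.log(escaped); // before &lt;img src="test.png" /&gt; after
```

The design trade-off is that escaping preserves the original markup for inspection, while discarding silently removes it.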
Parsing HTML from a string
Method 1:
interface ParserOptions {
decodeEntities?: boolean;
lowerCaseTags?: boolean;
lowerCaseAttributeNames?: boolean;
recognizeCDATA?: boolean;
recognizeSelfClosing?: boolean;
}
interface Handler {
onparserinit(parser: Parser): void;
onreset(): void;
onend(): void;
onerror(error: Error): void;
onclosetag(name: string): void;
onopentagname(name: string): void;
onattribute(name: string, value: string, quote?: string | undefined | null): void;
onopentag(name: string, attribs: {
[s: string]: string;
}): void;
ontext(data: string): void;
oncomment(data: string): void;
oncdatastart(): void;
oncdataend(): void;
oncommentend(): void;
onprocessinginstruction(name: string, data: string): void;
}
const parser = new Parser.Parser(cbs: Partial<Handler> | null, options?: ParserOptions)
parser.write(html)
parser.end();
Method 2:
parseDocument(data: string, options?: Options): Document
Extracting HTML attributes
For the DomUtils interface definitions, see: Doc
Jsoup.parseCSS(html: string): string
Getting HTML from a file stream
Jsoup.parseHtmlFromFile(stream: fileio.Stream, htmlLength: number): string
Sanitizing HTML
SanitizeHtml(dirty: string, options?: sanitize.IOptions): string
Configurable options:
interface Attributes { [attr: string]: string; }
interface Tag { tagName: string; attribs: Attributes; text?: string | undefined; }
type Transformer = (tagName: string, attribs: Attributes) => Tag;
type AllowedAttribute = string | { name: string; multiple?: boolean | undefined; values: string[] };
allowedAttributes?: Record<string, AllowedAttribute[]> | false;
allowedStyles?: { [index: string]: { [index: string]: RegExp[] } };
allowedClasses?: { [index: string]: boolean | Array<string | RegExp> }
allowedIframeDomains?: string[];
allowedIframeHostnames?: string[];
allowIframeRelativeUrls?: boolean;
allowedSchemes?: string[] | boolean;
allowedSchemesByTag?: { [index: string]: string[] } | boolean;
allowedSchemesAppliedToAttributes?: string[];
allowedScriptDomains?: string[];
allowedScriptHostnames?: string[];
allowProtocolRelative?: boolean;
allowedTags?: string[] | false;
allowVulnerableTags?: boolean;
textFilter?: ((text: string, tagName: string) => string);
exclusiveFilter?: ((frame: IFrame) => boolean);
nonTextTags?: string[];
selfClosing?: string[];
transformTags?: { [tagName: string]: string | Transformer };
parser?: ParserOptions;
disallowedTagsMode?: 'discard' | 'escape' | 'recursiveEscape';
enforceHtmlBoundary?: boolean;
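To show how some of these options fit together, the sketch below builds an options object that uses `allowedAttributes` and `transformTags`. The four type declarations are copied from the list above. `SketchOptions` is a hypothetical subset defined only so the example type-checks on its own, and the tag names and the `data-was` attribute are illustrative assumptions. Nothing here calls `SanitizeHtml`.

```typescript
// Types copied from the option list above.
interface Attributes { [attr: string]: string; }
interface Tag { tagName: string; attribs: Attributes; text?: string | undefined; }
type Transformer = (tagName: string, attribs: Attributes) => Tag;
type AllowedAttribute = string | { name: string; multiple?: boolean | undefined; values: string[] };

// Hypothetical subset of the options, limited to the fields used in this sketch.
interface SketchOptions {
  allowedTags?: string[] | false;
  allowedAttributes?: Record<string, AllowedAttribute[]> | false;
  transformTags?: { [tagName: string]: string | Transformer };
}

// A Transformer receives the tag name and attributes and returns the replacement tag.
const renameToStrong: Transformer = (tagName, attribs) => ({
  tagName: 'strong',
  attribs: { ...attribs, 'data-was': tagName }, // remember the original tag name
});

const sketchOptions: SketchOptions = {
  allowedTags: ['strong', 'em', 'a'],
  allowedAttributes: {
    // Plain strings allow any value; the object form restricts values to a fixed list.
    a: ['href', { name: 'target', multiple: false, values: ['_blank'] }],
    strong: ['data-was'],
  },
  // A string value renames the tag; a function gives full control.
  transformTags: { b: renameToStrong, i: 'em' },
};

const transformed = renameToStrong('b', { class: 'x' });
console.log(transformed.tagName, transformed.attribs);
```

An object like `sketchOptions` would presumably be passed as the second argument of `SanitizeHtml`, provided the real `sanitize.IOptions` type accepts these fields as listed.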
Supports OpenHarmony API version 9 and above.
|---- jsoup
| |---- entry # sample code folder
| |----src
| |----addTag.ets
| |----index.ets
| |---- jsoup # jsoup library folder
| |----src
| |----main
| |----ets
| |----common # templates
| |----Cleaner.ts # HTML cleaning
| |----Jsoup.ts # HTML parsing
| |---- index.ts # public API
| |---- README.md # installation and usage instructions
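If you run into any problems while using this project, feel free to file an Issue; PRs are also very welcome.
This project is licensed under MIT; feel free to enjoy it and take part in open source.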