xj / spiderDemo

README

Usage Guide

Add the dependency

implementation "com.pince.maven:spider:1.0.8"

Usage

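import android.os.Bundle
import android.util.Log
import androidx.appcompat.app.AppCompatActivity
// SpiderBuilder, SpiderCallback and SpiderDoc come from the spider library.

class MainActivity : AppCompatActivity(), SpiderCallback {

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)
        // `get` is assumed to be a view declared in activity_main; tapping it starts the crawl.
        get.setOnClickListener {
            SpiderBuilder()
                .connectionTimeout(3000)
                .appKey("the appKey assigned to your app")
                .setCallBack(this)
                .connection("https://gitee.com/xjdd/spiderDemo/blob/master/README.md")
                .get()
        }
    }

    override fun onSuccess(pageTitle: String?, doc: SpiderDoc?) {
        get.text = pageTitle
    }

    override fun onError(e: Exception?) {
        Log.e("MainActivity", "Error ${e?.message}")
    }
}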

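The host value passed to the callback is the domain name of the crawled page. The Jsoup Document object is available through spiderDoc.doc and can be used for richer processing; see the documentation at https://www.open-open.com/jsoup/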

ProGuard configuration

-keepclassmembers class org.jsoup.* {
   public *;
}

Crawler flow analysis

Step 1: Document doc = ready().get(); fetches the page document:

<!doctype html>
<html lang="zh-CN">
 <head> 
  <title>README.md · Kding/KdingSpider - 码云 Gitee.com</title> 
  <link href="https://assets.gitee.com/assets/favicon-9007bd527d8a7851c8330e783151df58.ico" rel="shortcut icon" type="image/vnd.microsoft.icon"> 
  <meta content="gitee.com/kding123/spider git https://gitee.com/kding123/spider.git" name="go-import"> 
  <meta content="IE=edge" http-equiv="X-UA-Compatible"> 
  <meta content="authenticity_token" name="csrf-param"> 
  <meta content="jleuxh23etL0XVVrMpeC8b9zGQSsZNdvXV72D73Fv4I=" name="csrf-token"> 
  <link href="https://assets.gitee.com/assets/application-a744dd92ff8f96f111c242e5ba132640.css" media="all" rel="stylesheet"> 
  <script>
         <div class="clearfix"></div> 
         <div class="file_content markdown-body"> 
          <p>host-option= mierNDcqMTAyKjI1MioxOTg=mier zuanshiZDIyMiptZWltaW5nemFuKmNvbSUyM2l6cyUyM21hc3Rlcg==zuanshi diamondliveZDIyMiptZWltaW5nemFuKmNvbSUyM2l6cyUyM21hc3Rlcg==diamondlive</p> 
          <p>kuaimaolivingZDIyMip0dWJiemIqY29tJTIza3VhaW1hbyUyM21hc3Rlcg==kuaimaoliving(快猫live)</p> 
          <p>kuaimaovideoZDIyMiptZWltaW5nemFuKmNvbSUyM2l6cyUyM21hc3Rlcg==kuaimaovideo(快猫视频)</p> 
          <p>feiboliveZDIyMiptZWltaW5nemFuKmNvbSUyM2l6cyUyM21hc3Rlcg==feibolive</p> 
          <p>luoboliveZDIyMiptZWltaW5nemFuKmNvbSUyM2l6cyUyM21hc3Rlcg==luobolive</p> 
          <p>qiyuZCplb3dhaSpjb20lMjNpcXklMjNtYXN0ZXI=qiyu</p> 
          <p>tubaobaoliveZCp4cmtsaXZlKmNvbSUyM3R1YmFvYmFvJTIzbWFzdGVyIA==tubaobaolive</p> 
          <p>kuaimaoliveZDQ0NCptZWltaW5nemFuKmNvbSUyM2t1YWltYW8lMjNtYXN0ZXI=kuaimaolive</p> 
          <p>fengmiliveZDQ0NCptZWltaW5nemFuKmNvbSUyM3F5ZG91bmFpJTIzbWFzdGVyfengmilive</p> 
          <p>fanqieliveZDQ0NCptZWltaW5nemFuKmNvbSUyM2lxeSUyM21hc3Rlcg==fanqielive</p> 
          <p>caiseliveZDQ0NCptZWltaW5nemFuKmNvbSUyM2lxeSUyM21hc3Rlcg==caiselive</p> 
          <p>fanxingliveZCp4cmtsaXZlKmNvbSUyM3R1YmFvYmFvJTIzbWFzdGVyIA==fanxinglive</p> 
          <p>kawayiliveZDQ0NCptZWltaW5nemFuKmNvbSUyM2lxeSUyM21hc3Rlcg==kawayilive</p> 
          <p>boluoliveZDQ0NCptZWltaW5nemFuKmNvbSUyM2lxeSUyM21hc3Rlcg==boluolive</p> 
          <p>babyliveZDQ0NCptZWltaW5nemFuKmNvbSUyM2lxeSUyM21hc3Rlcg==babylive</p> 
          <p>tianlangliveZDQ0NCptZWltaW5nemFuKmNvbSUyM2lxeSUyM21hc3Rlcg==tianlanglive</p> 
          <p>xiaogongjuliveZDQ0NCptZWltaW5nemFuKmNvbSUyM2lxeSUyM21hc3Rlcg==xiaogongjulive</p> 
          <p>youhuoliveZCp4cmtsaXZlKmNvbSUyM3R1YmFvYmFvJTIzbWFzdGVyIA==youhuolive</p> 
          <p>sixboliveZDQ0NCptZWltaW5nemFuKmNvbSUyM21hb21pJTIzbWFzdGVysixbolive</p> 
          <p>kuaihuliveZDIyMiptZWltaW5nemFuKmNvbSUyM2t1YWlodSUyM21hc3Rlcg==kuaihulive</p> 
          <p>xiaobanlvliveZDQ0NCptZWltaW5nemFuKmNvbSUyM3hpYW9iYW5sdiUyM21hc3Rlcg==xiaobanlvlive</p> 
          <p>adventurelive72dZDIyMiptZWltaW5nemFuKmNvbSUyM2lxeSUyM21hc3Rlcg==adventurelive72d</p> 
          <p>adventurelive72d222ZDIyMiplb3dhaSpjb20lMjNpcXklMjNtYXN0ZXIKadventurelive72d222</p> 
          <p>adventurelive72d333ZDMzMyplb3dhaSpjb20lMjNpcXklMjNtYXN0ZXI=adventurelive72d333</p> 
          <p>adventurelive120ZDIyMiptZWltaW5nemFuKmNvbSUyM2p3JTIzbWFzdGVyadventurelive120</p> 
          <p>qiyuzhushouZCplb3dhaSpjb20lMjN4enMlMjNtYXN0ZXI=qiyuzhushou</p> 
          <p>mierNDcqMTAyKjI1MioxOTg=mier</p> 
          <p>deskmatecm9zKmh1bHVsaWFvKmNvbSUyMw==deskmate</p> 
          <p>luckyliveaWxrKmthbmthbnB0KmNvbQ==luckylive</p> 
          <p>rarechataWxrKmthbmthbnB0KmNvbQ==rarechat</p> 
          <p>kissliveaWxrKmthbmthbnB0KmNvbQ==kisslive</p> 
          <p>qixingzhibolivingZCp5dWFubWVpNTU1KmNvbSUyM2lxeCUyM21hc3Rlcg==qixingzhiboliving</p> 
          <p>bibiliveZDIyMipxaXpob3V3ZW5odWEqY29tJTIzaXF4JTIzbWFzdGVybibilive</p> 
          <p>mualiveZDIyMipxaXpob3V3ZW5odWEqY29tJTIzaXF4JTIzbWFzdGVymualive</p> 
          <p>qieziliveZDIyMipxaXpob3V3ZW5odWEqY29tJTIzaXF4JTIzbWFzdGVyqiezilive</p> 
          <p>huangguanZDIyMip5dW55YW56aGlibypjb20=huangguan</p>
         </div> 
         <script>
  toMathMlCode('','markdown-body');
</script> 
  <script defer src="//www.oschina.net/public/javascripts/cjl/ga.js?t=20160926" type="text/javascript"></script>   
 </body>
</html>

Step 2: extract the key node

final String result = handleHost(doc.getElementsByClass("file_content markdown-body").text());

Text of doc.getElementsByClass("file_content markdown-body"):

host-option= mierNDcqMTAyKjI1MioxOTg=mier zuanshiZDIyMiptZWltaW5nemFuKmNvbSUyM2l6cyUyM21hc3Rlcg==zuanshi diamondliveZDIyMiptZWltaW5nemFuKmNvbSUyM2l6cyUyM21hc3Rlcg==diamondlive kuaimaolivingZDIyMip0dWJiemIqY29tJTIza3VhaW1hbyUyM21hc3Rlcg==kuaimaoliving(快猫live) kuaimaovideoZDIyMiptZWltaW5nemFuKmNvbSUyM2l6cyUyM21hc3Rlcg==kuaimaovideo(快猫视频) feiboliveZDIyMiptZWltaW5nemFuKmNvbSUyM2l6cyUyM21hc3Rlcg==feibolive luoboliveZDIyMiptZWltaW5nemFuKmNvbSUyM2l6cyUyM21hc3Rlcg==luobolive qiyuZCplb3dhaSpjb20lMjNpcXklMjNtYXN0ZXI=qiyu tubaobaoliveZCp4cmtsaXZlKmNvbSUyM3R1YmFvYmFvJTIzbWFzdGVyIA==tubaobaolive kuaimaoliveZDQ0NCptZWltaW5nemFuKmNvbSUyM2t1YWltYW8lMjNtYXN0ZXI=kuaimaolive fengmiliveZDQ0NCptZWltaW5nemFuKmNvbSUyM3F5ZG91bmFpJTIzbWFzdGVyfengmilive fanqieliveZDQ0NCptZWltaW5nemFuKmNvbSUyM2lxeSUyM21hc3Rlcg==fanqielive caiseliveZDQ0NCptZWltaW5nemFuKmNvbSUyM2lxeSUyM21hc3Rlcg==caiselive fanxingliveZCp4cmtsaXZlKmNvbSUyM3R1YmFvYmFvJTIzbWFzdGVyIA==fanxinglive kawayiliveZDQ0NCptZWltaW5nemFuKmNvbSUyM2lxeSUyM21hc3Rlcg==kawayilive boluoliveZDQ0NCptZWltaW5nemFuKmNvbSUyM2lxeSUyM21hc3Rlcg==boluolive babyliveZDQ0NCptZWltaW5nemFuKmNvbSUyM2lxeSUyM21hc3Rlcg==babylive tianlangliveZDQ0NCptZWltaW5nemFuKmNvbSUyM2lxeSUyM21hc3Rlcg==tianlanglive xiaogongjuliveZDQ0NCptZWltaW5nemFuKmNvbSUyM2lxeSUyM21hc3Rlcg==xiaogongjulive youhuoliveZCp4cmtsaXZlKmNvbSUyM3R1YmFvYmFvJTIzbWFzdGVyIA==youhuolive sixboliveZDQ0NCptZWltaW5nemFuKmNvbSUyM21hb21pJTIzbWFzdGVysixbolive kuaihuliveZDIyMiptZWltaW5nemFuKmNvbSUyM2t1YWlodSUyM21hc3Rlcg==kuaihulive xiaobanlvliveZDQ0NCptZWltaW5nemFuKmNvbSUyM3hpYW9iYW5sdiUyM21hc3Rlcg==xiaobanlvlive adventurelive72dZDIyMiptZWltaW5nemFuKmNvbSUyM2lxeSUyM21hc3Rlcg==adventurelive72d adventurelive72d222ZDIyMiplb3dhaSpjb20lMjNpcXklMjNtYXN0ZXIKadventurelive72d222 adventurelive72d333ZDMzMyplb3dhaSpjb20lMjNpcXklMjNtYXN0ZXI=adventurelive72d333 adventurelive120ZDIyMiptZWltaW5nemFuKmNvbSUyM2p3JTIzbWFzdGVyadventurelive120 qiyuzhushouZCplb3dhaSpjb20lMjN4enMlMjNtYXN0ZXI=qiyuzhushou mierNDcqMTAyKjI1MioxOTg=mier 
deskmatecm9zKmh1bHVsaWFvKmNvbSUyMw==deskmate luckyliveaWxrKmthbmthbnB0KmNvbQ==luckylive rarechataWxrKmthbmthbnB0KmNvbQ==rarechat kissliveaWxrKmthbmthbnB0KmNvbQ==kisslive qixingzhibolivingZCp5dWFubWVpNTU1KmNvbSUyM2lxeCUyM21hc3Rlcg==qixingzhiboliving bibiliveZDIyMipxaXpob3V3ZW5odWEqY29tJTIzaXF4JTIzbWFzdGVybibilive mualiveZDIyMipxaXpob3V3ZW5odWEqY29tJTIzaXF4JTIzbWFzdGVymualive qieziliveZDIyMipxaXpob3V3ZW5odWEqY29tJTIzaXF4JTIzbWFzdGVyqiezilive huangguanZDIyMip5dW55YW56aGlibypjb20=huangguan

Step 3: verify the local key

origin = origin.split(appKey)[1]; // split on the local appKey; for appKey adventurelive72d333 the result is ZDMzMyplb3dhaSpjb20lMjNpcXklMjNtYXN0ZXI=
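The split in Step 3 can be sketched with plain Java as follows. This is only an illustration under stated assumptions: the class name AppKeySplit and the helper extractPayload are hypothetical, and the key is quoted with Pattern.quote because String.split treats its argument as a regular expression, which the original one-liner does not guard against.

```java
import java.util.regex.Pattern;

class AppKeySplit {
    /**
     * Returns the text between the first and second occurrence of appKey,
     * or null when the key does not appear in the page text.
     * (Hypothetical helper, not part of the spider library.)
     */
    static String extractPayload(String pageText, String appKey) {
        // Quote the key so regex metacharacters in it are matched literally.
        String[] parts = pageText.split(Pattern.quote(appKey));
        return parts.length > 1 ? parts[1] : null;
    }

    public static void main(String[] args) {
        // Two entries copied from the sample page text above.
        String pageText = "qiyuzhushouZCplb3dhaSpjb20lMjN4enMlMjNtYXN0ZXI=qiyuzhushou "
                + "adventurelive72d333ZDMzMyplb3dhaSpjb20lMjNpcXklMjNtYXN0ZXI=adventurelive72d333";
        // prints ZDMzMyplb3dhaSpjb20lMjNpcXklMjNtYXN0ZXI=
        System.out.println(extractPayload(pageText, "adventurelive72d333"));
    }
}
```

Returning null for a missing key avoids the ArrayIndexOutOfBoundsException that indexing [1] directly would throw when the appKey is not listed on the page.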

Step 4: decode with Base64Utils

return Base64Utils.decode(origin.split("@")[0]).replace("*", ".").replace("%23", "/");

The result is d333.eowai.com/iqy/master. The order is: split on @, Base64-decode, then replace the placeholder characters.
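The decoding step can be reproduced outside Android with a short sketch. It assumes that Base64Utils.decode performs a standard Base64 decode and returns a UTF-8 string, so java.util.Base64 is swapped in here; the class name HostDecoder and the method decodeHost are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class HostDecoder {
    /** Decodes a payload such as the one obtained in Step 3 into a host path. */
    static String decodeHost(String payload) {
        // Keep only the part before an optional '@' separator.
        String base64 = payload.split("@")[0].trim();
        // Assumption: Base64Utils.decode is equivalent to a standard Base64 decode.
        byte[] raw = Base64.getDecoder().decode(base64);
        return new String(raw, StandardCharsets.UTF_8)
                .replace("*", ".")     // '*' stands in for '.'
                .replace("%23", "/");  // "%23" stands in for '/'
    }

    public static void main(String[] args) {
        // prints d333.eowai.com/iqy/master
        System.out.println(decodeHost("ZDMzMyplb3dhaSpjb20lMjNpcXklMjNtYXN0ZXI="));
    }
}
```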

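From this the encoding process can be inferred in reverse: replace the characters, Base64-encode, append @, and concatenate the appKey (the sample data shows the appKey on both sides of the payload).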
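A minimal sketch of that inferred encoding direction is shown below. It is a reconstruction from the sample data, not the author's tool: the names HostEncoder and encodeHost are hypothetical, java.util.Base64 stands in for whatever encoder was actually used, and the @ suffix is left out because the sample entries do not show what would follow it.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class HostEncoder {
    /** Builds one page entry of the form appKey + Base64(escaped host) + appKey. */
    static String encodeHost(String host, String appKey) {
        // Reverse of the decode-side replacements: '.' becomes '*', '/' becomes "%23".
        String escaped = host.replace(".", "*").replace("/", "%23");
        String base64 = Base64.getEncoder()
                .encodeToString(escaped.getBytes(StandardCharsets.UTF_8));
        // The sample data wraps the payload with the appKey on both sides.
        return appKey + base64 + appKey;
    }

    public static void main(String[] args) {
        // prints adventurelive72d333ZDMzMyplb3dhaSpjb20lMjNpcXklMjNtYXN0ZXI=adventurelive72d333
        System.out.println(encodeHost("d333.eowai.com/iqy/master", "adventurelive72d333"));
    }
}
```

The printed entry matches the adventurelive72d333 line in the sample page, which supports the inferred order of operations.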

Overall this is not really encryption, only encoding, although this data does not seem to need encrypting anyway.
