taishan / WeChat Article Crawler (Reptile)

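A Note from the Author

I have been busy with work lately, and many people are contacting me on WeChat and QQ with questions. Please leave your questions in the comments section where possible. If it is urgent, add me on WeChat and describe the problem clearly; I will reply once I see it and have time.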

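Congratulations on finding this project. Before starting it, I searched Baidu and Gitee for existing WeChat official account crawlers. There are roughly three ways to crawl official account articles, all described below. I tried more than one of them, gave up on the route that cost too much time and effort, and settled on the most cost-effective one: logging in through an official account and searching from there.

Compared with the other open-source projects on Gitee, this one's main advantages are that it is somewhat more complete and that its code was pushed recently. WeChat official accounts have changed over the past two years, so most open-source crawlers that have not been updated recently no longer run correctly. Developing and maintaining this project takes effort; if you find it useful, please star and bookmark it. That is what keeps me going.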

My Blog

  1. Java OPC UA series: https://blog.csdn.net/weixin_40986713/category_12356608.html
  2. AI painting | Stable Diffusion: https://blog.csdn.net/weixin_40986713/category_12481790.html
  3. Advanced Java series: https://blog.csdn.net/weixin_40986713/category_10796066.html
  4. Java Selenium automated crawling: https://blog.csdn.net/weixin_40986713/category_12165790.html
  5. Java recommendation algorithms series: https://blog.csdn.net/weixin_40986713/category_12268014.html
  6. Java video and image processing series: https://blog.csdn.net/weixin_40986713/category_11109931.html

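Project Background

I wrote this simple demo in my spare time. Its main feature is a WeChat official account crawler, supplemented by a general web page crawler, browser control, and bulk email sending. The feature set is deliberately small, which leaves developers plenty of room to learn from it and extend it. Anyone with some Spring Boot and HTML experience should find it easy to pick up.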

Features

A crawler project: WeChat official account article crawler, website article crawler, and bulk email system.

Project Architecture

A single-module Spring Boot project.

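There are three known ways to crawl WeChat official accounts:

Method 1: Search through Sogou's WeChat official account search. This returns only the 10 most recent articles. (I tried it; for many accounts it could not return even the latest 10, so I abandoned it.)

Method 2: Capture traffic with Fiddler or on a phone and take the appmsg_token from the request URL. Although this value appears in the HTML page, it is only populated in captured traffic; when the page is fetched directly, it is empty. It also expires, so a fresh capture is needed every time, which is tedious.

Method 3: Search for official accounts from within an official account, which is the approach taken here. It is somewhat slower but much more convenient. (Requests are limited to roughly 100 per day.)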
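Because the third method is capped at roughly 100 requests per day, a crawler benefits from counting its own requests and stopping before it hits the cap. The following is a minimal sketch of such a counter, not code from this project: the class name `DailyRequestBudget` and its methods are hypothetical, and the limit of 100 is taken from the estimate above rather than from any documented quota.

```java
import java.time.LocalDate;

/** Hypothetical helper: caps the number of requests made per calendar day. */
final class DailyRequestBudget {
    private final int dailyLimit;
    private LocalDate day;
    private int used;

    DailyRequestBudget(int dailyLimit, LocalDate today) {
        if (dailyLimit <= 0) {
            throw new IllegalArgumentException("dailyLimit must be positive");
        }
        this.dailyLimit = dailyLimit;
        this.day = today;
    }

    /** Consumes one request and returns true if budget remains for the given day. */
    synchronized boolean tryAcquire(LocalDate today) {
        if (!today.equals(day)) {
            // A new calendar day has started: reset the counter.
            day = today;
            used = 0;
        }
        if (used >= dailyLimit) {
            return false;
        }
        used++;
        return true;
    }

    /** Requests still available for the most recently seen day. */
    synchronized int remaining() {
        return dailyLimit - used;
    }

    public static void main(String[] args) {
        // 100 is the approximate daily limit mentioned above, not an official figure.
        DailyRequestBudget budget = new DailyRequestBudget(100, LocalDate.now());
        if (budget.tryAcquire(LocalDate.now())) {
            System.out.println("Request allowed, remaining today: " + budget.remaining());
        } else {
            System.out.println("Daily budget exhausted, try again tomorrow");
        }
    }
}
```

Passing the date in explicitly keeps the class easy to test; the counter lives in memory only, so a restarted process would need to persist it to respect the limit across runs.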

Usage Notes

How it works:

Selenium logs in to obtain the token and cookies; the program then crawls and downloads articles automatically.
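The sketch below illustrates the step between login and crawling: pulling the token out of the post-login URL and turning the browser's cookies into a `Cookie` request header. It is an illustration under assumptions, not the project's actual code. It assumes the token appears as a `token` query parameter in the URL after login; the class `SessionInfo`, the sample URL, and the cookie names are all made up. In a real run the inputs would come from Selenium's `driver.getCurrentUrl()` and `driver.manage().getCookies()`.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper for reusing a logged-in browser session in plain HTTP requests. */
final class SessionInfo {

    /** Extracts the value of the "token" query parameter, or null if it is absent. */
    static String extractToken(String url) {
        String query = URI.create(url).getRawQuery();
        if (query == null) {
            return null;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals("token")) {
                return pair.substring(eq + 1);
            }
        }
        return null;
    }

    /** Joins cookie name/value pairs into a single Cookie request-header value. */
    static String toCookieHeader(Map<String, String> cookies) {
        StringJoiner joiner = new StringJoiner("; ");
        cookies.forEach((name, value) -> joiner.add(name + "=" + value));
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Example values only; real ones come from the logged-in browser session.
        String url = "https://mp.weixin.qq.com/cgi-bin/home?t=home/index&lang=zh_CN&token=123456789";
        Map<String, String> cookies = new LinkedHashMap<>();
        cookies.put("example_cookie_a", "value-a");
        cookies.put("example_cookie_b", "value-b");

        System.out.println(extractToken(url));       // prints 123456789
        System.out.println(toCookieHeader(cookies)); // prints example_cookie_a=value-a; example_cookie_b=value-b
    }
}
```

Once the token and cookie header are available, later requests can be made without the browser, which is why the login only has to happen once per session.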

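Prerequisites:

1. Change the Chrome driver (chromedriver) path in the project to the path on your own machine.

2. Have your own official account. If you do not have one, register a WeChat official account (personal subscription account) at https://mp.weixin.qq.com.

3. Set the account and password in the reptile.properties file.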
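To show how these settings typically fit together, here is a minimal sketch that reads a driver path and login credentials from a properties source and hands the driver path to Selenium. The property keys (`chrome.driver.path`, `wechat.username`, `wechat.password`) and the class `ReptileConfig` are assumptions made for illustration; check the project's own reptile.properties for the real key names. The system property `webdriver.chrome.driver` is the one Selenium's ChromeDriver reads to locate the driver executable.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical configuration holder; the key names are illustrative assumptions. */
final class ReptileConfig {
    final String driverPath;
    final String username;
    final String password;

    private ReptileConfig(String driverPath, String username, String password) {
        this.driverPath = driverPath;
        this.username = username;
        this.password = password;
    }

    /** Loads the three required settings, failing fast if any of them is missing. */
    static ReptileConfig load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new ReptileConfig(
                require(props, "chrome.driver.path"),
                require(props, "wechat.username"),
                require(props, "wechat.password"));
    }

    private static String require(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing required property: " + key);
        }
        return value.trim();
    }

    public static void main(String[] args) {
        // Sample content standing in for reptile.properties; all values are placeholders.
        String sample = "chrome.driver.path=C:/tools/chromedriver.exe\n"
                + "wechat.username=user@example.com\n"
                + "wechat.password=change-me\n";
        ReptileConfig config = load(new StringReader(sample));

        // Selenium's ChromeDriver looks up its executable through this system property.
        System.setProperty("webdriver.chrome.driver", config.driverPath);
        System.out.println("Driver: " + config.driverPath + ", account: " + config.username);
    }
}
```

Note that backslashes are escape characters in .properties files, so a Windows path should be written with forward slashes or doubled backslashes.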

Installation

  1. Download the source code with git
  2. Build with Maven
  3. Run as a Java application in IntelliJ IDEA

Usage

  1. core package: run the Java main method
  2. Spring Boot: run the main application class

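Feature Overview

A simple crawler system and email system:

  1. The crawler covers both WeChat official accounts and ordinary web pages (implemented mainly with Selenium and jsoup).
  2. The Java email system supports bulk sending (implemented mainly with javax.mail).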

System Runtime View

[Screenshot]

Web Interface

[Three screenshots]

FAQ

Launching Google Chrome through Selenium requires chromedriver, and the two versions must be compatible. Otherwise you will see an error similar to the following:

Only local connections are allowed.
org.openqa.selenium.WebDriverException: unknown error: cannot find Chrome binary
  (Driver info: chromedriver=2.35.528161 (5b82f2d2aae0ca24b877009200ced9065a772e73),platform=Windows NT 10.0.18363 x86_64) (WARNING: The server did not provide any stacktrace information)
Command duration or timeout: 76 milliseconds
Build info: version: '3.141.59', revision: 'e82be7d358', time: '2018-11-14T08:17:03'
System info: host: 'WIN-9T6EKDMSTI5', ip: '172.16.10.8', os.name: 'Windows 10', os.arch: 'amd64', os.version: '10.0', java.version: '1.8.0_221'
Driver info: driver.version: ChromeDriver
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)

For the fix, see this article:

Selenium: Chrome browser and chromedriver version compatibility table
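As a quick sanity check before consulting the table, the version strings can be compared directly. Starting with ChromeDriver 73, the driver's major version is meant to match Chrome's major version; older 2.x drivers such as the 2.35 shown in the log above follow a separate numbering scheme and do need the compatibility table. The sketch below covers only the newer scheme. The class `DriverVersionCheck` and the version strings are illustrative, not part of this project.

```java
/** Hypothetical helper: compares Chrome and chromedriver major versions. */
final class DriverVersionCheck {

    /** Parses the leading numeric component of a dotted version string. */
    static int majorVersion(String version) {
        String trimmed = version.trim();
        int dot = trimmed.indexOf('.');
        String head = dot < 0 ? trimmed : trimmed.substring(0, dot);
        return Integer.parseInt(head);
    }

    /** True when both versions share a major number (the rule for ChromeDriver 73 and later). */
    static boolean sameMajor(String chromeVersion, String driverVersion) {
        return majorVersion(chromeVersion) == majorVersion(driverVersion);
    }

    public static void main(String[] args) {
        // Example version strings; read the real ones from chrome://version and "chromedriver --version".
        System.out.println(sameMajor("114.0.5735.199", "114.0.5735.90")); // prints true
        System.out.println(sameMajor("114.0.5735.199", "2.35.528161"));   // prints false
    }
}
```

A mismatch reported here means the driver should be replaced with a build for the installed Chrome release.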

Crawler Tutorial Series

Selenium automated crawling

Technical Discussion & Feedback

  WeChat ID: vxhqqh
License

Mulan Permissive Software License, Version 1 (Mulan PSL v1; 木兰宽松许可证, 第1版)

August 2019 http://license.coscl.org.cn/MulanPSL

Your reproduction, use, modification and distribution of the Software shall be subject to Mulan PSL v1 (this License) with following terms and conditions:

0. Definition

Software means the program and related documents which are comprised of those Contribution and licensed under this License.

Contributor means the Individual or Legal Entity who licenses its copyrightable work under this License.

Legal Entity means the entity making a Contribution and all its Affiliates.

Affiliates means entities that control, or are controlled by, or are under common control with a party to this License, ‘control’ means direct or indirect ownership of at least fifty percent (50%) of the voting power, capital or other securities of controlled or commonly controlled entity.

Contribution means the copyrightable work licensed by a particular Contributor under this License.

1. Grant of Copyright License

Subject to the terms and conditions of this License, each Contributor hereby grants to you a perpetual, worldwide, royalty-free, non-exclusive, irrevocable copyright license to reproduce, use, modify, or distribute its Contribution, with modification or not.

2. Grant of Patent License

Subject to the terms and conditions of this License, each Contributor hereby grants to you a perpetual, worldwide, royalty-free, non-exclusive, irrevocable (except for revocation under this Section) patent license to make, have made, use, offer for sale, sell, import or otherwise transfer its Contribution where such patent license is only limited to the patent claims owned or controlled by such Contributor now or in future which will be necessarily infringed by its Contribution alone, or by combination of the Contribution with the Software to which the Contribution was contributed, excluding of any patent claims solely be infringed by your or others’ modification or other combinations. If you or your Affiliates directly or indirectly (including through an agent, patent licensee or assignee), institute patent litigation (including a cross claim or counterclaim in a litigation) or other patent enforcement activities against any individual or entity by alleging that the Software or any Contribution in it infringes patents, then any patent license granted to you under this License for the Software shall terminate as of the date such litigation or activity is filed or taken.

3. No Trademark License

No trademark license is granted to use the trade names, trademarks, service marks, or product names of Contributor, except as required to fulfill notice requirements in section 4.

4. Distribution Restriction

You may distribute the Software in any medium with or without modification, whether in source or executable forms, provided that you provide recipients with a copy of this License and retain copyright, patent, trademark and disclaimer statements in the Software.

5. Disclaimer of Warranty and Limitation of Liability

The Software and Contribution in it are provided without warranties of any kind, either express or implied. In no event shall any Contributor or copyright holder be liable to you for any damages, including, but not limited to any direct, or indirect, special or consequential damages arising from your use or inability to use the Software or the Contribution in it, no matter how it’s caused or based on which legal theory, even if advised of the possibility of such damages.

End of the Terms and Conditions

How to apply the Mulan Permissive Software License, Version 1 (Mulan PSL v1) to your software

To apply the Mulan PSL v1 to your work, for easy identification by recipients, you are suggested to complete following three steps:

  i. Fill in the blanks in following statement, including insert your software name, the year of the first publication of your software, and your name identified as the copyright owner;
  ii. Create a file named “LICENSE” which contains the whole context of this License in the first directory of your software package;
  iii. Attach the statement to the appropriate annotated syntax at the beginning of each source file.

Copyright (c) [2019] [name of copyright holder]
[Software Name] is licensed under the Mulan PSL v1.
You can use this software according to the terms and conditions of the Mulan PSL v1.
You may obtain a copy of Mulan PSL v1 at:
http://license.coscl.org.cn/MulanPSL
THIS SOFTWARE IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO NON-INFRINGEMENT, MERCHANTABILITY OR FIT FOR A PARTICULAR PURPOSE.
See the Mulan PSL v1 for more details.

Repository

https://gitee.com/taisan/reptile.git
git@gitee.com:taisan/reptile.git