1 Star 0 Fork 0

xmlhh / InfoSpider

加入 Gitee
与超过 1200万 开发者一起发现、参与优秀开源项目,私有仓库也完全免费 :)
免费加入
克隆/下载
README_EN.md 7.05 KB
一键复制 编辑 原始数据 按行查看 历史
kangvcar 提交于 2020-10-26 10:14 . change screenshot imgurl

logo


GitHub stars UW2eVx.png UW2eVx.png UW2eVx.png GitHub repo size GitHub repo size

A magic toolbox to get back your personal information.

Documentation | Video Demo

Donate

Support Me!

Paypal

Developer Memoir🌈

Click to expand👉 Developer Memoir🌈

Scenes 1

As usual, Xiao Ming opened the Chrome browser to browse the BBS, Tieba. Accidentally, Xiaoming opened the advertisement on the web page and jumped to JingDong Mall. When he went to close the window subconsciously, he found that (OS: it was just the product I needed!) How would JD know?Now that I've opened it, let's see the details of the product! Not bad. (OS: Give it a try!)

Scenes 2

Bai listens to the netease cloud music daily recommended song list can not get out of it (OS: wow! Why the playlist full of my favorite music styles? How great the netease cloud music! Love it so much! I have to buy a mumbership), strolling through ZhiHu's "How elegant XXX?, "What kind of experience is XXX?, "How do you evaluate XXX? (OS: Huh? This question is just what I want to ask, it has already been asked! What?? Thousands of answers!! Go inside and have a look!)

Scenes 3

Xiao Da never forget to enrich himself at work. As the major technical cnblog, CSDN, OSChina, JianShu, JueJin, etc., he find the homepage content recommendation is great (OS: these technical net posts are so great. I don't have to look for it as it came out). When he open the blog home page unconsciously,he found himself stick to write blog for three years, its technology stack is becoming more and more rich (OS: how to blog background does not provide a data analysis system? I want to see how many posts I've done over the years, when I've done it, which posts are hot, which technologies I've spent more time on, and which times I've been at my peak in the evenings? In the wee hours? I hope the system can give me more guidance data so that I can create better! Looking at the above scenes, you may sigh over the progress of technology, which has greatly improved our way of life. )

Idea

If you have a tool like this, it can help you get your personal information back, it can help you aggregate your personal information from various sites, it can help you analyze your personal data and give you Suggestions, it can help you visualize your personal data so that you can know yourself better.

Would you need such a tool? Would you like such a tool?

Based on the above, I started to develop INFO-SPIDER 👇👇👇

What is INFO-SPIDER

INFO-SPIDER is a crawler toolbox with numerous data sources. It aims to help users get their data back safely and quickly. The tool code is open source and the process is transparent. It also provides data analysis function and generates chart files based on user data, so that users can have a more intuitive and in-depth understanding of their own information. Currently supported data sources including GitHub, QQ mailbox, NetEase mailbox, Ali mailbox, Sina mailbox, Hotmail mailbox, Outlook mailbox, JingDong, TaoBao, Alipay, China Mobile, China Unicom, China Telecom, ZhiHu, Bilibili, NetEase Cloud Music, QQ Friends, QQ Groups, WeChat Moments Album, Browser History, 12306, Cnblog, CSDN, OSCHINA, JianShu.

Refer to the document or Video Demo for details

You can communicate with us on Gitter

Features

  • Safe and Reliable: this project is open source, all source code visible, local operation, safe and reliable.
  • Easy to Use: to provide a GUI interface, just click the data source you want to get and follow the prompts.
  • Clear Structure: All the data sources of the project are independent from each other, and their portability is high. All crawler scripts are under the Spiders catalogue.
  • Rich Data Sources: This project currently supports up to 24+ data sources, which are constantly updated.
  • Uniform Data Format: All crawled data will be stored in JSON format.
  • Rich Personal Data: This project will crawl as much personal data as possible for you, and later data processing can be reduced as needed.
  • Data Analysis: This project provides visual analysis of personal data, which is currently only partially supported.
  • Documentation: This project contains complete document or Video Demo .

Screenshot

screenshot.png

QuickStart

Requirements

  • Step1: Install python3 and Chrome
  • Step2: Install the same driver as the Chrome browser
  • Step3: Run pip install -r requirements.txt

Run the project

  • Step1: cd tools
  • Step2: python3 main.py
  • Step3: Click the Data Source button in the open window and select the data save path as prompted
  • Step4: The popup browser will automatically start crawling data after entering the user password, and the browser will automatically close after crawling.
  • Step5: In the corresponding directory, you can view the downloaded data (xxx. JSON), data analysis chart (XXx. HTML)

Plan

  • Provide web interface operation, adapt to multi-platform
  • Conduct statistical analysis of personal data
  • It integrates machine learning technology and natural language processing technology to analyze the data in depth
  • Chart the analysis results visually
  • Add more data sources...

Visitors

Contributors

Sponsors

Thank you to JetBrains, who provide Open Source License for PyCharm!

License

GPL-3.0

1
https://gitee.com/xmlhh/InfoSpider.git
git@gitee.com:xmlhh/InfoSpider.git
xmlhh
InfoSpider
InfoSpider
master

搜索帮助