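About this Django project for Kuaishou videos: I previously wrote a crawler that fetches videos from the Kuaishou video site. Given a user's profile URL, it returns that user's video URLs, follower count, like count, and so on. The plan for this project is to collect user ids at random and deduplicate them, fetch each user's profile video information by id, and display the results on a web page. Later I could add user registration and login, let users like and follow videos and download them with one click, and finally use the project to practice Android or WeChat mini-program development. That's the idea. This project is for learning only; do not use it commercially. Let's begin.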
python manage.py runserver
import os
LANGUAGE_CODE = 'zh-hans'
TIME_ZONE = 'Asia/Shanghai'
python manage.py migrate
python manage.py makemigrations
python manage.py createsuperuser
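I migrated the project I wrote earlier into this one. It now uses templates, imports Bootstrap, handles POST requests, and writes the crawled data into the database.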
import requests
import json
import os
# Crawl the id and name of the users followed on a profile page
URL = "https://video.kuaishou.com/graphql"
headers = {
"accept":"*/*",
"Content-Length":"<calculated when request is sent>",
"Accept-Encoding": "gzip, deflate",
"Connection": "keep-alive",
"content-type": "application/json",
"Cookie": r'kpf=PC_WEB; kpn=KUAISHOU_VISION; clientid=3; did=web_ec874916e390b9741609686125a0452e; didv=1613879531823; client_key=65890b29; ktrace-context=1|MS43NjQ1ODM2OTgyODY2OTgyLjc1MjgyMjUyLjE2MTU0NDI5NDQ0MzYuMTU2OTE=|MS43NjQ1ODM2OTgyODY2OTgyLjIxMjcxODY4LjE2MTU0NDI5NDQ0MzYuMTU2OTI=|0|graphql-server|webservice|false|NA; userId=427400950; kuaishou.server.web_st=ChZrdWFpc2hvdS5zZXJ2ZXIud2ViLnN0EqABUkHhV7V4kZgEsKH5ujlHNWEHV_KRDoBGhvSztjMMB54VfcpY6EJgzK_b3ZYFhM0obMSTVBDc7Csb-KuDKQpR8sobH5ozd82kEMIV5eb3S0QSJBxAemnSYimqR5IskD_IGA06cph50uA_oH2OftW2tSpaBuXl3vyYhFv6aS_24d8z0n9WILEo5JcTI0QpDdmDoRnXxHc_x7JHIR3s1pBlBhoSzFZBnBL4suA5hQVn0dPKLsMxIiDp66EsPPenAZG6MBgmJkQL2mrCKEDn1OPcTisxS6wffSgFMAE; kuaishou.server.web_ph=cb43dea88ab3a4c31dd231f2dc9cc29b8680',
"Host": "video.kuaishou.com",
"Origin": "https://video.kuaishou.com",
"Referer": "https://video.kuaishou.com/brilliant", # 这里要更改
"User-Agent": "PostmanRuntime/7.26.8"
}
payload = {"operationName":"brilliantTypeDataQuery","variables":{"hotChannelId":"00","page":"brilliant","pcursor":"1"},"query":"fragment feedContent on Feed {\n type\n author {\n id\n name\n headerUrl\n following\n headerUrls {\n url\n __typename\n }\n __typename\n }\n photo {\n id\n duration\n caption\n likeCount\n realLikeCount\n coverUrl\n photoUrl\n coverUrls {\n url\n __typename\n }\n timestamp\n expTag\n animatedCoverUrl\n distance\n videoRatio\n liked\n stereoType\n __typename\n }\n canAddComment\n llsid\n status\n currentPcursor\n __typename\n}\n\nfragment photoResult on PhotoResult {\n result\n llsid\n expTag\n serverExpTag\n pcursor\n feeds {\n ...feedContent\n __typename\n }\n webPageArea\n __typename\n}\n\nquery brilliantTypeDataQuery($pcursor: String, $hotChannelId: String, $page: String, $webPageArea: String) {\n brilliantTypeData(pcursor: $pcursor, hotChannelId: $hotChannelId, page: $page, webPageArea: $webPageArea) {\n ...photoResult\n __typename\n }\n}\n"}
def get_data2():
    res = requests.post(URL, headers=headers, json=payload)
    res.encoding = "utf-8"
    m_json = res.json()  # parsed into a dict
    print(m_json)


if __name__ == "__main__":
    get_data2()
def get_data2():
    res = requests.post(URL, headers=headers, json=payload)
    res.encoding = "utf-8"
    m_json = res.json()  # parsed into a dict
    # ---------- filter the fields we need ---------- #
    feeds_list = m_json["data"]["brilliantTypeData"]["feeds"]
    for feeds in feeds_list:
        Userid = feeds["author"]["id"]
        Username = feeds["author"]["name"]
        print("%s-----------%s" % (Userid, Username))
    print(m_json)
Finding: I ran the code above twice and got different results each time.
if __name__ == "__main__":
    for i in range(10):
        get_data2()
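Because every call returns a different batch, the results of repeated calls can be combined and deduplicated by user id. The following is only a minimal sketch: it assumes each parsed response has the `data.brilliantTypeData.feeds[].author` shape used above, and `collect_unique_authors` plus the two sample responses are hypothetical stand-ins (no network request is made).

```python
def collect_unique_authors(responses):
    """Merge several parsed responses into {user_id: user_name}, keeping the first name seen."""
    authors = {}
    for m_json in responses:
        feeds = m_json.get("data", {}).get("brilliantTypeData", {}).get("feeds") or []
        for feed in feeds:
            author = feed.get("author") or {}
            user_id = author.get("id")
            # Skip entries without an id and ids that were already collected
            if user_id and user_id not in authors:
                authors[user_id] = author.get("name", "")
    return authors


# Hypothetical sample data standing in for two calls to the endpoint
run1 = {"data": {"brilliantTypeData": {"feeds": [
    {"author": {"id": "a1", "name": "Alice"}},
    {"author": {"id": "b2", "name": "Bob"}},
]}}}
run2 = {"data": {"brilliantTypeData": {"feeds": [
    {"author": {"id": "b2", "name": "Bob"}},
    {"author": {"id": "c3", "name": "Carol"}},
]}}}

unique_authors = collect_unique_authors([run1, run2])
print(unique_authors)
```

Keying the dict by user id means a user who shows up in several batches is stored only once, which is the same guarantee that `unique=True` gives on the database side.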
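Thought: if I store all of this data in the database as I originally planned, will my Tencent Cloud student server run out of space?
Database: I don't quite understand the key relationships between users, the streamers they follow, and the videos they like. I'll read up on it and then come back to Django.
I reviewed the material on models and now know how to use primary keys. One open problem: the images on the user page are not handled yet.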
# -*- coding: utf-8 -*-
# Request the mp4 URLs
import requests
import json
URL = "https://live.kuaishou.com/m_graphql"
headers = {
"accept":"*/*",
"Content-Length":"<calculated when request is sent>",
"Accept-Encoding": "gzip, deflate",
"Connection": "keep-alive",
"content-type": "application/json",
"Cookie": r'clientid=3; did=web_ec874916e390b9741609686125a0452e; didv=1613879531823; client_key=65890b29; kpn=GAME_ZONE; userId=427400950; kuaishou.live.bfb1s=7206d814e5c089a58c910ed8bf52ace5; userId=427400950; kuaishou.live.web_st=ChRrdWFpc2hvdS5saXZlLndlYi5zdBKgAYm9VZdJaOIjsJDqPoO-yLNw4ZuZul234nekkYMdMsNjIq-i5skiOlVnLhFSPv5PTbrQ45yitiFEkQMGUCDxpsbRcsDpHI0CDZfflQeD9Z14cuQ8x2YJORv-1Pz8JM4-_qmBhAxjVHJ8OSs4kMHRKpCvZja6UUYbXLunFhKT5fyhx1HViPCmuVjBcsSxZEtEpvponSa3DjtkZU2KQ3M9pUoaEm-zwBmcbUA4lm5ejQnh9kVjySIgjJsh3xaj6ckXgLNLF3iPjKs6sC7d1lWqH0SZbWeHTREoBTAB; kuaishou.live.web_ph=ed6156f0bc66780438d593dfc3b3f8fa6f63',
"Host": "live.kuaishou.com",
"Origin": "https://live.kuaishou.com",
"Referer": "https://live.kuaishou.com/profile/JTYYA13-",
"User-Agent": "PostmanRuntime/7.26.8"
}
payload = {"operationName":"publicFeedsQuery","variables":{"principalId":"JTYYA13-","pcursor":"1.602058185281E12","count":24},"query":"query publicFeedsQuery($principalId: String, $pcursor: String, $count: Int) {\n publicFeeds(principalId: $principalId, pcursor: $pcursor, count: $count) {\n pcursor\n live {\n user {\n id\n avatar\n name\n __typename\n }\n watchingCount\n poster\n coverUrl\n caption\n id\n playUrls {\n quality\n url\n __typename\n }\n quality\n gameInfo {\n category\n name\n pubgSurvival\n type\n kingHero\n __typename\n }\n hasRedPack\n liveGuess\n expTag\n __typename\n }\n list {\n id\n thumbnailUrl\n poster\n workType\n type\n useVideoPlayer\n imgUrls\n imgSizes\n magicFace\n musicName\n caption\n location\n liked\n onlyFollowerCanComment\n relativeHeight\n timestamp\n width\n height\n counts {\n displayView\n displayLike\n displayComment\n __typename\n }\n user {\n id\n eid\n name\n avatar\n __typename\n }\n expTag\n isSpherical\n __typename\n }\n __typename\n }\n}\n"}
def get_data():
    res = requests.post(URL, headers=headers, json=payload)
    res.encoding = "utf-8"
    m_json = res.json()  # parsed into a dict
    print(m_json)


get_data()
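Field | Description | Note |
---|---|---|
animatedCoverUrl | Animated preview of the video | 0 |
caption | Video caption | 1 |
coversUrl | Cover image URL | 1 |
videoID | Video id | 1 |
likeCount | Abbreviated like count | 1 |
photoUrl | Video URL (first CDN by default) | 1 |
realLikeCount | Exact like count | 1 |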
import requests
import json
class userdetailSpider():
    URL = "https://video.kuaishou.com/graphql"
    # The Cookie and Referer headers need to be changed
    headers = {
        "accept": "*/*",
        "Content-Length": "<calculated when request is sent>",
        "Accept-Encoding": "gzip, deflate",
        "Connection": "keep-alive",
        "content-type": "application/json",
        # Max-Age=8640000 is an expiry attribute I added
"Cookie": r'kpf=PC_WEB; kpn=KUAISHOU_VISION; clientid=3;Max-Age=8640000; did=web_ec874916e390b9741609686125a0452e; didv=1613879531823; client_key=65890b29; userId=427400950; kuaishou.server.web_st=ChZrdWFpc2hvdS5zZXJ2ZXIud2ViLnN0EqABQrFWsr52Mhp5GfcmignSLoddGbbCBCTAkyedrcLkHqxI9IIdilOuxFUWwhS41WnVKwFJ0Win96_M-frAXGNXXDx78d0FjGOylLgeVtcXUGsIkgyxVkopf2IR_Pvps61IaXw1XTHZOdTrwQkDIdwESPDssQTuW9XNIfjJK9e88ZgJYNJI5bK5n38Zm37kl8omE8R8E8ZhL87TgGpaRZq3XRoSTdCMiCqspRXB3AhuFugv61B-IiBO8gZCTy1dvCTjyGg0IEN6MrmkUACDgSB3T2BYkkBQ-SgFMAE; kuaishou.server.web_ph=dfcba445b9b7f619411fdced6b1e61d6f207',
"Host": "video.kuaishou.com",
"Origin": "https://video.kuaishou.com",
"Referer": "https://video.kuaishou.com/profile/3xcidpetejrcagy",
"User-Agent": "PostmanRuntime/7.26.8"
}
    # The userId here needs to be changed too
    payload = {"operationName": "visionProfilePhotoList",
               "variables": {"userId": "3xcidpetejrcagy", "pcursor": "", "page": "profile"},
"query": "query visionProfilePhotoList($pcursor: String, $userId: String, $page: String, $webPageArea: String) {\n visionProfilePhotoList(pcursor: $pcursor, userId: $userId, page: $page, webPageArea: $webPageArea) {\n result\n llsid\n webPageArea\n feeds {\n type\n author {\n id\n name\n following\n headerUrl\n headerUrls {\n cdn\n url\n __typename\n }\n __typename\n }\n tags {\n type\n name\n __typename\n }\n photo {\n id\n duration\n caption\n likeCount\n realLikeCount\n coverUrl\n coverUrls {\n cdn\n url\n __typename\n }\n photoUrls {\n cdn\n url\n __typename\n }\n photoUrl\n liked\n timestamp\n expTag\n animatedCoverUrl\n __typename\n }\n canAddComment\n currentPcursor\n llsid\n status\n __typename\n }\n hostName\n pcursor\n __typename\n }\n}\n"}
    def __init__(self, userID, myCookie):
        # headers and payload are class attributes, so these updates are shared by all instances
        self.headers["Referer"] = "https://video.kuaishou.com/profile/" + userID
        self.payload["variables"]["userId"] = userID
        self.headers["Cookie"] = myCookie

    def get_data(self):
        # -------------- request the page -------------- #
        try:
            res = requests.post(self.URL, headers=self.headers, json=self.payload)
            res.encoding = "utf-8"
            m_json = res.json()  # parsed into a dict
            # ------------- filter the data ------------- #
            # The "result" field tells whether the request succeeded. If it is not 1 the
            # request failed, and the lookups below raise and end the run.
            # print(m_json["data"]["visionProfilePhotoList"]["result"])
            feeds_list = m_json["data"]["visionProfilePhotoList"]["feeds"]
            # Read pcursor and put it into the payload of the next request
            pcursor = m_json["data"]["visionProfilePhotoList"]["pcursor"]
            self.payload["variables"]["pcursor"] = pcursor
            # ------------- extract the fields ---------- #
            # (while writing this I remembered that I should be getting the video info through live)
            result = {}  # the extracted fields are stored in a dict
            for feeds in feeds_list:
                result["caption"] = feeds["photo"]["caption"]
                result["coversUrl"] = feeds["photo"]["coverUrl"]
                result["videoID"] = feeds["photo"]["id"]
                result["videoPath"] = feeds["photo"]["photoUrl"]
                result["likeCount"] = feeds["photo"]["likeCount"]
                result["realLikeCount"] = feeds["photo"]["realLikeCount"]
                print(result)
                # ----------- the function that stores to the database will go here -----------
            # print(m_json)
            # print(feeds_list)
            print(pcursor)
            if pcursor == "no_more":
                return 0
        except Exception:
            print("Page request failed; check whether the cookie has expired and whether the id is correct")
            return -1  # report the failure so start_spider stops instead of retrying forever

    def start_spider(self):
        while True:
            temp = self.get_data()
            if temp in (0, -1):  # 0: last page reached, -1: request failed
                break


if __name__ == "__main__":
    theCookie = "kpf=PC_WEB; kpn=KUAISHOU_VISION; clientid=3; clientid=3;Max-Age=8640000; did=web_ec874916e390b9741609686125a0452e; didv=1613879531823; client_key=65890b29; userId=427400950; kuaishou.server.web_st=ChZrdWFpc2hvdS5zZXJ2ZXIud2ViLnN0EqABQrFWsr52Mhp5GfcmignSLoddGbbCBCTAkyedrcLkHqxI9IIdilOuxFUWwhS41WnVKwFJ0Win96_M-frAXGNXXDx78d0FjGOylLgeVtcXUGsIkgyxVkopf2IR_Pvps61IaXw1XTHZOdTrwQkDIdwESPDssQTuW9XNIfjJK9e88ZgJYNJI5bK5n38Zm37kl8omE8R8E8ZhL87TgGpaRZq3XRoSTdCMiCqspRXB3AhuFugv61B-IiBO8gZCTy1dvCTjyGg0IEN6MrmkUACDgSB3T2BYkkBQ-SgFMAE; kuaishou.server.web_ph=dfcba445b9b7f619411fdced6b1e61d6f207"
    theUserID = "3xkm67762d5fwzc"
    test = userdetailSpider(theUserID, theCookie)
    test.start_spider()
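The paging logic in `start_spider` can be looked at separately from the HTTP request. The sketch below is an illustration only: `crawl_all_pages`, `fetch_page`, `max_pages` and `FAKE_PAGES` are hypothetical names, and the only thing taken from the code above is the convention that the server returns the next `pcursor` and signals the last page with `"no_more"`.

```python
def crawl_all_pages(fetch_page, max_pages=100):
    """Call fetch_page(pcursor) until it reports "no_more", fails, or max_pages is reached."""
    items = []
    pcursor = ""  # an empty cursor requests the first page
    for _ in range(max_pages):
        page = fetch_page(pcursor)
        if page is None:          # request failed: stop instead of looping forever
            break
        items.extend(page["feeds"])
        pcursor = page["pcursor"]
        if pcursor == "no_more":  # end marker used by the spider above
            break
    return items


# Hypothetical in-memory pages standing in for the real endpoint
FAKE_PAGES = {
    "": {"feeds": ["v1", "v2"], "pcursor": "p2"},
    "p2": {"feeds": ["v3"], "pcursor": "no_more"},
}

all_items = crawl_all_pages(lambda cursor: FAKE_PAGES.get(cursor))
print(all_items)
```

Bounding the loop with `max_pages` and treating a failed request as a stop condition avoids the endless loop that a bare `while(1)` produces when the cookie expires partway through.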
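Damn. Partway through writing this I realized the response contains no video URL, and I can't find the connection between the two.
https://txmov2.a.yximgs.com/bs2/newWatermark/Mzc5MDA0MzYwMjQ_zh_4.mp4
userID 3xcidpetejrcagy has 6 albums and 60 videos, 66 works in total, which matches. (I want them all.)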
# Takes a Kuaishou user id and a cookie
class userdetailLiveSpider():
    URL = "https://live.kuaishou.com/m_graphql"
    headers = {
        "accept": "*/*",
        "Content-Length": "<calculated when request is sent>",
        "Accept-Encoding": "gzip, deflate",
        "Connection": "keep-alive",
        "content-type": "application/json",
        # Max-Age=8640000 is an expiry attribute I added
"Cookie": r'clientid=3;Max-Age=8640000; did=web_ec874916e390b9741609686125a0452e; didv=1613879531823; client_key=65890b29; kpn=GAME_ZONE; userId=427400950; kuaishou.live.bfb1s=7206d814e5c089a58c910ed8bf52ace5; userId=427400950; kuaishou.live.web_st=ChRrdWFpc2hvdS5saXZlLndlYi5zdBKgAYm9VZdJaOIjsJDqPoO-yLNw4ZuZul234nekkYMdMsNjIq-i5skiOlVnLhFSPv5PTbrQ45yitiFEkQMGUCDxpsbRcsDpHI0CDZfflQeD9Z14cuQ8x2YJORv-1Pz8JM4-_qmBhAxjVHJ8OSs4kMHRKpCvZja6UUYbXLunFhKT5fyhx1HViPCmuVjBcsSxZEtEpvponSa3DjtkZU2KQ3M9pUoaEm-zwBmcbUA4lm5ejQnh9kVjySIgjJsh3xaj6ckXgLNLF3iPjKs6sC7d1lWqH0SZbWeHTREoBTAB; kuaishou.live.web_ph=ed6156f0bc66780438d593dfc3b3f8fa6f63',
"Host": "live.kuaishou.com",
"Origin": "https://live.kuaishou.com",
"Referer": "https://live.kuaishou.com/profile/LY7452065",
"User-Agent": "PostmanRuntime/7.26.8"
}
    payload = {"operationName": "publicFeedsQuery",
               "variables": {"principalId": "JTYYA13-", "pcursor": "", "count": 24},
"query": "query publicFeedsQuery($principalId: String, $pcursor: String, $count: Int) {\n publicFeeds(principalId: $principalId, pcursor: $pcursor, count: $count) {\n pcursor\n live {\n user {\n id\n avatar\n name\n __typename\n }\n watchingCount\n poster\n coverUrl\n caption\n id\n playUrls {\n quality\n url\n __typename\n }\n quality\n gameInfo {\n category\n name\n pubgSurvival\n type\n kingHero\n __typename\n }\n hasRedPack\n liveGuess\n expTag\n __typename\n }\n list {\n id\n thumbnailUrl\n poster\n workType\n type\n useVideoPlayer\n imgUrls\n imgSizes\n magicFace\n musicName\n caption\n location\n liked\n onlyFollowerCanComment\n relativeHeight\n timestamp\n width\n height\n counts {\n displayView\n displayLike\n displayComment\n __typename\n }\n user {\n id\n eid\n name\n avatar\n __typename\n }\n expTag\n isSpherical\n __typename\n }\n __typename\n }\n}\n"}
    def __init__(self, userId, myCookie):
        self.headers["Referer"] = "https://live.kuaishou.com/profile/" + userId
        self.payload["variables"]["principalId"] = userId
        self.headers["Cookie"] = myCookie

    def get_data(self):
        try:
            res = requests.post(self.URL, headers=self.headers, json=self.payload)
            res.encoding = "utf-8"
            m_json = res.json()  # parsed into a dict
            feeds_list = m_json["data"]["publicFeeds"]["list"]
            pcursor = m_json["data"]["publicFeeds"]["pcursor"]
            self.payload["variables"]["pcursor"] = pcursor
            # print(m_json)
            # ------------- filter the data --------------- #
            result = {}
            for feeds in feeds_list:
                result["caption"] = feeds["caption"]
                # view count, like count, comment count
                result["displayView"] = feeds["counts"]["displayView"]
                result["displayLike"] = feeds["counts"]["displayLike"]
                result["displayComment"] = feeds["counts"]["displayComment"]
                # album images: a list that may be empty
                result["imgUrls"] = feeds["imgUrls"]
                result["liveID"] = feeds["id"]
                print(result)
        except Exception:
            print("Page request failed; check whether the cookie has expired and whether the id is correct")


if __name__ == "__main__":
    theCookie = "kpf=PC_WEB; kpn=KUAISHOU_VISION; clientid=3; clientid=3;Max-Age=8640000; did=web_ec874916e390b9741609686125a0452e; didv=1613879531823; client_key=65890b29; userId=427400950; kuaishou.server.web_st=ChZrdWFpc2hvdS5zZXJ2ZXIud2ViLnN0EqABQrFWsr52Mhp5GfcmignSLoddGbbCBCTAkyedrcLkHqxI9IIdilOuxFUWwhS41WnVKwFJ0Win96_M-frAXGNXXDx78d0FjGOylLgeVtcXUGsIkgyxVkopf2IR_Pvps61IaXw1XTHZOdTrwQkDIdwESPDssQTuW9XNIfjJK9e88ZgJYNJI5bK5n38Zm37kl8omE8R8E8ZhL87TgGpaRZq3XRoSTdCMiCqspRXB3AhuFugv61B-IiBO8gZCTy1dvCTjyGg0IEN6MrmkUACDgSB3T2BYkkBQ-SgFMAE; kuaishou.server.web_ph=dfcba445b9b7f619411fdced6b1e61d6f207"
    theUserID = "3xkm67762d5fwzc"
    test = userdetailLiveSpider(theUserID, theCookie)
    test.get_data()
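When merging: if an id from live also appears among the video ids, merge the two records; if it does not, the item is an album, so add it as a new record.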
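The merge rule can be sketched in plain Python before touching the database. This is only an illustration under two assumptions: that the `liveID` values from the live endpoint and the `videoID` values from the video endpoint share one id space (which the rule above implies), and that each record is a dict shaped like the spider output. `merge_works` and the sample records are hypothetical.

```python
def merge_works(video_items, live_items):
    """Combine video-site and live-site records, keyed by work id."""
    merged = {item["videoID"]: dict(item) for item in video_items}
    for live in live_items:
        work_id = live["liveID"]
        if work_id in merged:
            # Same work seen on both sites: copy over the live-only counters
            merged[work_id]["displayView"] = live["displayView"]
            merged[work_id]["displayComment"] = live["displayComment"]
        else:
            # Not among the videos: treat it as an album and add it
            merged[work_id] = {"photoID": work_id, **live}
    return merged


# Hypothetical sample records
videos = [{"videoID": "x1", "caption": "clip"}]
lives = [
    {"liveID": "x1", "displayView": "10w", "displayComment": "5", "imgUrls": []},
    {"liveID": "y2", "displayView": "3w", "displayComment": "1", "imgUrls": ["a.webp"]},
]

merged = merge_works(videos, lives)
print(merged)
```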
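It looks like functions that perform actions can be added to the model.
When adding video information, add it through the relation keyed on the user id.
ksID is an important piece of information; add it later if it turns out to be needed.
Store the constellation and the address too.
When creating the table, add a field that represents the row's status.
https://www.cnblogs.com/shenjianping/p/11526538.html: this blog post uses a many-to-many relationship, declared as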
authors=models.ManyToManyField("Author")
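Today I read through a Django book. The one-to-many and many-to-many relationships between tables were covered in it; I probably just hadn't paid attention before. After that I felt I understood Django fairly well. Then I looked at some interview questions and could answer almost none of them; at least they are not the kind of thing ordinary books cover.
I expected today's work to go smoothly, but a problem showed up right away: even after logging in by QR code, the streamer pages under the live domain show no works, and running the userdetailLiveSpider object fails. It looks like a network problem.
OK, the web page is reachable again. Let me see what is wrong with my object; the previous object runs fine.
My current guess is a cookie problem: cookies for the live domain expire quickly, so I'll try replacing the cookie.
It isn't the cookie. Refreshing the official site still shows no data, so I'll set this aside and write the logic that fetches streamer info.
The request headers are the same and only the payload differs, but I'll still write a separate class.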
{'data': {'sensitiveUserInfo': {'kwaiId': 'synsyn520521', 'originUserId': '943759388', 'constellation': '双子座', 'cityName': '山东 济宁市', 'counts': {'fan': '115.8w', 'follow': '151', 'photo': '66', 'liked': None, 'open': 66, 'playback': 0, 'private': None, '__typename': 'CountInfo'}, '__typename': 'User'}}}
class ksLiveSpider():
    URL = "https://live.kuaishou.com/m_graphql"
    headers = {
        "accept": "*/*",
        "Content-Length": "<calculated when request is sent>",
        "Accept-Encoding": "gzip, deflate",
        "Connection": "keep-alive",
        "content-type": "application/json",
        # Max-Age=8640000 is an expiry attribute I added
"Cookie": r'clientid=3;Max-Age=8640000; did=web_ec874916e390b9741609686125a0452e; didv=1613879531823; client_key=65890b29; kpn=GAME_ZONE; userId=427400950; kuaishou.live.bfb1s=7206d814e5c089a58c910ed8bf52ace5; userId=427400950; kuaishou.live.web_st=ChRrdWFpc2hvdS5saXZlLndlYi5zdBKgAYm9VZdJaOIjsJDqPoO-yLNw4ZuZul234nekkYMdMsNjIq-i5skiOlVnLhFSPv5PTbrQ45yitiFEkQMGUCDxpsbRcsDpHI0CDZfflQeD9Z14cuQ8x2YJORv-1Pz8JM4-_qmBhAxjVHJ8OSs4kMHRKpCvZja6UUYbXLunFhKT5fyhx1HViPCmuVjBcsSxZEtEpvponSa3DjtkZU2KQ3M9pUoaEm-zwBmcbUA4lm5ejQnh9kVjySIgjJsh3xaj6ckXgLNLF3iPjKs6sC7d1lWqH0SZbWeHTREoBTAB; kuaishou.live.web_ph=ed6156f0bc66780438d593dfc3b3f8fa6f63',
"Host": "live.kuaishou.com",
"Origin": "https://live.kuaishou.com",
"Referer": "https://live.kuaishou.com/profile/LY7452065",
"User-Agent": "PostmanRuntime/7.26.8"
}
    payload = {"operationName":"sensitiveUserInfoQuery","variables":{"principalId":"3xkm67762d5fwzc"},"query":"query sensitiveUserInfoQuery($principalId: String) {\n sensitiveUserInfo(principalId: $principalId) {\n kwaiId\n originUserId\n constellation\n cityName\n counts {\n fan\n follow\n photo\n liked\n open\n playback\n private\n __typename\n }\n __typename\n }\n}\n"}

    def __init__(self, userId, myCookie):
        self.headers["Referer"] = "https://live.kuaishou.com/profile/" + userId
        self.payload["variables"]["principalId"] = userId
        self.headers["Cookie"] = myCookie

    def get_data(self):
        try:
            res = requests.post(self.URL, headers=self.headers, json=self.payload)
            res.encoding = "utf-8"
            m_json = res.json()  # parsed into a dict
            print(m_json)
            info = m_json["data"]["sensitiveUserInfo"]
            result = {}
            # --------- extract the useful fields -------- #
            result["ksId"] = info["kwaiId"]
            result["xinzuo"] = info["constellation"]  # xinzuo: constellation (star sign)
            result["cityName"] = info["cityName"]
            result["fan"] = info["counts"]["fan"]
            result["follow"] = info["counts"]["follow"]
            # number of works
            result["photo"] = info["counts"]["photo"]
            print(result)
            return result  # returned so that callers such as the admin actions can use it
        except Exception:
            print("Page request failed; check whether the cookie has expired and whether the id is correct")


if __name__ == "__main__":
    theCookie = "kpf=PC_WEB; kpn=KUAISHOU_VISION; clientid=3; clientid=3;Max-Age=8640000; did=web_ec874916e390b9741609686125a0452e; didv=1613879531823; client_key=65890b29; userId=427400950; kuaishou.server.web_st=ChZrdWFpc2hvdS5zZXJ2ZXIud2ViLnN0EqABQrFWsr52Mhp5GfcmignSLoddGbbCBCTAkyedrcLkHqxI9IIdilOuxFUWwhS41WnVKwFJ0Win96_M-frAXGNXXDx78d0FjGOylLgeVtcXUGsIkgyxVkopf2IR_Pvps61IaXw1XTHZOdTrwQkDIdwESPDssQTuW9XNIfjJK9e88ZgJYNJI5bK5n38Zm37kl8omE8R8E8ZhL87TgGpaRZq3XRoSTdCMiCqspRXB3AhuFugv61B-IiBO8gZCTy1dvCTjyGg0IEN6MrmkUACDgSB3T2BYkkBQ-SgFMAE; kuaishou.server.web_ph=dfcba445b9b7f619411fdced6b1e61d6f207"
    theUserID = "3xkm67762d5fwzc"
    ksCookie = "clientid=3; did=web_ec874916e390b9741609686125a0452e; didv=1613879531823; client_key=65890b29; kpn=GAME_ZONE; userId=427400950; userId=427400950; kuaishou.live.bfb1s=ac5f27b3b62895859c4c1622f49856a4; kuaishou.live.web_st=ChRrdWFpc2hvdS5saXZlLndlYi5zdBKgAfwzFw_Kb2uHnKBQgQQ9-nhGuO2rbpCerVYO54A3KmQUQ6JOiQO-mLFbcwABZ9A-Fl2X5WxQ9yuXHLsMV-RsuZygWUnugryt27cp6rgKzgLI7y6ar8R1RdP6CUPp1JTjbgZ6uzAdhQdayNbiM-isllV5Yyj9bb4IK_LPqzxYDjf_uy0QRa_YxWiMtTUPQd8CFinqBXb7gj-o9HNOZG_v1y0aEk2hY_LIikBot7IUVtJ3ydB6KCIgmvgxlD_4Ac99qgHpdvBfsxGugwTfosyEsfq-BaaFMG0oBTAB; kuaishou.live.web_ph=ae0615d67633a6c0debe8d4668be19e1d446"
    test = ksLiveSpider(theUserID, ksCookie)
    test.get_data()
from datetime import datetime

from django.db import models


class UserTitle(models.Model):
    userID = models.CharField(max_length=256, unique=True, verbose_name="user id")
    userName = models.CharField(max_length=256, verbose_name="user name")
    createTime = models.DateTimeField(default=datetime.now, verbose_name="created at")
I just remembered the avatar URL field; it has to be fetched from the video page.
{'data': {'visionProfile': {'result': 1, 'hostName': 'webservice-bjxy-rs9150.idcyz.hb1.kwaidc.com', 'userProfile': {'ownerCount': {'fan': '128.9w', 'photo': None, 'follow': 438, 'photo_public': 68, '__typename': 'VisionUserProfileOwnerCount'}, 'profile': {'gender': 'F', 'user_name': '南希阿-', 'user_id': '3xcidpetejrcagy', 'headurl': 'https://tx2.a.yximgs.com/uhead/AB/2020/08/17/09/BMjAyMDA4MTcwOTM2MDNfMjQ0NzAyMDZfMV9oZDM4Nl8xODU=_s.jpg', 'user_text': '谢谢你在世界的角落里找到我', 'user_profile_bg_url': '//s2-10623.kwimgs.com/kos/nlav10623/vision_images/profile_background.5bc08b1bf4fba1f4.svg', '__typename': 'VisionUserProfileUser'}, 'isFollowing': True, '__typename': 'VisionUserProfile'}, '__typename': 'VisionProfileResult'}}}
# Fetch the streamer info from the video page
class ksVideoSpider():
    URL = "https://video.kuaishou.com/graphql"
    # The Cookie and Referer headers need to be changed
    headers = {
        "accept": "*/*",
        "Content-Length": "<calculated when request is sent>",
        "Accept-Encoding": "gzip, deflate",
        "Connection": "keep-alive",
        "content-type": "application/json",
        # Max-Age=8640000 is an expiry attribute I added
"Cookie": r'kpf=PC_WEB; kpn=KUAISHOU_VISION; clientid=3;Max-Age=8640000; did=web_ec874916e390b9741609686125a0452e; didv=1613879531823; client_key=65890b29; userId=427400950; kuaishou.server.web_st=ChZrdWFpc2hvdS5zZXJ2ZXIud2ViLnN0EqABQrFWsr52Mhp5GfcmignSLoddGbbCBCTAkyedrcLkHqxI9IIdilOuxFUWwhS41WnVKwFJ0Win96_M-frAXGNXXDx78d0FjGOylLgeVtcXUGsIkgyxVkopf2IR_Pvps61IaXw1XTHZOdTrwQkDIdwESPDssQTuW9XNIfjJK9e88ZgJYNJI5bK5n38Zm37kl8omE8R8E8ZhL87TgGpaRZq3XRoSTdCMiCqspRXB3AhuFugv61B-IiBO8gZCTy1dvCTjyGg0IEN6MrmkUACDgSB3T2BYkkBQ-SgFMAE; kuaishou.server.web_ph=dfcba445b9b7f619411fdced6b1e61d6f207',
"Host": "video.kuaishou.com",
"Origin": "https://video.kuaishou.com",
"Referer": "https://video.kuaishou.com/profile/3xcidpetejrcagy",
"User-Agent": "PostmanRuntime/7.26.8"
}
    # The userId here needs to be changed too
    payload = {"operationName":"visionProfile","variables":{"userId":"3xcidpetejrcagy"},"query":"query visionProfile($userId: String) {\n visionProfile(userId: $userId) {\n result\n hostName\n userProfile {\n ownerCount {\n fan\n photo\n follow\n photo_public\n __typename\n }\n profile {\n gender\n user_name\n user_id\n headurl\n user_text\n user_profile_bg_url\n __typename\n }\n isFollowing\n __typename\n }\n __typename\n }\n}\n"}

    def __init__(self, userID, myCookie):
        # Update this class's own headers and payload. The earlier version wrote to
        # userdetailSpider by mistake, so the userID passed in was silently ignored.
        self.headers["Referer"] = "https://video.kuaishou.com/profile/" + userID
        self.payload["variables"]["userId"] = userID
        self.headers["Cookie"] = myCookie

    def get_data(self):
        try:
            res = requests.post(self.URL, headers=self.headers, json=self.payload)
            res.encoding = "utf-8"
            m_json = res.json()  # parsed into a dict
            print(m_json)
            profile = m_json["data"]["visionProfile"]["userProfile"]["profile"]
            result = {}
            # --------- extract the useful fields -------- #
            result["user_text"] = profile["user_text"]
            result["gender"] = profile["gender"]
            result["userImg"] = profile["headurl"]
            print(result)
            return result  # returned so that callers such as the admin actions can use it
        except Exception:
            print("Page request failed; check whether the cookie has expired and whether the id is correct")


if __name__ == "__main__":
    theCookie = "kpf=PC_WEB; kpn=KUAISHOU_VISION; clientid=3; clientid=3;Max-Age=8640000; did=web_ec874916e390b9741609686125a0452e; didv=1613879531823; client_key=65890b29; userId=427400950; kuaishou.server.web_st=ChZrdWFpc2hvdS5zZXJ2ZXIud2ViLnN0EqABQrFWsr52Mhp5GfcmignSLoddGbbCBCTAkyedrcLkHqxI9IIdilOuxFUWwhS41WnVKwFJ0Win96_M-frAXGNXXDx78d0FjGOylLgeVtcXUGsIkgyxVkopf2IR_Pvps61IaXw1XTHZOdTrwQkDIdwESPDssQTuW9XNIfjJK9e88ZgJYNJI5bK5n38Zm37kl8omE8R8E8ZhL87TgGpaRZq3XRoSTdCMiCqspRXB3AhuFugv61B-IiBO8gZCTy1dvCTjyGg0IEN6MrmkUACDgSB3T2BYkkBQ-SgFMAE; kuaishou.server.web_ph=dfcba445b9b7f619411fdced6b1e61d6f207"
    theUserID = "3xkm67762d5fwzc"
    ksCookie = "clientid=3; did=web_ec874916e390b9741609686125a0452e; didv=1613879531823; client_key=65890b29; kpn=GAME_ZONE; userId=427400950; userId=427400950; kuaishou.live.bfb1s=ac5f27b3b62895859c4c1622f49856a4; kuaishou.live.web_st=ChRrdWFpc2hvdS5saXZlLndlYi5zdBKgAfwzFw_Kb2uHnKBQgQQ9-nhGuO2rbpCerVYO54A3KmQUQ6JOiQO-mLFbcwABZ9A-Fl2X5WxQ9yuXHLsMV-RsuZygWUnugryt27cp6rgKzgLI7y6ar8R1RdP6CUPp1JTjbgZ6uzAdhQdayNbiM-isllV5Yyj9bb4IK_LPqzxYDjf_uy0QRa_YxWiMtTUPQd8CFinqBXb7gj-o9HNOZG_v1y0aEk2hY_LIikBot7IUVtJ3ydB6KCIgmvgxlD_4Ac99qgHpdvBfsxGugwTfosyEsfq-BaaFMG0oBTAB; kuaishou.live.web_ph=ae0615d67633a6c0debe8d4668be19e1d446"
    # Assumption: the video.kuaishou.com endpoint expects the video-site cookie (theCookie), not the live one
    test = ksVideoSpider(theUserID, theCookie)
    test.get_data()
class UserTitle(models.Model):
    # The API returns F for female and M for male
    GENDER = [
        (0, "unknown"),
        (1, "male"),
        (2, "female")
    ]
    STATE = [
        (0, "first crawl"),
        (1, "test")
    ]
    USERIMG = "https://tx2.a.yximgs.com/uhead/AB/2020/08/17/09/BMjAyMDA4MTcwOTM2MDNfMjQ0NzAyMDZfMV9oZDM4Nl8xODU=_s.jpg"
    userID = models.CharField(max_length=256, unique=True, verbose_name="user id")
    userName = models.CharField(max_length=256, verbose_name="user name")
    createTime = models.DateTimeField(default=datetime.now, verbose_name="created at")
    stateUser = models.IntegerField(choices=STATE, verbose_name="user info state", default=0)
    ksID = models.CharField(max_length=128, verbose_name="Kuaishou id", default="xxxxxxxxxxxxxx")
    user_text = models.CharField(max_length=2560, verbose_name="user bio", default="xxxxxxxxxxxxx")
    gender = models.IntegerField(choices=GENDER, verbose_name="gender", default=0)
    fan = models.CharField(max_length=32, verbose_name="follower count", default="-1")
    xinzuo = models.CharField(max_length=32, verbose_name="constellation", default="unknown")
    cityName = models.CharField(max_length=32, verbose_name="address", default="unknown")
    follow = models.CharField(max_length=32, verbose_name="following count", default="-1")
    photo = models.CharField(max_length=32, verbose_name="number of works", default="-1")
    userImg = models.CharField(max_length=256, verbose_name="avatar URL", default=USERIMG)

    def __str__(self):
        return self.userName

    class Meta:  # was misspelled "Mate", which Django silently ignores
        verbose_name = verbose_name_plural = "user ids and names"
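At first I set unique=True on ksID, and migrate raised an error. I reverted the change, but makemigrations followed by migrate still failed. I then deleted the migration files created on 3.12 in app01 and ran the commands again, and it worked. In the past I probably would have deleted the whole database over a problem like this.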
from django.contrib import admin
# Register your models here.
from .models import UserTitle


class UserTitleAdmin(admin.ModelAdmin):
    # Fields to display
    list_display = ["userName", "stateUser"]
    # Filters
    list_filter = ["stateUser"]
    # Search fields
    search_fields = ["userName"]
    # Page size
    list_per_page = 50

    # An action takes these two arguments. The second is a QuerySet holding the
    # selected rows; loop over it and read attributes with dot access.
    def mytest(self, request, queryset):
        for qu in queryset:
            print(qu.userName)
        print(request, type(queryset))
    mytest.short_description = "Test"
    actions = [mytest, ]
    # Show the action bar at the top of the page
    actions_on_top = True
    # Show the action bar at the bottom of the page
    actions_on_bottom = False
    # Whether to show the selection counter
    actions_selection_counter = True


admin.site.register(UserTitle, UserTitleAdmin)
{'ksId': 'synsyn520521', 'xinzuo': '双子座', 'cityName': '山东 济宁市', 'fan': '115.9w', 'follow': '151', 'photo': '65'}
{'user_text': '谢谢你在世界的角落里找到我', 'gender': 'F', 'userImg': 'https://tx2.a.yximgs.com/uhead/AB/2020/08/17/09/BMjAyMDA4MTcwOTM2MDNfMjQ0NzAyMDZfMV9oZDM4Nl8xODU=_s.jpg'}
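The counts come back as display strings such as '115.9w', so they cannot be sorted or summed as they are. A small sketch of one way to convert them, assuming the `w` suffix stands for 万 (10,000); `parse_count` is a hypothetical helper, not part of the project code above.

```python
from decimal import Decimal, InvalidOperation


def parse_count(text):
    """Convert a display count such as "115.9w" or "151" to an int; return None if it cannot be parsed."""
    if text is None:
        return None
    text = str(text).strip().lower()
    multiplier = 1
    if text.endswith("w"):  # assumption: "w" means 10,000
        multiplier = 10_000
        text = text[:-1]
    try:
        # Decimal avoids binary floating-point rounding on values like 115.9
        return int(Decimal(text) * multiplier)
    except (InvalidOperation, ValueError, OverflowError):
        return None


print(parse_count("115.9w"), parse_count("151"), parse_count(None))
```

With a helper like this the model fields could be stored as integers instead of `CharField`, at the cost of losing the exact display string.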
state | Description |
---|---|
0 | First crawl; only username and userid |
1 | ksvideo |
2 | kslive |
3 | ksvideo+kslive |
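The state logic is currently imperfect. For example, running ksvideo on a row in state 3 sets it back to state 1, which says kslive still has to run even though the kslive fields are already filled in. When using it, keep in mind that there is no need to run actions on rows in state 3.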
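One way around the reset from 3 to 1 is to treat the state as bit flags, since the values in the table already line up (1 = ksvideo, 2 = kslive, 3 = both). The sketch below is a suggestion rather than what the project does; `CrawlState` and `mark_done` are hypothetical names.

```python
from enum import IntFlag


class CrawlState(IntFlag):
    INITIAL = 0  # first crawl, only username and userid
    KSVIDEO = 1  # ksvideo fields filled in
    KSLIVE = 2   # kslive fields filled in


def mark_done(state, step):
    """Return the new integer state after `step` has run; steps already done are kept."""
    return int(CrawlState(state) | step)


print(mark_done(3, CrawlState.KSVIDEO))
print(mark_done(2, CrawlState.KSVIDEO))
print(mark_done(0, CrawlState.KSLIVE))
```

Because OR-ing a flag that is already set changes nothing, running an action twice or on a state-3 row leaves the state at 3, and the integer stored in `stateUser` keeps the same meaning as in the table.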
# An action takes these two arguments. The second is a QuerySet holding the
# selected rows; loop over it and read attributes with dot access.
def myksVideo(self, request, queryset):
    cData = currentData()
    for qu in queryset:
        ksVideo = ksVideoSpider(qu.userID, cData.ksCookie)
        result = ksVideo.get_data()
        # ----- fill in the data ------------- #
        qu.user_text = result["user_text"]
        if result["gender"] == "F":
            qu.gender = 2
        elif result["gender"] == "M":
            qu.gender = 1
        else:
            qu.gender = 0
        qu.userImg = result["userImg"]
        # --------- done ---------- #
        if qu.stateUser == 2:
            qu.stateUser = 3
        else:
            qu.stateUser = 1
        # print(result)
    # print(request, type(queryset))
myksVideo.short_description = "Add ksVideo fields"

def myksLive(self, request, queryset):
    cData = currentData()
    for qu in queryset:
        ksLive = ksLiveSpider(qu.userID, cData.ksCookie)
        result = ksLive.get_data()
        # ----- fill in the data ------------- #
        qu.ksID = result["ksId"]  # the spider stores this under "ksId", not "ksID"
        qu.xinzuo = result["xinzuo"]
        qu.cityName = result["cityName"]
        qu.fan = result["fan"]
        qu.follow = result["follow"]
        qu.photo = result["photo"]
        # --------- done ---------- #
        if qu.stateUser == 1:
            qu.stateUser = 3
        else:
            qu.stateUser = 2
        # print(result)
    # print(request, type(queryset))
myksLive.short_description = "Add ksLive fields"
UserTitle.objects.filter(userID=qu.userID).update(user_text=qu.user_text,gender=qu.gender,userImg=qu.userImg,stateUser=qu.stateUser)
UserTitle.objects.filter(userID=qu.userID).update(ksID=qu.ksID,
xinzuo=qu.xinzuo,
cityName=qu.cityName,
fan=qu.fan,
follow=qu.follow,
photo=qu.photo,
stateUser=qu.stateUser)
if qu.ksID is None:
    qu.ksID = "ksID unavailable"
UserTitle.objects.filter(userID=qu.userID).update(ksID=qu.ksID,
                                                  xinzuo=qu.xinzuo,
                                                  cityName=qu.cityName,
                                                  fan=qu.fan,
                                                  follow=qu.follow,
                                                  photo=qu.photo,
                                                  stateUser=qu.stateUser)
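That's it for today. Tomorrow I'm going sightseeing with my roommates.
Tomorrow I'll also record a demo video of the features.
The backend features are triggered like this: a button calls a JS function, which POSTs to a backend endpoint that performs the corresponding action. I saw this approach in the book 《跟老齐学django》.
It's raining today, so no outing.
userdetailSpider(theUserID,theCookie)
worked fine at first, but then it started returning None.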
if m_json["data"]["visionProfilePhotoList"]["result"] == 1:
    print("Request succeeded; filtering data")
else:
    print("Request failed; cannot filter data, stopping")
    return -1
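The same check can be pulled into a helper that never raises on an unexpected response. This is a sketch only: `extract_feeds` is a hypothetical name, and the only facts taken from the code above are that `result == 1` marks success and that the items live under `feeds`.

```python
def extract_feeds(m_json, operation="visionProfilePhotoList"):
    """Return the feeds list if the response reports success, otherwise None."""
    body = (m_json or {}).get("data") or {}
    section = body.get(operation) or {}
    if section.get("result") != 1:
        return None  # failed request, expired cookie, or unexpected shape
    return section.get("feeds") or []


# Hypothetical responses
ok = {"data": {"visionProfilePhotoList": {"result": 1, "feeds": [{"photo": {"id": "v1"}}]}}}
failed = {"data": {"visionProfilePhotoList": {"result": 2, "feeds": None}}}

print(extract_feeds(ok))
print(extract_feeds(failed))
```

Returning `None` for a failed request and an empty list for a successful but empty page lets the caller tell "stop, something is wrong" apart from "no more items".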
{'caption': '锦上添花我不需要 雪中送炭你做不到', 'coversUrl': 'https://tx2.a.yximgs.com/upic/2018/03/24/19/BMjAxODAzMjQxOTIyNDVfMTA1MjM4MjNfNTYwMzAxNTQwNF8xXzM=_B73ece8a2ba15635894bd1d22c88ab2ab.jpg?tag=1-1615596637-xpcwebprofile-0-c2teex2jjk-fe86cc6122e6e5cc&clientCacheKey=3xp87jw5zmeue69.jpg&di=75960068&bp=14734', 'videoID': '3xp87jw5zmeue69', 'videoPath': 'https://txmov2.a.yximgs.com/upic/2018/03/24/19/BMjAxODAzMjQxOTIyNDVfMTA1MjM4MjNfNTYwMzAxNTQwNF8xXzM=_b_B4e460c2dedc40be078e7a315389327f8.mp4?tag=1-1615596637-xpcwebprofile-0-nc4gujqkgj-65ff48b3ed0c2822&clientCacheKey=3xp87jw5zmeue69_b.mp4&tt=b&di=75960068&bp=14734', 'likeCount': '30', 'realLikeCount': 30, 'animatedCoverUrl': None}
class UserVideo(models.Model):
    STATE = [
        (1, "default ksVideo"),
        (2, "ksLive"),
        (3, "ksVideo+ksLive")
    ]
    # When the referenced user is deleted, this row is deleted too
    theUser = models.ForeignKey(UserTitle, on_delete=models.CASCADE)
    videoID = models.CharField(max_length=128, default="xxxxxxxxxxxxxx", verbose_name="video id")
    caption = models.CharField(max_length=512, default="none yet", verbose_name="video caption")
    coversUrl = models.CharField(max_length=512, default="xxxxxxxxxxx", verbose_name="video cover")
    videoPath = models.CharField(max_length=512, default="xxxxxxxxxxxxx", verbose_name="video URL")
    realLikeCount = models.CharField(max_length=64, default="xxxxxxxxxxx", verbose_name="exact like count")
    animatedCoverUrl = models.CharField(max_length=512, default="xxxxxxxx", verbose_name="animated cover")
    stateVideo = models.IntegerField(choices=STATE, default=1, verbose_name="state")
    displayView = models.CharField(max_length=64, default="-1", verbose_name="view count")
    displayComment = models.CharField(max_length=64, default="-1", verbose_name="comment count")
for result in results:
    # uservideo_set is Django's default reverse manager for the UserVideo.theUser
    # foreign key; the lookup field on UserTitle is userID.
    UserTitle.objects.get(userID=qu.userID).uservideo_set.create(videoID=result["videoID"],
                                                                caption=result["caption"],
                                                                coversUrl=result["coversUrl"],
                                                                videoPath=result["videoPath"],
                                                                realLikeCount=result["realLikeCount"],
                                                                animatedCoverUrl=result["animatedCoverUrl"],
                                                                )
The animatedCoverUrl field raised an error; adding a check fixes it:
if result["animatedCoverUrl"] is None:
    result["animatedCoverUrl"] = "not available"
# Requires "import time" at the top of admin.py
def myvideoMP4(self, request, queryset):
    cData = currentData()
    for qu in queryset:
        thevideo = userdetailSpider(qu.userID, cData.theCookie)
        thevideo.start_spider()
        results = thevideo.endResult
        ttUser = UserTitle.objects.get(userID=qu.userID)
        for result in results:
            if result["animatedCoverUrl"] is None:
                result["animatedCoverUrl"] = "not available"
            print(result["videoID"])
            time.sleep(1)
            # create() already saves the row, so no separate save() is needed
            UserVideo.objects.create(videoID=result["videoID"],
                                     caption=result["caption"],
                                     coversUrl=result["coversUrl"],
                                     videoPath=result["videoPath"],
                                     realLikeCount=result["realLikeCount"],
                                     animatedCoverUrl=result["animatedCoverUrl"],
                                     theUser=ttUser)
        ttUser.stateUser = 4
        ttUser.save()
The rows produced by the object are all duplicates; the bug is in the crawler.
# ------------- extract the fields ---------- #
# (while writing this I remembered that I should be getting the video info through live)
result = {}  # the extracted fields are stored in a dict (one dict shared by every iteration)
for feeds in feeds_list:
    result["caption"] = feeds["photo"]["caption"]
    result["coversUrl"] = feeds["photo"]["coverUrl"]
    result["videoID"] = feeds["photo"]["id"]
    result["videoPath"] = feeds["photo"]["photoUrl"]
    result["likeCount"] = feeds["photo"]["likeCount"]
    result["realLikeCount"] = feeds["photo"]["realLikeCount"]
    result["animatedCoverUrl"] = feeds["photo"]["animatedCoverUrl"]
    self.endResult.append(result)
    print(result)
    # ----------- the function that stores to the database will go here -----------
# ------------- extract the fields ---------- #
# (while writing this I remembered that I should be getting the video info through live)
# result = {}  # the extracted fields are stored in a dict
for feeds in feeds_list:
    result = {}  # a new dict for every item
    result["caption"] = feeds["photo"]["caption"]
    result["coversUrl"] = feeds["photo"]["coverUrl"]
    result["videoID"] = feeds["photo"]["id"]
    result["videoPath"] = feeds["photo"]["photoUrl"]
    result["likeCount"] = feeds["photo"]["likeCount"]
    result["realLikeCount"] = feeds["photo"]["realLikeCount"]
    result["animatedCoverUrl"] = feeds["photo"]["animatedCoverUrl"]
    self.endResult.append(result)
    print(result)
    del result
    # ----------- the function that stores to the database will go here -----------
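Why the first version produced duplicates: appending a dict to a list stores a reference, not a copy, so when one dict is reused for every item, each list entry points at the same object and shows whatever was written last. A self-contained illustration with hypothetical function names and toy ids:

```python
def collect_shared(ids):
    """Buggy pattern: one dict is reused, so every list entry is the same object."""
    rows, row = [], {}
    for video_id in ids:
        row["videoID"] = video_id
        rows.append(row)
    return rows


def collect_fresh(ids):
    """Fixed pattern: a new dict is created for each item."""
    rows = []
    for video_id in ids:
        row = {"videoID": video_id}
        rows.append(row)
    return rows


shared = collect_shared(["a", "b", "c"])
fresh = collect_fresh(["a", "b", "c"])
print(shared)
print(fresh)
```

Printing inside the loop hides the problem, because each print happens before the next overwrite; it only shows up when the list is read afterwards, as the admin action does.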
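I should have moved on to the backend statistics and to fetching userids and names from the trending page, but I wasn't ready to let this go: I want to finish the album-fetching object so that I can check whether all of a user's works were actually retrieved.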
{'caption': '“留下来 或者我跟你走”', 'displayView': '903.6w', 'displayLike': '48.1w', 'displayComment': '1.3w', 'imgUrls': [], 'liveID': '3xb7betx499z9um'}
{'caption': '不知不觉又长大了一岁,生日快乐🎂', 'displayView': '31w', 'displayLike': '1.1w', 'displayComment': '1949', 'imgUrls': ['http://tx2.a.yximgs.com/ufile/atlas/MTI3MjAyMjYyXzc1MTAzNjU0ODA=_0.webp', 'http://tx2.a.yximgs.com/ufile/atlas/MTI3MjAyMjYyXzc1MTAzNjU0ODA=_1.webp', 'http://tx2.a.yximgs.com/ufile/atlas/MTI3MjAyMjYyXzc1MTAzNjU0ODA=_2.webp', 'http://tx2.a.yximgs.com/ufile/atlas/MTI3MjAyMjYyXzc1MTAzNjU0ODA=_3.webp', 'http://tx2.a.yximgs.com/ufile/atlas/MTI3MjAyMjYyXzc1MTAzNjU0ODA=_4.webp', 'http://tx2.a.yximgs.com/ufile/atlas/MTI3MjAyMjYyXzc1MTAzNjU0ODA=_5.webp', 'http://tx2.a.yximgs.com/ufile/atlas/MTI3MjAyMjYyXzc1MTAzNjU0ODA=_6.webp', 'http://tx2.a.yximgs.com/ufile/atlas/MTI3MjAyMjYyXzc1MTAzNjU0ODA=_7.webp'], 'liveID': '3xmm5g93pqgd8tc'}
I forgot that the play count is an important piece of data, so I have to iterate over everything again and add the field.
django.db.utils.OperationalError: table "app01_userphoto" already exists
def get_available_image_extensions():
    try:
        from PIL import Image
    except ImportError:
        return []
    else:
        Image.init()
        return [ext.lower()[1:] for ext in Image.EXTENSION]


def validate_image_file_extension(value):
    return FileExtensionValidator(allowed_extensions=get_available_image_extensions())(value)
class UserPhoto(models.Model):
    photoID = models.CharField(max_length=128, verbose_name="相册id", default="xxxxxxxx")  # album id
    caption = models.CharField(max_length=512, verbose_name="相册描述", default="暂无")  # album description
    displayView = models.CharField(max_length=32, verbose_name="播放量", default="-1")  # play count
    displayLike = models.CharField(max_length=32, verbose_name="点赞数", default="-1")  # like count
    displayComment = models.CharField(max_length=32, verbose_name="评论数", default="-1")  # comment count
    imgUrls = models.CharField(max_length=5000, default=" ")

    def __str__(self):
        # print(self.videoID)
        return self.photoID

    class Meta:  # was misspelled "Mate", which Django silently ignores
        verbose_name = verbose_name_plural = "相册信息"
django.db.utils.OperationalError: no such column: app01_userphoto.theUser_id
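I almost really deleted or migrated the database. The fix is to add the missing field, but oddly I didn't know how to add it. I then commented out the key field (the ForeignKey) and the page loaded. It turned out the table already held one row, and I guess that row having no key is what caused the error.
After restoring the field I still got the same error.
Solved. The field used to be named theUser, the same as the one above; renaming it fixed the problem.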
thephotoUser = models.ForeignKey(UserTitle,on_delete=models.CASCADE)
http://tx2.a.yximgs.com/ufile/atlas/9e53e8ba157f445c88009a4dce85fe16_0.webphttp://tx2.a.yximgs.com/ufile/atlas/9e53e8ba157f445c88009a4dce85fe16_1.webphttp://tx2.a.yximgs.com/ufile/atlas/9e53e8ba157f445c88009a4dce85fe16_2.webphttp://tx2.a.yximgs.com/ufile/atlas/9e53e8ba157f445c88009a4dce85fe16_3.webphttp://tx2.a.yximgs.com/ufile/atlas/9e53e8ba157f445c88009a4dce85fe16_4.webp
photoUrl = ','.join(result["imgUrls"])
http://tx2.a.yximgs.com/ufile/atlas/MTI3MjAyMjYyXzE2NjAyMDc0NjI3XzE1NjYyODAxNTQ5MTc=_0.webp,http://tx2.a.yximgs.com/ufile/atlas/MTI3MjAyMjYyXzE2NjAyMDc0NjI3XzE1NjYyODAxNTQ5MTc=_1.webp,http://tx2.a.yximgs.com/ufile/atlas/MTI3MjAyMjYyXzE2NjAyMDc0NjI3XzE1NjYyODAxNTQ5MTc=_2.webp,http://tx2.a.yximgs.com/ufile/atlas/MTI3MjAyMjYyXzE2NjAyMDc0NjI3XzE1NjYyODAxNTQ5MTc=_3.webp,http://tx2.a.yximgs.com/ufile/atlas/MTI3MjAyMjYyXzE2NjAyMDc0NjI3XzE1NjYyODAxNTQ5MTc=_4.webp,http://tx2.a.yximgs.com/ufile/atlas/MTI3MjAyMjYyXzE2NjAyMDc0NjI3XzE1NjYyODAxNTQ5MTc=_5.webp,http://tx2.a.yximgs.com/ufile/atlas/MTI3MjAyMjYyXzE2NjAyMDc0NjI3XzE1NjYyODAxNTQ5MTc=_6.webp,http://tx2.a.yximgs.com/ufile/atlas/MTI3MjAyMjYyXzE2NjAyMDc0NjI3XzE1NjYyODAxNTQ5MTc=_7.webp,http://tx2.a.yximgs.com/ufile/atlas/MTI3MjAyMjYyXzE2NjAyMDc0NjI3XzE1NjYyODAxNTQ5MTc=_8.webp,http://tx2.a.yximgs.com/ufile/atlas/MTI3MjAyMjYyXzE2NjAyMDc0NjI3XzE1NjYyODAxNTQ5MTc=_9.webp,http://tx2.a.yximgs.com/ufile/atlas/MTI3MjAyMjYyXzE2NjAyMDc0NjI3XzE1NjYyODAxNTQ5MTc=_10.webp,http://tx2.a.yximgs.com/ufile/atlas/MTI3MjAyMjYyXzE2NjAyMDc0NjI3XzE1NjYyODAxNTQ5MTc=_11.webp,http://tx2.a.yximgs.com/ufile/atlas/MTI3MjAyMjYyXzE2NjAyMDc0NjI3XzE1NjYyODAxNTQ5MTc=_12.webp,http://tx2.a.yximgs.com/ufile/atlas/MTI3MjAyMjYyXzE2NjAyMDc0NjI3XzE1NjYyODAxNTQ5MTc=_13.webp,http://tx2.a.yximgs.com/ufile/atlas/MTI3MjAyMjYyXzE2NjAyMDc0NjI3XzE1NjYyODAxNTQ5MTc=_14.webp,http://tx2.a.yximgs.com/ufile/atlas/MTI3MjAyMjYyXzE2NjAyMDc0NjI3XzE1NjYyODAxNTQ5MTc=_15.webp,http://tx2.a.yximgs.com/ufile/atlas/MTI3MjAyMjYyXzE2NjAyMDc0NjI3XzE1NjYyODAxNTQ5MTc=_16.webp,http://tx2.a.yximgs.com/ufile/atlas/MTI3MjAyMjYyXzE2NjAyMDc0NjI3XzE1NjYyODAxNTQ5MTc=_17.webp,http://tx2.a.yximgs.com/ufile/atlas/MTI3MjAyMjYyXzE2NjAyMDc0NjI3XzE1NjYyODAxNTQ5MTc=_18.webp
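Going the other way, the comma-joined string can be split back into a list when the album is displayed. This is a minimal sketch with made-up helper names (pack_urls, unpack_urls) and shortened example URLs; it assumes the URLs themselves never contain a comma, and treats the model's blank default (" ") as an empty list.

```python
def pack_urls(urls):
    """Join a list of image URLs into one string for the imgUrls CharField."""
    return ",".join(urls)


def unpack_urls(packed):
    """Split the stored string back into a list; blank values give an empty list."""
    packed = packed.strip()
    return packed.split(",") if packed else []


# Hypothetical, shortened URLs for illustration only.
urls = ["http://example.com/atlas/x_0.webp", "http://example.com/atlas/x_1.webp"]
stored = pack_urls(urls)
print(stored)
print(unpack_urls(stored))
```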
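At first Django would not load my static files, so the usual import did not work. Note that runserver serves static files only when DEBUG = True (with DEBUG = False it does not), and a project-level static directory has to be listed in STATICFILES_DIRS as below. An external link is another option.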
STATICFILES_DIRS = [os.path.join(BASE_DIR,"static"),
]
STATIC_URL = '/static/'
{% load static %}
{% block title %}管理页面{% endblock title %}
{% block style %}<link rel="stylesheet" type="text/css" href="{% static 'css/showAdmin.css' %}">{% endblock %}
allCover = UserVideo.objects.values("coversUrl")
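Reference blog: https://blog.csdn.net/weixin_33893473/article/details/86278284
I ran into the problem that some video covers have different sizes.
Looking closer, videos on the live page have some blank space on the left and right, while the video page fills the frame exactly. I'll follow the style of the live page.
Getting an image's original dimensions: https://blog.csdn.net/x550392236/article/details/78723297
On second thought, the videos that the live page doesn't show must be the ones that don't display well at that size.
So I'll do the streamer's home page first.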
def showAdmin(request):
    allUser = UserTitle.objects.values("userName", "userID")
    for uID in allUser:
        theUsr = UserTitle.objects.get(userID=uID["userID"])
        myVideo = UserVideo.objects.filter(theUser=theUsr.id)
        print(uID)
        print(len(myVideo))
    return render(request, 'pages/showAdmin.html', {"result": allUser})
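Later this logic should be triggered through an API endpoint. Otherwise it runs every time the view is opened, which wastes too much time.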
http://tx2.a.yximgs.com/ufile/atlas/NTE5MzQ5NDg0MTc0OTc3NzcxOV8xNjE1NjI4ODcwNjA0_0.webp,http://tx2.a.yximgs.com/ufile/atlas/NTE5MzQ5NDg0MTc0OTc3NzcxOV8xNjE1NjI4ODcwNjA0_1.webp,http://tx2.a.yximgs.com/ufile/atlas/NTE5MzQ5NDg0MTc0OTc3NzcxOV8xNjE1NjI4ODcwNjA0_2.webp,http://tx2.a.yximgs.com/ufile/atlas/NTE5MzQ5NDg0MTc0OTc3NzcxOV8xNjE1NjI4ODcwNjA0_3.webp,http://tx2.a.yximgs.com/ufile/atlas/NTE5MzQ5NDg0MTc0OTc3NzcxOV8xNjE1NjI4ODcwNjA0_4.webp,http://tx2.a.yximgs.com/ufile/atlas/NTE5MzQ5NDg0MTc0OTc3NzcxOV8xNjE1NjI4ODcwNjA0_5.webp,http://tx2.a.yximgs.com/ufile/atlas/NTE5MzQ5NDg0MTc0OTc3NzcxOV8xNjE1NjI4ODcwNjA0_6.webp,http://tx2.a.yximgs.com/ufile/atlas/NTE5MzQ5NDg0MTc0OTc3NzcxOV8xNjE1NjI4ODcwNjA0_7.webp
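If the crawler script fails it must abort, and nothing should be written to the database. I need a status flag so that the same data is never stored twice. The script should still take a delay parameter that can be adjusted.
I dug a little further: getting result=0 may not be caused by crawling too fast but by something else. When result=0, I refresh the page in my own browser a few times until the works appear, then run the code again and get result=1.
If a crawl is incomplete, either discard it or crawl several times and let the runs fill in each other's gaps. When the site's data format changes later, the crawler has to be updated as well.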
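Letting several incomplete crawls fill in each other's gaps could look roughly like the sketch below. The function name merge_crawls is made up, and it assumes every item carries a unique videoID as in the dicts built earlier; the first copy of each ID wins, so duplicates are dropped.

```python
def merge_crawls(*runs, key="videoID"):
    """Merge several crawl results, keeping the first item seen for each ID."""
    merged = {}
    for run in runs:
        for item in run:
            # setdefault leaves an existing entry untouched, so repeats are ignored
            merged.setdefault(item[key], item)
    return list(merged.values())


# Two hypothetical partial runs that overlap on "b2".
run_one = [{"videoID": "a1"}, {"videoID": "b2"}]
run_two = [{"videoID": "b2"}, {"videoID": "c3"}]
print(merge_crawls(run_one, run_two))
```

The merged list can then be compared with the work count shown on the user's page to decide whether another run is needed.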
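Now I need to decide on the scope of the work. The simple option is to crawl the data and then write a script that downloads what I need to local disk. The more complex option is a sustainable crawler that keeps working when the official site is updated, without my changing the code. If this is going to be a portfolio project for job hunting, it also needs a user registration and login system with likes and follows. More complex still are comments, video uploads and so on, which bring in other concerns such as DDoS attacks and server memory. The performance of the code logic alone is already hard enough.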
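With the delay set to 5 seconds, a crawl run works normally. 3 seconds also works. I then made sure that incomplete crawl data is not stored in the database.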
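The adjustable delay and the all-or-nothing rule can be sketched as follows. Everything here is an assumption made for illustration: fetch_page stands in for the real request function, the response is assumed to be a dict with a "result" flag (1 for success, as in the notes above) and a "feeds" list, and the sleep function is injectable so the sketch can be tried without waiting.

```python
import time


def crawl_all_pages(fetch_page, page_count, delay=3.0, sleep=time.sleep):
    """Fetch every page with a pause in between; return None if any page fails."""
    collected = []
    for page in range(page_count):
        response = fetch_page(page)
        if response.get("result") != 1:
            # Incomplete crawl: give up so the caller stores nothing.
            return None
        collected.extend(response.get("feeds", []))
        if page < page_count - 1:
            sleep(delay)  # adjustable pause between requests
    return collected


# Hypothetical canned responses instead of real network calls.
good_pages = {
    0: {"result": 1, "feeds": [{"videoID": "a1"}]},
    1: {"result": 1, "feeds": [{"videoID": "b2"}]},
}
waits = []
print(crawl_all_pages(good_pages.get, 2, delay=3, sleep=waits.append))
print(waits)
```

The caller writes to the database only when the return value is not None, so a failed run leaves the tables untouched.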
userdetailLiveSpider()
This class fetches the video info from the live page. In testing so far it works without the delay.
$("#openDialog1").dialog({
    id: "superDialog", // required; must differ from any existing id
    title: "我的标题", // dialog title; default: "我的标题"
    type: 0, // 0: dialog has confirm and cancel buttons; 1: dialog has only a close button
    easyClose: true, // clicking the overlay also closes the dialog; default: false
    form: [{
        description: "用户名",
        type: "text",
        name: "username",
        value: "tom"
    }, {
        description: "密码",
        type: "text",
        name: "password",
        value: "123456"
    }, {
        description: "姓名",
        type: "text",
        name: "name",
        value: "tom"
    }, {
        description: "年龄",
        type: "text",
        name: "age",
        value: "18"
    }], // form holds the data used to fill the form; required
    submit: function (data) {
        // data is what the form collected
        console.log(data);
        $.ajax({
            url: 'http://127.0.0.1:8000/api/getUserRandom/3/',
            type: 'POST',
            {#dataType: 'json',#}
            data: data,
            beforeSend: function (xhr, setting) {
                xhr.setRequestHeader("X-CSRFToken", "{{ csrf_token }}")
            },
            success: function (msg) {
                console.log(msg)
            },
        })
        // this part can be removed
        if (true) {
            alert("提交成功\n(你自己可以去掉这个alert)");
            // clear the form data; the argument is the id specified above
            clearAllData("superDialog");
        }
    }
})
xhr.setRequestHeader("X-CSRFToken", "{{ csrf_token }}")},
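This step is important: without the X-CSRFToken header, Django's CSRF protection rejects the POST request with a 403.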
def getUserRandom(request, counts):
    if request.method == 'POST':
        print("-----------")
        cDate = currentData()
        theGetScript = getUserIDRandom(cDate.theCookie)
        theGetScript.get_data()
        results = theGetScript.endResult
        message = json.dumps(results)
        # return render(request,results)
        myRes = json.dumps({"result": 2})
        return HttpResponse(myRes)
    else:
        myRes = json.dumps({"result": 1})
        return HttpResponse(myRes)
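Defining result as a global variable outside the function still didn't work. The contents of result are not cleared between calls, whereas the object's endResult is, so I simply use endResult as the return value.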
from app01 import KSCOOKIE
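OK, it turns out I had copied KSCOOKIE into THECOOKIE.
If video + photo does not add up to the expected number of works, I need to fetch the videos from the live page as well, compare, and store the missing ones. This is a problem I may run into, so I'm noting it here for now.
My guess is that passToken and sts? each update the cookie in turn before the data is requested. Of the requests that follow, the first graphql is the logged-in user's profile and the second is the first page of the video list. After that, v? updates the cookie once more, and with that cookie the video list can be fetched continuously.
Now I know how the cookie is obtained: it is not built by hand but set from the Set-Cookie header in the responses. My earlier health check-in project failed precisely because it did not reuse the cookie produced by the previous response. You cannot build it yourself.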
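Picking the new values out of Set-Cookie headers can be sketched with the standard library. The helper name cookies_from_set_cookie and the header values are made up for illustration; in practice a requests.Session stores cookies from Set-Cookie automatically, so this only shows what happens underneath.

```python
from http.cookies import SimpleCookie


def cookies_from_set_cookie(header_values):
    """Collect name/value pairs from a list of Set-Cookie header strings."""
    jar = {}
    for value in header_values:
        parsed = SimpleCookie()
        parsed.load(value)  # attributes such as Path and HttpOnly are parsed but not kept
        for name, morsel in parsed.items():
            jar[name] = morsel.value
    return jar


# Hypothetical header values; real ones come from the server's responses.
headers = [
    "kuaishou.server.web_st=abc123; Path=/; HttpOnly",
    "kuaishou.server.web_ph=def456; Path=/",
]
print(cookies_from_set_cookie(headers))
```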
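OK. Simulating the passToken request gives the value of "kuaishou.server.web.at" in the response, which serves as the params of the following sts request. sts then successfully returns the three cookie fields that need updating, and concatenating them with the fixed cookie values produces a valid cookie that can crawl all the video info of one streamer.
OK, next time I just need to concatenate the cookie and wrap that in a class.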
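Wrapping the concatenation in a class might look roughly like this. The class name CookieBuilder and its methods are invented for this sketch, and the token value is a placeholder; the fixed field names kpf and kpn are taken from the cookie shown earlier.

```python
class CookieBuilder:
    """Keep the fixed cookie fields and overlay the refreshed ones."""

    def __init__(self, fixed):
        self.fields = dict(fixed)  # copy so the caller's dict is not modified

    def update(self, refreshed):
        """Replace or add the fields returned by the latest response."""
        self.fields.update(refreshed)

    def header(self):
        """Build the value for the Cookie request header."""
        return "; ".join(f"{name}={value}" for name, value in self.fields.items())


builder = CookieBuilder({"kpf": "PC_WEB", "kpn": "KUAISHOU_VISION"})
builder.update({"kuaishou.server.web_st": "PLACEHOLDER_ST"})
print(builder.header())
```

Keeping the fields in a dict means a refreshed value replaces the old one instead of being appended a second time.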
"server.web_st="+st+";"