Hello everyone, welcome to Crossin’s programming classroom!
Bilibili, commonly known as "B station", should be familiar to everyone. What kind of videos do you like to watch there? Anime, gaming, comedy, or the knowledge section? And, of course, there are the young women ("小姐姐") of the dance section.
Today's case study: how to scrape the cover images of Bilibili "小姐姐" videos with Python.
Note: this article discusses programming techniques only. The code and data must not be used for commercial purposes; otherwise, you bear the consequences yourself.
Open Bilibili and search for "小姐姐":
There are 5 pages of results in total. Taking page 2 as an example, press F12 to open the browser's developer tools:
For how to use the F12 developer tools, see our previous article:
Search for the title of the first video and you will find the corresponding XHR request. On closer analysis, all the data is returned as JSON, and what we want is in the result list.
Check the Headers tab, which shows the following:
This is a GET request, and the two parameters that matter are the page number and the search keyword.
Look at a few more pages to find the pattern:
You can see that, apart from page 1, the URLs of the other pages differ only in the page parameter. So we try requesting page 1 with the same URL format as the other pages (i.e. adding the page parameter), and it returns the results we want (try it yourself).
Conclusion: the URLs of all pages differ only in the page parameter; everything else is the same.
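To make the conclusion concrete, here is a minimal sketch that generates the URLs for all five pages by changing only the page parameter. The endpoint SEARCH_API and the search_type parameter are assumptions, since the article does not show the full URL; copy the exact request URL from your own Headers tab.

```python
from urllib.parse import urlencode

# Assumption: the search endpoint seen in the XHR request. The article does not
# give it, so replace it with the URL shown in your own Headers tab.
SEARCH_API = "https://api.bilibili.com/x/web-interface/search/type"


def build_page_url(keyword: str, page: int) -> str:
    """Build the search URL for one result page; only `page` varies between pages."""
    params = {
        "search_type": "video",  # assumed parameter: restrict results to videos
        "keyword": keyword,      # urlencode percent-encodes the Chinese keyword
        "page": page,
    }
    return f"{SEARCH_API}?{urlencode(params)}"


# Pages 1-5; page 1 is requested with an explicit page=1 like the others
urls = [build_page_url("小姐姐", page) for page in range(1, 6)]
for url in urls:
    print(url)
```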
2. Data crawling
2.1 Import modules
2.2 Fetch the page data
Request the data using the URL pattern worked out above:
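The article's own request code is not shown here, so the following is a minimal sketch of this step using only the standard library (urllib) rather than a third-party HTTP client. The SEARCH_API endpoint, the search_type parameter, and the Referer value are assumptions. In practice Bilibili may also require cookies copied from your browser and may reject requests that do not look like they come from one.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumption: same endpoint as in the URL analysis; not given in the article.
SEARCH_API = "https://api.bilibili.com/x/web-interface/search/type"

# Browser-like headers; many sites reject Python's default User-Agent.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    ),
    "Referer": "https://search.bilibili.com/",  # assumed; copy yours from Headers
}


def build_request(keyword: str, page: int) -> urllib.request.Request:
    """Prepare the GET request for one page of search results."""
    query = urlencode({"search_type": "video", "keyword": keyword, "page": page})
    return urllib.request.Request(f"{SEARCH_API}?{query}", headers=HEADERS)


def fetch_page(keyword: str, page: int, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response body into a dict."""
    with urllib.request.urlopen(build_request(keyword, page), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (needs network access): data = fetch_page("小姐姐", 2)
```

Splitting request construction from sending keeps the URL and headers easy to inspect before anything goes over the network.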
2.3 Extract the image information
The response contains each video's title, duration, play count, publish time, author, cover image link, and more. Here we only take the title and the image link; add or remove fields as needed.
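A sketch of the extraction step, assuming the JSON layout described above: the video list sits under data, then result, and each entry carries title and pic fields. These key names are assumptions; verify them against the JSON in your own developer tools. The keyword in a title may arrive wrapped in highlight tags and HTML entities, so both are cleaned up here.

```python
import html
import re
from typing import List, Tuple

# Matches HTML tags such as the keyword highlight <em class="keyword">...</em>
TAG_RE = re.compile(r"<[^>]+>")


def parse_results(data: dict) -> List[Tuple[str, str]]:
    """Return (title, cover URL) pairs from one page of search JSON."""
    items = []
    # Assumed layout: {"data": {"result": [{"title": ..., "pic": ...}, ...]}}
    for video in (data.get("data") or {}).get("result") or []:
        # Strip tags first, then decode entities like &amp;
        title = html.unescape(TAG_RE.sub("", video.get("title", "")))
        pic = video.get("pic", "")
        if pic.startswith("//"):
            pic = "https:" + pic  # protocol-relative link -> absolute URL
        items.append((title, pic))
    return items
```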
2.4 Save the images
Here we use the title as the image file name. Note that file names cannot contain certain special characters; here we replace four of them: "/", ",", "." and "|". The videos shown change every day, so other such characters may appear and you will need to adjust for them yourself; alternatively, you can name the files by something other than the title.