Crawler Architecture
2017-08-28 13:46:13
A complete architecture diagram of a web crawler, covering each of the modules a crawler needs.
Outline / Content
Page Repo
App Crawler
Crawl Logic
DPark
HBase
URL Repo
Simulated User Action Crawler
Logger
HDFS
URL Dispatcher
Session Crawler
URL Extraction Rules
...
Robots File Handler
JS Engine
Captcha Handler
Mobile Page Crawler
Content Parser
IP-Proxy Manager
Administrator
Distributed Storage
Field Extraction Rules
User Interface
Field Repo
Captcha Crawler
MESOS
Monitor
Content Acceptor
Normal Crawler
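The diagram only names its modules. To give a rough idea of how a few of them might cooperate (URL Repo, URL Dispatcher, Robots File Handler, Content Parser with URL and Field Extraction Rules, Field Repo, and a normal crawl loop), here is a minimal single-process Python sketch. Everything in it is an assumption, not the diagram's actual design: the in-memory `FAKE_WEB` dictionary stands in for real HTTP fetching, the extraction rules are plain regular expressions, and the names `URLRepo`, `URLDispatcher`, `parse_content` and `crawl` are hypothetical. The only real library component used is Python's standard `urllib.robotparser`.

```python
import re
from collections import deque
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# Hypothetical offline "web" (URL -> HTML) so the sketch runs without network
# access; a real crawler would issue HTTP requests at this point.
FAKE_WEB = {
    "http://example.com/": '<title>Home</title><a href="/a">A</a><a href="/private/x">X</a>',
    "http://example.com/a": '<title>Page A</title><a href="/">Home</a>',
    "http://example.com/private/x": "<title>Secret</title>",
}
ROBOTS_TXT = ["User-agent: *", "Disallow: /private/"]

# Assumed URL / field extraction rules, expressed as simple regexes.
URL_RULE = re.compile(r'href="([^"]+)"')
TITLE_RULE = re.compile(r"<title>(.*?)</title>")


class URLRepo:
    """Remembers every URL seen so far, so each URL is dispatched at most once."""

    def __init__(self):
        self.seen = set()

    def add(self, url):
        if url in self.seen:
            return False
        self.seen.add(url)
        return True


class URLDispatcher:
    """FIFO frontier that only queues new URLs allowed by robots.txt."""

    def __init__(self, repo, robots):
        self.repo = repo
        self.robots = robots
        self.queue = deque()

    def submit(self, url):
        if self.repo.add(url) and self.robots.can_fetch("*", url):
            self.queue.append(url)

    def next(self):
        return self.queue.popleft() if self.queue else None


def parse_content(base_url, html):
    """Content parser: apply the extraction rules to one page."""
    links = [urljoin(base_url, href) for href in URL_RULE.findall(html)]
    match = TITLE_RULE.search(html)
    fields = {"title": match.group(1)} if match else {}
    return links, fields


def crawl(seed):
    robots = RobotFileParser()
    robots.parse(ROBOTS_TXT)  # robots file handler, fed from a local list
    dispatcher = URLDispatcher(URLRepo(), robots)
    dispatcher.submit(seed)
    field_repo = {}  # field repo: URL -> extracted fields
    while (url := dispatcher.next()) is not None:
        html = FAKE_WEB.get(url)
        if html is None:  # page missing from the fake web
            continue
        links, fields = parse_content(url, html)
        field_repo[url] = fields
        for link in links:
            dispatcher.submit(link)
    return field_repo
```

Calling `crawl("http://example.com/")` visits the home page and `/a`, while `/private/x` is filtered out by the robots.txt rule and never fetched. The diagram's distributed components (MESOS, DPark, HBase, HDFS) and the Logger, Monitor, proxy and captcha modules are deliberately not modeled here.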