Scrapy + Redis spider crawls no data (bounty available) — would appreciate any help, many thanks
The distributed spider only ever logs Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min).
The original GitHub repo is https://github.com/CUHKSZ-TQL/ ... lysis
After setting up the environment and modifying the code, my version is here:
Link: https://pan.baidu.com/s/1jHbz7ak8VqO-MMHeGj9_UA
Extraction code: iecl
Running the third script gives:
= RESTART: C:\Users\ap645\Desktop\WeiboSpider_SentimentAnalysis-master\WeiboSpider\sina\spiders\weibo_spider.py
2020-04-16 11:04:10 [scrapy.utils.log] INFO: Scrapy 2.0.1 started (bot: sina)
2020-04-16 11:04:10 [scrapy.utils.log] INFO: Versions: lxml 4.5.0.0, libxml2 2.9.5, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twisted 20.3.0, Python 3.8.1 (tags/v3.8.1:1b293b6, Dec 18 2019, 23:11:46) [MSC v.1916 64 bit (AMD64)], pyOpenSSL 19.1.0 (OpenSSL 1.1.1f 31 Mar 2020), cryptography 2.9, Platform Windows-10-10.0.18362-SP0
2020-04-16 11:04:10 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.selectreactor.SelectReactor
2020-04-16 11:04:10 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'sina',
'DOWNLOAD_DELAY': 2,
'DUPEFILTER_CLASS': 'scrapy_redis_bloomfilter.dupefilter.RFPDupeFilter',
'NEWSPIDER_MODULE': 'sina.spiders',
'SCHEDULER': 'scrapy_redis_bloomfilter.scheduler.Scheduler',
'SPIDER_MODULES': ['sina.spiders']}
2020-04-16 11:04:10 [scrapy.extensions.telnet] INFO: Telnet Password: 3c9f648b6ca7a947
2020-04-16 11:04:10 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.logstats.LogStats']
2020-04-16 11:04:10 [weibo_spider] INFO: Reading start URLs from redis key 'weibo_spider:start_urls' (batch size: 16, encoding: utf-8)
2020-04-16 11:04:12 [scrapy.middleware] INFO: Enabled downloader middlewares:
['sina.middlewares.RedirectMiddleware',
'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'sina.middlewares.CookieMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2020-04-16 11:04:12 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2020-04-16 11:04:12 [scrapy.middleware] INFO: Enabled item pipelines:
['sina.pipelines.MongoDBPipeline']
2020-04-16 11:04:12 [scrapy.core.engine] INFO: Spider opened
2020-04-16 11:04:12 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-04-16 11:04:12 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2020-04-16 11:05:12 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
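A common cause of a scrapy-redis spider idling at 0 pages/min is that the Redis list it reads its seeds from is empty: the log above shows the spider waiting on the key `weibo_spider:start_urls`. The helper below is my own illustration of the `<spider name>:start_urls` naming convention visible in the log, not code from the repo:

```python
# scrapy-redis convention (as seen in the log line above): the spider
# polls the Redis list "<spider name>:start_urls" for seed URLs.
# This helper only rebuilds that key name for checking/seeding purposes.

def start_urls_key(spider_name: str) -> str:
    """Return the Redis list key a scrapy-redis spider polls for seed URLs."""
    return f"{spider_name}:start_urls"

print(start_urls_key("weibo_spider"))  # same key as in the log above
```

With the key name confirmed, seed it before (or while) the spider runs, e.g. from redis-cli: `LPUSH weibo_spider:start_urls <your start url>` (the exact seed URL depends on the repo's spider). Until that list has at least one entry, the spider just polls Redis and keeps logging 0 pages/min, exactly as above.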
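Independent of seeding, it is worth confirming that the spider and your redis-cli point at the same Redis instance. A minimal sketch of the relevant settings.py fragment, assuming the setting names used by scrapy-redis / scrapy-redis-bloomfilter; the host, port, db number, and sizing values below are placeholders, not values from the repo:

```python
# settings.py (fragment) -- connection used by the
# scrapy_redis_bloomfilter scheduler/dupefilter shown in the log.
# Placeholder address: point it at the same Redis you LPUSH seeds into.
REDIS_URL = 'redis://127.0.0.1:6379/0'

# Bloom filter sizing (names from scrapy-redis-bloomfilter; values
# here are illustrative, not tuned for this project).
BLOOMFILTER_HASH_NUMBER = 6
BLOOMFILTER_BIT = 30
```

If the spider's `REDIS_URL` points at a different host or db number than the one the start URLs were pushed into, the `start_urls` list it polls is always empty and the log sits at 0 pages/min as shown.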
1 reply
李魔佛 - WeChat official account: 可转债量化分析 [To register for the forum, leave your email as a message in the account's backend]
You could try swapping Sina Weibo for another site while keeping the spider's structure unchanged, and see whether it works then.