scrapy源码分析<一>:入口函数以及是如何运行

李魔佛 发表了文章 • 0 个评论 • 91 次浏览 • 2019-08-31 10:47 • 来自相关话题

运行scrapy crawl example 命令的时候,就会执行我们写的爬虫程序。
下面我们从源码分析一下scrapy执行的流程:
 

执行scrapy crawl 命令时,调用的是Command类class Command(ScrapyCommand):

requires_project = True

def syntax(self):
return '[options]'

def short_desc(self):
return 'Runs all of the spiders - My Defined'

def run(self,args,opts):
print('==================')
print(type(self.crawler_process))
spider_list = self.crawler_process.spiders.list() # 找到爬虫类

for name in spider_list:
print('=================')
print(name)
self.crawler_process.crawl(name,**opts.__dict__)

self.crawler_process.start()
然后我们去看看crawler_process,这个是来自ScrapyCommand,而ScrapyCommand又是CrawlerProcess的子类,而CrawlerProcess又是CrawlerRunner的子类

在CrawlerRunner构造函数里面主要作用就是这个 def __init__(self, settings=None):
if isinstance(settings, dict) or settings is None:
settings = Settings(settings)
self.settings = settings
self.spider_loader = _get_spider_loader(settings) # 构造爬虫
self._crawlers = set()
self._active = set()
self.bootstrap_failed = False
1. 加载配置文件def _get_spider_loader(settings):

cls_path = settings.get('SPIDER_LOADER_CLASS')

# settings文件没有定义SPIDER_LOADER_CLASS,所以这里获取到的是系统的默认配置文件,
# 默认配置文件在接下来的代码块A
# SPIDER_LOADER_CLASS = 'scrapy.spiderloader.SpiderLoader'

loader_cls = load_object(cls_path)
# 这个函数就是根据路径转为类对象,也就是上面crapy.spiderloader.SpiderLoader 这个
# 字符串变成一个类对象
# 具体的load_object 对象代码见下面代码块B

return loader_cls.from_settings(settings.frozencopy())
默认配置文件defautl_settting.py# 代码块A
#......省略若干
SCHEDULER = 'scrapy.core.scheduler.Scheduler'
SCHEDULER_DISK_QUEUE = 'scrapy.squeues.PickleLifoDiskQueue'
SCHEDULER_MEMORY_QUEUE = 'scrapy.squeues.LifoMemoryQueue'
SCHEDULER_PRIORITY_QUEUE = 'scrapy.pqueues.ScrapyPriorityQueue'

SPIDER_LOADER_CLASS = 'scrapy.spiderloader.SpiderLoader' 就是这个值
SPIDER_LOADER_WARN_ONLY = False

SPIDER_MIDDLEWARES = {}

load_object的实现# 代码块B 为了方便,我把异常处理的去除
from importlib import import_module #导入第三方库

def load_object(path):
dot = path.rindex('.')
module, name = path[:dot], path[dot+1:]
# 上面把路径分为基本路径+模块名

mod = import_module(module)
obj = getattr(mod, name)
# 获取模块里面那个值

return obj

测试代码:In [33]: mod = import_module(module)

In [34]: mod
Out[34]: <module 'scrapy.spiderloader' from '/home/xda/anaconda3/lib/python3.7/site-packages/scrapy/spiderloader.py'>

In [35]: getattr(mod,name)
Out[35]: scrapy.spiderloader.SpiderLoader

In [36]: obj = getattr(mod,name)

In [37]: obj
Out[37]: scrapy.spiderloader.SpiderLoader

In [38]: type(obj)
Out[38]: type
在代码块A中,loader_cls是SpiderLoader,最后返回的的是SpiderLoader.from_settings(settings.frozencopy())
接下来看看SpiderLoader.from_settings, def from_settings(cls, settings):
return cls(settings)
返回类对象自己,所以直接看__init__函数即可class SpiderLoader(object):
"""
SpiderLoader is a class which locates and loads spiders
in a Scrapy project.
"""
def __init__(self, settings):
self.spider_modules = settings.getlist('SPIDER_MODULES')
# 获得settting中的模块名字,创建scrapy的时候就默认帮你生成了
# 你可以看看你的settings文件里面的内容就可以找到这个值,是一个list

self.warn_only = settings.getbool('SPIDER_LOADER_WARN_ONLY')
self._spiders = {}
self._found = defaultdict(list)
self._load_all_spiders() # 加载所有爬虫

核心就是这个_load_all_spiders:
走起:def _load_all_spiders(self):
for name in self.spider_modules:

for module in walk_modules(name): # 这个遍历文件夹里面的文件,然后再转化为类对象,
# 保存到字典:self._spiders = {}
self._load_spiders(module) # 模块变成spider

self._check_name_duplicates() # 去重,如果名字一样就异常

接下来看看_load_spiders
核心就是下面的。def iter_spider_classes(module):
from scrapy.spiders import Spider

for obj in six.itervalues(vars(module)): # 找到模块里面的变量,然后迭代出来
if inspect.isclass(obj) and \
issubclass(obj, Spider) and \
obj.__module__ == module.__name__ and \
getattr(obj, 'name', None): # 有name属性,继承于Spider
yield obj
这个obj就是我们平时写的spider类了。
原来分析了这么多,才找到了我们平时写的爬虫类

待续。。。。
 
原创文章
转载请注明出处
http://30daydo.com/article/530
  查看全部
运行scrapy crawl example 命令的时候,就会执行我们写的爬虫程序。
下面我们从源码分析一下scrapy执行的流程:
 

执行scrapy crawl 命令时,调用的是Command类
class Command(ScrapyCommand):

requires_project = True

def syntax(self):
return '[options]'

def short_desc(self):
return 'Runs all of the spiders - My Defined'

def run(self,args,opts):
print('==================')
print(type(self.crawler_process))
spider_list = self.crawler_process.spiders.list() # 找到爬虫类

for name in spider_list:
print('=================')
print(name)
self.crawler_process.crawl(name,**opts.__dict__)

self.crawler_process.start()

然后我们去看看crawler_process,这个是来自ScrapyCommand,而ScrapyCommand又是CrawlerProcess的子类,而CrawlerProcess又是CrawlerRunner的子类

在CrawlerRunner构造函数里面主要作用就是这个
      def __init__(self, settings=None):
if isinstance(settings, dict) or settings is None:
settings = Settings(settings)
self.settings = settings
self.spider_loader = _get_spider_loader(settings) # 构造爬虫
self._crawlers = set()
self._active = set()
self.bootstrap_failed = False

1. 加载配置文件
def _get_spider_loader(settings):

cls_path = settings.get('SPIDER_LOADER_CLASS')

# settings文件没有定义SPIDER_LOADER_CLASS,所以这里获取到的是系统的默认配置文件,
# 默认配置文件在接下来的代码块A
# SPIDER_LOADER_CLASS = 'scrapy.spiderloader.SpiderLoader'

loader_cls = load_object(cls_path)
# 这个函数就是根据路径转为类对象,也就是上面crapy.spiderloader.SpiderLoader 这个
# 字符串变成一个类对象
# 具体的load_object 对象代码见下面代码块B

return loader_cls.from_settings(settings.frozencopy())

默认配置文件defautl_settting.py
# 代码块A
#......省略若干
SCHEDULER = 'scrapy.core.scheduler.Scheduler'
SCHEDULER_DISK_QUEUE = 'scrapy.squeues.PickleLifoDiskQueue'
SCHEDULER_MEMORY_QUEUE = 'scrapy.squeues.LifoMemoryQueue'
SCHEDULER_PRIORITY_QUEUE = 'scrapy.pqueues.ScrapyPriorityQueue'

SPIDER_LOADER_CLASS = 'scrapy.spiderloader.SpiderLoader' 就是这个值
SPIDER_LOADER_WARN_ONLY = False

SPIDER_MIDDLEWARES = {}


load_object的实现
# 代码块B 为了方便,我把异常处理的去除
from importlib import import_module #导入第三方库

def load_object(path):
dot = path.rindex('.')
module, name = path[:dot], path[dot+1:]
# 上面把路径分为基本路径+模块名

mod = import_module(module)
obj = getattr(mod, name)
# 获取模块里面那个值

return obj


测试代码:
In [33]: mod = import_module(module)                                                                                                                                             

In [34]: mod
Out[34]: <module 'scrapy.spiderloader' from '/home/xda/anaconda3/lib/python3.7/site-packages/scrapy/spiderloader.py'>

In [35]: getattr(mod,name)
Out[35]: scrapy.spiderloader.SpiderLoader

In [36]: obj = getattr(mod,name)

In [37]: obj
Out[37]: scrapy.spiderloader.SpiderLoader

In [38]: type(obj)
Out[38]: type

在代码块A中,loader_cls是SpiderLoader,最后返回的的是SpiderLoader.from_settings(settings.frozencopy())
接下来看看SpiderLoader.from_settings,
    def from_settings(cls, settings):
return cls(settings)

返回类对象自己,所以直接看__init__函数即可
class SpiderLoader(object):
"""
SpiderLoader is a class which locates and loads spiders
in a Scrapy project.
"""
def __init__(self, settings):
self.spider_modules = settings.getlist('SPIDER_MODULES')
# 获得settting中的模块名字,创建scrapy的时候就默认帮你生成了
# 你可以看看你的settings文件里面的内容就可以找到这个值,是一个list

self.warn_only = settings.getbool('SPIDER_LOADER_WARN_ONLY')
self._spiders = {}
self._found = defaultdict(list)
self._load_all_spiders() # 加载所有爬虫


核心就是这个_load_all_spiders:
走起:
def _load_all_spiders(self):
for name in self.spider_modules:

for module in walk_modules(name): # 这个遍历文件夹里面的文件,然后再转化为类对象,
# 保存到字典:self._spiders = {}
self._load_spiders(module) # 模块变成spider

self._check_name_duplicates() # 去重,如果名字一样就异常


接下来看看_load_spiders
核心就是下面的。
def iter_spider_classes(module):
from scrapy.spiders import Spider

for obj in six.itervalues(vars(module)): # 找到模块里面的变量,然后迭代出来
if inspect.isclass(obj) and \
issubclass(obj, Spider) and \
obj.__module__ == module.__name__ and \
getattr(obj, 'name', None): # 有name属性,继承于Spider
yield obj

这个obj就是我们平时写的spider类了。
原来分析了这么多,才找到了我们平时写的爬虫类

待续。。。。
 
原创文章
转载请注明出处
http://30daydo.com/article/530
 

frontera运行link_follower.py 报错:doesn't define any object named 'FIFO'

李魔佛 发表了文章 • 0 个评论 • 172 次浏览 • 2019-07-18 11:29 • 来自相关话题

代码如下:
from __future__ import print_function

import re

import requests

from frontera.contrib.requests.manager import RequestsFrontierManager
# from frontera.contrib.requests.manager import RequestsFrontierManager
from frontera import Settings

from six.moves.urllib.parse import urljoin


SETTINGS = Settings()
SETTINGS.BACKEND = 'frontera.contrib.backends.memory.FIFO'
# SETTINGS.BACKEND = 'frontera.contrib.backends.memory.MemoryDistributedBackend'

SETTINGS.LOGGING_MANAGER_ENABLED = True
SETTINGS.LOGGING_BACKEND_ENABLED = True
SETTINGS.MAX_REQUESTS = 100
SETTINGS.MAX_NEXT_REQUESTS = 10

SEEDS = [
'http://www.imdb.com',
]

LINK_RE = re.compile(r'<a.+?href="(.*?)".?>', re.I)


def extract_page_links(response):
return [urljoin(response.url, link) for link in LINK_RE.findall(response.text)]

if __name__ == '__main__':

frontier = RequestsFrontierManager(SETTINGS)
frontier.add_seeds([requests.Request(url=url) for url in SEEDS])
while True:
next_requests = frontier.get_next_requests()
if not next_requests:
break
for request in next_requests:
try:
response = requests.get(request.url)
links = [
requests.Request(url=url)
for url in extract_page_links(response)
]
frontier.page_crawled(response)
print('Crawled', response.url, '(found', len(links), 'urls)')

if links:
frontier.links_extracted(request, links)
except requests.RequestException as e:
error_code = type(e).__name__
frontier.request_error(request, error_code)
print('Failed to process request', request.url, 'Error:', e)

 无论用的py2或者py3,都会报以下的错误。raise NameError("Module '%s' doesn't define any object named '%s'" % (module, name))
NameError: Module 'frontera.contrib.backends.memory' doesn't define any object named 'FIFO' 查看全部
代码如下:
from __future__ import print_function

import re

import requests

from frontera.contrib.requests.manager import RequestsFrontierManager
# from frontera.contrib.requests.manager import RequestsFrontierManager
from frontera import Settings

from six.moves.urllib.parse import urljoin


SETTINGS = Settings()
SETTINGS.BACKEND = 'frontera.contrib.backends.memory.FIFO'
# SETTINGS.BACKEND = 'frontera.contrib.backends.memory.MemoryDistributedBackend'

SETTINGS.LOGGING_MANAGER_ENABLED = True
SETTINGS.LOGGING_BACKEND_ENABLED = True
SETTINGS.MAX_REQUESTS = 100
SETTINGS.MAX_NEXT_REQUESTS = 10

SEEDS = [
'http://www.imdb.com',
]

LINK_RE = re.compile(r'<a.+?href="(.*?)".?>', re.I)


def extract_page_links(response):
return [urljoin(response.url, link) for link in LINK_RE.findall(response.text)]

if __name__ == '__main__':

frontier = RequestsFrontierManager(SETTINGS)
frontier.add_seeds([requests.Request(url=url) for url in SEEDS])
while True:
next_requests = frontier.get_next_requests()
if not next_requests:
break
for request in next_requests:
try:
response = requests.get(request.url)
links = [
requests.Request(url=url)
for url in extract_page_links(response)
]
frontier.page_crawled(response)
print('Crawled', response.url, '(found', len(links), 'urls)')

if links:
frontier.links_extracted(request, links)
except requests.RequestException as e:
error_code = type(e).__name__
frontier.request_error(request, error_code)
print('Failed to process request', request.url, 'Error:', e)

 无论用的py2或者py3,都会报以下的错误。
raise NameError("Module '%s' doesn't define any object named '%s'" % (module, name))
NameError: Module 'frontera.contrib.backends.memory' doesn't define any object named 'FIFO'

scrapy-rabbitmq 不支持python3 [修改源码使它支持]

李魔佛 发表了文章 • 0 个评论 • 174 次浏览 • 2019-07-17 17:24 • 来自相关话题

官方版本在2015年就没有更新了。
在python3上运行的收会报错。
 
需要修改以下地方:
 
待续。。
官方版本在2015年就没有更新了。
在python3上运行的收会报错。
 
需要修改以下地方:
 
待续。。

scrapy rabbitmq 分布式爬虫

李魔佛 发表了文章 • 0 个评论 • 301 次浏览 • 2019-07-17 16:59 • 来自相关话题

对于没接触过rabbitmq的同学,可以看这个文章:https://blog.csdn.net/hellozpc/article/details/81436980
rabbitmq是个不错的消息队列服务,可以配合scrapy作为消息队列.
 
下面是一个简单的demo:import re
import requests
import scrapy
from scrapy import Request
from rabbit_spider import settings
from scrapy.log import logger
import json
from rabbit_spider.items import RabbitSpiderItem
import datetime
from scrapy.selector import Selector
import pika

# from scrapy_rabbitmq.spiders import RabbitMQMixin
# from scrapy.contrib.spiders import CrawlSpider

class Website(scrapy.Spider):
name = "rabbit"

def start_requests(self):
headers = {'Accept': '*/*',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'en-US,en;q=0.9,zh-CN;q=0.8,zh;q=0.7',
'Host': '36kr.com',
'Referer': 'https://36kr.com/information/web_news',
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.108 Safari/537.36'
}

url = 'https://36kr.com/information/web_news'


yield Request(url=url,
headers=headers)

def parse(self, response):


credentials = pika.PlainCredentials('admin', 'admin')
connection = pika.BlockingConnection(pika.ConnectionParameters('192.168.1.101', 5672, '/', credentials))

channel = connection.channel()
channel.exchange_declare(exchange='direct_log', exchange_type='direct')

result = channel.queue_declare(exclusive=True, queue='')

queue_name = result.method.queue

# print(queue_name)
# infos = sys.argv[1:] if len(sys.argv)>1 else ['info']
info = 'info'

# 绑定多个值

channel.queue_bind(
exchange='direct_log',
routing_key=info,
queue=queue_name
)
print('start to receive [{}]'.format(info))

channel.basic_consume(
on_message_callback=self.callback_func,
queue=queue_name,
auto_ack=True,
)

channel.start_consuming()


def callback_func(self, ch, method, properties, body):
print(body)
 启动spider:from scrapy import cmdline
cmdline.execute('scrapy crawl rabbit'.split())
 然后往rabbitmq里面推送数据:import pika
import settings

credentials = pika.PlainCredentials('admin','admin')
connection = pika.BlockingConnection(pika.ConnectionParameters('192.168.1.101',5672,'/',credentials))

channel = connection.channel()
channel.exchange_declare(exchange='direct_log',exchange_type='direct') # fanout 就是组播

routing_key = 'info'
message='https://36kr.com/pp/api/aggregation-entity?type=web_latest_article&b_id=59499&per_page=30'
channel.basic_publish(
exchange='direct_log',
routing_key=routing_key,
body=message
)

print('sending message {}'.format(message))
connection.close()
 
推送数据后,scrapy会马上接受到队里里面的数据。
注意不能在start_requests里面写等待队列的命令,因为start_requests函数需要返回一个生成器,否则程序会报错。
 
待续。。。
###### 2019-08-29 更新 ################### 
发现一个坑,就是rabbitMQ在接受到数据后,无法在回调函数里面使用yield生成器。
  查看全部
对于没接触过rabbitmq的同学,可以看这个文章:https://blog.csdn.net/hellozpc/article/details/81436980
rabbitmq是个不错的消息队列服务,可以配合scrapy作为消息队列.
 
下面是一个简单的demo:
import re
import requests
import scrapy
from scrapy import Request
from rabbit_spider import settings
from scrapy.log import logger
import json
from rabbit_spider.items import RabbitSpiderItem
import datetime
from scrapy.selector import Selector
import pika

# from scrapy_rabbitmq.spiders import RabbitMQMixin
# from scrapy.contrib.spiders import CrawlSpider

class Website(scrapy.Spider):
name = "rabbit"

def start_requests(self):
headers = {'Accept': '*/*',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'en-US,en;q=0.9,zh-CN;q=0.8,zh;q=0.7',
'Host': '36kr.com',
'Referer': 'https://36kr.com/information/web_news',
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.108 Safari/537.36'
}

url = 'https://36kr.com/information/web_news'


yield Request(url=url,
headers=headers)

def parse(self, response):


credentials = pika.PlainCredentials('admin', 'admin')
connection = pika.BlockingConnection(pika.ConnectionParameters('192.168.1.101', 5672, '/', credentials))

channel = connection.channel()
channel.exchange_declare(exchange='direct_log', exchange_type='direct')

result = channel.queue_declare(exclusive=True, queue='')

queue_name = result.method.queue

# print(queue_name)
# infos = sys.argv[1:] if len(sys.argv)>1 else ['info']
info = 'info'

# 绑定多个值

channel.queue_bind(
exchange='direct_log',
routing_key=info,
queue=queue_name
)
print('start to receive [{}]'.format(info))

channel.basic_consume(
on_message_callback=self.callback_func,
queue=queue_name,
auto_ack=True,
)

channel.start_consuming()


def callback_func(self, ch, method, properties, body):
print(body)

 启动spider:
from scrapy import cmdline
cmdline.execute('scrapy crawl rabbit'.split())

 然后往rabbitmq里面推送数据:
import pika
import settings

credentials = pika.PlainCredentials('admin','admin')
connection = pika.BlockingConnection(pika.ConnectionParameters('192.168.1.101',5672,'/',credentials))

channel = connection.channel()
channel.exchange_declare(exchange='direct_log',exchange_type='direct') # fanout 就是组播

routing_key = 'info'
message='https://36kr.com/pp/api/aggregation-entity?type=web_latest_article&b_id=59499&per_page=30'
channel.basic_publish(
exchange='direct_log',
routing_key=routing_key,
body=message
)

print('sending message {}'.format(message))
connection.close()

 
推送数据后,scrapy会马上接受到队里里面的数据。
注意不能在start_requests里面写等待队列的命令,因为start_requests函数需要返回一个生成器,否则程序会报错。
 
待续。。。
###### 2019-08-29 更新 ################### 
发现一个坑,就是rabbitMQ在接受到数据后,无法在回调函数里面使用yield生成器。
 

twisted的getPage已经不建议使用,新接口为twisted.web.client.Agent

李魔佛 发表了文章 • 0 个评论 • 210 次浏览 • 2019-07-12 11:31 • 来自相关话题

Twisted-16.7.0 is coming soon, and it deprecates twisted.web.client.getPage (and client.HTTPClientFactory). We use these in some of the unit tests, to fetch one of the HTTP WAPI/WUI pages and make sure the contents look right.

We need to change these tests to use twisted.web.client.Agent instead, or a package named "treq", which is a Twisted flavor of the excellent (but blocking) requests library.

 
  查看全部


Twisted-16.7.0 is coming soon, and it deprecates twisted.web.client.getPage (and client.HTTPClientFactory). We use these in some of the unit tests, to fetch one of the HTTP WAPI/WUI pages and make sure the contents look right.

We need to change these tests to use twisted.web.client.Agent instead, or a package named "treq", which is a Twisted flavor of the excellent (but blocking) requests library.


 
 

喜马拉雅app 爬取音频文件

李魔佛 发表了文章 • 0 个评论 • 1011 次浏览 • 2019-06-30 12:24 • 来自相关话题

爬取喜马拉雅app上 杨继东的投资之道 的音频文件
运行环境:python3# -*- coding: utf-8 -*-
# website: http://30daydo.com
# @Time : 2019/6/30 12:03
# @File : main.py

import requests
import re
url = 'https://www.ximalaya.com/revision/play/album?albumId=23057324&pageNum=1&sort=1&pageSize=60'
headers={'User-Agent':'Xiaomi'}

r = requests.get(url=url,headers=headers)
js_data = r.json()
data_list = js_data.get('data',{}).get('tracksAudioPlay',)
for item in data_list:
trackName=item.get('trackName')
trackName=re.sub(':','',trackName)
src_url = item.get('src')
try:
r0=requests.get(src_url,headers=headers)
except Exception as e:
print(e)
print(trackName)
else:
with open('{}.m4a'.format(trackName),'wb') as f:
f.write(r0.content)
print('{} downloaded'.format(trackName))
保存为main.py
然后运行 python main.py
稍微等几分钟就自动下载好了。







附下载好的音频文件:
链接:https://pan.baidu.com/s/1t_vJhTvSJSeFdI1IaDS6fA 
提取码:e3zb 
 

原创文章
转载请注明出处
http://30daydo.com/article/503 查看全部
爬取喜马拉雅app上 杨继东的投资之道 的音频文件
运行环境:python3
# -*- coding: utf-8 -*-
# website: http://30daydo.com
# @Time : 2019/6/30 12:03
# @File : main.py

import requests
import re
url = 'https://www.ximalaya.com/revision/play/album?albumId=23057324&pageNum=1&sort=1&pageSize=60'
headers={'User-Agent':'Xiaomi'}

r = requests.get(url=url,headers=headers)
js_data = r.json()
data_list = js_data.get('data',{}).get('tracksAudioPlay',)
for item in data_list:
trackName=item.get('trackName')
trackName=re.sub(':','',trackName)
src_url = item.get('src')
try:
r0=requests.get(src_url,headers=headers)
except Exception as e:
print(e)
print(trackName)
else:
with open('{}.m4a'.format(trackName),'wb') as f:
f.write(r0.content)
print('{} downloaded'.format(trackName))

保存为main.py
然后运行 python main.py
稍微等几分钟就自动下载好了。


喜马拉雅.PNG


附下载好的音频文件:
链接:https://pan.baidu.com/s/1t_vJhTvSJSeFdI1IaDS6fA 
提取码:e3zb 
 

原创文章
转载请注明出处
http://30daydo.com/article/503

关于懒人听书爬虫的请教

b842619045 回复了问题 • 3 人关注 • 2 个回复 • 687 次浏览 • 2019-05-22 23:04 • 来自相关话题

requests直接post图片文件

李魔佛 发表了文章 • 0 个评论 • 237 次浏览 • 2019-05-17 16:32 • 来自相关话题

代码如下:
file_path=r'9927_15562445086485238.png'
file=open(file_path, 'rb').read()
r=requests.post(url=code_url,data=file)
print(r.text) 查看全部
代码如下:
    file_path=r'9927_15562445086485238.png'
file=open(file_path, 'rb').read()
r=requests.post(url=code_url,data=file)
print(r.text)

正则表达式替换中文换行符【python】

李魔佛 发表了文章 • 0 个评论 • 177 次浏览 • 2019-05-13 11:02 • 来自相关话题

js里面的内容有中文的换行符。
使用正则表达式替换换行符。(也可以替换为任意字符)js=re.sub('\r\n','',js)
完毕。
js里面的内容有中文的换行符。
使用正则表达式替换换行符。(也可以替换为任意字符)
js=re.sub('\r\n','',js)

完毕。

request header显示Provisional headers are shown

李魔佛 发表了文章 • 0 个评论 • 423 次浏览 • 2019-05-13 10:07 • 来自相关话题

出现这个情况,一般是因为装了一些插件,比如屏蔽广告的插件 ad block导致的。
把插件卸载了问题就解决了。
出现这个情况,一般是因为装了一些插件,比如屏蔽广告的插件 ad block导致的。
把插件卸载了问题就解决了。