Raspberrypi 2 Install or upgrade Python3.6
Raspberrypi has python2. 7 and python3.4, but i want to upgrade to python3.6+.
Python3.6 support some new feature such as print(f'{name}') and x=1_000_242_200 expression.
How to upgrade ?
$ wget https://www.python.org/ftp/pyt ... 1.tgz
$ tar zxvf Python-3.6.1.tgz $ cd Python-3.6.1
then run command:
$ sudo ./configure && sudo make && sudo make install
wait for about 20mins (low perf of raspberrypi :( )
then you run command:
python3
it will using the new python3.6 version:
Python 3.6.1 (default, Jul 21 2019, 14:26:28)
[GCC 4.9.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
Enjoy it ! 收起阅读 »
frontera运行link_follower.py 报错:doesn't define any object named 'FIFO'
from __future__ import print_function
import re
import requests
from frontera.contrib.requests.manager import RequestsFrontierManager
# from frontera.contrib.requests.manager import RequestsFrontierManager
from frontera import Settings
from six.moves.urllib.parse import urljoin
SETTINGS = Settings()
SETTINGS.BACKEND = 'frontera.contrib.backends.memory.FIFO'
# SETTINGS.BACKEND = 'frontera.contrib.backends.memory.MemoryDistributedBackend'
SETTINGS.LOGGING_MANAGER_ENABLED = True
SETTINGS.LOGGING_BACKEND_ENABLED = True
SETTINGS.MAX_REQUESTS = 100
SETTINGS.MAX_NEXT_REQUESTS = 10
SEEDS = [
'http://www.imdb.com',
]
LINK_RE = re.compile(r'<a.+?href="(.*?)".?>', re.I)
def extract_page_links(response):
return [urljoin(response.url, link) for link in LINK_RE.findall(response.text)]
if __name__ == '__main__':
frontier = RequestsFrontierManager(SETTINGS)
frontier.add_seeds([requests.Request(url=url) for url in SEEDS])
while True:
next_requests = frontier.get_next_requests()
if not next_requests:
break
for request in next_requests:
try:
response = requests.get(request.url)
links = [
requests.Request(url=url)
for url in extract_page_links(response)
]
frontier.page_crawled(response)
print('Crawled', response.url, '(found', len(links), 'urls)')
if links:
frontier.links_extracted(request, links)
except requests.RequestException as e:
error_code = type(e).__name__
frontier.request_error(request, error_code)
print('Failed to process request', request.url, 'Error:', e)
无论用的py2或者py3,都会报以下的错误。
raise NameError("Module '%s' doesn't define any object named '%s'" % (module, name))收起阅读 »
NameError: Module 'frontera.contrib.backends.memory' doesn't define any object named 'FIFO'
scrapy-rabbitmq 不支持python3 [修改源码使它支持]
在python3上运行的收会报错。
需要修改以下地方:
待续。。
scrapy rabbitmq 分布式爬虫
rabbitmq是个不错的消息队列服务,可以配合scrapy作为消息队列.
下面是一个简单的demo:
import re
import requests
import scrapy
from scrapy import Request
from rabbit_spider import settings
from scrapy.log import logger
import json
from rabbit_spider.items import RabbitSpiderItem
import datetime
from scrapy.selector import Selector
import pika
# from scrapy_rabbitmq.spiders import RabbitMQMixin
# from scrapy.contrib.spiders import CrawlSpider
class Website(scrapy.Spider):
name = "rabbit"
def start_requests(self):
headers = {'Accept': '*/*',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'en-US,en;q=0.9,zh-CN;q=0.8,zh;q=0.7',
'Host': '36kr.com',
'Referer': 'https://36kr.com/information/web_news',
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.108 Safari/537.36'
}
url = 'https://36kr.com/information/web_news'
yield Request(url=url,
headers=headers)
def parse(self, response):
credentials = pika.PlainCredentials('admin', 'admin')
connection = pika.BlockingConnection(pika.ConnectionParameters('192.168.1.101', 5672, '/', credentials))
channel = connection.channel()
channel.exchange_declare(exchange='direct_log', exchange_type='direct')
result = channel.queue_declare(exclusive=True, queue='')
queue_name = result.method.queue
# print(queue_name)
# infos = sys.argv[1:] if len(sys.argv)>1 else ['info']
info = 'info'
# 绑定多个值
channel.queue_bind(
exchange='direct_log',
routing_key=info,
queue=queue_name
)
print('start to receive [{}]'.format(info))
channel.basic_consume(
on_message_callback=self.callback_func,
queue=queue_name,
auto_ack=True,
)
channel.start_consuming()
def callback_func(self, ch, method, properties, body):
print(body)
启动spider:
from scrapy import cmdline
cmdline.execute('scrapy crawl rabbit'.split())
然后往rabbitmq里面推送数据:
import pika
import settings
credentials = pika.PlainCredentials('admin','admin')
connection = pika.BlockingConnection(pika.ConnectionParameters('192.168.1.101',5672,'/',credentials))
channel = connection.channel()
channel.exchange_declare(exchange='direct_log',exchange_type='direct') # fanout 就是组播
routing_key = 'info'
message='https://36kr.com/pp/api/aggregation-entity?type=web_latest_article&b_id=59499&per_page=30'
channel.basic_publish(
exchange='direct_log',
routing_key=routing_key,
body=message
)
print('sending message {}'.format(message))
connection.close()
推送数据后,scrapy会马上接受到队里里面的数据。
注意不能在start_requests里面写等待队列的命令,因为start_requests函数需要返回一个生成器,否则程序会报错。
待续。。。
###### 2019-08-29 更新 ###################
发现一个坑,就是rabbitMQ在接受到数据后,无法在回调函数里面使用yield生成器。
收起阅读 »
exchange_declare() got an unexpected keyword argument 'type'
exchange_type instead of type
credentials = pika.PlainCredentials('admin','admin')收起阅读 »
connection = pika.BlockingConnection(pika.ConnectionParameters('192.168.1.101',5672,'/',credentials))
channel = connection.channel()
channel.exchange_declare(exchange='logs',exchange_type='fanout')
twisted的getPage已经不建议使用,新接口为twisted.web.client.Agent
Twisted-16.7.0 is coming soon, and it deprecates twisted.web.client.getPage (and client.HTTPClientFactory). We use these in some of the unit tests, to fetch one of the HTTP WAPI/WUI pages and make sure the contents look right.
We need to change these tests to use twisted.web.client.Agent instead, or a package named "treq", which is a Twisted flavor of the excellent (but blocking) requests library.
收起阅读 »
twisted reactor运行后,添加了addBoth函数,但是还是无法停止
from scrapy.selector import Selector
def get_response_callback(content):
txt = str(content,encoding='utf-8')
resp = Selector(text=txt)
title = resp.xpath('//title/text()').extract_first()
print(title)
@defer.inlineCallbacks
def task():
url = 'http://www.baidu.com'
d=getPage(url.encode('utf-8'))
d.addCallback(get_response_callback)
yield d
def done():
reactor.stop()
def done1(*args,**kwargs):
reactor.stop()
task_list =
for i in range(4):
d=task()
task_list.append(d)
dd = defer.DeferredList(task_list)
dd.addBoth(done)
reactor.run()
上面的代码是无法停止的,如果使用的是
dd.addBoth(done)
done函数的定义是没有参数的。
而使用另一个done函数带参数的done(*args,**kwargs)
是可以正常退出的,done里面写了reactor.stop() 函数
原创文章
转载请注明出处:
http://30daydo.com/article/509
收起阅读 »
numpy indices的用法
Suppose you have a matrix M whose (i,j)-th element equals
M_ij = 2*i + 3*j
One way to define this matrix would be
i, j = np.indices((2,3))
M = 2*i + 3*j
which yields
array([[0, 3, 6],
[2, 5, 8]])
In other words, np.indices returns arrays which can be used as indices. The elements in i indicate the row index:
In [12]: i
Out[12]:
array([[0, 0, 0],
[1, 1, 1]])
The elements in j indicate the column index:
In [13]: j
Out[13]:
array([[0, 1, 2],
[0, 1, 2]])
上面是Stack Overflow的解释。 翻译一下:
np.indices((2,3))
返回的是一个行列的索引,然后可以用这个索引快速的创建二维数据。
比如我要画一个圆:
img = np.zeros((400,400))
ir,ic = np.indices(img.shape)
circle = (ir-135)**2+(ic-150)**2 < 30**2 # 半径30,圆心在135,150
img[circle]=1
img现在就是一个圆啦
收起阅读 »
cv2 distanceTransform函数的用法 python
distanceTransform
Calculates the distance to the closest zero pixel for each pixel of the source image.
Python: cv2.distanceTransform(src, distanceType, maskSize[, dst]) → dst
Python: cv.DistTransform(src, dst, distance_type=CV_DIST_L2, mask_size=3, mask=None, labels=None) → None
Parameters:
src – 8-bit, single-channel (binary) source image.
dst – Output image with calculated distances. It is a 32-bit floating-point, single-channel image of the same size as src .
distanceType – Type of distance. It can be CV_DIST_L1, CV_DIST_L2 , or CV_DIST_C .
maskSize – Size of the distance transform mask. It can be 3, 5, or CV_DIST_MASK_PRECISE (the latter option is only supported by the first function). In case of the CV_DIST_L1 or CV_DIST_C distance type, the parameter is forced to 3 because a 3\times 3 mask gives the same result as 5\times 5 or any larger aperture.
labels – Optional output 2D array of labels (the discrete Voronoi diagram). It has the type CV_32SC1 and the same size as src . See the details below.
labelType – Type of the label array to build. If labelType==DIST_LABEL_CCOMP then each connected component of zeros in src (as well as all the non-zero pixels closest to the connected component) will be assigned the same label. If labelType==DIST_LABEL_PIXEL then each zero pixel (and all the non-zero pixels closest to it) gets its own label.
The functions distanceTransform calculate the approximate or precise distance from every binary image pixel to the nearest zero pixel. For zero image pixels, the distance will obviously be zero.
When maskSize == CV_DIST_MASK_PRECISE and distanceType == CV_DIST_L2 , the function runs the algorithm described in [Felzenszwalb04]. This algorithm is parallelized with the TBB library.
In other cases, the algorithm [Borgefors86] is used. This means that for a pixel the function finds the shortest path to the nearest zero pixel consisting of basic shifts: horizontal, vertical, diagonal, or knight’s move (the latest is available for a 5\times 5 mask). The overall distance is calculated as a sum of these basic distances. Since the distance function should be symmetric, all of the horizontal and vertical shifts must have the same cost (denoted as a ), all the diagonal shifts must have the same cost (denoted as b ), and all knight’s moves must have the same cost (denoted as c ). For the CV_DIST_C and CV_DIST_L1 types, the distance is calculated precisely, whereas for CV_DIST_L2 (Euclidean distance) the distance can be calculated only with a relative error (a 5\times 5 mask gives more accurate results). For a,``b`` , and c , OpenCV uses the values suggested in the original paper:
CV_DIST_C (3\times 3) a = 1, b = 1
CV_DIST_L1 (3\times 3) a = 1, b = 2
CV_DIST_L2 (3\times 3) a=0.955, b=1.3693
CV_DIST_L2 (5\times 5) a=1, b=1.4, c=2.1969
Typically, for a fast, coarse distance estimation CV_DIST_L2, a 3\times 3 mask is used. For a more accurate distance estimation CV_DIST_L2 , a 5\times 5 mask or the precise algorithm is used. Note that both the precise and the approximate algorithms are linear on the number of pixels.
The second variant of the function does not only compute the minimum distance for each pixel (x, y) but also identifies the nearest connected component consisting of zero pixels (labelType==DIST_LABEL_CCOMP) or the nearest zero pixel (labelType==DIST_LABEL_PIXEL). Index of the component/pixel is stored in \texttt{labels}(x, y) . When labelType==DIST_LABEL_CCOMP, the function automatically finds connected components of zero pixels in the input image and marks them with distinct labels. When labelType==DIST_LABEL_CCOMP, the function scans through the input image and marks all the zero pixels with distinct labels.
In this mode, the complexity is still linear. That is, the function provides a very fast way to compute the Voronoi diagram for a binary image. Currently, the second variant can use only the approximate distance transform algorithm, i.e. maskSize=CV_DIST_MASK_PRECISE is not supported yet.
Note
An example on using the distance transform can be found at opencv_source_code/samples/cpp/distrans.cpp
(Python) An example on using the distance transform can be found at opencv_source/samples/python2/distrans.py
收起阅读 »
Win10下PhantomJS无法运行 【版本兼容问题】
在win10下就报错:
selenium.common.exceptions.WebDriverException: Message: Service C:\Tool\phantomjs-2.5.0-beta2-windows\phantomjs-2.5.0-beta2-windows\bin\phantomjs.exe unexpectedly exited. Status code was: 4294967295
后来替换了一个旧的版本,发现问题就这么解决了。
旧版本:phantomjs-2.1.1-windows
原创文章
转载请注明出处
http://30daydo.com/article/505
收起阅读 »
nunpy中的std标准差是样本差吗
# 测试一下那个方差X.mean()
x=[1,2,3,4,5,6,7,8,9,10]
X = np.array(x)
5.5
X.std() # 标准差
2.8722813232690143
手工计算一下:
def my_fangca(X):
l=len(X)
mean=X.mean()
sum_ = 0
sum_std=0
for i in X:
sum_+=(i-mean)**2
var_=sum_/l
std_=(sum_/(l))**0.5
return var_,std_
result = my_fangca(X)
得到的result
(8.25, 2.8722813232690143)
说明numpy的std是标准差,不是样本差 收起阅读 »
喜马拉雅app 爬取音频文件
因为喜马拉雅的源码格式改了,所以爬虫代码也更新了一波
# -*- coding: utf-8 -*-
# website: http://30daydo.com
# @Time : 2019/6/30 12:03
# @File : main.py
import requests
import re
import os
url = 'http://180.153.255.6/mobile/v1/album/track/ts-1571294887744?albumId=23057324&device=android&isAsc=true&isQueryInvitationBrand=true&pageId={}&pageSize=20&pre_page=0'
headers = {'User-Agent': 'Xiaomi'}
def download():
for i in range(1, 3):
r = requests.get(url=url.format(i), headers=headers)
js_data = r.json()
data_list = js_data.get('data', {}).get('list', [])
for item in data_list:
trackName = item.get('title')
trackName = re.sub('[\/\\\:\*\?\"\<\>\|]', '_', trackName)
# trackName=re.sub(':','',trackName)
src_url = item.get('playUrl64')
filename = '{}.mp3'.format(trackName)
if not os.path.exists(filename):
try:
r0 = requests.get(src_url, headers=headers)
except Exception as e:
print(e)
print(trackName)
r0 = requests.get(src_url, headers=headers)
else:
with open(filename, 'wb') as f:
f.write(r0.content)
print('{} downloaded'.format(trackName))
else:
print(f'{filename}已经下载过了')
import shutil
def rename_():
for i in range(1, 3):
r = requests.get(url=url.format(i), headers=headers)
js_data = r.json()
data_list = js_data.get('data', {}).get('list', [])
for item in data_list:
trackName = item.get('title')
trackName = re.sub('[\/\\\:\*\?\"\<\>\|]', '_', trackName)
src_url = item.get('playUrl64')
orderNo=item.get('orderNo')
filename = '{}.mp3'.format(trackName)
try:
if os.path.exists(filename):
new_file='{}_{}.mp3'.format(orderNo,trackName)
shutil.move(filename,new_file)
except Exception as e:
print(e)
if __name__=='__main__':
rename_()
音频文件也更新了,详情见百度网盘。
======== 2018-10=============
爬取喜马拉雅app上 杨继东的投资之道 的音频文件
运行环境:python3
# -*- coding: utf-8 -*-
# website: http://30daydo.com
# @Time : 2019/6/30 12:03
# @File : main.py
import requests
import re
url = 'https://www.ximalaya.com/revision/play/album?albumId=23057324&pageNum=1&sort=1&pageSize=60'
headers={'User-Agent':'Xiaomi'}
r = requests.get(url=url,headers=headers)
js_data = r.json()
data_list = js_data.get('data',{}).get('tracksAudioPlay',)
for item in data_list:
trackName=item.get('trackName')
trackName=re.sub(':','',trackName)
src_url = item.get('src')
try:
r0=requests.get(src_url,headers=headers)
except Exception as e:
print(e)
print(trackName)
else:
with open('{}.m4a'.format(trackName),'wb') as f:
f.write(r0.content)
print('{} downloaded'.format(trackName))
保存为main.py
然后运行 python main.py
稍微等几分钟就自动下载好了。
附下载好的音频文件:
链接:https://pan.baidu.com/s/1t_vJhTvSJSeFdI1IaDS6fA
提取码:e3zb
原创文章
转载请注明出处
http://30daydo.com/article/503 收起阅读 »
python3与python2迭代器的写法的区别
附一个例子:
def iter_demo():收起阅读 »
class DefineIter(object):
def __init__(self,length):
self.length = length
self.data = range(self.length)
self.index=0
def __iter__(self):
return self
def __next__(self):
if self.index >=self.length:
# return None
raise StopIteration
d = self.data[self.index]*50
self.index =self.index + 1
return d
a = DefineIter(10)
print(type(a))
for i in a:
print(i)
PyCharm 快捷键快速插入当前时间
方式
通过 Live Template 快速添加时间
步骤
1、添加一个 Template Group 命名为 Common
2、添加一个 Live Template 设置如下
Abbreviation: time
Description : current time
Template Text: $time$
Edit Variables -> Expresssion : date("yyyy-MM-dd HH:mm:ss")
3、让设置生效
Define->Everywhere
4、使用
输入 time 后 按下tab键 就能转换为当前时间了
收起阅读 »
深圳转债转股后不可以撤单
说明转股后也是不能撤单的。
修改easytrader国金证券的默认启动路径
pywinauto.application.AppStartError: Could not create the process "C:\全能行证券交易终端\xiadan.exe"
Error returned by CreateProcess: (2, 'CreateProcess', '系统找不到指定的文件。')
看了配置文件,也是没有具体的参数可以修改,只好修改源代码。
别听到改源代码就害怕,只是需要改一行就可以了。
找到文件:
site-package\easytrader\config\client.py
找过这一行:
class GJ(CommonConfig):只要修改上面的路径就可以了。注意用双斜杠。
DEFAULT_EXE_PATH = "C:\\Tool\\xiadan.exe"
收起阅读 »
使用pymongo中的find_one_and_update出错:需要分片键
File "C:\ProgramData\Anaconda3\lib\site-packages\pymongo\helpers.py", line 155, in _check_command_response
raise OperationFailure(msg % errmsg, code, response)
pymongo.errors.OperationFailure: Query for sharded findAndModify must contain the shard key
2019-06-10 16:14:32 [scrapy.core.engine] INFO: Closing spider (finished)
2019-06-10 16:14:32 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
需要在查询语句中把分片键也添加进去。
因为findOneModify只会找一个记录,但是到底在哪个分片的记录呢? 因为不确定,所以才需要把shard加上去。
参考官方:
Targeted Operations vs. Broadcast Operations收起阅读 »
Generally, the fastest queries in a sharded environment are those that mongos route to a single shard, using the shard key and the cluster meta data from the config server. These targeted operations use the shard key value to locate the shard or subset of shards that satisfy the query document.
For queries that don’t include the shard key, mongos must query all shards, wait for their responses and then return the result to the application. These “scatter/gather” queries can be long running operations.
Broadcast Operations
mongos instances broadcast queries to all shards for the collection unless the mongos can determine which shard or subset of shards stores this data.
After the mongos receives responses from all shards, it merges the data and returns the result document. The performance of a broadcast operation depends on the overall load of the cluster, as well as variables like network latency, individual shard load, and number of documents returned per shard. Whenever possible, favor operations that result in targeted operation over those that result in a broadcast operation.
Multi-update operations are always broadcast operations.
The updateMany() and deleteMany() methods are broadcast operations, unless the query document specifies the shard key in full.
Targeted Operations
mongos can route queries that include the shard key or the prefix of a compound shard key a specific shard or set of shards. mongos uses the shard key value to locate the chunk whose range includes the shard key value and directs the query at the shard containing that chunk.
For example, if the shard key is:
copy
{ a: 1, b: 1, c: 1 }
The mongos program can route queries that include the full shard key or either of the following shard key prefixes at a specific shard or set of shards:
copy
{ a: 1 }
{ a: 1, b: 1 }
All insertOne() operations target to one shard. Each document in the insertMany() array targets to a single shard, but there is no guarantee all documents in the array insert into a single shard.
All updateOne(), replaceOne() and deleteOne() operations must include the shard key or _id in the query document. MongoDB returns an error if these methods are used without the shard key or _id.
Depending on the distribution of data in the cluster and the selectivity of the query, mongos may still perform a broadcast operation to fulfill these queries.
Index Use
If the query does not include the shard key, the mongos must send the query to all shards as a “scatter/gather” operation. Each shard will, in turn, use either the shard key index or another more efficient index to fulfill the query.
If the query includes multiple sub-expressions that reference the fields indexed by the shard key and the secondary index, the mongos can route the queries to a specific shard and the shard will use the index that will allow it to fulfill most efficiently.
Sharded Cluster Security
Use Internal Authentication to enforce intra-cluster security and prevent unauthorized cluster components from accessing the cluster. You must start each mongod or mongos in the cluster with the appropriate security settings in order to enforce internal authentication.
See Deploy Sharded Cluster with Keyfile Access Control for a tutorial on deploying a secured shardedcluster.
Cluster Users
Sharded clusters support Role-Based Access Control (RBAC) for restricting unauthorized access to cluster data and operations. You must start each mongod in the cluster, including the config servers, with the --auth option in order to enforce RBAC. Alternatively, enforcing Internal Authentication for inter-cluster security also enables user access controls via RBAC.
With RBAC enforced, clients must specify a --username, --password, and --authenticationDatabase when connecting to the mongos in order to access cluster resources.
Each cluster has its own cluster users. These users cannot be used to access individual shards.
See Enable Access Control for a tutorial on enabling adding users to an RBAC-enabled MongoDB deployment.
jupyter notebook格式的文件损坏如何修复
使用下面的代码:
# 拯救损坏的jupyter 文件只要把你要修复的文件替换一下就可以了。 收起阅读 »
import re
import codecs
pattern = re.compile('"source": \[(.*?)\]\s+\},',re.S)
filename = 'tushare_usage.ipynb'
with codecs.open(filename,encoding='utf8') as f:
content = f.read()
source = pattern.findall(content)
for s in source:
t=s.replace('\\n','')
t=re.sub('"','',t)
t=re.sub('(,$)','',t)
print(t)
python连接mongodb集群 cluster
连接方法如下:
import pymongo上面默认的端口do都是27017,如果是其他端口,需要这样修改:
db = pymongo.MongoClient('mongodb://10.18.6.46,10.18.6.26,10.18.6.102')
db = pymongo.MongoClient('mongodb://10.18.6.46:8888,10.18.6.26:9999,10.18.6.102:7777')
然后就可以正常读写数据库:
读:
coll=db['testdb']['testcollection'].find()输出内容:
for i in coll:
print(i)
{'_id': ObjectId('5cf4c7981ee9edff72e5c503'), 'username': 'hello'}
{'_id': ObjectId('5cf4c7991ee9edff72e5c504'), 'username': 'hello'}
{'_id': ObjectId('5cf4c7991ee9edff72e5c505'), 'username': 'hello'}
{'_id': ObjectId('5cf4c79a1ee9edff72e5c506'), 'username': 'hello'}
{'_id': ObjectId('5cf4c7b21ee9edff72e5c507'), 'username': 'hello world'}
写:
collection = db['testdb']['testcollection']
for i in range(10):
collection.insert({'username':'huston{}'.format(i)})
原创文章,转载请注明出处:
http://30daydo.com/article/494
收起阅读 »
ElasticSearch查看已经存在的文档保存在哪个分片
比如我有以下的文档:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 5,
"max_score" : 1.0,
"hits" : [
{
"_index" : "test",
"_type" : "mydoc",
"_id" : "XxyrM2kBVzdNcvl_GHv2",
"_score" : 1.0,
"_source" : {
"name" : "Shiled",
"twitter" : "Sonny sql is awesome",
"date" : "2018-12-27",
"id" : 1240,
"tags" : [
"red",
"shit"
]
}
},
{
"_index" : "test",
"_type" : "mydoc",
"_id" : "YByrM2kBVzdNcvl_tnvm",
"_score" : 1.0,
"_source" : {
"name" : "YYerk",
"twitter" : "sql is awesome",
"date" : "2008-12-27",
"id" : 12357,
"tags" : [
"red"
]
}
},
{
"_index" : "test",
"_type" : "mydoc",
"_id" : "7777",
"_score" : 1.0,
"_source" : {
"name" : "Rocky Chen",
"twitter" : "sql is awesome",
"date" : "2008-12-27",
"id" : 9999
}
},
{
"_index" : "test",
"_type" : "mydoc",
"_id" : "YhzDN2kBVzdNcvl_enuT",
"_score" : 1.0,
"_source" : {
"name" : "YYerk",
"twitter" : "sql is awesome",
"date" : "2008-12-27",
"id" : 888888,
"tags" : [
"red",
"green"
]
}
},
{
"_index" : "test",
"_type" : "mydoc",
"_id" : "YxzDN2kBVzdNcvl_u3th",
"_score" : 1.0,
"_source" : {
"name" : "YYerk",
"twitter" : "sql is awesome",
"date" : "2008-12-27",
"id" : 888888,
"tags" : [
"red",
"green"
]
}
}
]
}
}
如果我想看看id是 "_id" : "YxzDN2kBVzdNcvl_u3th",
这个文档是保存在哪个分片,如何查看?
引用:
路由一个文档到一个分片中编辑
当索引一个文档的时候,文档会被存储到一个主分片中。 Elasticsearch 如何知道一个文档应该存放到哪个分片中呢?当我们创建文档时,它如何决定这个文档应当被存储在分片 1 还是分片 2 中呢?
首先这肯定不会是随机的,否则将来要获取文档的时候我们就不知道从何处寻找了。实际上,这个过程是根据下面这个公式决定的:
shard = hash(routing) % number_of_primary_shards
routing 是一个可变值,默认是文档的 _id ,也可以设置成一个自定义的值。 routing 通过 hash 函数生成一个数字,然后这个数字再除以 number_of_primary_shards (主分片的数量)后得到 余数 。这个分布在 0 到 number_of_primary_shards-1 之间的余数,就是我们所寻求的文档所在分片的位置。
这就解释了为什么我们要在创建索引的时候就确定好主分片的数量 并且永远不会改变这个数量:因为如果数量变化了,那么所有之前路由的值都会无效,文档也再也找不到了。
那么可以使用
GET test/_search_shards?routing=ID号 来查看你要查询的id所在的分片
得到的结果:
{
"nodes" : {
"yl-qYmh1SXqzJsfI4d1ddw" : {
"name" : "node-3",
"ephemeral_id" : "UsJ9rFELTiCW07oHE9YMdg",
"transport_address" : "10.18.6.26:9300",
"attributes" : {
"ml.machine_memory" : "6088101888",
"rack" : "r1",
"ml.max_open_jobs" : "20",
"xpack.installed" : "true",
"ml.enabled" : "true"
}
},
"wT7wUd3iTkujYUsbVNVv-w" : {
"name" : "node-1",
"ephemeral_id" : "fP-vgSb0SdemnHDyaJUsWw",
"transport_address" : "10.18.6.102:9300",
"attributes" : {
"ml.machine_memory" : "8256720896",
"rack" : "r1",
"xpack.installed" : "true",
"ml.max_open_jobs" : "20",
"ml.enabled" : "true"
}
}
},
"indices" : {
"test" : { }
},
"shards" : [
[
{
"state" : "STARTED",
"primary" : true,
"node" : "wT7wUd3iTkujYUsbVNVv-w",
"relocating_node" : null,
"shard" : 1,
"index" : "test",
"allocation_id" : {
"id" : "k-8E4dL7QmGgwcsNsUCP6Q"
}
},
{
"state" : "STARTED",
"primary" : false,
"node" : "yl-qYmh1SXqzJsfI4d1ddw",
"relocating_node" : null,
"shard" : 1,
"index" : "test",
"allocation_id" : {
"id" : "lvOpQIKgRUibkulr3nRfEw"
}
}
]
]
}
只需要关注shards字段就可以,从上面可以看到,该文档存在shard 1 分片上。 分别在node1和node3节点,一个是主分片,一个是副本分片 收起阅读 »
elasticsearch在match查询里面使用了type字段 报错
POST get-together/_search
{
"query":
{
"match": {
"name": {
"type":"phrase",
"query":"enterprise london",
"slop":1
}}
},
"_source": "name"
}
报错:
{
"error": {
"root_cause": [
{
"type": "parsing_exception",
"reason": "[match] query does not support [type]",
"line": 6,
"col": 13
}
],
"type": "parsing_exception",
"reason": "[match] query does not support [type]",
"line": 6,
"col": 13
},
"status": 400
}
在6.x已经不支持在math里面使用type,
可以修改为以下语法:
POST get-together/_search
{
"query":
{
"match_phrase": {
"name": {
"query":"enterprise london",
"slop":1
}}
},
"_source": "name"
}
得到的效果是一致的:
{收起阅读 »
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 2,
"successful" : 2,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.3243701,
"hits" : [
{
"_index" : "get-together",
"_type" : "_doc",
"_id" : "5",
"_score" : 1.3243701,
"_source" : {
"name" : "Enterprise search London get-together"
}
}
]
}
}
elasticsearch 更新文档的坑
POST cnbeta/doc/cUxO42oB9O-zF2ru-rs-/_update
{
"doc":{
"title":"中国操作系统"
}
}
那个body里面的”doc" 不能少
不然会报错:
{
"error": {
"root_cause": [
{
"type": "action_request_validation_exception",
"reason": "Validation Failed: 1: script or doc is missing;"
}
],
"type": "action_request_validation_exception",
"reason": "Validation Failed: 1: script or doc is missing;"
},
"status": 400
} 收起阅读 »
python的mixin类
maxin类似多重继承的一种限制形式:
关于Python的Mixin模式
像C或C++这类语言都支持多重继承,一个子类可以有多个父类,这样的设计常被人诟病。因为继承应该是个”is-a”关系。比如轿车类继承交通工具类,因为轿车是一个(“is-a”)交通工具。一个物品不可能是多种不同的东西,因此就不应该存在多重继承。不过有没有这种情况,一个类的确是需要继承多个类呢?
答案是有,我们还是拿交通工具来举例子,民航飞机是一种交通工具,对于土豪们来说直升机也是一种交通工具。对于这两种交通工具,它们都有一个功能是飞行,但是轿车没有。所以,我们不可能将飞行功能写在交通工具这个父类中。但是如果民航飞机和直升机都各自写自己的飞行方法,又违背了代码尽可能重用的原则(如果以后飞行工具越来越多,那会出现许多重复代码)。怎么办,那就只好让这两种飞机同时继承交通工具以及飞行器两个父类,这样就出现了多重继承。这时又违背了继承必须是”is-a”关系。这个难题该怎么破?
不同的语言给出了不同的方法,让我们先来看下Java。Java提供了接口interface功能,来实现多重继承:
public abstract class Vehicle {
}
public interface Flyable {
public void fly();
}
public class FlyableImpl implements Flyable {
public void fly() {
System.out.println("I am flying");
}
}
public class Airplane extends Vehicle implements Flyable {
private flyable;
public Airplane() {
flyable = new FlyableImpl();
}
public void fly() {
flyable.fly();
}
}
现在我们的飞机同时具有了交通工具及飞行器两种属性,而且我们不需要重写飞行器中的飞行方法,同时我们没有破坏单一继承的原则。飞机就是一种交通工具,可飞行的能力是是飞机的属性,通过继承接口来获取。
回到主题,Python语言可没有接口功能,但是它可以多重继承。那Python是不是就该用多重继承来实现呢?是,也不是。说是,因为从语法上看,的确是通过多重继承实现的。说不是,因为它的继承依然遵守”is-a”关系,从含义上看依然遵循单继承的原则。这个怎么理解呢?我们还是看例子吧。
class Vehicle(object):
pass
class PlaneMixin(object):
def fly(self):
print 'I am flying'
class Airplane(Vehicle, PlaneMixin):
pass
可以看到,上面的Airplane类实现了多继承,不过它继承的第二个类我们起名为PlaneMixin,而不是Plane,这个并不影响功能,但是会告诉后来读代码的人,这个类是一个Mixin类。所以从含义上理解,Airplane只是一个Vehicle,不是一个Plane。这个Mixin,表示混入(mix-in),它告诉别人,这个类是作为功能添加到子类中,而不是作为父类,它的作用同Java中的接口。
使用Mixin类实现多重继承要非常小心
- 首先它必须表示某一种功能,而不是某个物品,如同Java中的Runnable,Callable等
- 其次它必须责任单一,如果有多个功能,那就写多个Mixin类
- 然后,它不依赖于子类的实现
- 最后,子类即便没有继承这个Mixin类,也照样可以工作,就是缺少了某个功能。(比如飞机照样可以载客,就是不能飞了^_^)
原创文章,转载请注明出处
http://30daydo.com/article/480
收起阅读 »
截止今天(2019-05-14)银行股今年的涨幅排名
ticker secShortName secFullName y_chgPct收起阅读 »
31 601288 农业银行 中国农业银行股份有限公司 -0.341178
11 600015 华夏银行 华夏银行股份有限公司 1.856174
45 601988 中国银行 中国银行股份有限公司 2.248533
32 601328 交通银行 交通银行股份有限公司 3.532657
30 601229 上海银行 上海银行股份有限公司 3.725781
35 601398 工商银行 中国工商银行股份有限公司 4.771403
40 601818 光大银行 中国光大银行股份有限公司 5.643119
27 601169 北京银行 北京银行股份有限公司 6.205580
14 600016 民生银行 中国民生银行股份有限公司 7.092815
5 002936 郑州银行 郑州银行股份有限公司 7.551112
49 601998 中信银行 中信银行股份有限公司 8.181956
43 601939 建设银行 中国建设银行股份有限公司 9.402651
41 601838 成都银行 成都银行股份有限公司 9.424554
52 603323 苏农银行 江苏苏州农村商业银行股份有限公司 12.375732
20 600926 杭州银行 杭州银行股份有限公司 12.933645
8 600000 浦发银行 上海浦东发展银行股份有限公司 14.752244
39 601577 长沙银行 长沙银行股份有限公司 14.792683
18 600908 无锡银行 无锡农村商业银行股份有限公司 16.181704
3 002807 江阴银行 江苏江阴农村商业银行股份有限公司 19.274586
48 601997 贵阳银行 贵阳银行股份有限公司 20.489563
4 002839 张家港行 江苏张家港农村商业银行股份有限公司 20.599511
25 601166 兴业银行 兴业银行股份有限公司 21.206503
24 601128 常熟银行 江苏常熟农村商业银行股份有限公司 21.571187
19 600919 江苏银行 江苏银行股份有限公司 23.218299
22 601009 南京银行 南京银行股份有限公司 26.297500
16 600036 招商银行 招商银行股份有限公司 27.518708
0 000001 平安银行 平安银行股份有限公司 31.624747
2 002142 宁波银行 宁波银行股份有限公司 31.729062
6 002948 青岛银行 青岛银行股份有限公司 48.602573
7 002958 青农商行 青岛农村商业银行股份有限公司 108.983776
42 601860 紫金银行 江苏紫金农村商业银行股份有限公司 115.147347
21 600928 西安银行 西安银行股份有限公司 128.496683
正则表达式替换中文换行符【python】
使用正则表达式替换换行符。(也可以替换为任意字符)
js=re.sub('\r\n','',js)
完毕。
request header显示Provisional headers are shown
把插件卸载了问题就解决了。