nunpy中的std标准差是样本差吗

写个代码测试下:
# 测试一下那个方差
x=[1,2,3,4,5,6,7,8,9,10]
X = np.array(x)
X.mean()
5.5
 
X.std() # 标准差
2.8722813232690143
 
手工计算一下:
def my_fangca(X):
l=len(X)
mean=X.mean()
sum_ = 0
sum_std=0
for i in X:
sum_+=(i-mean)**2
var_=sum_/l
std_=(sum_/(l))**0.5
return var_,std_

result = my_fangca(X)
得到的result

(8.25, 2.8722813232690143)
 
说明numpy的std是标准差,不是样本差
继续阅读 »
写个代码测试下:
# 测试一下那个方差
x=[1,2,3,4,5,6,7,8,9,10]
X = np.array(x)
X.mean()
5.5
 
X.std() # 标准差
2.8722813232690143
 
手工计算一下:
def my_fangca(X):
l=len(X)
mean=X.mean()
sum_ = 0
sum_std=0
for i in X:
sum_+=(i-mean)**2
var_=sum_/l
std_=(sum_/(l))**0.5
return var_,std_

result = my_fangca(X)
得到的result

(8.25, 2.8722813232690143)
 
说明numpy的std是标准差,不是样本差 收起阅读 »

喜马拉雅app 爬取音频文件

============== 2019-10-28更新 =================
因为喜马拉雅的源码格式改了,所以爬虫代码也更新了一波
# -*- coding: utf-8 -*-
# website: http://30daydo.com
# @Time : 2019/6/30 12:03
# @File : main.py

import requests
import re
import os

url = 'http://180.153.255.6/mobile/v1/album/track/ts-1571294887744?albumId=23057324&device=android&isAsc=true&isQueryInvitationBrand=true&pageId={}&pageSize=20&pre_page=0'
headers = {'User-Agent': 'Xiaomi'}

def download():
for i in range(1, 3):
r = requests.get(url=url.format(i), headers=headers)
js_data = r.json()
data_list = js_data.get('data', {}).get('list', [])
for item in data_list:
trackName = item.get('title')
trackName = re.sub('[\/\\\:\*\?\"\<\>\|]', '_', trackName)
# trackName=re.sub(':','',trackName)
src_url = item.get('playUrl64')
filename = '{}.mp3'.format(trackName)
if not os.path.exists(filename):

try:
r0 = requests.get(src_url, headers=headers)
except Exception as e:
print(e)
print(trackName)
r0 = requests.get(src_url, headers=headers)


else:
with open(filename, 'wb') as f:
f.write(r0.content)

print('{} downloaded'.format(trackName))

else:
print(f'{filename}已经下载过了')

import shutil

def rename_():
for i in range(1, 3):
r = requests.get(url=url.format(i), headers=headers)
js_data = r.json()
data_list = js_data.get('data', {}).get('list', [])
for item in data_list:
trackName = item.get('title')
trackName = re.sub('[\/\\\:\*\?\"\<\>\|]', '_', trackName)
src_url = item.get('playUrl64')

orderNo=item.get('orderNo')

filename = '{}.mp3'.format(trackName)
try:

if os.path.exists(filename):
new_file='{}_{}.mp3'.format(orderNo,trackName)
shutil.move(filename,new_file)
except Exception as e:
print(e)





if __name__=='__main__':
rename_()

 
音频文件也更新了,详情见百度网盘。
 
 
======== 2018-10=============
 爬取喜马拉雅app上 杨继东的投资之道 的音频文件
运行环境:python3
# -*- coding: utf-8 -*-
# website: http://30daydo.com
# @Time : 2019/6/30 12:03
# @File : main.py

import requests
import re
url = 'https://www.ximalaya.com/revision/play/album?albumId=23057324&pageNum=1&sort=1&pageSize=60'
headers={'User-Agent':'Xiaomi'}

r = requests.get(url=url,headers=headers)
js_data = r.json()
data_list = js_data.get('data',{}).get('tracksAudioPlay',)
for item in data_list:
trackName=item.get('trackName')
trackName=re.sub(':','',trackName)
src_url = item.get('src')
try:
r0=requests.get(src_url,headers=headers)
except Exception as e:
print(e)
print(trackName)
else:
with open('{}.m4a'.format(trackName),'wb') as f:
f.write(r0.content)
print('{} downloaded'.format(trackName))

保存为main.py
然后运行 python main.py
稍微等几分钟就自动下载好了。


喜马拉雅.PNG


附下载好的音频文件:
链接:https://pan.baidu.com/s/1t_vJhTvSJSeFdI1IaDS6fA 
提取码:e3zb 
 

原创文章
转载请注明出处
http://30daydo.com/article/503
继续阅读 »
============== 2019-10-28更新 =================
因为喜马拉雅的源码格式改了,所以爬虫代码也更新了一波
# -*- coding: utf-8 -*-
# website: http://30daydo.com
# @Time : 2019/6/30 12:03
# @File : main.py

import requests
import re
import os

url = 'http://180.153.255.6/mobile/v1/album/track/ts-1571294887744?albumId=23057324&device=android&isAsc=true&isQueryInvitationBrand=true&pageId={}&pageSize=20&pre_page=0'
headers = {'User-Agent': 'Xiaomi'}

def download():
for i in range(1, 3):
r = requests.get(url=url.format(i), headers=headers)
js_data = r.json()
data_list = js_data.get('data', {}).get('list', [])
for item in data_list:
trackName = item.get('title')
trackName = re.sub('[\/\\\:\*\?\"\<\>\|]', '_', trackName)
# trackName=re.sub(':','',trackName)
src_url = item.get('playUrl64')
filename = '{}.mp3'.format(trackName)
if not os.path.exists(filename):

try:
r0 = requests.get(src_url, headers=headers)
except Exception as e:
print(e)
print(trackName)
r0 = requests.get(src_url, headers=headers)


else:
with open(filename, 'wb') as f:
f.write(r0.content)

print('{} downloaded'.format(trackName))

else:
print(f'{filename}已经下载过了')

import shutil

def rename_():
for i in range(1, 3):
r = requests.get(url=url.format(i), headers=headers)
js_data = r.json()
data_list = js_data.get('data', {}).get('list', [])
for item in data_list:
trackName = item.get('title')
trackName = re.sub('[\/\\\:\*\?\"\<\>\|]', '_', trackName)
src_url = item.get('playUrl64')

orderNo=item.get('orderNo')

filename = '{}.mp3'.format(trackName)
try:

if os.path.exists(filename):
new_file='{}_{}.mp3'.format(orderNo,trackName)
shutil.move(filename,new_file)
except Exception as e:
print(e)





if __name__=='__main__':
rename_()

 
音频文件也更新了,详情见百度网盘。
 
 
======== 2018-10=============
 爬取喜马拉雅app上 杨继东的投资之道 的音频文件
运行环境:python3
# -*- coding: utf-8 -*-
# website: http://30daydo.com
# @Time : 2019/6/30 12:03
# @File : main.py

import requests
import re
url = 'https://www.ximalaya.com/revision/play/album?albumId=23057324&pageNum=1&sort=1&pageSize=60'
headers={'User-Agent':'Xiaomi'}

r = requests.get(url=url,headers=headers)
js_data = r.json()
data_list = js_data.get('data',{}).get('tracksAudioPlay',)
for item in data_list:
trackName=item.get('trackName')
trackName=re.sub(':','',trackName)
src_url = item.get('src')
try:
r0=requests.get(src_url,headers=headers)
except Exception as e:
print(e)
print(trackName)
else:
with open('{}.m4a'.format(trackName),'wb') as f:
f.write(r0.content)
print('{} downloaded'.format(trackName))

保存为main.py
然后运行 python main.py
稍微等几分钟就自动下载好了。


喜马拉雅.PNG


附下载好的音频文件:
链接:https://pan.baidu.com/s/1t_vJhTvSJSeFdI1IaDS6fA 
提取码:e3zb 
 

原创文章
转载请注明出处
http://30daydo.com/article/503 收起阅读 »

python3与python2迭代器的写法的区别

大部分相同,只是python2里面需要实现在类中实现next()方法,而python3里面需要实现__next__()方法。
 
附一个例子:
def iter_demo():

class DefineIter(object):

def __init__(self,length):
self.length = length
self.data = range(self.length)
self.index=0

def __iter__(self):
return self


def __next__(self):

if self.index >=self.length:
# return None
raise StopIteration

d = self.data[self.index]*50
self.index =self.index + 1

return d

a = DefineIter(10)
print(type(a))
for i in a:
print(i)
继续阅读 »
大部分相同,只是python2里面需要实现在类中实现next()方法,而python3里面需要实现__next__()方法。
 
附一个例子:
def iter_demo():

class DefineIter(object):

def __init__(self,length):
self.length = length
self.data = range(self.length)
self.index=0

def __iter__(self):
return self


def __next__(self):

if self.index >=self.length:
# return None
raise StopIteration

d = self.data[self.index]*50
self.index =self.index + 1

return d

a = DefineIter(10)
print(type(a))
for i in a:
print(i)
收起阅读 »

PyCharm 快捷键快速插入当前时间

个人觉得这是一个非常常用的功能,不过需要自定义实现。
 
方式
通过 Live Template 快速添加时间

步骤
1、添加一个 Template Group 命名为 Common
2、添加一个 Live Template 设置如下
Abbreviation: time
Description : current time
Template Text: $time$

Edit Variables -> Expresssion : date("yyyy-MM-dd HH:mm:ss")



3、让设置生效
Define->Everywhere

4、使用
输入 time 后 按下tab键 就能转换为当前时间了

 
继续阅读 »
个人觉得这是一个非常常用的功能,不过需要自定义实现。
 
方式
通过 Live Template 快速添加时间

步骤
1、添加一个 Template Group 命名为 Common
2、添加一个 Live Template 设置如下
Abbreviation: time
Description : current time
Template Text: $time$

Edit Variables -> Expresssion : date("yyyy-MM-dd HH:mm:ss")



3、让设置生效
Define->Everywhere

4、使用
输入 time 后 按下tab键 就能转换为当前时间了

  收起阅读 »

深圳转债转股后不可以撤单

亲身经历,深圳转债转股后可以撤单操作,并显示已撤单,但是晚上正常转股了。
说明转股后也是不能撤单的。
亲身经历,深圳转债转股后可以撤单操作,并显示已撤单,但是晚上正常转股了。
说明转股后也是不能撤单的。

修改easytrader国金证券的默认启动路径

如果你的国金证券不是安装在默认路径的话,会无法启动。报错:


pywinauto.application.AppStartError: Could not create the process "C:\全能行证券交易终端\xiadan.exe"
Error returned by CreateProcess: (2, 'CreateProcess', '系统找不到指定的文件。')


 
看了配置文件,也是没有具体的参数可以修改,只好修改源代码。
别听到改源代码就害怕,只是需要改一行就可以了。
 
找到文件:
site-package\easytrader\config\client.py
 
找过这一行:
class GJ(CommonConfig):
DEFAULT_EXE_PATH = "C:\\Tool\\xiadan.exe"
只要修改上面的路径就可以了。注意用双斜杠。
 
继续阅读 »
如果你的国金证券不是安装在默认路径的话,会无法启动。报错:


pywinauto.application.AppStartError: Could not create the process "C:\全能行证券交易终端\xiadan.exe"
Error returned by CreateProcess: (2, 'CreateProcess', '系统找不到指定的文件。')


 
看了配置文件,也是没有具体的参数可以修改,只好修改源代码。
别听到改源代码就害怕,只是需要改一行就可以了。
 
找到文件:
site-package\easytrader\config\client.py
 
找过这一行:
class GJ(CommonConfig):
DEFAULT_EXE_PATH = "C:\\Tool\\xiadan.exe"
只要修改上面的路径就可以了。注意用双斜杠。
  收起阅读 »

conda无法在win10下用命令行切换虚拟环境

虚拟环境已经安装好了
然后在PowerShell下运行activate py2,没有任何反应。(powershell是win7后面系统的增强命令行)
后来使用系统原始的cmd命令行,在运行里面敲入cmd,然后重新执行activate py2,问题得到解决了。
原因是兼容问题。
继续阅读 »
虚拟环境已经安装好了
然后在PowerShell下运行activate py2,没有任何反应。(powershell是win7后面系统的增强命令行)
后来使用系统原始的cmd命令行,在运行里面敲入cmd,然后重新执行activate py2,问题得到解决了。
原因是兼容问题。 收起阅读 »

使用pymongo中的find_one_and_update出错:需要分片键

错误信息如下:
  File "C:\ProgramData\Anaconda3\lib\site-packages\pymongo\helpers.py", line 155, in _check_command_response
raise OperationFailure(msg % errmsg, code, response)
pymongo.errors.OperationFailure: Query for sharded findAndModify must contain the shard key
2019-06-10 16:14:32 [scrapy.core.engine] INFO: Closing spider (finished)
2019-06-10 16:14:32 [scrapy.statscollectors] INFO: Dumping Scrapy stats:

需要在查询语句中把分片键也添加进去。
因为findOneModify只会找一个记录,但是到底在哪个分片的记录呢? 因为不确定,所以才需要把shard加上去。
 
 
参考官方:
Targeted Operations vs. Broadcast Operations
Generally, the fastest queries in a sharded environment are those that mongos route to a single shard, using the shard key and the cluster meta data from the config server. These targeted operations use the shard key value to locate the shard or subset of shards that satisfy the query document.
For queries that don’t include the shard key, mongos must query all shards, wait for their responses and then return the result to the application. These “scatter/gather” queries can be long running operations.
Broadcast Operations
mongos instances broadcast queries to all shards for the collection unless the mongos can determine which shard or subset of shards stores this data.

After the mongos receives responses from all shards, it merges the data and returns the result document. The performance of a broadcast operation depends on the overall load of the cluster, as well as variables like network latency, individual shard load, and number of documents returned per shard. Whenever possible, favor operations that result in targeted operation over those that result in a broadcast operation.
Multi-update operations are always broadcast operations.
The updateMany() and deleteMany() methods are broadcast operations, unless the query document specifies the shard key in full.
Targeted Operations
mongos can route queries that include the shard key or the prefix of a compound shard key a specific shard or set of shards. mongos uses the shard key value to locate the chunk whose range includes the shard key value and directs the query at the shard containing that chunk.

For example, if the shard key is:
copy
{ a: 1, b: 1, c: 1 }

The mongos program can route queries that include the full shard key or either of the following shard key prefixes at a specific shard or set of shards:
copy
{ a: 1 }
{ a: 1, b: 1 }

All insertOne() operations target to one shard. Each document in the insertMany() array targets to a single shard, but there is no guarantee all documents in the array insert into a single shard.
All updateOne(), replaceOne() and deleteOne() operations must include the shard key or _id in the query document. MongoDB returns an error if these methods are used without the shard key or _id.
Depending on the distribution of data in the cluster and the selectivity of the query, mongos may still perform a broadcast operation to fulfill these queries.
Index Use
If the query does not include the shard key, the mongos must send the query to all shards as a “scatter/gather” operation. Each shard will, in turn, use either the shard key index or another more efficient index to fulfill the query.
If the query includes multiple sub-expressions that reference the fields indexed by the shard key and the secondary index, the mongos can route the queries to a specific shard and the shard will use the index that will allow it to fulfill most efficiently.
Sharded Cluster Security
Use Internal Authentication to enforce intra-cluster security and prevent unauthorized cluster components from accessing the cluster. You must start each mongod or mongos in the cluster with the appropriate security settings in order to enforce internal authentication.
See Deploy Sharded Cluster with Keyfile Access Control for a tutorial on deploying a secured shardedcluster.
Cluster Users
Sharded clusters support Role-Based Access Control (RBAC) for restricting unauthorized access to cluster data and operations. You must start each mongod in the cluster, including the config servers, with the --auth option in order to enforce RBAC. Alternatively, enforcing Internal Authentication for inter-cluster security also enables user access controls via RBAC.
With RBAC enforced, clients must specify a --username, --password, and --authenticationDatabase when connecting to the mongos in order to access cluster resources.
Each cluster has its own cluster users. These users cannot be used to access individual shards.
See Enable Access Control for a tutorial on enabling adding users to an RBAC-enabled MongoDB deployment.
继续阅读 »
错误信息如下:
  File "C:\ProgramData\Anaconda3\lib\site-packages\pymongo\helpers.py", line 155, in _check_command_response
raise OperationFailure(msg % errmsg, code, response)
pymongo.errors.OperationFailure: Query for sharded findAndModify must contain the shard key
2019-06-10 16:14:32 [scrapy.core.engine] INFO: Closing spider (finished)
2019-06-10 16:14:32 [scrapy.statscollectors] INFO: Dumping Scrapy stats:

需要在查询语句中把分片键也添加进去。
因为findOneModify只会找一个记录,但是到底在哪个分片的记录呢? 因为不确定,所以才需要把shard加上去。
 
 
参考官方:
Targeted Operations vs. Broadcast Operations
Generally, the fastest queries in a sharded environment are those that mongos route to a single shard, using the shard key and the cluster meta data from the config server. These targeted operations use the shard key value to locate the shard or subset of shards that satisfy the query document.
For queries that don’t include the shard key, mongos must query all shards, wait for their responses and then return the result to the application. These “scatter/gather” queries can be long running operations.
Broadcast Operations
mongos instances broadcast queries to all shards for the collection unless the mongos can determine which shard or subset of shards stores this data.

After the mongos receives responses from all shards, it merges the data and returns the result document. The performance of a broadcast operation depends on the overall load of the cluster, as well as variables like network latency, individual shard load, and number of documents returned per shard. Whenever possible, favor operations that result in targeted operation over those that result in a broadcast operation.
Multi-update operations are always broadcast operations.
The updateMany() and deleteMany() methods are broadcast operations, unless the query document specifies the shard key in full.
Targeted Operations
mongos can route queries that include the shard key or the prefix of a compound shard key a specific shard or set of shards. mongos uses the shard key value to locate the chunk whose range includes the shard key value and directs the query at the shard containing that chunk.

For example, if the shard key is:
copy
{ a: 1, b: 1, c: 1 }

The mongos program can route queries that include the full shard key or either of the following shard key prefixes at a specific shard or set of shards:
copy
{ a: 1 }
{ a: 1, b: 1 }

All insertOne() operations target to one shard. Each document in the insertMany() array targets to a single shard, but there is no guarantee all documents in the array insert into a single shard.
All updateOne(), replaceOne() and deleteOne() operations must include the shard key or _id in the query document. MongoDB returns an error if these methods are used without the shard key or _id.
Depending on the distribution of data in the cluster and the selectivity of the query, mongos may still perform a broadcast operation to fulfill these queries.
Index Use
If the query does not include the shard key, the mongos must send the query to all shards as a “scatter/gather” operation. Each shard will, in turn, use either the shard key index or another more efficient index to fulfill the query.
If the query includes multiple sub-expressions that reference the fields indexed by the shard key and the secondary index, the mongos can route the queries to a specific shard and the shard will use the index that will allow it to fulfill most efficiently.
Sharded Cluster Security
Use Internal Authentication to enforce intra-cluster security and prevent unauthorized cluster components from accessing the cluster. You must start each mongod or mongos in the cluster with the appropriate security settings in order to enforce internal authentication.
See Deploy Sharded Cluster with Keyfile Access Control for a tutorial on deploying a secured shardedcluster.
Cluster Users
Sharded clusters support Role-Based Access Control (RBAC) for restricting unauthorized access to cluster data and operations. You must start each mongod in the cluster, including the config servers, with the --auth option in order to enforce RBAC. Alternatively, enforcing Internal Authentication for inter-cluster security also enables user access controls via RBAC.
With RBAC enforced, clients must specify a --username, --password, and --authenticationDatabase when connecting to the mongos in order to access cluster resources.
Each cluster has its own cluster users. These users cannot be used to access individual shards.
See Enable Access Control for a tutorial on enabling adding users to an RBAC-enabled MongoDB deployment.
收起阅读 »

jupyter notebook格式的文件损坏如何修复

有时候用git同步时,造成了冲突后合并,jupyter notebook的文件被插入了诸如>>>>>HEAD,ORIGIN等字符,这时候再打开jupyter notebook文件(.ipynb后缀),会无法打开。修复过程:
 
使用下面的代码:
# 拯救损坏的jupyter 文件
import re
import codecs

pattern = re.compile('"source": \[(.*?)\]\s+\},',re.S)
filename = 'tushare_usage.ipynb'
with codecs.open(filename,encoding='utf8') as f:
content = f.read()

source = pattern.findall(content)
for s in source:
t=s.replace('\\n','')
t=re.sub('"','',t)
t=re.sub('(,$)','',t)
print(t)
只要把你要修复的文件替换一下就可以了。
继续阅读 »
有时候用git同步时,造成了冲突后合并,jupyter notebook的文件被插入了诸如>>>>>HEAD,ORIGIN等字符,这时候再打开jupyter notebook文件(.ipynb后缀),会无法打开。修复过程:
 
使用下面的代码:
# 拯救损坏的jupyter 文件
import re
import codecs

pattern = re.compile('"source": \[(.*?)\]\s+\},',re.S)
filename = 'tushare_usage.ipynb'
with codecs.open(filename,encoding='utf8') as f:
content = f.read()

source = pattern.findall(content)
for s in source:
t=s.replace('\\n','')
t=re.sub('"','',t)
t=re.sub('(,$)','',t)
print(t)
只要把你要修复的文件替换一下就可以了。 收起阅读 »

Warning: unable to run listCollections, attempting to approximate collection

在mongodb中参数查看数据库中的表是报错:

Warning: unable to run listCollections, attempting to approximate collection names by parsing connectionStatus

那是因为设置了密码,但是没有进行认证导致的错误。这个错误为啥不直接说明原因呢。汗
 
直接: db.auth('admin','密码')
认证成功返回1, 然后重新执行show tables就可以看到所有的表了。
继续阅读 »
在mongodb中参数查看数据库中的表是报错:

Warning: unable to run listCollections, attempting to approximate collection names by parsing connectionStatus

那是因为设置了密码,但是没有进行认证导致的错误。这个错误为啥不直接说明原因呢。汗
 
直接: db.auth('admin','密码')
认证成功返回1, 然后重新执行show tables就可以看到所有的表了。 收起阅读 »

python连接mongodb集群 cluster

网上资料比较少,自己测试了下。
连接方法如下:
import pymongo
db = pymongo.MongoClient('mongodb://10.18.6.46,10.18.6.26,10.18.6.102')
上面默认的端口do都是27017,如果是其他端口,需要这样修改:
db = pymongo.MongoClient('mongodb://10.18.6.46:8888,10.18.6.26:9999,10.18.6.102:7777')

然后就可以正常读写数据库:
 
读:
coll=db['testdb']['testcollection'].find()
for i in coll:
print(i)
输出内容:
{'_id': ObjectId('5cf4c7981ee9edff72e5c503'), 'username': 'hello'}
{'_id': ObjectId('5cf4c7991ee9edff72e5c504'), 'username': 'hello'}
{'_id': ObjectId('5cf4c7991ee9edff72e5c505'), 'username': 'hello'}
{'_id': ObjectId('5cf4c79a1ee9edff72e5c506'), 'username': 'hello'}
{'_id': ObjectId('5cf4c7b21ee9edff72e5c507'), 'username': 'hello world'}



 
写:
collection = db['testdb']['testcollection']

for i in range(10):
collection.insert({'username':'huston{}'.format(i)})

 
原创文章,转载请注明出处:
http://30daydo.com/article/494
 
继续阅读 »
网上资料比较少,自己测试了下。
连接方法如下:
import pymongo
db = pymongo.MongoClient('mongodb://10.18.6.46,10.18.6.26,10.18.6.102')
上面默认的端口do都是27017,如果是其他端口,需要这样修改:
db = pymongo.MongoClient('mongodb://10.18.6.46:8888,10.18.6.26:9999,10.18.6.102:7777')

然后就可以正常读写数据库:
 
读:
coll=db['testdb']['testcollection'].find()
for i in coll:
print(i)
输出内容:
{'_id': ObjectId('5cf4c7981ee9edff72e5c503'), 'username': 'hello'}
{'_id': ObjectId('5cf4c7991ee9edff72e5c504'), 'username': 'hello'}
{'_id': ObjectId('5cf4c7991ee9edff72e5c505'), 'username': 'hello'}
{'_id': ObjectId('5cf4c79a1ee9edff72e5c506'), 'username': 'hello'}
{'_id': ObjectId('5cf4c7b21ee9edff72e5c507'), 'username': 'hello world'}



 
写:
collection = db['testdb']['testcollection']

for i in range(10):
collection.insert({'username':'huston{}'.format(i)})

 
原创文章,转载请注明出处:
http://30daydo.com/article/494
  收起阅读 »

ElasticSearch查看已经存在的文档保存在哪个分片


比如我有以下的文档:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 5,
"max_score" : 1.0,
"hits" : [
{
"_index" : "test",
"_type" : "mydoc",
"_id" : "XxyrM2kBVzdNcvl_GHv2",
"_score" : 1.0,
"_source" : {
"name" : "Shiled",
"twitter" : "Sonny sql is awesome",
"date" : "2018-12-27",
"id" : 1240,
"tags" : [
"red",
"shit"
]
}
},
{
"_index" : "test",
"_type" : "mydoc",
"_id" : "YByrM2kBVzdNcvl_tnvm",
"_score" : 1.0,
"_source" : {
"name" : "YYerk",
"twitter" : "sql is awesome",
"date" : "2008-12-27",
"id" : 12357,
"tags" : [
"red"
]
}
},
{
"_index" : "test",
"_type" : "mydoc",
"_id" : "7777",
"_score" : 1.0,
"_source" : {
"name" : "Rocky Chen",
"twitter" : "sql is awesome",
"date" : "2008-12-27",
"id" : 9999
}
},
{
"_index" : "test",
"_type" : "mydoc",
"_id" : "YhzDN2kBVzdNcvl_enuT",
"_score" : 1.0,
"_source" : {
"name" : "YYerk",
"twitter" : "sql is awesome",
"date" : "2008-12-27",
"id" : 888888,
"tags" : [
"red",
"green"
]
}
},
{
"_index" : "test",
"_type" : "mydoc",
"_id" : "YxzDN2kBVzdNcvl_u3th",
"_score" : 1.0,
"_source" : {
"name" : "YYerk",
"twitter" : "sql is awesome",
"date" : "2008-12-27",
"id" : 888888,
"tags" : [
"red",
"green"
]
}
}
]
}
}

如果我想看看id是  "_id" : "YxzDN2kBVzdNcvl_u3th",

这个文档是保存在哪个分片,如何查看?
 
引用:


 

路由一个文档到一个分片中编辑
当索引一个文档的时候,文档会被存储到一个主分片中。 Elasticsearch 如何知道一个文档应该存放到哪个分片中呢?当我们创建文档时,它如何决定这个文档应当被存储在分片 1 还是分片 2 中呢?
首先这肯定不会是随机的,否则将来要获取文档的时候我们就不知道从何处寻找了。实际上,这个过程是根据下面这个公式决定的:
shard = hash(routing) % number_of_primary_shards
routing 是一个可变值,默认是文档的 _id ,也可以设置成一个自定义的值。 routing 通过 hash 函数生成一个数字,然后这个数字再除以 number_of_primary_shards (主分片的数量)后得到 余数 。这个分布在 0 到 number_of_primary_shards-1 之间的余数,就是我们所寻求的文档所在分片的位置。
这就解释了为什么我们要在创建索引的时候就确定好主分片的数量 并且永远不会改变这个数量:因为如果数量变化了,那么所有之前路由的值都会无效,文档也再也找不到了。


 
那么可以使用

GET test/_search_shards?routing=ID号 来查看你要查询的id所在的分片

得到的结果:
{
"nodes" : {
"yl-qYmh1SXqzJsfI4d1ddw" : {
"name" : "node-3",
"ephemeral_id" : "UsJ9rFELTiCW07oHE9YMdg",
"transport_address" : "10.18.6.26:9300",
"attributes" : {
"ml.machine_memory" : "6088101888",
"rack" : "r1",
"ml.max_open_jobs" : "20",
"xpack.installed" : "true",
"ml.enabled" : "true"
}
},
"wT7wUd3iTkujYUsbVNVv-w" : {
"name" : "node-1",
"ephemeral_id" : "fP-vgSb0SdemnHDyaJUsWw",
"transport_address" : "10.18.6.102:9300",
"attributes" : {
"ml.machine_memory" : "8256720896",
"rack" : "r1",
"xpack.installed" : "true",
"ml.max_open_jobs" : "20",
"ml.enabled" : "true"
}
}
},
"indices" : {
"test" : { }
},
"shards" : [
[
{
"state" : "STARTED",
"primary" : true,
"node" : "wT7wUd3iTkujYUsbVNVv-w",
"relocating_node" : null,
"shard" : 1,
"index" : "test",
"allocation_id" : {
"id" : "k-8E4dL7QmGgwcsNsUCP6Q"
}
},
{
"state" : "STARTED",
"primary" : false,
"node" : "yl-qYmh1SXqzJsfI4d1ddw",
"relocating_node" : null,
"shard" : 1,
"index" : "test",
"allocation_id" : {
"id" : "lvOpQIKgRUibkulr3nRfEw"
}
}
]
]
}

只需要关注shards字段就可以,从上面可以看到,该文档存在shard 1 分片上。 分别在node1和node3节点,一个是主分片,一个是副本分片
继续阅读 »

比如我有以下的文档:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 5,
"max_score" : 1.0,
"hits" : [
{
"_index" : "test",
"_type" : "mydoc",
"_id" : "XxyrM2kBVzdNcvl_GHv2",
"_score" : 1.0,
"_source" : {
"name" : "Shiled",
"twitter" : "Sonny sql is awesome",
"date" : "2018-12-27",
"id" : 1240,
"tags" : [
"red",
"shit"
]
}
},
{
"_index" : "test",
"_type" : "mydoc",
"_id" : "YByrM2kBVzdNcvl_tnvm",
"_score" : 1.0,
"_source" : {
"name" : "YYerk",
"twitter" : "sql is awesome",
"date" : "2008-12-27",
"id" : 12357,
"tags" : [
"red"
]
}
},
{
"_index" : "test",
"_type" : "mydoc",
"_id" : "7777",
"_score" : 1.0,
"_source" : {
"name" : "Rocky Chen",
"twitter" : "sql is awesome",
"date" : "2008-12-27",
"id" : 9999
}
},
{
"_index" : "test",
"_type" : "mydoc",
"_id" : "YhzDN2kBVzdNcvl_enuT",
"_score" : 1.0,
"_source" : {
"name" : "YYerk",
"twitter" : "sql is awesome",
"date" : "2008-12-27",
"id" : 888888,
"tags" : [
"red",
"green"
]
}
},
{
"_index" : "test",
"_type" : "mydoc",
"_id" : "YxzDN2kBVzdNcvl_u3th",
"_score" : 1.0,
"_source" : {
"name" : "YYerk",
"twitter" : "sql is awesome",
"date" : "2008-12-27",
"id" : 888888,
"tags" : [
"red",
"green"
]
}
}
]
}
}

如果我想看看id是  "_id" : "YxzDN2kBVzdNcvl_u3th",

这个文档是保存在哪个分片,如何查看?
 
引用:


 

路由一个文档到一个分片中编辑
当索引一个文档的时候,文档会被存储到一个主分片中。 Elasticsearch 如何知道一个文档应该存放到哪个分片中呢?当我们创建文档时,它如何决定这个文档应当被存储在分片 1 还是分片 2 中呢?
首先这肯定不会是随机的,否则将来要获取文档的时候我们就不知道从何处寻找了。实际上,这个过程是根据下面这个公式决定的:
shard = hash(routing) % number_of_primary_shards
routing 是一个可变值,默认是文档的 _id ,也可以设置成一个自定义的值。 routing 通过 hash 函数生成一个数字,然后这个数字再除以 number_of_primary_shards (主分片的数量)后得到 余数 。这个分布在 0 到 number_of_primary_shards-1 之间的余数,就是我们所寻求的文档所在分片的位置。
这就解释了为什么我们要在创建索引的时候就确定好主分片的数量 并且永远不会改变这个数量:因为如果数量变化了,那么所有之前路由的值都会无效,文档也再也找不到了。


 
那么可以使用

GET test/_search_shards?routing=ID号 来查看你要查询的id所在的分片

得到的结果:
{
"nodes" : {
"yl-qYmh1SXqzJsfI4d1ddw" : {
"name" : "node-3",
"ephemeral_id" : "UsJ9rFELTiCW07oHE9YMdg",
"transport_address" : "10.18.6.26:9300",
"attributes" : {
"ml.machine_memory" : "6088101888",
"rack" : "r1",
"ml.max_open_jobs" : "20",
"xpack.installed" : "true",
"ml.enabled" : "true"
}
},
"wT7wUd3iTkujYUsbVNVv-w" : {
"name" : "node-1",
"ephemeral_id" : "fP-vgSb0SdemnHDyaJUsWw",
"transport_address" : "10.18.6.102:9300",
"attributes" : {
"ml.machine_memory" : "8256720896",
"rack" : "r1",
"xpack.installed" : "true",
"ml.max_open_jobs" : "20",
"ml.enabled" : "true"
}
}
},
"indices" : {
"test" : { }
},
"shards" : [
[
{
"state" : "STARTED",
"primary" : true,
"node" : "wT7wUd3iTkujYUsbVNVv-w",
"relocating_node" : null,
"shard" : 1,
"index" : "test",
"allocation_id" : {
"id" : "k-8E4dL7QmGgwcsNsUCP6Q"
}
},
{
"state" : "STARTED",
"primary" : false,
"node" : "yl-qYmh1SXqzJsfI4d1ddw",
"relocating_node" : null,
"shard" : 1,
"index" : "test",
"allocation_id" : {
"id" : "lvOpQIKgRUibkulr3nRfEw"
}
}
]
]
}

只需要关注shards字段就可以,从上面可以看到,该文档存在shard 1 分片上。 分别在node1和node3节点,一个是主分片,一个是副本分片 收起阅读 »

elasticsearch在match查询里面使用了type字段 报错

POST get-together/_search
{
"query":
{
"match": {
"name": {
"type":"phrase",
"query":"enterprise london",
"slop":1
}}
},
"_source": "name"
}
 
报错:
 
{
"error": {
"root_cause": [
{
"type": "parsing_exception",
"reason": "[match] query does not support [type]",
"line": 6,
"col": 13
}
],
"type": "parsing_exception",
"reason": "[match] query does not support [type]",
"line": 6,
"col": 13
},
"status": 400
}

 
在6.x已经不支持在math里面使用type,
可以修改为以下语法:
POST get-together/_search
{
"query":
{
"match_phrase": {
"name": {

"query":"enterprise london",
"slop":1
}}
},
"_source": "name"
}

得到的效果是一致的:
 
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 2,
"successful" : 2,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.3243701,
"hits" : [
{
"_index" : "get-together",
"_type" : "_doc",
"_id" : "5",
"_score" : 1.3243701,
"_source" : {
"name" : "Enterprise search London get-together"
}
}
]
}
}
继续阅读 »
POST get-together/_search
{
"query":
{
"match": {
"name": {
"type":"phrase",
"query":"enterprise london",
"slop":1
}}
},
"_source": "name"
}
 
报错:
 
{
"error": {
"root_cause": [
{
"type": "parsing_exception",
"reason": "[match] query does not support [type]",
"line": 6,
"col": 13
}
],
"type": "parsing_exception",
"reason": "[match] query does not support [type]",
"line": 6,
"col": 13
},
"status": 400
}

 
在6.x已经不支持在math里面使用type,
可以修改为以下语法:
POST get-together/_search
{
"query":
{
"match_phrase": {
"name": {

"query":"enterprise london",
"slop":1
}}
},
"_source": "name"
}

得到的效果是一致的:
 
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 2,
"successful" : 2,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.3243701,
"hits" : [
{
"_index" : "get-together",
"_type" : "_doc",
"_id" : "5",
"_score" : 1.3243701,
"_source" : {
"name" : "Enterprise search London get-together"
}
}
]
}
}
收起阅读 »

elasticsearch 更新文档的坑

POST cnbeta/doc/cUxO42oB9O-zF2ru-rs-/_update
{
"doc":{
"title":"中国操作系统"
}
}
 

那个body里面的”doc" 不能少
不然会报错:
 
{
    "error": {
        "root_cause": [
            {
                "type": "action_request_validation_exception",
                "reason": "Validation Failed: 1: script or doc is missing;"
            }
        ],
        "type": "action_request_validation_exception",
        "reason": "Validation Failed: 1: script or doc is missing;"
    },
    "status": 400
}
继续阅读 »
POST cnbeta/doc/cUxO42oB9O-zF2ru-rs-/_update
{
"doc":{
"title":"中国操作系统"
}
}
 

那个body里面的”doc" 不能少
不然会报错:
 
{
    "error": {
        "root_cause": [
            {
                "type": "action_request_validation_exception",
                "reason": "Validation Failed: 1: script or doc is missing;"
            }
        ],
        "type": "action_request_validation_exception",
        "reason": "Validation Failed: 1: script or doc is missing;"
    },
    "status": 400
} 收起阅读 »

requests直接post图片文件

代码如下:
    file_path=r'9927_15562445086485238.png'
file=open(file_path, 'rb').read()
r=requests.post(url=code_url,data=file)
print(r.text)
继续阅读 »
代码如下:
    file_path=r'9927_15562445086485238.png'
file=open(file_path, 'rb').read()
r=requests.post(url=code_url,data=file)
print(r.text)
收起阅读 »

python的mixin类

A mixin is a limited form of multiple inheritance.
 
maxin类似多重继承的一种限制形式:
 关于Python的Mixin模式

像C或C++这类语言都支持多重继承,一个子类可以有多个父类,这样的设计常被人诟病。因为继承应该是个”is-a”关系。比如轿车类继承交通工具类,因为轿车是一个(“is-a”)交通工具。一个物品不可能是多种不同的东西,因此就不应该存在多重继承。不过有没有这种情况,一个类的确是需要继承多个类呢?

答案是有,我们还是拿交通工具来举例子,民航飞机是一种交通工具,对于土豪们来说直升机也是一种交通工具。对于这两种交通工具,它们都有一个功能是飞行,但是轿车没有。所以,我们不可能将飞行功能写在交通工具这个父类中。但是如果民航飞机和直升机都各自写自己的飞行方法,又违背了代码尽可能重用的原则(如果以后飞行工具越来越多,那会出现许多重复代码)。怎么办,那就只好让这两种飞机同时继承交通工具以及飞行器两个父类,这样就出现了多重继承。这时又违背了继承必须是”is-a”关系。这个难题该怎么破?

不同的语言给出了不同的方法,让我们先来看下Java。Java提供了接口interface功能,来实现多重继承:
public abstract class Vehicle {
}

public interface Flyable {
public void fly();
}

public class FlyableImpl implements Flyable {
public void fly() {
System.out.println("I am flying");
}
}

public class Airplane extends Vehicle implements Flyable {
private flyable;

public Airplane() {
flyable = new FlyableImpl();
}

public void fly() {
flyable.fly();
}
}


现在我们的飞机同时具有了交通工具及飞行器两种属性,而且我们不需要重写飞行器中的飞行方法,同时我们没有破坏单一继承的原则。飞机就是一种交通工具,可飞行的能力是是飞机的属性,通过继承接口来获取。

回到主题,Python语言可没有接口功能,但是它可以多重继承。那Python是不是就该用多重继承来实现呢?是,也不是。说是,因为从语法上看,的确是通过多重继承实现的。说不是,因为它的继承依然遵守”is-a”关系,从含义上看依然遵循单继承的原则。这个怎么理解呢?我们还是看例子吧。
class Vehicle(object):
pass

class PlaneMixin(object):
def fly(self):
print 'I am flying'

class Airplane(Vehicle, PlaneMixin):
pass


可以看到,上面的Airplane类实现了多继承,不过它继承的第二个类我们起名为PlaneMixin,而不是Plane,这个并不影响功能,但是会告诉后来读代码的人,这个类是一个Mixin类。所以从含义上理解,Airplane只是一个Vehicle,不是一个Plane。这个Mixin,表示混入(mix-in),它告诉别人,这个类是作为功能添加到子类中,而不是作为父类,它的作用同Java中的接口。

使用Mixin类实现多重继承要非常小心
  • 首先它必须表示某一种功能,而不是某个物品,如同Java中的Runnable,Callable等

 
  • 其次它必须责任单一,如果有多个功能,那就写多个Mixin类
  • 然后,它不依赖于子类的实现
  • 最后,子类即便没有继承这个Mixin类,也照样可以工作,就是缺少了某个功能。(比如飞机照样可以载客,就是不能飞了^_^)

 
原创文章,转载请注明出处
http://30daydo.com/article/480
 
继续阅读 »
A mixin is a limited form of multiple inheritance.
 
maxin类似多重继承的一种限制形式:
 关于Python的Mixin模式

像C或C++这类语言都支持多重继承,一个子类可以有多个父类,这样的设计常被人诟病。因为继承应该是个”is-a”关系。比如轿车类继承交通工具类,因为轿车是一个(“is-a”)交通工具。一个物品不可能是多种不同的东西,因此就不应该存在多重继承。不过有没有这种情况,一个类的确是需要继承多个类呢?

答案是有,我们还是拿交通工具来举例子,民航飞机是一种交通工具,对于土豪们来说直升机也是一种交通工具。对于这两种交通工具,它们都有一个功能是飞行,但是轿车没有。所以,我们不可能将飞行功能写在交通工具这个父类中。但是如果民航飞机和直升机都各自写自己的飞行方法,又违背了代码尽可能重用的原则(如果以后飞行工具越来越多,那会出现许多重复代码)。怎么办,那就只好让这两种飞机同时继承交通工具以及飞行器两个父类,这样就出现了多重继承。这时又违背了继承必须是”is-a”关系。这个难题该怎么破?

不同的语言给出了不同的方法,让我们先来看下Java。Java提供了接口interface功能,来实现多重继承:
public abstract class Vehicle {
}

public interface Flyable {
public void fly();
}

public class FlyableImpl implements Flyable {
public void fly() {
System.out.println("I am flying");
}
}

public class Airplane extends Vehicle implements Flyable {
private flyable;

public Airplane() {
flyable = new FlyableImpl();
}

public void fly() {
flyable.fly();
}
}


现在我们的飞机同时具有了交通工具及飞行器两种属性,而且我们不需要重写飞行器中的飞行方法,同时我们没有破坏单一继承的原则。飞机就是一种交通工具,可飞行的能力是是飞机的属性,通过继承接口来获取。

回到主题,Python语言可没有接口功能,但是它可以多重继承。那Python是不是就该用多重继承来实现呢?是,也不是。说是,因为从语法上看,的确是通过多重继承实现的。说不是,因为它的继承依然遵守”is-a”关系,从含义上看依然遵循单继承的原则。这个怎么理解呢?我们还是看例子吧。
class Vehicle(object):
pass

class PlaneMixin(object):
def fly(self):
print 'I am flying'

class Airplane(Vehicle, PlaneMixin):
pass


可以看到,上面的Airplane类实现了多继承,不过它继承的第二个类我们起名为PlaneMixin,而不是Plane,这个并不影响功能,但是会告诉后来读代码的人,这个类是一个Mixin类。所以从含义上理解,Airplane只是一个Vehicle,不是一个Plane。这个Mixin,表示混入(mix-in),它告诉别人,这个类是作为功能添加到子类中,而不是作为父类,它的作用同Java中的接口。

使用Mixin类实现多重继承要非常小心
  • 首先它必须表示某一种功能,而不是某个物品,如同Java中的Runnable,Callable等

 
  • 其次它必须责任单一,如果有多个功能,那就写多个Mixin类
  • 然后,它不依赖于子类的实现
  • 最后,子类即便没有继承这个Mixin类,也照样可以工作,就是缺少了某个功能。(比如飞机照样可以载客,就是不能飞了^_^)

 
原创文章,转载请注明出处
http://30daydo.com/article/480
  收起阅读 »

截止今天(2019-05-14)银行股今年的涨幅排名

今年涨幅最少的是农业银行,最多的是新股 西安银行。
	ticker	secShortName	secFullName	y_chgPct
31 601288 农业银行 中国农业银行股份有限公司 -0.341178
11 600015 华夏银行 华夏银行股份有限公司 1.856174
45 601988 中国银行 中国银行股份有限公司 2.248533
32 601328 交通银行 交通银行股份有限公司 3.532657
30 601229 上海银行 上海银行股份有限公司 3.725781
35 601398 工商银行 中国工商银行股份有限公司 4.771403
40 601818 光大银行 中国光大银行股份有限公司 5.643119
27 601169 北京银行 北京银行股份有限公司 6.205580
14 600016 民生银行 中国民生银行股份有限公司 7.092815
5 002936 郑州银行 郑州银行股份有限公司 7.551112
49 601998 中信银行 中信银行股份有限公司 8.181956
43 601939 建设银行 中国建设银行股份有限公司 9.402651
41 601838 成都银行 成都银行股份有限公司 9.424554
52 603323 苏农银行 江苏苏州农村商业银行股份有限公司 12.375732
20 600926 杭州银行 杭州银行股份有限公司 12.933645
8 600000 浦发银行 上海浦东发展银行股份有限公司 14.752244
39 601577 长沙银行 长沙银行股份有限公司 14.792683
18 600908 无锡银行 无锡农村商业银行股份有限公司 16.181704
3 002807 江阴银行 江苏江阴农村商业银行股份有限公司 19.274586
48 601997 贵阳银行 贵阳银行股份有限公司 20.489563
4 002839 张家港行 江苏张家港农村商业银行股份有限公司 20.599511
25 601166 兴业银行 兴业银行股份有限公司 21.206503
24 601128 常熟银行 江苏常熟农村商业银行股份有限公司 21.571187
19 600919 江苏银行 江苏银行股份有限公司 23.218299
22 601009 南京银行 南京银行股份有限公司 26.297500
16 600036 招商银行 招商银行股份有限公司 27.518708
0 000001 平安银行 平安银行股份有限公司 31.624747
2 002142 宁波银行 宁波银行股份有限公司 31.729062
6 002948 青岛银行 青岛银行股份有限公司 48.602573
7 002958 青农商行 青岛农村商业银行股份有限公司 108.983776
42 601860 紫金银行 江苏紫金农村商业银行股份有限公司 115.147347
21 600928 西安银行 西安银行股份有限公司 128.496683
继续阅读 »
今年涨幅最少的是农业银行,最多的是新股 西安银行。
	ticker	secShortName	secFullName	y_chgPct
31 601288 农业银行 中国农业银行股份有限公司 -0.341178
11 600015 华夏银行 华夏银行股份有限公司 1.856174
45 601988 中国银行 中国银行股份有限公司 2.248533
32 601328 交通银行 交通银行股份有限公司 3.532657
30 601229 上海银行 上海银行股份有限公司 3.725781
35 601398 工商银行 中国工商银行股份有限公司 4.771403
40 601818 光大银行 中国光大银行股份有限公司 5.643119
27 601169 北京银行 北京银行股份有限公司 6.205580
14 600016 民生银行 中国民生银行股份有限公司 7.092815
5 002936 郑州银行 郑州银行股份有限公司 7.551112
49 601998 中信银行 中信银行股份有限公司 8.181956
43 601939 建设银行 中国建设银行股份有限公司 9.402651
41 601838 成都银行 成都银行股份有限公司 9.424554
52 603323 苏农银行 江苏苏州农村商业银行股份有限公司 12.375732
20 600926 杭州银行 杭州银行股份有限公司 12.933645
8 600000 浦发银行 上海浦东发展银行股份有限公司 14.752244
39 601577 长沙银行 长沙银行股份有限公司 14.792683
18 600908 无锡银行 无锡农村商业银行股份有限公司 16.181704
3 002807 江阴银行 江苏江阴农村商业银行股份有限公司 19.274586
48 601997 贵阳银行 贵阳银行股份有限公司 20.489563
4 002839 张家港行 江苏张家港农村商业银行股份有限公司 20.599511
25 601166 兴业银行 兴业银行股份有限公司 21.206503
24 601128 常熟银行 江苏常熟农村商业银行股份有限公司 21.571187
19 600919 江苏银行 江苏银行股份有限公司 23.218299
22 601009 南京银行 南京银行股份有限公司 26.297500
16 600036 招商银行 招商银行股份有限公司 27.518708
0 000001 平安银行 平安银行股份有限公司 31.624747
2 002142 宁波银行 宁波银行股份有限公司 31.729062
6 002948 青岛银行 青岛银行股份有限公司 48.602573
7 002958 青农商行 青岛农村商业银行股份有限公司 108.983776
42 601860 紫金银行 江苏紫金农村商业银行股份有限公司 115.147347
21 600928 西安银行 西安银行股份有限公司 128.496683
收起阅读 »

正则表达式替换中文换行符【python】

js里面的内容有中文的换行符。
使用正则表达式替换换行符。(也可以替换为任意字符)
js=re.sub('\r\n','',js)

完毕。
js里面的内容有中文的换行符。
使用正则表达式替换换行符。(也可以替换为任意字符)
js=re.sub('\r\n','',js)

完毕。

request header显示Provisional headers are shown

出现这个情况,一般是因为装了一些插件,比如屏蔽广告的插件 ad block导致的。
把插件卸载了问题就解决了。
出现这个情况,一般是因为装了一些插件,比如屏蔽广告的插件 ad block导致的。
把插件卸载了问题就解决了。

异步爬虫aiohttp post提交数据

基本的用法:
async def fetch(session,url, data):
async with session.post(url=url, data=data, headers=headers) as response:
return await response.json()

 完整的例子:
import aiohttp
import asyncio

page = 30

post_data = {
'page': 1,
'pageSize': 10,
'keyWord': '',
'dpIds': '',
}

headers = {

"Accept-Encoding": "gzip, deflate",
"Accept-Language": "en-US,en;q=0.9",
"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.108 Safari/537.36",
"X-Requested-With": "XMLHttpRequest",
}

result=


async def fetch(session,url, data):
async with session.post(url=url, data=data, headers=headers) as response:
return await response.json()

async def parse(html):
xzcf_list = html.get('newtxzcfList')
if xzcf_list is None:
return
for i in xzcf_list:
result.append(i)

async def downlod(page):
data=post_data.copy()
data['page']=page
url = 'http://credit.chaozhou.gov.cn/tfieldTypeActionJson!initXzcfListnew.do'
async with aiohttp.ClientSession() as session:
html=await fetch(session,url,data)
await parse(html)

loop = asyncio.get_event_loop()
tasks=[asyncio.ensure_future(downlod(i)) for i in range(1,page)]
tasks=asyncio.gather(*tasks)
# print(tasks)
loop.run_until_complete(tasks)
# loop.close()
# print(result)
count=0
for i in result:
print(i.get('cfXdrMc'))
count+=1
print(f'total {count}')
继续阅读 »
基本的用法:
async def fetch(session,url, data):
async with session.post(url=url, data=data, headers=headers) as response:
return await response.json()

 完整的例子:
import aiohttp
import asyncio

page = 30

post_data = {
'page': 1,
'pageSize': 10,
'keyWord': '',
'dpIds': '',
}

headers = {

"Accept-Encoding": "gzip, deflate",
"Accept-Language": "en-US,en;q=0.9",
"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.108 Safari/537.36",
"X-Requested-With": "XMLHttpRequest",
}

result=


async def fetch(session,url, data):
async with session.post(url=url, data=data, headers=headers) as response:
return await response.json()

async def parse(html):
xzcf_list = html.get('newtxzcfList')
if xzcf_list is None:
return
for i in xzcf_list:
result.append(i)

async def downlod(page):
data=post_data.copy()
data['page']=page
url = 'http://credit.chaozhou.gov.cn/tfieldTypeActionJson!initXzcfListnew.do'
async with aiohttp.ClientSession() as session:
html=await fetch(session,url,data)
await parse(html)

loop = asyncio.get_event_loop()
tasks=[asyncio.ensure_future(downlod(i)) for i in range(1,page)]
tasks=asyncio.gather(*tasks)
# print(tasks)
loop.run_until_complete(tasks)
# loop.close()
# print(result)
count=0
for i in result:
print(i.get('cfXdrMc'))
count+=1
print(f'total {count}')
收起阅读 »

python异步aiohttp爬虫 - 异步爬取链家数据

import requests
from lxml import etree
import asyncio
import aiohttp
import pandas
import re
import math
import time

loction_info = ''' 1→杭州
2→武汉
3→北京
按ENTER确认:'''
loction_select = input(loction_info)
loction_dic = {'1': 'hz',
'2': 'wh',
'3': 'bj'}
city_url = 'https://{}.lianjia.com/ershoufang/'.format(loction_dic[loction_select])
down = input('请输入价格下限(万):')
up = input('请输入价格上限(万):')

inter_list = [(int(down), int(up))]


def half_inter(inter):
lower = inter[0]
upper = inter[1]
delta = int((upper - lower) / 2)
inter_list.remove(inter)
print('已经缩小价格区间', inter)
inter_list.append((lower, lower + delta))
inter_list.append((lower + delta, upper))


pagenum = {}


def get_num(inter):
url = city_url + 'bp{}ep{}/'.format(inter[0], inter[1])
r = requests.get(url).text
print(r)
num = int(etree.HTML(r).xpath("//h2[@class='total fl']/span/text()")[0].strip())
pagenum[(inter[0], inter[1])] = num
return num


totalnum = get_num(inter_list[0])

judge = True
while judge:
a = [get_num(x) > 3000 for x in inter_list]
if True in a:
judge = True
else:
judge = False
for i in inter_list:
if get_num(i) > 3000:
half_inter(i)
print('价格区间缩小完毕!')

url_lst = []
url_lst_failed = []
url_lst_successed = []
url_lst_duplicated = []

for i in inter_list:
totalpage = math.ceil(pagenum[i] / 30)
for j in range(1, totalpage + 1):
url = city_url + 'pg{}bp{}ep{}/'.format(j, i[0], i[1])
url_lst.append(url)
print('url列表获取完毕!')

info_lst = []


async def get_info(url):
async with aiohttp.ClientSession() as session:
async with session.get(url, timeout=5) as resp:
if resp.status != 200:
url_lst_failed.append(url)
else:
url_lst_successed.append(url)
r = await resp.text()
nodelist = etree.HTML(r).xpath("//ul[@class='sellListContent']/li")
# print('-------------------------------------------------------------')
# print('开始抓取第{}个页面的数据,共计{}个页面'.format(url_lst.index(url),len(url_lst)))
# print('开始抓取第{}个页面的数据,共计{}个页面'.format(url_lst.index(url), len(url_lst)))
# print('开始抓取第{}个页面的数据,共计{}个页面'.format(url_lst.index(url), len(url_lst)))
# print('-------------------------------------------------------------')
info_dic = {}
index = 1
print('开始抓取{}'.format(resp.url))
print('开始抓取{}'.format(resp.url))
print('开始抓取{}'.format(resp.url))
for node in nodelist:
try:
info_dic['title'] = node.xpath(".//div[@class='title']/a/text()")[0]
except:
info_dic['title'] = '/'
try:
info_dic['href'] = node.xpath(".//div[@class='title']/a/@href")[0]
except:
info_dic['href'] = '/'
try:
info_dic['xiaoqu'] = \
node.xpath(".//div[@class='houseInfo']")[0].xpath('string(.)').replace(' ', '').split('|')[0]
except:
info_dic['xiaoqu'] = '/'
try:
info_dic['huxing'] = \
node.xpath(".//div[@class='houseInfo']")[0].xpath('string(.)').replace(' ', '').split('|')[1]
except:
info_dic['huxing'] = '/'
try:
info_dic['area'] = \
node.xpath(".//div[@class='houseInfo']")[0].xpath('string(.)').replace(' ', '').split('|')[2]
except:
info_dic['area'] = '/'
try:
info_dic['chaoxiang'] = \
node.xpath(".//div[@class='houseInfo']")[0].xpath('string(.)').replace(' ', '').split('|')[3]
except:
info_dic['chaoxiang'] = '/'
try:
info_dic['zhuangxiu'] = \
node.xpath(".//div[@class='houseInfo']")[0].xpath('string(.)').replace(' ', '').split('|')[4]
except:
info_dic['zhuangxiu'] = '/'
try:
info_dic['dianti'] = \
node.xpath(".//div[@class='houseInfo']")[0].xpath('string(.)').replace(' ', '').split('|')[5]
except:
info_dic['dianti'] = '/'
try:
info_dic['louceng'] = re.findall('\((.*)\)', node.xpath(".//div[@class='positionInfo']/text()")[0])
except:
info_dic['louceng'] = '/'
try:
info_dic['nianxian'] = re.findall('\)(.*?)年', node.xpath(".//div[@class='positionInfo']/text()")[0])
except:
info_dic['nianxian'] = '/'
try:
info_dic['guanzhu'] = ''.join(re.findall('[0-9]', node.xpath(".//div[@class='followInfo']/text()")[
0].replace(' ', '').split('/')[0]))
except:
info_dic['guanzhu'] = '/'
try:
info_dic['daikan'] = ''.join(re.findall('[0-9]',
node.xpath(".//div[@class='followInfo']/text()")[0].replace(
' ', '').split('/')[1]))
except:
info_dic['daikan'] = '/'
try:
info_dic['fabu'] = node.xpath(".//div[@class='followInfo']/text()")[0].replace(' ', '').split('/')[
2]
except:
info_dic['fabu'] = '/'
try:
info_dic['totalprice'] = node.xpath(".//div[@class='totalPrice']/span/text()")[0]
except:
info_dic['totalprice'] = '/'
try:
info_dic['unitprice'] = node.xpath(".//div[@class='unitPrice']/span/text()")[0].replace('单价', '')
except:
info_dic['unitprice'] = '/'
if True in [info_dic['href'] in dic.values() for dic in info_lst]:
url_lst_duplicated.append(info_dic)
else:
info_lst.append(info_dic)
print('第{}条: {}→房屋信息抓取完毕!'.format(index, info_dic['title']))
index += 1
info_dic = {}


start = time.time()

# 首次抓取url_lst中的信息,部分url没有对其发起请求,不知道为什么
tasks = [asyncio.ensure_future(get_info(url)) for url in url_lst]
loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait(tasks))

# 将没有发起请求的url放入一个列表,对其进行循环抓取,直到所有url都被发起请求
url_lst_unrequested = []
for url in url_lst:
if url not in url_lst_successed or url_lst_failed:
url_lst_unrequested.append(url)
while len(url_lst_unrequested) > 0:
tasks_unrequested = [asyncio.ensure_future(get_info(url)) for url in url_lst_unrequested]
loop.run_until_complete(asyncio.wait(tasks_unrequested))
url_lst_unrequested = []
for url in url_lst:
if url not in url_lst_successed:
url_lst_unrequested.append(url)
end = time.time()
print('当前价格区间段内共有{}套二手房源\(包含{}条重复房源\),实际获得{}条房源信息。'.format(totalnum, len(url_lst_duplicated), len(info_lst)))
print('总共耗时{}秒'.format(end - start))

df = pandas.DataFrame(info_lst)
df.to_csv("ljwh.csv", encoding='gbk')
继续阅读 »
import requests
from lxml import etree
import asyncio
import aiohttp
import pandas
import re
import math
import time

loction_info = ''' 1→杭州
2→武汉
3→北京
按ENTER确认:'''
loction_select = input(loction_info)
loction_dic = {'1': 'hz',
'2': 'wh',
'3': 'bj'}
city_url = 'https://{}.lianjia.com/ershoufang/'.format(loction_dic[loction_select])
down = input('请输入价格下限(万):')
up = input('请输入价格上限(万):')

inter_list = [(int(down), int(up))]


def half_inter(inter):
lower = inter[0]
upper = inter[1]
delta = int((upper - lower) / 2)
inter_list.remove(inter)
print('已经缩小价格区间', inter)
inter_list.append((lower, lower + delta))
inter_list.append((lower + delta, upper))


pagenum = {}


def get_num(inter):
url = city_url + 'bp{}ep{}/'.format(inter[0], inter[1])
r = requests.get(url).text
print(r)
num = int(etree.HTML(r).xpath("//h2[@class='total fl']/span/text()")[0].strip())
pagenum[(inter[0], inter[1])] = num
return num


totalnum = get_num(inter_list[0])

judge = True
while judge:
a = [get_num(x) > 3000 for x in inter_list]
if True in a:
judge = True
else:
judge = False
for i in inter_list:
if get_num(i) > 3000:
half_inter(i)
print('价格区间缩小完毕!')

url_lst = []
url_lst_failed = []
url_lst_successed = []
url_lst_duplicated = []

for i in inter_list:
totalpage = math.ceil(pagenum[i] / 30)
for j in range(1, totalpage + 1):
url = city_url + 'pg{}bp{}ep{}/'.format(j, i[0], i[1])
url_lst.append(url)
print('url列表获取完毕!')

info_lst = []


async def get_info(url):
async with aiohttp.ClientSession() as session:
async with session.get(url, timeout=5) as resp:
if resp.status != 200:
url_lst_failed.append(url)
else:
url_lst_successed.append(url)
r = await resp.text()
nodelist = etree.HTML(r).xpath("//ul[@class='sellListContent']/li")
# print('-------------------------------------------------------------')
# print('开始抓取第{}个页面的数据,共计{}个页面'.format(url_lst.index(url),len(url_lst)))
# print('开始抓取第{}个页面的数据,共计{}个页面'.format(url_lst.index(url), len(url_lst)))
# print('开始抓取第{}个页面的数据,共计{}个页面'.format(url_lst.index(url), len(url_lst)))
# print('-------------------------------------------------------------')
info_dic = {}
index = 1
print('开始抓取{}'.format(resp.url))
print('开始抓取{}'.format(resp.url))
print('开始抓取{}'.format(resp.url))
for node in nodelist:
try:
info_dic['title'] = node.xpath(".//div[@class='title']/a/text()")[0]
except:
info_dic['title'] = '/'
try:
info_dic['href'] = node.xpath(".//div[@class='title']/a/@href")[0]
except:
info_dic['href'] = '/'
try:
info_dic['xiaoqu'] = \
node.xpath(".//div[@class='houseInfo']")[0].xpath('string(.)').replace(' ', '').split('|')[0]
except:
info_dic['xiaoqu'] = '/'
try:
info_dic['huxing'] = \
node.xpath(".//div[@class='houseInfo']")[0].xpath('string(.)').replace(' ', '').split('|')[1]
except:
info_dic['huxing'] = '/'
try:
info_dic['area'] = \
node.xpath(".//div[@class='houseInfo']")[0].xpath('string(.)').replace(' ', '').split('|')[2]
except:
info_dic['area'] = '/'
try:
info_dic['chaoxiang'] = \
node.xpath(".//div[@class='houseInfo']")[0].xpath('string(.)').replace(' ', '').split('|')[3]
except:
info_dic['chaoxiang'] = '/'
try:
info_dic['zhuangxiu'] = \
node.xpath(".//div[@class='houseInfo']")[0].xpath('string(.)').replace(' ', '').split('|')[4]
except:
info_dic['zhuangxiu'] = '/'
try:
info_dic['dianti'] = \
node.xpath(".//div[@class='houseInfo']")[0].xpath('string(.)').replace(' ', '').split('|')[5]
except:
info_dic['dianti'] = '/'
try:
info_dic['louceng'] = re.findall('\((.*)\)', node.xpath(".//div[@class='positionInfo']/text()")[0])
except:
info_dic['louceng'] = '/'
try:
info_dic['nianxian'] = re.findall('\)(.*?)年', node.xpath(".//div[@class='positionInfo']/text()")[0])
except:
info_dic['nianxian'] = '/'
try:
info_dic['guanzhu'] = ''.join(re.findall('[0-9]', node.xpath(".//div[@class='followInfo']/text()")[
0].replace(' ', '').split('/')[0]))
except:
info_dic['guanzhu'] = '/'
try:
info_dic['daikan'] = ''.join(re.findall('[0-9]',
node.xpath(".//div[@class='followInfo']/text()")[0].replace(
' ', '').split('/')[1]))
except:
info_dic['daikan'] = '/'
try:
info_dic['fabu'] = node.xpath(".//div[@class='followInfo']/text()")[0].replace(' ', '').split('/')[
2]
except:
info_dic['fabu'] = '/'
try:
info_dic['totalprice'] = node.xpath(".//div[@class='totalPrice']/span/text()")[0]
except:
info_dic['totalprice'] = '/'
try:
info_dic['unitprice'] = node.xpath(".//div[@class='unitPrice']/span/text()")[0].replace('单价', '')
except:
info_dic['unitprice'] = '/'
if True in [info_dic['href'] in dic.values() for dic in info_lst]:
url_lst_duplicated.append(info_dic)
else:
info_lst.append(info_dic)
print('第{}条: {}→房屋信息抓取完毕!'.format(index, info_dic['title']))
index += 1
info_dic = {}


start = time.time()

# 首次抓取url_lst中的信息,部分url没有对其发起请求,不知道为什么
tasks = [asyncio.ensure_future(get_info(url)) for url in url_lst]
loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait(tasks))

# 将没有发起请求的url放入一个列表,对其进行循环抓取,直到所有url都被发起请求
url_lst_unrequested = []
for url in url_lst:
if url not in url_lst_successed or url_lst_failed:
url_lst_unrequested.append(url)
while len(url_lst_unrequested) > 0:
tasks_unrequested = [asyncio.ensure_future(get_info(url)) for url in url_lst_unrequested]
loop.run_until_complete(asyncio.wait(tasks_unrequested))
url_lst_unrequested = []
for url in url_lst:
if url not in url_lst_successed:
url_lst_unrequested.append(url)
end = time.time()
print('当前价格区间段内共有{}套二手房源\(包含{}条重复房源\),实际获得{}条房源信息。'.format(totalnum, len(url_lst_duplicated), len(info_lst)))
print('总共耗时{}秒'.format(end - start))

df = pandas.DataFrame(info_lst)
df.to_csv("ljwh.csv", encoding='gbk')
收起阅读 »

神经网络中数值梯度的计算 python代码

深度学习入门python
 
import matplotlib.pyplot as plt
import numpy as np
import time
from collections import OrderedDict

def softmax(a):
a = a - np.max(a)
exp_a = np.exp(a)
exp_a_sum = np.sum(exp_a)
return exp_a / exp_a_sum


def cross_entropy_error(t, y):
delta = 1e-7
s = -1 * np.sum(t * np.log(y + delta))
# print('cross entropy ',s)
return s


class simpleNet:
def __init__(self):
self.W = np.random.randn(2, 3)

def predict(self, x):
print('current w',self.W)
return np.dot(x, self.W)

def loss(self, x, t):
z = self.predict(x)
# print(z)
# print(z.ndim)
y = softmax(z)
# print('y',y)
loss = cross_entropy_error(y, t) # y为预测的值
return loss



def numerical_gradient_(f, x): # 针对2维的情况 甚至是多维
h = 1e-4 # 0.0001
grad = np.zeros_like(x)

it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
while not it.finished:
idx = it.multi_index
print('idx', idx)
tmp_val = x[idx]
x[idx] = float(tmp_val) + h
fxh1 = f(x) # f(x+h)
print('fxh1 ', fxh1)
# print('current W', net.W)
x[idx] = tmp_val - h
fxh2 = f(x) # f(x-h)
print('fxh2 ', fxh2)
# print('next currnet W ', net.W)
grad[idx] = (fxh1 - fxh2) / (2 * h)

x[idx] = tmp_val # 还原值
it.iternext()

return grad



net = simpleNet()
x=np.array([0.6,0.9])
t = np.array([0.0,0.0,1.0])

def f(W):
return net.loss(x,t)

grads =numerical_gradient_(f,net.W)
print(grads)
继续阅读 »
深度学习入门python
 
import matplotlib.pyplot as plt
import numpy as np
import time
from collections import OrderedDict

def softmax(a):
a = a - np.max(a)
exp_a = np.exp(a)
exp_a_sum = np.sum(exp_a)
return exp_a / exp_a_sum


def cross_entropy_error(t, y):
delta = 1e-7
s = -1 * np.sum(t * np.log(y + delta))
# print('cross entropy ',s)
return s


class simpleNet:
def __init__(self):
self.W = np.random.randn(2, 3)

def predict(self, x):
print('current w',self.W)
return np.dot(x, self.W)

def loss(self, x, t):
z = self.predict(x)
# print(z)
# print(z.ndim)
y = softmax(z)
# print('y',y)
loss = cross_entropy_error(y, t) # y为预测的值
return loss



def numerical_gradient_(f, x): # 针对2维的情况 甚至是多维
h = 1e-4 # 0.0001
grad = np.zeros_like(x)

it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
while not it.finished:
idx = it.multi_index
print('idx', idx)
tmp_val = x[idx]
x[idx] = float(tmp_val) + h
fxh1 = f(x) # f(x+h)
print('fxh1 ', fxh1)
# print('current W', net.W)
x[idx] = tmp_val - h
fxh2 = f(x) # f(x-h)
print('fxh2 ', fxh2)
# print('next currnet W ', net.W)
grad[idx] = (fxh1 - fxh2) / (2 * h)

x[idx] = tmp_val # 还原值
it.iternext()

return grad



net = simpleNet()
x=np.array([0.6,0.9])
t = np.array([0.0,0.0,1.0])

def f(W):
return net.loss(x,t)

grads =numerical_gradient_(f,net.W)
print(grads)
收起阅读 »

【可转债剩余转股比例数据排序】【2019-05-06】

数据如下:

剩余比例1.PNG


剩余比例2.PNG

 
剩余的比例越少,上市公司下调转股价的欲望就越少。 也就是会任由可转债在那里晾着,不会积极拉正股。
 
数据定期更新。
 
原创文章,
转载请注明出处:
http://30daydo.com/article/472
 
继续阅读 »
数据如下:

剩余比例1.PNG


剩余比例2.PNG

 
剩余的比例越少,上市公司下调转股价的欲望就越少。 也就是会任由可转债在那里晾着,不会积极拉正股。
 
数据定期更新。
 
原创文章,
转载请注明出处:
http://30daydo.com/article/472
  收起阅读 »

ElasticSearch配置集群无法发现节点问题【已解决】

单个节点可以运行,但是配置为多个服务器集群的时候,总是提示无法发现服务器,花了点时间排查了问题,原来是配置文件的timeout问题,需要把timetout的值设置大一些,然后集群就可以发现到局域网中的其他节点。
 
修改文件elasticsearch.yml 文件中的timeout参数,改成原来值得10倍就可以了。
继续阅读 »
单个节点可以运行,但是配置为多个服务器集群的时候,总是提示无法发现服务器,花了点时间排查了问题,原来是配置文件的timeout问题,需要把timetout的值设置大一些,然后集群就可以发现到局域网中的其他节点。
 
修改文件elasticsearch.yml 文件中的timeout参数,改成原来值得10倍就可以了。 收起阅读 »

pycharm激活码 有效期到2019年11月

IDE
pycharm专业版激活码如下,亲测有效,有效期到2019年11月7日
 
MTW881U3Z5-eyJsaWNlbnNlSWQiOiJNVFc4ODFVM1o1IiwibGljZW5zZWVOYW1lIjoiTnNzIEltIiwiYXNzaWduZWVOYW1lIjoiIiwiYXNzaWduZWVFbWFpbCI6IiIsImxpY2Vuc2VSZXN0cmljdGlvbiI6IkZvciBlZHVjYXRpb25hbCB1c2Ugb25seSIsImNoZWNrQ29uY3VycmVudFVzZSI6ZmFsc2UsInByb2R1Y3RzIjpbeyJjb2RlIjoiSUkiLCJwYWlkVXBUbyI6IjIwMTktMTEtMDYifSx7ImNvZGUiOiJBQyIsInBhaWRVcFRvIjoiMjAxOS0xMS0wNiJ9LHsiY29kZSI6IkRQTiIsInBhaWRVcFRvIjoiMjAxOS0xMS0wNiJ9LHsiY29kZSI6IlBTIiwicGFpZFVwVG8iOiIyMDE5LTExLTA2In0seyJjb2RlIjoiR08iLCJwYWlkVXBUbyI6IjIwMTktMTEtMDYifSx7ImNvZGUiOiJETSIsInBhaWRVcFRvIjoiMjAxOS0xMS0wNiJ9LHsiY29kZSI6IkNMIiwicGFpZFVwVG8iOiIyMDE5LTExLTA2In0seyJjb2RlIjoiUlMwIiwicGFpZFVwVG8iOiIyMDE5LTExLTA2In0seyJjb2RlIjoiUkMiLCJwYWlkVXBUbyI6IjIwMTktMTEtMDYifSx7ImNvZGUiOiJSRCIsInBhaWRVcFRvIjoiMjAxOS0xMS0wNiJ9LHsiY29kZSI6IlBDIiwicGFpZFVwVG8iOiIyMDE5LTExLTA2In0seyJjb2RlIjoiUk0iLCJwYWlkVXBUbyI6IjIwMTktMTEtMDYifSx7ImNvZGUiOiJXUyIsInBhaWRVcFRvIjoiMjAxOS0xMS0wNiJ9LHsiY29kZSI6IkRCIiwicGFpZFVwVG8iOiIyMDE5LTExLTA2In0seyJjb2RlIjoiREMiLCJwYWlkVXBUbyI6IjIwMTktMTEtMDYifSx7ImNvZGUiOiJSU1UiLCJwYWlkVXBUbyI6IjIwMTktMTEtMDYifV0sImhhc2giOiIxMDgyODE0Ni8wIiwiZ3JhY2VQZXJpb2REYXlzIjowLCJhdXRvUHJvbG9uZ2F0ZWQiOmZhbHNlLCJpc0F1dG9Qcm9sb25nYXRlZCI6ZmFsc2V9-aKyalfjUfiV5UXfhaMGgOqrMzTYy2rnsmobL47k8tTpR/jvG6HeL3FxxleetI+W+Anw3ZSe8QAMsSxqVS4podwlQgIe7f+3w7zyAT1j8HMVlfl2h96KzygdGpDSbwTbwOkJ6/5TQOPgAP86mkaSiM97KgvkZV/2nXQHRz1yhm+MT+OsioTwxDhd/22sSGq6KuIztZ03UvSciEmyrPdl2ueJw1WuT9YmFjdtTm9G7LuXvCM6eav+BgCRm+wwtUeDfoQqigbp0t6FQgkdQrcjoWvLSB0IUgp/f4qGf254fA7lXskT2VCFdDvi0jgxLyMVct1cKnPdM6fkHnbdSXKYDWw==-MIIElTCCAn2gAwIBAgIBCTANBgkqhkiG9w0BAQsFADAYMRYwFAYDVQQDDA1KZXRQcm9maWxlIENBMB4XDTE4MTEwMTEyMjk0NloXDTIwMTEwMjEyMjk0NlowaDELMAkGA1UEBhMCQ1oxDjAMBgNVBAgMBU51c2xlMQ8wDQYDVQQHDAZQcmFndWUxGTAXBgNVBAoMEEpldEJyYWlucyBzLnIuby4xHTAbBgNVBAMMFHByb2QzeS1mcm9tLTIwMTgxMTAxMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAxcQkq+zdxlR2mmRYBPzGbUNdMN6OaXiXzxIWtMEkrJMO/5oUfQJbLLuMSMK0QHFmaI37WShyxZcfRCidwXjot4zmNBKnlyHodDij/78TmVqFl8nOeD5+07B8VEaIu7c3E1N+e1doC6wht4I4+IEmtsPAdoaj5WCQVQbrI8KeT8M9VcBIWX7fD0fhexfg3ZRt0xqwMcXGNp3DdJHiO0rCdU+Itv7EmtnSVq9jBG1usMSFvMowR25mju2JcPFp1+I4ZI+FqgR8gyG8oiNDyNEoAbsR3lOpI7grUYSvkB/xVy/VoklPCK2h0f0GJxFjnye8NT1PAywoyl7RmiAVRE/EKwIDAQABo4GZMIGWMAkGA1UdEwQCMAAwHQYDVR0OBBYEFGEpG9oZGcfLMGNBkY7SgHiMGgTcMEgGA1UdIwRBMD+AFKOetkhnQhI2Qb1t4Lm0oFKLl/GzoRykGjAYMRYwFAYDVQQDDA1KZXRQcm9maWxlIENBggkA0myxg7KDeeEwEwYDVR0lBAwwCgYIKwYBBQUHAwEwCwYDVR0PBAQDAgWgMA0GCSqGSIb3DQEBCwUAA4ICAQAF8uc+YJOHHwOFcPzmbjcxNDuGoOUIP+2h1R75Lecswb7ru2LWWSUMtXVKQzChLNPn/72W0k+oI056tgiwuG7M49LXp4zQVlQnFmWU1wwGvVhq5R63Rpjx1zjGUhcXgayu7+9zMUW596Lbomsg8qVve6euqsrFicYkIIuUu4zYPndJwfe0YkS5nY72SHnNdbPhEnN8wcB2Kz+OIG0lih3yz5EqFhld03bGp222ZQCIghCTVL6QBNadGsiN/lWLl4JdR3lJkZzlpFdiHijoVRdWeSWqM4y0t23c92HXKrgppoSV18XMxrWVdoSM3nuMHwxGhFyde05OdDtLpCv+jlWf5REAHHA201pAU6bJSZINyHDUTB+Beo28rRXSwSh3OUIvYwKNVeoBY+KwOJ7WnuTCUq1meE6GkKc4D/cXmgpOyW/1SmBz3XjVIi/zprZ0zf3qH5mkphtg6ksjKgKjmx1cXfZAAX6wcDBNaCL+Ortep1Dh8xDUbqbBVNBL4jbiL3i3xsfNiyJgaZ5sX7i8tmStEpLbPwvHcByuf59qJhV/bZOl8KqJBETCDJcY6O2aqhTUy+9x93ThKs1GKrRPePrWPluud7ttlgtRveit/pcBrnQcXOl1rHq7ByB8CFAxNotRUYL9IF5n3wJOgkPojMy6jetQA5Ogc8Sm7RG6vg1yow==
原创文章
转载请注明出处:
http://30daydo.com/article/470
 
继续阅读 »
pycharm专业版激活码如下,亲测有效,有效期到2019年11月7日
 
MTW881U3Z5-eyJsaWNlbnNlSWQiOiJNVFc4ODFVM1o1IiwibGljZW5zZWVOYW1lIjoiTnNzIEltIiwiYXNzaWduZWVOYW1lIjoiIiwiYXNzaWduZWVFbWFpbCI6IiIsImxpY2Vuc2VSZXN0cmljdGlvbiI6IkZvciBlZHVjYXRpb25hbCB1c2Ugb25seSIsImNoZWNrQ29uY3VycmVudFVzZSI6ZmFsc2UsInByb2R1Y3RzIjpbeyJjb2RlIjoiSUkiLCJwYWlkVXBUbyI6IjIwMTktMTEtMDYifSx7ImNvZGUiOiJBQyIsInBhaWRVcFRvIjoiMjAxOS0xMS0wNiJ9LHsiY29kZSI6IkRQTiIsInBhaWRVcFRvIjoiMjAxOS0xMS0wNiJ9LHsiY29kZSI6IlBTIiwicGFpZFVwVG8iOiIyMDE5LTExLTA2In0seyJjb2RlIjoiR08iLCJwYWlkVXBUbyI6IjIwMTktMTEtMDYifSx7ImNvZGUiOiJETSIsInBhaWRVcFRvIjoiMjAxOS0xMS0wNiJ9LHsiY29kZSI6IkNMIiwicGFpZFVwVG8iOiIyMDE5LTExLTA2In0seyJjb2RlIjoiUlMwIiwicGFpZFVwVG8iOiIyMDE5LTExLTA2In0seyJjb2RlIjoiUkMiLCJwYWlkVXBUbyI6IjIwMTktMTEtMDYifSx7ImNvZGUiOiJSRCIsInBhaWRVcFRvIjoiMjAxOS0xMS0wNiJ9LHsiY29kZSI6IlBDIiwicGFpZFVwVG8iOiIyMDE5LTExLTA2In0seyJjb2RlIjoiUk0iLCJwYWlkVXBUbyI6IjIwMTktMTEtMDYifSx7ImNvZGUiOiJXUyIsInBhaWRVcFRvIjoiMjAxOS0xMS0wNiJ9LHsiY29kZSI6IkRCIiwicGFpZFVwVG8iOiIyMDE5LTExLTA2In0seyJjb2RlIjoiREMiLCJwYWlkVXBUbyI6IjIwMTktMTEtMDYifSx7ImNvZGUiOiJSU1UiLCJwYWlkVXBUbyI6IjIwMTktMTEtMDYifV0sImhhc2giOiIxMDgyODE0Ni8wIiwiZ3JhY2VQZXJpb2REYXlzIjowLCJhdXRvUHJvbG9uZ2F0ZWQiOmZhbHNlLCJpc0F1dG9Qcm9sb25nYXRlZCI6ZmFsc2V9-aKyalfjUfiV5UXfhaMGgOqrMzTYy2rnsmobL47k8tTpR/jvG6HeL3FxxleetI+W+Anw3ZSe8QAMsSxqVS4podwlQgIe7f+3w7zyAT1j8HMVlfl2h96KzygdGpDSbwTbwOkJ6/5TQOPgAP86mkaSiM97KgvkZV/2nXQHRz1yhm+MT+OsioTwxDhd/22sSGq6KuIztZ03UvSciEmyrPdl2ueJw1WuT9YmFjdtTm9G7LuXvCM6eav+BgCRm+wwtUeDfoQqigbp0t6FQgkdQrcjoWvLSB0IUgp/f4qGf254fA7lXskT2VCFdDvi0jgxLyMVct1cKnPdM6fkHnbdSXKYDWw==-MIIElTCCAn2gAwIBAgIBCTANBgkqhkiG9w0BAQsFADAYMRYwFAYDVQQDDA1KZXRQcm9maWxlIENBMB4XDTE4MTEwMTEyMjk0NloXDTIwMTEwMjEyMjk0NlowaDELMAkGA1UEBhMCQ1oxDjAMBgNVBAgMBU51c2xlMQ8wDQYDVQQHDAZQcmFndWUxGTAXBgNVBAoMEEpldEJyYWlucyBzLnIuby4xHTAbBgNVBAMMFHByb2QzeS1mcm9tLTIwMTgxMTAxMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAxcQkq+zdxlR2mmRYBPzGbUNdMN6OaXiXzxIWtMEkrJMO/5oUfQJbLLuMSMK0QHFmaI37WShyxZcfRCidwXjot4zmNBKnlyHodDij/78TmVqFl8nOeD5+07B8VEaIu7c3E1N+e1doC6wht4I4+IEmtsPAdoaj5WCQVQbrI8KeT8M9VcBIWX7fD0fhexfg3ZRt0xqwMcXGNp3DdJHiO0rCdU+Itv7EmtnSVq9jBG1usMSFvMowR25mju2JcPFp1+I4ZI+FqgR8gyG8oiNDyNEoAbsR3lOpI7grUYSvkB/xVy/VoklPCK2h0f0GJxFjnye8NT1PAywoyl7RmiAVRE/EKwIDAQABo4GZMIGWMAkGA1UdEwQCMAAwHQYDVR0OBBYEFGEpG9oZGcfLMGNBkY7SgHiMGgTcMEgGA1UdIwRBMD+AFKOetkhnQhI2Qb1t4Lm0oFKLl/GzoRykGjAYMRYwFAYDVQQDDA1KZXRQcm9maWxlIENBggkA0myxg7KDeeEwEwYDVR0lBAwwCgYIKwYBBQUHAwEwCwYDVR0PBAQDAgWgMA0GCSqGSIb3DQEBCwUAA4ICAQAF8uc+YJOHHwOFcPzmbjcxNDuGoOUIP+2h1R75Lecswb7ru2LWWSUMtXVKQzChLNPn/72W0k+oI056tgiwuG7M49LXp4zQVlQnFmWU1wwGvVhq5R63Rpjx1zjGUhcXgayu7+9zMUW596Lbomsg8qVve6euqsrFicYkIIuUu4zYPndJwfe0YkS5nY72SHnNdbPhEnN8wcB2Kz+OIG0lih3yz5EqFhld03bGp222ZQCIghCTVL6QBNadGsiN/lWLl4JdR3lJkZzlpFdiHijoVRdWeSWqM4y0t23c92HXKrgppoSV18XMxrWVdoSM3nuMHwxGhFyde05OdDtLpCv+jlWf5REAHHA201pAU6bJSZINyHDUTB+Beo28rRXSwSh3OUIvYwKNVeoBY+KwOJ7WnuTCUq1meE6GkKc4D/cXmgpOyW/1SmBz3XjVIi/zprZ0zf3qH5mkphtg6ksjKgKjmx1cXfZAAX6wcDBNaCL+Ortep1Dh8xDUbqbBVNBL4jbiL3i3xsfNiyJgaZ5sX7i8tmStEpLbPwvHcByuf59qJhV/bZOl8KqJBETCDJcY6O2aqhTUy+9x93ThKs1GKrRPePrWPluud7ttlgtRveit/pcBrnQcXOl1rHq7ByB8CFAxNotRUYL9IF5n3wJOgkPojMy6jetQA5Ogc8Sm7RG6vg1yow==
原创文章
转载请注明出处:
http://30daydo.com/article/470
  收起阅读 »

numpy flatten函数的用法


把数据展平,无论多少维的数据,变为1维

例子:
x=np.array([[1,2,3,4],[5,6,7,8]])
x
array([[1, 2, 3, 4],
[5, 6, 7, 8]])

然后对x进行flatten操作
x.flatten()

得到的数据:
array([1, 2, 3, 4, 5, 6, 7, 8])

你也可以指定展平的轴,设定axis就可以了.
继续阅读 »

把数据展平,无论多少维的数据,变为1维

例子:
x=np.array([[1,2,3,4],[5,6,7,8]])
x
array([[1, 2, 3, 4],
[5, 6, 7, 8]])

然后对x进行flatten操作
x.flatten()

得到的数据:
array([1, 2, 3, 4, 5, 6, 7, 8])

你也可以指定展平的轴,设定axis就可以了. 收起阅读 »

发现numpy一个很坑的问题,要一定级别的高手才能发现问题

一个二元一次方程:
y=X0**2+X1**2   # **2 是平方
def function_2(x):
return x[0]**2+x[1]**2

 
下面是计算y的偏导数,分布计算X0和X1的偏导
def numerical_gradient(f,x):
grad = np.zeros_like(x)
h=1e-4
for idx in range(x.size):
temp_v = x[idx]
x[idx]=temp_v+h
f1=f(x)
print(x,f1)
x[idx]=temp_v-h
f2=f(x)
print(x,f2)
ret = (f1-f2)/(2*h)
print(ret)
x[idx]=temp_v
grad[idx]=ret

return grad

然后调用
numerical_gradient(function_2,np.array([3,4]))

计算的是二元一次方程 y=X0**2+X1**2  在点(3,4)的偏导的值
得到的是什么结果?
为什么会得到这样的结果? 
小白一般要花点时间才能找到原因。
 
继续阅读 »
一个二元一次方程:
y=X0**2+X1**2   # **2 是平方
def function_2(x):
return x[0]**2+x[1]**2

 
下面是计算y的偏导数,分布计算X0和X1的偏导
def numerical_gradient(f,x):
grad = np.zeros_like(x)
h=1e-4
for idx in range(x.size):
temp_v = x[idx]
x[idx]=temp_v+h
f1=f(x)
print(x,f1)
x[idx]=temp_v-h
f2=f(x)
print(x,f2)
ret = (f1-f2)/(2*h)
print(ret)
x[idx]=temp_v
grad[idx]=ret

return grad

然后调用
numerical_gradient(function_2,np.array([3,4]))

计算的是二元一次方程 y=X0**2+X1**2  在点(3,4)的偏导的值
得到的是什么结果?
为什么会得到这样的结果? 
小白一般要花点时间才能找到原因。
  收起阅读 »

numpy和dataframe轴的含义,axis为负数的含义

比如有数组:
a=np.array([[[1,2],[3,4]],[[11,12],[13,14]]])

a
array([[[ 1,  2],
[ 3, 4]],

[[11, 12],
[13, 14]]])

 a有3个中括号,那么就有3条轴,从0开始到2,分别是axis=0,1,2
那么我要对a进行求和,分别用axis=0,1,2进行运行。
 
a.sum(axis=0)
得到:
array([[12, 14],
[16, 18]])
意思是去掉一个中括号,然后运行。
 
同理:
a.sum(axis=1)
对a去掉2个中括号,然后运行。
得到:
array([[ 4,  6],
[24, 26]])
那么对a.sum(axis=2)的结果呢?读者可以自己上机去尝试吧。
 
而轴的负数,axis=-3和axis=0的意思是一样的,对于有3层轴的数组来说的话。
 
a.sum(axis=-3)

array([[12, 14],
[16, 18]])

 
继续阅读 »
比如有数组:
a=np.array([[[1,2],[3,4]],[[11,12],[13,14]]])

a
array([[[ 1,  2],
[ 3, 4]],

[[11, 12],
[13, 14]]])

 a有3个中括号,那么就有3条轴,从0开始到2,分别是axis=0,1,2
那么我要对a进行求和,分别用axis=0,1,2进行运行。
 
a.sum(axis=0)
得到:
array([[12, 14],
[16, 18]])
意思是去掉一个中括号,然后运行。
 
同理:
a.sum(axis=1)
对a去掉2个中括号,然后运行。
得到:
array([[ 4,  6],
[24, 26]])
那么对a.sum(axis=2)的结果呢?读者可以自己上机去尝试吧。
 
而轴的负数,axis=-3和axis=0的意思是一样的,对于有3层轴的数组来说的话。
 
a.sum(axis=-3)

array([[12, 14],
[16, 18]])

  收起阅读 »

np.nonzero()的用法【numpy小白】

numpy函数返回非零元素的位置。

返回值为元组, 两个值分别为两个维度, 包含了相应维度上非零元素的目录值。
 
比如:
n1=np.array([0,1,0,0,0,0,1,0,0,0,0,0,0,1])
n1.nonzero()

返回的是:
(array([ 1,  6, 13], dtype=int64),)
注意上面是一个yu元组
要获取里面的值,需要用 n1.nonzero()[0] 来获取。
 
原创文章
转载请注明出处:
http://30daydo.com/article/466
 
继续阅读 »
numpy函数返回非零元素的位置。

返回值为元组, 两个值分别为两个维度, 包含了相应维度上非零元素的目录值。
 
比如:
n1=np.array([0,1,0,0,0,0,1,0,0,0,0,0,0,1])
n1.nonzero()

返回的是:
(array([ 1,  6, 13], dtype=int64),)
注意上面是一个yu元组
要获取里面的值,需要用 n1.nonzero()[0] 来获取。
 
原创文章
转载请注明出处:
http://30daydo.com/article/466
  收起阅读 »

ndarray和array的区别【numpy小白】

在numpy中,np.array()是一个函数,用法: 
np.array([1,2,3,4])
上面代码创建了一个对象,这个对象就是ndarray。 所以ndarray是一个类对象象,而array是一个方法。
 
原创文章
转载请注明出处:
http://30daydo.com/article/465
 
继续阅读 »
在numpy中,np.array()是一个函数,用法: 
np.array([1,2,3,4])
上面代码创建了一个对象,这个对象就是ndarray。 所以ndarray是一个类对象象,而array是一个方法。
 
原创文章
转载请注明出处:
http://30daydo.com/article/465
  收起阅读 »