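Articles - 30 Days of Trying New Things

Bullet Messenger (子弹短信) -- already taken down

Chit-chat

Searching for Bullet Messenger in the Smartisan (锤子) app store no longer turns up the app. Looks like it's dead.

[Screenshot: Smartisan app store search results, 2019-09-08]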


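A so-called junk startup: it doesn't want to pay for anything, saves wherever it can, is stingy to the extreme, and never spends money on anything it can get by trickery.

It's the first time I've come across a company like this.

What happened:

The company posts part-time job ads on Lagou (拉勾), adds you on WeChat and then, on the pretext of testing the applicant's skills, asks you to write a crawler for a website they want scraped, written as code for the third-party platform Shenjianshou (神箭手). That means they can take your code and run it directly on that platform to scrape the data they want.

Since I had crawled the site they wanted before, I simply sent them screenshots of the data. They then urgently asked me to rewrite it on Shenjianshou. At that point I was sure they just wanted to get something for nothing, so I blocked them.

Note: the person in charge is named 王锦锋.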


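Performance comparison: pypy vs python

python high performance

You don't know until you try, and the result is startling.
For a CPU-bound program like the one below, pypy3 ran about a hundred times faster than standard python (CPython).
talk is cheap, show me the code!

The code is simple; it just runs an addition
200 million times (LOOP = 2*10**8):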

import time

LOOP = 2*10**8


def add(x, y):
    return x + y


def cpu_pressure(loop):
    for i in range(loop):
        result = add(i, i+1)


if __name__ == '__main__':
    start = time.time()
    cpu_pressure(LOOP)
    print(f'time used {time.time()-start}s')

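Running with python:
python main.py
Output: time used 21.422261476516724s

Running with pypy:
pypy main.py
Output: time used 0.1925642490386963s

The gap is really large: about 110 times in this run. Keep in mind that this loop computes a value it never uses, which is easy work for PyPy's JIT compiler to optimize, so speedups on real programs are usually much smaller.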
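If you want to repeat the comparison, here is a slightly more careful variant (my own sketch, not the original benchmark). It times with time.perf_counter(), which is intended for measuring short intervals, and it sums the results so the loop's work is actually consumed. The loop count of 10**7 is an arbitrary choice; run the file once with python and once with pypy3 and compare the printed times.

```python
import time

LOOP = 10**7  # arbitrary; raise it for a longer run


def add(x, y):
    return x + y


def cpu_pressure(loop):
    # Sum the results so the work done in the loop is actually used.
    total = 0
    for i in range(loop):
        total += add(i, i + 1)
    return total


def benchmark(loop):
    # perf_counter() is a high-resolution clock intended for timing code.
    start = time.perf_counter()
    total = cpu_pressure(loop)
    return total, time.perf_counter() - start


if __name__ == '__main__':
    total, elapsed = benchmark(LOOP)
    print(f'total={total} time used {elapsed:.3f}s')
```

Since add(i, i+1) is 2i+1, the total after n iterations is n², which gives a quick sanity check that both interpreters did the same work.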


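scrapy source code analysis <1>: the entry point and how it runs

python crawler, scrapy source code

When you run the command scrapy crawl example, Scrapy runs the spider we wrote.
Let's walk through Scrapy's execution flow in the source code:

A scrapy command such as scrapy crawl is implemented by a Command class, and Scrapy calls its run() method. The Command below is a customized version (note the "My Defined" description) that runs every spider in the project: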

from scrapy.commands import ScrapyCommand


class Command(ScrapyCommand):

    requires_project = True

    def syntax(self):
        return '[options]'

    def short_desc(self):
        return 'Runs all of the spiders - My Defined'

    def run(self, args, opts):
        print('==================')
        print(type(self.crawler_process))
        # names of all spiders in the project (older Scrapy also exposed this as .spiders)
        spider_list = self.crawler_process.spider_loader.list()
        for name in spider_list:
            print('=================')
            print(name)
            self.crawler_process.crawl(name, **opts.__dict__)
        self.crawler_process.start()

Next, look at crawler_process. It is an attribute of ScrapyCommand that scrapy.cmdline.execute() sets to a CrawlerProcess instance, and CrawlerProcess is a subclass of CrawlerRunner.

The main job of the CrawlerRunner constructor is this:

    def __init__(self, settings=None):
        if isinstance(settings, dict) or settings is None:
            settings = Settings(settings)
        self.settings = settings
        self.spider_loader = _get_spider_loader(settings)  # build the spider loader
        self._crawlers = set()
        self._active = set()
        self.bootstrap_failed = False

1. Loading the spider loader from the settings

def _get_spider_loader(settings):
    cls_path = settings.get('SPIDER_LOADER_CLASS')
    # Our settings file doesn't define SPIDER_LOADER_CLASS, so the value comes from
    # Scrapy's default settings (see code block A below):
    # SPIDER_LOADER_CLASS = 'scrapy.spiderloader.SpiderLoader'
    loader_cls = load_object(cls_path)
    # load_object turns a dotted path into the object it names, i.e. the string
    # 'scrapy.spiderloader.SpiderLoader' becomes the SpiderLoader class.
    # The load_object code is shown in code block B below.
    return loader_cls.from_settings(settings.frozencopy())

Scrapy's default settings file, default_settings.py (scrapy/settings/default_settings.py):

# Code block A
# ... (some settings omitted)
SCHEDULER = 'scrapy.core.scheduler.Scheduler'
SCHEDULER_DISK_QUEUE = 'scrapy.squeues.PickleLifoDiskQueue'
SCHEDULER_MEMORY_QUEUE = 'scrapy.squeues.LifoMemoryQueue'
SCHEDULER_PRIORITY_QUEUE = 'scrapy.pqueues.ScrapyPriorityQueue'

SPIDER_LOADER_CLASS = 'scrapy.spiderloader.SpiderLoader'  # <- this is the value
SPIDER_LOADER_WARN_ONLY = False

SPIDER_MIDDLEWARES = {}

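The implementation of load_object (in scrapy.utils.misc):

# Code block B: exception handling removed for brevity
from importlib import import_module  # standard-library function that imports a module by name


def load_object(path):
    dot = path.rindex('.')
    module, name = path[:dot], path[dot+1:]
    # split the path into the module path and the object name
    mod = import_module(module)
    obj = getattr(mod, name)
    # fetch that object from the module
    return obj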

Testing it in IPython, with module = 'scrapy.spiderloader' and name = 'SpiderLoader':

In [33]: mod = import_module(module)

In [34]: mod
Out[34]: <module 'scrapy.spiderloader' from '/home/xda/anaconda3/lib/python3.7/site-packages/scrapy/spiderloader.py'>

In [35]: getattr(mod,name)
Out[35]: scrapy.spiderloader.SpiderLoader

In [36]: obj = getattr(mod,name)

In [37]: obj
Out[37]: scrapy.spiderloader.SpiderLoader

In [38]: type(obj)
Out[38]: type
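To experiment with this without installing Scrapy, here is a self-contained sketch of the same idea using only importlib. It is a simplified re-implementation for illustration, not Scrapy's actual code (Scrapy's version has more checks); the error messages are modeled on, not copied from, Scrapy's.

```python
from importlib import import_module


def load_object(path):
    """Return the object named by a dotted path such as 'json.dumps'."""
    if '.' not in path:
        raise ValueError(f"not a full dotted path: {path!r}")
    # split on the last dot: module path on the left, attribute name on the right
    module_path, name = path.rsplit('.', 1)
    module = import_module(module_path)
    try:
        return getattr(module, name)
    except AttributeError:
        raise NameError(f"Module {module_path!r} doesn't define any object named {name!r}")


if __name__ == '__main__':
    dumps = load_object('json.dumps')
    print(dumps({'a': 1}))                                # {"a": 1}
    print(type(load_object('collections.OrderedDict')))   # <class 'type'>
```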

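So in _get_spider_loader, loader_cls is SpiderLoader, and the function finally returns SpiderLoader.from_settings(settings.frozencopy()).
Next, look at SpiderLoader.from_settings:

    @classmethod
    def from_settings(cls, settings):
        return cls(settings)

It just creates an instance of the class itself, so we can go straight to __init__: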

from collections import defaultdict


class SpiderLoader(object):
    """
    SpiderLoader is a class which locates and loads spiders
    in a Scrapy project.
    """
    def __init__(self, settings):
        # The spider module names from settings; scrapy startproject generates this
        # for you. Look in your project's settings file: it is a list.
        self.spider_modules = settings.getlist('SPIDER_MODULES')
        self.warn_only = settings.getbool('SPIDER_LOADER_WARN_ONLY')
        self._spiders = {}
        self._found = defaultdict(list)
        self._load_all_spiders()  # load all spiders

The core is _load_all_spiders.
Let's go:

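    def _load_all_spiders(self):
        for name in self.spider_modules:
            # (the try/except around walk_modules is omitted here)
            # walk_modules imports the package and every module under it, recursively
            for module in walk_modules(name):
                # register each spider class found in the module in self._spiders
                self._load_spiders(module)
        self._check_name_duplicates()  # warn if several spiders share the same name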

Next, _load_spiders.
Its core is the following function, iter_spider_classes (from scrapy.utils.spider):

import inspect

import six


def iter_spider_classes(module):
    from scrapy.spiders import Spider

    for obj in six.itervalues(vars(module)):  # iterate over every top-level object in the module
        if inspect.isclass(obj) and \
           issubclass(obj, Spider) and \
           obj.__module__ == module.__name__ and \
           getattr(obj, 'name', None):  # a Spider subclass defined in this module, with a name attribute
            yield obj

This obj is the spider class we normally write.
So after all this analysis, we have finally arrived at our own spider classes.
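To see the four filter conditions in action without a Scrapy project, here is a sketch that builds a fake spider module in memory. The Spider class below is a stand-in for scrapy.spiders.Spider, and the module name myproject.spiders.example is made up; the filter itself follows the code above, with vars(module).values() in place of six.itervalues.

```python
import inspect
import types


class Spider:
    """Stand-in for scrapy.spiders.Spider so this sketch runs without Scrapy."""
    name = None


def iter_spider_classes(module, base=Spider):
    # Same four conditions as the Scrapy code above.
    for obj in vars(module).values():
        if (inspect.isclass(obj)
                and issubclass(obj, base)
                and obj.__module__ == module.__name__
                and getattr(obj, 'name', None)):
            yield obj


# Build a fake spider module in memory (the module name is made up).
mod = types.ModuleType('myproject.spiders.example')


class ExampleSpider(Spider):
    name = 'example'


class NoNameSpider(Spider):
    pass  # inherits name = None, so it is skipped


for cls in (ExampleSpider, NoNameSpider):
    cls.__module__ = mod.__name__  # pretend both classes were defined in mod
    setattr(mod, cls.__name__, cls)
mod.Spider = Spider  # imported base class: defined elsewhere, so skipped
mod.helper = 42      # not a class, so skipped

found = [c.__name__ for c in iter_spider_classes(mod)]
print(found)  # ['ExampleSpider']
```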

To be continued...

Original article
Please credit the source when reposting
http://30daydo.com/article/530


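Running GUI programs on a schedule with crontab

Linux

By default, a program started by cron cannot display any graphical window, because cron jobs run without the DISPLAY environment variable. Add this in front of the program:
export DISPLAY=:0;

* * * * * export DISPLAY=:0; gedit

Here is a Linux desktop reminder GUI program that periodically reminds you to take a break: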

import datetime

import pyautogui as pag


def neck_rest():
    # prompt() returns the text the user typed, or None if Cancel is pressed
    ret = pag.prompt("Rest! Protect your neck !")
    status = 'Rest' if ret == 'rest' else 'Failed to rest'
    now = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    # append one tab-separated line per reminder to the log file
    with open('neck_record.txt', 'a') as f:
        f.write(f'{now}\t{status}\n')


neck_rest()

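Save the program as task.py.
Then add a crontab entry (* * * * * runs it every minute):

* * * * * export DISPLAY=:0; python task.py
That's it. Cron starts jobs in your home directory with a minimal PATH, so if task.py or the python you want lives elsewhere, use absolute paths, e.g. * * * * * export DISPLAY=:0; /usr/bin/python3 /path/to/task.py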
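An alternative to exporting DISPLAY in the crontab line is to set it inside the script. This is a minimal sketch under two assumptions: the desktop session uses display :0, and, as far as I know, pyautogui on Linux connects to the X server when it is imported, so the variable has to be set before the import. ensure_display is a hypothetical helper name.

```python
import os


def ensure_display(default=':0'):
    """Set DISPLAY if it is missing (as it is under cron) and return the value in use.

    Call this before importing a GUI library such as pyautogui.
    """
    return os.environ.setdefault('DISPLAY', default)
```

At the top of task.py you would call ensure_display() and only then import pyautogui as pag. An existing DISPLAY, for example when you run the script by hand from a terminal, is left untouched.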


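Analyzing the provincial distribution of STAR Market companies so far with python

python quantitative trading, data analysis

The STAR Market (科创板) has been open for more than a month, and I wanted to see where the companies listed so far are based. Since the STAR Market belongs to the Shanghai Stock Exchange, my guess was that companies from the Jiangsu-Zhejiang area would dominate. After all, the Shenzhen and Shanghai exchanges are competing for listings, and Shenzhen wants to keep Shenzhen companies on its own main board, SME board or ChiNext.

First, get the market data with the tushare library:
In a python3 environment, run pip install tushare --upgrade. Make sure to upgrade, because older versions cannot fetch STAR Market data.
After installing, try import tushare as ts and check for errors. If there are none, the installation succeeded.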

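Next, fetch quotes for the whole market.

[Screenshot: fetching the whole-market quotes]
Look at the first 5 rows.
The quote data is now stored in df; next, analyze it.
Since the data covers the whole market, first pick out the STAR Market companies:

[Screenshot: filtering the STAR Market stocks]

This uses a regular expression that matches codes starting with 688.

Next, analyze where the companies are based:

[Screenshot: counting companies per region]

The value_counts function counts how many times each value appears in the column.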
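The screenshots with the original code are not available, so here is a minimal pandas sketch of the same two steps: a regex filter on the stock code, then value_counts on the region column. It assumes a DataFrame with a 'code' column of six-digit codes stored as strings and an 'area' column; those column names and the sample rows are made up for illustration and are not real market data.

```python
import pandas as pd


def star_market_area_counts(df):
    """Count STAR Market companies (codes starting with 688) per area."""
    star = df[df['code'].str.match(r'^688\d{3}$')]
    return star['area'].value_counts()


# Made-up rows for illustration only, not real market data.
sample = pd.DataFrame({
    'code': ['688901', '688902', '688903', '600901', '300901'],
    'area': ['Jiangsu', 'Jiangsu', 'Beijing', 'Shanghai', 'Shenzhen'],
})

counts = star_market_area_counts(sample)
print(counts)
```

value_counts(normalize=True) returns shares instead of counts, which is handy for statements like "over 80%".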

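Done! Pretty simple, right?

The regional distribution also roughly matches my guess: the Jiangsu-Zhejiang-Shanghai area accounts for half, and together with Beijing that is over 80% of STAR Market companies.

I'll post a python stock data analysis article every week.

Original article, reposting welcome
Please credit the source:
http://30daydo.com/article/528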


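Connecting to redis with python redis.StrictRedis.from_url

redis

Connect to redis using a URL:

import redis
r = redis.StrictRedis.from_url(url)

The url takes one of the following forms (redis:// is a plain TCP connection, rediss:// is an SSL/TLS connection, and unix:// is a Unix domain socket; the trailing number or db= selects the database):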

redis://[:password]@localhost:6379/0

rediss://[:password]@localhost:6379/0

unix://[:password]@/path/to/socket.sock?db=0
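To see which part of such a URL ends up as which connection setting, here is a stdlib-only sketch that splits the three formats with urllib.parse. It only illustrates the format: redis-py parses the URL itself inside from_url, and describe_redis_url is a hypothetical helper. The fallback port 6379 and db 0 mirror the usual Redis defaults.

```python
from urllib.parse import urlparse, parse_qs


def describe_redis_url(url):
    """Split a redis/rediss/unix URL into its parts (illustration only)."""
    u = urlparse(url)
    if u.scheme == 'unix':
        # unix://[:password]@/path/to/socket.sock?db=0
        db = int(parse_qs(u.query).get('db', ['0'])[0])
        return {'scheme': u.scheme, 'path': u.path, 'password': u.password, 'db': db}
    # redis://[:password]@host:port/db  or  rediss://...
    db = int(u.path.lstrip('/') or 0)
    return {'scheme': u.scheme, 'host': u.hostname, 'port': u.port or 6379,
            'password': u.password, 'db': db}


print(describe_redis_url('redis://:secret@localhost:6379/0'))
print(describe_redis_url('unix://:secret@/path/to/socket.sock?db=0'))
```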

Original article, please credit the source when reposting:
http://30daydo.com/article/527


mongodb: checking that an array field is not empty

mongodb

First insert some data:

db.test_tab.insert({array:[]})

db.test_tab.insert({array:[]})

db.test_tab.insert({array:[]})

db.test_tab.insert({array:[1,2,3,4,5]})

db.test_tab.insert({array:[1,2,3,4,5,6]})

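Use the following query to find documents whose array field exists and is not an empty list:

db.getCollection("test_tab").find({array:{$exists:true,$ne:[]}}); // field exists and is not []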


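Can't start jupyter notebook in an anaconda environment

data analysis

Running jupyter notebook
fails with:

    from . import (constants, error, message, context,

ImportError: DLL load failed: The specified module could not be found.

However, it starts fine from Anaconda Navigator, so the problem is the environment.
Switch to the anaconda environment (open Anaconda Prompt from the Start menu) and run jupyter notebook from that command line; it then works normally. Presumably this is because Anaconda Prompt activates the environment, which puts its DLL directories on PATH, while a plain command prompt does not.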


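The most important thing in investing is to see clearly who is on the other side of your trade.

investing

In other words, to see clearly where your money actually comes from.

Take claw machines, for example. There is luck and there is skill, but from my experience of winning twenty-odd plushies, what matters most is where you play.
In my experience, the machines outside neighborhood supermarkets are the easiest. Why?

1. Rent is low, so the machine costs little to run, so the claw's grip isn't set so weak. That's the fundamentals.

2. Around residential neighborhoods the players are mostly old people and kids, foot traffic is low, and skill is poor, so naturally few prizes get won.

So just find a guide online, play a few more times, and you can basically break even. As for the machines outside Walmart, or inside cinemas and shopping malls, they are basically very hard to win: weak claws, big plushies, and sparsely stocked.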


alias: no space after the equals sign

Linux

alias sync="git commit -m 'update' -a && git push origin master"
alias fetch="git fetch origin"
alias dj="python manage.py runserver 0.0.0.0"
alias py2="python2"
alias py3="python3"
alias ggg="cd ~/git"


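redis health_check_interval parameter doesn't work

redis

Because the subscriber sits in a blocking loop listening for the redis publisher, after a long time the redis connection drops or the network is interrupted. The subscriber then hangs forever waiting to receive, and never gets anything the publisher sends afterwards.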

import sys

import redis


# helper
class RedisHelp(object):

    def __init__(self, channel):
        # self.pool = redis.ConnectionPool('10.18.6.46', port=6379)
        # self.conn = redis.Redis(connection_pool=self.pool)
        # The pool-based version above didn't work with publish/subscribe for me.
        # Likely cause: ConnectionPool's first positional argument is connection_class,
        # not the host; it should be ConnectionPool(host='10.18.6.46', port=6379).
        self.conn = redis.Redis(host='10.18.6.46')
        self.publish_channel = channel
        self.subscribe_channel = channel

    def publish(self, msg):
        self.conn.publish(self.publish_channel, msg)  # 1. channel name, 2. message

    def subscribe(self):
        self.pub = self.conn.pubsub()
        self.pub.subscribe(self.subscribe_channel)
        self.pub.parse_response()  # consume the subscribe confirmation
        print('initial')
        return self.pub


helper = RedisHelp('cuiqingcai')

# subscriber
if sys.argv[1] == 's':
    print('in subscribe mode')
    pub = helper.subscribe()
    while 1:
        print('waiting for publish')
        pub.check_health()  # PubSub.check_health() only exists in redis-py >= 3.3
        msg = pub.parse_response()  # blocks until a message arrives
        s = str(msg[2], encoding='utf-8')
        print(s)
        if s == 'exit':
            break

# publisher
elif sys.argv[1] == 'p':
    print('in publish mode')
    msg = sys.argv[2]
    print(f'msg -> {msg}')
    helper.publish(msg)

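The official documentation says to use this parameter:
health_check_interval=30  # run a health check (heartbeat) every 30s

In my setup, however, the parameter could not be used: self.conn = redis.Redis(host='10.18.6.46', health_check_interval=30) did not work. Note that health_check_interval was only added in redis-py 3.3.0, so older client versions reject it.

This is discussed in the redis-py GitHub issues:
https://github.com/andymccurdy/redis-py/issues/1199

So I switched to a list with a blocking pop and a timeout:
data = client.blpop('key', timeout=300)
If nothing arrives within 300s, blpop times out and data is None; then just start listening again. This replaces pub/sub, so the publisher must push onto the list (for example with rpush) instead of calling publish().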
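Here is a minimal sketch of that BLPOP pattern as a Python generator. It assumes a redis-py style client whose blpop(key, timeout=...) returns a (key, value) pair, or None when the timeout expires, and it reuses the original script's 'exit' stop message. consume and FakeClient are hypothetical names; FakeClient only stands in for redis.Redis so the loop can run without a server.

```python
def consume(client, key, timeout=300, max_idle_rounds=None):
    """Yield decoded messages pushed onto a redis list, re-listening after each timeout.

    max_idle_rounds=None loops forever; a number stops after that many
    consecutive empty timeouts.
    """
    idle = 0
    while max_idle_rounds is None or idle < max_idle_rounds:
        item = client.blpop(key, timeout=timeout)  # (key, value) or None on timeout
        if item is None:
            idle += 1
            continue  # nothing arrived within `timeout` seconds: listen again
        idle = 0
        _, value = item
        text = value.decode('utf-8')
        if text == 'exit':  # same stop message as the pub/sub script
            return
        yield text


class FakeClient:
    """Stand-in for redis.Redis: replays canned blpop() results, then times out."""

    def __init__(self, replies):
        self.replies = list(replies)

    def blpop(self, key, timeout=0):
        return self.replies.pop(0) if self.replies else None
```

With a real server, the consumer would look like for msg in consume(redis.Redis(host='10.18.6.46'), 'cuiqingcai'): print(msg), and the publisher would call rpush('cuiqingcai', msg) on its client.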


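mongodb: renaming a field inside a nested document

mongodb

In mongodb, the syntax for renaming a field is

db.test.updateMany({},{$rename:{'old_field':'new_field'}})

The empty filter {} matches every document, and updateMany applies the rename to all of them (a plain update() without the multi option only changes the first matching document).

For example:

db.getCollection('example').updateMany({},{$rename:{'corp':'企业'}})

This renames the field corp to 企业 ("company").

What about a nested field?
Say corp is an embedded document: { 'address':'USA', 'phone':'12345678' }

To rename address inside it to 地址 ("address"):

db.getCollection('example').updateMany({},{$rename:{'corp.address':'corp.地址'}})

Original article, please credit the source when reposting
Original link: http://30daydo.com/article/521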


Using random.randint

python

Using random.randint:

from random import randint

randint(0,1)
Out[25]: 1

randint(0,1)
Out[26]: 1

randint(0,1)
Out[27]: 1

randint(0,1)
Out[28]: 1

randint(0,1)
Out[29]: 0

randint(0,1)
Out[30]: 1

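random.randint(a, b)

returns a random integer N with a <= N <= b, so both endpoints a and b can come out, as well as every integer in between.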
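A quick way to convince yourself that both endpoints are included is to draw many values and look at which ones show up. This sketch uses a seeded random.Random instance so the run is repeatable, and contrasts randint with randrange, which, like range, excludes its upper bound (randint(a, b) is documented as equivalent to randrange(a, b+1)).

```python
import random

rng = random.Random(0)  # seeded generator so the run is repeatable

# randint(a, b) includes both endpoints...
inclusive = {rng.randint(0, 2) for _ in range(1000)}
# ...while randrange(a, b) stops before b, like range().
exclusive = {rng.randrange(0, 2) for _ in range(1000)}

print(sorted(inclusive))  # [0, 1, 2]
print(sorted(exclusive))  # [0, 1]
```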


Fixing the error "/bin/sh: <command>: not found" when running a shell command from python

python

import subprocess

file = 'test.txt'
cmd = f'rsync -av {file} root@10.18.6.46:/home/cjw/'

p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, executable="/bin/bash")
output, error = p.communicate()
if p.returncode != 0:
    print("Error while running - %s" % cmd)
    print(error)
print(output)

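Run from Sublime Text 3, it kept failing with:
/bin/sh: 1: rsync: not found
It turned out to be a problem with the environment (including PATH) that Sublime Text 3 runs scripts in: running the same code directly from a shell with python main.py works fine.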
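One way to make the script independent of the PATH the editor passes down is to resolve the program yourself and call it without a shell. This is a sketch, not the original fix: find_executable is a hypothetical helper, and the fallback directories /usr/bin and /usr/local/bin are assumptions about where rsync is usually installed.

```python
import os
import shutil
import subprocess


def find_executable(name, fallback_dirs=('/usr/bin', '/usr/local/bin')):
    """Return an absolute path for `name`, looking in PATH first, then in fallback_dirs."""
    path = shutil.which(name)
    if path is None:
        # The editor's PATH may be minimal, so also try common install locations.
        path = shutil.which(name, path=os.pathsep.join(fallback_dirs))
    if path is None:
        raise FileNotFoundError(f'{name} not found in PATH or {fallback_dirs}')
    return path


def run_rsync(src, dest):
    rsync = find_executable('rsync')
    # An argument list without shell=True means /bin/sh never has to look rsync up.
    result = subprocess.run([rsync, '-av', src, dest],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    if result.returncode != 0:
        print(f'Error while running rsync: {result.stderr.decode(errors="replace")}')
    return result.returncode
```

Usage would be run_rsync('test.txt', 'root@10.18.6.46:/home/cjw/').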


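Errata for python并行编程手册

books

python并行编程手册, Chinese edition

Page 65, spawning processes: p.join() must not be placed inside the loop. Otherwise it blocks the creation of the next process, because each iteration waits at join until the current process finishes, so the processes run one after another instead of in parallel.

It can be changed to this: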

# start all six processes first...
p0 = multiprocessing.Process(name=str(0), target=foo, args=(0,))
p0.start()
p1 = multiprocessing.Process(name=str(1), target=foo, args=(1,))
p1.start()
p2 = multiprocessing.Process(name=str(2), target=foo, args=(2,))
p2.start()
p3 = multiprocessing.Process(name=str(3), target=foo, args=(3,))
p3.start()
p4 = multiprocessing.Process(name=str(4), target=foo, args=(4,))
p4.start()
p5 = multiprocessing.Process(name=str(5), target=foo, args=(5,))
p5.start()

# ...then wait for all of them to finish
p0.join()
p1.join()
p2.join()
p3.join()
p4.join()
p5.join()

And later I found that the whole book has this problem.
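The same fix reads more naturally as two loops: start every process first, then join them all. A sketch; foo here is a stand-in worker for illustration, not the book's code.

```python
import multiprocessing


def foo(i):
    # Stand-in worker for illustration.
    print(f'called function in process: {i}')


def run_all(n):
    processes = []
    for i in range(n):
        p = multiprocessing.Process(name=str(i), target=foo, args=(i,))
        p.start()  # start every process before waiting on any of them
        processes.append(p)
    for p in processes:
        p.join()  # now wait for all of them
    return [p.exitcode for p in processes]


if __name__ == '__main__':
    print(run_all(6))  # [0, 0, 0, 0, 0, 0]
```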


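mongodb: find() usually returns documents in the same order every time

mongodb

As long as the find query stays the same, the documents usually come back in the same order. MongoDB does not guarantee any order without sort(), though: the natural order is an implementation detail and can change, so add .sort() whenever the order matters.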

[Articles to save]

Chit-chat

Since I'm on a Raspberry Pi and can't launch a note-taking application, I'm using this web page to save article links for later.

https://www.jisilu.cn/question/321759 -Done
https://www.80shihua.com/archives/1590 -Done


Raspberry Pi 2: install or upgrade to Python 3.6

raspberrypi

Since there is no Chinese input method on my Raspberry Pi, I can only write in English.

Raspberry Pi ships with python2.7 and python3.4, but I want to upgrade to python3.6+.

Python 3.6 supports some new features, such as f-strings (print(f'{name}')) and underscores in numeric literals (x=1_000_242_200).
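A tiny check of the two features mentioned above; run it with the newly built python3 (python3.4 rejects both with a SyntaxError). The variable values are just examples.

```python
name = 'raspberrypi'
x = 1_000_242_200  # underscores in numeric literals (PEP 515), Python 3.6+

print(f'{name}')   # f-string (PEP 498), Python 3.6+ -> raspberrypi
print(x)           # -> 1000242200
print(f'{x:,}')    # format spec inside an f-string -> 1,000,242,200
```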

How to upgrade?

$ wget https://www.python.org/ftp/pyt ... 1.tgz

$ tar zxvf Python-3.6.1.tgz
$ cd Python-3.6.1

then run:

$ ./configure && make && sudo make install

Only the install step needs sudo. Wait for about 20 minutes (the Raspberry Pi is slow :( ).

Then run:
python3

and it will use the new python3.6:

Python 3.6.1 (default, Jul 21 2019, 14:26:28)
[GCC 4.9.2] on linux
Type "help", "copyright", "credits" or "license" for more information.

Enjoy it !


frontera: running link_follower.py fails with: doesn't define any object named 'FIFO'

frontera

The code:

from __future__ import print_function

import re

import requests

from frontera.contrib.requests.manager import RequestsFrontierManager
# from frontera.contrib.requests.manager import RequestsFrontierManager
from frontera import Settings

from six.moves.urllib.parse import urljoin


SETTINGS = Settings()
SETTINGS.BACKEND = 'frontera.contrib.backends.memory.FIFO'
# SETTINGS.BACKEND = 'frontera.contrib.backends.memory.MemoryDistributedBackend'

SETTINGS.LOGGING_MANAGER_ENABLED = True
SETTINGS.LOGGING_BACKEND_ENABLED = True
SETTINGS.MAX_REQUESTS = 100
SETTINGS.MAX_NEXT_REQUESTS = 10

SEEDS = [
    'http://www.imdb.com',
]

LINK_RE = re.compile(r'<a.+?href="(.*?)".?>', re.I)


def extract_page_links(response):
    return [urljoin(response.url, link) for link in LINK_RE.findall(response.text)]


if __name__ == '__main__':

    frontier = RequestsFrontierManager(SETTINGS)
    frontier.add_seeds([requests.Request(url=url) for url in SEEDS])
    while True:
        next_requests = frontier.get_next_requests()
        if not next_requests:
            break
        for request in next_requests:
            try:
                response = requests.get(request.url)
                links = [
                    requests.Request(url=url)
                    for url in extract_page_links(response)
                ]
                frontier.page_crawled(response)
                print('Crawled', response.url, '(found', len(links), 'urls)')

                if links:
                    frontier.links_extracted(request, links)
            except requests.RequestException as e:
                error_code = type(e).__name__
                frontier.request_error(request, error_code)
                print('Failed to process request', request.url, 'Error:', e)

Whether I use py2 or py3, it fails with the following error:

raise NameError("Module '%s' doesn't define any object named '%s'" % (module, name))

NameError: Module 'frontera.contrib.backends.memory' doesn't define any object named 'FIFO'
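Separate from the FIFO error, one thing worth knowing about this script: the LINK_RE pattern is loose. When other attributes follow href, as in <a href="/x" class="y">, the lazy (.*?) runs past the closing quote and captures /x" class="y. Below is a slightly stricter sketch (still a regex, not an HTML parser) that only captures up to the next double quote. It takes the base URL and HTML text directly instead of a response object so it runs offline, and uses the standard library's urljoin instead of six.moves.

```python
import re
from urllib.parse import urljoin

# href value = everything up to the next double quote, inside an <a ...> tag
LINK_RE = re.compile(r'<a\s[^>]*?href="([^"]*)"', re.I)


def extract_page_links(base_url, html):
    """Return absolute URLs for every <a href="..."> found in html."""
    return [urljoin(base_url, link) for link in LINK_RE.findall(html)]


html = '<a href="/x" class="y">X</a> <A class="z" HREF="http://example.org/b">B</A>'
print(extract_page_links('http://example.com/', html))
# ['http://example.com/x', 'http://example.org/b']
```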


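scrapy-rabbitmq doesn't support python3 [modifying the source to make it work]

python crawler

The official version hasn't been updated since 2015.
Running it on python3 raises errors.

The following places need to be changed:

To be continued...

scrapy rabbitmq distributed crawler

rabbitmq, python crawler

If you haven't used rabbitmq before, this article is a good introduction: https://blog.csdn.net/hellozpc/article/details/81436980
rabbitmq is a good message queue service and can be paired with scrapy as its message queue.

Here is a simple demo: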

import re
import requests
import scrapy
from scrapy import Request
from rabbit_spider import settings
import logging
import json
from rabbit_spider.items import RabbitSpiderItem
import datetime
from scrapy.selector import Selector
import pika

# scrapy.log is deprecated; use the standard logging module instead
logger = logging.getLogger(__name__)

# from scrapy_rabbitmq.spiders import RabbitMQMixin
# from scrapy.contrib.spiders import CrawlSpider


class Website(scrapy.Spider):
    name = "rabbit"

    def start_requests(self):
        headers = {'Accept': '*/*',
                   'Accept-Encoding': 'gzip, deflate, br',
                   'Accept-Language': 'en-US,en;q=0.9,zh-CN;q=0.8,zh;q=0.7',
                   'Host': '36kr.com',
                   'Referer': 'https://36kr.com/information/web_news',
                   'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.108 Safari/537.36'
                   }
        url = 'https://36kr.com/information/web_news'
        yield Request(url=url,
                      headers=headers)

    def parse(self, response):
        credentials = pika.PlainCredentials('admin', 'admin')
        connection = pika.BlockingConnection(pika.ConnectionParameters('192.168.1.101', 5672, '/', credentials))
        channel = connection.channel()
        channel.exchange_declare(exchange='direct_log', exchange_type='direct')
        # exclusive queue with a server-generated name, deleted when the connection closes
        result = channel.queue_declare(exclusive=True, queue='')
        queue_name = result.method.queue
        # print(queue_name)
        # infos = sys.argv[1:] if len(sys.argv)>1 else ['info']
        info = 'info'
        # bind the queue to the exchange (call queue_bind once per routing key to bind several)
        channel.queue_bind(
            exchange='direct_log',
            routing_key=info,
            queue=queue_name
        )
        print('start to receive [{}]'.format(info))
        channel.basic_consume(
            on_message_callback=self.callback_func,
            queue=queue_name,
            auto_ack=True,
        )
        # start_consuming() blocks, so parse() does not return while consuming
        channel.start_consuming()

    def callback_func(self, ch, method, properties, body):
        print(body)

Start the spider:

from scrapy import cmdline

cmdline.execute('scrapy crawl rabbit'.split())

Then push data into RabbitMQ:

import pika

credentials = pika.PlainCredentials('admin', 'admin')
connection = pika.BlockingConnection(
    pika.ConnectionParameters('192.168.1.101', 5672, '/', credentials))
channel = connection.channel()
# 'direct' delivers by exact routing key; a 'fanout' exchange would broadcast to every bound queue
channel.exchange_declare(exchange='direct_log', exchange_type='direct')

routing_key = 'info'
message = 'https://36kr.com/pp/api/aggregation-entity?type=web_latest_article&b_id=59499&per_page=30'
channel.basic_publish(
    exchange='direct_log',
    routing_key=routing_key,
    body=message,
)
print('sending message {}'.format(message))
connection.close()

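As soon as the message is published, Scrapy receives it from the queue.
Note that the code that waits on the queue cannot go in start_requests: start_requests must return an iterable of Requests (normally it is a generator), otherwise the program raises an error.

To be continued...
###### 2019-08-29 update ###################
Found a pitfall: once RabbitMQ delivers a message, you cannot use yield (a generator) inside the callback function.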
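Why this happens can be shown without a broker: pika calls on_message_callback(ch, method, properties, body) and ignores its return value, while a function containing yield only returns a generator object when called, so its body never runs. The sketch below is a plain-Python illustration; fake_dispatch and FakeChannel are hypothetical stand-ins, not pika APIs. It also sketches one possible workaround (an assumption, not the author's method): pull messages with channel.basic_get(), which returns immediately instead of waiting, from code that Scrapy does iterate, such as start_requests, and yield Requests there. That assumes a named queue (hypothetically 'url_queue') that already holds the URLs, rather than the temporary exclusive queue in the demo, because messages published before any queue is bound are dropped.

```python
received = []


def generator_callback(ch, method, properties, body):
    received.append(body)   # never runs: calling this only creates a generator
    yield body


def plain_callback(ch, method, properties, body):
    received.append(body)   # runs immediately on each delivery


def fake_dispatch(callback, body):
    """Hypothetical stand-in for pika's dispatcher: call, then discard the result."""
    callback(None, None, None, body)


def drain_urls(channel, queue_name):
    """Yield every message currently waiting in queue_name as a str URL."""
    while True:
        method, properties, body = channel.basic_get(queue=queue_name, auto_ack=True)
        if method is None:  # pika returns (None, None, None) when the queue is empty
            break
        yield body.decode('utf-8')


class FakeChannel:
    """Hypothetical in-memory stand-in for pika's BlockingChannel (demo only)."""

    def __init__(self, bodies):
        self._bodies = list(bodies)

    def basic_get(self, queue, auto_ack=False):
        if self._bodies:
            return ('method-frame', None, self._bodies.pop(0))
        return (None, None, None)


fake_dispatch(generator_callback, b'first')
fake_dispatch(plain_callback, b'second')
print(received)  # only b'second' was recorded

urls = list(drain_urls(FakeChannel([b'https://example.com/a', b'https://example.com/b']), 'url_queue'))
print(urls)
```

In a spider, start_requests could then contain `for url in drain_urls(channel, 'url_queue'): yield Request(url=url)`; the spider would still close once the queue is empty unless something schedules more work.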


exchange_declare() got an unexpected keyword argument 'type'

rabbitmq python

In newer versions of pika, exchange_declare() takes the keyword exchange_type instead of type:

import pika

credentials = pika.PlainCredentials('admin', 'admin')
connection = pika.BlockingConnection(
    pika.ConnectionParameters('192.168.1.101', 5672, '/', credentials))
channel = connection.channel()
channel.exchange_declare(exchange='logs', exchange_type='fanout')


twisted's getPage is deprecated; the new interface is twisted.web.client.Agent

twisted

Twisted-16.7.0 is coming soon, and it deprecates twisted.web.client.getPage (and client.HTTPClientFactory). We use these in some of the unit tests, to fetch one of the HTTP WAPI/WUI pages and make sure the contents look right.

We need to change these tests to use twisted.web.client.Agent instead, or a package named "treq", which is a Twisted flavor of the excellent (but blocking) requests library.


twisted reactor still won't stop after adding an addBoth callback

python

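The code:

from scrapy.selector import Selector
from twisted.internet import defer, reactor
from twisted.web.client import getPage  # deprecated since Twisted 16.7.0


def get_response_callback(content):
    txt = str(content, encoding='utf-8')
    resp = Selector(text=txt)
    title = resp.xpath('//title/text()').extract_first()
    print(title)


@defer.inlineCallbacks
def task():
    url = 'http://www.baidu.com'
    d = getPage(url.encode('utf-8'))
    d.addCallback(get_response_callback)
    yield d


def done():
    # Takes no arguments, but addBoth() passes the result in
    reactor.stop()


def done1(*args, **kwargs):
    # Accepts and ignores whatever addBoth() passes in
    reactor.stop()


task_list = []
for i in range(4):
    d = task()
    task_list.append(d)

dd = defer.DeferredList(task_list)

dd.addBoth(done)  # hangs; dd.addBoth(done1) exits normally

reactor.run()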

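The code above never stops when it uses
dd.addBoth(done)

where done is defined without any parameters.

Using the other function, done1(*args, **kwargs), which accepts arguments, lets the program exit normally; both of them just call reactor.stop(). The reason is that addBoth() always calls its callback with the Deferred's result (here the DeferredList's list of results, or a Failure) as the first argument. Calling the zero-argument done that way raises a TypeError inside the Deferred chain, so reactor.stop() is never reached and the reactor keeps running.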

Original article
Please credit the source when reposting:
http://30daydo.com/article/509


How to use numpy indices

Quantitative analysis

Suppose you have a matrix M whose (i,j)-th element equals

M_ij = 2*i + 3*j

One way to define this matrix would be

i, j = np.indices((2,3))
M = 2*i + 3*j

which yields

array([[0, 3, 6],
       [2, 5, 8]])

In other words, np.indices returns arrays which can be used as indices. The elements in i indicate the row index:

In [12]: i
Out[12]:
array([[0, 0, 0],
       [1, 1, 1]])

The elements in j indicate the column index:

In [13]: j
Out[13]:
array([[0, 1, 2],
       [0, 1, 2]])

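That is the explanation from Stack Overflow. In short:

np.indices((2,3))

returns the row index and the column index of every position in a 2×3 grid, and these index arrays make it quick to build 2-D data.

For example, to draw a circle:
import numpy as np

img = np.zeros((400,400))
ir,ic = np.indices(img.shape)
circle = (ir-135)**2+(ic-150)**2 < 30**2 # radius 30, centred at row 135, column 150
img[circle]=1

img now holds a filled circle: pixels inside it are 1, everything else is 0.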
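A short check of both examples, plus two related NumPy tools: np.fromfunction, which passes the same index grids to a function, and np.ogrid, which returns "open" grids of shape (400, 1) and (1, 400) that broadcast to the same mask without materialising two full 400×400 index arrays. The pixel count only approximates π·30² because the circle is drawn on a discrete grid.

```python
import numpy as np

# np.indices builds full index grids: i[r, c] == r and j[r, c] == c
i, j = np.indices((2, 3))
M = 2 * i + 3 * j

# The same matrix built by np.fromfunction from the index grids
M2 = np.fromfunction(lambda r, c: 2 * r + 3 * c, (2, 3), dtype=int)

# Circle mask from the post: radius 30, centre at row 135, column 150
img = np.zeros((400, 400))
ir, ic = np.indices(img.shape)
circle = (ir - 135) ** 2 + (ic - 150) ** 2 < 30 ** 2
img[circle] = 1

# Same mask via broadcasting open grids
orow, ocol = np.ogrid[:400, :400]
circle_ogrid = (orow - 135) ** 2 + (ocol - 150) ** 2 < 30 ** 2

print(M)
print(int(img.sum()), 'pixels set; pi * 30**2 is about', round(np.pi * 30 ** 2, 1))
```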


How to use cv2.distanceTransform in Python

opencv

distanceTransform

Calculates the distance to the closest zero pixel for each pixel of the source image.

Python: cv2.distanceTransform(src, distanceType, maskSize[, dst]) → dst

Python: cv.DistTransform(src, dst, distance_type=CV_DIST_L2, mask_size=3, mask=None, labels=None) → None

Parameters:

src – 8-bit, single-channel (binary) source image.

dst – Output image with calculated distances. It is a 32-bit floating-point, single-channel image of the same size as src.

distanceType – Type of distance. It can be CV_DIST_L1, CV_DIST_L2, or CV_DIST_C.

maskSize – Size of the distance transform mask. It can be 3, 5, or CV_DIST_MASK_PRECISE (the latter option is only supported by the first function). In case of the CV_DIST_L1 or CV_DIST_C distance type, the parameter is forced to 3 because a 3×3 mask gives the same result as 5×5 or any larger aperture.

labels – Optional output 2D array of labels (the discrete Voronoi diagram). It has the type CV_32SC1 and the same size as src. See the details below.

labelType – Type of the label array to build. If labelType==DIST_LABEL_CCOMP then each connected component of zeros in src (as well as all the non-zero pixels closest to the connected component) will be assigned the same label. If labelType==DIST_LABEL_PIXEL then each zero pixel (and all the non-zero pixels closest to it) gets its own label.

The functions distanceTransform calculate the approximate or precise distance from every binary image pixel to the nearest zero pixel. For zero image pixels, the distance is zero.

When maskSize == CV_DIST_MASK_PRECISE and distanceType == CV_DIST_L2, the function runs the algorithm described in [Felzenszwalb04]. This algorithm is parallelized with the TBB library.

In other cases, the algorithm [Borgefors86] is used. This means that for a pixel the function finds the shortest path to the nearest zero pixel consisting of basic shifts: horizontal, vertical, diagonal, or knight's move (the latter is available only for a 5×5 mask). The overall distance is calculated as a sum of these basic distances. Since the distance function should be symmetric, all the horizontal and vertical shifts must have the same cost (denoted a), all the diagonal shifts must have the same cost (denoted b), and all knight's moves must have the same cost (denoted c). For the CV_DIST_C and CV_DIST_L1 types, the distance is calculated precisely, whereas for CV_DIST_L2 (Euclidean distance) the distance can be calculated only with a relative error (a 5×5 mask gives more accurate results). For a, b, and c, OpenCV uses the values suggested in the original paper:



Distance type   Mask   Costs
CV_DIST_C       3×3    a = 1, b = 1
CV_DIST_L1      3×3    a = 1, b = 2
CV_DIST_L2      3×3    a = 0.955, b = 1.3693
CV_DIST_L2      5×5    a = 1, b = 1.4, c = 2.1969

Typically, a 3×3 mask is used for a fast, coarse CV_DIST_L2 estimate. For a more accurate CV_DIST_L2 estimate, a 5×5 mask or the precise algorithm is used. Note that both the precise and the approximate algorithms are linear in the number of pixels.



The second variant of the function not only computes the minimum distance for each pixel (x, y) but also identifies the nearest connected component consisting of zero pixels (labelType==DIST_LABEL_CCOMP) or the nearest zero pixel (labelType==DIST_LABEL_PIXEL). The index of the component/pixel is stored in labels(x, y). When labelType==DIST_LABEL_CCOMP, the function automatically finds connected components of zero pixels in the input image and marks them with distinct labels. When labelType==DIST_LABEL_PIXEL, the function scans through the input image and marks all the zero pixels with distinct labels.



In this mode, the complexity is still linear. That is, the function provides a very fast way to compute the Voronoi diagram for a binary image. Currently, the second variant can use only the approximate distance transform algorithm, i.e. maskSize=CV_DIST_MASK_PRECISE is not supported yet.



Note

An example on using the distance transform can be found at opencv_source_code/samples/cpp/distrans.cpp

(Python) An example on using the distance transform can be found at opencv_source/samples/python2/distrans.py


Django version incompatibility error: AuthenticationMiddleware

django

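Django 2.2

ERRORS:

?: (admin.E408) 'django.contrib.auth.middleware.AuthenticationMiddleware' must be in MIDDLEWARE in order to use the admin application.

It worked on the previous version; the error appeared right after upgrading.
Workaround: downgrade Django:

pip install django==2.1.7

PS: Django's version compatibility really is a big problem. Upgrade Django one day without thorough testing, and the consequences can be disastrous.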
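Before downgrading, it may be worth checking the setting that the check actually reads. The error says the admin looks for AuthenticationMiddleware in MIDDLEWARE. A project that still configures middleware through the older MIDDLEWARE_CLASSES setting, or that lists only a custom subclass of AuthenticationMiddleware, could plausibly trigger it; which of these applies here is an assumption, since the post does not show its settings. For comparison, this is the default MIDDLEWARE list that django-admin startproject generates in Django 2.x:

```python
# settings.py (fragment): the default Django 2.x MIDDLEWARE list
MIDDLEWARE = [
    'django.middleware.security.SecurityMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.contrib.messages.middleware.MessageMiddleware',
    'django.middleware.clickjacking.XFrameOptionsMiddleware',
]
```

Order matters here: AuthenticationMiddleware relies on the session, so it has to come after SessionMiddleware.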


PhantomJS fails to run on Win10 [version compatibility issue]

python crawler

It used to run fine on Win7.
On Win10 it fails with:
selenium.common.exceptions.WebDriverException: Message: Service C:\Tool\phantomjs-2.5.0-beta2-windows\phantomjs-2.5.0-beta2-windows\bin\phantomjs.exe unexpectedly exited. Status code was: 4294967295

Switching to an older release solved the problem.
Older release: phantomjs-2.1.1-windows

Original article
Please credit the source when reposting:
http://30daydo.com/article/505


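Is numpy's std the sample standard deviation?

Quantitative trading

A quick test:

import numpy as np

# Check which standard deviation numpy computes
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
X = np.array(x)

X.mean()
5.5

X.std()  # standard deviation
2.8722813232690143

Computing it by hand:

def my_fangca(X):
    """Return (variance, standard deviation), dividing by N."""
    n = len(X)
    mean = X.mean()
    sum_ = 0
    for i in X:
        sum_ += (i - mean) ** 2
    var_ = sum_ / n
    std_ = var_ ** 0.5
    return var_, std_

result = my_fangca(X)
The result:

(8.25, 2.8722813232690143)

So numpy's std is the population standard deviation (it divides by N), not the sample standard deviation (which divides by N - 1).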
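NumPy's std and var take a ddof ("delta degrees of freedom") argument: the divisor is N - ddof, and ddof defaults to 0. A short check on the same data 1 to 10, cross-checked against Python's statistics module, where pstdev divides by N and stdev by N - 1:

```python
import statistics

import numpy as np

X = np.arange(1, 11)         # the same data: 1, 2, ..., 10

pop_std = X.std()            # ddof=0 (default): divide by N
sample_std = X.std(ddof=1)   # ddof=1: divide by N - 1 (Bessel's correction)

# Cross-check against the standard library
print(pop_std, statistics.pstdev(range(1, 11)))    # population standard deviation
print(sample_std, statistics.stdev(range(1, 11)))  # sample standard deviation
```

pandas works the other way round: Series.std() and DataFrame.std() default to ddof=1, so the same data gives different numbers in NumPy and pandas unless ddof is set explicitly.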

