How Much Has 元卫南 on Xueqiu Raked In from Tips? A Python Crawler Example

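********* Update 2019-08-18 ***********

I re-ran the crawler today. 元卫南's popularity has surged this year: from the start of 2019 until now he has received 31,851.6 yuan in tips. That is not a large sum, but it is already more than all the tips he received before 2019 combined. For the detailed analysis, see http://30daydo.com/article/362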

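********* Update 2019-08-05 ***********

I wrote this article last year and did not expect it to take off on Xueqiu recently. I will follow up with the latest data and also dig into the tips received by 释老毛.

元卫南 posts on Xueqiu every single day, vividly depicting the daily life of a retail investor, which makes readers sigh at how helpless and sad that life can be. His posts also show the pressure that heavy leverage puts on everyday life, with borrowed money keeping his stock positions alive.

元卫南's Xueqiu page: https://xueqiu.com/u/2227798650

People keep questioning whether 元卫南 writes in order to cash in on his followers through tips. I thought so too at first: quite a few people tip tens of yuan or even a hundred, and with more than a hundred thousand followers the daily income should be considerable. So out of curiosity I crawled all of his column articles and fetched the tip amounts for each one, to find out how much he has actually earned from tips.

Let's roll up our sleeves. There is not much code. It runs under Python 3. I have removed my personal information from the headers, so to run it on your own machine just add your own headers and cookie.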

# -*- coding: utf-8 -*-
# @Time : 2018/10/23 9:26
# @File : money_reward.py
import datetime
import time
from collections import OrderedDict

import pymongo
import requests

import config  # local module; must define PROXY (host of the proxy service)

session = requests.Session()

# Assumption: the original post never shows how `db` is created. A local
# MongoDB instance is assumed here; adjust the address to your own setup.
db = pymongo.MongoClient('mongodb://localhost:27017')
article_doc = db['db_parker']['xueqiu_zhuanglan']
# Assumption: the original does not name the collection that stores the
# rewards; 'xueqiu_reward' is a placeholder.
reward_doc = db['db_parker']['xueqiu_reward']
failed_doc = db['db_parker']['xueqiu_reward_status']


def get_proxy(retry=10):
    """Fetch a proxy from the proxy service; return None if every attempt fails."""
    proxyurl = 'http://{}:8081/dynamicIp/common/getDynamicIp.do'.format(config.PROXY)
    for count in range(1, retry + 1):
        try:
            r = requests.get(proxyurl, timeout=10)
            js = r.json()
        except Exception as e:
            print(e)
            print('Failed to get a proxy, retry ' + str(count))
            time.sleep(1)
        else:
            proxy_server = 'http://{0}:{1}'.format(js.get('ip'), js.get('port'))
            # Only the 'http' scheme is mapped, as in the original. requests picks
            # a proxy by URL scheme, so add an 'https' entry if your proxy supports it.
            return {'http': proxy_server}
    return None


def get_content(url):
    """GET `url` through a proxy, retrying once with a fresh proxy on failure."""
    headers = {
        # Add your own header information (User-Agent, Cookie, ...) here
    }
    proxy = get_proxy()
    try:
        r = session.get(url=url, headers=headers, proxies=proxy, timeout=10)
    except Exception as e:
        print(e)
        proxy = get_proxy()
        r = session.get(url=url, headers=headers, proxies=proxy, timeout=10)
    return r


def parse_content(post_id):
    """Fetch every reward of one article and store them in MongoDB."""
    url = 'https://xueqiu.com/statuses/reward/list_by_user.json?status_id={}&page=1&size=99999999'.format(post_id)
    r = get_content(url)
    print(r.text)
    if r.status_code != 200:
        print('status code != 200')
        failed_doc.insert_one({'post_id': post_id, 'status': 0})
        return None

    try:
        js_data = r.json()
    except Exception as e:
        print(e)
        print('can not parse to json')
        print(post_id)
        failed_doc.insert_one({'post_id': post_id, 'status': 0})
        return None

    ret = []
    been_reward_user = '元卫南'
    for item in js_data.get('items') or []:
        name = item.get('name')
        amount = item.get('amount')
        description = item.get('description')
        user_id = item.get('user_id')
        created_at = item.get('created_at')
        if created_at:
            # created_at is a millisecond timestamp
            created_at = datetime.datetime.fromtimestamp(int(created_at) / 1000).strftime('%Y-%m-%d %H:%M:%S')

        d = OrderedDict()
        d['name'] = name
        d['user_id'] = user_id
        d['amount'] = amount / 100  # divide by 100, as in the original, to get yuan
        d['description'] = description
        d['created_at'] = created_at
        d['been_reward'] = been_reward_user
        ret.append(d)

    print(ret)
    if ret:
        reward_doc.insert_many(ret)
        failed_doc.insert_one({'post_id': post_id, 'status': 1})


def get_all_page_id(user_id):
    """Store the metadata of every column article written by `user_id`."""
    get_page_url = 'https://xueqiu.com/statuses/original/timeline.json?user_id={}&page=1'.format(user_id)
    r = get_content(get_page_url)
    max_page = int(r.json().get('maxPage'))

    for i in range(1, max_page + 1):
        url = 'https://xueqiu.com/statuses/original/timeline.json?user_id={}&page={}'.format(user_id, i)
        r = get_content(url)
        js_data = r.json()
        ret = []

        for item in js_data.get('list') or []:
            d = OrderedDict()
            d['article_id'] = item.get('id')
            d['title'] = item.get('title')
            d['description'] = item.get('description')
            d['view_count'] = item.get('view_count')
            d['target'] = 'https://xueqiu.com/' + item.get('target')
            d['user_id'] = item.get('user_id')
            d['created_at'] = datetime.datetime.fromtimestamp(int(item.get('created_at')) / 1000).strftime(
                '%Y-%m-%d %H:%M:%S')
            ret.append(d)

        print(ret)
        if ret:  # insert_many() rejects an empty list
            article_doc.insert_many(ret)


def loop_page_id():
    """Fetch rewards for every stored article that is not yet marked as done."""
    ret = article_doc.find({}, {'article_id': 1})
    # Articles fetched successfully are recorded with status 1 under 'post_id'
    done_ids = {i.get('post_id') for i in failed_doc.find({'status': 1})}

    for item in ret:
        article_id = item.get('article_id')
        print(article_id)
        if article_id in done_ids:
            continue
        parse_content(article_id)


# Run get_all_page_id('2227798650') once beforehand to fill the article collection.
loop_page_id()
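The headers left empty in get_content could be filled in without hard-coding personal data, for example by reading the cookie from an environment variable. This is only a minimal sketch: the helper name build_headers, the variable name XUEQIU_COOKIE, and the User-Agent value are assumptions made for illustration and are not part of the original script.

```python
import os


def build_headers(cookie=None):
    """Return request headers; the cookie comes from the argument or the environment."""
    # XUEQIU_COOKIE is a hypothetical variable name chosen for this sketch.
    if cookie is None:
        cookie = os.environ.get('XUEQIU_COOKIE', '')
    headers = {
        # Placeholder value; copy the User-Agent your own browser sends.
        'User-Agent': 'Mozilla/5.0',
    }
    if cookie:
        headers['Cookie'] = cookie
    return headers


# Example: pass the cookie explicitly instead of using the environment.
print(build_headers('example=1'))
```

Keeping the cookie outside the source file means the script can be shared without leaking a logged-in session.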

Then the crawl begins.
Because a proxy is used, it is a bit slow. It took about 10 minutes to crawl everything.
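The per-record conversion that parse_content performs (millisecond timestamp to a readable string, raw amount divided by 100) can be tried in isolation, without the network or MongoDB. The sketch below is an illustration only: normalize_reward is a hypothetical helper, the sample item is made up, and UTC is used so the result is reproducible, whereas the script uses the local time zone.

```python
from datetime import datetime, timezone


def normalize_reward(item):
    """Convert one raw reward item into the fields stored by the crawler."""
    created_ms = item.get('created_at')
    created_at = None
    if created_ms:
        # The API value is in milliseconds; fromtimestamp() expects seconds.
        created_at = datetime.fromtimestamp(
            int(created_ms) / 1000, tz=timezone.utc
        ).strftime('%Y-%m-%d %H:%M:%S')
    amount = item.get('amount')
    return {
        'name': item.get('name'),
        # Guard against a missing amount instead of raising a TypeError.
        'amount': amount / 100 if amount is not None else None,
        'created_at': created_at,
    }


# Made-up sample item, not real crawl data.
sample = {'name': 'user_a', 'amount': 250, 'created_at': 1540252800000}
print(normalize_reward(sample))
```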

The data is stored in MongoDB. Open MongoDB and you can inspect every record and run statistics on it.

From today (2018-10-23) back to his first column article (2014-2-17), 元卫南 has published 1,144 articles in total.

Next, look at the other list, the tips.

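Starting from the most recent date (2018-10-23), a user named 金王山而 appears to have tipped many times. Evidently a loyal fan of 元卫南.

By my count, 元卫南 has received 4,222 tips in total.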
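Spotting frequent tippers like this can also be done programmatically by counting records per user name. A minimal sketch, assuming each reward record is a dict with a 'name' field as stored by the crawler. The helper top_tippers and the sample names are hypothetical.

```python
from collections import Counter


def top_tippers(rewards, n=3):
    """Return the n users with the most tips as (name, count) pairs."""
    counts = Counter(r['name'] for r in rewards)
    return counts.most_common(n)


# Made-up sample records, not real crawl data.
rewards = [
    {'name': 'user_a', 'amount': 2.0},
    {'name': 'user_b', 'amount': 1.0},
    {'name': 'user_a', 'amount': 6.66},
    {'name': 'user_a', 'amount': 1.0},
    {'name': 'user_b', 'amount': 10.0},
    {'name': 'user_c', 'amount': 0.5},
]
print(top_tippers(rewards, n=2))
```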

The total amount tipped is:
24128.13 yuan
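A count and a total like the ones above boil down to a length and a sum over the stored records. The sketch below is an assumption-laden illustration: it sums the raw integer amounts (before the division by 100 that the script applies) to avoid floating-point drift, and the helper name summarize and the sample values are made up.

```python
def summarize(raw_amounts):
    """Return (number_of_tips, total_in_yuan) for a list of raw integer amounts."""
    # Sum the integers first, then divide once, so rounding happens only once.
    total = sum(raw_amounts)
    return len(raw_amounts), total / 100


# Made-up sample amounts, not real crawl data.
count, total_yuan = summarize([200, 100, 25000, 66])
print(count, total_yuan)
```

Against the MongoDB collection itself, a `$group` stage with `$sum` in an aggregation pipeline should give the same totals without pulling every record into Python.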

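Well, that was a real surprise! I had expected tips totaling several million yuan, but the total comes to only 24,128 yuan. That is just enough for 元卫南 to average down with 5 lots of 东阿阿胶.

Then, sorting by tip amount: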

The largest single tip came from 唐史主任, at 250 yuan. About a dozen tips were 200 yuan, including one from 梁大师, which puts it in a tie within the top 10.
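Ranking the tips by amount is a plain descending sort. A minimal sketch, assuming records shaped like those the crawler stores. The helper largest_tips and the sample records are hypothetical.

```python
def largest_tips(rewards, n=10):
    """Return the n records with the highest amount, largest first."""
    return sorted(rewards, key=lambda r: r['amount'], reverse=True)[:n]


# Made-up sample records, not real crawl data.
rewards = [
    {'name': 'user_a', 'amount': 2.0},
    {'name': 'user_b', 'amount': 250.0},
    {'name': 'user_c', 'amount': 200.0},
    {'name': 'user_d', 'amount': 0.5},
]
for r in largest_tips(rewards, n=2):
    print(r['name'], r['amount'])
```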

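In fact most people tip only small change: 2,621 tips were 2 yuan or less, which is well over half of the 4,222 total (about 62%).

I still very much support 元卫南 posting every day. In the current market his posts can offer some comfort, entertain, or serve as a warning, showing how the stock market affects retail investors' lives so that others avoid repeating the same mistakes.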
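The share of small tips can be checked with a few lines, using the two counts reported in this article (2,621 small tips out of 4,222). The helper share_at_or_below is a hypothetical name for this sketch, and the short amounts list is made up.

```python
def share_at_or_below(amounts, threshold):
    """Return the fraction of amounts that are <= threshold (0.0 for an empty list)."""
    if not amounts:
        return 0.0
    small = sum(1 for a in amounts if a <= threshold)
    return small / len(amounts)


# Made-up sample amounts in yuan.
print(share_at_or_below([1, 2, 5, 10], threshold=2))

# Share implied by the counts reported in the article.
reported_share = 2621 / 4222
print(round(reported_share * 100, 1))
```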

Original article.
Please credit the source when reposting:
http://30daydo.com/article/361

Next article:
Introduction to Python data analysis: analyzing 元卫南's monthly tip income on Xueqiu
