Predicting Lottery Draws in Python with Genetic Algorithms, Neural Networks, and Fuzzy Logic Control

Prediction means judging what will happen next based on information about past and current states.

By Kaizong Ye and Coin Ge

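A client recently asked us to write a research report on lottery prediction. Everyone solves prediction problems every day, with varying degrees of success. For example, we need to forecast the weather, returns, energy consumption, movements in forex currency pairs or stocks, earthquakes, and many other things…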

Predictive analytics

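Through classification, deep learning can establish correlations between, for example, the pixels in an image and a person's name. You could call this static prediction. By the same token, given enough of the right data, deep learning can establish correlations between current events and future events. In a sense, the future event acts like a label. Deep learning does not necessarily care about time, or about the fact that something has not happened yet. Given a time series, it can read a string of numbers and predict the number most likely to come next.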
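
To make the "future event as a label" idea concrete, here is a minimal sketch of how a series of numbers can be framed as supervised training pairs: each window of past values is the input and the value that follows it is the label. The helper name `make_windows` and the window length of 3 are illustrative assumptions, not part of the original analysis; the series is the blue-ball column of the first eight sample draws (2011001 to 2011008).

```python
from typing import List, Sequence, Tuple


def make_windows(series: Sequence[int], window: int) -> Tuple[List[List[int]], List[int]]:
    """Split a series into (input window, next value) training pairs."""
    if window < 1 or window >= len(series):
        raise ValueError("window must be in [1, len(series) - 1]")
    inputs, targets = [], []
    for start in range(len(series) - window):
        # The window of past values is the input ...
        inputs.append(list(series[start:start + window]))
        # ... and the value right after the window is the label.
        targets.append(series[start + window])
    return inputs, targets


# Blue-ball numbers of draws 2011001-2011008 from the sample data.
blue_series = [10, 5, 4, 5, 13, 5, 5, 15]
X, y = make_windows(blue_series, window=3)
for features, label in zip(X, y):
    print(features, "->", label)
```

Any sequence model, an LSTM included, could then be trained on pairs like `X` and `y`; the framing itself says nothing about whether the series is predictable.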

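Phase I. Graphics: looking at data;

1. A single variable: shape and distribution; (dot and jitter plots, histograms and kernel density estimates, cumulative distribution functions, rank-order plots…)

2. Two variables: establishing relationships; (scatter plots, conquering noise, logarithmic plots, banking…)

3. Time as a variable: time-series analysis; (smoothing, correlation, filters, convolution…)

4. More than two variables: graphical multivariate analysis; (false-color plots, multiplots…)

5. Intermezzo: a data analysis session; (session, gnuplot…)

6. …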
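
As a rough illustration of the smoothing and correlation steps named in item 3, the sketch below computes an unweighted moving average and the sample autocorrelation of a series. The function names and the short `toy` series are hypothetical and chosen only for illustration; they do not come from the lottery data.

```python
from statistics import mean
from typing import List, Sequence


def moving_average(values: Sequence[float], width: int) -> List[float]:
    """Smooth a series with an unweighted running mean of the given width."""
    if width < 1 or width > len(values):
        raise ValueError("width must be in [1, len(values)]")
    return [mean(values[i:i + width]) for i in range(len(values) - width + 1)]


def autocorrelation(values: Sequence[float], lag: int) -> float:
    """Sample autocorrelation of the series at the given lag."""
    if lag < 0 or lag >= len(values):
        raise ValueError("lag must be in [0, len(values) - 1]")
    mu = mean(values)
    denom = sum((v - mu) ** 2 for v in values)
    if denom == 0:
        raise ValueError("series is constant")
    numer = sum((values[i] - mu) * (values[i + lag] - mu)
                for i in range(len(values) - lag))
    return numer / denom


# Hypothetical series, used only to exercise the two helpers.
toy = [3, 5, 4, 6, 8, 7, 9]
print("smoothed:", moving_average(toy, 3))
print("lag-1 autocorrelation:", autocorrelation(toy, 1))
```

An autocorrelation near zero at every lag greater than zero would suggest that past values carry little information about later ones.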

Sample data


2011001;3;9;20;24;26;32;10
2011002;6;8;12;17;28;33;5
2011003;13;14;21;22;23;27;4
2011004;4;6;8;10;13;26;5
2011005;6;9;12;14;20;22;13
2011006;1;3;5;13;16;18;5
2011007;1;9;17;24;26;31;5
2011008;10;12;13;17;24;31;15
2011009;17;18;23;24;25;26;4
2011010;1;4;5;9;15;19;13
2011011;1;12;18;19;21;24;10
2011012;7;8;11;13;15;26;13
2011013;1;3;13;16;21;22;8
2011014;5;7;10;11;23;26;1
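
The rows above can be turned into frequency counts with the standard library alone. The sketch below assumes that each row holds a draw ID, six red balls, and one blue ball, in that order; this layout is inferred from the `NUM_OF_RED=6` and `NUM_OF_BLUE=1` comments in the code, not stated in the data. The names `parse_draws`, `blue_counts`, and `red_counts` are illustrative.

```python
import csv
import io
from collections import Counter

SAMPLE = """\
2011001;3;9;20;24;26;32;10
2011002;6;8;12;17;28;33;5
2011003;13;14;21;22;23;27;4
2011004;4;6;8;10;13;26;5
2011005;6;9;12;14;20;22;13
2011006;1;3;5;13;16;18;5
2011007;1;9;17;24;26;31;5
2011008;10;12;13;17;24;31;15
2011009;17;18;23;24;25;26;4
2011010;1;4;5;9;15;19;13
2011011;1;12;18;19;21;24;10
2011012;7;8;11;13;15;26;13
2011013;1;3;13;16;21;22;8
2011014;5;7;10;11;23;26;1
"""


def parse_draws(text):
    """Return a list of (draw_id, red_balls, blue_ball) tuples."""
    draws = []
    for row in csv.reader(io.StringIO(text), delimiter=";"):
        if not row:
            continue  # skip blank lines
        numbers = [int(n) for n in row[1:]]
        # Assumed layout: six red balls followed by one blue ball.
        draws.append((row[0], numbers[:6], numbers[6]))
    return draws


draws = parse_draws(SAMPLE)
blue_counts = Counter(blue for _, _, blue in draws)
red_counts = Counter(red for _, reds, _ in draws for red in reds)
print("draws:", len(draws))
print("blue-ball frequencies:", sorted(blue_counts.items()))
```

The values of `blue_counts` play the same role as a pandas `value_counts()` on the blue-ball column.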

Visualizing the data

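import random

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Assumed ball ranges (not stated in the original listing): red 1-33, blue 1-16.
redBalls = list(range(1, 34))
blueBalls = list(range(1, 17))

# Draw six distinct red balls by removing each pick from the pool.
for x in range(0,6):#NUM_OF_RED=6
    choice_num_red = random.choice( redBalls )
    print( choice_num_red )
    redBalls.remove(choice_num_red)
# Draw one blue ball.
for y in range(0,1):#NUM_OF_BLUE=1
    choice_num_blue = random.choice( blueBalls )
    print( choice_num_blue )

#scipy/matplotlib test: plot the unit normal density
#(matplotlib.mlab.normpdf and matplotlib.numerix no longer exist,
# so scipy.stats.norm.pdf and numpy are used instead)
#print(pylab.plot(abs(b)))  # left disabled: b is never defined in the original
x = np.arange(-4, 4, 0.01)
y = norm.pdf(x, 0, 1) # unit normal
plt.plot(x, y, color='red', lw=2)
plt.show()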

Data distribution

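import random
import numpy
import matplotlib.pyplot as plt

# Assumption: series_blue_balls_value_counts is the pandas value_counts() of the
# blue-ball column, and dfs_blue_balls_count_values is the list of its counts.
#Dot plot: one marker per blue ball, showing how often it was drawn
plt.plot(dfs_blue_balls_count_values,'x',label='Dot plot')
plt.legend()
plt.ylabel('Y-axis,number of appearances')
plt.xlabel('X-axis,blue ball (value_counts order)')
plt.show()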
#Jitter plot
idx_min = min(dfs_blue_balls_count_values)
idx_max = max(dfs_blue_balls_count_values)
idx_len = idx_max-idx_min
print("min:",idx_min,"max:",idx_max)
num_jitter = 0
samplers = random.sample(range(idx_min,idx_max),idx_len)
while num_jitter < 5:
    samplers += random.sample(range(idx_min,idx_max),idx_len)
    num_jitter += 1
##lots of jitter effect
print("samplers:",samplers)
#plt.plot(samplers,'ro',label='Jitter plot')
#plt.ylabel('Y-axis,number of blue balls')
#plt.xlabel('X-axis,number of duplication')
#plt.legend()
#plt.show()
#Histograms and Kernel Density Estimates:
#Scott rule,
#This rule assumes that the data follows a Gaussian distribution;

#Plotting the blue-ball appearance histogram (x-axis: ball in value_counts order, y-axis: number of appearances)
##@see http://pandas.pydata.org/pandas-docs/dev/basics.html#value-counts-histogramming
num_of_bin = len(series_blue_balls_value_counts)
array_of_ball_names = series_blue_balls_value_counts.keys()
print("Blue ball names:",array_of_ball_names)
list_merged_by_ball_id = []
for x in range(0,num_of_bin):  # xrange exists only in Python 2
    num_index = x+1.5
    # repeat the bin position once per appearance; int() guards against numpy integers
    list_merged_by_ball_id += [num_index]*int(dfs_blue_balls_count_values[x])
print("list_merged_by_ball_id:",list_merged_by_ball_id)
##Histograms plotting
plt.hist(list_merged_by_ball_id, bins=num_of_bin)
plt.xlabel('Histogram,blue ball (value_counts order)')
plt.ylabel('Histogram,number of appearances')
plt.show()
###Gaussian_KDE

##CDF (the cumulative distribution function)
from scipy.stats import cumfreq
idx_max = max(dfs_blue_balls_count_values)
hi = idx_max
a = numpy.arange(hi) ** 2
#    for nbins in ( 2, 20, 100 ):
for nbins in sorted(set(dfs_blue_balls_count_values)):  # plot each bin count once
    nbins = int(nbins)
    cf = cumfreq(a, nbins)  # returns cumcount, lowerlimit, binsize, extrapoints
    w = hi / nbins
    x = numpy.linspace( w/2, hi - w/2, nbins )  # bin centres on a 0..hi axis
    # print(x, cf)
    plt.plot( x, cf[0], label=str(nbins) )

plt.legend()
plt.xlabel('CDF,bin centre')
plt.ylabel('CDF,cumulative count')
plt.show()

###Optional: Comparing Distributions with Probability Plots and QQ Plots
###Quantile plot of the server data. A quantile plot is a graph of the CDF with the x and y axes interchanged.
###Probability plot for the data set shown,a standard normal distribution:
###@see: http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.probplot.html
import scipy.stats as stats
prob_measurements = numpy.random.normal(loc = 20, scale = 5, size=num_of_bin)   
stats.probplot(prob_measurements, dist="norm", plot=plt)
plt.show()
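
The listing above mentions the Scott rule and leaves the Gaussian KDE step as a placeholder. A minimal sketch of that step, using only the standard library, might look like the following. It assumes the normal-reference bandwidth h = 1.06 · σ · n^(-1/5), a common rule of thumb for Gaussian kernels; the function names `scott_bandwidth` and `kde_at` are illustrative, and the input is the blue-ball column of the 14 sample draws.

```python
import math
from statistics import stdev
from typing import Sequence


def scott_bandwidth(samples: Sequence[float]) -> float:
    """Normal-reference bandwidth h = 1.06 * sigma * n**(-1/5) (assumed rule)."""
    return 1.06 * stdev(samples) * len(samples) ** (-1 / 5)


def kde_at(samples: Sequence[float], x: float, h: float) -> float:
    """Evaluate the Gaussian kernel density estimate at point x."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)


# Blue-ball numbers of the 14 sample draws (2011001-2011014).
blue_balls = [10, 5, 4, 5, 13, 5, 5, 15, 4, 13, 10, 13, 8, 1]
h = scott_bandwidth(blue_balls)

# Evaluate the density on a grid from -20.0 to 40.0 in steps of 0.1.
step = 0.1
grid = [i * step for i in range(-200, 401)]
density = [kde_at(blue_balls, x, h) for x in grid]

# A Riemann sum over the grid should be close to 1 for a valid density.
area = sum(density) * step
print("bandwidth:", round(h, 3), "area under curve:", round(area, 3))
```

The `grid` and `density` lists can be passed straight to `plt.plot(grid, density)` to overlay the estimate on the histogram. With only 14 draws the curve is very coarse, so it is best read as a shape check rather than an estimate of the true distribution.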

Phase II. Analytics: modeling data;

1. Estimation;

2. Models from scaling arguments;

3. Analysis of probability models;

4. …

Phase III. Computation: mining data;

1. Simulations;

2. Finding clusters;

3. Finding decision trees in the forest;

4. …
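
As one possible reading of the simulation step, the sketch below draws many random tickets and compares how often each blue ball appears with the count expected under a uniform draw. The ball ranges (red 1 to 33, blue 1 to 16), the function name `simulate_draws`, the seed, and the number of draws are all assumptions made for illustration.

```python
import random
from collections import Counter

# Assumed ball ranges (not stated in the article): red 1-33, blue 1-16.
RED_RANGE = range(1, 34)
BLUE_RANGE = range(1, 17)


def simulate_draws(n_draws, seed=0):
    """Simulate n_draws tickets: six distinct red balls plus one blue ball."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    draws = []
    for _ in range(n_draws):
        reds = sorted(rng.sample(RED_RANGE, 6))  # sampling without replacement
        blue = rng.choice(BLUE_RANGE)
        draws.append((reds, blue))
    return draws


draws = simulate_draws(16000)
blue_counts = Counter(blue for _, blue in draws)

# Under a uniform draw every blue ball is expected n_draws / 16 times.
expected = len(draws) / len(BLUE_RANGE)
worst = max(abs(blue_counts[b] - expected) for b in BLUE_RANGE)
print("expected per ball:", expected)
print("largest deviation from expected:", worst)
```

Comparing the deviations seen in real draws with those from a simulation like this gives a baseline for how much variation chance alone produces.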

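Phase IV. Applications: using data;

1. Reporting, BI (business intelligence), dashboards;

2. Financial calculations and modeling;

3. Predictive analytics;

4. …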


from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from os import path

import numpy
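import tensorflow as tf

# Note: tensorflow.contrib exists only in TensorFlow 1.x; it was removed in
# TensorFlow 2.0, so these imports need a 1.x installation.
from tensorflow.contrib.timeseries.python.timeseries import estimators as ts_estimators
from tensorflow.contrib.timeseries.python.timeseries import model as ts_model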

try:
  import matplotlib  # pylint: disable=g-import-not-at-top
  matplotlib.use("TkAgg")  # Need Tk for interactive plots.
  from matplotlib import pyplot  # pylint: disable=g-import-not-at-top
  HAS_MATPLOTLIB = True
except ImportError:
  # Plotting requires matplotlib, but the unit test running this code may
  # execute in an environment without it (i.e. matplotlib is not a build
  # dependency). We'd still like to test the TensorFlow-dependent parts of this
  # example.
  HAS_MATPLOTLIB = False

_MODULE_PATH = path.dirname(__file__)
_DATA_FILE = path.join(_MODULE_PATH, "data/multivariate_periods.csv")


About the author

Kaizong Ye is a researcher at Tecdat Research Lab (TRL).

This article draws on material the author recently prepared for the course "R语言数据分析挖掘必知必会" (Essentials of Data Analysis and Mining with R).