Prediction means making a judgment about what will happen next, based on information about past and current states.

By Kaizong Ye and Coin Ge

We were recently asked by a client to write a research report on lottery prediction.

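Lotto is a widely popular form of gambling, and the randomness and uncertainty of its results have long led players to look for patterns that might improve their odds of winning. However, the winning numbers are produced by a random mechanism, and traditional statistical methods rarely capture any underlying pattern in them. Each draw is independent of the draws before it, which makes predicting lotto numbers an extremely challenging task. In recent years, the rapid development of artificial intelligence and machine learning has brought new ideas to lotto prediction.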

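A genetic algorithm is an optimization algorithm that simulates natural selection and genetic mechanisms; it searches the solution space for an optimal solution through repeated iteration and evolution. A neural network has strong nonlinear mapping capability and can automatically learn complex patterns and regularities from large amounts of data. A fuzzy logic control algorithm can handle uncertain and fuzzy information, so it can analyze and process the fuzzy features in the data.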
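The select, recombine, mutate loop described above can be sketched on a toy problem. This is a minimal illustration, not the authors' lotto model: the function name `evolve`, its parameters, and the OneMax fitness (count of 1 bits) are all assumptions chosen to keep the example small.

```python
import random

def evolve(fitness, n_genes=20, pop_size=30, generations=40,
           mutation_rate=0.05, seed=0):
    """Toy genetic algorithm over bit strings (hypothetical helper).

    Returns the best individual and the best fitness seen per generation.
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]   # selection: keep the fitter half
        children = [population[0][:]]           # elitism: best survives unchanged
        while len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)     # single-point crossover
            child = mother[:cut] + father[cut:]
            # mutation: flip each bit with a small probability
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history

# OneMax: the fitness of a bit string is simply its number of 1 bits.
best, history = evolve(fitness=sum)
print(history[0], "->", history[-1])
```

Because the best individual is copied into each new generation unchanged, the recorded best fitness never decreases. For a real problem only the `fitness` function and the gene encoding would change.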

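Everyone solves prediction problems every day, with varying degrees of success.

For example, we need to predict the weather, earnings, energy consumption, movements in currency pairs or stocks, earthquakes, and many other things.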

Predictive analytics

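Through classification, deep learning can establish correlations between, for example, the pixels in an image and a person's name. You could call this static prediction. By the same token, given enough of the right data, deep learning can establish correlations between current events and future events. In a sense, the future event is like a label. Deep learning does not necessarily care about time, or about the fact that something has not happened yet. Given a time series, deep learning can read a sequence of numbers and predict the number most likely to come next.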
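To make "the future event is like a label" concrete, a time series can be cut into fixed-length windows, each labeled with the value that follows it. The sketch below is an assumption about how such training pairs might be built; `make_windows` is a hypothetical helper, and the values are the last field of each row in the article's sample data.

```python
import numpy as np

def make_windows(series, window):
    """Turn a 1-D series into (inputs, labels) pairs for next-value prediction."""
    series = np.asarray(series, dtype=float)
    # Each row of X holds `window` consecutive values ...
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    # ... and the matching entry of y is the value that comes right after them.
    y = series[window:]
    return X, y

# Last field of each row in the article's sample data.
blue = [10, 5, 4, 5, 13, 5, 5, 15, 4, 13, 10, 13, 8, 1]
X, y = make_windows(blue, window=3)
print(X.shape, y.shape)
print(X[0], "->", y[0])
```

Any supervised model, including an LSTM, can then be fit on `X` and `y`. Since each draw is independent of the previous ones, no model should be expected to beat chance on this particular series.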

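Phase I. Graphics: looking at data;

1. Single variable: shape and distribution (dot and jitter plots, histograms and kernel density estimates, cumulative distribution functions, rank-order plots ...)

2. Two variables: establishing relationships (scatter plots, taming noise, logarithmic plots, banking ...)

3. Time as a variable: time-series analysis (smoothing, correlation, filters, convolution ...)

4. More than two variables: graphical multivariate analysis (false-color plots, multiplots ...)

5. Intermezzo: a data analysis session (session, gnuplot ...)

6 …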
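The smoothing and convolution step named in item 3 can be illustrated with a simple moving average, which is a convolution of the series with a flat kernel. This is only a sketch: the window length of 3 is an arbitrary assumption, and the values are the second field of each row in the article's sample data.

```python
import numpy as np

# Second field of each row in the article's sample data.
first_red = np.array([3, 6, 13, 4, 6, 1, 1, 10, 17, 1, 1, 7, 1, 5], dtype=float)

window = 3                              # assumed window length
kernel = np.ones(window) / window       # flat kernel: equal weight per point
# mode="valid" keeps only positions where the kernel fully overlaps the data.
smoothed = np.convolve(first_red, kernel, mode="valid")

print(len(first_red), len(smoothed))
print(smoothed)
```

A longer window gives a smoother curve but reacts more slowly to changes and shortens the output by `window - 1` points.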

Sample data

2011001;3;9;20;24;26;32;10
2011002;6;8;12;17;28;33;5
2011003;13;14;21;22;23;27;4
2011004;4;6;8;10;13;26;5
2011005;6;9;12;14;20;22;13
2011006;1;3;5;13;16;18;5
2011007;1;9;17;24;26;31;5
2011008;10;12;13;17;24;31;15
2011009;17;18;23;24;25;26;4
2011010;1;4;5;9;15;19;13
2011011;1;12;18;19;21;24;10
2011012;7;8;11;13;15;26;13
2011013;1;3;13;16;21;22;8
2011014;5;7;10;11;23;26;1
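The rows above can be loaded with pandas. This is a sketch under assumptions: the column names are hypothetical, and the reading of each row as an issue number, six red balls, and one blue ball is inferred from the field count and from the `NUM_OF_RED=6` and `NUM_OF_BLUE=1` comments in the code that follows. The variable names `series_blue_balls_value_counts` and `dfs_blue_balls_count_values` come from the later plotting code; how they are built here is a guess.

```python
import io
import pandas as pd

# First five rows of the sample data, copied verbatim.
raw = """2011001;3;9;20;24;26;32;10
2011002;6;8;12;17;28;33;5
2011003;13;14;21;22;23;27;4
2011004;4;6;8;10;13;26;5
2011005;6;9;12;14;20;22;13
"""

# Hypothetical column names: issue id, six red balls, one blue ball.
columns = ["issue", "r1", "r2", "r3", "r4", "r5", "r6", "blue"]
draws = pd.read_csv(io.StringIO(raw), sep=";", header=None, names=columns)

# How often each blue-ball number appears, most frequent first.
series_blue_balls_value_counts = draws["blue"].value_counts()
dfs_blue_balls_count_values = series_blue_balls_value_counts.values

print(draws.head())
print(series_blue_balls_value_counts)
```

To read a full history file instead, pass its path to `pd.read_csv` in place of the `io.StringIO` wrapper.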

Visualizing the data

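import random

# Assumption: the ball pools are not defined in this excerpt; the ranges below
# (red 1-33, blue 1-16) are a guess consistent with the sample data.
redBalls = list(range(1, 34))
blueBalls = list(range(1, 17))

# Draw six distinct red balls, removing each one from the pool once drawn
for x in range(0, 6):  # NUM_OF_RED = 6
    choice_num_red = random.choice(redBalls)
    print(choice_num_red)
    redBalls.remove(choice_num_red)
# Draw one blue ball
for y in range(0, 1):  # NUM_OF_BLUE = 1
    choice_num_blue = random.choice(blueBalls)
    print(choice_num_blue)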
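# scipy test code

# matplotlib test
# Note: `b` is not defined anywhere in this excerpt, so the original call stays commented out.
# print(pylab.plot(abs(b)))

# Plot the unit normal density. scipy.stats.norm.pdf stands in for
# matplotlib.mlab.normpdf, which is no longer available in current matplotlib.
import numpy
import matplotlib.pyplot as plt
from scipy.stats import norm

x = numpy.arange(-4, 4, 0.01)
y = norm.pdf(x, 0, 1)  # unit normal
plt.plot(x, y, color='red', lw=2)
plt.show()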

Data distribution

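import numpy
import matplotlib.pyplot as plt

# Assumption: series_blue_balls_value_counts is the pandas value_counts() of the
# blue-ball column and dfs_blue_balls_count_values is its .values array;
# neither definition appears in this excerpt.
# Dot plot
plt.plot(dfs_blue_balls_count_values,'x',label='Dot plot')
plt.legend()
plt.ylabel('Y-axis,number of blue balls')
plt.xlabel('X-axis,number of duplication')
plt.show()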
#Jitter plot
idx_min = min(dfs_blue_balls_count_values)
idx_max = max(dfs_blue_balls_count_values)
idx_len = idx_max-idx_min
print("min:",idx_min,"max:",idx_max)
num_jitter = 0
samplers = random.sample(range(idx_min,idx_max),idx_len)
while num_jitter < 5:
    samplers += random.sample(range(idx_min,idx_max),idx_len)
    num_jitter += 1
##lots of jitter effect
print("samplers:",samplers)
#plt.plot(samplers,'ro',label='Jitter plot')
#plt.ylabel('Y-axis,number of blue balls')
#plt.xlabel('X-axis,number of duplication')
#plt.legend()
#plt.show()
#Histograms and Kernel Density Estimates:
#Scott rule,
#This rule assumes that the data follows a Gaussian distribution;

#Plotting the blue balls appear frequency histograms(x-axis:frequency,y-axis:VIPs)
##@see http://pandas.pydata.org/pandas-docs/dev/basics.html#value-counts-histogramming
num_of_bin = len(series_blue_balls_value_counts)
array_of_ball_names = series_blue_balls_value_counts.keys()
print("Blue ball names:",array_of_ball_names)
list_merged_by_ball_id = []
for x in range(0,num_of_bin):  # range replaces the Python 2 xrange
    num_index = x+1.5
    # Repeat each bin position once per occurrence of that ball
    list_merged_by_ball_id += [num_index]*int(dfs_blue_balls_count_values[x])
print("list_merged_by_ball_id:",list_merged_by_ball_id)
##Histograms plotting
plt.hist(list_merged_by_ball_id, bins=num_of_bin)
plt.xlabel('Histograms,number of appear time by blue ball number')
plt.ylabel('Histograms,counter of appear time by blue ball number')
plt.show()
###Gaussian_KDE

##CDF (the cumulative distribution function)
from scipy.stats import cumfreq
idx_max = max(dfs_blue_balls_count_values)
hi = idx_max
a = numpy.arange(hi) ** 2
#    for nbins in ( 2, 20, 100 ):
for nbins in dfs_blue_balls_count_values:    
    cf = cumfreq(a, nbins)  # bin values, lowerlimit, binsize, extrapoints
    w = hi / nbins
    x = numpy.linspace( w/2, hi - w/2, nbins )  # care
    # print x, cf
    plt.plot( x, cf[0], label=str(nbins) )

plt.legend()
plt.xlabel('CDF,number of appear time by blue ball number')
plt.ylabel('CDF,counter of appear time by blue ball number')
plt.show()

###Optional: Comparing Distributions with Probability Plots and QQ Plots
###Quantile plot of the server data. A quantile plot is a graph of the CDF with the x and y axes interchanged.
###Probability plot for the data set shown,a standard normal distribution:
###@see: http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.probplot.html
import scipy.stats as stats
prob_measurements = numpy.random.normal(loc = 20, scale = 5, size=num_of_bin)   
stats.probplot(prob_measurements, dist="norm", plot=plt)
plt.show()
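The `Gaussian_KDE` comment in the histogram code is not followed by any code in this excerpt. A minimal sketch of what a kernel density estimate could look like is given below, using `scipy.stats.gaussian_kde`, whose default bandwidth is Scott's rule as mentioned in the comments. The evaluation grid is an arbitrary assumption, and the values are the last field of each row in the article's sample data.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Last field of each row in the article's sample data.
blue = np.array([10, 5, 4, 5, 13, 5, 5, 15, 4, 13, 10, 13, 8, 1], dtype=float)

kde = gaussian_kde(blue)            # bandwidth defaults to Scott's rule
grid = np.linspace(0, 17, 200)      # assumed evaluation grid
density = kde(grid)                 # estimated density at each grid point

print("density peaks near:", grid[np.argmax(density)])
```

Calling `plt.plot(grid, density)` would overlay the estimate on the histogram. With only 14 draws the curve mostly reflects sampling noise, so it should not be read as evidence that some numbers are favored.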

Phase II. Analytics: modeling data;

1. Estimation;

2. Models from scaling arguments;

3. Analysis of probability models;

4 …

Phase III. Computation: mining data;

1. Simulations;

2. Finding clusters;

3. Finding decision trees in the forest;

4 ….

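Phase IV. Applications: using data;

1. Reporting, BI (business intelligence), dashboards;

2. Financial calculations and modeling;

3. Predictive analytics;

4 ….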


from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from os import path

import numpy
import tensorflow as tf

# Note: tf.contrib was removed in TensorFlow 2.x, so these imports require TensorFlow 1.x.
from tensorflow.contrib.timeseries.python.timeseries import estimators as ts_estimators
from tensorflow.contrib.timeseries.python.timeseries import model as ts_model

try:
  import matplotlib  # pylint: disable=g-import-not-at-top
  matplotlib.use("TkAgg")  # Need Tk for interactive plots.
  from matplotlib import pyplot  # pylint: disable=g-import-not-at-top
  HAS_MATPLOTLIB = True
except ImportError:
  # Plotting requires matplotlib, but the unit test running this code may
  # execute in an environment without it (i.e. matplotlib is not a build
  # dependency). We'd still like to test the TensorFlow-dependent parts of this
  # example.
  HAS_MATPLOTLIB = False

_MODULE_PATH = path.dirname(__file__)
_DATA_FILE = path.join(_MODULE_PATH, "data/multivariate_periods.csv")