R Programming Help | Big Data Analytics: Assignment – Hurricane Sandy and Flickr (with answers)

Humans around the world are uploading increasing amounts of information to social media services such as Twitter and Flickr.

To what extent can we exploit this information during catastrophic events such as natural disasters, to gather data about changes to our world at a time when good decisions must be reached quickly and effectively?

Written by LE PHUONG

The subject of your current investigation is Hurricane Sandy, a hurricane that devastated portions of the Caribbean and the Mid-Atlantic and Northeastern United States during late October 2012.




As a hurricane approaches, air pressure drops sharply.

Your goal is to determine whether a relationship exists between the progression of Hurricane Sandy, as measured by air pressure, and user behaviour on the photo-sharing site Flickr.

If you can find a simple relationship between changes in air pressure, and changes in photos taken and then uploaded to Flickr, then perhaps further investigation of these social media data would give insight into problems resulting from a hurricane that are harder to measure using environmental sensors alone. This might include the existence of burst pipes, fires, collapsed trees or damaged property. Such information could be of interest both to policy makers charged with emergency crisis management, and to insurance companies too.

Part 1: Acquiring the Flickr data (2%)

Hurricane Sandy, classified as the eighteenth named storm and tenth hurricane of the 2012 Atlantic hurricane season, made landfall near Atlantic City, New Jersey at 00:00 Coordinated Universal Time (UTC) on 30 October 2012. You have decided to have a look at how Flickr users behaved around this date, from 20 October 2012 to 10 November 2012. In particular, you are going to look at data on photos uploaded to Flickr with the text “hurricane sandy”. When were photos with these tags taken?

TASK 1A (1%):

For the period 20 October 2012 00:00 to 10 November 2012 23:59, download hourly counts from Flickr of the number of photos taken and uploaded to Flickr with labels which include the text “hurricane sandy”.

Create a data frame containing the hourly counts. Each row of the data frame will need to specify the date and hour the count relates to, and the number of photos found. (You may wish to use one column or two for the date and time – either is fine.)

For assessment, submit the code you wrote to obtain this data, and a CSV file of the data frame containing the hourly counts.




To solve this exercise, you need to edit the code from the previous lab to:

  • create a function to build the URL you need to acquire one JSON page of data on photos tagged with “hurricane sandy” for a given hour

  • write code to download this JSON page of data and extract the information on this page which tells you how many photos were taken in that hour (see Hint 1 below)




  • write some code to get this count for all the hours in the period above
  • put each count in a row of a data frame, with the relevant date and time in another column

Hint 1: There is more than one way to get hourly counts of the number of photos. Importantly – you do not need to parse all the information about all the individual photos. Instead, the data which is returned from Flickr has some key information in the variable labelled “total” on the first page of the results. You should use this or your code will take a very long time to run! On a reasonable broadband connection, the download for Task 1A should take under 15 minutes. Note that because “total” is on the first page, you do not need to download all of the pages.
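The buildFlickrURL helper used in the answer code below is not shown in this post, so here is a minimal sketch of what it might look like, built on the public flickr.photos.search endpoint. The function name matches the answer code, but the API key placeholder and the exact parameter choices are assumptions, not the lab's actual code:

library(RCurl)   # getURL()
library(RJSONIO) # fromJSON()

# Hypothetical reconstruction of the lab's URL builder. Replace YOUR_API_KEY
# with your own Flickr API key.
buildFlickrURL <- function(hourBegin, page = 1,
                           apiKey = "YOUR_API_KEY",
                           text = "hurricane sandy") {
  hourEnd <- hourBegin + 60 * 60 # the hour ends 3600 seconds after it begins
  paste0("https://api.flickr.com/services/rest/",
         "?method=flickr.photos.search",
         "&api_key=", apiKey,
         "&text=", URLencode(text, reserved = TRUE),
         "&min_taken_date=",
         URLencode(format(hourBegin, "%Y-%m-%d %H:%M:%S"), reserved = TRUE),
         "&max_taken_date=",
         URLencode(format(hourEnd, "%Y-%m-%d %H:%M:%S"), reserved = TRUE),
         "&per_page=1&page=", page,
         "&format=json&nojsoncallback=1")
}

# Only the first page is needed: its "total" field already holds the hourly count.
parsed <- fromJSON(getURL(buildFlickrURL(as.POSIXct("2012-10-20 00:00:00")),
                          ssl.verifypeer = FALSE))
as.numeric(parsed$photos$total) # Flickr returns "total" as a string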

Hint 2: In downloading data, you only need to be concerned about min_taken_date and max_taken_date – you can ignore min_upload_date and max_upload_date.

Hint 3: For the purposes of this exercise, don’t worry about using a “bbox” when downloading this information.

Hint 4: The “time taken” on a photo is in the photographer’s local time. For the purposes of this exercise, don’t worry about time zones – just use the times which Flickr specifies.

Hint 5: You can save a data frame to a CSV file using write.csv. See Cookbook for R for more guidance.
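Putting the pieces together, here is a sketch of the full Task 1A download loop under the same assumptions as above (the helper and the column names are illustrative, not prescribed):

# Every hour in the study period; the final hour begins at 23:00 on 10 November
allHours <- seq(as.POSIXct("2012-10-20 00:00:00"),
                as.POSIXct("2012-11-10 23:00:00"),
                by = "1 hour")

# Fetch the "total" field for one hour (see the sketch under Hint 1)
getHourCount <- function(hourBegin) {
  parsed <- fromJSON(getURL(buildFlickrURL(hourBegin, page = 1),
                            ssl.verifypeer = FALSE))
  as.numeric(parsed$photos$total)
}

counts <- numeric(length(allHours))
for (i in seq_along(allHours)) {
  counts[i] <- getHourCount(allHours[i]) # one API call per hour
}

sandyCounts <- data.frame(Date = allHours, Count = counts)
write.csv(sandyCounts, "sandyHourlyCounts.csv", row.names = FALSE) # Hint 5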

TASK 1B (1%):

The hurricane might not be the only influence on the number of photos people take. Perhaps people take more photos at the weekend or at certain times of day, for example.

We should account for this by finding out how many photos were taken in total during each hour.

For the period 20 October 2012 00:00 to 10 November 2012 23:59, download hourly counts from Flickr of the TOTAL number of photos taken and uploaded to Flickr.

Create a data frame containing the hourly counts, of the same format as the data frame you created for the last task.

For assessment, submit the code you wrote to obtain this data, and a CSV file of the data frame containing the hourly counts.

The solution to this exercise is extremely similar to the solution to Task 1A. However, here you do not want to only count photos with the text “hurricane sandy” attached.

Hint 1: Again, the download for this exercise should take around 15 minutes on a good broadband connection. If your download is taking too long, make sure you are not trying to count the photographs by downloading all the pages.

Hint 2: We have recently seen the Flickr database giving counts of 0 between 5am and 6am in the morning. Don’t worry if this happens to you too. We will clean up the data in the next step.
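Under the same assumptions as the sketches in Task 1A, the only change for Task 1B is dropping the text filter, so the query counts every photo taken in the hour (the date limits keep the search valid on Flickr's side):

# Hypothetical variant of the URL builder without the "hurricane sandy" filter
buildFlickrURL1 <- function(hourBegin, page = 1, apiKey = "YOUR_API_KEY") {
  hourEnd <- hourBegin + 60 * 60
  paste0("https://api.flickr.com/services/rest/",
         "?method=flickr.photos.search",
         "&api_key=", apiKey,
         "&min_taken_date=",
         URLencode(format(hourBegin, "%Y-%m-%d %H:%M:%S"), reserved = TRUE),
         "&max_taken_date=",
         URLencode(format(hourEnd, "%Y-%m-%d %H:%M:%S"), reserved = TRUE),
         "&per_page=1&page=", page,
         "&format=json&nojsoncallback=1")
}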

A:

install.packages("lubridate")
library(lubridate)
install.packages("Rcpp")

library(RCurl)
source('E:/davidvictoria/getFlickrData.r')
source('E:/davidvictoria/buildFlickrURL.r')
flickrURL <- buildFlickrURL(hourBegin=as.POSIXct("2012-10-20 00:00:00"), page=1) # build the query URL for photos taken in one particular hour
flickrData <- getURL(flickrURL, ssl.verifypeer = FALSE) # download the Flickr data
flickrData

install.packages("RJSONIO")
library(RJSONIO)

flickrParsed <- fromJSON(flickrData) # parse the JSON into R structures
flickrParsed
str(flickrParsed, max.level=2) # browse the converted data
flickrParsed$photos$photo

library(plyr)
flickrDF <- ldply(flickrParsed$photos$photo, data.frame) # convert the list of photos into a data frame
head(flickrDF)


sandyFlickrData <- transform(flickrDF,
                             Date=substring(datetaken, first=1, last=15)) # first 15 characters of "datetaken", down to the hour
# (reconstructed -- this line was truncated in the original post)

head(sandyFlickrData$Date) # inspect the date data

sandyFlickrTS <- xtabs(~Date,           # Count entries per hour...
                       sandyFlickrData) # ... in sandyFlickrData
head(sandyFlickrTS) # hourly photo counts
sandyFlickrTS <- as.data.frame(sandyFlickrTS) # convert the counts into a data frame
head(sandyFlickrTS)
str(sandyFlickrTS)

B:
source('E:/davidvictoria/getFlickrData1.r')  # Task 1B variants of the helpers,
source('E:/davidvictoria/buildFlickrURL1.r') # ... without the "hurricane sandy" text filter

flickrURL <- buildFlickrURL1(hourBegin=as.POSIXct("2012-10-20 00:00:00"), page=1) # reconstructed from a truncated line

# ... download, parse and build FlickrData as in Task 1A, then extract the hour:
FlickrData <- transform(FlickrData,
                        Date=substring(datetaken, first=1, last=15)) # first 15 characters, down to the hour (reconstructed)
head(FlickrData$Date)
FlickrTS <- xtabs(~Date,      # Count entries per hour...
                  FlickrData) # ... in FlickrData
head(FlickrTS) # hourly photo counts
FlickrTS <- as.data.frame(FlickrTS) # convert into a data frame and inspect
head(FlickrTS)
str(FlickrTS)


Part 2: Processing the Flickr data (2%)

TASK 2A (2%):

You now want to use the data you downloaded on the total number of photos taken to normalise the data you have on the number of Hurricane Sandy photos taken.

First, clean up your total hourly counts data. Change any entries where Flickr has given you counts of 0 total photos (very unlikely given the distribution of the rest of the data) to NA values. This is one line of code. If you’re not sure how to replace 0s with NAs, try Googling “r replace with na” for some hints.

Second, merge the total hourly counts data into your Hurricane Sandy count data frame, so that each row has an entry for the hourly count of Hurricane Sandy photos, and the total hourly count of photos. This is one line of code. You will find the command “merge” useful.

Finally, create a new column which contains the normalised count of Hurricane Sandy photos. Your new column should contain the result of dividing the hourly count of Hurricane Sandy photos by the total hourly count of photos. This is one line of code. You will find the command “transform” useful.

For assessment, submit the code you wrote to process this data, and a CSV file of the data frame containing the Hurricane Sandy hourly counts, the total hourly counts with the 0s replaced, and the normalised Hurricane Sandy hourly counts.
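As a minimal sketch of those three one-liners (assuming, as in the answer code below, that sandyFlickrTS holds the hourly “hurricane sandy” counts and FlickrTS the hourly totals, each with Date and Freq columns):

FlickrTS$Freq[FlickrTS$Freq == 0] <- NA # treat the impossible 0 totals as missing

# One row per hour, holding both the Sandy count and the total count
combined <- merge(sandyFlickrTS, FlickrTS, by = "Date",
                  suffixes = c(".sandy", ".total"))

# Normalised count: Sandy photos divided by all photos taken that hour
combined <- transform(combined, ncount = Freq.sandy / Freq.total)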


allhours <- seq(as.POSIXct("2012-10-20 00:00:00"), 
                as.POSIXct("2012-11-10 23:59:00"), 
                by="1 hour") # generate a vector of every hour in the Hurricane Sandy period
allhours <- data.frame(Date=allhours)
head(allhours)

sandyFlickrTS <- merge(sandyFlickrTS, # Merge the time series data...
                       allhours,      # ... with the full list of hours
                       by="Date",     # Matching rows based on "Date"
                       all=T)         # Keeping all entries - not only those
                                      # which exist in both data frames
names(sandyFlickrTS) <- c("Date", "Freq") # rename the count column (reconstructed from a truncated line)

head(sandyFlickrTS)

sandyFlickrTS <- transform(sandyFlickrTS,
                           ncount = Freq / FlickrTS$Freq) # normalise: divide the "hurricane sandy" counts by the total photo counts
# Note: dividing by FlickrTS$Freq assumes both data frames have one row per hour
# in the same order; merging the totals in by "Date" first (as the task asks) is safer.
                           

Part 3: Acquiring and processing the environmental data (2%)

As a hurricane approaches an area, atmospheric pressure falls. We can therefore use data on atmospheric pressure as a measure of the hurricane’s progress.

TASK 3A (1%):

Hurricane Sandy made landfall very close to Atlantic City in New Jersey. We can retrieve atmospheric pressure readings from Atlantic City from the following website:

http://www.ncdc.noaa.gov/data-access/land-based-station-data/land-based-datasets/automated-surface-observing-system-asos

Click on User Interface Page. Select the Advanced Options. Agree to the terms. You now want data for the United States. On the next page, you want data for New Jersey, where you will retrieve data for selected stations. On the next page, select the first entry for “Atlantic City”. On the page after that, select Atmospheric Pressure Observation. Pick the period we want the data for (from 20 October 2012 to 10 November 2012, the same as the Flickr data) via “Use Date Range”. Do not select “Select Only Obs. on the hour”. Output the data with comma delimiters, including the station name. Continue, and on the next page enter your email address. The data will be sent to you shortly.

For assessment, submit the text file you can download from the line in the email you have received labelled “Data File”.

TASK 3B (1%):

Once the data has arrived, read in the file. Leave out the first line, and the headers too, as there are no commas in the headers, making them trickier to parse. (Remember how you left out lines in the second lab exercise when loading Google data.) This is one line of code.

The information you require is the date on which each reading was taken, the time at which it was taken, and the atmospheric pressure measurement.

Identify which columns contain the data you require using information in the other files which the NOAA sent you. Create a data frame which contains only these three columns. This is one line of code. You will find the subset command useful for this.

Label the columns “Date”, “Time” and “AtmosPressure”. This is one line of code. Look at how you have renamed columns in the labs!

For assessment, submit the code you wrote to process this data, and a CSV file of the data frame with three columns which you just created.

Hint 1: If you can’t work out which columns you need, look at the “format documentation” NOAA sent you.
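As a sketch, assuming the NOAA file is called asos_data.txt and that (hypothetically) columns V3, V4 and V10 hold the date, time and pressure readings – check the format documentation for the real positions:

# Skip the first line and the header line, and let R number the columns itself
pressureRaw <- read.csv("asos_data.txt", skip = 2, header = FALSE)

# Keep only the three columns we need (V3, V4 and V10 are placeholders;
# identify the real columns from NOAA's format documentation)
pressureDF <- subset(pressureRaw, select = c(V3, V4, V10))

names(pressureDF) <- c("Date", "Time", "AtmosPressure") # label the columns
write.csv(pressureDF, "atlanticCityPressure.csv", row.names = FALSE)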

Part 4: Combining the Flickr and environmental data (2%)

Now you have the Flickr data and the environmental data.

To work out how these data sets relate, you need to merge them.

For each hour from the beginning of 20 October 2012 to 10 November 2012, you have both a normalised count of the number of Hurricane Sandy Flickr photos taken, and a measurement of atmospheric pressure in Atlantic City. However, the atmospheric pressure data uses a different format than the Flickr data for specifying the date and time.

You need to work out how to change the format of the atmospheric pressure date and time, so that it matches the format used in the Flickr data. This might need about three lines of code, but there are lots of different solutions.

You then need to merge the two datasets to create one data frame, where each row represents one hour, and contains a measurement of the atmospheric pressure and the normalised count of Hurricane Sandy Flickr photos. This is one line of code. The merge function will be useful here.

For assessment, submit the code you wrote to process this data, and a CSV file of the data frame with the atmospheric pressure data and Flickr counts you created.

Hint 1: There are many different ways to change the date format! You might want to look at as.POSIXct(), which does something similar to as.Date(), but can represent times as well as dates.
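A sketch of one possible solution, assuming the NOAA dates arrive as numbers like 20121030 and the times as numbers like 54 or 1654 (check your own file – these formats are assumptions):

# Rebuild a POSIXct timestamp from the NOAA date and time columns, then
# reformat it to match the hourly Date key used in the Flickr data frame
pressureDF <- transform(pressureDF,
                        DateTime = as.POSIXct(paste(Date, sprintf("%04d", Time)),
                                              format = "%Y%m%d %H%M"))
pressureDF$Date <- format(pressureDF$DateTime, "%Y-%m-%d %H") # hourly key; make
# sure this matches the format of the "Date" column in your Flickr data frame

# The ASOS file has several readings per hour, so average them down to one
hourlyPressure <- aggregate(AtmosPressure ~ Date, data = pressureDF, FUN = mean)

# One row per hour, with both the pressure and the normalised Flickr count
combined <- merge(sandyFlickrTS, hourlyPressure, by = "Date")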

Part 5: Visualising and analysing the data (2%)

Now you have your data organised, you can plot some graphs to take a look at your data, and begin to analyse the relationship between these two time series.

TASK 5A (1%):

First, use ggplot to create a line graph of the normalised Hurricane Sandy Flickr photos time series, so we can see how this count changed across time. Second, use ggplot to create a line graph of the atmospheric pressure in Atlantic City, so that we can see how atmospheric pressure changed across time.

Make the plots look as nice as you can in ggplot. Include these two plots as two panels of the same figure in your PDF answer sheet. Below your figure, write a caption describing what your figure shows.

For assessment, submit the code you wrote to create these figures, and your figures and caption in your PDF answer sheet as described above.

Hint 1: If you’re not sure what a figure caption should look like, look at some of the papers we have provided as further reading for examples.
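A sketch of the two panels, assuming the merged data frame from Part 4 is called combined, with columns Date, ncount and AtmosPressure; gridExtra is one of several ways to stack two ggplot panels into a single figure:

library(ggplot2)
library(gridExtra) # grid.arrange() arranges the two panels into one figure

# Parse the hourly Date key back into a timestamp for the x axis (assumed format)
combined$DateTime <- as.POSIXct(combined$Date, format = "%Y-%m-%d %H")

p1 <- ggplot(combined, aes(x = DateTime, y = ncount)) +
  geom_line(colour = "steelblue") +
  labs(x = "Date", y = "Normalised count",
       title = "Normalised hourly count of 'hurricane sandy' Flickr photos")

p2 <- ggplot(combined, aes(x = DateTime, y = AtmosPressure)) +
  geom_line(colour = "darkred") +
  labs(x = "Date", y = "Atmospheric pressure",
       title = "Hourly atmospheric pressure at Atlantic City, NJ")

grid.arrange(p1, p2, nrow = 2) # two panels of the same figure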

TASK 5B (1%):

Finally, in your answer sheet PDF, explain how you would carry out a correlational analysis to determine whether there is a relationship between these two time series.

Would your analysis make any assumptions about the distribution of the data? In R, create any graphs and run any tests you need to run to check these assumptions.

Now carry out a correlational analysis. In your answer sheet PDF, write a short description of the results you have found. Keep this under 150 words.

For assessment, submit your answer sheet PDF, describing your analysis method and your results as specified above. Submit any code you wrote to check assumptions for your analysis and carry your analysis out. Include any graphs you generated in the PDF with a short caption to explain what they show.

Hint 1: There are various ways of analysing such relationships. For the purposes of this assessment, we will restrict the analysis to a correlational analysis.
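A sketch of one way to check the distributional assumptions and run the correlation, again assuming the combined data frame from Part 4. Pearson's r assumes the data are roughly normal; Spearman's rank correlation is a common fallback when that assumption looks doubtful:

# Eyeball the distributions: histograms and normal Q-Q plots
hist(combined$ncount, main = "Normalised Flickr counts")
qqnorm(combined$ncount); qqline(combined$ncount)
hist(combined$AtmosPressure, main = "Atmospheric pressure")
qqnorm(combined$AtmosPressure); qqline(combined$AtmosPressure)

# A formal normality check (Shapiro-Wilk)
shapiro.test(combined$ncount)
shapiro.test(combined$AtmosPressure)

# Pearson correlation if normality looks plausible...
cor.test(combined$ncount, combined$AtmosPressure, method = "pearson")

# ... otherwise a rank-based Spearman correlation makes fewer assumptions
cor.test(combined$ncount, combined$AtmosPressure, method = "spearman")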

[Figures: comparison plot of the Flickr counts and atmospheric pressure; atmospheric pressure–date line chart]

About the analyst

Sincere thanks go to LE PHUONG for her contribution to this article. She completed a master's degree in Computer Science and Technology at Shandong University and specialises in data analysis, data visualisation and data collection. She is proficient in Python, SQL, C/C++, HTML, CSS, VSCode, Linux and Jupyter Notebook.

 