当前位置:天才代写 > 作业代写,留学生作业代写-北美、澳洲、英国等靠谱代写 > 统计代写 XML data source代写 SQL database代写 R代写

统计代写 XML data source代写 SQL database代写 R代写

2020-11-29 11:03 星期日 所属: 作业代写,留学生作业代写-北美、澳洲、英国等靠谱代写 浏览:1537

assignment 3B代写

assignment 3B

统计代写 In this assignment we are going to explore accessing data from different sources. You have already learned how to read text

Peter Dolan 3/6/2017

INSTRUCTIONS:

In this assignment we are going to explore accessing data from different sources. You have already learned how to

read text data in table format using (or the Tools…. Import Data-set… functions ). Now you will learn

how to access data from a web page, an XML data source, and data stored in an SQL database.

Problem 1. (8 points)统计代写

Web Scraping Background:

Questions (intermingle your answers)

The Web-site “http://www.50states.com/news/minn.htm” has a list of newspapers in Minnesota

  1. Use to obtain all HTML-lists from this webpage

ANSWER:

  1. Identify the entry containing the list of newspapers

ANSWER:

  1. Use , your knowledge of regular expressions, and whatever other techniques youdeem

necessary to extract the names of every newspaper.assignment 3B代写

ANSWER:

  1. Extract the names of every town

ANSWER:

  1. Generate a data-frame with two columns: One for the newspaper’s name, and the other for the town. Use to output the first 10 rows of your data-frame.

ANSWER:

Problem 2. (8 points)统计代写

Create PDF in your applications with the Pdfcrowd HTML to PDF API

The basic strategy of extracting data from HTML (a web-page) is the same for tabular data (tables) as it is for lists. Look over the web page https://seer.cancer.gov/statfacts/html/thyro.html for some statistical data on Thyroid Cancer. In particular, look for the types of cancer, their number of occurrences, and the estimated deaths (this will appear as one table. For this problem you will want to use readHTMLTable()to scrape the data from the web-page.

Questionsassignment统计代写

to scrape the data from the web-page.

  1. Tryusing on the URL. Does the data you want appear?

ANSWER:

url=”https://seer.cancer.gov/statfacts/html/thyro.html”

  1. Use to get a local copy of the data in the URL and then use on that file.

Did that provide the data you want?

ANSWER:

url=”https://seer.cancer.gov/statfacts/html/thyro.html”

  1. Nowuse on the downloaded file. You should be able to extract the desired data this
ANSWER:

url=”https://seer.cancer.gov/statfacts/html/thyro.html”

  1. Store the table data in avariable named . The first column is superfluous– remove it. Now rename the remaining columns as “type”, “new”, and “death”. Assign row-names to the table using the cancer type. Then remove that column. Your data should have a format, somewhat like this (the table below is from 2015, not 2016):
new death
Breast Cancer (Female) 231,840 40,290
Lung and Bronchus Cancer 221,200 158,040
Prostate Cancer 220,800 27,540
Colon and Rectum Cancer 132,700 49,700
new death
Bladder Cancer 74,000 16,000
Melanoma of the Skin 73,870 9,940
Non-Hodgkin Lymphoma 71,850 19,790
Thyroid Cancer 62,450 1,950
Kidney and Renal Pelvis Cancer 61,560 14,080
Endometrial Cancer 54,870 10,170

Now the problem is that the all the data in your table is “character” (it’s the commas). You could try to use, but that won’t work (again, because of the commas). Remove them first using (look it up if you need to). Perhaps more than anything else in these assignments, this requires more independent understanding on your part. If you are not certain how to proceed, please contact one of the instructors to ask for some help.统计代写

ANSWER:

  1. If you have managed to produce a 10-row by 2-col table with numeric entries, named rows, and named columns then create a bar plot showing the ratio of deaths (in 2016) to new cases (again in 2016). Based on this chart which cancer shows the best survival rate? (This is a bit unfair since new cases in 2016 are not really comparable to the death rate in 2016– but it’s not a horribleexaggeration)

Problem 3. (6 points)

XML background:

Questions

  1. Use and to import the data from the URL into a data.frame.

ANSWER:

url=“http://api.sba.gov/geodata/city_county_links_for_state_of/mn.xml

  1. Use the shape file, http://www2.census.gov/geo/tiger/GENZ2010/gz_2010_27_700_00_500k.zip, and the techniques of chapter 15 to plot all 81 “Civil” entries from your SBA data on a map ofMinnesota

ANSWER:

library(maptools)

library(PBSmapping)统计代写

统计代写
统计代写

Problem 4. (8 points)

XPath Background:

Questions

  1. What is the structure of your uncompressed file? (Ie, what are the directories, subdirectories, andfiles)?

ANSWER:

  1. Extract all the text that is in bold 统计代写

ANSWER:

  1. Suppose you are hired as a TA for Introduction to Statistics and are asked to create a spreadsheet that details the questions asked and their point value. You decide it would be far more interesting to automate the process then to do it by hand. You begin to look carefully at the documents and realize that there are a few consistent patterns:
  • The question number and its point value are always bold

  • The question number and its point value are always bold The question number always has a period in it
  • The point value is always followed by the word ‘points’ 统计代写

Demonstrate how to use R to extract this information from and turn it into a data.frame– the first column should contain the problems and be of type character, and the second should be of type numeric and contain the point values. Your data should look something like this:

Problem Point Value
4.8 6
4.24 2
4.33 4
4.32 4
4.36 4
Problem Point Value
4.56 14
4.74 3
4.89 5
4.101 8
4.102 6
4.116 8
4.117 6
4.128 3
4.144 6

ANSWER:

  1. What is the total number of points that you extracted? How many points does the document think you need? If there is a difference where does the mistake lie?

ANSWER:

Problem 5. (12 points)

SQL Background:

Questions

  1. The SQL equivalent of the R command str() is DESCRIBE. We know that the database SUMMER_2015 has two tables. Let’s examine the structure of prices :

ANSWER:

  1. The basic form of the SELECT command is: SELECT <cols> FROM <table>. Experiment with that. For example try the following SQL commands (you’ll need to R-ify them of course):
 

SELECT item,unit FROM inventory; SELECT unit FROM inventory;

SELECT unit, unit, item FROM inventory; 统计代写

The * option will show all fields in a table. Try these too:

 

SELECT * FROM inventory; SELECT *, item FROM inventory;

ANSWER:

3.You can control the name of a column in theoutput:

 

SELECT item AS ‘mega stuff’ FROM inventory;

SELECT item AS stuff, amount, unit AS ‘counting thing’ FROM inventory;

Try returning the items and their amount, but call the columns “junk_food” and “count”

ANSWER:

  1. SQL has a large collection of functions and operators (see https://mariadb.com/kb/en/functions-and-operators/if you want to know more– but it’s pretty overwhelming, so don’t feel obligated to follow that link). For example:
 

dbGetQuery(con,”SELECT right(item,4) AS the_righiest FROM inventory”)

## the_righiest ## 1 hips 统计代写

## 2 eese

## 3 eese

## 4 corn

## 5 dogs

## 6 buns

Now you should: * Use

to store the contents of the

table in a data-frame called

ANSWER:

Use R’s column.function on thecolumn into return the first 3 letters in the

   ANSWER:

Useand the SQL function

to do the same via SQL.

ANSWER:

  1. SQL divides its commands into clauses. You have already beenusing the clause and theclause. SQL is picky and wants the clauses to go in the proper order. You are now going to see an example of the clause: 统计代写
 

dbGetQuery(con,”SELECT * FROM inventory WHERE amount < 50″)

 

## item unit amount id ## 1 nacho cheese gallons 20 3

## 2 popcorn gallons 5 4

## 3 buns units 3 6

At this point you may find the tutorials at http://www.sqlclauses.com to be helpful. (although you may ignore alot of them).

SQL also has a limited ability to use regular expressions. The following command will find all rows in which contains an ‘n’:
 

dbGetQuery(con,”SELECT * FROM inventory WHERE item REGEXP ‘n'”)

 

## item unit amount id ## 1 nacho cheese gallons 100 2

## 2 nacho cheese gallons 20 3

## 3 popcorn gallons 5 4

## 4 buns units 3 6

Many variants of SQL even support a option:

This options is particularly helpful when searching through people’s names.

See if you can figure out how to return all the rows where the item name has an H OR, the amount is above 30.

ANSWER:

  1. Two other clauses arehelpful: The clause allows you to only return a few results from your query: 统计代写

dbGetQuery(con,“SELECT * FROM inventory LIMIT 2”)

## item unit amount id

## 1 Buffalo chips patty 1200 1

## 2 nacho cheese gallons 100 2

and the DISTINCT clause acts like R’s unique() function:

dbGetQuery(con,“SELECT distinct item from inventory”)

## item

## 1 Buffalo chips

## 2 nacho cheese

## 3 popcorn

## 4 hot dogs

## 5 buns

However the real power in SQL comes from joins. Again, this is a complicated topic. This URL http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sqljoins.html has a good explanation (if you are interested), but our database server doesn’t do OUTER JOINS correctly… so some of this knowledge is,for the moment, theoretical).

In any event, R-ify the following commands:

SELECT * FROM inventory

SELECT * FROM prices

SELECT * FROM inventory,prices

  1. Noticethat using two tables in the clause created a result with 6 cross 4  The tricky bit is beingclever about your use of :
## item unit amount id id price notes
## 1 Buffalo chips patty 1200 1 1 12.43 Buy from Ritchie
## 2 nacho cheese gallons 100 2 2 100.19 Version A
## 3 nacho cheese gallons 20 3 3 200.19 Premium

You will notice a few new things. Since both tables had an field– we can’t use “id” alone to specify its values. Instead we need to use SQL’s table.field notation. Second, this time we only had 3 rows in our result. Pay close attention to this next command. It uses everything that we have learned, introduces a new clause and shows how new fields can be constructed on the fly:

dbGetQuery(con,“SELECT amount,item,price as unit_price, price*amount as total_price F ROM inventory,prices WHERE inventory.id=prices.id AND amount > 50 ORDER BY amount”)统计代写

## Warning in .local(conn, statement, …): Decimal MySQL column 2 imported as

numeric

## Warning in .local(conn, statement, …): Decimal MySQL column 3 imported as

numeric

## amount item unit_price total_price

## 1 100 nacho cheese 100.19 10019

## 2 1200 Buffalo chips 12.43 14916

See if you can produce a query that will return the unit, item, and price of everything that costs more than $100.00. Don’t leave out inventory.id=prices.id .

ANSWER:

统计代写
统计代写

更多其他:C++代写 考试助攻 C语言代写 计算机代写 report代写 project代写 数学代写 java代写 程序代写 algorithm代写 C++代写 r代写 北美代写

合作平台:天才代写 幽灵代写 写手招聘 Essay代写

 

天才代写-代写联系方式