<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="generator" content="pandoc">
<title>Software Carpentry: R for reproducible scientific analysis</title>
<link rel="shortcut icon" type="image/x-icon" href="/favicon.ico" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<link rel="stylesheet" type="text/css" href="css/bootstrap/bootstrap.css" />
<link rel="stylesheet" type="text/css" href="css/bootstrap/bootstrap-responsive.css" />
<link rel="stylesheet" type="text/css" href="css/swc.css" />
<link rel="stylesheet" type="text/css" href="css/swc-workshop-and-lesson.css" />
<link rel="stylesheet" type="text/css" href="css/lesson.css" />
<link rel="alternate" type="application/rss+xml" title="Software Carpentry Blog" href="http://software-carpentry.org/feed.xml"/>
<meta charset="UTF-8" />
<!-- HTML5 shim, for IE6-8 support of HTML5 elements -->
<!--[if lt IE 9]>
<script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
</head>
<body class="lesson">
<div class="container container-full-width card">
<div class="banner">
<a href="http://software-carpentry.org" title="Software Carpentry">
<img alt="Software Carpentry banner" src="img/software-carpentry-banner.png" />
</a>
</div>
<div class="row-fluid">
<div class="span10 offset1">
<h1 class="title">R for reproducible scientific analysis</h1>
<h2 class="subtitle">Reading data</h2>
<div id="learning-objectives" class="objectives">
<h2>Learning Objectives</h2>
<ul>
<li>To be able to read regular data into R</li>
</ul>
</div>
<h3 id="reading-in-data">Reading in data</h3>
<p>Now that we've obtained the gapminder dataset, we want to load it into R. Before reading in data, it's a good idea to have a look at its structure. Let's take a look using our newly-learned shell skills:</p>
<pre class="shell"><code>cd gapminder/data/ # navigate to the data directory of the project folder
head gapminder-FiveYearData.csv </code></pre>
<pre class="output"><code>country,year,pop,continent,lifeExp,gdpPercap
Afghanistan,1952,8425333,Asia,28.801,779.4453145
Afghanistan,1957,9240934,Asia,30.332,820.8530296
Afghanistan,1962,10267083,Asia,31.997,853.10071
Afghanistan,1967,11537966,Asia,34.02,836.1971382
Afghanistan,1972,13079460,Asia,36.088,739.9811058
Afghanistan,1977,14880372,Asia,38.438,786.11336
Afghanistan,1982,12881816,Asia,39.854,978.0114388
Afghanistan,1987,13867957,Asia,40.822,852.3959448
Afghanistan,1992,16317921,Asia,41.674,649.3413952</code></pre>
<p>As its file extension would suggest, the file contains comma-separated values, and seems to contain a header row.</p>
<p>We can use <code>read.table</code> to read this into R:</p>
<pre class="sourceCode r"><code class="sourceCode r">gapminder &lt;-<span class="st"> </span><span class="kw">read.table</span>(
<span class="dt">file=</span><span class="st">"data/gapminder-FiveYearData.csv"</span>,
<span class="dt">header=</span><span class="ot">TRUE</span>, <span class="dt">sep=</span><span class="st">","</span>
)
<span class="kw">head</span>(gapminder)</code></pre>
<pre class="output"><code> country year pop continent lifeExp gdpPercap
1 Afghanistan 1952 8425333 Asia 28.801 779.4453
2 Afghanistan 1957 9240934 Asia 30.332 820.8530
3 Afghanistan 1962 10267083 Asia 31.997 853.1007
4 Afghanistan 1967 11537966 Asia 34.020 836.1971
5 Afghanistan 1972 13079460 Asia 36.088 739.9811
6 Afghanistan 1977 14880372 Asia 38.438 786.1134</code></pre>
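<p>Because we know the structure of the data, we're able to specify the appropriate arguments to <code>read.table</code>. Without these arguments, <code>read.table</code> falls back on its defaults (fields are split on whitespace, and a header row is only assumed when the first line has one fewer field than the rest), which do not match this file, so it is always more reliable to tell <code>read.table</code> the structure of the data explicitly. The <code>read.csv</code> function provides a convenient shortcut for loading CSV files: it is a wrapper around <code>read.table</code> whose defaults include <code>header=TRUE</code> and <code>sep=","</code>.</p>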
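<p>As a minimal sketch of that shortcut, the example below writes two of the rows shown above to a temporary file and reads them back both ways. The temporary file and the names <code>tmp</code>, <code>via_table</code> and <code>via_csv</code> are assumptions made for illustration and are not part of the lesson's project.</p>
<pre class="sourceCode r"><code class="sourceCode r"># Write a small sample of the gapminder rows to a temporary CSV file
# (illustrative only; the lesson itself reads data/gapminder-FiveYearData.csv)
tmp &lt;- tempfile(fileext = ".csv")
writeLines(c(
  "country,year,pop,continent,lifeExp,gdpPercap",
  "Afghanistan,1952,8425333,Asia,28.801,779.4453145",
  "Afghanistan,1957,9240934,Asia,30.332,820.8530296"
), tmp)

# Explicit arguments with read.table
via_table &lt;- read.table(file = tmp, header = TRUE, sep = ",")

# read.csv uses header = TRUE and sep = "," by default
via_csv &lt;- read.csv(file = tmp)

# For this simple file, both calls should produce the same data frame
identical(via_table, via_csv)</code></pre>
<p>For files with quoted fields, comment characters or ragged rows the two functions can behave differently, because <code>read.csv</code> changes a few other defaults as well.</p>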
<div id="miscellaneous-tips" class="callout">
<h4>Miscellaneous Tips</h4>
<ol style="list-style-type: decimal">
<li><p>Another type of file you might encounter is the tab-separated format. To specify a tab as the separator, use <code>sep="\t"</code>.</p></li>
<li><p>You can also read in files from the internet by replacing the file path with a web address.</p></li>
<li><p>You can read directly from Excel spreadsheets without converting them to plain text by using the <code>xlsx</code> package.</p></li>
</ol>
</div>
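<p>The first tip can be sketched as follows. The example creates a small tab-separated file from a few of the values shown earlier and reads it back with <code>sep="\t"</code>. The temporary file and the names <code>tsv_file</code> and <code>tsv_data</code> are hypothetical and exist only for this illustration.</p>
<pre class="sourceCode r"><code class="sourceCode r"># Create a small tab-separated file to read back in
# (a hypothetical sample, used only for illustration)
tsv_file &lt;- tempfile(fileext = ".tsv")
writeLines(c(
  "country\tyear\tlifeExp",
  "Afghanistan\t1952\t28.801",
  "Afghanistan\t1957\t30.332"
), tsv_file)

# sep = "\t" tells read.table that fields are separated by tabs
tsv_data &lt;- read.table(file = tsv_file, header = TRUE, sep = "\t")
head(tsv_data)</code></pre>
<p>Reading from a web address works the same way: the <code>file</code> argument is given a URL instead of a local path.</p>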
<p>To make sure our analysis is reproducible, we should put the code into a script file so we can come back to it later.</p>
<div id="challenge" class="challenge">
<h4>Challenge</h4>
<p>Go to File -&gt; New File -&gt; R Script, and write an R script that loads the gapminder dataset. Save it in the <code>scripts/</code> directory and add it to version control.</p>
<p>Run the script using the <code>source</code> function, with the file path as its argument (or by pressing the "Source" button in RStudio).</p>
</div>
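<p>To see what <code>source</code> does without giving away the challenge, here is a minimal sketch that writes a one-line script to a temporary file and runs it. The temporary file and the names <code>script_file</code> and <code>example_years</code> are hypothetical stand-ins for a script saved under <code>scripts/</code>.</p>
<pre class="sourceCode r"><code class="sourceCode r"># Write a tiny script to a temporary file
# (a stand-in for a script saved under scripts/; the names are hypothetical)
script_file &lt;- tempfile(fileext = ".R")
writeLines(c(
  "# example script: define an object when the script is run",
  "example_years &lt;- c(1952, 1957, 1962)"
), script_file)

# source() runs every line of the script in the current session
source(script_file)

# Objects created by the script are now available
example_years</code></pre>
<p>Because the script recreates its objects from scratch each time it is run, the analysis can be repeated later by calling <code>source</code> again.</p>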
</div>
</div>
<div class="footer">
<a class="label swc-blue-bg" href="http://software-carpentry.org">Software Carpentry</a>
<a class="label swc-blue-bg" href="https://github.com/swcarpentry/lesson-template">Source</a>
<a class="label swc-blue-bg" href="mailto:admin@software-carpentry.org">Contact</a>
<a class="label swc-blue-bg" href="LICENSE.html">License</a>
</div>
</div>
<!-- Javascript placed at the end of the document so the pages load faster -->
<script src="http://software-carpentry.org/v5/js/jquery-1.9.1.min.js"></script>
<script src="http://software-carpentry.org/v5/js/bootstrap/bootstrap.min.js"></script>
</body>
</html>