<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<meta charset="utf-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<title>C Thinking Big: Data Tables | Technical Foundations of Informatics</title>
<meta name="description" content="The course reader for INFO 201: Technical Foundations of Informatics." />
<meta name="generator" content="bookdown 0.13 and GitBook 2.6.7" />
<meta property="og:title" content="C Thinking Big: Data Tables | Technical Foundations of Informatics" />
<meta property="og:type" content="book" />
<meta property="og:url" content="https://info201.github.io/" />
<meta property="og:image" content="https://info201.github.io/img/cover-img.png" />
<meta property="og:description" content="The course reader for INFO 201: Technical Foundations of Informatics." />
<meta name="github-repo" content="info201/book" />
<meta name="twitter:card" content="summary" />
<meta name="twitter:title" content="C Thinking Big: Data Tables | Technical Foundations of Informatics" />
<meta name="twitter:description" content="The course reader for INFO 201: Technical Foundations of Informatics." />
<meta name="twitter:image" content="https://info201.github.io/img/cover-img.png" />
<meta name="author" content="Michael Freeman and Joel Ross" />
<meta name="date" content="2019-09-11" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="apple-mobile-web-app-capable" content="yes" />
<meta name="apple-mobile-web-app-status-bar-style" content="black" />
<link rel="shortcut icon" href="img/favicon.png" type="image/x-icon" />
<link rel="prev" href="control-structures.html"/>
<link rel="next" href="remote-server.html"/>
<script src="libs/jquery-2.2.3/jquery.min.js"></script>
<link href="libs/gitbook-2.6.7/css/style.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-table.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-bookdown.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-highlight.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-search.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-fontsettings.css" rel="stylesheet" />
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-98444716-2', 'auto');
ga('send', 'pageview');
</script>
<style type="text/css">
a.sourceLine { display: inline-block; line-height: 1.25; }
a.sourceLine { pointer-events: none; color: inherit; text-decoration: inherit; }
a.sourceLine:empty { height: 1.2em; }
.sourceCode { overflow: visible; }
code.sourceCode { white-space: pre; position: relative; }
pre.sourceCode { margin: 0; }
@media screen {
div.sourceCode { overflow: auto; }
}
@media print {
code.sourceCode { white-space: pre-wrap; }
a.sourceLine { text-indent: -1em; padding-left: 1em; }
}
pre.numberSource a.sourceLine
{ position: relative; left: -4em; }
pre.numberSource a.sourceLine::before
{ content: attr(data-line-number);
position: relative; left: -1em; text-align: right; vertical-align: baseline;
border: none; pointer-events: all; display: inline-block;
-webkit-touch-callout: none; -webkit-user-select: none;
-khtml-user-select: none; -moz-user-select: none;
-ms-user-select: none; user-select: none;
padding: 0 4px; width: 4em;
color: #aaaaaa;
}
pre.numberSource { margin-left: 3em; border-left: 1px solid #aaaaaa; padding-left: 4px; }
div.sourceCode
{ }
@media screen {
a.sourceLine::before { text-decoration: underline; }
}
code span.al { color: #ff0000; font-weight: bold; } /* Alert */
code span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
code span.at { color: #7d9029; } /* Attribute */
code span.bn { color: #40a070; } /* BaseN */
code span.bu { } /* BuiltIn */
code span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
code span.ch { color: #4070a0; } /* Char */
code span.cn { color: #880000; } /* Constant */
code span.co { color: #60a0b0; font-style: italic; } /* Comment */
code span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
code span.do { color: #ba2121; font-style: italic; } /* Documentation */
code span.dt { color: #902000; } /* DataType */
code span.dv { color: #40a070; } /* DecVal */
code span.er { color: #ff0000; font-weight: bold; } /* Error */
code span.ex { } /* Extension */
code span.fl { color: #40a070; } /* Float */
code span.fu { color: #06287e; } /* Function */
code span.im { } /* Import */
code span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
code span.kw { color: #007020; font-weight: bold; } /* Keyword */
code span.op { color: #666666; } /* Operator */
code span.ot { color: #007020; } /* Other */
code span.pp { color: #bc7a00; } /* Preprocessor */
code span.sc { color: #4070a0; } /* SpecialChar */
code span.ss { color: #bb6688; } /* SpecialString */
code span.st { color: #4070a0; } /* String */
code span.va { color: #19177c; } /* Variable */
code span.vs { color: #4070a0; } /* VerbatimString */
code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
</style>
<link rel="stylesheet" href="css/style.css" type="text/css" />
</head>
<body>
<div class="book without-animation with-summary font-size-2 font-family-1" data-basepath=".">
<div class="book-summary">
<nav role="navigation">
<ul class="summary">
<li><a href="./">Technical Foundations of Informatics</a></li>
<li class="divider"></li>
<li class="chapter" data-level="" data-path="index.html"><a href="index.html"><i class="fa fa-check"></i>About the Book</a></li>
<li class="chapter" data-level="1" data-path="setup-machine.html"><a href="setup-machine.html"><i class="fa fa-check"></i><b>1</b> Setting up your Machine</a><ul>
<li class="chapter" data-level="1.1" data-path="setup-machine.html"><a href="setup-machine.html#git"><i class="fa fa-check"></i><b>1.1</b> Git</a><ul>
<li class="chapter" data-level="1.1.1" data-path="setup-machine.html"><a href="setup-machine.html#github"><i class="fa fa-check"></i><b>1.1.1</b> GitHub</a></li>
</ul></li>
<li class="chapter" data-level="1.2" data-path="setup-machine.html"><a href="setup-machine.html#command-line-tools-bash"><i class="fa fa-check"></i><b>1.2</b> Command-line Tools (Bash)</a><ul>
<li class="chapter" data-level="1.2.1" data-path="setup-machine.html"><a href="setup-machine.html#command-line-on-a-mac"><i class="fa fa-check"></i><b>1.2.1</b> Command-line on a Mac</a></li>
<li class="chapter" data-level="1.2.2" data-path="setup-machine.html"><a href="setup-machine.html#command-line-on-windows"><i class="fa fa-check"></i><b>1.2.2</b> Command-line on Windows</a></li>
</ul></li>
<li class="chapter" data-level="1.3" data-path="setup-machine.html"><a href="setup-machine.html#text-editors"><i class="fa fa-check"></i><b>1.3</b> Text Editors</a><ul>
<li class="chapter" data-level="1.3.1" data-path="setup-machine.html"><a href="setup-machine.html#atom"><i class="fa fa-check"></i><b>1.3.1</b> Atom</a></li>
<li class="chapter" data-level="1.3.2" data-path="setup-machine.html"><a href="setup-machine.html#visual-studio-code"><i class="fa fa-check"></i><b>1.3.2</b> Visual Studio Code</a></li>
<li class="chapter" data-level="1.3.3" data-path="setup-machine.html"><a href="setup-machine.html#sublime-text"><i class="fa fa-check"></i><b>1.3.3</b> Sublime Text</a></li>
</ul></li>
<li class="chapter" data-level="1.4" data-path="setup-machine.html"><a href="setup-machine.html#r-language"><i class="fa fa-check"></i><b>1.4</b> R Language</a></li>
<li class="chapter" data-level="1.5" data-path="setup-machine.html"><a href="setup-machine.html#rstudio"><i class="fa fa-check"></i><b>1.5</b> RStudio</a></li>
<li class="chapter" data-level="" data-path="setup-machine.html"><a href="setup-machine.html#resources"><i class="fa fa-check"></i>Resources</a></li>
</ul></li>
<li class="chapter" data-level="2" data-path="command-line.html"><a href="command-line.html"><i class="fa fa-check"></i><b>2</b> The Command Line</a><ul>
<li class="chapter" data-level="2.1" data-path="command-line.html"><a href="command-line.html#accessing-the-command-line"><i class="fa fa-check"></i><b>2.1</b> Accessing the Command-Line</a></li>
<li class="chapter" data-level="2.2" data-path="command-line.html"><a href="command-line.html#navigating-the-command-line"><i class="fa fa-check"></i><b>2.2</b> Navigating the Command Line</a><ul>
<li class="chapter" data-level="2.2.1" data-path="command-line.html"><a href="command-line.html#changing-directories"><i class="fa fa-check"></i><b>2.2.1</b> Changing Directories</a></li>
<li class="chapter" data-level="2.2.2" data-path="command-line.html"><a href="command-line.html#listing-files"><i class="fa fa-check"></i><b>2.2.2</b> Listing Files</a></li>
<li class="chapter" data-level="2.2.3" data-path="command-line.html"><a href="command-line.html#paths"><i class="fa fa-check"></i><b>2.2.3</b> Paths</a></li>
</ul></li>
<li class="chapter" data-level="2.3" data-path="command-line.html"><a href="command-line.html#file-commands"><i class="fa fa-check"></i><b>2.3</b> File Commands</a><ul>
<li class="chapter" data-level="2.3.1" data-path="command-line.html"><a href="command-line.html#learning-new-commands"><i class="fa fa-check"></i><b>2.3.1</b> Learning New Commands</a></li>
</ul></li>
<li class="chapter" data-level="2.4" data-path="command-line.html"><a href="command-line.html#dealing-with-errors"><i class="fa fa-check"></i><b>2.4</b> Dealing With Errors</a></li>
<li class="chapter" data-level="" data-path="command-line.html"><a href="command-line.html#resources-1"><i class="fa fa-check"></i>Resources</a></li>
</ul></li>
<li class="chapter" data-level="3" data-path="markdown.html"><a href="markdown.html"><i class="fa fa-check"></i><b>3</b> Markdown</a><ul>
<li class="chapter" data-level="3.1" data-path="markdown.html"><a href="markdown.html#writing-markdown"><i class="fa fa-check"></i><b>3.1</b> Writing Markdown</a><ul>
<li class="chapter" data-level="3.1.1" data-path="markdown.html"><a href="markdown.html#text-formatting"><i class="fa fa-check"></i><b>3.1.1</b> Text Formatting</a></li>
<li class="chapter" data-level="3.1.2" data-path="markdown.html"><a href="markdown.html#text-blocks"><i class="fa fa-check"></i><b>3.1.2</b> Text Blocks</a></li>
</ul></li>
<li class="chapter" data-level="3.2" data-path="markdown.html"><a href="markdown.html#rendering-markdown"><i class="fa fa-check"></i><b>3.2</b> Rendering Markdown</a></li>
<li class="chapter" data-level="" data-path="markdown.html"><a href="markdown.html#resources-2"><i class="fa fa-check"></i>Resources</a></li>
</ul></li>
<li class="chapter" data-level="4" data-path="git-basics.html"><a href="git-basics.html"><i class="fa fa-check"></i><b>4</b> Git and GitHub</a><ul>
<li class="chapter" data-level="4.1" data-path="git-basics.html"><a href="git-basics.html#what-is-this-git-thing-anyway"><i class="fa fa-check"></i><b>4.1</b> What is this <em>git</em> thing anyway?</a><ul>
<li class="chapter" data-level="4.1.1" data-path="git-basics.html"><a href="git-basics.html#git-core-concepts"><i class="fa fa-check"></i><b>4.1.1</b> Git Core Concepts</a></li>
<li class="chapter" data-level="4.1.2" data-path="git-basics.html"><a href="git-basics.html#wait-but-what-is-github-then"><i class="fa fa-check"></i><b>4.1.2</b> Wait, but what is GitHub then?</a></li>
</ul></li>
<li class="chapter" data-level="4.2" data-path="git-basics.html"><a href="git-basics.html#installation-setup"><i class="fa fa-check"></i><b>4.2</b> Installation &amp; Setup</a><ul>
<li class="chapter" data-level="4.2.1" data-path="git-basics.html"><a href="git-basics.html#creating-a-repo"><i class="fa fa-check"></i><b>4.2.1</b> Creating a Repo</a></li>
<li class="chapter" data-level="4.2.2" data-path="git-basics.html"><a href="git-basics.html#checking-status"><i class="fa fa-check"></i><b>4.2.2</b> Checking Status</a></li>
</ul></li>
<li class="chapter" data-level="4.3" data-path="git-basics.html"><a href="git-basics.html#making-changes"><i class="fa fa-check"></i><b>4.3</b> Making Changes</a><ul>
<li class="chapter" data-level="4.3.1" data-path="git-basics.html"><a href="git-basics.html#adding-files"><i class="fa fa-check"></i><b>4.3.1</b> Adding Files</a></li>
<li class="chapter" data-level="4.3.2" data-path="git-basics.html"><a href="git-basics.html#committing"><i class="fa fa-check"></i><b>4.3.2</b> Committing</a></li>
<li class="chapter" data-level="4.3.3" data-path="git-basics.html"><a href="git-basics.html#commit-history"><i class="fa fa-check"></i><b>4.3.3</b> Commit History</a></li>
<li class="chapter" data-level="4.3.4" data-path="git-basics.html"><a href="git-basics.html#reviewing-the-process"><i class="fa fa-check"></i><b>4.3.4</b> Reviewing the Process</a></li>
<li class="chapter" data-level="4.3.5" data-path="git-basics.html"><a href="git-basics.html#gitignore"><i class="fa fa-check"></i><b>4.3.5</b> The <code>.gitignore</code> File</a></li>
</ul></li>
<li class="chapter" data-level="4.4" data-path="git-basics.html"><a href="git-basics.html#github-and-remotes"><i class="fa fa-check"></i><b>4.4</b> GitHub and Remotes</a><ul>
<li class="chapter" data-level="4.4.1" data-path="git-basics.html"><a href="git-basics.html#forking-and-cloning"><i class="fa fa-check"></i><b>4.4.1</b> Forking and Cloning</a></li>
<li class="chapter" data-level="4.4.2" data-path="git-basics.html"><a href="git-basics.html#pushing-and-pulling"><i class="fa fa-check"></i><b>4.4.2</b> Pushing and Pulling</a></li>
<li class="chapter" data-level="4.4.3" data-path="git-basics.html"><a href="git-basics.html#reviewing-the-process-1"><i class="fa fa-check"></i><b>4.4.3</b> Reviewing The Process</a></li>
</ul></li>
<li class="chapter" data-level="4.5" data-path="git-basics.html"><a href="git-basics.html#course-assignments-on-github"><i class="fa fa-check"></i><b>4.5</b> Course Assignments on GitHub</a></li>
<li class="chapter" data-level="4.6" data-path="git-basics.html"><a href="git-basics.html#command-summary"><i class="fa fa-check"></i><b>4.6</b> Command Summary</a></li>
<li class="chapter" data-level="" data-path="git-basics.html"><a href="git-basics.html#resources-3"><i class="fa fa-check"></i>Resources</a></li>
</ul></li>
<li class="chapter" data-level="5" data-path="r-intro.html"><a href="r-intro.html"><i class="fa fa-check"></i><b>5</b> Introduction to R</a><ul>
<li class="chapter" data-level="5.1" data-path="r-intro.html"><a href="r-intro.html#programming-with-r"><i class="fa fa-check"></i><b>5.1</b> Programming with R</a></li>
<li class="chapter" data-level="5.2" data-path="r-intro.html"><a href="r-intro.html#running-r-scripts"><i class="fa fa-check"></i><b>5.2</b> Running R Scripts</a><ul>
<li class="chapter" data-level="5.2.1" data-path="r-intro.html"><a href="r-intro.html#running-r-cmd"><i class="fa fa-check"></i><b>5.2.1</b> Command-Line</a></li>
<li class="chapter" data-level="5.2.2" data-path="r-intro.html"><a href="r-intro.html#running-r-rstudio"><i class="fa fa-check"></i><b>5.2.2</b> RStudio</a></li>
</ul></li>
<li class="chapter" data-level="5.3" data-path="r-intro.html"><a href="r-intro.html#comments"><i class="fa fa-check"></i><b>5.3</b> Comments</a></li>
<li class="chapter" data-level="5.4" data-path="r-intro.html"><a href="r-intro.html#variables"><i class="fa fa-check"></i><b>5.4</b> Variables</a><ul>
<li class="chapter" data-level="5.4.1" data-path="r-intro.html"><a href="r-intro.html#basic-data-types"><i class="fa fa-check"></i><b>5.4.1</b> Basic Data Types</a></li>
</ul></li>
<li class="chapter" data-level="5.5" data-path="r-intro.html"><a href="r-intro.html#gettinghelp"><i class="fa fa-check"></i><b>5.5</b> Getting Help</a></li>
<li class="chapter" data-level="" data-path="r-intro.html"><a href="r-intro.html#resources-4"><i class="fa fa-check"></i>Resources</a></li>
</ul></li>
<li class="chapter" data-level="6" data-path="functions.html"><a href="functions.html"><i class="fa fa-check"></i><b>6</b> Functions</a><ul>
<li class="chapter" data-level="6.1" data-path="functions.html"><a href="functions.html#what-are-functions"><i class="fa fa-check"></i><b>6.1</b> What are Functions?</a></li>
<li class="chapter" data-level="6.2" data-path="functions.html"><a href="functions.html#how-to-use-functions"><i class="fa fa-check"></i><b>6.2</b> How to Use Functions</a></li>
<li class="chapter" data-level="6.3" data-path="functions.html"><a href="functions.html#built-in-r-functions"><i class="fa fa-check"></i><b>6.3</b> Built-in R Functions</a></li>
<li class="chapter" data-level="6.4" data-path="functions.html"><a href="functions.html#loading-functions"><i class="fa fa-check"></i><b>6.4</b> Loading Functions</a></li>
<li class="chapter" data-level="6.5" data-path="functions.html"><a href="functions.html#writing-functions"><i class="fa fa-check"></i><b>6.5</b> Writing Functions</a></li>
<li class="chapter" data-level="6.6" data-path="functions.html"><a href="functions.html#conditional-statements"><i class="fa fa-check"></i><b>6.6</b> Conditional Statements</a></li>
<li class="chapter" data-level="" data-path="functions.html"><a href="functions.html#resources-5"><i class="fa fa-check"></i>Resources</a></li>
</ul></li>
<li class="chapter" data-level="7" data-path="vectors.html"><a href="vectors.html"><i class="fa fa-check"></i><b>7</b> Vectors</a><ul>
<li class="chapter" data-level="7.1" data-path="vectors.html"><a href="vectors.html#what-is-a-vector"><i class="fa fa-check"></i><b>7.1</b> What is a Vector?</a></li>
<li class="chapter" data-level="7.2" data-path="vectors.html"><a href="vectors.html#creating-vectors"><i class="fa fa-check"></i><b>7.2</b> Creating Vectors</a></li>
<li class="chapter" data-level="7.3" data-path="vectors.html"><a href="vectors.html#vector-indices"><i class="fa fa-check"></i><b>7.3</b> Vector Indices</a><ul>
<li class="chapter" data-level="7.3.1" data-path="vectors.html"><a href="vectors.html#simple-numeric-indices"><i class="fa fa-check"></i><b>7.3.1</b> Simple Numeric Indices</a></li>
<li class="chapter" data-level="7.3.2" data-path="vectors.html"><a href="vectors.html#multiple-indices"><i class="fa fa-check"></i><b>7.3.2</b> Multiple Indices</a></li>
<li class="chapter" data-level="7.3.3" data-path="vectors.html"><a href="vectors.html#logical-indexing"><i class="fa fa-check"></i><b>7.3.3</b> Logical Indexing</a></li>
<li class="chapter" data-level="7.3.4" data-path="vectors.html"><a href="vectors.html#named-vectors-and-character-indexing"><i class="fa fa-check"></i><b>7.3.4</b> Named Vectors and Character Indexing</a></li>
</ul></li>
<li class="chapter" data-level="7.4" data-path="vectors.html"><a href="vectors.html#modifying-vectors"><i class="fa fa-check"></i><b>7.4</b> Modifying Vectors</a></li>
<li class="chapter" data-level="7.5" data-path="vectors.html"><a href="vectors.html#vectorized-operations"><i class="fa fa-check"></i><b>7.5</b> Vectorized Operations</a><ul>
<li class="chapter" data-level="7.5.1" data-path="vectors.html"><a href="vectors.html#vectorized-operators"><i class="fa fa-check"></i><b>7.5.1</b> Vectorized Operators</a></li>
<li class="chapter" data-level="7.5.2" data-path="vectors.html"><a href="vectors.html#vectorized-functions"><i class="fa fa-check"></i><b>7.5.2</b> Vectorized Functions</a></li>
<li class="chapter" data-level="7.5.3" data-path="vectors.html"><a href="vectors.html#recycling"><i class="fa fa-check"></i><b>7.5.3</b> Recycling</a></li>
<li class="chapter" data-level="7.5.4" data-path="vectors.html"><a href="vectors.html#r-is-a-vectorized-world"><i class="fa fa-check"></i><b>7.5.4</b> R Is a Vectorized World</a></li>
</ul></li>
<li class="chapter" data-level="" data-path="vectors.html"><a href="vectors.html#resources-6"><i class="fa fa-check"></i>Resources</a></li>
</ul></li>
<li class="chapter" data-level="8" data-path="lists.html"><a href="lists.html"><i class="fa fa-check"></i><b>8</b> Lists</a><ul>
<li class="chapter" data-level="8.1" data-path="lists.html"><a href="lists.html#what-is-a-list"><i class="fa fa-check"></i><b>8.1</b> What is a List?</a></li>
<li class="chapter" data-level="8.2" data-path="lists.html"><a href="lists.html#creating-lists"><i class="fa fa-check"></i><b>8.2</b> Creating Lists</a></li>
<li class="chapter" data-level="8.3" data-path="lists.html"><a href="lists.html#accessing-list-elements"><i class="fa fa-check"></i><b>8.3</b> Accessing List Elements</a><ul>
<li class="chapter" data-level="8.3.1" data-path="lists.html"><a href="lists.html#lists-indexing-by-position"><i class="fa fa-check"></i><b>8.3.1</b> Indexing by position</a></li>
<li class="chapter" data-level="8.3.2" data-path="lists.html"><a href="lists.html#indexing-by-name"><i class="fa fa-check"></i><b>8.3.2</b> Indexing by Name</a></li>
<li class="chapter" data-level="8.3.3" data-path="lists.html"><a href="lists.html#indexing-by-logical-vector"><i class="fa fa-check"></i><b>8.3.3</b> Indexing by Logical Vector</a></li>
<li class="chapter" data-level="8.3.4" data-path="lists.html"><a href="lists.html#lists-dollar-shortcut"><i class="fa fa-check"></i><b>8.3.4</b> Extracting named elements with <code>$</code></a></li>
<li class="chapter" data-level="8.3.5" data-path="lists.html"><a href="lists.html#single-vs.double-brackets-vs.dollar"><i class="fa fa-check"></i><b>8.3.5</b> Single vs. Double Brackets vs. Dollar</a></li>
</ul></li>
<li class="chapter" data-level="8.4" data-path="lists.html"><a href="lists.html#modifying-lists"><i class="fa fa-check"></i><b>8.4</b> Modifying Lists</a></li>
<li class="chapter" data-level="8.5" data-path="lists.html"><a href="lists.html#the-lapply-function"><i class="fa fa-check"></i><b>8.5</b> The <code>lapply()</code> Function</a></li>
<li class="chapter" data-level="" data-path="lists.html"><a href="lists.html#resources-7"><i class="fa fa-check"></i>Resources</a></li>
</ul></li>
<li class="chapter" data-level="9" data-path="data-frames.html"><a href="data-frames.html"><i class="fa fa-check"></i><b>9</b> Data Frames</a><ul>
<li class="chapter" data-level="9.1" data-path="data-frames.html"><a href="data-frames.html#what-is-a-data-frame"><i class="fa fa-check"></i><b>9.1</b> What is a Data Frame?</a><ul>
<li class="chapter" data-level="9.1.1" data-path="data-frames.html"><a href="data-frames.html#creating-data-frames"><i class="fa fa-check"></i><b>9.1.1</b> Creating Data Frames</a></li>
<li class="chapter" data-level="9.1.2" data-path="data-frames.html"><a href="data-frames.html#describing-structure-of-data-frames"><i class="fa fa-check"></i><b>9.1.2</b> Describing Structure of Data Frames</a></li>
<li class="chapter" data-level="9.1.3" data-path="data-frames.html"><a href="data-frames.html#accessing-data-in-data-frames"><i class="fa fa-check"></i><b>9.1.3</b> Accessing Data in Data Frames</a></li>
</ul></li>
<li class="chapter" data-level="9.2" data-path="data-frames.html"><a href="data-frames.html#csv-files"><i class="fa fa-check"></i><b>9.2</b> Working with CSV Data</a><ul>
<li class="chapter" data-level="9.2.1" data-path="data-frames.html"><a href="data-frames.html#working-directory"><i class="fa fa-check"></i><b>9.2.1</b> Working Directory</a></li>
</ul></li>
<li class="chapter" data-level="9.3" data-path="data-frames.html"><a href="data-frames.html#factors"><i class="fa fa-check"></i><b>9.3</b> Factor Variables</a></li>
<li class="chapter" data-level="" data-path="data-frames.html"><a href="data-frames.html#resources-8"><i class="fa fa-check"></i>Resources</a></li>
</ul></li>
<li class="chapter" data-level="10" data-path="dplyr.html"><a href="dplyr.html"><i class="fa fa-check"></i><b>10</b> The <code>dplyr</code> Library</a><ul>
<li class="chapter" data-level="10.1" data-path="dplyr.html"><a href="dplyr.html#a-grammar-of-data-manipulation"><i class="fa fa-check"></i><b>10.1</b> A Grammar of Data Manipulation</a></li>
<li class="chapter" data-level="10.2" data-path="dplyr.html"><a href="dplyr.html#using-dplyr-functions"><i class="fa fa-check"></i><b>10.2</b> Using <code>dplyr</code> Functions</a><ul>
<li class="chapter" data-level="10.2.1" data-path="dplyr.html"><a href="dplyr.html#select"><i class="fa fa-check"></i><b>10.2.1</b> Select</a></li>
<li class="chapter" data-level="10.2.2" data-path="dplyr.html"><a href="dplyr.html#filter"><i class="fa fa-check"></i><b>10.2.2</b> Filter</a></li>
<li class="chapter" data-level="10.2.3" data-path="dplyr.html"><a href="dplyr.html#mutate"><i class="fa fa-check"></i><b>10.2.3</b> Mutate</a></li>
<li class="chapter" data-level="10.2.4" data-path="dplyr.html"><a href="dplyr.html#arrange"><i class="fa fa-check"></i><b>10.2.4</b> Arrange</a></li>
<li class="chapter" data-level="10.2.5" data-path="dplyr.html"><a href="dplyr.html#summarize"><i class="fa fa-check"></i><b>10.2.5</b> Summarize</a></li>
<li class="chapter" data-level="10.2.6" data-path="dplyr.html"><a href="dplyr.html#distinct"><i class="fa fa-check"></i><b>10.2.6</b> Distinct</a></li>
</ul></li>
<li class="chapter" data-level="10.3" data-path="dplyr.html"><a href="dplyr.html#multiple-operations"><i class="fa fa-check"></i><b>10.3</b> Multiple Operations</a><ul>
<li class="chapter" data-level="10.3.1" data-path="dplyr.html"><a href="dplyr.html#the-pipe-operator"><i class="fa fa-check"></i><b>10.3.1</b> The Pipe Operator</a></li>
</ul></li>
<li class="chapter" data-level="10.4" data-path="dplyr.html"><a href="dplyr.html#grouped-operations"><i class="fa fa-check"></i><b>10.4</b> Grouped Operations</a></li>
<li class="chapter" data-level="10.5" data-path="dplyr.html"><a href="dplyr.html#joins"><i class="fa fa-check"></i><b>10.5</b> Joins</a></li>
<li class="chapter" data-level="10.6" data-path="dplyr.html"><a href="dplyr.html#non-standard-evaluation-vs.standard-evaluation"><i class="fa fa-check"></i><b>10.6</b> Non-Standard Evaluation vs. Standard Evaluation</a><ul>
<li class="chapter" data-level="10.6.1" data-path="dplyr.html"><a href="dplyr.html#explicit-standard-evaluation"><i class="fa fa-check"></i><b>10.6.1</b> Explicit Standard Evaluation</a></li>
</ul></li>
<li class="chapter" data-level="" data-path="dplyr.html"><a href="dplyr.html#resources-9"><i class="fa fa-check"></i>Resources</a></li>
</ul></li>
<li class="chapter" data-level="11" data-path="apis.html"><a href="apis.html"><i class="fa fa-check"></i><b>11</b> Accessing Web APIs</a><ul>
<li class="chapter" data-level="11.1" data-path="apis.html"><a href="apis.html#what-is-a-web-api"><i class="fa fa-check"></i><b>11.1</b> What is a Web API?</a></li>
<li class="chapter" data-level="11.2" data-path="apis.html"><a href="apis.html#restful-requests"><i class="fa fa-check"></i><b>11.2</b> RESTful Requests</a><ul>
<li class="chapter" data-level="11.2.1" data-path="apis.html"><a href="apis.html#uris"><i class="fa fa-check"></i><b>11.2.1</b> URIs</a></li>
<li class="chapter" data-level="11.2.2" data-path="apis.html"><a href="apis.html#http-verbs"><i class="fa fa-check"></i><b>11.2.2</b> HTTP Verbs</a></li>
</ul></li>
<li class="chapter" data-level="11.3" data-path="apis.html"><a href="apis.html#accessing-web-apis"><i class="fa fa-check"></i><b>11.3</b> Accessing Web APIs</a></li>
<li class="chapter" data-level="11.4" data-path="apis.html"><a href="apis.html#json"><i class="fa fa-check"></i><b>11.4</b> JSON Data</a><ul>
<li class="chapter" data-level="11.4.1" data-path="apis.html"><a href="apis.html#parsing-json"><i class="fa fa-check"></i><b>11.4.1</b> Parsing JSON</a></li>
<li class="chapter" data-level="11.4.2" data-path="apis.html"><a href="apis.html#flattening-data"><i class="fa fa-check"></i><b>11.4.2</b> Flattening Data</a></li>
</ul></li>
<li class="chapter" data-level="" data-path="apis.html"><a href="apis.html#resources-10"><i class="fa fa-check"></i>Resources</a></li>
</ul></li>
<li class="chapter" data-level="12" data-path="r-markdown.html"><a href="r-markdown.html"><i class="fa fa-check"></i><b>12</b> R Markdown</a><ul>
<li class="chapter" data-level="12.1" data-path="r-markdown.html"><a href="r-markdown.html#r-markdown-and-rstudio"><i class="fa fa-check"></i><b>12.1</b> R Markdown and RStudio</a><ul>
<li class="chapter" data-level="12.1.1" data-path="r-markdown.html"><a href="r-markdown.html#creating-.rmd-files"><i class="fa fa-check"></i><b>12.1.1</b> Creating <code>.Rmd</code> Files</a></li>
<li class="chapter" data-level="12.1.2" data-path="r-markdown.html"><a href="r-markdown.html#rmd-content"><i class="fa fa-check"></i><b>12.1.2</b> <code>.Rmd</code> Content</a></li>
<li class="chapter" data-level="12.1.3" data-path="r-markdown.html"><a href="r-markdown.html#knitting-documents"><i class="fa fa-check"></i><b>12.1.3</b> Knitting Documents</a></li>
<li class="chapter" data-level="12.1.4" data-path="r-markdown.html"><a href="r-markdown.html#html"><i class="fa fa-check"></i><b>12.1.4</b> HTML</a></li>
</ul></li>
<li class="chapter" data-level="12.2" data-path="r-markdown.html"><a href="r-markdown.html#r-markdown-syntax"><i class="fa fa-check"></i><b>12.2</b> R Markdown Syntax</a><ul>
<li class="chapter" data-level="12.2.1" data-path="r-markdown.html"><a href="r-markdown.html#r-code-chunks"><i class="fa fa-check"></i><b>12.2.1</b> R Code Chunks</a></li>
<li class="chapter" data-level="12.2.2" data-path="r-markdown.html"><a href="r-markdown.html#inline-code"><i class="fa fa-check"></i><b>12.2.2</b> Inline Code</a></li>
</ul></li>
<li class="chapter" data-level="12.3" data-path="r-markdown.html"><a href="r-markdown.html#rendering-data"><i class="fa fa-check"></i><b>12.3</b> Rendering Data</a><ul>
<li class="chapter" data-level="12.3.1" data-path="r-markdown.html"><a href="r-markdown.html#rendering-strings"><i class="fa fa-check"></i><b>12.3.1</b> Rendering Strings</a></li>
<li class="chapter" data-level="12.3.2" data-path="r-markdown.html"><a href="r-markdown.html#rendering-lists"><i class="fa fa-check"></i><b>12.3.2</b> Rendering Lists</a></li>
<li class="chapter" data-level="12.3.3" data-path="r-markdown.html"><a href="r-markdown.html#rendering-tables"><i class="fa fa-check"></i><b>12.3.3</b> Rendering Tables</a></li>
</ul></li>
<li class="chapter" data-level="" data-path="r-markdown.html"><a href="r-markdown.html#resources-11"><i class="fa fa-check"></i>Resources</a></li>
</ul></li>
<li class="chapter" data-level="13" data-path="ggplot2.html"><a href="ggplot2.html"><i class="fa fa-check"></i><b>13</b> The <code>ggplot2</code> Library</a><ul>
<li class="chapter" data-level="13.1" data-path="ggplot2.html"><a href="ggplot2.html#a-grammar-of-graphics"><i class="fa fa-check"></i><b>13.1</b> A Grammar of Graphics</a></li>
<li class="chapter" data-level="13.2" data-path="ggplot2.html"><a href="ggplot2.html#basic-plotting-with-ggplot2"><i class="fa fa-check"></i><b>13.2</b> Basic Plotting with <code>ggplot2</code></a><ul>
<li class="chapter" data-level="13.2.1" data-path="ggplot2.html"><a href="ggplot2.html#ggplot2-library"><i class="fa fa-check"></i><b>13.2.1</b> <em>ggplot2</em> library</a></li>
<li class="chapter" data-level="13.2.2" data-path="ggplot2.html"><a href="ggplot2.html#mpg-data"><i class="fa fa-check"></i><b>13.2.2</b> <em>mpg</em> data</a></li>
<li class="chapter" data-level="13.2.3" data-path="ggplot2.html"><a href="ggplot2.html#our-first-ggplot"><i class="fa fa-check"></i><b>13.2.3</b> Our first ggplot</a></li>
<li class="chapter" data-level="13.2.4" data-path="ggplot2.html"><a href="ggplot2.html#aesthetic-mappings"><i class="fa fa-check"></i><b>13.2.4</b> Aesthetic Mappings</a></li>
</ul></li>
<li class="chapter" data-level="13.3" data-path="ggplot2.html"><a href="ggplot2.html#complex-plots"><i class="fa fa-check"></i><b>13.3</b> Complex Plots</a><ul>
<li class="chapter" data-level="13.3.1" data-path="ggplot2.html"><a href="ggplot2.html#specifying-geometry"><i class="fa fa-check"></i><b>13.3.1</b> Specifying Geometry</a></li>
<li class="chapter" data-level="13.3.2" data-path="ggplot2.html"><a href="ggplot2.html#styling-with-scales"><i class="fa fa-check"></i><b>13.3.2</b> Styling with Scales</a></li>
<li class="chapter" data-level="13.3.3" data-path="ggplot2.html"><a href="ggplot2.html#coordinate-systems"><i class="fa fa-check"></i><b>13.3.3</b> Coordinate Systems</a></li>
<li class="chapter" data-level="13.3.4" data-path="ggplot2.html"><a href="ggplot2.html#facets"><i class="fa fa-check"></i><b>13.3.4</b> Facets</a></li>
<li class="chapter" data-level="13.3.5" data-path="ggplot2.html"><a href="ggplot2.html#labels-annotations"><i class="fa fa-check"></i><b>13.3.5</b> Labels & Annotations</a></li>
</ul></li>
<li class="chapter" data-level="13.4" data-path="ggplot2.html"><a href="ggplot2.html#plotting-in-scripts"><i class="fa fa-check"></i><b>13.4</b> Plotting in Scripts</a></li>
<li class="chapter" data-level="13.5" data-path="ggplot2.html"><a href="ggplot2.html#other-visualization-libraries"><i class="fa fa-check"></i><b>13.5</b> Other Visualization Libraries</a></li>
<li class="chapter" data-level="" data-path="ggplot2.html"><a href="ggplot2.html#resources-12"><i class="fa fa-check"></i>Resources</a></li>
</ul></li>
<li class="chapter" data-level="14" data-path="git-branches.html"><a href="git-branches.html"><i class="fa fa-check"></i><b>14</b> Git Branches</a><ul>
<li class="chapter" data-level="14.1" data-path="git-branches.html"><a href="git-branches.html#git-branches-1"><i class="fa fa-check"></i><b>14.1</b> Git Branches</a></li>
<li class="chapter" data-level="14.2" data-path="git-branches.html"><a href="git-branches.html#merging"><i class="fa fa-check"></i><b>14.2</b> Merging</a><ul>
<li class="chapter" data-level="14.2.1" data-path="git-branches.html"><a href="git-branches.html#merge-conflicts"><i class="fa fa-check"></i><b>14.2.1</b> Merge Conflicts</a></li>
</ul></li>
<li class="chapter" data-level="14.3" data-path="git-branches.html"><a href="git-branches.html#undoing-changes"><i class="fa fa-check"></i><b>14.3</b> Undoing Changes</a></li>
<li class="chapter" data-level="14.4" data-path="git-branches.html"><a href="git-branches.html#github-and-branches"><i class="fa fa-check"></i><b>14.4</b> GitHub and Branches</a><ul>
<li class="chapter" data-level="14.4.1" data-path="git-branches.html"><a href="git-branches.html#github-pages"><i class="fa fa-check"></i><b>14.4.1</b> GitHub Pages</a></li>
</ul></li>
<li class="chapter" data-level="" data-path="git-branches.html"><a href="git-branches.html#resources-13"><i class="fa fa-check"></i>Resources</a></li>
</ul></li>
<li class="chapter" data-level="15" data-path="git-collaboration.html"><a href="git-collaboration.html"><i class="fa fa-check"></i><b>15</b> Git Collaboration</a><ul>
<li class="chapter" data-level="15.1" data-path="git-collaboration.html"><a href="git-collaboration.html#centralized-workflow"><i class="fa fa-check"></i><b>15.1</b> Centralized Workflow</a></li>
<li class="chapter" data-level="15.2" data-path="git-collaboration.html"><a href="git-collaboration.html#feature-branch-workflow"><i class="fa fa-check"></i><b>15.2</b> Feature Branch Workflow</a></li>
<li class="chapter" data-level="15.3" data-path="git-collaboration.html"><a href="git-collaboration.html#forking-workflow"><i class="fa fa-check"></i><b>15.3</b> Forking Workflow</a><ul>
<li class="chapter" data-level="15.3.1" data-path="git-collaboration.html"><a href="git-collaboration.html#pull-requests"><i class="fa fa-check"></i><b>15.3.1</b> Pull Requests</a></li>
</ul></li>
<li class="chapter" data-level="" data-path="git-collaboration.html"><a href="git-collaboration.html#resources-14"><i class="fa fa-check"></i>Resources</a></li>
</ul></li>
<li class="chapter" data-level="16" data-path="shiny.html"><a href="shiny.html"><i class="fa fa-check"></i><b>16</b> The <code>shiny</code> Framework</a><ul>
<li class="chapter" data-level="16.1" data-path="shiny.html"><a href="shiny.html#creating-shiny-apps"><i class="fa fa-check"></i><b>16.1</b> Creating Shiny Apps</a><ul>
<li class="chapter" data-level="16.1.1" data-path="shiny.html"><a href="shiny.html#application-structure"><i class="fa fa-check"></i><b>16.1.1</b> Application Structure</a></li>
<li class="chapter" data-level="16.1.2" data-path="shiny.html"><a href="shiny.html#the-ui"><i class="fa fa-check"></i><b>16.1.2</b> The UI</a></li>
<li class="chapter" data-level="16.1.3" data-path="shiny.html"><a href="shiny.html#the-server"><i class="fa fa-check"></i><b>16.1.3</b> The Server</a></li>
</ul></li>
<li class="chapter" data-level="16.2" data-path="shiny.html"><a href="shiny.html#publishing-shiny-apps"><i class="fa fa-check"></i><b>16.2</b> Publishing Shiny Apps</a></li>
<li class="chapter" data-level="" data-path="shiny.html"><a href="shiny.html#resources-15"><i class="fa fa-check"></i>Resources</a></li>
</ul></li>
<li class="appendix"><span><b>Special Topics</b></span></li>
<li class="chapter" data-level="A" data-path="plotly.html"><a href="plotly.html"><i class="fa fa-check"></i><b>A</b> Plotly</a><ul>
<li class="chapter" data-level="A.1" data-path="plotly.html"><a href="plotly.html#getting-started"><i class="fa fa-check"></i><b>A.1</b> Getting Started</a></li>
<li class="chapter" data-level="A.2" data-path="plotly.html"><a href="plotly.html#basic-charts"><i class="fa fa-check"></i><b>A.2</b> Basic Charts</a></li>
<li class="chapter" data-level="A.3" data-path="plotly.html"><a href="plotly.html#layout"><i class="fa fa-check"></i><b>A.3</b> Layout</a></li>
<li class="chapter" data-level="A.4" data-path="plotly.html"><a href="plotly.html#hovers"><i class="fa fa-check"></i><b>A.4</b> Hovers</a></li>
<li class="chapter" data-level="" data-path="plotly.html"><a href="plotly.html#resources-16"><i class="fa fa-check"></i>Resources</a></li>
</ul></li>
<li class="chapter" data-level="B" data-path="control-structures.html"><a href="control-structures.html"><i class="fa fa-check"></i><b>B</b> R Language Control Structures</a><ul>
<li class="chapter" data-level="B.1" data-path="control-structures.html"><a href="control-structures.html#loops"><i class="fa fa-check"></i><b>B.1</b> Loops</a><ul>
<li class="chapter" data-level="B.1.1" data-path="control-structures.html"><a href="control-structures.html#for-loop"><i class="fa fa-check"></i><b>B.1.1</b> For Loop</a></li>
<li class="chapter" data-level="B.1.2" data-path="control-structures.html"><a href="control-structures.html#while-loop"><i class="fa fa-check"></i><b>B.1.2</b> While-loop</a></li>
<li class="chapter" data-level="B.1.3" data-path="control-structures.html"><a href="control-structures.html#repeat-loop"><i class="fa fa-check"></i><b>B.1.3</b> repeat-loop</a></li>
<li class="chapter" data-level="B.1.4" data-path="control-structures.html"><a href="control-structures.html#leaving-early-break-and-next"><i class="fa fa-check"></i><b>B.1.4</b> Leaving Early: <code>break</code> and <code>next</code></a></li>
<li class="chapter" data-level="B.1.5" data-path="control-structures.html"><a href="control-structures.html#when-not-to-use-loops-in-r"><i class="fa fa-check"></i><b>B.1.5</b> When (Not) To Use Loops In R?</a></li>
</ul></li>
<li class="chapter" data-level="B.2" data-path="control-structures.html"><a href="control-structures.html#more-about-if-and-else"><i class="fa fa-check"></i><b>B.2</b> More about <code>if</code> and <code>else</code></a><ul>
<li class="chapter" data-level="B.2.1" data-path="control-structures.html"><a href="control-structures.html#where-to-put-else"><i class="fa fa-check"></i><b>B.2.1</b> Where To Put Else</a></li>
<li class="chapter" data-level="B.2.2" data-path="control-structures.html"><a href="control-structures.html#return-value"><i class="fa fa-check"></i><b>B.2.2</b> Return Value</a></li>
</ul></li>
<li class="chapter" data-level="B.3" data-path="control-structures.html"><a href="control-structures.html#switch-choosing-between-multiple-conditions"><i class="fa fa-check"></i><b>B.3</b> <code>switch</code>: Choosing Between Multiple Conditions</a></li>
</ul></li>
<li class="chapter" data-level="C" data-path="data-tables.html"><a href="data-tables.html"><i class="fa fa-check"></i><b>C</b> Thinking Big: Data Tables</a><ul>
<li class="chapter" data-level="C.1" data-path="data-tables.html"><a href="data-tables.html#background-passing-by-value-and-passing-by-reference"><i class="fa fa-check"></i><b>C.1</b> Background: Passing By Value And Passing By Reference</a></li>
<li class="chapter" data-level="C.2" data-path="data-tables.html"><a href="data-tables.html#data-tables-introduction"><i class="fa fa-check"></i><b>C.2</b> Data Tables: Introduction</a><ul>
<li class="chapter" data-level="C.2.1" data-path="data-tables.html"><a href="data-tables.html#replacement-for-data-frames-sort-of"><i class="fa fa-check"></i><b>C.2.1</b> Replacement for Data Frames (Sort of)</a></li>
<li class="chapter" data-level="C.2.2" data-path="data-tables.html"><a href="data-tables.html#fast-reading-and-writing"><i class="fa fa-check"></i><b>C.2.2</b> Fast Reading and Writing</a></li>
</ul></li>
<li class="chapter" data-level="C.3" data-path="data-tables.html"><a href="data-tables.html#indexing-the-major-powerhorse-of-data-tables"><i class="fa fa-check"></i><b>C.3</b> Indexing: The Major Powerhorse of Data Tables</a><ul>
<li class="chapter" data-level="C.3.1" data-path="data-tables.html"><a href="data-tables.html#i-select-observations"><i class="fa fa-check"></i><b>C.3.1</b> i: Select Observations</a></li>
<li class="chapter" data-level="C.3.2" data-path="data-tables.html"><a href="data-tables.html#j-work-with-columns"><i class="fa fa-check"></i><b>C.3.2</b> j: Work with Columns</a></li>
<li class="chapter" data-level="C.3.3" data-path="data-tables.html"><a href="data-tables.html#group-in-by"><i class="fa fa-check"></i><b>C.3.3</b> Group in <code>by</code></a></li>
</ul></li>
<li class="chapter" data-level="C.4" data-path="data-tables.html"><a href="data-tables.html#create-variables-by-reference"><i class="fa fa-check"></i><b>C.4</b> <code>:=</code>–Create variables by reference</a></li>
<li class="chapter" data-level="C.5" data-path="data-tables.html"><a href="data-tables.html#keys"><i class="fa fa-check"></i><b>C.5</b> keys</a></li>
<li class="chapter" data-level="C.6" data-path="data-tables.html"><a href="data-tables.html#resources-17"><i class="fa fa-check"></i><b>C.6</b> Resources</a></li>
</ul></li>
<li class="chapter" data-level="D" data-path="remote-server.html"><a href="remote-server.html"><i class="fa fa-check"></i><b>D</b> Using Remote Server</a><ul>
<li class="chapter" data-level="D.1" data-path="remote-server.html"><a href="remote-server.html#server-setup"><i class="fa fa-check"></i><b>D.1</b> Server Setup</a></li>
<li class="chapter" data-level="D.2" data-path="remote-server.html"><a href="remote-server.html#connecting-to-the-remote-server"><i class="fa fa-check"></i><b>D.2</b> Connecting to the Remote Server</a></li>
<li class="chapter" data-level="D.3" data-path="remote-server.html"><a href="remote-server.html#copying-files"><i class="fa fa-check"></i><b>D.3</b> Copying Files</a><ul>
<li class="chapter" data-level="D.3.1" data-path="remote-server.html"><a href="remote-server.html#scp"><i class="fa fa-check"></i><b>D.3.1</b> scp</a></li>
<li class="chapter" data-level="D.3.2" data-path="remote-server.html"><a href="remote-server.html#rsync"><i class="fa fa-check"></i><b>D.3.2</b> rsync</a></li>
<li class="chapter" data-level="D.3.3" data-path="remote-server.html"><a href="remote-server.html#graphical-frontends"><i class="fa fa-check"></i><b>D.3.3</b> Graphical Frontends</a></li>
<li class="chapter" data-level="D.3.4" data-path="remote-server.html"><a href="remote-server.html#remote-editing"><i class="fa fa-check"></i><b>D.3.4</b> Remote Editing</a></li>
</ul></li>
<li class="chapter" data-level="D.4" data-path="remote-server.html"><a href="remote-server.html#r-and-rscript"><i class="fa fa-check"></i><b>D.4</b> R and Rscript</a><ul>
<li class="chapter" data-level="D.4.1" data-path="remote-server.html"><a href="remote-server.html#graphics-output-with-no-gui"><i class="fa fa-check"></i><b>D.4.1</b> Graphics Output with No GUI</a></li>
</ul></li>
<li class="chapter" data-level="D.5" data-path="remote-server.html"><a href="remote-server.html#life-on-server"><i class="fa fa-check"></i><b>D.5</b> Life on Server</a><ul>
<li class="chapter" data-level="D.5.1" data-path="remote-server.html"><a href="remote-server.html#be-social"><i class="fa fa-check"></i><b>D.5.1</b> Be Social!</a></li>
<li class="chapter" data-level="D.5.2" data-path="remote-server.html"><a href="remote-server.html#useful-things-to-do"><i class="fa fa-check"></i><b>D.5.2</b> Useful Things to Do</a></li>
<li class="chapter" data-level="D.5.3" data-path="remote-server.html"><a href="remote-server.html#permissions-and-ownership"><i class="fa fa-check"></i><b>D.5.3</b> Permissions and ownership</a></li>
<li class="chapter" data-level="D.5.4" data-path="remote-server.html"><a href="remote-server.html#more-than-one-connection"><i class="fa fa-check"></i><b>D.5.4</b> More than One Connection</a></li>
</ul></li>
<li class="chapter" data-level="D.6" data-path="remote-server.html"><a href="remote-server.html#advanced-usage"><i class="fa fa-check"></i><b>D.6</b> Advanced Usage</a><ul>
<li class="chapter" data-level="D.6.1" data-path="remote-server.html"><a href="remote-server.html#ssh-keys-.sshconfig"><i class="fa fa-check"></i><b>D.6.1</b> ssh keys, .ssh/config</a></li>
<li class="chapter" data-level="D.6.2" data-path="remote-server.html"><a href="remote-server.html#more-about-command-line-pipes-and-shell-patterns"><i class="fa fa-check"></i><b>D.6.2</b> More about command line: pipes and shell patterns</a></li>
<li class="chapter" data-level="D.6.3" data-path="remote-server.html"><a href="remote-server.html#running-rscript-in-ssh-session"><i class="fa fa-check"></i><b>D.6.3</b> Running RScript in ssh Session</a></li>
</ul></li>
</ul></li>
<li class="divider"></li>
<li><a href="https://github.com/rstudio/bookdown" target="blank">Published with bookdown</a></li>
</ul>
</nav>
</div>
<div class="book-body">
<div class="body-inner">
<div class="book-header" role="navigation">
<h1>
<i class="fa fa-circle-o-notch fa-spin"></i><a href="./">Technical Foundations of Informatics</a>
</h1>
</div>
<div class="page-wrapper" tabindex="-1" role="main">
<div class="page-inner">
<section class="normal" id="section-">
<div id="data-tables" class="section level1">
<h1><span class="header-section-number">C</span> Thinking Big: Data Tables</h1>
<p>Data frames are core elements for data handling in R. However, they
suffer from several limitations. One of the major issues with data
frames is that they are memory hungry and slow. This is not a problem
when working with relatively small datasets (say, up to 100,000
rows). However, when your dataset size exceeds gigabytes, data frames
may be infeasibly slow and too memory hungry.</p>
<div id="background-passing-by-value-and-passing-by-reference" class="section level2">
<h2><span class="header-section-number">C.1</span> Background: Passing By Value And Passing By Reference</h2>
<p>R is (mostly) a pass-by-value language. This means that when you
modify the data, at every step a new copy of the complete modified
object is created and stored in memory, and the former object is freed
(garbage-collected) if it is no longer in use.</p>
<p>The main advantage of this approach is consistency: we have the
guarantee that functions do not modify their inputs. However, for
large objects copying may be slow and, more importantly, it requires
at least twice as much memory until the old object is freed. In
more complex processing pipelines, the memory consumption may be even more
than twice the size of the original object.</p>
<p>Data tables implement a number of pass-by-reference functions. In
pass-by-reference, the function is not given a fresh copy of the
inputs, but is instead told where the object is in memory. Instead of
copying gigabytes of data, only a single tiny memory pointer is
passed. But this also means the function now is accessing and
modifying the original object, not a copy of it. This may sometimes
lead to bugs and unexpected behavior, but careful use of the
pass-by-reference approach may improve the speed and lower the memory
footprint substantially.</p>
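<p>The difference can be made concrete with a small sketch. The function and variable names below (<code>add_col_df</code>, <code>add_col_dt</code>, <code>df</code>, <code>tab</code>) are made up for illustration, and the example relies on the <code>:=</code> operator that is covered later in this chapter:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(data.table)

# Pass-by-value: the function modifies its own private copy of the data frame
add_col_df <- function(d) {
  d$z <- 0          # triggers a copy; the caller's object is untouched
  invisible(NULL)
}
df <- data.frame(x = 1:3)
add_col_df(df)
names(df)           # still only "x"

# Pass-by-reference: := adds the column to the very object the caller holds
add_col_dt <- function(d) {
  d[, z := 0]       # no copy; modifies the original data table
  invisible(NULL)
}
tab <- data.table(x = 1:3)
add_col_dt(tab)
names(tab)          # now "x" and "z"</code></pre></div>
<p>Neither function returns anything useful, yet the second one changes <code>tab</code>. This is exactly the kind of side effect that saves memory on large data and causes surprises when it is forgotten.</p>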
</div>
<div id="data-tables-introduction" class="section level2">
<h2><span class="header-section-number">C.2</span> Data Tables: Introduction</h2>
<p>Data tables and most of the related goodies live in the <em>data.table</em>
library, so you either have to load the library or specify the
namespace when using the functions.</p>
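<p>As a minimal sketch, the two options look like this (the object names <code>dt1</code> and <code>dt2</code> are placeholders chosen for this example):</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Option 1: attach the library, then call its functions directly
library(data.table)
dt1 <- data.table(a = 1:3)

# Option 2: use the namespace prefix; this works without attaching the library
dt2 <- data.table::data.table(a = 1:3)

# Both objects are data tables (and, by inheritance, data frames)
class(dt1)
class(dt2)</code></pre></div>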
<div id="replacement-for-data-frames-sort-of" class="section level3">
<h3><span class="header-section-number">C.2.1</span> Replacement for Data Frames (Sort of)</h3>
<p>Data tables are designed to be largely a replacement for data frames.
The syntax is similar and the two are largely interchangeable. For instance,
we can create and play with a data table as follows:</p>
<div class="sourceCode" id="cb283"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb283-1" data-line-number="1"><span class="kw">library</span>(data.table)</a>
<a class="sourceLine" id="cb283-2" data-line-number="2">dt <-<span class="st"> </span><span class="kw">data.table</span>(<span class="dt">id=</span><span class="dv">1</span><span class="op">:</span><span class="dv">5</span>, <span class="dt">x=</span><span class="kw">rnorm</span>(<span class="dv">5</span>), <span class="dt">y=</span><span class="kw">runif</span>(<span class="dv">5</span>))</a>
<a class="sourceLine" id="cb283-3" data-line-number="3">dt</a></code></pre></div>
<pre><code>## id x y
## 1: 1 -0.1759081 0.3587206
## 2: 2 -1.0397919 0.7036122
## 3: 3 1.3415712 0.6285581
## 4: 4 -0.7327195 0.6126032
## 5: 5 -1.7042843 0.7579853</code></pre>
<p>The result looks almost identical to a similar data frame (the only
difference is the colons after the row numbers). Behind the scenes
these objects are almost identical too: both are lists of
vectors. This structural similarity allows us to use data tables
as drop-in replacements for data frames, at least in some
circumstances. For instance, we can extract variables with <code>$</code>:</p>
<div class="sourceCode" id="cb285"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb285-1" data-line-number="1">dt<span class="op">$</span>x</a></code></pre></div>
<pre><code>## [1] -0.1759081 -1.0397919 1.3415712 -0.7327195 -1.7042843</code></pre>
<p>or rows with row indices:</p>
<div class="sourceCode" id="cb287"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb287-1" data-line-number="1">dt[<span class="kw">c</span>(<span class="dv">2</span>,<span class="dv">4</span>),]</a></code></pre></div>
<pre><code>## id x y
## 1: 2 -1.0397919 0.7036122
## 2: 4 -0.7327195 0.6126032</code></pre>
<p>However, data tables use unquoted variable names (like <em>dplyr</em>) by
default:</p>
<div class="sourceCode" id="cb289"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb289-1" data-line-number="1">dt[,x]</a></code></pre></div>
<pre><code>## [1] -0.1759081 -1.0397919 1.3415712 -0.7327195 -1.7042843</code></pre>
<p>If the variable name is stored in another variable, we
have to use the additional argument <code>with</code>:</p>
<div class="sourceCode" id="cb291"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb291-1" data-line-number="1">var <-<span class="st"> "x"</span></a>
<a class="sourceLine" id="cb291-2" data-line-number="2">dt[, var, with=<span class="ot">FALSE</span>]</a></code></pre></div>
<pre><code>## x
## 1: -0.1759081
## 2: -1.0397919
## 3: 1.3415712
## 4: -0.7327195
## 5: -1.7042843</code></pre>
<p>Note also that instead of getting a vector, we now get a data.table
with a single column “x”. This behavior is the main reason why
replacing data frames with data tables may require changing quite a
bit of code.</p>
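<p>The sketch below puts the alternatives side by side on a small made-up table. Besides <code>with=FALSE</code> it uses two other ways to refer to a column by a stored name: the <code>..</code> prefix of <em>data.table</em> and the ordinary list-style double brackets. The names <code>small</code>, <code>as_table</code>, <code>as_table2</code> and <code>as_vector</code> are chosen for this example only:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(data.table)

small <- data.table(id = 1:3, x = c(0.5, 1.5, 2.5))
var <- "x"

# Both of these return a one-column data table
as_table  <- small[, var, with = FALSE]
as_table2 <- small[, ..var]    # ".." means: look up var outside the table

# Double brackets extract the column itself, i.e. a plain vector
as_vector <- small[[var]]

class(as_table)
class(as_vector)</code></pre></div>
<p>Which form to prefer depends on what the surrounding code expects: a vector or a table.</p>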
</div>
<div id="fast-reading-and-writing" class="section level3">
<h3><span class="header-section-number">C.2.2</span> Fast Reading and Writing</h3>
<p>Many data frame users may appreciate the fact that the data
input-output functions <code>fread</code> and <code>fwrite</code> run at least an order of magnitude
faster on large files. These are largely replacements for <code>read.table</code>
and <code>write.table</code>; however, their syntax differs noticeably in places.
In particular, <code>fread</code> accepts either a file name, an http URL, or a <em>shell command
that prints output</em>; it automatically detects the column separator,
but it does not automatically open compressed files. The latter is not a big
deal on Unix, where one can just issue</p>
<div class="sourceCode" id="cb293"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb293-1" data-line-number="1">data <-<span class="st"> </span><span class="kw">fread</span>(<span class="st">"bzcat data.csv.bz2"</span>)</a></code></pre></div>
<p>However, decompression is not that simple on Windows, and hence it
is hard to write platform-independent code that opens compressed
files.<a href="#fn2" class="footnote-ref" id="fnref2"><sup>2</sup></a></p>
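<p>For plain (uncompressed) files, a round trip with <code>fwrite</code> and <code>fread</code> can be sketched as follows. The table <code>out</code> and the temporary file are invented for this example:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(data.table)

# A small made-up table and a temporary file to hold it
out <- data.table(id = 1:3, name = c("a", "b", "c"))
tmp <- tempfile(fileext = ".csv")

fwrite(out, tmp)     # writes a comma-separated file with a header line
back <- fread(tmp)   # separator and column types are detected automatically

all.equal(out, back) # compare what was written with what was read back
unlink(tmp)          # clean up the temporary file</code></pre></div>
<p>Note that <code>fread</code> returns a data table, not a data frame.</p>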
<p>If your computer has enough memory and speed is not an issue, your
interest in data tables may end here. You can just transform a data
table into a data frame with <code>setDF</code> (and the other way around with <code>setDT</code>). Let’s transform our data table into a data
frame:</p>
<div class="sourceCode" id="cb294"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb294-1" data-line-number="1"><span class="kw">setDF</span>(dt)</a>
<a class="sourceLine" id="cb294-2" data-line-number="2">dt</a></code></pre></div>
<pre><code>## id x y
## 1 1 -0.1759081 0.3587206
## 2 2 -1.0397919 0.7036122
## 3 3 1.3415712 0.6285581
## 4 4 -0.7327195 0.6126032
## 5 5 -1.7042843 0.7579853</code></pre>
<p>Do you see that the colons after row names are gone? This means <code>dt</code>
now is a data frame.</p>
<p>Note that this function behaves very differently from what we have
learned earlier: it modifies the object <em>in place</em> (by reference). We
do not have to assign the result to a new variable using a construct
like <code>df <- setDF(dt)</code> (although we still can, which is handy when
using magrittr pipes). This is a manifestation of the power of
data tables: the object is not copied; instead, the same object is modified
in memory. <code>setDF</code> and <code>setDT</code> are very efficient: even huge
tables are converted instantly with virtually no need for any
additional memory.</p>
<p>However, great power comes hand in hand with great responsibility:
it is easy to forget that <code>setDF</code> modifies its argument.</p>
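<p>A short sketch of how this can bite: the helper <code>peek</code> below is hypothetical, written only to show that a conversion done inside a function leaks out to the caller. Passing <code>copy()</code> of the table (a <em>data.table</em> function that makes an explicit copy) is one way to protect the original:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(data.table)

# Hypothetical helper that converts its argument "just for a moment"
peek <- function(d) {
  setDF(d)           # modifies the caller's object, not a local copy
  nrow(d)
}

tab <- data.table(a = 1:3)
peek(tab)
class(tab)           # "data.frame": the original was converted too

# Handing over an explicit copy leaves the original untouched
tab2 <- data.table(a = 1:3)
peek(copy(tab2))
class(tab2)          # still "data.table" "data.frame"</code></pre></div>
<p>The copy costs memory, of course, so it only makes sense when the original really must stay as it is.</p>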
</div>
</div>
<div id="indexing-the-major-powerhorse-of-data-tables" class="section level2">
<h2><span class="header-section-number">C.3</span> Indexing: The Major Powerhorse of Data Tables</h2>
<p>Data tables’ indexing is much more powerful than that of data frames.
The single-bracket indexing is a powerful (albeit confusing) set of
functions. Its general syntax is as follows:</p>
<div class="sourceCode" id="cb296"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb296-1" data-line-number="1">dt[i, j, by]</a></code></pre></div>
<p>where <code>i</code> specifies what to do with rows (for instance, select certain
rows), <code>j</code> tells what to do with columns (such as select columns,
compute new columns, aggregate columns), and <code>by</code> contains the
grouping variables.</p>
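<p>Before turning to real data, here is a toy sketch that uses all three arguments at once. The table <code>sales</code> and its values are made up for illustration:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(data.table)

sales <- data.table(shop   = c("a", "a", "b", "b"),
                    year   = c(2020, 2021, 2020, 2021),
                    amount = c(10, 20, 5, 15))

# i:  keep only the rows for 2021
# j:  compute the sum of amount, named total
# by: do this separately for each shop
res <- sales[year == 2021, .(total = sum(amount)), by = shop]
res</code></pre></div>
<p>The result has one row per shop, with the columns <code>shop</code> and <code>total</code>.</p>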
<p>Let’s demonstrate this with the <em>flights</em> data from the <em>nycflights13</em>
package. We load the data and transform it into a data.table:</p>
<div class="sourceCode" id="cb297"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb297-1" data-line-number="1"><span class="kw">data</span>(flights, <span class="dt">package=</span><span class="st">"nycflights13"</span>)</a>
<a class="sourceLine" id="cb297-2" data-line-number="2"><span class="kw">setDT</span>(flights)</a>
<a class="sourceLine" id="cb297-3" data-line-number="3"><span class="kw">head</span>(flights)</a></code></pre></div>
<pre><code>## year month day dep_time sched_dep_time dep_delay arr_time
## 1: 2013 1 1 517 515 2 830
## 2: 2013 1 1 533 529 4 850
## 3: 2013 1 1 542 540 2 923
## 4: 2013 1 1 544 545 -1 1004
## 5: 2013 1 1 554 600 -6 812
## 6: 2013 1 1 554 558 -4 740
## sched_arr_time arr_delay carrier flight tailnum origin dest air_time
## 1: 819 11 UA 1545 N14228 EWR IAH 227
## 2: 830 20 UA 1714 N24211 LGA IAH 227
## 3: 850 33 AA 1141 N619AA JFK MIA 160
## 4: 1022 -18 B6 725 N804JB JFK BQN 183
## 5: 837 -25 DL 461 N668DN LGA ATL 116
## 6: 728 12 UA 1696 N39463 EWR ORD 150
## distance hour minute time_hour
## 1: 1400 5 15 2013-01-01 05:00:00
## 2: 1416 5 29 2013-01-01 05:00:00
## 3: 1089 5 40 2013-01-01 05:00:00
## 4: 1576 5 45 2013-01-01 05:00:00
## 5: 762 6 0 2013-01-01 06:00:00
## 6: 719 5 58 2013-01-01 05:00:00</code></pre>
<div id="i-select-observations" class="section level3">
<h3><span class="header-section-number">C.3.1</span> i: Select Observations</h3>
<p>Obviously, we can always just tell which observations we want:</p>
<div class="sourceCode" id="cb299"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb299-1" data-line-number="1">flights[<span class="kw">c</span>(<span class="dv">1</span><span class="op">:</span><span class="dv">3</span>),]</a></code></pre></div>
<pre><code>## year month day dep_time sched_dep_time dep_delay arr_time
## 1: 2013 1 1 517 515 2 830
## 2: 2013 1 1 533 529 4 850
## 3: 2013 1 1 542 540 2 923
## sched_arr_time arr_delay carrier flight tailnum origin dest air_time
## 1: 819 11 UA 1545 N14228 EWR IAH 227
## 2: 830 20 UA 1714 N24211 LGA IAH 227
## 3: 850 33 AA 1141 N619AA JFK MIA 160
## distance hour minute time_hour
## 1: 1400 5 15 2013-01-01 05:00:00
## 2: 1416 5 29 2013-01-01 05:00:00
## 3: 1089 5 40 2013-01-01 05:00:00</code></pre>
<p>This picks the first three rows from the data. Perhaps more interestingly,
we can use the special variable <code>.N</code> (the number of rows) to get the
penultimate row:</p>
<div class="sourceCode" id="cb301"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb301-1" data-line-number="1">flights[.N<span class="dv">-1</span>,]</a></code></pre></div>
<pre><code>## year month day dep_time sched_dep_time dep_delay arr_time
## 1: 2013 9 30 NA 1159 NA NA
## sched_arr_time arr_delay carrier flight tailnum origin dest air_time
## 1: 1344 NA MQ 3572 N511MQ LGA CLE NA
## distance hour minute time_hour
## 1: 419 11 59 2013-09-30 11:00:00</code></pre>
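<p>A few more uses of <code>.N</code>, sketched on a small made-up table (<code>letters_dt</code> is a name invented for this example):</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(data.table)

letters_dt <- data.table(pos = 1:5, letter = c("a", "b", "c", "d", "e"))

last_row <- letters_dt[.N]            # the last row
last_two <- letters_dt[(.N - 1):.N]   # the last two rows
n_rows   <- letters_dt[, .N]          # in j, .N gives the number of rows

last_row
last_two
n_rows</code></pre></div>
<p>The parentheses around <code>.N - 1</code> matter: without them, <code>:</code> would be evaluated before the subtraction.</p>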
<p>We can select observations with a logical index vector in the same way as in data frames:</p>
<div class="sourceCode" id="cb303"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb303-1" data-line-number="1"><span class="kw">head</span>(flights[origin <span class="op">==</span><span class="st"> "EWR"</span> <span class="op">&</span><span class="st"> </span>dest <span class="op">==</span><span class="st"> "SEA"</span>,], <span class="dv">3</span>)</a></code></pre></div>
<pre><code>## year month day dep_time sched_dep_time dep_delay arr_time
## 1: 2013 1 1 724 725 -1 1020
## 2: 2013 1 1 857 851 6 1157
## 3: 2013 1 1 1418 1419 -1 1726
## sched_arr_time arr_delay carrier flight tailnum origin dest air_time
## 1: 1030 -10 AS 11 N594AS EWR SEA 338
## 2: 1222 -25 UA 1670 N45440 EWR SEA 343
## 3: 1732 -6 UA 16 N37464 EWR SEA 348
## distance hour minute time_hour
## 1: 2402 7 25 2013-01-01 07:00:00
## 2: 2402 8 51 2013-01-01 08:00:00
## 3: 2402 14 19 2013-01-01 14:00:00</code></pre>
<p>This creates a new data table including only flights from Newark to
Seattle. However, note that we just use <code>origin</code>, and not
<code>flights$origin</code> as was the case with data frames. Data tables
evaluate the arguments as if inside the <code>with</code> function.</p>
<p>The first, integer indexing corresponds to dplyr’s <code>slice</code> function
while the other one is equivalent to <code>filter</code>.</p>
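<p>As a minimal sketch of the difference, the following compares the two
styles on a tiny made-up table. The names <code>df</code>, <code>dt</code> and the
values in them are illustrative assumptions, not part of the flights data:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(data.table)

# Hypothetical toy data, used only for illustration
df <- data.frame(origin = c("EWR", "LGA", "JFK", "EWR"),
                 dest = c("SEA", "IAH", "SEA", "SEA"),
                 stringsAsFactors = FALSE)
dt <- as.data.table(df)

# Data frame: columns must be referred to through the data frame
from_df <- df[df$origin == "EWR" & df$dest == "SEA", ]

# Data table: column names are looked up inside the table itself
from_dt <- dt[origin == "EWR" & dest == "SEA"]

# Integer indexing picks rows by position
first_two <- dt[1:2]</code></pre></div>
<p>Both logical subsets contain the same two rows; only the way the columns are referenced differs.</p>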
</div>
<div id="j-work-with-columns" class="section level3">
<h3><span class="header-section-number">C.3.2</span> j: Work with Columns</h3>
<p><code>j</code> is perhaps the most powerful (and most confusing) of all arguments
for data table indexing. It allows us both to select columns and to perform more complex
tasks. Let’s start with selection:</p>
<div class="sourceCode" id="cb305"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb305-1" data-line-number="1"><span class="kw">head</span>(flights[, dest], <span class="dv">3</span>)</a></code></pre></div>
<pre><code>## [1] "IAH" "IAH" "MIA"</code></pre>
<p>selects only the <code>dest</code> variable from the data. Note that this results in
a vector, not in a single-variable data table. If you want a data table
instead, you can do</p>
<div class="sourceCode" id="cb307"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb307-1" data-line-number="1"><span class="kw">head</span>(flights[, .(dest)], <span class="dv">3</span>)</a></code></pre></div>
<pre><code>## dest
## 1: IAH
## 2: IAH
## 3: MIA</code></pre>
<p><code>.()</code> is just an alias for <code>list()</code> that data tables
provide to improve readability and make it easier to type. If we want to select
more than one variable, we can use the same syntax:</p>
<div class="sourceCode" id="cb309"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb309-1" data-line-number="1"><span class="kw">head</span>(flights[, .(origin, dest)], <span class="dv">3</span>)</a></code></pre></div>
<pre><code>## origin dest
## 1: EWR IAH
## 2: LGA IAH
## 3: JFK MIA</code></pre>
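<p>Selection supports a number of goodies, such as ranges of variables
with <code>:</code> (for instance, <code>dep_time:arr_delay</code>) and excluding variables
with <code>!</code> or <code>-</code> applied to a range or to a character vector of names
(for instance, <code>!(year:day)</code> or <code>-c("year", "month")</code>). Note that a bare
<code>-year</code> does not drop the column: it is evaluated as an expression and returns the negated values.</p>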
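<p>A small sketch of range selection and exclusion, run on a made-up table
(the table <code>dt</code> and its values are assumptions for illustration only):</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(data.table)

# Hypothetical toy data with a few of the flights column names
dt <- data.table(year = 2013, month = 1:3, day = 1:3,
                 origin = c("EWR", "LGA", "JFK"),
                 dest = c("IAH", "IAH", "MIA"))

# Range of adjacent columns, from month up to and including origin
range_cols <- names(dt[, month:origin])

# Everything except the range year..day
without_range <- names(dt[, !(year:day)])

# Everything except the columns named in a character vector
without_named <- names(dt[, -c("year", "month")])</code></pre></div>
<p>Each call returns a new data table; the original <code>dt</code> keeps all five columns.</p>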
<p>Obviously we can combine both <code>i</code> and <code>j</code>: let’s select origin and
departure delay for flights to Seattle:</p>
<div class="sourceCode" id="cb311"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb311-1" data-line-number="1"><span class="kw">head</span>(flights[dest <span class="op">==</span><span class="st"> "SEA"</span>, .(origin, dep_delay)], <span class="dv">3</span>)</a></code></pre></div>
<pre><code>## origin dep_delay
## 1: EWR -1
## 2: JFK 13
## 3: EWR 6</code></pre>
<p>The examples so far broadly correspond to dplyr’s <code>select</code>.</p>
<p>But <code>j</code> is not just for selecting. It is also for computing. Let’s
find the mean arrival delay for flights to Seattle:</p>
<div class="sourceCode" id="cb313"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb313-1" data-line-number="1">flights[dest <span class="op">==</span><span class="st"> "SEA"</span>, <span class="kw">mean</span>(arr_delay, <span class="dt">na.rm=</span><span class="ot">TRUE</span>)]</a></code></pre></div>
<pre><code>## [1] -1.099099</code></pre>
<p>Several values can be returned by wrapping them in <code>.()</code>,
optionally with names. For instance, let’s find the average departure and arrival
delay for all flights to Seattle that were delayed on
arrival, and name these <code>dep</code> and <code>arr</code>:</p>
<div class="sourceCode" id="cb315"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb315-1" data-line-number="1">flights[dest <span class="op">==</span><span class="st"> "SEA"</span> <span class="op">&</span><span class="st"> </span>arr_delay <span class="op">></span><span class="st"> </span><span class="dv">0</span>,</a>
<a class="sourceLine" id="cb315-2" data-line-number="2"> .(<span class="dt">dep =</span> <span class="kw">mean</span>(dep_delay, <span class="dt">na.rm=</span><span class="ot">TRUE</span>), <span class="dt">arr =</span> <span class="kw">mean</span>(arr_delay, <span class="dt">na.rm=</span><span class="ot">TRUE</span>))]</a></code></pre></div>
<pre><code>## dep arr
## 1: 33.98266 39.79984</code></pre>
<p>The result is a data table with two variables.</p>
<p>We can use the special variable <code>.N</code> to count the rows:</p>
<div class="sourceCode" id="cb317"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb317-1" data-line-number="1">flights[dest <span class="op">==</span><span class="st"> "SEA"</span> <span class="op">&</span><span class="st"> </span>arr_delay <span class="op">></span><span class="st"> </span><span class="dv">0</span>, .N]</a></code></pre></div>
<pre><code>## [1] 1269</code></pre>
<p>will tell us how many flights to Seattle were delayed at arrival.</p>
<p>Handling the case where the variable names are stored in other
variables is not that hard, but it still adds a layer of complexity. We
can specify the variables in the <code>.SDcols</code> parameter. This
parameter determines which columns go into the <code>.SD</code> (Subset of Data)
special variable. Afterwards we use an <code>lapply</code> expression in <code>j</code>:</p>
<div class="sourceCode" id="cb319"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb319-1" data-line-number="1">flights[dest <span class="op">==</span><span class="st"> "SEA"</span> <span class="op">&</span><span class="st"> </span>arr_delay <span class="op">></span><span class="st"> </span><span class="dv">0</span>,</a>
<a class="sourceLine" id="cb319-2" data-line-number="2"> <span class="kw">lapply</span>(.SD, <span class="cf">function</span>(x) <span class="kw">mean</span>(x, <span class="dt">na.rm=</span><span class="ot">TRUE</span>)),</a>
<a class="sourceLine" id="cb319-3" data-line-number="3"> .SDcols =<span class="st"> </span><span class="kw">c</span>(<span class="st">"arr_delay"</span>, <span class="st">"dep_delay"</span>)]</a></code></pre></div>
<pre><code>## arr_delay dep_delay
## 1: 39.79984 33.98266</code></pre>
<p>Let’s repeat: <code>.SDcols</code> determines which variables will go into the
special <code>.SD</code> list (default: all). <code>lapply</code> in <code>j</code> computes mean
values of each of the variables in the <code>.SD</code> list. This procedure
feels complex, although it is internally optimized.</p>
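<p>To make the “names stored in a variable” case explicit, here is a minimal
sketch on a made-up table. The table <code>dt</code> and the variable
<code>delay_cols</code> are hypothetical names introduced for this illustration:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(data.table)

# Hypothetical toy data
dt <- data.table(dest = c("SEA", "SEA", "IAH"),
                 arr_delay = c(10, 30, 5),
                 dep_delay = c(0, 20, 50))

# Column names held in an ordinary character vector
delay_cols <- c("arr_delay", "dep_delay")

# .SDcols takes the vector; lapply runs over the chosen columns in .SD
means <- dt[dest == "SEA", lapply(.SD, mean, na.rm = TRUE), .SDcols = delay_cols]

# For plain selection, the .. prefix looks the variable up outside the table
selected <- dt[, ..delay_cols]</code></pre></div>
<p>Here <code>means</code> is a one-row data table with one column per name in <code>delay_cols</code>.</p>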
<p>These examples correspond to dplyr’s <code>summarize</code> verb. One can
argue, however, that data tables’ syntax is more confusing and harder
to read. Note also that although the functionality data tables offer here is
optimized for speed and memory efficiency, it still returns a new
object. Aggregation does not work by reference.</p>
</div>
<div id="group-in-by" class="section level3">
<h3><span class="header-section-number">C.3.3</span> Group in <code>by</code></h3>
<p>Finally, all of the above can be computed by groups using <code>by</code>. Let’s
compute the average delays above by carrier and origin:</p>
<div class="sourceCode" id="cb321"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb321-1" data-line-number="1">flights[dest <span class="op">==</span><span class="st"> "SEA"</span> <span class="op">&</span><span class="st"> </span>arr_delay <span class="op">></span><span class="st"> </span><span class="dv">0</span>,</a>
<a class="sourceLine" id="cb321-2" data-line-number="2"> .(<span class="dt">dep =</span> <span class="kw">mean</span>(dep_delay, <span class="dt">na.rm=</span><span class="ot">TRUE</span>), <span class="dt">arr =</span> <span class="kw">mean</span>(arr_delay, <span class="dt">na.rm=</span><span class="ot">TRUE</span>)),</a>
<a class="sourceLine" id="cb321-3" data-line-number="3"> by =<span class="st"> </span>.(carrier, origin)]</a></code></pre></div>
<pre><code>## carrier origin dep arr
## 1: DL JFK 29.82373 39.49831
## 2: B6 JFK 28.49767 40.32093
## 3: UA EWR 40.24053 41.85078
## 4: AS EWR 31.80952 34.36508
## 5: AA JFK 34.04132 40.48760</code></pre>
<p>We just had to specify the <code>by</code> argument that lists the grouping
variables. If there is more than one, these should be wrapped in a list with
<code>.()</code>.</p>
<p>We can use the <code>.N</code> variable to get the group size. How many flights
did each carrier operate from each origin?</p>
<div class="sourceCode" id="cb323"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb323-1" data-line-number="1">flights[, .N, by=.(carrier, origin)] <span class="op">%>%</span></a>
<a class="sourceLine" id="cb323-2" data-line-number="2"><span class="st"> </span><span class="kw">head</span>(<span class="dv">3</span>)</a></code></pre></div>
<pre><code>## carrier origin N
## 1: UA EWR 46087
## 2: UA LGA 8044
## 3: AA JFK 13783</code></pre>
<p>Finally, we can also use quoted variable names for grouping, just by
replacing <code>.()</code> with <code>c()</code>:</p>
<div class="sourceCode" id="cb325"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb325-1" data-line-number="1">flights[, .N, by=<span class="kw">c</span>(<span class="st">"carrier"</span>, <span class="st">"origin"</span>)] <span class="op">%>%</span></a>
<a class="sourceLine" id="cb325-2" data-line-number="2"><span class="st"> </span><span class="kw">head</span>(<span class="dv">3</span>)</a></code></pre></div>
<pre><code>## carrier origin N
## 1: UA EWR 46087
## 2: UA LGA 8044
## 3: AA JFK 13783</code></pre>
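<p>Because <code>by</code> accepts a character vector, the grouping columns can
also be kept in a variable. A minimal sketch on made-up data, where
<code>dt</code> and <code>group_cols</code> are hypothetical names:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(data.table)

# Hypothetical toy data
dt <- data.table(carrier = c("UA", "UA", "AA"),
                 origin = c("EWR", "EWR", "JFK"))

# Grouping column names stored in a character vector
group_cols <- c("carrier", "origin")

# .N counts the rows in each carrier-origin group
counts <- dt[, .N, by = group_cols]</code></pre></div>
<p>The result has one row per group, in order of first appearance in the data.</p>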
<p>In dplyr terms, the examples here correspond to the <code>group_by</code> and <code>summarize</code> verbs.</p>
<p>Read more about basic data.table usage in the vignette <a href="https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html">Data analysis using data.table</a>.</p>
</div>
</div>
<div id="create-variables-by-reference" class="section level2">
<h2><span class="header-section-number">C.4</span> <code>:=</code>–Create variables by reference</h2>
<p>When we summarize by computing values in <code>j</code>, the result is always a
new data.table. Reducing operations cannot be done in place,
but computing new variables can.</p>
<p>In-place variable computations (without summarizing) can be done with the
<code>:=</code> assignment operator in <code>j</code>. Let’s compute a new
variable, speed, for each flight. We can do this as follows:</p>
<div class="sourceCode" id="cb327"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb327-1" data-line-number="1">flights[, speed <span class="op">:</span><span class="er">=</span><span class="st"> </span>distance<span class="op">/</span>(air_time<span class="op">/</span><span class="dv">60</span>)]</a>
<a class="sourceLine" id="cb327-2" data-line-number="2">flights <span class="op">%>%</span><span class="st"> </span><span class="kw">head</span>(<span class="dv">3</span>)</a></code></pre></div>
<pre><code>## year month day dep_time sched_dep_time dep_delay arr_time
## 1: 2013 1 1 517 515 2 830
## 2: 2013 1 1 533 529 4 850
## 3: 2013 1 1 542 540 2 923
## sched_arr_time arr_delay carrier flight tailnum origin dest air_time
## 1: 819 11 UA 1545 N14228 EWR IAH 227
## 2: 830 20 UA 1714 N24211 LGA IAH 227
## 3: 850 33 AA 1141 N619AA JFK MIA 160
## distance hour minute time_hour speed
## 1: 1400 5 15 2013-01-01 05:00:00 370.0441
## 2: 1416 5 29 2013-01-01 05:00:00 374.2731
## 3: 1089 5 40 2013-01-01 05:00:00 408.3750</code></pre>
<p>We see the new variable, speed, included as the last variable in the
data. Note we did this operation <em>by reference</em>, i.e. we did not
assign the result to a new data table. The existing table was
modified in place.</p>
<p>The same assignment operator also permits us to remove variables by
setting these to <code>NULL</code>. Let’s remove speed:</p>
<div class="sourceCode" id="cb329"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb329-1" data-line-number="1">flights[, speed <span class="op">:</span><span class="er">=</span><span class="st"> </span><span class="ot">NULL</span>]</a>
<a class="sourceLine" id="cb329-2" data-line-number="2">flights <span class="op">%>%</span><span class="st"> </span><span class="kw">head</span>(<span class="dv">3</span>)</a></code></pre></div>
<pre><code>## year month day dep_time sched_dep_time dep_delay arr_time
## 1: 2013 1 1 517 515 2 830
## 2: 2013 1 1 533 529 4 850
## 3: 2013 1 1 542 540 2 923
## sched_arr_time arr_delay carrier flight tailnum origin dest air_time
## 1: 819 11 UA 1545 N14228 EWR IAH 227
## 2: 830 20 UA 1714 N24211 LGA IAH 227
## 3: 850 33 AA 1141 N619AA JFK MIA 160
## distance hour minute time_hour
## 1: 1400 5 15 2013-01-01 05:00:00
## 2: 1416 5 29 2013-01-01 05:00:00
## 3: 1089 5 40 2013-01-01 05:00:00</code></pre>
<p>Indeed, there is no speed any more.</p>
<p>Assigning more than one variable by reference may feel somewhat more
intimidating:</p>
<div class="sourceCode" id="cb331"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb331-1" data-line-number="1">flights[, <span class="kw">c</span>(<span class="st">"speed"</span>, <span class="st">"meanDelay"</span>) <span class="op">:</span><span class="er">=</span><span class="st"> </span>.(distance<span class="op">/</span>(air_time<span class="op">/</span><span class="dv">60</span>), (arr_delay <span class="op">+</span><span class="st"> </span>dep_delay)<span class="op">/</span><span class="dv">2</span>)]</a>
<a class="sourceLine" id="cb331-2" data-line-number="2">flights <span class="op">%>%</span><span class="st"> </span><span class="kw">head</span>(<span class="dv">3</span>)</a></code></pre></div>
<pre><code>## year month day dep_time sched_dep_time dep_delay arr_time
## 1: 2013 1 1 517 515 2 830
## 2: 2013 1 1 533 529 4 850
## 3: 2013 1 1 542 540 2 923
## sched_arr_time arr_delay carrier flight tailnum origin dest air_time
## 1: 819 11 UA 1545 N14228 EWR IAH 227
## 2: 830 20 UA 1714 N24211 LGA IAH 227
## 3: 850 33 AA 1141 N619AA JFK MIA 160
## distance hour minute time_hour speed meanDelay
## 1: 1400 5 15 2013-01-01 05:00:00 370.0441 6.5
## 2: 1416 5 29 2013-01-01 05:00:00 374.2731 12.0
## 3: 1089 5 40 2013-01-01 05:00:00 408.3750 17.5</code></pre>
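<p>data.table also offers a functional form of <code>:=</code> that some find easier
to read, because each new name sits next to its expression. A minimal sketch
on a made-up two-row table (the table <code>dt</code> is an assumption for illustration):</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(data.table)

# Hypothetical toy data with the columns used above
dt <- data.table(distance = c(1400, 1089), air_time = c(227, 160),
                 arr_delay = c(11, 33), dep_delay = c(2, 2))

# Functional form: `:=` is called like a function with named arguments
dt[, `:=`(speed = distance / (air_time / 60),
          meanDelay = (arr_delay + dep_delay) / 2)]</code></pre></div>
<p>Like the character-vector form, this modifies <code>dt</code> in place and adds both columns at once.</p>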
<p>Assignment works together with both selection and grouping. For
instance, we may want to replace negative delays with zeros:</p>
<div class="sourceCode" id="cb333"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb333-1" data-line-number="1">flights[ arr_delay <span class="op"><</span><span class="st"> </span><span class="dv">0</span>, arr_delay <span class="op">:</span><span class="er">=</span><span class="st"> </span><span class="dv">0</span>][, arr_delay] <span class="op">%>%</span></a>
<a class="sourceLine" id="cb333-2" data-line-number="2"><span class="st"> </span><span class="kw">head</span>(<span class="dv">20</span>)</a></code></pre></div>
<pre><code>## [1] 11 20 33 0 0 12 19 0 0 8 0 0 7 0 31 0 0 0 12 0</code></pre>
<p>Indeed, we only see positive numbers and zeros. But be careful: now
we have overwritten the <code>arr_delay</code> in the original data. We cannot
restore the previous state any more without re-loading the dataset.</p>
<p>As an example of
groupings, let’s compute the maximum departure delay by origin:</p>
<div class="sourceCode" id="cb335"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb335-1" data-line-number="1">flights[, maxDelay <span class="op">:</span><span class="er">=</span><span class="st"> </span><span class="kw">max</span>(dep_delay, <span class="dt">na.rm=</span><span class="ot">TRUE</span>), by=origin] <span class="op">%>%</span></a>
<a class="sourceLine" id="cb335-2" data-line-number="2"><span class="st"> </span><span class="kw">head</span>(<span class="dv">4</span>)</a></code></pre></div>
<pre><code>## year month day dep_time sched_dep_time dep_delay arr_time
## 1: 2013 1 1 517 515 2 830
## 2: 2013 1 1 533 529 4 850
## 3: 2013 1 1 542 540 2 923
## 4: 2013 1 1 544 545 -1 1004
## sched_arr_time arr_delay carrier flight tailnum origin dest air_time
## 1: 819 11 UA 1545 N14228 EWR IAH 227
## 2: 830 20 UA 1714 N24211 LGA IAH 227
## 3: 850 33 AA 1141 N619AA JFK MIA 160
## 4: 1022 0 B6 725 N804JB JFK BQN 183
## distance hour minute time_hour speed meanDelay maxDelay
## 1: 1400 5 15 2013-01-01 05:00:00 370.0441 6.5 1126
## 2: 1416 5 29 2013-01-01 05:00:00 374.2731 12.0 911
## 3: 1089 5 40 2013-01-01 05:00:00 408.3750 17.5 1301
## 4: 1576 5 45 2013-01-01 05:00:00 516.7213 -9.5 1301</code></pre>
<p>We can see that <code>by</code> caused the maximum delay to be computed for each group.
However, the data is not summarized; the group’s maximum delay is simply added to
every row.</p>
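<p>The contrast between summarizing by group and assigning by group can be
sketched on a tiny made-up table (<code>dt</code> and its values are illustrative assumptions):</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(data.table)

# Hypothetical toy data
dt <- data.table(origin = c("EWR", "EWR", "JFK"),
                 dep_delay = c(5, 20, 7))

# Summarizing returns a new table with one row per group
summary_dt <- dt[, .(maxDelay = max(dep_delay)), by = origin]

# Assigning with := keeps every row and repeats the group value
dt[, maxDelay := max(dep_delay), by = origin]</code></pre></div>
<p>After these calls <code>summary_dt</code> has two rows, while <code>dt</code> still has three rows plus the new column.</p>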
<p>Finally, if you <em>do not</em> want to modify the original data, you should
use the <code>copy</code> function. This makes a deep copy of the data, and you can
modify the copy afterwards:</p>
<div class="sourceCode" id="cb337"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb337-1" data-line-number="1">fl <-<span class="st"> </span><span class="kw">copy</span>(flights)</a>
<a class="sourceLine" id="cb337-2" data-line-number="2">fl <-<span class="st"> </span>fl[, .(origin, dest)]</a>
<a class="sourceLine" id="cb337-3" data-line-number="3"><span class="kw">head</span>(fl, <span class="dv">3</span>)</a></code></pre></div>
<pre><code>## origin dest
## 1: EWR IAH
## 2: LGA IAH
## 3: JFK MIA</code></pre>
<div class="sourceCode" id="cb339"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb339-1" data-line-number="1"><span class="kw">head</span>(flights, <span class="dv">3</span>)</a></code></pre></div>
<pre><code>## year month day dep_time sched_dep_time dep_delay arr_time
## 1: 2013 1 1 517 515 2 830
## 2: 2013 1 1 533 529 4 850
## 3: 2013 1 1 542 540 2 923
## sched_arr_time arr_delay carrier flight tailnum origin dest air_time
## 1: 819 11 UA 1545 N14228 EWR IAH 227
## 2: 830 20 UA 1714 N24211 LGA IAH 227
## 3: 850 33 AA 1141 N619AA JFK MIA 160
## distance hour minute time_hour speed meanDelay maxDelay
## 1: 1400 5 15 2013-01-01 05:00:00 370.0441 6.5 1126
## 2: 1416 5 29 2013-01-01 05:00:00 374.2731 12.0 911
## 3: 1089 5 40 2013-01-01 05:00:00 408.3750 17.5 1301</code></pre>
<p>As you see, the <code>flights</code> data has not changed.</p>
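<p>The need for <code>copy</code> becomes clearer when it is contrasted with plain
assignment, which does not duplicate a data table. A minimal sketch, where
<code>dt</code>, <code>alias</code> and <code>independent</code> are hypothetical names:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(data.table)

dt <- data.table(x = 1:3)

# Plain assignment: both names refer to the same table
alias <- dt
alias[, y := x * 2]      # y also appears in dt

# copy() creates an independent table
independent <- copy(dt)
independent[, z := 0]    # dt does not get z

names(dt)
names(independent)</code></pre></div>
<p>Only the table created with <code>copy</code> can be modified by reference without touching <code>dt</code>.</p>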
<p>These operations correspond to dplyr’s <code>mutate</code> verb. However,
<code>mutate</code> returns a new data frame rather than modifying the original in place, something that
may make your analysis slower and more memory-hungry with large data.</p>
<p>Read more in the vignette <a href="https://cran.r-project.org/web/packages/data.table/vignettes/datatable-reference-semantics.html">Data.table reference semantics</a>.</p>
</div>
<div id="keys" class="section level2">
<h2><span class="header-section-number">C.5</span> Keys</h2>
<p>Data tables allow fast lookup based on a <em>key</em>. In its simplest
version, a key is a column (or several columns) which is used to
pre-sort the data table. Pre-sorting makes it much faster to look up
certain values, perform grouping operations and merges. As data can
only be sorted according to one rule at a time, there can only be one
key per data.table (but a key may be based on several variables).</p>
<p>Let’s set origin and destination as keys for the data table. We keep working
with the <code>flights</code> table as modified above, so negative arrival delays
remain replaced by zeros:</p>
<div class="sourceCode" id="cb344"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb344-1" data-line-number="1"><span class="kw">setDT</span>(flights, <span class="dt">key=</span><span class="kw">c</span>(<span class="st">"origin"</span>, <span class="st">"dest"</span>))</a>
<a class="sourceLine" id="cb344-2" data-line-number="2">fl <-<span class="st"> </span>flights[,.(origin, dest, arr_delay)]</a>
<a class="sourceLine" id="cb344-3" data-line-number="3"> <span class="co"># focus on a few variables only</span></a>
<a class="sourceLine" id="cb344-4" data-line-number="4"><span class="kw">head</span>(fl, <span class="dv">3</span>)</a></code></pre></div>
<pre><code>## origin dest arr_delay
## 1: EWR ALB 0
## 2: EWR ALB 40
## 3: EWR ALB 44</code></pre>
<p>We see that both origin and destination are alphabetically ordered.
Note that when selecting variables, the resulting data table <code>fl</code> will
have the same keys as the original one.</p>
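<p>Whether a table has a key, and which columns form it, can be checked with
<code>haskey</code> and <code>key</code>. A minimal sketch on made-up data, where
<code>dt</code> and <code>fl_small</code> are hypothetical names:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(data.table)

# Hypothetical toy data, deliberately unsorted
dt <- data.table(origin = c("LGA", "EWR", "JFK"),
                 dest = c("ATL", "SEA", "MIA"),
                 arr_delay = c(5, 0, 33))

# setkey sorts the table by reference and records the key columns
setkey(dt, origin, dest)
haskey(dt)
key(dt)

# Selecting columns that include the key; per the text above,
# the result is expected to report the same key
fl_small <- dt[, .(origin, dest, arr_delay)]
key(fl_small)

# A keyed table can be subset by passing a key value in i
lga <- dt["LGA"]</code></pre></div>
<p>After <code>setkey</code> the rows of <code>dt</code> are ordered by origin and then by destination.</p>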
<p>Once the key is set, we can easily subset by key by just feeding the key values
in <code>i</code>:</p>
<div class="sourceCode" id="cb346"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb346-1" data-line-number="1">fl[<span class="st">"LGA"</span>] <span class="op">%>%</span></a>
<a class="sourceLine" id="cb346-2" data-line-number="2"><span class="st"> </span><span class="kw">head</span>(<span class="dv">5</span>)</a></code></pre></div>
<pre><code>## origin dest arr_delay
## 1: LGA ATL 0
## 2: LGA ATL 12
## 3: LGA ATL 5
## 4: LGA ATL 0
## 5: LGA ATL 17</code></pre>
<p>will extract all LaGuardia-originating flights. In terms of output,
this is equivalent to <code>fl[origin == "LGA"]</code>, but the keyed lookup
uses binary search and is typically faster. When you want to
extract flights based on an origin-destination pair, you can just supply
values for both key columns:</p>
<div class="sourceCode" id="cb348"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb348-1" data-line-number="1">fl[.(<span class="st">"EWR"</span>, <span class="st">"SEA"</span>)] <span class="op">%>%</span></a>
<a class="sourceLine" id="cb348-2" data-line-number="2"><span class="st"> </span><span class="kw">head</span>(<span class="dv">4</span>)</a></code></pre></div>
<pre><code>## origin dest arr_delay
## 1: EWR SEA 0
## 2: EWR SEA 0
## 3: EWR SEA 0
## 4: EWR SEA 0</code></pre>
<p>Again, this can be achieved in other ways, but keys are more
efficient. Finally, if we want to extract based on the second key column only,
the syntax is more confusing:</p>
<div class="sourceCode" id="cb350"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb350-1" data-line-number="1">fl[.(<span class="kw">unique</span>(origin), <span class="st">"SEA"</span>)] <span class="op">%>%</span></a>
<a class="sourceLine" id="cb350-2" data-line-number="2"><span class="st"> </span><span class="kw">head</span>(<span class="dv">4</span>)</a></code></pre></div>
<pre><code>## origin dest arr_delay
## 1: EWR SEA 0
## 2: EWR SEA 0
## 3: EWR SEA 0
## 4: EWR SEA 0</code></pre>
<p>We have to tell <code>[</code> that we want to extract all observations
where the first key column takes any of its values and the second one is “SEA”.</p>
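<p>One caveat with this approach is that key combinations without a match
produce a row of <code>NA</code> by default. The sketch below illustrates this on a
made-up table and shows the <code>nomatch = NULL</code> argument, which drops such
rows. The table <code>dt</code> and its values are assumptions for illustration,
and <code>nomatch = NULL</code> assumes a reasonably recent data.table version:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(data.table)

# Hypothetical toy data: LGA has no flight to SEA
dt <- data.table(origin = c("EWR", "EWR", "JFK", "LGA"),
                 dest = c("SEA", "IAH", "SEA", "IAH"),
                 arr_delay = c(0, 11, 13, 20),
                 key = c("origin", "dest"))

# Every origin paired with "SEA"; the missing LGA-SEA pair yields an NA row
with_na <- dt[.(unique(origin), "SEA")]

# nomatch = NULL keeps only the pairs that exist in the data
matched <- dt[.(unique(origin), "SEA"), nomatch = NULL]

# Same rows as an ordinary logical subset on the second key column
by_filter <- dt[dest == "SEA"]</code></pre></div>
<p>Here <code>matched</code> and <code>by_filter</code> hold the same two flights, while <code>with_na</code> has an extra row for the origin without a Seattle flight.</p>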
<p>Read more in the vignette <a href="https://cran.r-project.org/web/packages/data.table/vignettes/datatable-keys-fast-subset.html">Keys and fast binary search based subset</a>.</p>
</div>
<div id="resources-17" class="section level2">
<h2><span class="header-section-number">C.6</span> Resources</h2>
<ul>
<li><a href="https://cran.r-project.org/web/packages/data.table/index.html">Data Table CRAN page</a>.
Vignettes are a very valuable source of information.</li>
</ul>
</div>
</div>
<div class="footnotes">
<hr />
<ol start="2">
<li id="fn2"><p>Automatic decompression is a feature request for data tables<a href="data-tables.html#fnref2" class="footnote-back">↩</a></p></li>
</ol>
</div>
</section>
</div>
</div>
</div>
<a href="control-structures.html" class="navigation navigation-prev " aria-label="Previous page"><i class="fa fa-angle-left"></i></a>
<a href="remote-server.html" class="navigation navigation-next " aria-label="Next page"><i class="fa fa-angle-right"></i></a>
</div>
</div>
<script src="libs/gitbook-2.6.7/js/app.min.js"></script>
<script src="libs/gitbook-2.6.7/js/lunr.js"></script>
<script src="libs/gitbook-2.6.7/js/plugin-search.js"></script>
<script src="libs/gitbook-2.6.7/js/plugin-sharing.js"></script>
<script src="libs/gitbook-2.6.7/js/plugin-fontsettings.js"></script>
<script src="libs/gitbook-2.6.7/js/plugin-bookdown.js"></script>
<script src="libs/gitbook-2.6.7/js/jquery.highlight.js"></script>
<script>
gitbook.require(["gitbook"], function(gitbook) {
gitbook.start({
"sharing": {
"github": true,
"facebook": false,
"twitter": false,
"google": false,
"linkedin": false,
"weibo": false,
"instapaper": false,
"vk": false,
"all": ["github", "facebook", "twitter", "google"]
},
"fontsettings": {
"theme": "white",
"family": "sans",
"size": 2
},
"edit": {
"link": "https://github.com/info201/book/edit/master/data-tables.Rmd",
"text": "Edit"
},
"history": {
"link": null,
"text": null
},
"download": null,