% Options for packages loaded elsewhere
\PassOptionsToPackage{unicode}{hyperref}
\PassOptionsToPackage{hyphens}{url}
\PassOptionsToPackage{dvipsnames,svgnames,x11names}{xcolor}
%
\documentclass[
12pt,
]{style/krantz}
\usepackage{amsmath,amssymb}
\usepackage{lmodern}
\usepackage{iftex}
\ifPDFTeX
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{textcomp} % provide euro and other symbols
\else % if luatex or xetex
\usepackage{unicode-math}
\defaultfontfeatures{Scale=MatchLowercase}
\defaultfontfeatures[\rmfamily]{Ligatures=TeX,Scale=1}
\setmonofont[Scale=0.7]{Source Code Pro}
\fi
% Use upquote if available, for straight quotes in verbatim environments
\IfFileExists{upquote.sty}{\usepackage{upquote}}{}
\IfFileExists{microtype.sty}{% use microtype if available
\usepackage[]{microtype}
\UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts
}{}
\makeatletter
\@ifundefined{KOMAClassName}{% if non-KOMA class
\IfFileExists{parskip.sty}{%
\usepackage{parskip}
}{% else
\setlength{\parindent}{0pt}
\setlength{\parskip}{6pt plus 2pt minus 1pt}}
}{% if KOMA class
\KOMAoptions{parskip=half}}
\makeatother
\usepackage{xcolor}
\IfFileExists{xurl.sty}{\usepackage{xurl}}{} % add URL line breaks if available
\IfFileExists{bookmark.sty}{\usepackage{bookmark}}{\usepackage{hyperref}}
\hypersetup{
pdftitle={Data Handling Pocket Reference},
pdfauthor={Ulrich Matter},
colorlinks=true,
linkcolor={Maroon},
filecolor={Maroon},
citecolor={Blue},
urlcolor={Blue},
pdfcreator={LaTeX via pandoc}}
\urlstyle{same} % disable monospaced font for URLs
\usepackage{color}
\usepackage{fancyvrb}
\newcommand{\VerbBar}{|}
\newcommand{\VERB}{\Verb[commandchars=\\\{\}]}
\DefineVerbatimEnvironment{Highlighting}{Verbatim}{commandchars=\\\{\}}
% Add ',fontsize=\small' for more characters per line
\usepackage{framed}
\definecolor{shadecolor}{RGB}{248,248,248}
\newenvironment{Shaded}{\begin{snugshade}}{\end{snugshade}}
\newcommand{\AlertTok}[1]{\textcolor[rgb]{0.94,0.16,0.16}{#1}}
\newcommand{\AnnotationTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textbf{\textit{#1}}}}
\newcommand{\AttributeTok}[1]{\textcolor[rgb]{0.77,0.63,0.00}{#1}}
\newcommand{\BaseNTok}[1]{\textcolor[rgb]{0.00,0.00,0.81}{#1}}
\newcommand{\BuiltInTok}[1]{#1}
\newcommand{\CharTok}[1]{\textcolor[rgb]{0.31,0.60,0.02}{#1}}
\newcommand{\CommentTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textit{#1}}}
\newcommand{\CommentVarTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textbf{\textit{#1}}}}
\newcommand{\ConstantTok}[1]{\textcolor[rgb]{0.00,0.00,0.00}{#1}}
\newcommand{\ControlFlowTok}[1]{\textcolor[rgb]{0.13,0.29,0.53}{\textbf{#1}}}
\newcommand{\DataTypeTok}[1]{\textcolor[rgb]{0.13,0.29,0.53}{#1}}
\newcommand{\DecValTok}[1]{\textcolor[rgb]{0.00,0.00,0.81}{#1}}
\newcommand{\DocumentationTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textbf{\textit{#1}}}}
\newcommand{\ErrorTok}[1]{\textcolor[rgb]{0.64,0.00,0.00}{\textbf{#1}}}
\newcommand{\ExtensionTok}[1]{#1}
\newcommand{\FloatTok}[1]{\textcolor[rgb]{0.00,0.00,0.81}{#1}}
\newcommand{\FunctionTok}[1]{\textcolor[rgb]{0.00,0.00,0.00}{#1}}
\newcommand{\ImportTok}[1]{#1}
\newcommand{\InformationTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textbf{\textit{#1}}}}
\newcommand{\KeywordTok}[1]{\textcolor[rgb]{0.13,0.29,0.53}{\textbf{#1}}}
\newcommand{\NormalTok}[1]{#1}
\newcommand{\OperatorTok}[1]{\textcolor[rgb]{0.81,0.36,0.00}{\textbf{#1}}}
\newcommand{\OtherTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{#1}}
\newcommand{\PreprocessorTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textit{#1}}}
\newcommand{\RegionMarkerTok}[1]{#1}
\newcommand{\SpecialCharTok}[1]{\textcolor[rgb]{0.00,0.00,0.00}{#1}}
\newcommand{\SpecialStringTok}[1]{\textcolor[rgb]{0.31,0.60,0.02}{#1}}
\newcommand{\StringTok}[1]{\textcolor[rgb]{0.31,0.60,0.02}{#1}}
\newcommand{\VariableTok}[1]{\textcolor[rgb]{0.00,0.00,0.00}{#1}}
\newcommand{\VerbatimStringTok}[1]{\textcolor[rgb]{0.31,0.60,0.02}{#1}}
\newcommand{\WarningTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textbf{\textit{#1}}}}
\usepackage{longtable,booktabs,array}
\usepackage{calc} % for calculating minipage widths
% Correct order of tables after \paragraph or \subparagraph
\usepackage{etoolbox}
\makeatletter
\patchcmd\longtable{\par}{\if@noskipsec\mbox{}\fi\par}{}{}
\makeatother
% Allow footnotes in longtable head/foot
\IfFileExists{footnotehyper.sty}{\usepackage{footnotehyper}}{\usepackage{footnote}}
\makesavenoteenv{longtable}
\usepackage{graphicx}
\makeatletter
\def\maxwidth{\ifdim\Gin@nat@width>\linewidth\linewidth\else\Gin@nat@width\fi}
\def\maxheight{\ifdim\Gin@nat@height>\textheight\textheight\else\Gin@nat@height\fi}
\makeatother
% Scale images if necessary, so that they will not overflow the page
% margins by default, and it is still possible to overwrite the defaults
% using explicit options in \includegraphics[width, height, ...]{}
\setkeys{Gin}{width=\maxwidth,height=\maxheight,keepaspectratio}
% Set default figure placement to htbp
\makeatletter
\def\fps@figure{htbp}
\makeatother
\setlength{\emergencystretch}{3em} % prevent overfull lines
\providecommand{\tightlist}{%
\setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}
\setcounter{secnumdepth}{5}
\usepackage{float}
\usepackage{booktabs}
\usepackage{longtable}
\usepackage[bf,singlelinecheck=off]{caption}
\usepackage{Alegreya}
\usepackage[scale=.7]{sourcecodepro}
\usepackage{framed,color}
\definecolor{shadecolor}{RGB}{248,248,248}
\renewcommand{\textfraction}{0.05}
\renewcommand{\topfraction}{0.8}
\renewcommand{\bottomfraction}{0.8}
\renewcommand{\floatpagefraction}{0.75}
\renewenvironment{quote}{\begin{VF}}{\end{VF}}
\let\oldhref\href
\renewcommand{\href}[2]{#2\footnote{\url{#1}}}
\ifxetex
\usepackage{letltxmacro}
\setlength{\XeTeXLinkMargin}{1pt}
\LetLtxMacro\SavedIncludeGraphics\includegraphics
\def\includegraphics#1#{% #1 catches optional stuff (star/opt. arg.)
\IncludeGraphicsAux{#1}%
}%
\newcommand*{\IncludeGraphicsAux}[2]{%
\XeTeXLinkBox{%
\SavedIncludeGraphics#1{#2}%
}%
}%
\fi
\makeatletter
\newenvironment{kframe}{%
\medskip{}
\setlength{\fboxsep}{.8em}
\def\at@end@of@kframe{}%
\ifinner\ifhmode%
\def\at@end@of@kframe{\end{minipage}}%
\begin{minipage}{\columnwidth}%
\fi\fi%
\def\FrameCommand##1{\hskip\@totalleftmargin \hskip-\fboxsep
\colorbox{shadecolor}{##1}\hskip-\fboxsep
% There is no \\@totalrightmargin, so:
\hskip-\linewidth \hskip-\@totalleftmargin \hskip\columnwidth}%
\MakeFramed {\advance\hsize-\width
\@totalleftmargin\z@ \linewidth\hsize
\@setminipage}}%
{\par\unskip\endMakeFramed%
\at@end@of@kframe}
\makeatother
\makeatletter
\@ifundefined{Shaded}{
}{\renewenvironment{Shaded}{\begin{kframe}}{\end{kframe}}}
\makeatother
\newenvironment{rmdblock}[1]
{
\begin{itemize}
\renewcommand{\labelitemi}{
\raisebox{-.7\height}[0pt][0pt]{
{\setkeys{Gin}{width=3em,keepaspectratio}\includegraphics{images/#1}}
}
}
\setlength{\fboxsep}{1em}
\begin{kframe}
\item
}
{
\end{kframe}
\end{itemize}
}
\newenvironment{rmdnote}
{\begin{rmdblock}{note}}
{\end{rmdblock}}
\newenvironment{rmdcaution}
{\begin{rmdblock}{caution}}
{\end{rmdblock}}
\newenvironment{rmdimportant}
{\begin{rmdblock}{important}}
{\end{rmdblock}}
\newenvironment{rmdtip}
{\begin{rmdblock}{tip}}
{\end{rmdblock}}
\newenvironment{rmdwarning}
{\begin{rmdblock}{warning}}
{\end{rmdblock}}
\usepackage{makeidx}
\makeindex
\urlstyle{tt}
\usepackage{amsthm}
\makeatletter
\def\thm@space@setup{%
\thm@preskip=8pt plus 2pt minus 4pt
\thm@postskip=\thm@preskip
}
\makeatother
\frontmatter
\ifLuaTeX
\usepackage{selnolig} % disable illegal ligatures
\fi
\usepackage[]{natbib}
\bibliographystyle{apalike}
\title{Data Handling Pocket Reference}
\author{Ulrich Matter}
\date{2022-10-18}
\begin{document}
\maketitle
% leave a few empty pages before the dedication page
%\cleardoublepage\newpage\thispagestyle{empty}\null
%\cleardoublepage\newpage\thispagestyle{empty}\null
%\cleardoublepage\newpage
\thispagestyle{empty}
\begin{center}
\includegraphics{img/dedication.pdf}
\end{center}
\setlength{\abovedisplayskip}{-5pt}
\setlength{\abovedisplayshortskip}{-5pt}
{
\hypersetup{linkcolor=}
\setcounter{tocdepth}{2}
\tableofcontents
}
\listoffigures
\listoftables
\hypertarget{preface}{%
\chapter*{Preface}\label{preface}}
In applied econometric research and business analytics alike, many steps in handling digital data are involved before running regression estimations and training machine learning algorithms. While often neglected in statistics and econometrics curricula, the steps of gathering, cleaning, storing, and filtering data for research/analytics purposes are of utmost importance to ensure accurate and reproducible analytic insights. This pocket reference first introduces and summarizes foundational and practically relevant data and data processing concepts, and then guides the reader through each step of how to get from the raw data to the final data analysis output.
\hypertarget{prerequisites-and-aims}{%
\section*{Prerequisites and aims}\label{prerequisites-and-aims}}
The book presupposes a basic knowledge of undergraduate economics and statistics and relates several case studies to practical questions in these fields. The aim is to give you firsthand practical insights into each part of the data science pipeline in the context of business and economics research.
\begin{figure}
\centering
\includegraphics{img/cc.png}
\caption{Creative Commons License}
\end{figure}
The online version of this book is licensed under the \href{http://creativecommons.org/licenses/by-nc-sa/4.0/}{Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License}.
\begin{flushright}
Ulrich Matter
St.~Gallen, Switzerland
\end{flushright}
\mainmatter
\hypertarget{introduction}{%
\chapter{Introduction}\label{introduction}}
Lower computing costs, a stark decrease in storage costs for digital data, as well as the diffusion of the Internet, have led to the development of new products (e.g., smartphones) and services (e.g., web search engines, cloud computing) over the last few decades. A side product of these developments is a strong increase in the availability of digital data describing all kinds of everyday human activities \citep{einav_levin2014, matter_stutzer2015}. As a consequence, new business models and economic structures are emerging with data as their core commodity (i.e., AI-related technological and economic change). For example, the current hype surrounding `Artificial Intelligence' (AI) - largely fueled by the broad application of machine-learning techniques such as `deep learning' (a form of artificial neural networks) - would not be conceivable without the increasing abundance of large amounts of digital data on all kinds of socio-economic entities and activities. In short, without understanding and handling the underlying data streams properly, the AI-driven economy cannot function. The same rationale applies, of course, to other ways of making use of digital data, be it traditional big data analytics or scientific research (e.g., applied econometrics).
The need for proper handling of large amounts of digital data has given rise to the interdisciplinary field of \href{https://en.wikipedia.org/wiki/Data_science}{`Data Science'} and increasing demand for `Data Scientists'. While nothing within Data Science is particularly new on its own, the combination of skills and insights from different fields (particularly Computer Science and Statistics) has proven to be very productive in meeting new challenges posed by a data-driven economy. In that sense, Data Science is more a craft than a scientific field. As such, it presupposes a more practical and broader understanding of the data than traditional Computer Science and Statistics, from which Data Science borrows its methods. This book focuses on the skills necessary for \emph{acquiring, cleaning, and manipulating} digital data for research/analytics purposes.
\hypertarget{programming-with-data}{%
\chapter{Programming with Data}\label{programming-with-data}}
\hypertarget{handling-data-programmatically}{%
\section{Handling data programmatically}\label{handling-data-programmatically}}
The need for proper handling of large amounts of digital data has given rise to the interdisciplinary field of \href{https://en.wikipedia.org/wiki/Data_science}{`Data Science'} as well as an increasing demand for `Data Scientists'. While nothing within Data Science is particularly new on its own, it is the combination of skills and insights from different fields (particularly Computer Science and Statistics) that has proven to be very productive in meeting new challenges posed by a data-driven economy. The various facets of this new craft are often illustrated in the `Data Science' Venn diagram, reflecting the combination of knowledge and skills from Mathematics/Statistics, substantive expertise in the particular scientific field in which Data Science is applied (here: Economics), and the skills for \emph{acquiring, cleaning, and analyzing} data \emph{programmatically}.
\begin{figure}
{\centering \includegraphics[width=0.6\linewidth]{img/venn_diagramm}
}
\caption{Venn diagram illustrating the domains of Data Science in the context of Economics.}\label{fig:venn}
\end{figure}
In the framework of Data Science in Economics followed in this book, programming skills and basic knowledge of computing and data technologies serve as a complement to engaging in modern economic analysis. They are necessary ingredients both for working with new econometric approaches in machine learning (and the preceding feature engineering) and for solving complex problems in the domain of economic modeling. Moreover, programming (or `coding') is the basis for better understanding and engaging with the handling of data for analytics purposes.
While industry-scale data science projects tend to involve several data processing frameworks and programming languages (such as Scala, Python, and SQL), we will see that the core steps of how to get from the raw data to the final analysis report can, for most simpler data projects, be easily managed with just one programming language. In the case of this book, we choose \emph{R}.
\hypertarget{why-r}{%
\section{\texorpdfstring{Why \emph{R}?}{Why R?}}\label{why-r}}
\hypertarget{the-data-language}{%
\subsection{The `data language'}\label{the-data-language}}
The programming language and open-source statistical computing environment \href{https://www.r-project.org}{\emph{R}} has over the last decade become a core tool for data science in industry and academia. It was originally designed as a tool for statistical analysis. Many characteristics of the language make \emph{R} particularly useful for working with data. \emph{R} is increasingly used in various domains, going well beyond the traditional applications of academic research.
\hypertarget{high-level-language-relatively-easy-to-learn}{%
\subsection{High-level language, relatively easy to learn}\label{high-level-language-relatively-easy-to-learn}}
\emph{R} is a relatively easy computer language to learn for people with no previous programming experience. The syntax is rather intuitive and error messages are not too cryptic to understand (this facilitates learning by doing). Moreover, with \emph{R}'s recent stark rise in popularity, there are plenty of freely accessible resources online that help beginners to learn the language.
\hypertarget{free-open-source-large-community}{%
\subsection{Free, open source, large community}\label{free-open-source-large-community}}
Due to its vast base of contributors, \emph{R} serves as a valuable tool for users in various fields related to data analysis and computation (economics/econometrics, biomedicine, business analytics, etc.). \emph{R} users have direct access to thousands of freely available `\emph{R}-packages' (small software libraries written in \emph{R}), covering diverse aspects of data analysis, statistics, data preparation, and data import.
Hence, a lot of people using \emph{R} as a tool in their daily work do not actually `write programs' (in the traditional sense of the word), but apply \emph{R} packages. Applied econometrics with \emph{R} is a good example of this. Almost any function offered by a modern commercial computing environment with a focus on statistics and econometrics (such as \href{http://www.stata.com/}{Stata}) can also be found within the \emph{R} environment. Furthermore, there are \emph{R} packages covering all the areas of modern data analytics, including natural language processing, machine learning, big data analytics, etc. (see the \href{https://cran.r-project.org/web/views/}{CRAN Task Views} for an overview). We thus do not actually have to write a program for many tasks we perform with \emph{R}. Instead, we can build on already existing and reliable packages.
\hypertarget{rrstudio-overview}{%
\section{\texorpdfstring{\emph{R}/RStudio overview}{R/RStudio overview}}\label{rrstudio-overview}}
\emph{R} is a high-level (meaning `more user friendly') programming language for statistical computing. Once we have installed \emph{R} on our computer, we can run it\ldots{}
\begin{enumerate}
\def\labelenumi{\alph{enumi}.}
\tightlist
\item
\ldots directly from the command line, by typing \texttt{R} and hitting enter (here in the OSX terminal):
\end{enumerate}
\begin{figure}
{\centering \includegraphics[width=0.6\linewidth]{img/r_terminal}
}
\caption{Running \emph{R} in the Mac/OSX terminal.}\label{fig:terminal}
\end{figure}
\begin{enumerate}
\def\labelenumi{\alph{enumi}.}
\setcounter{enumi}{1}
\tightlist
\item
\ldots with the simple \href{https://en.wikipedia.org/wiki/Integrated_development_environment}{Integrated Development Environment (IDE)} delivered with the basic \emph{R} installation
\end{enumerate}
\begin{figure}
{\centering \includegraphics[width=0.75\linewidth]{img/r_ide}
}
\caption{Running \emph{R} in the original \emph{R} GUI/IDE.}\label{fig:ide}
\end{figure}
\begin{enumerate}
\def\labelenumi{\alph{enumi}.}
\setcounter{enumi}{2}
\tightlist
\item
\ldots or with the more elaborate and user-friendly IDE called \emph{RStudio} (either locally or in the cloud, see, for example, \href{https://rstudio.cloud/}{RStudio Cloud}):
\end{enumerate}
\begin{figure}
{\centering \includegraphics[width=0.75\linewidth]{img/rstudio_panels}
}
\caption{Running \emph{R} in RStudio (IDE).}\label{fig:rstudio}
\end{figure}
The latter is what we will do throughout this course. RStudio is a very helpful tool for simple data analysis with \emph{R}, writing \emph{R} scripts (short \emph{R} programs), or even for developing \emph{R} packages (software written in \emph{R}), as well as building interactive documents, presentations, etc. Moreover, it offers many options to change its own appearance (Pane Layout, Code Highlighting, etc.).
In the following, we have a look at each of the main panels that will be relevant in this course.
\hypertarget{the-r-console}{%
\subsection{\texorpdfstring{The \emph{R}-Console}{The R-Console}}\label{the-r-console}}
When working in an interactive session, we simply type \emph{R} commands directly into the \emph{R} console. Typically, the output of executing a command this way is also directly printed to the console. Hence, we type a command on one line, hit enter, and the output is presented on the next line.
\begin{figure}
{\centering \includegraphics[width=0.75\linewidth]{img/rstudio_console}
}
\caption{The \emph{R} console in RStudio.}\label{fig:console}
\end{figure}
For example, we can tell \emph{R} to print the phrase \texttt{Hello\ world} to the console by typing the following command in the console and hitting enter:
\begin{Shaded}
\begin{Highlighting}[]
\FunctionTok{print}\NormalTok{(}\StringTok{"Hello world"}\NormalTok{)}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] "Hello world"
\end{verbatim}
\hypertarget{r-scripts}{%
\subsection{\texorpdfstring{\emph{R}-Scripts}{R-Scripts}}\label{r-scripts}}
Apart from very short interactive sessions, it usually makes sense to write \emph{R} code not directly in the command line but to an \emph{R}-script in the script panel. This way, we can easily execute several lines at once, comment the code (to explain what it does), save it on our hard disk, and further develop the code later on.
\begin{figure}
{\centering \includegraphics[width=0.75\linewidth]{img/rstudio_script}
}
\caption{The \emph{R} Script window in RStudio.}\label{fig:rscript}
\end{figure}
\hypertarget{r-environment}{%
\subsection{\texorpdfstring{\emph{R} Environment}{R Environment}}\label{r-environment}}
The environment pane shows what variables, objects, and data are loaded in our current \emph{R} session. Moreover, it offers functions to open documents and import data.
\begin{figure}
{\centering \includegraphics[width=0.5\linewidth]{img/rstudio_environment}
}
\caption{The environment window in RStudio.}\label{fig:renvironment}
\end{figure}
\hypertarget{file-browser}{%
\subsection{File Browser}\label{file-browser}}
With the file browser window we can navigate through the folder structure and files on our computer's hard disk, modify files, and set the working directory of our current \emph{R} session. Moreover, it has a pane to show plots generated in \emph{R} and a pane with help pages and \emph{R} documentation.
\begin{figure}
{\centering \includegraphics[width=0.5\linewidth]{img/rstudio_files}
}
\caption{The file browser window in RStudio.}\label{fig:rfiles}
\end{figure}
\hypertarget{first-steps-with-r}{%
\section{First steps with R}\label{first-steps-with-r}}
Before introducing some of the key functions and packages for data handling and data analysis with R, we should understand how such programs basically work and how we can write them in R. Once we understand the basics of the R language and how to write simple programs, understanding and applying already implemented programs is much easier.\footnote{In fact, since R is an open source environment, you can directly look at already implemented programs in order to learn how they work.}
\hypertarget{values-vectors-and-variables}{%
\subsection{Values, Vectors, and Variables}\label{values-vectors-and-variables}}
The simplest objects to work with in R are vectors. In fact, even a simple numeric value such as \texttt{5.5} or a string of characters (text) like \texttt{"Hello"} is considered a vector (of length one, i.e., a scalar).\footnote{You can try this out in the R console by typing \texttt{is.vector(5.5)} and \texttt{is.vector("Hello")}.}
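As a minimal sketch of this point (using only base R functions; the values are purely illustrative), the following lines check that single values are indeed vectors with exactly one element:

\begin{verbatim}
# single values are vectors of length one
is.vector(5.5)       # TRUE
is.vector("Hello")   # TRUE
length(5.5)          # 1
# a string with five characters is still one vector element
nchar("Hello")       # 5
length("Hello")      # 1
\end{verbatim}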
A good first step to get familiar with coding in R is to assign names to the objects/values you are working with. For example, when summing up two numeric values, you might want to store the result in a separate object and call this object \texttt{result}. This is done with the assignment operator \texttt{\textless{}-} (or \texttt{=}, which serves the same purpose).
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{\# assign the variable name "result" to the sum of two numeric values}
\NormalTok{result }\OtherTok{\textless{}{-}} \FloatTok{25.6} \SpecialCharTok{+} \FloatTok{53.4}
\end{Highlighting}
\end{Shaded}
Whenever you want to re-use the just computed sum, you can directly call the object by its name (the variable \texttt{result}):
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{\# check what "is stored in" result}
\NormalTok{result}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 79
\end{verbatim}
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{\# further work with the value in result}
\NormalTok{result }\SpecialCharTok{{-}} \DecValTok{20}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 59
\end{verbatim}
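The following short sketch illustrates two further aspects of assignment that are easy to overlook: both assignment operators store the same value, and assigning to an existing name silently overwrites what was stored before. The names \texttt{x} and \texttt{y} are chosen here for illustration only.

\begin{verbatim}
# both operators assign the same value
x <- 25.6 + 53.4
y = 25.6 + 53.4
identical(x, y)   # TRUE

# assigning to an existing name overwrites the old value
x <- x - 20
x                 # now holds the sum minus 20
\end{verbatim}

In R code, \texttt{\textless{}-} is the more common convention for assignment, which is why it is used in the examples here.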
With the combine function (\texttt{c()}), you can easily form vectors with several elements and name the elements in the vector. By doing so, you create a very simple dataset. For example, suppose you survey the age of a sample of persons. The age values (in years) give you a numeric vector (R stores numbers entered without the suffix \texttt{L} as type \texttt{double}, even if they are whole numbers). By naming each value (each vector element) after the corresponding person's name, you then have a simple dataset stored in a named R vector.
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{\# a simple numeric vector}
\NormalTok{a }\OtherTok{\textless{}{-}} \FunctionTok{c}\NormalTok{(}\DecValTok{10}\NormalTok{,}\DecValTok{22}\NormalTok{,}\DecValTok{33}\NormalTok{, }\DecValTok{22}\NormalTok{, }\DecValTok{40}\NormalTok{)}
\CommentTok{\# give names to vector elements}
\FunctionTok{names}\NormalTok{(a) }\OtherTok{\textless{}{-}} \FunctionTok{c}\NormalTok{(}\StringTok{"Andy"}\NormalTok{, }\StringTok{"Betty"}\NormalTok{, }\StringTok{"Claire"}\NormalTok{, }\StringTok{"Daniel"}\NormalTok{, }\StringTok{"Eva"}\NormalTok{)}
\NormalTok{a}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## Andy Betty Claire Daniel Eva
## 10 22 33 22 40
\end{verbatim}
To retrieve specific values from this vector, you can either select the corresponding vector element with the element's index (the first, second, third, etc. element) or via its name.
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{\# indexing either via the position of the vector element (counting starts at 1)}
\CommentTok{\# or by element name}
\NormalTok{a[}\DecValTok{3}\NormalTok{]}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## Claire
## 33
\end{verbatim}
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{a[}\StringTok{"Claire"}\NormalTok{]}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## Claire
## 33
\end{verbatim}
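Indexing is not limited to a single element. The sketch below shows a few further base R indexing forms (several positions, several names, exclusion with a negative index, and selection by a logical condition). The vector \texttt{ages} is a hypothetical copy of the example data, introduced here only to keep the sketch self-contained.

\begin{verbatim}
# hypothetical copy of the example data
ages <- c(Andy = 10, Betty = 22, Claire = 33, Daniel = 22, Eva = 40)

# several elements at once, by position or by name
ages[c(1, 3)]            # Andy and Claire
ages[c("Betty", "Eva")]  # Betty and Eva

# a negative index drops the element at that position
ages[-1]                 # everyone except Andy

# a logical condition keeps the elements for which it is TRUE
ages[ages > 25]          # Claire and Eva
\end{verbatim}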
When you are not sure what kind of object \texttt{a} is, the \texttt{str()} (structure) function provides you with a short summary.
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{\# inspect the object you are working with}
\FunctionTok{str}\NormalTok{(a) }\CommentTok{\# returns the structure of the object ("what is in variable a?")}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## Named num [1:5] 10 22 33 22 40
## - attr(*, "names")= chr [1:5] "Andy" "Betty" "Claire" "Daniel" ...
\end{verbatim}
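The output of \texttt{str()} above reports \texttt{num}, that is, a numeric vector of type \texttt{double}. As a small sketch of how this differs from a true integer vector (the object names are illustrative assumptions, not part of the example above), consider:

\begin{verbatim}
# whole numbers typed without a suffix are stored as doubles
ages_dbl <- c(10, 22, 33, 22, 40)
typeof(ages_dbl)      # "double"

# the suffix L creates integer values
ages_int <- c(10L, 22L, 33L, 22L, 40L)
typeof(ages_int)      # "integer"

# an existing numeric vector can be converted explicitly
ages_conv <- as.integer(ages_dbl)
identical(ages_int, ages_conv)   # TRUE
\end{verbatim}

For most everyday data handling the distinction does not matter, since R converts between the two types automatically in arithmetic operations.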
If you want to learn more about what the \texttt{c()} or \texttt{str()} functions (or any other pre-defined R functions) do and how they should be used, type \texttt{help(FUNCTION-NAME)} or \texttt{?FUNCTION-NAME} in the console and hit enter. A help page with detailed explanations of what the function is for and how it can be used will appear in one of the RStudio panels.
\begin{Shaded}
\begin{Highlighting}[]
\FunctionTok{help}\NormalTok{(str)}
\NormalTok{?c}
\end{Highlighting}
\end{Shaded}
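If you do not remember the exact name of a function, two other base R helpers may be useful. The sketch below assumes a standard R installation; the search patterns are arbitrary examples.

\begin{verbatim}
# list objects whose names match a pattern (a regular expression)
apropos("^str$")

# show the arguments a function accepts, without opening the help page
args(round)
\end{verbatim}

Here \texttt{apropos()} searches the names of all currently available objects, while \texttt{args()} returns the function's argument list.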
\hypertarget{math-operators}{%
\subsection{Math operators}\label{math-operators}}
Above, we introduced in passing the very intuitive syntax for two common math operators in R: \texttt{+} for the addition of numeric or integer values, and \texttt{-} for subtraction. R knows all basic math operators and has a variety of functions to handle more advanced mathematical problems. One basic practical application of R in academic life is to use it as a sophisticated (and programmable) calculator.
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{\# basic arithmetic}
\DecValTok{2}\SpecialCharTok{+}\DecValTok{2}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 4
\end{verbatim}
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{sum\_result }\OtherTok{\textless{}{-}} \DecValTok{2}\SpecialCharTok{+}\DecValTok{2}
\NormalTok{sum\_result}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 4
\end{verbatim}
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{sum\_result }\SpecialCharTok{{-}}\DecValTok{2}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 2
\end{verbatim}
\begin{Shaded}
\begin{Highlighting}[]
\DecValTok{4}\SpecialCharTok{*}\DecValTok{5}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 20
\end{verbatim}
\begin{Shaded}
\begin{Highlighting}[]
\DecValTok{20}\SpecialCharTok{/}\DecValTok{5}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 4
\end{verbatim}
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{\# order of operations}
\DecValTok{2}\SpecialCharTok{+}\DecValTok{2}\SpecialCharTok{*}\DecValTok{3}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 8
\end{verbatim}
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{(}\DecValTok{2}\SpecialCharTok{+}\DecValTok{2}\NormalTok{)}\SpecialCharTok{*}\DecValTok{3}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 12
\end{verbatim}
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{(}\DecValTok{5}\SpecialCharTok{+}\DecValTok{5}\NormalTok{)}\SpecialCharTok{/}\NormalTok{(}\DecValTok{2}\SpecialCharTok{+}\DecValTok{3}\NormalTok{)}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 2
\end{verbatim}
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{\# work with variables}
\NormalTok{a }\OtherTok{\textless{}{-}} \DecValTok{20}
\NormalTok{b }\OtherTok{\textless{}{-}} \DecValTok{10}
\NormalTok{a}\SpecialCharTok{/}\NormalTok{b}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 2
\end{verbatim}
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{\# arithmetic with vectors}
\NormalTok{a }\OtherTok{\textless{}{-}} \FunctionTok{c}\NormalTok{(}\DecValTok{1}\NormalTok{,}\DecValTok{4}\NormalTok{,}\DecValTok{6}\NormalTok{)}
\NormalTok{a }\SpecialCharTok{*} \DecValTok{2}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 2 8 12
\end{verbatim}
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{b }\OtherTok{\textless{}{-}} \FunctionTok{c}\NormalTok{(}\DecValTok{10}\NormalTok{,}\DecValTok{40}\NormalTok{,}\DecValTok{80}\NormalTok{)}
\NormalTok{a }\SpecialCharTok{*}\NormalTok{ b}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 10 160 480
\end{verbatim}
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{a }\SpecialCharTok{+}\NormalTok{ b}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 11 44 86
\end{verbatim}
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{\# other common math operators and functions}
\DecValTok{4}\SpecialCharTok{\^{}}\DecValTok{2}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 16
\end{verbatim}
\begin{Shaded}
\begin{Highlighting}[]
\FunctionTok{sqrt}\NormalTok{(}\DecValTok{4}\SpecialCharTok{\^{}}\DecValTok{2}\NormalTok{)}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 4
\end{verbatim}
\begin{Shaded}
\begin{Highlighting}[]
\FunctionTok{log}\NormalTok{(}\DecValTok{2}\NormalTok{)}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 0.6931
\end{verbatim}
\begin{Shaded}
\begin{Highlighting}[]
\FunctionTok{exp}\NormalTok{(}\DecValTok{10}\NormalTok{)}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 22026
\end{verbatim}
\begin{Shaded}
\begin{Highlighting}[]
\FunctionTok{log}\NormalTok{(}\FunctionTok{exp}\NormalTok{(}\DecValTok{10}\NormalTok{))}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 10
\end{verbatim}
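Two further operators that are easy to overlook are integer division (\texttt{\%/\%}) and the modulo operator (\texttt{\%\%}), which returns the remainder of a division. The following minimal sketch, with arbitrarily chosen example values, shows both.
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{\# integer division: how often does 3 fit into 7?}
\DecValTok{7} \SpecialCharTok{\%/\%} \DecValTok{3}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 2
\end{verbatim}
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{\# modulo: remainder of dividing 7 by 3}
\DecValTok{7} \SpecialCharTok{\%\%} \DecValTok{3}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 1
\end{verbatim}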
To look up the most common math operators in R and get more details about how to use them, type
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{?}\StringTok{\textasciigrave{}}\AttributeTok{+}\StringTok{\textasciigrave{}}
\end{Highlighting}
\end{Shaded}
in the R console and hit enter.
\hypertarget{basic-programming-concepts-in-r}{%
\section{Basic programming concepts in R}\label{basic-programming-concepts-in-r}}
In very simple terms, programming/coding is all about using a computer language to instruct a computer what to do. What looks very complex at first sight is actually rather simple (at least for a large array of basic programming problems). At the core of almost any R program is the right application and combination of just a handful of basic programming concepts: loops, logical statements, control statements, and functions. Once you a) conceptually understand what these concepts are for, and b) have learned the syntax of how to use these concepts when writing a program in R, addressing all kinds of data handling problems efficiently with R will simply become a matter of training/practice.
\hypertarget{loops}{%
\subsection{Loops}\label{loops}}
A loop is typically a sequence of statements executed a specific number of times. How often the code `inside' the loop is executed depends on a clearly defined control statement. If we know in advance how often the code inside the loop has to be executed, we typically write a so-called `for-loop'. If the number of iterations is not known before executing the code, we typically write a so-called `while-loop'. The following subsections illustrate both of these concepts in R.
\hypertarget{for-loops}{%
\subsubsection{For-loops}\label{for-loops}}
In simple terms, a for-loop tells the computer to execute a sequence of commands `for each case in a set of n cases'. The flowchart in Figure \ref{fig:for} illustrates the concept.
\begin{figure}
{\centering \includegraphics[width=0.4\linewidth]{img/forloop}
}
\caption{For-loop illustration.}\label{fig:for}
\end{figure}
For example, a for-loop could be used to sum up the elements of a numeric vector of fixed length (thus, the number of iterations is clearly defined). In plain English, the for-loop would state something like: ``Start with 0 as the current total value; for each of the elements in the vector, add the value of this element to the current total value.'' Note how this logically implies that the loop will `stop' once the value of the last element in the vector is added to the total. Let's illustrate this in R. Take the numeric vector \texttt{c(1,2.1,3.5,4.8,5)}. A for-loop to sum up all elements can be implemented as follows:
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{\# vector to be summed up}
\NormalTok{numbers }\OtherTok{\textless{}{-}} \FunctionTok{c}\NormalTok{(}\DecValTok{1}\NormalTok{,}\FloatTok{2.1}\NormalTok{,}\FloatTok{3.5}\NormalTok{,}\FloatTok{4.8}\NormalTok{,}\DecValTok{5}\NormalTok{)}
\CommentTok{\# initiate total}
\NormalTok{total\_sum }\OtherTok{\textless{}{-}} \DecValTok{0}
\CommentTok{\# number of iterations}
\NormalTok{n }\OtherTok{\textless{}{-}} \FunctionTok{length}\NormalTok{(numbers)}
\CommentTok{\# start loop}
\ControlFlowTok{for}\NormalTok{ (i }\ControlFlowTok{in} \DecValTok{1}\SpecialCharTok{:}\NormalTok{n) \{}
\NormalTok{ total\_sum }\OtherTok{\textless{}{-}}\NormalTok{ total\_sum }\SpecialCharTok{+}\NormalTok{ numbers[i]}
\NormalTok{\}}
\CommentTok{\# check result}
\NormalTok{total\_sum}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 16.4
\end{verbatim}
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{\# compare with the result of sum() function}
\FunctionTok{sum}\NormalTok{(numbers)}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 16.4
\end{verbatim}
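As a variation on the loop above, R also allows a for-loop to iterate directly over the elements of a vector instead of over index positions. The following sketch repeats the summation in this style; the variable name \texttt{value} is chosen here for illustration only.
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{\# vector to be summed up}
\NormalTok{numbers }\OtherTok{\textless{}{-}} \FunctionTok{c}\NormalTok{(}\DecValTok{1}\NormalTok{,}\FloatTok{2.1}\NormalTok{,}\FloatTok{3.5}\NormalTok{,}\FloatTok{4.8}\NormalTok{,}\DecValTok{5}\NormalTok{)}
\CommentTok{\# initiate total}
\NormalTok{total\_sum }\OtherTok{\textless{}{-}} \DecValTok{0}
\CommentTok{\# loop directly over the elements of the vector}
\ControlFlowTok{for}\NormalTok{ (value }\ControlFlowTok{in}\NormalTok{ numbers) \{}
\NormalTok{  total\_sum }\OtherTok{\textless{}{-}}\NormalTok{ total\_sum }\SpecialCharTok{+}\NormalTok{ value}
\NormalTok{\}}
\CommentTok{\# check result}
\NormalTok{total\_sum}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 16.4
\end{verbatim}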
\hypertarget{nested-for-loops}{%
\subsubsection{Nested for-loops}\label{nested-for-loops}}
In some situations, a simple for-loop might not be sufficient. Within one sequence of commands, there might be another sequence of commands that also has to be executed a number of times each time the first sequence of commands is executed. In such a case, we speak of a `nested for-loop'. We can illustrate this easily by extending the example of the numeric vector above to a matrix for which we want to sum up the values in each column. Building on the loop implemented above, we would say `for each column \texttt{j} of a given numeric matrix, execute the for-loop defined above'.
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{\# matrix to be summed up}
\NormalTok{numbers\_matrix }\OtherTok{\textless{}{-}} \FunctionTok{matrix}\NormalTok{(}\DecValTok{1}\SpecialCharTok{:}\DecValTok{20}\NormalTok{, }\AttributeTok{ncol =} \DecValTok{4}\NormalTok{)}
\NormalTok{numbers\_matrix}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [,1] [,2] [,3] [,4]
## [1,] 1 6 11 16
## [2,] 2 7 12 17
## [3,] 3 8 13 18
## [4,] 4 9 14 19
## [5,] 5 10 15 20
\end{verbatim}
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{\# number of iterations for the outer loop}
\NormalTok{m }\OtherTok{\textless{}{-}} \FunctionTok{ncol}\NormalTok{(numbers\_matrix)}
\CommentTok{\# number of iterations for the inner loop}
\NormalTok{n }\OtherTok{\textless{}{-}} \FunctionTok{nrow}\NormalTok{(numbers\_matrix)}
\CommentTok{\# start outer loop (loop over columns of the matrix)}
\ControlFlowTok{for}\NormalTok{ (j }\ControlFlowTok{in} \DecValTok{1}\SpecialCharTok{:}\NormalTok{m) \{}
  \CommentTok{\# initiate total for column j}
\NormalTok{  total\_sum }\OtherTok{\textless{}{-}} \DecValTok{0}
  \CommentTok{\# start inner loop (loop over rows of the matrix)}
  \ControlFlowTok{for}\NormalTok{ (i }\ControlFlowTok{in} \DecValTok{1}\SpecialCharTok{:}\NormalTok{n) \{}
\NormalTok{    total\_sum }\OtherTok{\textless{}{-}}\NormalTok{ total\_sum }\SpecialCharTok{+}\NormalTok{ numbers\_matrix[i, j]}
\NormalTok{  \}}
  \CommentTok{\# print the sum of column j}
  \FunctionTok{print}\NormalTok{(total\_sum)}
\NormalTok{\}}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 15
## [1] 40
## [1] 65
## [1] 90
\end{verbatim}
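In the same way that the simple for-loop could be checked against \texttt{sum()}, the column totals can be compared with the result of R's built-in \texttt{colSums()} function. The sketch below recreates the matrix so that it can be run on its own.
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{\# recreate the matrix from the example above}
\NormalTok{numbers\_matrix }\OtherTok{\textless{}{-}} \FunctionTok{matrix}\NormalTok{(}\DecValTok{1}\SpecialCharTok{:}\DecValTok{20}\NormalTok{, }\AttributeTok{ncol =} \DecValTok{4}\NormalTok{)}
\CommentTok{\# compare with the result of colSums() function}
\FunctionTok{colSums}\NormalTok{(numbers\_matrix)}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 15 40 65 90
\end{verbatim}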
\hypertarget{while-loop}{%
\subsubsection{While-loop}\label{while-loop}}
In a situation where a program has to repeatedly run a sequence of commands, but we don't know in advance how many iterations we need to reach the intended goal, a while-loop can help. In simple terms, a while-loop keeps executing a sequence of commands as long as a certain logical statement is true. The flowchart in Figure \ref{fig:while} illustrates this point.
\begin{figure}
{\centering \includegraphics[width=0.7\linewidth]{img/while_loop_own}
}
\caption{While-loop illustration.}\label{fig:while}
\end{figure}
For example, a while-loop in plain English could state something like ``start with 0 as the total, add 1.12 to the total until the total is larger than 20.'' We can implement this in R as follows.
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{\# initiate starting value}
\NormalTok{total }\OtherTok{\textless{}{-}} \DecValTok{0}
\CommentTok{\# start loop}
\ControlFlowTok{while}\NormalTok{ (total }\SpecialCharTok{\textless{}=} \DecValTok{20}\NormalTok{) \{}
\NormalTok{ total }\OtherTok{\textless{}{-}}\NormalTok{ total }\SpecialCharTok{+} \FloatTok{1.12}
\NormalTok{\}}
\CommentTok{\# check the result}
\NormalTok{total}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 20.16
\end{verbatim}
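Because the number of iterations is not fixed in advance, it can be informative to count them. The following sketch extends the loop above with a counter; the variable name \texttt{iterations} is introduced here for illustration only.
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{\# initiate starting value and iteration counter}
\NormalTok{total }\OtherTok{\textless{}{-}} \DecValTok{0}
\NormalTok{iterations }\OtherTok{\textless{}{-}} \DecValTok{0}
\CommentTok{\# start loop}
\ControlFlowTok{while}\NormalTok{ (total }\SpecialCharTok{\textless{}=} \DecValTok{20}\NormalTok{) \{}
\NormalTok{  total }\OtherTok{\textless{}{-}}\NormalTok{ total }\SpecialCharTok{+} \FloatTok{1.12}
\NormalTok{  iterations }\OtherTok{\textless{}{-}}\NormalTok{ iterations }\SpecialCharTok{+} \DecValTok{1}
\NormalTok{\}}
\CommentTok{\# check how often the loop body was executed}
\NormalTok{iterations}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 18
\end{verbatim}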
\hypertarget{booleans-and-logical-statements}{%
\subsection{Booleans and logical statements}\label{booleans-and-logical-statements}}
Note that in order to write a meaningful while-loop we have to make use of a logical statement such as ``the value stored in the variable \texttt{total} is smaller than or equal to \texttt{20}'' (\texttt{total\ \textless{}=\ 20}). A logical statement results in a `Boolean' data type. That is, a data type with the only two possible values \texttt{TRUE} or \texttt{FALSE} (\texttt{1} or \texttt{0}).
\begin{Shaded}
\begin{Highlighting}[]
\DecValTok{2}\SpecialCharTok{+}\DecValTok{2} \SpecialCharTok{==} \DecValTok{4}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] TRUE
\end{verbatim}
\begin{Shaded}
\begin{Highlighting}[]
\DecValTok{3}\SpecialCharTok{+}\DecValTok{3} \SpecialCharTok{==} \DecValTok{7}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] FALSE
\end{verbatim}
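Logical statements also work element-wise on vectors. Since R treats \texttt{TRUE} as \texttt{1} and \texttt{FALSE} as \texttt{0} in arithmetic, summing up the result counts the cases for which the statement is true. A brief sketch with an arbitrary example vector:
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{a }\OtherTok{\textless{}{-}} \FunctionTok{c}\NormalTok{(}\DecValTok{1}\NormalTok{,}\DecValTok{4}\NormalTok{,}\DecValTok{6}\NormalTok{)}
\CommentTok{\# element{-}wise comparison}
\NormalTok{a }\SpecialCharTok{\textgreater{}} \DecValTok{3}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] FALSE  TRUE  TRUE
\end{verbatim}
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{\# count the elements larger than 3}
\FunctionTok{sum}\NormalTok{(a }\SpecialCharTok{\textgreater{}} \DecValTok{3}\NormalTok{)}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 2
\end{verbatim}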
Logical statements play an important role in fundamental programming concepts. In particular, they are crucial for writing conditional statements (`if-statements') that build the control structure of a program, controlling the `direction' the program takes (given certain conditions).
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{condition }\OtherTok{\textless{}{-}} \ConstantTok{TRUE}
\ControlFlowTok{if}\NormalTok{ (condition) \{}
\FunctionTok{print}\NormalTok{(}\StringTok{"This is true!"}\NormalTok{)}
\NormalTok{\} }\ControlFlowTok{else}\NormalTok{ \{}
\FunctionTok{print}\NormalTok{(}\StringTok{"This is false!"}\NormalTok{)}
\NormalTok{\}}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] "This is true!"
\end{verbatim}
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{condition }\OtherTok{\textless{}{-}} \ConstantTok{FALSE}
\ControlFlowTok{if}\NormalTok{ (condition) \{}
\FunctionTok{print}\NormalTok{(}\StringTok{"This is true!"}\NormalTok{)}
\NormalTok{\} }\ControlFlowTok{else}\NormalTok{ \{}
\FunctionTok{print}\NormalTok{(}\StringTok{"This is false!"}\NormalTok{)}
\NormalTok{\}}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] "This is false!"
\end{verbatim}
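Conditional statements become particularly useful in combination with loops. As a minimal sketch, the following code sums up only those elements of a vector that are larger than a threshold; the threshold value of 3 is an arbitrary choice made for illustration.
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{\# vector to be (partially) summed up}
\NormalTok{numbers }\OtherTok{\textless{}{-}} \FunctionTok{c}\NormalTok{(}\DecValTok{1}\NormalTok{,}\FloatTok{2.1}\NormalTok{,}\FloatTok{3.5}\NormalTok{,}\FloatTok{4.8}\NormalTok{,}\DecValTok{5}\NormalTok{)}
\CommentTok{\# initiate total}
\NormalTok{total\_sum }\OtherTok{\textless{}{-}} \DecValTok{0}
\CommentTok{\# number of iterations}
\NormalTok{n }\OtherTok{\textless{}{-}} \FunctionTok{length}\NormalTok{(numbers)}
\CommentTok{\# loop over all elements, but only add values larger than 3}
\ControlFlowTok{for}\NormalTok{ (i }\ControlFlowTok{in} \DecValTok{1}\SpecialCharTok{:}\NormalTok{n) \{}
  \ControlFlowTok{if}\NormalTok{ (numbers[i] }\SpecialCharTok{\textgreater{}} \DecValTok{3}\NormalTok{) \{}
\NormalTok{    total\_sum }\OtherTok{\textless{}{-}}\NormalTok{ total\_sum }\SpecialCharTok{+}\NormalTok{ numbers[i]}
\NormalTok{  \}}
\NormalTok{\}}
\CommentTok{\# check result}
\NormalTok{total\_sum}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 13.3
\end{verbatim}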
\hypertarget{r-functions}{%
\subsection{R functions}\label{r-functions}}
R programs heavily rely on functions. Conceptually, `functions' in R are very similar to what we know as `functions' in math (i.e., \(f:X \rightarrow Y\)). A function can thus, e.g., take a variable \(X\) as input and provide a value \(Y\) as output. The actual calculation of \(Y\) based on \(X\) can be something as simple as \(2\times X = Y\). But it could also be a very complex algorithm or an operation that does not directly have anything to do with numbers and arithmetic.\footnote{Of course, on the very low level, everything that happens in a microprocessor can, in the end, be expressed in some formal way using math. However, the point here is that at the level we work with R, a function could simply process different text strings (e.g., stack them together). Thus, for us as programmers, R functions do not necessarily have to do anything with arithmetic and numbers but could serve all kinds of purposes, including the parsing of HTML code, etc.}
In R---and many other programming languages---functions take `parameter values' as input, process those values according to a predefined program, and `return' the result. For example, a function could take a numeric vector as input and return the sum of all the individual numeric values in the input vector.
When we open RStudio, all basic functions are already loaded automatically. This means we can directly call them from the R-Console or by executing an R-Script. As R is made for data analysis and statistics, the basic functions loaded with R cover many aspects of tasks related to working with and analyzing data. Besides these basic functions, thousands of additional functions covering all kinds of topics related to data analysis can be made available by installing the respective R-packages (\texttt{install.packages("PACKAGE-NAME")}) and then loading the packages with \texttt{library(PACKAGE-NAME)}. In addition, it is straightforward to define our own functions.
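To give a first impression of what defining a function looks like, the following sketch implements the simple example \(2\times X = Y\) mentioned above. The function name \texttt{double\_value} is hypothetical and chosen for illustration only.
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{\# define a function that doubles its input (hypothetical example)}
\NormalTok{double\_value }\OtherTok{\textless{}{-}} \ControlFlowTok{function}\NormalTok{(x) \{}
\NormalTok{  y }\OtherTok{\textless{}{-}} \DecValTok{2} \SpecialCharTok{*}\NormalTok{ x}
  \FunctionTok{return}\NormalTok{(y)}
\NormalTok{\}}
\CommentTok{\# call the function}
\FunctionTok{double\_value}\NormalTok{(}\DecValTok{21}\NormalTok{)}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 42
\end{verbatim}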
\hypertarget{case-study-compute-the-mean}{%
\subsubsection{Case study: Compute the mean}\label{case-study-compute-the-mean}}
To illustrate how functions work in R and how we can write our own, the following code example implements a function that computes the mean/average value of a given numeric vector.
First, we initiate a simple numeric vector which we then use as an example to test the function. Whenever you implement a function, it is very useful to first define a simple example of an input for which you know what the output should be.
\begin{Shaded}