1 What is R Programming Language?

R is an open-source programming language and software environment for statistical computing and graphics, supporting data manipulation, modeling, and calculation (Ihaka & Gentleman, 1996; Venables et al., 2020). It was developed by Ross Ihaka and Robert Gentleman in 1993 (Ihaka & Gentleman, 1996) and is regarded as an implementation of the S language (commercially distributed as S-Plus), which was originally developed at Bell Laboratories by Rick Becker, John Chambers, and Allan Wilks (Becker et al., 1988). In practice, R packages and the methods they support are typically used through an integrated development environment (IDE) such as RStudio (see Sect. 1.2). R provides a wide range of statistical and graphical techniques, ranging from linear and nonlinear modeling to statistical tests, classification, clustering, and time series analysis.

For research and data analytics purposes (see Fig. 1.1), work in R (R Project, 2023) typically proceeds through a series of steps: programming, transforming, discovering, modeling, and communicating the outputs or results (Wickham & Grolemund, 2017).

Fig. 1.1
A flow diagram depicts the conceptual overview of steps to use R programming. It begins with the program, transforming, discovering, modeling, and communicating with their details.

Conceptual overview of steps to using R programming for statistical data analysis in research
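As a rough illustration of the steps in Fig. 1.1, the sketch below walks through transforming, discovering, modeling, and communicating with the `mtcars` dataset that ships with R. The choice of dataset, the variable names, and the simple linear model are assumptions made for illustration only; they are not taken from the figure.

```r
# Transform: keep three columns of the built-in mtcars data and
# convert the number of cylinders into a categorical variable.
cars <- mtcars[, c("mpg", "wt", "cyl")]
cars$cyl <- factor(cars$cyl)

# Discover: summarise fuel consumption overall and per cylinder group.
summary(cars$mpg)
mean_mpg_by_cyl <- tapply(cars$mpg, cars$cyl, mean)

# Model: fit a simple linear regression of mpg on car weight.
fit <- lm(mpg ~ wt, data = cars)

# Communicate: report the rounded coefficients and group means.
print(round(coef(fit), 2))
print(round(mean_mpg_by_cyl, 1))
```

In an interactive session, the final step would often also include a plot, for example `plot(cars$wt, cars$mpg)` followed by `abline(fit)`.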

Whereas many existing statistical software packages, such as SAS (SAS Institute, 2023) and SPSS (IBM, 2022), present researchers and data scientists with copious output from each statistical analysis, R tends to return minimal output and instead stores the results in an “object” that can be passed to further functions or interrogated later.

Because R is an “object-oriented programming language” (Mailund, 2017), intermediate results are stored in objects that can be recalled, re-used, or manipulated by other functions or code simply by referring to the “object name”. In other words, R is a functional and object-oriented programming language used to analyze datasets, for example by running mathematical simulations or rearranging complex datasets into simpler and more useful formats (Matloff, 2011).
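The following minimal sketch shows what storing results in an object looks like in practice. It assumes the built-in `sleep` dataset and a two-sample t-test purely as an example; the object name `result` is arbitrary.

```r
# Run a t-test and store the whole result in an object instead of
# only printing it. The sleep dataset ships with R.
result <- t.test(extra ~ group, data = sleep)

# The stored object can be interrogated piece by piece later on.
class(result)        # the class of the stored result
names(result)        # the components available inside the object
result$p.value       # extract only the p-value
result$conf.int      # extract only the confidence interval
```

Typing `result` on its own prints the formatted summary, so nothing is lost by storing the output first.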

The benefits of R in comparison to other statistical tools and software include (Douglas et al., 2020; Matloff, 2011):

  1. The capacity to write more efficient code through parallel methods or vectorization, supported by a programmable environment that is driven by command-line scripting.

  2. The capability to define and customize functions or code (e.g., how the analysts expect the resultant models to behave) when handling data. R has several built-in functions and libraries that are extensible and allow users to define their own (customized) functions or methods, which can be stored in the simple object system.

  3. The capability to create illustrative graphs that visualize, or give a conceptual overview of, complex data characteristics and functions.

  4. The capacity to interface the R language with other programs or software (e.g., C/C++, Tableau, Python) for better functionality or speed of data analysis.

  5. The availability of packages for tasks such as image manipulation, textual data analysis or natural language processing, machine learning, and classification.

  6. Advanced debugging facilities for troubleshooting errors in code.
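The first two benefits in the list above can be sketched in a few lines. The example compares an explicit loop with the equivalent vectorized expression and then defines a small custom function; the function name `standardize` and the sample values are hypothetical and chosen only for illustration.

```r
values <- c(2, 4, 6, 8)

# Loop version: square each element one at a time.
squares_loop <- numeric(length(values))
for (i in seq_along(values)) {
  squares_loop[i] <- values[i]^2
}

# Vectorized version: the operator is applied to the whole vector at once.
squares_vec <- values^2

# A user-defined function that rescales a numeric vector to
# mean 0 and standard deviation 1.
standardize <- function(v) {
  (v - mean(v)) / sd(v)
}
z <- standardize(values)
```

Both approaches give the same result here; the vectorized form is shorter and, on large vectors, usually faster.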

Other technical advantages of R that are relevant to the statistical data analyses discussed in this book include (Venables et al., 2020):

  • It is a well-developed, simple, and effective programming language, derived from S, that supports conditionals, loops, and user-defined recursive functions.

  • It provides an effective data handling and storage facility, together with a large, coherent, and integrated collection of intermediate tools (packages) for statistical data analysis and a suite of operators for calculations on arrays and matrices.

  • It supports graphical facilities for data analysis and display, either directly on the computer screen or saved as a file (soft copy) or printed (hard copy).

However, like every other programming language, and despite its statistical power, R has its own set of limitations. R can be daunting for first-time users, and people without prior programming knowledge or experience may find it difficult to use. This is not necessarily because it is harder than other programming languages, but because its syntax differs from that of many other existing languages. Also, the methods and algorithms that R supports are spread across different packages, so users who are unfamiliar with a given package might find it hard to implement a specific method or algorithm. For this reason, the authors have provided, in the first part (PART I) of the book, the fundamental concepts of R programming and statistical data analysis to guide the readers.

In addition, R keeps the objects it creates, including the stored results of the methods, in the computer's physical memory, which may differ from some other programming languages such as Python (Python Software Foundation, 2023). Large datasets or many stored results can therefore demand considerable memory and computational power. Nevertheless, R is a continuously evolving language, with newer versions and functionalities being developed, and many of these limitations are likely to diminish with future updates.
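To get a feel for the memory point above, the size of an object can be inspected directly. This is only a small sketch using base R functions; the object name `big` and the vector length are arbitrary choices for illustration.

```r
# Create a numeric vector with one million elements.
big <- numeric(1e6)

# object.size() reports an estimate of the memory used by an object.
size_bytes <- as.numeric(object.size(big))
print(object.size(big), units = "Mb")

# Remove the object when it is no longer needed and ask R to
# run garbage collection so the memory can be reclaimed.
rm(big)
invisible(gc())
```

Removing large intermediate objects that are no longer needed is a simple habit that helps keep memory use under control.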

2 RStudio Integrated Development Environment (IDE)

RStudio is a “friendly front end to R language”. It allows researchers and data scientists to work with R packages and methods and to run lines of code in practice. By definition, “RStudio” is an integrated development environment (IDE) designed to help researchers and analysts be more efficient and productive with R (Rstudio, 2023). Typically, RStudio consists of a console, a syntax-highlighting editor that supports direct code execution, and several tools for viewing the data and command history, visualizing results, troubleshooting code, and managing the project workspace, as the authors discuss in more detail later in this chapter.

Like the R programming language, RStudio is free and open source (Rstudio, 2023). Its graphical user interface (GUI) is organized so that users can view the data tables and graphs, the source code, and the output or results of the code simultaneously.

The RStudio IDE offers an Import Wizard for importing files of different formats into the environment without writing any code, e.g., comma-separated values (*.csv), Excel (*.xlsx), SAS (*.sas7bdat), SPSS (*.sav), and Stata (*.dta) files. Also, like many existing IDEs or GUIs used with other programming languages, RStudio has windows with multiple tabs, drop-down menus, and many customization options. It is available for Windows, macOS, and Linux operating systems (OS) (Rstudio, 2023).
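For readers who prefer code to the Import Wizard, a CSV file can also be read with base R. The sketch below first writes a small temporary CSV file so that it runs anywhere; the file and object names are hypothetical. Other formats (Excel, SAS, SPSS, Stata) generally require add-on packages such as readxl or haven, which are not shown here.

```r
# Create a small CSV file in a temporary location so the example
# is self-contained (in practice this would be your own data file).
csv_path <- tempfile(fileext = ".csv")
write.csv(head(mtcars, 5), csv_path, row.names = FALSE)

# Import the CSV file into a data frame.
imported <- read.csv(csv_path)

# Inspect the structure of the imported data.
str(imported)
```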

The features and functionalities of the RStudio IDE include:

  • a window in which the user can write code and view the results in real time.

  • a file browser for navigating through the files on the local machine or computer.

  • panes for checking the details and history of the imported or analyzed data and variables.

  • a viewer for the results and plots (graphs, models) that are generated.

The RStudio IDE can also be used for developing packages, modeling, writing executable applications, and applying natural language processing and machine learning techniques.

When working with R, it is important to know the difference between the “R language” and “RStudio”, as covered already in this chapter (see Sects. 1.1 and 1.2). “R” is the programming language that performs the statistical computation and manipulates the variables (data) and models, whereas “RStudio” is an environment that uses the R language to develop programs and display their outputs. Thus, “RStudio allows the users to develop and edit programs using R”. R can be used without RStudio, but RStudio cannot be used without R. The user should therefore install R first and then install RStudio, as the authors cover in the next section of this chapter.
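The distinction can be seen from within a running session. The sketch below asks R where it is installed and whether it appears to be running inside RStudio. It assumes that RStudio sets the `RSTUDIO` environment variable to `1` in its sessions; treat that check as a convention rather than a guarantee.

```r
# R itself: where the R installation lives on this computer.
r_home <- R.home()
r_home

# RStudio: assumed to set the RSTUDIO environment variable to "1".
# Outside RStudio (e.g., in a plain terminal) this is usually unset.
in_rstudio <- identical(Sys.getenv("RSTUDIO"), "1")
in_rstudio
```

The same lines run in a plain R console or terminal session, which illustrates that R does not depend on RStudio.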

3 Installing and Configuring R and RStudio Software

This section of the chapter covers the steps to download, install, and configure R and RStudio before using them for statistical data analysis or research purposes.

3.1 Downloading and Installing R Language

Installing R on the computer is simple. All the user needs is to know which operating system (OS) they are using so that they can download the right software for their computer system.

R can be downloaded free of charge from its official site: https://www.r-project.org/.

When you visit the site, you will find binary files for the operating systems (OS) that R supports, most commonly Windows, macOS, and Linux. Many Linux distributions provide R through their package repositories, although it is not necessarily installed by default. For Windows and macOS, the user will need to download and install the software as follows:

Go to https://www.r-project.org/ by entering the URL in the web browser and click on “download R” as shown in Fig. 1.2.

Fig. 1.2
A screenshot depicts downloading the R software. The website link of the R project is labeled as Step 1 and the web page describes the details of the R project for statistical computing. The hyperlink to download R is labeled as step 2.

Downloading R software

When the user selects “download R”, they are directed to a page that lists CRAN (Comprehensive R Archive Network) mirrors by country. Select the country in which you will be using R, or the closest location if your country is not listed, and click on any of the CRAN links under it to proceed with the download. CRAN is a network of ftp (file transfer protocol) and web servers around the world that store identical, up-to-date, versions of code and documentation for R.

For example, as shown in Fig. 1.3, the user can navigate to Mexico (the country with which the authors were affiliated at the time of writing this book) and select https://cran.itam.mx/ to proceed with the download. Note: it is recommended to always choose the CRAN link closest to you when downloading R.

Fig. 1.3
A screenshot of the web page CRAN network for R download. It depicts links for various countries to host CRAN mirror. The link to Mexico is labeled as step 3 and CRAN f t p or web server is selected.

Country of location and nearest CRAN network for R download
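Once R is installed, the same CRAN mirror choice can be recorded inside an R session so that later package downloads use it. The sketch below uses the mirror URL from the example above; restoring the previous setting at the end is an optional courtesy so that the example leaves the session unchanged.

```r
# Remember the current repository setting.
old_repos <- getOption("repos")

# Point R at the chosen CRAN mirror (the URL from the example above).
options(repos = c(CRAN = "https://cran.itam.mx/"))
chosen <- getOption("repos")[["CRAN"]]
chosen

# Restore the previous setting.
options(repos = old_repos)
```

In an interactive session, `chooseCRANmirror()` offers a menu of mirrors instead of requiring the URL to be typed.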

When the user clicks on the CRAN link of their choice, they are directed to a page where they can download the right version of R for the operating system (OS) on their computer (see Fig. 1.4).

Fig. 1.4
A screenshot of the comprehensive R archive webpage. On the left are hyperlinks under CRAN, About R, Software, and documentation. On the right, it depicts steps to download and install R. The steps are highlighted. Below, is the source code for all platforms and hyperlinks on questions about R.

Downloading the right version of R for your operating system (OS)

For example, as shown in Fig. 1.5 (step 5), when the user clicks on “Download R for (Mac) OS X”, they are directed to a page where they can download the latest version of R for Mac OS X. The same applies to other operating systems (OS), such as Windows.

Fig. 1.5
A screenshot of the comprehensive R archive webpage. Instructions for downloading R 4 point 2 point 0 for Mac O S X on the CRAN website, highlighting step 5 and the download link.

Downloading the latest release of R for your operating system (OS)

When the user clicks on the download link, the executable program (i.e., installation file) is automatically downloaded to the computer. Navigate to the location where the downloaded file is stored and install it as you would any other application, e.g., by double-clicking on the downloaded file. When you double-click on the file, you will get a pop-up window as shown in Fig. 1.6a. Follow the steps illustrated in Fig. 1.6a, b, and c by clicking on Continue until you see the window that says you have successfully installed the R software.

Fig. 1.6
2 screenshots of a web page titled Install R 4.0.2 for Mac O S. On the left is the introduction, Read me, license, and other details with the introduction highlighted, and on the right is a note and the continue tab highlighted. The second screenshot depicts details on the software license agreement.

a Installing R on the computer or local machine (step 1). b Installing R on the computer or local machine (step 2). c Successfully installing R on the computer or local machine (step 3)
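After the installer finishes, one simple way to confirm that R works is to open the R console and ask it for its version. This is only a quick check, not part of the installation steps above; the version reported will depend on the release you installed.

```r
# Print the version of R that is running.
R.version.string

# sessionInfo() summarises the R version, the operating system,
# and the packages currently attached.
info <- sessionInfo()
info$running   # description of the operating system
```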

3.2 Downloading and Installing RStudio Software

The next step after installing R on your computer is to download and install the RStudio IDE, which provides the environment in which users work with R.

RStudio can be downloaded free of charge from its official site: https://rstudio.com/ or https://posit.co/.

As shown in Fig. 1.7, the RStudio website provides a download link for the latest version of RStudio for your operating system (OS). Note that companies update their websites from time to time, so the page you see may differ from the one in Fig. 1.7, which shows the site at the time of writing this book. If so, simply locate the download link on the website and follow the same steps discussed in this chapter.

Fig. 1.7
A screenshot of the webpage R studio. It has tabs download, support, docs, and community at the top right and the tab download is highlighted.

Downloading RStudio software

Click on the “DOWNLOAD” menu (Fig. 1.7), and you will be directed to a page where you can download the RStudio software. Select the “Download” link for the “free version of RStudio Desktop” as shown in Fig. 1.8. Again, note that the web display reflects the site at the time of writing this book. As you can see in Fig. 1.8, there are also paid versions of the software, but those are not covered in this book.

Fig. 1.8
A screenshot of the web page, download the free version of R Studio. It depicts detailed text on Choose your version and versions such as free and priced versions. The free version to download is highlighted as step 2.

Downloading free version of RStudio software

When you have selected the free version of the software, then select the “right version of the Installer for your Operating System (OS)” as shown in Fig. 1.9.

Fig. 1.9
A screenshot of the web page Download R studio. It depicts All installers and a list of the Operating systems such as Windows, Mac, Ubuntu, and others highlighted and a list of R studio versions for each O S along with the size list and S H A-256 list.

Downloading the right version of RStudio for your operating system (OS)

For instance, as shown in Fig. 1.9, when the user clicks on the download link for the file “RStudio.dmg” (the latest macOS version at the time of writing this book), the installer is automatically downloaded to the computer. Navigate to the location where the downloaded installer file is stored on your computer or local machine, and install it as you would any other application, e.g., by double-clicking on the downloaded file.

When you double-click or run the file, you will get a pop-up window as shown in Fig. 1.10.

Fig. 1.10
A screenshot of the R Studio 1.3.1093 web page. It depicts applications on the left and R Studio on the right with an arrow from R Studio to applications that reads Step 4 and a detailed text on how to open the program with a note.

Installing RStudio on your computer (e.g., for MacOS)

As illustrated in Fig. 1.10, the same installation process applies to other operating systems (OS), such as Windows. Double-click the installation file to start the installer program. Then, follow the prompts (e.g., by clicking on Continue) until you see the window that says you have successfully installed the RStudio software.

Once you have completed the installation process, start the RStudio IDE either by opening the application from the list of programs on your computer or by clicking on the desktop shortcut icon. You will be presented with a window as shown in Fig. 1.11.

Fig. 1.11
4 screenshots of R Studio integrated development environment. It is represented for Windows 4, 3, 2, and 1 as a source code editor, data environment and history, console, packages, files, and workspace management with tabs at the top highlighted.

RStudio integrated development environment (IDE)

Congratulations! You are now set to run your first R project in RStudio. Welcome to using R programming for statistical data analysis in research, the main objective of this book.

The first time users open RStudio, they are presented with three windows by default, i.e., Window-1, Window-2, and Window-3 (see Fig. 1.11). The fourth window (Window-4) is hidden by default and is displayed only when the user executes a program or runs a command. Users can also open it by selecting the “File” drop-down menu, then New File, and then R Script, or simply by importing a dataset into the environment, which the authors cover in detail in the next section (Sect. 1.4) and chapter (Chap. 2) of this book.

In Table 1.1, the authors outline the description and functions of the different tabs (components) of the RStudio window or integrated development environment (IDE) (see Windows 1, 2, 3, and 4 in Fig. 1.11).

Table 1.1 Description of function of the different tabs (component) of R window (IDE)

4 Running Your First R Project in R Using RStudio

In this section of the chapter, the authors introduce the steps for creating an R Project in RStudio. RStudio Projects make it straightforward for users to divide their work into different contexts, each with its own working directory, workspace, history, and source documents. It is important to keep in mind that every R project is associated with a “working directory” where the users can save new or running projects and retrieve existing ones.

Users can create an RStudio project (i) in a brand-new directory, (ii) in an existing directory where they already have R code and data, or (iii) by cloning a version control repository, e.g., from Git, GitHub, or Subversion (see Fig. 1.13).
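The idea of a working directory can be explored from the console. The sketch below creates a throwaway directory inside R's temporary folder and makes it the working directory; the directory name and the file name `example.R` are hypothetical and used only for illustration. RStudio performs the equivalent of `setwd()` automatically when a project is opened.

```r
# Create a scratch directory to stand in for a project folder.
project_dir <- file.path(tempdir(), "MyFirstR_Project")
dir.create(project_dir, showWarnings = FALSE)

# Make it the working directory, remembering the previous one.
old_wd <- setwd(project_dir)
getwd()

# Files written with a relative path now land in the working directory.
writeLines("x <- 3 + 5", "example.R")
files_found <- list.files()
files_found

# Return to the previous working directory.
setwd(old_wd)
```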

To create a new R Project in RStudio, start RStudio (see description in the previous section, Sect. 1.3). With the RStudio window open (see Fig. 1.11), click on the “File” menu at the top left corner of the RStudio window and select the “New Project” option as shown in Fig. 1.12. You will be presented with a pop-up window as shown in Fig. 1.13.

Fig. 1.12
A screenshot of the R Studio tabs file, edit code, view, and plots. The file option is selected and the New project option among other options such as open file, reopen with encoding, recent files, and others, is selected

Creating a new project in RStudio

Fig. 1.13
A screenshot of the web page Create Project. It depicts New directory, existing directory, and version control. The New Directory is highlighted.

New project wizard pop-up window

Select the “New Directory” option and fill in the pop-up with your preferred project directory name by following the steps illustrated in Figs. 1.14 and 1.15.

Fig. 1.14
A screenshot of the web page titled New Project Wizard. It has a tab back at the top left. The tab New Project is highlighted and the other tabs are R package, Shiny Web Application, R package using R c p p, R package using R c p p Armadillo, R package using R c p p Eigen, and R package using R c p p Parallel.

Selecting the project type

Fig. 1.15
A screenshot of the web page New Project Wizard. It depicts a space to fill in the directory name and browse tab alongside create a project as a subdirectory of and a box to check labeled use r e n v with this project and a check box open in the new session and the tab create the project is highlighted. Steps are depicted for each tab.

Creating a new R project and directory name

When finished, click the “Create Project” button and you’re done! Congratulations once more! You have created your new project in R.

With the project window and console open (the authors named the project “MyFirstR_Project”, as shown in Fig. 1.16), we are ready to create an R script in which to write and run code.

Fig. 1.16
A screenshot of the R console. It depicts details on the first project, on the right is the console for the environment and below it is the new folder with the file My First Project with the size and other details.

New R project created in RStudio

To create a “new R script” in which to write and run your code, select the “File” drop-down menu, then “New File”, and then “R Script” as shown in Fig. 1.17.

Fig. 1.17
An R studio console with the tabs file, edit, and others. The tab file is selected and New file is selected from the drop down and R script is selected from the drop-down of New file.

Creating a new R script

You will be presented with a new working window or editor where you can start writing your code (see Fig. 1.18).

Fig. 1.18
4 windows of New R script. The first depicts an empty window with a cursor and the text start writing your code here, below is the details on the first R project. On the right is the empty Environment console and bottom is a folder My First R project with folder details.

New R script window with the source/code tab

Now let’s run some simple lines of code. As shown in Fig. 1.19 (see Steps 1 and 2), write the example code from Line 1 to Line 4 in the Editor and execute it using the “Run” button (see Fig. 1.19).

Example R code:

    x <- 3 + 5   # Line 1: add 3 and 5 and store the result in the object x
    x            # Line 2: display the value stored in x
    print(x)     # Line 3: display the value stored in x using print()
    print("I am ready to work with data in R and start conducting the different statistical analysis for my research")   # Line 4: display a message

Fig. 1.19
4 windows of writing and running R scripts example in RStudio. The first depicts writing R code from 1 line to 4th line, the second depicts the R object that is loaded in the I D E environment and the results or outputs are illustrated in the third window and the stored file is represented in the fourth window.

Writing and running R scripts example in RStudio
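Lines 2 and 3 of the example produce the same console output, because typing an object's name at the top level prints it automatically. The short sketch below makes this visible by capturing what `print(x)` writes to the console; the name `console_line` is an arbitrary choice for illustration.

```r
x <- 3 + 5

x          # auto-printing: shows the value of x
print(x)   # explicit printing: shows the same value

# Capture the text that print(x) sends to the console.
console_line <- capture.output(print(x))
console_line
```

The `[1]` at the start of the output is R's index of the first element shown on that line; it is not part of the value.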

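Remember to place the cursor on Line 1 (e.g., by clicking anywhere in the line) before running the code, because the “Run” button executes the current line or selection and then moves to the next line. Alternatively, follow the steps illustrated in Fig. 1.20 to run (execute) all the code at once.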

Fig. 1.20
A screenshot of a window depicts the tab code that is selected with the run region selected among other options and Run All selected from the drop-down of Run Region.

Running all the R script source code in RStudio editor
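A whole script can also be run from the console with the base R function `source()`, which is a rough programmatic counterpart to running all the code from the menu. The sketch below writes the first three example lines to a temporary script file so that it is self-contained; the file location is an assumption made for illustration.

```r
# Write a small script to a temporary file.
script_path <- tempfile(fileext = ".R")
writeLines(c("x <- 3 + 5", "x", "print(x)"), script_path)

# Run every line of the script. echo = TRUE shows each command
# together with its output, similar to what appears in the console.
source(script_path, echo = TRUE)
```

For a saved script, the temporary path would simply be replaced by the path to that file.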

When all the code has been run, you will see a screen similar to the one in Fig. 1.19. Well done! You have just created and run your first R project and R script in RStudio.

Next, as you can see in Fig. 1.19, the R script is named “Untitled1” by default. We will save the R script under a name of our choice on the computer or workspace so that we can retrieve it whenever we want.

As illustrated in Fig. 1.21, click on the “File” drop-down menu and choose the “Save as” option.

Fig. 1.21
A screenshot of the R Studio window with the file tab selected and save as selected from other options.

Saving untitled R script

You will be presented with a pop-up window as shown in Fig. 1.22. Enter your chosen or preferred script name, for example, “MyFirst-RScript”, and click the “Save” button to save the Script with the new name.

Fig. 1.22
2 windows depict saving the R script in R Studio. The file is labeled as Untitled with the space provided for Tags and space provided for Where, which is filled as My First Project with tabs cancel and save. The second window depicts My First R Script under Save As option and My First R Project under the option Where with the tab save highlighted.

Saving the R script in RStudio

Now, take another look at the window (see Fig. 1.23): the “Untitled1” script has been saved under the new name “MyFirst-RScript”. You will also notice that the saved file is now listed in the File Explorer window (Fig. 1.23).

Fig. 1.23
4 windows of the Saved R script and file explorer. The Named script My First R project is highlighted in the first window. Below it are the details of the My First R project. The third window depict the Environment with values x and 8 and the last window depicts two updated files stored in the C drive that are highlighted.

Saved R script and file explorer

5 Tips and Technical Guidelines

Here, the authors provide further tips and other useful information that readers, particularly first-time users of R, can use as a technical guide in their journey with R covered in this book.

5.1 Tips About a New R Project

When the user creates a new project in RStudio:

  1. It creates a project file with the .Rproj extension within the directory. This file serves as a shortcut for opening the project directly from the computer's file system.

  2. It creates a hidden directory named .Rproj.user for storing and handling project-specific temporary files, e.g., auto-saved source documents or the state of the window.

  3. It loads the new project into RStudio and displays its name in the Projects toolbar (see top right side of main toolbar in Fig. 1.23).

5.2 Opening Existing R Projects and R Scripts

There are various ways to open an existing R project or R script:

  1. By selecting the specific project from the list of “Recent Projects” in the “File” drop-down menu. The same procedure applies to opening a specific R script: select “Recent Files” from the “File” menu and then the R script name.

  2. By using the “Open Project” command from the “File” drop-down menu, then browsing the working directory and selecting an existing project file (.Rproj). The same procedure applies to opening a specific R script: select “Open File” from the “File” menu, then browse the working directory and select an existing R file (.R).

  3. By double-clicking on the specific project or file in the list of files in the File Explorer tab/window.

The following actions are performed when a new or existing project is opened in RStudio:

  1. A new R session is started.

  2. The .Rprofile file in the main directory of the project (if any) is sourced.

  3. The .RData file in the main directory of the project (if any) is loaded into the Environment tab, and the .Rhistory file is loaded into the History tab.

  4. The R working (current) directory is set to the project's directory.

  5. Existing source code (if any) is loaded into the editor window.

  6. Other relevant settings and active tabs of the project are restored to the state they were in when the project was last saved or closed.
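To make the second step in the list above more concrete, the sketch below creates a small .Rprofile file in a scratch directory and sources it by hand, which mimics what happens when a project is opened. The directory, the option name `myfirstproject.loaded`, and the message text are hypothetical and chosen only for illustration.

```r
# A scratch directory standing in for a project's main directory.
project_dir <- file.path(tempdir(), "MyFirstR_Project")
dir.create(project_dir, showWarnings = FALSE)

# Write a minimal .Rprofile: ordinary R code that runs at start-up.
profile_path <- file.path(project_dir, ".Rprofile")
writeLines(
  c("options(myfirstproject.loaded = TRUE)",
    "message(\"Project profile loaded\")"),
  profile_path
)

# Source the file manually to see its effect in the current session.
source(profile_path)
getOption("myfirstproject.loaded")
```

A real project profile typically holds a few settings the project always needs, such as a preferred CRAN mirror.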

5.3 Working with Multiple R Projects

RStudio allows users to work with more than one project at a time by opening each project in its own instance of RStudio. This can be done in either of two ways:

  1. By using the “Open Project in New Session” option from the “File” drop-down menu.

  2. By opening multiple project files from the File Explorer or the system file browser, double-clicking on the specific project file in each case.

5.4 Closing or Quitting R

The following actions are performed when the user closes an active project, quits, or opens another project in its place:

  • The source code in the editor window is saved so that it can be restored the next time the project is opened.

  • The .Rhistory and .RData files are written to the project directory.

  • Other relevant settings and active tabs of the project are stored in their current form.

  • The R session is terminated.
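What the .RData file holds can be illustrated with the base R functions `save()` and `load()`. The sketch below saves a single object to a temporary file and reads it back; the file location is an assumption for illustration, and RStudio itself handles this automatically for a project.

```r
x <- 3 + 5

# Save the object x to an .RData file in a temporary location.
# (save.image() would save every object in the workspace instead.)
workspace_file <- file.path(tempdir(), ".RData")
save(x, file = workspace_file)

# Remove x, as if the session had ended.
rm(x)

# Load the saved file: x is available again with its stored value.
load(workspace_file)
x
```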

6 Summary

In this chapter, the authors provided an introduction to the R programming language and the RStudio software or IDE. The chapter covered the basic concepts of R programming and how readers can install the software and run their first project using R. One of the main features of R highlighted in this chapter is the capacity to write more efficient code through parallel methods or vectorization, supported by a programmable environment (used here through RStudio) that is driven by command-line scripting. The chapter also noted the capability to define and customize R functions or code: R has several built-in functions and libraries that are extensible and allow users to define their own (customized) functions or methods, which are stored in the simple object system. In the next chapter (Chap. 2), the authors focus on how to work effectively with data in R. This includes learning how to create Objects, Vectors, and Factorization in R, understanding how to install R Packages and Libraries, and working through hands-on examples of Data Visualization methods and practices.