How to Upload Dataset Into Rstudio Command

Importing Data into R

A tutorial about data analysis using R

Dr Jon Yearsley (School of Biology and Environmental Science, UCD)

  • Objectives
  • Organise yourself!
  • Data Workflow
  • Format your data (tidy data)
  • Data frames
  • Importing spreadsheet data
  • Summary of the topics covered
  • Further Reading

How to Read this Tutorial

This tutorial is a mixture of R code chunks and explanations of the code. The R code chunks will appear in boxes.

Below is an example of a chunk of R code:

    # This is a chunk of R code. All text after a # symbol is a comment
    # Set working directory using setwd() function
    setwd('Enter the path to my working directory')
    # Clear all variables in R's memory
    rm(list=ls())    # Standard code to clear R's memory

Sometimes the output from running this R code will be displayed after the chunk of code. R output will be preceded by ##.

Here is a chunk of code followed by the R output

    2 + 4    # Use R to add two numbers
    ## [1] 6

Objectives

The objectives of this tutorial are:

  1. Demonstrate good practice in data organisation
  2. Introduce plain text file formats for data
  3. Explain data import into R

Organise yourself!

Before you start importing data into R you should take time to organise your workspace on your computer:

  • Create a folder on your computer to contain all your work for this particular project (e.g. a folder called DataModule)
  • Inside this project folder create another folder called data. This will hold all the raw data files. These raw data files should not be changed.
  • Inside this project folder create a text file called MyFirstScript.R. You can use RStudio for this (use the File->New File->R Script menu option) or any basic text editor (e.g. Notepad, TextEdit, gedit, emacs). This file will be your R script that will contain all the commands for R. The .r or .R suffix is the standard suffix for an R script.
  • If you are starting a large project consider creating separate folders for: R scripts, figures, output from the R script
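
The folder layout described above can also be set up from within R. The following is a minimal sketch, not part of the original tutorial: it assumes the project is created inside a temporary directory (so that nothing permanent is touched), and the variable names project_dir and script_file are made up for illustration. Replace project_dir with your own path, such as '~/DataModule'.

```r
# Sketch: create the project layout from R.
# Assumption: the project lives in a temporary directory for this demonstration.
project_dir <- file.path(tempdir(), 'DataModule')

# Create the project folder and the data folder inside it
dir.create(file.path(project_dir, 'data'), recursive = TRUE, showWarnings = FALSE)

# Create an empty R script in the project folder
script_file <- file.path(project_dir, 'MyFirstScript.R')
file.create(script_file)

# List what was created
list.files(project_dir)
```

Creating the folders by hand in your file browser works just as well; the point is only that the raw data and the script end up in predictable places.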

Your first R script

Now you have created the file MyFirstScript.R you should put some header text at the start of the file to explain what the R script will do. This was described in tutorial 1.

Video Tutorial: Creating a new R script with RStudio (1 min)

The text should have a short explanation of the R script followed by your name and the date you wrote the R script. Each line should start with a # so that the text is not interpreted by R (this text is for humans so they understand what the file is intended to do). Here is an example,

    # ********** Start of header **************
    # Title: <The title of your R script>
    #
    # Add a short description of the R script here.
    #
    # Author: <your name>  (email address)
    # Date: <today's date>
    #
    # *********** End of header ****************

    # Two common commands at the start of an R script are:
    rm(list=ls())         # Clear R's memory
    setwd('~/DataModule') # Set the working directory
    # Replace '~/DataModule' with the name of your own directory

    # ******************************************
    # Write your commands below.
    # Remember to use comments to explain your commands

Writing clear R scripts

An R script isn't simply telling the computer how to perform calculations on your data. It is also explaining your working to other human beings.

"Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do." – Donald E. Knuth

To make your R scripts usable by humans they must be clearly commented (using the # symbol to start a comment) and clearly organised.

As you write an R script consider these questions:

  • Does your R script look well organised (e.g. is it well spaced, are lines indented logically)?
  • Could someone else read the R script and understand the basic idea?
  • Could someone else change your R script relatively easily?
  • In a couple of months' time could you quickly read and edit your own R script?

Professional data analysts take clarity very seriously. Here are some links to R coding style guides:

  1. Google's style guide, https://google.github.io/styleguide/Rguide.xml
  2. Hadley Wickham's style guide, http://adv-r.had.co.nz/Style.html
  3. http://www.stat.ubc.ca/~jenny/STAT545A/block19_codeFormattingOrganization.html
  4. http://nicercode.github.io/blog/2013-04-05-why-nice-code/

Data Workflow

Below is a schematic of the workflow for handling data.

Figure: The workflow to follow when handling data.

In this tutorial we will consider formatting data, in the next tutorial we'll discuss importing data, and then we'll start to consider exploring the data using graphics and numerical summaries.

Format your data (tidy data)

The workflow starts long before you analyse your data. It starts even before you have your data in some computer software.

Organising your data should follow tidy data guidelines (see below) and be planned before you collect your data. The format of the data should be finalised before importing the data into R. It is often easiest to tidy your data using a spreadsheet program before you import the data into R.

Well organised data from the start will make your life a lot easier and your data import as painless as possible.

6 guidelines for tidy data

When tidying your data you should ensure that:

  1. each variable has its own column
  2. each row is an observation
  3. the top of each column contains the name of the variable
  4. there are no blank columns or blank rows between data
  5. all data in a column has the same type (e.g. it is all numerical data, or it is all text data)
  6. data are consistent (e.g. if a binary variable can take values 'Yes' or 'No' then only these two values are allowed, with no alternatives such as 'Y' and 'N')
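
Guidelines 5 and 6 can also be checked once data are in R. The sketch below is an illustration only: the data frame survey and its values are made up, and the allowed values 'Yes' and 'No' are taken from the example in guideline 6.

```r
# Hypothetical example data (made up for illustration only)
survey <- data.frame(
  id       = 1:5,
  infected = c('Yes', 'No', 'Y', 'No', 'Yes'),
  stringsAsFactors = FALSE
)

# Guideline 5: check the data type of each column
sapply(survey, class)

# Guideline 6: list the distinct values found in the binary variable
unique(survey$infected)

# Find the rows whose value is not one of the two allowed values
allowed  <- c('Yes', 'No')
bad_rows <- which(!(survey$infected %in% allowed))
bad_rows
```

Here row 3 is flagged because it uses 'Y' instead of 'Yes'. It is usually best to correct such entries in the original spreadsheet before the final import.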

PDF Summary: This PDF document reiterates the concept of tidy data

The link to the PDF is: http://www.ucd.ie/ecomodel/pdf/TidyData.pdf

Poorly vs well formatted data

The data set shown in the figure below is an example of poorly formatted data. The data set contains data on the lead concentrations (ppm) from three species of fish (whitefish, sucker and trout). Two types of sample were collected: samples from fillets of fish and from whole fish. The data has three variables: lead concentration, species of fish and type of fish sample.

Figure: A poorly formatted data set. This file would be hard to import and analyse in this format.

How would you improve the format of the poorly formatted data shown in the figure? (Hint: use the 6 guidelines above)

The second figure shows some well formatted data that follows the tidy data guidelines: each column represents a single variable and each row an observation.

Figure: A well formatted data set. This file would be easy to import and analyse in this format. One column contains the data for one variable. These data are the worldwide occurrences of Covid-19, downloaded from the European Centre for Disease Prevention and Control, https://www.ecdc.europa.eu/en

Data frames

A data frame is R's name for spreadsheet data (e.g. data organised in a grid, like Excel). R stores the vast majority of data as a data frame and uses data frames when analysing data.

A data frame forces the data to be well organised.

  • Each column is a variable. The name of this variable becomes the name of the column.
  • Each row corresponds to an observation. This means that values in the same row are data collected about the same object. Rows can also have names.

Below is an example of a data frame (called airquality) that contains data on the air quality in New York from May - September 1973 (this is a data set that is built in to R).

    # The airquality data is a built-in dataset

    # First 10 rows of the airquality data frame
    head(airquality, n=10)
    ##    Ozone Solar.R Wind Temp Month Day
    ## 1     41     190  7.4   67     5   1
    ## 2     36     118  8.0   72     5   2
    ## 3     12     149 12.6   74     5   3
    ## 4     18     313 11.5   62     5   4
    ## 5     NA      NA 14.3   56     5   5
    ## 6     28      NA 14.9   66     5   6
    ## 7     23     299  8.6   65     5   7
    ## 8     19      99 13.8   59     5   8
    ## 9      8      19 20.1   61     5   9
    ## 10    NA     194  8.6   69     5  10

You can type ?airquality to display the help file for this data set. The data frame has 153 rows (observations) and 6 columns (variables measured). The 6 columns contain information on: ozone concentrations (parts per billion), solar radiation, wind speed, air temperature, month and day of observation. You can see that each column has a name corresponding to the data for that column.

The structure of the data frame can be viewed using the str() function

    # Display the structure of the airquality data frame
    str(airquality)
    ## 'data.frame':    153 obs. of  6 variables:
    ##  $ Ozone  : int  41 36 12 18 NA 28 23 19 8 NA ...
    ##  $ Solar.R: int  190 118 149 313 NA NA 299 99 19 194 ...
    ##  $ Wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
    ##  $ Temp   : int  67 72 74 62 56 66 65 59 61 69 ...
    ##  $ Month  : int  5 5 5 5 5 5 5 5 5 5 ...
    ##  $ Day    : int  1 2 3 4 5 6 7 8 9 10 ...

The str() function shows that this is a data frame with 153 observations (rows) and six variables (columns). It also shows the data types of the variables: Wind is a numerical variable (i.e. continuous) and the other variables are all integers (i.e. whole numbers).
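
Beyond head() and str(), a few other base R functions are handy for inspecting a data frame. This short sketch uses only the built-in airquality data set and standard base R functions; it is an illustration rather than part of the original tutorial.

```r
# airquality is built in to R, so no import is needed
dim(airquality)            # number of rows and number of columns
names(airquality)          # names of the columns (variables)
sapply(airquality, class)  # data type of each column

# Access one column (variable) by name using $
airquality$Wind[1:3]

# Access one row (observation) by its row number
airquality[5, ]
```

The $ operator and the [row, column] notation are the two most common ways to pull a variable or an observation out of a data frame.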

Tidy data in R is described in more detail on this web page: https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html

Tibbles

A recent development (circa 2016) is an improved data frame called a tibble. We will not discuss these new data frame objects here, but you can read about them at https://cran.r-project.org/web/packages/tibble/vignettes/tibble.html.

Don't Panic! Tibbles are very similar to data frames.

The important point to know is that if you use RStudio's GUI to import data then your data will be stored in a tibble, not a data frame.

Importing spreadsheet data

To start working with data in R you need to import your data into R. You are aiming to have a data frame that contains your data.

The simplest way to import data into R is from a text file (https://en.wikipedia.org/wiki/Text_file). Text files (sometimes called flat files) can be read by any computer operating system and by many different statistical programs. Saving data as a simple text file makes your data highly transportable.

Importing data from software specific formats (e.g. Excel's .XLSX format, Minitab's .MTW format, SPSS's .SAV format or SAS's .SAS format) is possible (e.g. using RStudio's Import Dataset GUI). If you want your data to be easily shared with other people then use a text file to store your data.

We advise you to:

  • save your data as a text file (software, such as Excel, often has an option to save data as plain text)
  • organise data with columns corresponding to different variables before exporting to the text file
  • use a visible text character to delimit each column (usually a comma or semi-colon). Using an invisible character (e.g. a space or a TAB) is not recommended because these characters all look the same at first glance.
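
To see what such a text file looks like, the same kind of file can be written from R itself. This is a sketch under stated assumptions: the data frame fish and its values are made up for illustration, and the file is written to a temporary directory. write.csv() and readLines() are standard base R functions.

```r
# Made-up example data frame (hypothetical values, for illustration only)
fish <- data.frame(
  species = c('trout', 'sucker', 'whitefish'),
  sample  = c('fillet', 'whole', 'fillet'),
  lead    = c(0.5, 1.2, 0.8)
)

# Write the data frame as a comma delimited text file
fish_file <- file.path(tempdir(), 'fish_example.csv')
write.csv(fish, file = fish_file, row.names = FALSE)

# Look at the raw text: one header line, then one line per observation,
# with a visible comma between each column
fish_lines <- readLines(fish_file)
fish_lines
```

Each variable is a column, the first line holds the variable names, and the comma delimiter is visible in any text editor.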

General advice on importing data into R can be found at https://cran.r-project.org/doc/manuals/r-release/R-data.html

Converting data to a CSV text file

A comma separated values file (CSV file) is the most common format for a text file that contains data.

Here are a few video tutorials on converting data into a CSV text file so that it is suitable for import into R.

Video Tutorial: Converting data from EXCEL to a CSV format (3 mins)

Video Tutorial: Converting data from Googlesheets to a CSV format (1 min)

Viewing text files

Before importing a text file into any software package it is a huge help if you can look at it in a text editor. Text files can contain characters that are normally invisible (e.g. spaces, tabs and end of line markers). If a text editor is going to be of use it must be able to display all the characters in a file.

3 text editors that can do this are:

notepad++ is a free program for Windows operating systems

BBEdit is a free program for Mac OSX operating systems

emacs is a GNU open-source program primarily for Linux operating systems.

On Linux systems the cat -A command from the terminal is also useful.
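
If none of these tools is to hand, R itself can show the raw text of a file. The sketch below is an illustration, not part of the original tutorial: it writes a small made-up TAB delimited file to a temporary directory and then reads it back as plain lines of text with readLines(), which does not try to interpret the contents as data.

```r
# Write a small TAB delimited file to a temporary location (made-up content)
tab_file <- file.path(tempdir(), 'tab_example.txt')
writeLines(c('site\tcount', 'A\t3', 'B\t7'), tab_file)

# Read the first few lines as raw text, without interpreting them as data
raw_lines <- readLines(tab_file, n = 3)

# print() displays a TAB as the escape sequence \t, making it visible
print(raw_lines)

# Count the TAB characters on each line
tab_counts <- lengths(regmatches(raw_lines, gregexpr('\t', raw_lines, fixed = TRUE)))
tab_counts
```

A line that has a different number of delimiters from the others is a common cause of import errors, so a quick count like this can save time.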

Here are two video tutorials on this topic

Video Tutorial: Viewing data in a text file before importing into R (4 mins)

Video Tutorial: An overview of the common data text file formats (3 mins)

Data import examples

The data we'll be importing are described at http://www.ucd.ie/ecomodel/Resources/datasets_WebVersion.html

The files are:

  • WOLF.CSV: This file is a text file of comma separated values.
  • HEIGHT.CSV: This file is a text file of comma separated values.
  • INSECT.TXT: This file is a text file of TAB delimited values.
  • BEEKEEPER.TXT: This file is a text file with blank space delimiting the values.
  • MALIN_HEAD.TXT: This file is a text file with TAB delimited values.

All these data files are simple text files that differ in the character used to distinguish columns of data.

Comma delimited files (CSV files)

CSV stands for comma separated values (note that semi-colons are sometimes used in place of commas because some countries use the comma in place of the decimal point).
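
Files that use a semi-colon delimiter and a decimal comma can be read by telling read.table() about both characters. This is a minimal sketch with a made-up file written to a temporary directory; the sep and dec arguments are standard arguments of read.table().

```r
# Write a small semi-colon delimited file that uses a decimal comma
# (made-up content, for illustration only)
semi_file <- file.path(tempdir(), 'semicolon_example.csv')
writeLines(c('site;temperature', 'A;12,5', 'B;9,75'), semi_file)

# sep=';' sets the column delimiter, dec=',' sets the decimal mark
temps <- read.table(semi_file, header = TRUE, sep = ';', dec = ',')

# The temperature column is imported as numbers, not text
str(temps)
```

Base R also provides read.csv2(), which uses these two settings by default.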

The read.table() function is a flexible function for importing text data

Video Tutorial: Importing a CSV file into R using read.table() (5 mins)

    # Import WOLF.CSV file using read.table function
    wolf = read.table('WOLF.CSV', header=TRUE, sep=',')

The wolf variable contains the imported data. It is called a data frame.

The ideal arrangement of a data frame is for each row to be an observation of some object and each column a variable that measures some property of the object. For instance, each row of wolf is an observation of one individual wolf and each column of wolf gives data about where the wolf was observed and the data collected from its hair sample.

The HEIGHT.CSV file also contains comma separated values. Here is the read.table() command to read in this file

    # Import HEIGHT.CSV file using read.table function
    human = read.table('HEIGHT.CSV', header=TRUE, sep=',')

Note: The function read.csv() is a special case of the read.table() function.
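
The relationship between the two functions can be checked directly. The sketch below is an illustration with a made-up file written to a temporary directory; for a simple file like this one, read.csv() with its default settings should give the same data frame as read.table() with header=TRUE and sep=','.

```r
# Write a small comma delimited file (made-up content, for illustration only)
cmp_file <- file.path(tempdir(), 'compare_example.csv')
writeLines(c('name,height', 'Ann,1.62', 'Bob,1.80'), cmp_file)

# Import the same file in two ways
from_table <- read.table(cmp_file, header = TRUE, sep = ',')
from_csv   <- read.csv(cmp_file)

# Compare the two data frames
all.equal(from_table, from_csv)
```

read.csv() also changes a few other defaults (for example how quotes and incomplete lines are handled), so the two calls can differ on messier files.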

Use the R help pages to learn more about these functions

    ?read.table    # Display help page on read.table function

TAB delimited files (TXT files)

The INSECT.TXT data set is a text file where variables are delimited by a TAB. In addition the first three lines contain a data description that we do not want to import.

The read.table() function can be used to import this file. The argument skip=3 is used to ignore the first 3 lines. The argument sep='\t' specifies a TAB as the variable delimiter

    # Import INSECT.TXT file using read.table function (TAB delimited)
    # skipping the first 3 lines (skip=3)
    insect = read.table('INSECT.TXT', header=T, skip=3, sep='\t')

The MALIN_HEAD.TXT file also contains TAB delimited data. Here is the read.table() command to read in this file

    # Import MALIN_HEAD.TXT file using read.table function (TAB delimited)
    rainfall = read.table('MALIN_HEAD.TXT', header=T, sep='\t')

Blank space delimited files

The BEEKEEPER.TXT data set uses white space to delimit the variables. The first six lines of the file contain a description of the data

Using read.table() with the argument sep='' will interpret any white space as a variable delimiter.

    # Import BEEKEEPER.TXT file using read.table function (white space delimited)
    # skipping the first 6 lines (skip=6)
    bees = read.table('BEEKEEPER.TXT', header=T, skip=6, sep='')
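
To see how skip and sep='' behave without needing the course data files, here is a small self-contained sketch. The file contents and the names ws_file and hives are made up for illustration; the file is written to a temporary directory.

```r
# Write a small white space delimited file with two description lines
# (made-up content, for illustration only)
ws_file <- file.path(tempdir(), 'whitespace_example.txt')
writeLines(c('Description line 1 (made-up example)',
             'Description line 2',
             'hive   honey',
             'H1     12.5',
             'H2  9.0'), ws_file)

# skip=2 ignores the two description lines
# sep='' treats any run of spaces or TABs as a single delimiter
hives <- read.table(ws_file, header = TRUE, skip = 2, sep = '')
hives
```

Note that the uneven number of spaces between columns does not matter when sep='' is used, but a value that itself contains a space would be split into two columns.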

Summary of import commands

Type of text file        R Command
Comma delimited (.CSV)   read.table(<filename>, header=T, sep=',')
TAB delimited (.TXT)     read.table(<filename>, header=T, sep='\t')
Blank space (.TXT)       read.table(<filename>, header=T, sep='')

    # Comma separated values
    wolf = read.table('WOLF.CSV', header=TRUE, sep=',')
    human = read.table('HEIGHT.CSV', header=TRUE, sep=',')

    # TAB delimited values
    insect = read.table('INSECT.TXT', header=T, skip=3, sep='\t')
    rainfall = read.table('MALIN_HEAD.TXT', header=T, sep='\t')

    # White space delimited values
    bees = read.table('BEEKEEPER.TXT', header=T, skip=6, sep='')

Importing data using RStudio

RStudio has its own data import functionality. To use this you will need to install the R package readr. For more information about this see RStudio's guide: https://support.rstudio.com/hc/en-us/articles/218611977-Importing-Data-with-RStudio

Video Tutorial: Importing a CSV file into R using RStudio's GUI (3 mins 13 secs)

Importing data using RStudio will save the data as a modified data frame, called a tibble (tibbles are briefly discussed above).

Importing using fread()

fread() is a powerful data import function that is similar to read.table() but faster. It is part of the data.table package, which you will need to install.

You should only have to give fread() the name of the file you want to import, and fread() will try to work out the appropriate way to import the data. Try some examples and compare them with the examples above

    # ******************************************
    # Other packages for importing data --------
    # The data.table package

    library(data.table)    # Load the data.table package

    # Import a CSV file
    wolf2 = fread('WOLF.CSV')
    human2 = fread('HEIGHT.CSV')

    # Import TAB delimited file
    insect2 = fread('INSECT.TXT')
    rainfall2 = fread('MALIN_HEAD.TXT')

    # Import white space delimited file
    bees2 = fread('BEEKEEPER.TXT')

The fread() command is simpler to use because it tries to guess the format of the data in the file.
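
One detail worth knowing is that fread() returns a data.table object (an extended data frame) by default. The sketch below is an illustration with a made-up file written to a temporary directory; it only runs the import if the data.table package is installed, and it uses fread()'s data.table=FALSE argument to ask for a plain data frame instead.

```r
# Write a small comma delimited file (made-up content, for illustration only)
fread_file <- file.path(tempdir(), 'fread_example.csv')
writeLines(c('site,count', 'A,3', 'B,7'), fread_file)

if (requireNamespace('data.table', quietly = TRUE)) {
  # Default: the result is a data.table (which is also a data frame)
  sites_dt <- data.table::fread(fread_file)
  print(class(sites_dt))

  # data.table=FALSE returns an ordinary data frame
  sites_df <- data.table::fread(fread_file, data.table = FALSE)
  print(class(sites_df))
} else {
  message('The data.table package is not installed; run install.packages("data.table") first')
}
```

For the examples in this tutorial either form can be used; the plain data frame behaves exactly like the output of read.table().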

Summary of the topics covered

  • Organising your files on your computer
  • Best practice for formatting data
  • Reading in spreadsheet data
  • Data frames

Further Reading

All these books can be found in UCD's library

  • Andrew P. Beckerman and Owen L. Petchey, 2012 Getting Started with R: An introduction for biologists (Oxford University Press, Oxford) [Chapters 2, 3]
  • Mark Gardner, 2012 Statistics for Ecologists Using R and Excel (Pelagic, Exeter)
  • Michael J. Crawley, 2015 Statistics: an introduction using R (John Wiley & Sons, Chichester) [Chapter 2]
  • Tenko Raykov and George A Marcoulides, 2013 Basic statistics: an introduction with R (Rowman and Littlefield, Plymouth)


Source: https://www.ucd.ie/ecomodel/Resources/Sheet2a_data_import_WebVersion.html
