R tutorial - Learn How to Subset, Extend & Sort Data Frames in R




Explore how you can subset, extend and sort your data frames in R. Join DataCamp today, and start our interactive intro to R programming tutorial for free: https://www.datacamp.com/courses/free-introduction-to-r

The data frame sits somewhere at the intersection of matrices and lists. To subset a data frame you can thus use subsetting syntax from both matrices and lists. On the one hand, you can use the single brackets from matrix subsetting; on the other, you can use the double brackets and dollar sign notation that you use to select list elements.

We'll continue with the data frame that contained some information on 5 persons. Have another look at its definition here.

Let's start with selecting single elements from a data frame. To select the age of Frank, who is on row 3 in the data frame, you can use the exact same syntax as for matrix subsetting: single brackets with two indices inside. The row, index 3, comes first, and the column, index 2, comes second: `people[3, 2]`. Indeed, Frank is 21 years old. You can also use the column names to refer to the columns, of course: `people[3, "age"]`.

Just as for matrices, you can also choose to omit one of the two indices or names, to end up with an entire row or an entire column. If you want to have all information on Frank, you can use this command: `people[3, ]`. The result is a data frame with a single observation, because there has to be a way to store the different types. On the other hand, to get the entire age column, you could use this command: `people[, "age"]`. Here, the result is a vector, because columns contain elements of the same type.

Subsetting the data frame to end up with a sub data frame that contains multiple observations also works just as you'd expect. For example, a single command can select the age and parenting information on Frank and Cath by passing a vector of row indices and a vector of column names.

All of these examples show that you can subset data frames exactly as you did with matrices. The only difference occurs when you specify only one index inside the single brackets for `people`.
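Before looking at that single-index case, the matrix-style commands described so far can be sketched as follows. The definition of `people` is not part of this transcript, so the one below is a hypothetical reconstruction: only Frank (row 3, age 21), Cath, and the age column in position 2 come from the text, while the other names, the other ages, Cath's row, and the column name `child` are assumptions made for illustration.

```r
# Hypothetical `people` data frame. Only Frank's row and age, Cath's name,
# and the position of the age column come from the transcript; every other
# value and the `child` column name are illustrative assumptions.
people <- data.frame(
  name  = c("Anne", "Pete", "Frank", "Julia", "Cath"),
  age   = c(28, 30, 21, 39, 35),
  child = c(FALSE, TRUE, TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Single element: row 3, column 2 (Frank's age)
frank_age <- people[3, 2]

# Same element, with the column referred to by name
frank_age_by_name <- people[3, "age"]

# Column index omitted: the entire row, returned as a one-row data frame
frank <- people[3, ]

# Row index omitted: the entire column, returned as a vector
ages <- people[, "age"]

# Several rows and columns at once: a smaller data frame
# (assumes Cath is on row 5 and the parenting column is called `child`)
frank_and_cath <- people[c(3, 5), c("age", "child")]
```

Checking each result with `class()` or `str()` is a quick way to see which commands return a data frame and which return a vector.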
In the matrix case, R would go through each column from left to right to find the index you specified. In the data frame case, you simply end up with a new data frame that only contains the column you specified. Selecting the age column this way, for example, gives it back as a data frame. I repeat: a data frame, not a vector! Why so? Let me talk about subsetting data frames with list syntax and it'll all become clear.

Remember when I told you that a data frame is actually a list containing vectors that all have the same length? This means that you can also use the list syntax to select elements. Say, for example, you typed `people$age`: the age vector inside the data frame gets returned, so you end up with the age column. Likewise, you can use the double brackets notation with a name, `people[["age"]]`, or with an index, `people[[2]]`. In all cases, the result is a vector.

You can also use single brackets to subset lists, but this generates a new list, containing only the specified elements. Take `people["age"]`, for example: the result is still a data frame, which is a list, but this time containing only the "age" element. This explains why, before, the single-index command gave a data frame instead of a vector. Again, using single brackets or double brackets to subset data structures can have serious consequences, so always think about what you're dealing with and how you should handle it.

Once you know how to correctly subset data frames, extending those data frames is pretty simple. Sometimes, you'll want to add a column, a new variable, to your data frame. Other times, it's also useful to add new rows, so new observations, to your data frame.

To add a column, which actually comes down to adding a new element to the list, you can use the dollar sign or the double square brackets. Suppose you want to add a column `height`, the information of which is already in a vector `height`. This call, `people$height <- height`, or this call, `people[["height"]] <- height`, will do the trick. You can also use the `cbind()` function that you've learned to build and extend matrices.
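Before turning to `cbind()`, here is a minimal sketch of the single-index and list-style subsetting described above, followed by the column addition. The `people` data frame is again a hypothetical reconstruction (only Frank's row and age and the position of the age column come from the transcript), and the values in `height` and in the small matrix `m` are made up for illustration.

```r
# Hypothetical `people` data frame; values other than Frank's age are assumptions.
people <- data.frame(
  name  = c("Anne", "Pete", "Frank", "Julia", "Cath"),
  age   = c(28, 30, 21, 39, 35),
  child = c(FALSE, TRUE, TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Matrix with one index: R counts down the columns, left to right.
# The third element is the top of the second column.
m <- matrix(c(10, 20, 30, 40, 50, 60), nrow = 2)
third <- m[3]

# Data frame with one index in single brackets: a data frame with one column
age_df <- people[2]

# List syntax: each of these returns the age column as a plain vector
age_dollar <- people$age
age_name   <- people[["age"]]
age_index  <- people[[2]]

# Adding a column is adding a list element (heights are made-up values)
height <- c(163, 177, 163, 162, 157)
people$height <- height
# Equivalent alternative: people[["height"]] <- height
```

The single-bracket result keeps the data frame wrapper, which is why it can be passed on to functions that expect a data frame, whereas the other three forms hand back the bare vector.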
It works just the same for data frames, for example to add a weight column, in kilograms. If `cbind()` works, then surely `rbind()` will work fine as well. Indeed, you can use `rbind()` to add new rows to your observations.

Suppose you want to add the information of another person, Tom, to the data frame. Simply creating a vector with the name, age, height, etc. won't work, because a vector can't contain elements of different types. You'll have to create a new data frame containing only a single observation, and add that to the data frame using `rbind()`. Let's call this mini data frame `tom`. Now, we can use `rbind()` to bind `people` and `tom` together. Wait, what? R throws an error: names do not match previous names. This means that the column names in `people` and `tom` do not match. We'll have to improve our definition of `tom`, giving its columns the same names as those in `people`, to make the merge successful. Now, `rbind()` will work as you'd want it to work.

So adding a column to a data frame is pretty easy, but adding new observations requires some care.
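The `cbind()` and `rbind()` steps above can be sketched as follows. This is an illustration under stated assumptions rather than the exact code from the video: the `people` data frame is a hypothetical reconstruction without the height column, the weights and Tom's details are invented, and `tom_bad` is a name introduced here for the first, failing attempt.

```r
# Hypothetical `people` data frame; values other than Frank's age are assumptions.
people <- data.frame(
  name  = c("Anne", "Pete", "Frank", "Julia", "Cath"),
  age   = c(28, 30, 21, 39, 35),
  child = c(FALSE, TRUE, TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Add a column with cbind(); the weights (in kilograms) are made-up values
weight <- c(74, 63, 68, 55, 56)
people <- cbind(people, weight)

# First attempt: a one-row data frame without column names.
# R invents names for it, and these differ from the names in `people`.
tom_bad <- data.frame("Tom", 37, FALSE, 83, stringsAsFactors = FALSE)

# rbind() fails here; tryCatch() captures the error message as a string
result <- tryCatch(
  rbind(people, tom_bad),
  error = function(e) conditionMessage(e)
)

# Second attempt: same column names as `people`, so rbind() succeeds
tom <- data.frame(
  name = "Tom", age = 37, child = FALSE, weight = 83,
  stringsAsFactors = FALSE
)
people <- rbind(people, tom)
```

Comparing `names(tom_bad)` with `names(people)` before calling `rbind()` shows the mismatch directly.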

Published by: DataCamp Published at: 8 years ago Category: Educational