Merge pull request #3430 from programminghistorian/publish-gestion-manipulation-donnees-r

Publish /fr/lecons/gestion-manipulation-donnees-r
anisa-hawes authored Jan 8, 2025
2 parents 66044fa + aac1dbc commit 59374e9
Showing 28 changed files with 522 additions and 23 deletions.
File renamed without changes.
File renamed without changes.
2 changes: 1 addition & 1 deletion en/lessons/beginners-guide-to-twitter-data.md
@@ -130,7 +130,7 @@ At this point, your data has gone from the long list of single tweet IDs to a ro

Each tweet now has lots of useful metadata, including the time created, the included hashtags, number of retweets and favorites, and some geo info. One can imagine how this information can be used for a wide variety of explorations, including to map discourse around an issue on social media, explore the relationship between sentiment and virality, or even text analysis of language of the tweets.

All of these processes will probably include some light data work to format this dataset so that you can produce useful insights: [statistical analyses](/en/lessons/data-wrangling-and-management-in-R), [maps](/en/lessons/mapping-with-python-leaflet), [social network analyses](/en/lessons/exploring-and-analyzing-network-data-with-python), [discourse analyses](/en/lessons/corpus-analysis-with-antconc). But regardless of where you go from here, you have a pretty robust dataset that can be used for a variety of academic pursuits.
All of these processes will probably include some light data work to format this dataset so that you can produce useful insights: [statistical analyses](/en/lessons/data-wrangling-and-management-in-r), [maps](/en/lessons/mapping-with-python-leaflet), [social network analyses](/en/lessons/exploring-and-analyzing-network-data-with-python), [discourse analyses](/en/lessons/corpus-analysis-with-antconc). But regardless of where you go from here, you have a pretty robust dataset that can be used for a variety of academic pursuits.

You might have noticed we didn't get any latitude/longitude location information, but we did get a "place" column with less exact, textualized location information. Non-coordinate location data needs to be [geocoded](https://en.wikipedia.org/wiki/Geocode), which in this case means using a geocoder to [geoparse](https://en.wikipedia.org/wiki/Toponym_Resolution#Geoparsing) the reported locations and assign lat/long values to them. Different programs do this to greater or lesser success. [Tableau](https://www.tableau.com), for instance, has a hard time interpolating a set of locations if it's not at a consistent geographical level (city, state, etc.). For that reason, I generated latitude and longitude information with the Google geocoder following this *Programming Historian* [lesson](/en/lessons/mapping-with-python-leaflet), and then inputted that information into Tableau for mapping. There's plenty of good mapping [tools](https://digitalfellows.commons.gc.cuny.edu/2019/06/03/finding-the-right-tools-for-mapping/) out there that you can feel free to use: the key here is getting specific, accurate location information from the list of place names in the dataset.
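
As a rough illustration of that geocoding step, the sketch below uses the R `tidygeocoder` package rather than the Google geocoder mentioned above; the data frame `tweets` and its `place` column are assumptions about how the exported data is labelled.

```r
# A minimal sketch, assuming a data frame `tweets` with a textual `place` column.
# tidygeocoder is used here only for illustration; the lesson itself geocodes with
# the Google geocoder via the mapping-with-python-leaflet lesson.
library(tidygeocoder)

tweets_located <- geocode(tweets, address = place, method = "osm")
head(tweets_located[, c("place", "lat", "long")])
```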

@@ -1,6 +1,6 @@
---
title: Data Wrangling and Management in R
slug: data-wrangling-and-management-in-R
slug: data-wrangling-and-management-in-r
layout: lesson
collection: lessons
authors:
@@ -126,7 +126,7 @@ An Example of dplyr in Action
Let's go through an example to see how dplyr can aid us as historians by
inputting U.S. decennial census data from 1790 to 2010. Download the
data by [clicking
here](/assets/introductory_state_example.csv)
here](/assets/data-wrangling-and-management-in-r/introductory_state_example.csv)
and place it in the folder that you will use to work through the examples
in this tutorial.
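
A minimal sketch of loading that file once it sits in your working directory (the object name `us_state_populations_import` is an assumption about the name the rest of the lesson uses):

```r
# A sketch assuming the tidyverse is installed and the CSV is in the working directory.
library(tidyverse)

us_state_populations_import <- read_csv("introductory_state_example.csv")
glimpse(us_state_populations_import)
```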

@@ -164,7 +164,7 @@ time.
geom_line() +
geom_point()

{% include figure.html filename="en-or-data-wrangling-and-management-in-R-01.png" caption="Graph of California and New York population" %}
{% include figure.html filename="en-or-data-wrangling-and-management-in-r-01.png" caption="Graph of California and New York population" %}
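
For orientation, the full pipeline behind a plot like the one above probably resembles the sketch below; the filtering step and the column names (`state`, `year`, `population`) are assumptions, since only the tail of the code is visible in this hunk.

```r
# Illustrative sketch, not the lesson's verbatim code: keep two states,
# then plot their populations over time.
us_state_populations_import %>%
  filter(state %in% c("California", "New York")) %>%
  ggplot(aes(x = year, y = population, color = state)) +
  geom_line() +
  geom_point()
```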

As we can see, the population of California has grown considerably
compared to New York. While this particular example may seem obvious
@@ -182,7 +182,7 @@ with two different states such as Mississippi and Virginia.
geom_line() +
geom_point()

{% include figure.html filename="en-or-data-wrangling-and-management-in-R-02.png" caption="Graph of Mississippi and Virginia population" %}
{% include figure.html filename="en-or-data-wrangling-and-management-in-r-02.png" caption="Graph of Mississippi and Virginia population" %}

Quickly making changes to our code and reanalyzing our data is a
fundamental part of exploratory data analysis (EDA). Rather than trying
@@ -579,7 +579,7 @@ colleges founded before the U.S. War of 1812:
geom_bar(aes(x=is_secular, fill=is_secular))+
labs(x="Is the college secular?")

{% include figure.html filename="en-or-data-wrangling-and-management-in-R-03.png" caption="Number of secular and non-secular colleges before War of 1812" %}
{% include figure.html filename="en-or-data-wrangling-and-management-in-r-03.png" caption="Number of secular and non-secular colleges before War of 1812" %}
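
The chart above comes from a pipeline whose earlier steps are not visible in this hunk; a hedged reconstruction might look like the following, where the data frame `early_colleges` and the columns `denomination` and `established` are assumptions.

```r
# Illustrative sketch: derive an is_secular flag, keep colleges founded before
# 1812, and count them in a bar chart. All names are assumptions, not the lesson's code.
early_colleges %>%
  mutate(is_secular = ifelse(is.na(denomination), "secular", "not secular")) %>%
  filter(established < 1812) %>%
  ggplot() +
  geom_bar(aes(x = is_secular, fill = is_secular)) +
  labs(x = "Is the college secular?")
```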

Again, by making a quick change to our code, we can also look at the
number of secular versus non-secular colleges founded after the start of
@@ -593,7 +593,7 @@ the War of 1812:
geom_bar(aes(x=is_secular, fill=is_secular))+
labs(x="Is the college secular?")

{% include figure.html filename="en-or-data-wrangling-and-management-in-R-04.png" caption="Number of secular and non-secular colleges after War of 1812" %}
{% include figure.html filename="en-or-data-wrangling-and-management-in-r-04.png" caption="Number of secular and non-secular colleges after War of 1812" %}

Conclusion
==========
2 changes: 1 addition & 1 deletion en/lessons/geospatial-data-analysis.md
@@ -174,7 +174,7 @@ Now we have a large dataframe called `County_Aggregate_Data` which has our count
```r
religion <- read.csv("./data/Religion/Churches.csv", as.is=TRUE)
```
Depending on the state of the data you may need to do some data transformations in order to merge it back with the DataFrame. For complex transformations, see tutorials in R on working with data such as [Data Wrangling and Management in R tutorial](/en/lessons/data-wrangling-and-management-in-R) [data transforms](http://r4ds.had.co.nz/transform.html). In essence, you need to have a common field in both datasets to merge upon. Often this is a geographic id for the county and state represented by `GEOID`. It could also be the unique FIPS Code given by the US Census. Below I am using state and county `GEOID`. In this example, we are converting one data frame's common fields to numeric so that they match the variable type of the other dataframe:
Depending on the state of the data you may need to do some data transformations in order to merge it back with the DataFrame. For complex transformations, see tutorials in R on working with data such as [Data Wrangling and Management in R tutorial](/en/lessons/data-wrangling-and-management-in-r) [data transforms](http://r4ds.had.co.nz/transform.html). In essence, you need to have a common field in both datasets to merge upon. Often this is a geographic id for the county and state represented by `GEOID`. It could also be the unique FIPS Code given by the US Census. Below I am using state and county `GEOID`. In this example, we are converting one data frame's common fields to numeric so that they match the variable type of the other dataframe:

```r
religion$STATEFP <- religion$STATE
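# The lines below are an illustrative continuation, not the lesson's original code
# (the rest of this block is cut off in the diff): convert the shared keys to
# numeric and merge on them. Column names here are assumptions.
religion$STATEFP <- as.numeric(religion$STATEFP)
religion$COUNTYFP <- as.numeric(religion$COUNTY)
County_Aggregate_Data <- merge(County_Aggregate_Data, religion,
                               by = c("STATEFP", "COUNTYFP"))
```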
2 changes: 1 addition & 1 deletion en/lessons/sentiment-analysis-syuzhet.md
@@ -41,7 +41,7 @@ Although the lesson is not intended for advanced R users, it is expected that yo

* Taylor Arnold and Lauren Tilton, '[Basic Text Processing in R](/en/lessons/basic-text-processing-in-r)', *Programming Historian* 6 (2017), https://doi.org/10.46430/phen0061
* Taryn Dewar, '[R Basics with Tabular Data](/en/lessons/r-basics-with-tabular-data)', *Programming Historian* 5 (2016), https://doi.org/10.46430/phen0056
* Nabeel Siddiqui, '[Data Wrangling and Management in R](/en/lessons/data-wrangling-and-management-in-R)', *Programming Historian* 6 (2017), https://doi.org/10.46430/phen0063
* Nabeel Siddiqui, '[Data Wrangling and Management in R](/en/lessons/data-wrangling-and-management-in-r)', *Programming Historian* 6 (2017), https://doi.org/10.46430/phen0063

You may also be interested in other sentiment analysis lessons:

4 changes: 2 additions & 2 deletions en/lessons/shiny-leaflet-newspaper-map-tutorial.md
@@ -36,7 +36,7 @@ In this lesson, you will learn:
- The concept and practice of 'reactive programming', as implemented by Shiny applications. Specifically, you'll learn how you can use Shiny to 'listen' for certain inputs, and how they are connected to outputs displayed in your app.

<div class="alert alert-info">
Note that this lesson doesn't teach any coding in R, other than what's necessary to create the web application, nor does it cover publishing the finished application to the web. A basic knowledge of R, particularly using the <a href='/en/lessons/data-wrangling-and-management-in-R'>tidyverse</a>, would be very useful.
Note that this lesson doesn't teach any coding in R, other than what's necessary to create the web application, nor does it cover publishing the finished application to the web. A basic knowledge of R, particularly using the <a href='/en/lessons/data-wrangling-and-management-in-r'>tidyverse</a>, would be very useful.
</div>
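
To make the idea of 'listening' for inputs concrete, here is a minimal, self-contained reactive sketch; it is illustrative only and is not part of the newspaper-map application built in this lesson.

```r
# A minimal reactive sketch: the text output 'listens' to the slider input and
# re-renders whenever the slider value changes.
library(shiny)

ui <- fluidPage(
  sliderInput("year", "Choose a year:", min = 1800, max = 1900, value = 1850),
  textOutput("chosen_year")
)

server <- function(input, output) {
  output$chosen_year <- renderText(paste("You selected", input$year))
}

shinyApp(ui = ui, server = server)
```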

### Graphical User Interfaces and the Digital Humanities
@@ -108,7 +108,7 @@ First, however, you need to set up the correct programming environment and creat

To get started with this tutorial, you should install the latest versions of [R](https://cran.rstudio.com/) and [Rstudio](https://www.rstudio.com/products/rstudio/download/) on your local machine. The R programming language has a very popular IDE (Integrated Development Environment) called RStudio, which is often used alongside R, as it provides a large set of features to make coding in the language more convenient. We'll use RStudio throughout the lesson.

Previous *Programming Historian* lessons have covered [working with R](/en/lessons/r-basics-with-tabular-data) and [working with the tidyverse](/en/lessons/data-wrangling-and-management-in-R). It would be useful to go through these lessons beforehand, to learn the basics of installing R and using the tidyverse for data wrangling.
Previous *Programming Historian* lessons have covered [working with R](/en/lessons/r-basics-with-tabular-data) and [working with the tidyverse](/en/lessons/data-wrangling-and-management-in-r). It would be useful to go through these lessons beforehand, to learn the basics of installing R and using the tidyverse for data wrangling.
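
If you are starting from a fresh R installation, a minimal setup step might look like the sketch below; the exact package list is an assumption based on this lesson's title and prerequisites.

```r
# Illustrative setup: install the packages this lesson is likely to rely on.
install.packages(c("shiny", "leaflet", "tidyverse"))
```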

### Create a new RStudio Project

4 changes: 2 additions & 2 deletions es/lecciones/administracion-de-datos-en-r.md
@@ -18,7 +18,7 @@ translation-reviewer:
- Victor Gayol
review-ticket: https://github.com/programminghistorian/ph-submissions/issues/199
layout: lesson
original: data-wrangling-and-management-in-R
original: data-wrangling-and-management-in-r
difficulty: 2
activity: transforming
topics: [data-manipulation, data-management, distant-reading, r, data-visualization]
@@ -78,7 +78,7 @@ Copia el siguiente código en R Studio. Para ejecutarlo tienes que marcar las l
```

## Un ejemplo de dplyr en acción
Veamos un ejemplo de cómo dplyr nos puede ayudar a los historiadores. Vamos a cargar los datos del censo decenal de 1790 a 2010 de Estados Unidos. Descarga los datos haciendo [click aquí](/assets/ejemplo_introductorio_estados.csv)[^2] y ponlos en la carpeta que vas a utilizar para trabajar en los ejemplos de este tutorial.
Veamos un ejemplo de cómo dplyr nos puede ayudar a los historiadores. Vamos a cargar los datos del censo decenal de 1790 a 2010 de Estados Unidos. Descarga los datos haciendo [click aquí](/assets/data-wrangling-and-management-in-r/ejemplo_introductorio_estados.csv)[^2] y ponlos en la carpeta que vas a utilizar para trabajar en los ejemplos de este tutorial.

Como los datos están en un archivo CSV, vamos a usar el comando de lectura ```read_csv()``` en el paquete [readr](https://cran.r-project.org/web/packages/readr/vignettes/readr.html) de "tidyverse".

