Merge pull request #3442 from programminghistorian/Issue-3441
Issue-3441-cleaning-assets
anisa-hawes authored Jan 17, 2025
2 parents ca98ac8 + 55bea08 commit 31976b3
Showing 48 changed files with 56 additions and 10,195 deletions.
13 files renamed without changes.
Binary file removed assets/scissorsandpaste-master.zip
10,138 changes: 0 additions & 10,138 deletions assets/sentiment-analysis-syuzhet/galdos_miau.txt

This file was deleted.

5 files renamed without changes.
5 changes: 2 additions & 3 deletions en/lessons/cleaning-data-with-openrefine.md
@@ -30,7 +30,6 @@ doi: 10.46430/phen0023




## Lesson goals

Don’t take your data at face value. That is the key message of this
@@ -144,7 +143,7 @@ as creating [Linked Data][].
OpenRefine works on all platforms: Windows, Mac, and Linux. *OpenRefine*
will open in your browser, but it is important to realise that the
application is run locally and that your data won't be stored online.
- The data files are archived on the Programming Historian site: as [phm-collection][]. Please download the
+ The data files are archived on the Programming Historian site as [phm-collection][]. Please download the
*phm-collection.tsv* file before continuing.

On the *OpenRefine* start page, create a new project using the
@@ -413,7 +412,7 @@ the case you have made an error.
[Controlled vocabulary]: http://en.wikipedia.org/wiki/Controlled_vocabulary
[Linked Data]: http://en.wikipedia.org/wiki/Linked_data
[Download OpenRefine]: https://openrefine.org/download
- [phm-collection]: /assets/phm-collection.tsv
+ [phm-collection]: /assets/cleaning-data-with-openrefine/phm-collection.tsv
[Powerhouse Museum Website]: /images/powerhouseScreenshot.png
[facet]: http://en.wikipedia.org/wiki/Faceted_search
[Screenshot of OpenRefine Example]: /images/overviewOfSomeClusters.png
4 changes: 2 additions & 2 deletions en/lessons/extracting-keywords.md
@@ -58,7 +58,7 @@ The lesson touches on Regular Expressions, so some readers may find it handy to

The first step of this process is to take a look at the data that we will be using in the lesson. As mentioned, the data includes biographical details of approximately 6,692 graduates who began study at the University of Oxford in the early seventeenth century.

- [The\_Dataset\_-\_Alumni_Oxonienses-Jas1.csv](/assets/The_Dataset_-_Alumni_Oxonienses-Jas1.csv) (1.4MB)
+ [The\_Dataset\_-\_Alumni_Oxonienses-Jas1.csv](/assets/extracting-keywords/The_Dataset_-_Alumni_Oxonienses-Jas1.csv) (1.4MB)

{% include figure.html filename="extracting-keywords-1.png" caption="Screenshot of the first forty entries in the dataset" %}

@@ -378,7 +378,7 @@ Before you re-run your Python code, you'll have to update your `texts.txt` file

I'd challenge you to make a few refinements to your gazetteer before moving ahead, just to make sure you have the hang of it.

- Once you are happy with that, you can snag my [completed list of English and Welsh counties, shortforms, and various other cities (London, Bristol etc) and places (Jersey, Ireland, etc)](/assets/extracting-keywords-final-gazetteer.txt). My completed list contains 157 entries, and should get you all of the entries that can be extracted from the texts in this collection.
+ Once you are happy with that, you can snag my [completed list of English and Welsh counties, shortforms, and various other cities (London, Bristol etc) and places (Jersey, Ireland, etc)](/assets/extracting-keywords/extracting-keywords-final-gazetteer.txt). My completed list contains 157 entries, and should get you all of the entries that can be extracted from the texts in this collection.

At this point you could stop, as you've achieved what you set out to do. This lesson taught you how to use a short Python program to search a fairly large number of texts for a set of keywords defined by you.

2 changes: 1 addition & 1 deletion en/lessons/from-html-to-list-of-words-1.md
@@ -259,4 +259,4 @@ that’s ok!
[Manipulating Strings in Python]: /lessons/manipulating-strings-in-python
[Code Reuse and Modularity]: /lessons/code-reuse-and-modularity
[zip]: /assets/python-lessons2.zip
- [obo-t17800628-33.html]: /assets/obo-t17800628-33.html
+ [obo-t17800628-33.html]: /assets/from-html-to-list-of-words-1/obo-t17800628-33.html
en/lessons/generating-an-ordered-data-set-from-an-OCR-text-file.md
@@ -221,7 +221,7 @@ def rom2ar(rom):

return result
```
- (run <[this little script](/assets/Roman_to_Arabic.txt)> to see in detail how `rom2ar` works. Elegant programming like this can offer insight, like poetry.)
+ (run <[this little script](/assets/generating-an-ordered-data-set-from-an-OCR-text-file/Roman_to_Arabic.txt)> to see in detail how `rom2ar` works. Elegant programming like this can offer insight, like poetry.)

## Some other things we'll need:
At the top of your Python module, you're going to want to import some python modules that are a part of the standard library. (see Fred Gibbs's tutorial [*Installing Python Modules with pip*](/lessons/installing-python-modules-pip)).
10 changes: 5 additions & 5 deletions en/lessons/json-and-jq.md
@@ -132,7 +132,7 @@ These set various jq [command-line options, or _flags_](https://stedolan.github.

jq operates by way of _filters_: a series of text commands that you can string together, and which dictate how jq should transform the JSON you give it.

- To learn the basic jq filters, we'll work with a sample response from the Rijksmuseum API: [rkm.json](/assets/jq_rkm.json)
+ To learn the basic jq filters, we'll work with a sample response from the Rijksmuseum API: [rkm.json](/assets/json-and-jq/jq_rkm.json)
Select all the text at that link, copy it, and paste it into the "JSON" box at [jq play] on the left hand side.


@@ -425,7 +425,7 @@ One of the easiest ways to search and download Twitter data is using the excelle

For this lesson, we will use a small sample of 50 public tweets.
Clear the "Filter", "JSON" and "Result" boxes on [jq play], and ensure all the checkboxes are unchecked.
- [Then copy this sample Twitter data](/assets/jq_twitter.json) into [jq play].
+ [Then copy this sample Twitter data](/assets/json-and-jq/jq_twitter.json) into [jq play].

### One-to-many relationships: Tweet hashtags

@@ -895,7 +895,7 @@ You should get the following table:
"whiteprivilege",1
```

- [There are multiple ways to solve this with jq. See my answer here.](/assets/filter_retweets.txt)
+ [There are multiple ways to solve this with jq. See my answer here.](/assets/json-and-jq/filter_retweets.txt)

#### Count total retweets per user

@@ -909,7 +909,7 @@ Hints:

As a way to verify your results, user `356854246` should have a total retweet count of `51` based on this dataset.

- [See my answer.](/assets/count_retweets.txt)
+ [See my answer.](/assets/json-and-jq/count_retweets.txt)

## Using jq on the command line

@@ -959,7 +959,7 @@ This can be useful when downloading JSON with a utility like `wget` for retrievi
(See [Automated Downloading with Wget](/lessons/automated-downloading-with-wget) to learn the basics of this other command line program.)

```sh
- wget -qO- http://programminghistorian.org/assets/jq_rkm.json | jq -r '.artObjects[] | [.id, .title, .principalOrFirstMaker, .webImage.url] | @csv'
+ wget -qO- http://programminghistorian.org/assets/json-and-jq/jq_rkm.json | jq -r '.artObjects[] | [.id, .title, .principalOrFirstMaker, .webImage.url] | @csv'
```

Note that you must use the `wget` flag `-qO-` in order to send the output of `wget` into `jq` by way of a shell pipe.
2 changes: 1 addition & 1 deletion en/lessons/naive-bayesian.md
@@ -1462,7 +1462,7 @@ Happy hunting!

[A Naive Bayesian in the Old Bailey]: http://digitalhistoryhacks.blogspot.com/2008/05/naive-bayesian-in-old-bailey-part-1.html
[Old Bailey digital archive]: http://www.oldbaileyonline.org/
- [A zip file of the scripts]: /assets/baileycode.zip
+ [A zip file of the scripts]: /assets/naive-bayesian/baileycode.zip
[another zip file]: https://doi.org/10.5281/zenodo.13284
[BeautifulSoup]: http://www.crummy.com/software/BeautifulSoup/
[search interface]: http://www.oldbaileyonline.org/forms/formMain.jsp
2 changes: 1 addition & 1 deletion en/lessons/sentiment-analysis-syuzhet.md
@@ -245,7 +245,7 @@ library(tm)

## Load and Prepare the Text

- Next, download a machine readable copy of the novel: [*Miau*](/assets/sentiment-analysis-syuzhet/galdos_miau.txt) and make sure to save it as a .txt file. When you open the file you will see that the novel is in [plain text](https://perma.cc/Z5WH-V9SW) format, which is essential for this particular analysis using R.
+ Next, download a machine readable copy of the novel: [*Miau*](/assets/analisis-de-sentimientos-r/galdos_miau.txt) and make sure to save it as a .txt file. When you open the file you will see that the novel is in [plain text](https://perma.cc/Z5WH-V9SW) format, which is essential for this particular analysis using R.

With the text at hand, you first need to load it into R as one long string so that you can work with it programmatically. Make sure to replace `FILEPATH` with the location of the novel on your own computer (don't just type 'FILEPATH'). This loading process is slightly different on Mac/Linux and Windows machines:

18 changes: 9 additions & 9 deletions en/lessons/sonification.md
@@ -52,9 +52,9 @@ You will see that 'sonification' moves us along the spectrum from mere 'visualiz

### Example Data

- + [Roman artefact data](/assets/sonification-roman-data.csv)
- + [Excerpt from the Topic model of John Adams' Diary](/assets/sonification-diary.csv)
- + [Excerpt from the Topic model of the Jesuit Relations](/assets/sonification-jesuittopics.csv)
+ + [Roman artefact data](/assets/sonification/sonification-roman-data.csv)
+ + [Excerpt from the Topic model of John Adams' Diary](/assets/sonification/sonification-diary.csv)
+ + [Excerpt from the Topic model of the Jesuit Relations](/assets/sonification/sonification-jesuittopics.csv)

# Some Background on Sonification

@@ -122,18 +122,18 @@ _There is no 'right' way to represent your data as sound_, at least not yet: but
But what about time? Historical data often has a punctuation point, a distinct 'time when' something occurred. Thus, the amount of time between two data points has to be taken into account. This is where our next tool becomes quite useful, for when our data points have a relationship to one another in temporal space. We begin to move from sonification (data points) to music (relationships between points).

### Practice
- The [sample dataset](/assets/sonification-roman-data.csv) provided contains counts of Roman coins in its first column and counts of other Roman materials from the same locations, as contained in the Portable Antiquities Scheme database from the British Museum. A sonification of this data might reveal or highlight aspects of the economic situation along Watling street, a major route through Roman Britain. The data points are organized geographically from North West to South East; thus as the sound plays out, we are hearing movement over space. Each note represents another stop along the way.
+ The [sample dataset](/assets/sonification/sonification-roman-data.csv) provided contains counts of Roman coins in its first column and counts of other Roman materials from the same locations, as contained in the Portable Antiquities Scheme database from the British Museum. A sonification of this data might reveal or highlight aspects of the economic situation along Watling street, a major route through Roman Britain. The data points are organized geographically from North West to South East; thus as the sound plays out, we are hearing movement over space. Each note represents another stop along the way.

- 1. Open the [sonification-roman-data.csv](/assets/sonification-roman-data.csv) in a spreadsheet. Copy the first column into a text editor. Delete the line endings so that the data is all in a single row.
+ 1. Open the [sonification-roman-data.csv](/assets/sonification/sonification-roman-data.csv) in a spreadsheet. Copy the first column into a text editor. Delete the line endings so that the data is all in a single row.
2. Add the following column information like so:
```
# Of Voices, Text Area Name, Text Area Data
1,morphBox,
,areaPitch1,
```
- ...so that your data follows immediately after that last comma (like [this](/assets/sonification-romancoin-data-music.csv)). Save the file with a useful name like `coinsounds1.csv`.
+ ...so that your data follows immediately after that last comma (like [this](/assets/sonification/sonification-romancoin-data-music.csv)). Save the file with a useful name like `coinsounds1.csv`.

- 3. Go to the [Musicalgorithms](http://musicalgorithms.org/3.0/index.html) site (version 3), and hit the load button. In the pop-up, click the blue 'load' button and select the file saved in step 2. The site will load your materials and display a green check mark if it loaded successfully. If it did not, make sure that your values are separated by spaces, and that they follow immediately the last comma in the code block in step 2. You may also try loading up the [demo file for this tutorial](/assets/sonification-romancoin-data-music.csv) instead.{% include figure.html filename="sonification-musicalgorithms-upload-4.png" caption="Click 'load' on the main screen to get this dialogue box. Then 'load csv'. Select your file; it will appear in the box. Then click the bottom load button." %}
+ 3. Go to the [Musicalgorithms](http://musicalgorithms.org/3.0/index.html) site (version 3), and hit the load button. In the pop-up, click the blue 'load' button and select the file saved in step 2. The site will load your materials and display a green check mark if it loaded successfully. If it did not, make sure that your values are separated by spaces, and that they follow immediately the last comma in the code block in step 2. You may also try loading up the [demo file for this tutorial](/assets/sonification/sonification-romancoin-data-music.csv) instead.{% include figure.html filename="sonification-musicalgorithms-upload-4.png" caption="Click 'load' on the main screen to get this dialogue box. Then 'load csv'. Select your file; it will appear in the box. Then click the bottom load button." %}
4. Click on 'Pitch Input'. You'll see the values of your data. For now, **do not select** any further options on this page (thus using the site's default values).
5. Click on 'Duration Input'. **Do not select any options here for now**. The options here will map various transformations against your data that will alter the duration for each note. Do not worry about these options for now; move on.
6. Click on 'Pitch Mapping'. This is the most crucial choice, as it will transform (that is, scale) your raw data to a mapping against the keys of the keyboard. Leave the `mapping` set to 'division'. (The other options are modulo or logarithmic). The option `Range` 1 to 88 uses the full 88 keys of the keyboard; thus your lowest value would accord to the deepest note on the piano and your highest value with the highest note. You might wish instead to constrain your music around middle C, so enter 25 to 60 as your range. The output should change to: `31,34,34,34,25,28,30,60,28,25,26,26,25,25,60,25,25,38,33,26,25,25,25` These are no longer your counts; they are notes on the keyboard.{% include figure.html filename="sonification-musicalgorithms-settings-for-pitch-mapping-5.png" caption="Click into the 'range' box and set it to 25. The values underneath will change automatically. Click into the 'to' box and set it to 60. Click back into the other box; the values will update." %}
@@ -244,7 +244,7 @@ Can you make your computer play this song? (This [chart](https://web.archive.org

### Getting your own data in

- [This file](/assets/sonification-diary.csv) is a selection from the topic model fitted to John Adams' Diaries for [The Macroscope](http://themacroscope.org). Only the strongest signals have been preserved by rounding the values in the columns to two decimal places (remembering that .25 for instance would indicate that that topic is contributing to a quarter of that diary entry's composition). To get this data into your python script, it has to be formatted in a particular way. The tricky bit is getting the date field right.
+ [This file](/assets/sonification/sonification-diary.csv) is a selection from the topic model fitted to John Adams' Diaries for [The Macroscope](http://themacroscope.org). Only the strongest signals have been preserved by rounding the values in the columns to two decimal places (remembering that .25 for instance would indicate that that topic is contributing to a quarter of that diary entry's composition). To get this data into your python script, it has to be formatted in a particular way. The tricky bit is getting the date field right.

_For the purposes of this tutorial, we are going to leave the names of variables and so on unchanged from the sample script. The sample script was developed with earthquake data in mind; so where it says 'magnitude' we can think of it as equating to '% topic composition.'_

@@ -375,7 +375,7 @@ Why would you want to do this? As has progressively become clear in tutorial, wh

Here, I offer simply a code snippet that will allow you to import your data, where your data is simply a list of values saved as csv. I am indebted to George Washington University librarian Laura Wrubel who posted to [gist.github.com](https://gist.github.com/lwrubel) her experiments in sonifying her library's circulation transactions.

- In this [sample file](/assets/sonification-jesuittopics.csv) (a topic model generated from the [Jesuit Relations](http://puffin.creighton.edu/jesuit/relations/)), there are two topics. The first row contains the headers: topic1, topic2.
+ In this [sample file](/assets/sonification/sonification-jesuittopics.csv) (a topic model generated from the [Jesuit Relations](http://puffin.creighton.edu/jesuit/relations/)), there are two topics. The first row contains the headers: topic1, topic2.

### Practice

en/lessons/sustainable-authorship-in-plain-text-using-pandoc-and-markdown.md
@@ -331,8 +331,7 @@ small image) to your folder, and add the following somewhere in the body
of the text: `![image caption](your_image.jpg)`.

At this point, your `main.md` should look something like the following.
- You can download this sample .md file
- [here](https://raw.githubusercontent.com/programminghistorian/jekyll/gh-pages/assets/sample.md).
+ You can download this sample Markdown file from the _Programming Historian_ repository.

---
title: Plain Text Workflow
3 changes: 2 additions & 1 deletion es/lecciones/administracion-de-datos-en-r.md
@@ -78,7 +78,8 @@ Copia el siguiente código en R Studio. Para ejecutarlo tienes que marcar las l
```

## Un ejemplo de dplyr en acción
- Veamos un ejemplo de cómo dplyr nos puede ayudar a los historiadores. Vamos a cargar los datos del censo decenal de 1790 a 2010 de Estados Unidos. Descarga los datos haciendo [click aquí](/assets/data-wrangling-and-management-in-r/ejemplo_introductorio_estados.csv)[^2] y ponlos en la carpeta que vas a utilizar para trabajar en los ejemplos de este tutorial.
+
+ Veamos un ejemplo de cómo dplyr nos puede ayudar a los historiadores. Vamos a cargar los datos del censo decenal de 1790 a 2010 de Estados Unidos. Descarga los datos haciendo [click aquí](/assets/administracion-de-datos-en-r/ejemplo_introductorio_estados.csv)[^2] y ponlos en la carpeta que vas a utilizar para trabajar en los ejemplos de este tutorial.

Como los datos están en un archivo CSV, vamos a usar el comando de lectura ```read_csv()``` en el paquete [readr](https://cran.r-project.org/web/packages/readr/vignettes/readr.html) de "tidyverse".

