Merge pull request #3442 from programminghistorian/Issue-3441
Issue-3441-cleaning-assets
anisa-hawes authored Jan 17, 2025
2 parents ca98ac8 + 55bea08 commit 31976b3
Showing 48 changed files with 56 additions and 10,195 deletions.
13 files renamed without changes.
Binary file removed assets/scissorsandpaste-master.zip
10,138 changes: 0 additions & 10,138 deletions assets/sentiment-analysis-syuzhet/galdos_miau.txt

This file was deleted.

5 files renamed without changes.
5 changes: 2 additions & 3 deletions en/lessons/cleaning-data-with-openrefine.md
@@ -30,7 +30,6 @@ doi: 10.46430/phen0023




## Lesson goals

Don’t take your data at face value. That is the key message of this
@@ -144,7 +143,7 @@ as creating [Linked Data][].
OpenRefine works on all platforms: Windows, Mac, and Linux. *OpenRefine*
will open in your browser, but it is important to realise that the
application is run locally and that your data won't be stored online.
- The data files are archived on the Programming Historian site: as [phm-collection][]. Please download the
+ The data files are archived on the Programming Historian site as [phm-collection][]. Please download the
*phm-collection.tsv* file before continuing.

On the *OpenRefine* start page, create a new project using the
@@ -413,7 +412,7 @@ the case you have made an error.
[Controlled vocabulary]: http://en.wikipedia.org/wiki/Controlled_vocabulary
[Linked Data]: http://en.wikipedia.org/wiki/Linked_data
[Download OpenRefine]: https://openrefine.org/download
- [phm-collection]: /assets/phm-collection.tsv
+ [phm-collection]: /assets/cleaning-data-with-openrefine/phm-collection.tsv
[Powerhouse Museum Website]: /images/powerhouseScreenshot.png
[facet]: http://en.wikipedia.org/wiki/Faceted_search
[Screenshot of OpenRefine Example]: /images/overviewOfSomeClusters.png
4 changes: 2 additions & 2 deletions en/lessons/extracting-keywords.md
@@ -58,7 +58,7 @@ The lesson touches on Regular Expressions, so some readers may find it handy to

The first step of this process is to take a look at the data that we will be using in the lesson. As mentioned, the data includes biographical details of approximately 6,692 graduates who began study at the University of Oxford in the early seventeenth century.

- [The\_Dataset\_-\_Alumni_Oxonienses-Jas1.csv](/assets/The_Dataset_-_Alumni_Oxonienses-Jas1.csv) (1.4MB)
+ [The\_Dataset\_-\_Alumni_Oxonienses-Jas1.csv](/assets/extracting-keywords/The_Dataset_-_Alumni_Oxonienses-Jas1.csv) (1.4MB)

{% include figure.html filename="extracting-keywords-1.png" caption="Screenshot of the first forty entries in the dataset" %}

@@ -378,7 +378,7 @@ Before you re-run your Python code, you'll have to update your `texts.txt` file

I'd challenge you to make a few refinements to your gazetteer before moving ahead, just to make sure you have the hang of it.

- Once you are happy with that, you can snag my [completed list of English and Welsh counties, shortforms, and various other cities (London, Bristol etc) and places (Jersey, Ireland, etc)](/assets/extracting-keywords-final-gazetteer.txt). My completed list contains 157 entries, and should get you all of the entries that can be extracted from the texts in this collection.
+ Once you are happy with that, you can snag my [completed list of English and Welsh counties, shortforms, and various other cities (London, Bristol etc) and places (Jersey, Ireland, etc)](/assets/extracting-keywords/extracting-keywords-final-gazetteer.txt). My completed list contains 157 entries, and should get you all of the entries that can be extracted from the texts in this collection.

At this point you could stop, as you've achieved what you set out to do. This lesson taught you how to use a short Python program to search a fairly large number of texts for a set of keywords defined by you.

2 changes: 1 addition & 1 deletion en/lessons/from-html-to-list-of-words-1.md
@@ -259,4 +259,4 @@ that’s ok!
[Manipulating Strings in Python]: /lessons/manipulating-strings-in-python
[Code Reuse and Modularity]: /lessons/code-reuse-and-modularity
[zip]: /assets/python-lessons2.zip
- [obo-t17800628-33.html]: /assets/obo-t17800628-33.html
+ [obo-t17800628-33.html]: /assets/from-html-to-list-of-words-1/obo-t17800628-33.html
en/lessons/generating-an-ordered-data-set-from-an-OCR-text-file.md
@@ -221,7 +221,7 @@ def rom2ar(rom):

return result
```
- (run <[this little script](/assets/Roman_to_Arabic.txt)> to see in detail how `rom2ar` works. Elegant programming like this can offer insight, like poetry.)
+ (run <[this little script](/assets/generating-an-ordered-data-set-from-an-OCR-text-file/Roman_to_Arabic.txt)> to see in detail how `rom2ar` works. Elegant programming like this can offer insight, like poetry.)

## Some other things we'll need:
At the top of your Python module, you're going to want to import some python modules that are a part of the standard library. (see Fred Gibbs's tutorial [*Installing Python Modules with pip*](/lessons/installing-python-modules-pip)).
10 changes: 5 additions & 5 deletions en/lessons/json-and-jq.md
@@ -132,7 +132,7 @@ These set various jq [command-line options, or _flags_](https://stedolan.github.

jq operates by way of _filters_: a series of text commands that you can string together, and which dictate how jq should transform the JSON you give it.

- To learn the basic jq filters, we'll work with a sample response from the Rijksmuseum API: [rkm.json](/assets/jq_rkm.json)
+ To learn the basic jq filters, we'll work with a sample response from the Rijksmuseum API: [rkm.json](/assets/json-and-jq/jq_rkm.json)
Select all the text at that link, copy it, and paste it into the "JSON" box at [jq play] on the left hand side.


@@ -425,7 +425,7 @@ One of the easiest ways to search and download Twitter data is using the excelle

For this lesson, we will use a small sample of 50 public tweets.
Clear the "Filter", "JSON" and "Result" boxes on [jq play], and ensure all the checkboxes are unchecked.
- [Then copy this sample Twitter data](/assets/jq_twitter.json) into [jq play].
+ [Then copy this sample Twitter data](/assets/json-and-jq/jq_twitter.json) into [jq play].

### One-to-many relationships: Tweet hashtags

@@ -895,7 +895,7 @@ You should get the following table:
"whiteprivilege",1
```

- [There are multiple ways to solve this with jq. See my answer here.](/assets/filter_retweets.txt)
+ [There are multiple ways to solve this with jq. See my answer here.](/assets/json-and-jq/filter_retweets.txt)

#### Count total retweets per user

@@ -909,7 +909,7 @@ Hints:

As a way to verify your results, user `356854246` should have a total retweet count of `51` based on this dataset.

- [See my answer.](/assets/count_retweets.txt)
+ [See my answer.](/assets/json-and-jq/count_retweets.txt)

## Using jq on the command line

@@ -959,7 +959,7 @@ This can be useful when downloading JSON with a utility like `wget` for retrievi
(See [Automated Downloading with Wget](/lessons/automated-downloading-with-wget) to learn the basics of this other command line program.)

```sh
- wget -qO- http://programminghistorian.org/assets/jq_rkm.json | jq -r '.artObjects[] | [.id, .title, .principalOrFirstMaker, .webImage.url] | @csv'
+ wget -qO- http://programminghistorian.org/assets/json-and-jq/jq_rkm.json | jq -r '.artObjects[] | [.id, .title, .principalOrFirstMaker, .webImage.url] | @csv'
```

Note that you must use the `wget` flag `-qO-` in order to send the output of `wget` into `jq` by way of a shell pipe.
2 changes: 1 addition & 1 deletion en/lessons/naive-bayesian.md
@@ -1462,7 +1462,7 @@ Happy hunting!

[A Naive Bayesian in the Old Bailey]: http://digitalhistoryhacks.blogspot.com/2008/05/naive-bayesian-in-old-bailey-part-1.html
[Old Bailey digital archive]: http://www.oldbaileyonline.org/
- [A zip file of the scripts]: /assets/baileycode.zip
+ [A zip file of the scripts]: /assets/naive-bayesian/baileycode.zip
[another zip file]: https://doi.org/10.5281/zenodo.13284
[BeautifulSoup]: http://www.crummy.com/software/BeautifulSoup/
[search interface]: http://www.oldbaileyonline.org/forms/formMain.jsp
2 changes: 1 addition & 1 deletion en/lessons/sentiment-analysis-syuzhet.md
@@ -245,7 +245,7 @@ library(tm)

## Load and Prepare the Text

- Next, download a machine readable copy of the novel: [*Miau*](/assets/sentiment-analysis-syuzhet/galdos_miau.txt) and make sure to save it as a .txt file. When you open the file you will see that the novel is in [plain text](https://perma.cc/Z5WH-V9SW) format, which is essential for this particular analysis using R.
+ Next, download a machine readable copy of the novel: [*Miau*](/assets/analisis-de-sentimientos-r/galdos_miau.txt) and make sure to save it as a .txt file. When you open the file you will see that the novel is in [plain text](https://perma.cc/Z5WH-V9SW) format, which is essential for this particular analysis using R.

With the text at hand, you first need to load it into R as one long string so that you can work with it programmatically. Make sure to replace `FILEPATH` with the location of the novel on your own computer (don't just type 'FILEPATH'). This loading process is slightly different on Mac/Linux and Windows machines:

18 changes: 9 additions & 9 deletions en/lessons/sonification.md
@@ -52,9 +52,9 @@ You will see that 'sonification' moves us along the spectrum from mere 'visualiz

### Example Data

- + [Roman artefact data](/assets/sonification-roman-data.csv)
- + [Excerpt from the Topic model of John Adams' Diary](/assets/sonification-diary.csv)
- + [Excerpt from the Topic model of the Jesuit Relations](/assets/sonification-jesuittopics.csv)
+ + [Roman artefact data](/assets/sonification/sonification-roman-data.csv)
+ + [Excerpt from the Topic model of John Adams' Diary](/assets/sonification/sonification-diary.csv)
+ + [Excerpt from the Topic model of the Jesuit Relations](/assets/sonification/sonification-jesuittopics.csv)

# Some Background on Sonification

@@ -122,18 +122,18 @@ _There is no 'right' way to represent your data as sound_, at least not yet: but
But what about time? Historical data often has a punctuation point, a distinct 'time when' something occurred. Thus, the amount of time between two data points has to be taken into account. This is where our next tool becomes quite useful, for when our data points have a relationship to one another in temporal space. We begin to move from sonification (data points) to music (relationships between points).

### Practice
- The [sample dataset](/assets/sonification-roman-data.csv) provided contains counts of Roman coins in its first column and counts of other Roman materials from the same locations, as contained in the Portable Antiquities Scheme database from the British Museum. A sonification of this data might reveal or highlight aspects of the economic situation along Watling street, a major route through Roman Britain. The data points are organized geographically from North West to South East; thus as the sound plays out, we are hearing movement over space. Each note represents another stop along the way.
+ The [sample dataset](/assets/sonification/sonification-roman-data.csv) provided contains counts of Roman coins in its first column and counts of other Roman materials from the same locations, as contained in the Portable Antiquities Scheme database from the British Museum. A sonification of this data might reveal or highlight aspects of the economic situation along Watling street, a major route through Roman Britain. The data points are organized geographically from North West to South East; thus as the sound plays out, we are hearing movement over space. Each note represents another stop along the way.

- 1. Open the [sonification-roman-data.csv](/assets/sonification-roman-data.csv) in a spreadsheet. Copy the first column into a text editor. Delete the line endings so that the data is all in a single row.
+ 1. Open the [sonification-roman-data.csv](/assets/sonification/sonification-roman-data.csv) in a spreadsheet. Copy the first column into a text editor. Delete the line endings so that the data is all in a single row.
2. Add the following column information like so:
```
# Of Voices, Text Area Name, Text Area Data
1,morphBox,
,areaPitch1,
```
- ...so that your data follows immediately after that last comma (like [this](/assets/sonification-romancoin-data-music.csv)). Save the file with a useful name like `coinsounds1.csv`.
+ ...so that your data follows immediately after that last comma (like [this](/assets/sonification/sonification-romancoin-data-music.csv)). Save the file with a useful name like `coinsounds1.csv`.

- 3. Go to the [Musicalgorithms](http://musicalgorithms.org/3.0/index.html) site (version 3), and hit the load button. In the pop-up, click the blue 'load' button and select the file saved in step 2. The site will load your materials and display a green check mark if it loaded successfully. If it did not, make sure that your values are separated by spaces, and that they follow immediately the last comma in the code block in step 2. You may also try loading up the [demo file for this tutorial](/assets/sonification-romancoin-data-music.csv) instead.{% include figure.html filename="sonification-musicalgorithms-upload-4.png" caption="Click 'load' on the main screen to get this dialogue box. Then 'load csv'. Select your file; it will appear in the box. Then click the bottom load button." %}
+ 3. Go to the [Musicalgorithms](http://musicalgorithms.org/3.0/index.html) site (version 3), and hit the load button. In the pop-up, click the blue 'load' button and select the file saved in step 2. The site will load your materials and display a green check mark if it loaded successfully. If it did not, make sure that your values are separated by spaces, and that they follow immediately the last comma in the code block in step 2. You may also try loading up the [demo file for this tutorial](/assets/sonification/sonification-romancoin-data-music.csv) instead.{% include figure.html filename="sonification-musicalgorithms-upload-4.png" caption="Click 'load' on the main screen to get this dialogue box. Then 'load csv'. Select your file; it will appear in the box. Then click the bottom load button." %}
4. Click on 'Pitch Input'. You'll see the values of your data. For now, **do not select** any further options on this page (thus using the site's default values).
5. Click on 'Duration Input'. **Do not select any options here for now**. The options here will map various transformations against your data that will alter the duration for each note. Do not worry about these options for now; move on.
6. Click on 'Pitch Mapping'. This is the most crucial choice, as it will transform (that is, scale) your raw data to a mapping against the keys of the keyboard. Leave the `mapping` set to 'division'. (The other options are modulo or logarithmic). The option `Range` 1 to 88 uses the full 88 keys of the keyboard; thus your lowest value would accord to the deepest note on the piano and your highest value with the highest note. You might wish instead to constrain your music around middle C, so enter 25 to 60 as your range. The output should change to: `31,34,34,34,25,28,30,60,28,25,26,26,25,25,60,25,25,38,33,26,25,25,25` These are no longer your counts; they are notes on the keyboard.{% include figure.html filename="sonification-musicalgorithms-settings-for-pitch-mapping-5.png" caption="Click into the 'range' box and set it to 25. The values underneath will change automatically. Click into the 'to' box and set it to 60. Click back into the other box; the values will update." %}
@@ -244,7 +244,7 @@ Can you make your computer play this song? (This [chart](https://web.archive.org

### Getting your own data in

- [This file](/assets/sonification-diary.csv) is a selection from the topic model fitted to John Adams' Diaries for [The Macroscope](http://themacroscope.org). Only the strongest signals have been preserved by rounding the values in the columns to two decimal places (remembering that .25 for instance would indicate that that topic is contributing to a quarter of that diary entry's composition). To get this data into your python script, it has to be formatted in a particular way. The tricky bit is getting the date field right.
+ [This file](/assets/sonification/sonification-diary.csv) is a selection from the topic model fitted to John Adams' Diaries for [The Macroscope](http://themacroscope.org). Only the strongest signals have been preserved by rounding the values in the columns to two decimal places (remembering that .25 for instance would indicate that that topic is contributing to a quarter of that diary entry's composition). To get this data into your python script, it has to be formatted in a particular way. The tricky bit is getting the date field right.

_For the purposes of this tutorial, we are going to leave the names of variables and so on unchanged from the sample script. The sample script was developed with earthquake data in mind; so where it says 'magnitude' we can think of it as equating to '% topic composition.'_

@@ -375,7 +375,7 @@ Why would you want to do this? As has progressively become clear in tutorial, wh

Here, I offer simply a code snippet that will allow you to import your data, where your data is simply a list of values saved as csv. I am indebted to George Washington University librarian Laura Wrubel who posted to [gist.github.com](https://gist.github.com/lwrubel) her experiments in sonifying her library's circulation transactions.

- In this [sample file](/assets/sonification-jesuittopics.csv) (a topic model generated from the [Jesuit Relations](http://puffin.creighton.edu/jesuit/relations/)), there are two topics. The first row contains the headers: topic1, topic2.
+ In this [sample file](/assets/sonification/sonification-jesuittopics.csv) (a topic model generated from the [Jesuit Relations](http://puffin.creighton.edu/jesuit/relations/)), there are two topics. The first row contains the headers: topic1, topic2.

### Practice

en/lessons/sustainable-authorship-in-plain-text-using-pandoc-and-markdown.md
@@ -331,8 +331,7 @@ small image) to your folder, and add the following somewhere in the body
of the text: `![image caption](your_image.jpg)`.

At this point, your `main.md` should look something like the following.
- You can download this sample .md file
- [here](https://raw.githubusercontent.com/programminghistorian/jekyll/gh-pages/assets/sample.md).
+ You can download this sample Markdown file from the _Programming Historian_ repository.

---
title: Plain Text Workflow
3 changes: 2 additions & 1 deletion es/lecciones/administracion-de-datos-en-r.md
@@ -78,7 +78,8 @@ Copia el siguiente código en R Studio. Para ejecutarlo tienes que marcar las l
```

## Un ejemplo de dplyr en acción
- Veamos un ejemplo de cómo dplyr nos puede ayudar a los historiadores. Vamos a cargar los datos del censo decenal de 1790 a 2010 de Estados Unidos. Descarga los datos haciendo [click aquí](/assets/data-wrangling-and-management-in-r/ejemplo_introductorio_estados.csv)[^2] y ponlos en la carpeta que vas a utilizar para trabajar en los ejemplos de este tutorial.
+
+ Veamos un ejemplo de cómo dplyr nos puede ayudar a los historiadores. Vamos a cargar los datos del censo decenal de 1790 a 2010 de Estados Unidos. Descarga los datos haciendo [click aquí](/assets/administracion-de-datos-en-r/ejemplo_introductorio_estados.csv)[^2] y ponlos en la carpeta que vas a utilizar para trabajar en los ejemplos de este tutorial.

Como los datos están en un archivo CSV, vamos a usar el comando de lectura ```read_csv()``` en el paquete [readr](https://cran.r-project.org/web/packages/readr/vignettes/readr.html) de "tidyverse".

