-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathBasic_Bash_Tutorial.Rmd
306 lines (227 loc) · 15.6 KB
/
Basic_Bash_Tutorial.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
---
title: "Basic Bash Tutorial"
author: "Author: Linton Freund ([email protected])"
date: "Last update: `r format(Sys.time(), '%d %B, %Y')`"
output:
html_document:
toc: true # table of content true
toc_depth: 3 # upto three depths of headings (specified by #, ## and ###)
number_sections: yes ## if you want number sections at each table header
code_folding: show
collapsed: yes
highlight: pygments # specifies the syntax highlighting style
smooth_scroll: yes
theme: cosmo # many options for theme, see here: https://www.datadreaming.org/post/r-markdown-theme-gallery/
toc_float: yes
fontsize: 15pt
always_allow_html: yes
---
<style type="text/css">
body{
font-size: 13pt;
}
</style>
<!--
Render from R:
rmarkdown::render("Basic_Bash_Tutorial.Rmd", clean=TRUE, output_format="html_document")
R
Rendering from the command-line. To render to PDF format, use the argument setting: output_format="pdf_document".
$ Rscript -e "rmarkdown::render('Basic_Bash_Tutorial.Rmd', output_format='html_document', clean=TRUE)"
-->
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Welcome
This tutorial is a collection of helpful Bash commands as well as some different loops that you can use for your own scripts. These tricks are things I have picked up along the way, and admittedly, things I often forget if I am not using them. If you are overwhelmed by this, or forget certain things, do not stress! Google is your best friend when it comes to coding, and you can always find help that way.
This tutorial uses two text files (XCZO_MapFile_copy.txt, codon_table_copy.txt) that you need to follow along. You can download these files [here](https://github.com/hlfreund/workflows.github.io/tree/main/Basic_Bash_Tutorial_Files).
However, you can use these commands with any and all files at your disposal - just be sure not to delete any of your files without triple checking, or making a copy!
## What is Bash?
Simply, bash is a coding language that we can use to interact with our operating system, particularly a Unix-based operating system (OS). Unix and Linux operating systems are similar, so you can use bash to interact with these types of OS.
We use bash in the **shell** or **command-line interpreter** to tell the computer what we need it to do. On a Mac, we usually use the Terminal ap as our shell, whereas on Windows you can use the command prompt app as your shell.
For more information on Bash, check out its Wiki page [here](https://en.wikipedia.org/wiki/Bash_(Unix_shell)) and look into this helpful bash/linux tutorial [here](https://linuxconfig.org/bash-scripting-tutorial-for-beginners).
# A Crash Course in Bash
Let's start with some simple basics and work our way up! Just to reiterate, these are JUST the basics. Please use this tutorial as a jumping off point for learning more about bash.
## The Basics
`pwd` will print the directory we are currently in. `ls` prints everything that is in our current directory. Find more options for `ls` [here](https://linux.die.net/man/1/ls).
```{bash where_are_you, eval=FALSE}
pwd # print current working directory, aka where you are in your computer
```
<center>
![](Basic_Bash_Tutorial_Files/pwd_example.png)
</center>
<div align="center">Figure 1a: Example of `pwd` output</div></br>
```{bash listing,eval=FALSE}
ls -l # list items in your current directory in a long list format
ls -ltr # list items in your current directory in long list format, reverse sort by time
```
<center>
![](Basic_Bash_Tutorial_Files/ls_l_example.png)
</center>
<div align="center">Figure 1b: Example of `ls -l` output</div></br>
<center>
![](Basic_Bash_Tutorial_Files/ls_ltr_example.png)
</center>
<div align="center">Figure 1b: Example of `ls -ltr` output</div></br>
Let's make a directory called `test_dir` and get into that directory. From there, we can create a text file called `test.txt` and read this text file in the terminal.
```{bash make_read_remove, eval=FALSE}
mkdir test_dir # mkdir creates a directory
cd test_dir # cd stands for change directory
touch test.txt # touch is a command used for creating a file (not editing)
nano test.txt # nano is a command used to create OR edit a file, in this case we are editing test.txt
cat test.txt # cat is how to read a file
rm test.txt # how to delete a file(s) from your directory
```
<center>
![](Basic_Bash_Tutorial_Files/nano_example.png)
</center>
<div align="center">Figure 2: `nano` File edit screen for file called "test.txt"</div></br>
To go back to your previous directory in your **path** (aka folder/you/are/in), do the following
```{bash cd_bac, eval=FALSE}
cd .. # change directory to previous directory in path
pwd
```
Now let's try some other commands we can use to seeing what is in a file. `head` can show us the first 10 lines of a file, and `tail` can show us the last 10 lines of a file.
```{bash heads_tails, eval=FALSE}
head XCZO_MapFile_copy.txt # read first 10 lines of file
head n -15 XCZO_MapFile_copy.txt # read first 15 lines of file
tail XCZO_MapFile_copy.txt # read last 10 lines of file; uses -n in same way that head does
```
<center>
![](Basic_Bash_Tutorial_Files/head_xczo_file_example.png)
</center>
<div align="center">Figure 3: Head of "XCZO_MapFile_copy.txt". Full file could not be fit into image.</div></br>
We can also use the `more` function to read files page by page, using the *spacebar* to go to each page. Or, we can use the `less` function to read the files line by line, using the up/down arrows to navigate. These functions are very similar, however, less is a bit faster because it does not load the entire file as you read it.
```{bash more_less_files, eval=FALSE}
more XCZO_MapFile_copy.txt # read file by page. Use "space" to go to next page.
# to quit, just type "q" and enter
less XCZO_MapFile_copy.txt # read file by line. Use arrows to navigate up or down the page. Has more options than more
# to quit, just type "q" and enter
```
We can also move the file around or rename the file with the `mv` command, and make a copy of this file using the `cp` command.
```{bash move_move, eval=FALSE}
mkdir newdir # we will move our file into this directory
mv XCZO_MapFile_copy.txt newdir # move file into newdir
cd newdir
ls -ltr # what is in our directory?
mv XCZO_MapFile_copy.txt XCZO_Map.txt # rename file XCZO_MapFile_copy.txt to XCZO_Map.txt with mv
ls -ltr # sanity check - always good practice to see if your previous step worked!
cd ..
cp newdir/XCZO_Map.txt . # copy file in newdir to current working directory, represented by "."
cp newdir/* . # copy ALL files in newdir to current directory; "*" represents every file in newdir
# sanity check - did our copy just work?
cd newdir
ls -ltr
cd ..
```
## Loops
These commands are great, but what if you need to do the same action on hundreds of files!? For example, what if you are reading a file of amplicon sequences and want to cound the number of sequences in each file, and save that information? That is when we can use the magic of loops in our scripts! Loops are not specific to bash, but you will find that a lot of languages (like R & Python) model their loop structures after the loops in bash.
All the helpful flowcharts included below for each loop type come from [ZenFlowchart](https://www.zenflowchart.com/blog/for-loop-flowchart).
### While Read Loops
If we want to read every line in the file, we can create a `while loop` to run through each line of the file, and use the `echo` command to print out the contents of each line.
The logic of the while loop is the following: while read x (line, string, etc), do x.
<center>
![](Basic_Bash_Tutorial_Files/while-loop-example.png)
</center>
<div align="center">Figure 4a: While Loop Flowchart. [source](https://www.zenflowchart.com/blog/for-loop-flowchart)</div></br>
```{bash while_echo, eval=FALSE}
while read line
do
echo $line # prints every line, defined by $line variable
done < codon_table_copy.txt # this is how we feed the input file into the while read loop
```
### For Loops
If we want to be just a tad fancy, we can use a `for loop` and the `echo` command to print everything in our directory. We can loop through every file in our current directory by using the `*` as a wildcard. This `*` is an example of what we call a **regular expression** (aka regex) - basically, the `*` can be used to as a wildcard character to represent everything, OR it can be used as the asterisk itself.
Another example of a regex is the `.`, which can indicate both our current working directory, characters we are editing, or as a period itself.
To review, the logic of the for loop is the following: for x (a file, a line, anything you want) in this, do the following. This will loop repeteadly until all of the actions in the for statement are done.
<center>
![](Basic_Bash_Tutorial_Files/for-loop-flowchart-example.png)
</center>
<div align="center">Figure 4b: For Loop Flowchart. [source](https://www.zenflowchart.com/blog/for-loop-flowchart)</div></br>
```{bash for_echo, eval=FALSE}
for FILE in *; # for every FILE in our directory; "*" is a wildcard used to represent every thing
do
echo $FILE # prints $FILE variable; use $ to call a variable later in loop or code
done
# Here, FILE is a variable you are creating and assigning to every file in your current directory
## You can refer to this variable later using the $
```
### If-Else Loops
If we want to get a little fancier, we can use an `if else` loop to create directories. Personally, I really like using this trick when I want to make a directory, but I don't know if it already exists. The `!` is the same as "not" - so for the code below, I am checking if the directory (specificed with -d argument) does NOT exist.
In summary, the if loop logic here is: if this directory does NOT exist, then make directory. Otherwise, the directory exists, and nothing is done by the loop.
<center>
![](Basic_Bash_Tutorial_Files/if-else-loop-example.png)
</center>
<div align="center">Figure 4c: If-Else Loop Flowchart. [source](https://www.zenflowchart.com/blog/for-loop-flowchart)</div></br>
```{bash if_else, eval=FALSE}
if [[ ! -d ./testdir ]]; then
# [[ ! -d {dir} ]] --> checks if the directory dir does NOT exist
mkdir testdir
fi
rm -r testdir # how to delete a directory - recursive removal
```
## Word Count & Grep
In scenarios when we have really long files, or are looking for specific information within or about the file, there are several commands we can use. `grep` is an extremely powerful way to find certain patterns in your files/directories. There is a `find` command in bash, but we will not go over that here.
We can also use tools like `wc` to figure out how many words (or lines or other pieces of information) are in our file of interest.
```{bash grep_wc, eval=FALSE}
wc codon_table_copy.txt # wc is the "word count" command
## for wc - first column shows line count, second column shows word count, third column shows character count
wc -l codon_table_copy.txt # count number of lines in file
grep Stop codon_table_copy.txt # find the Stop codons in the file
grep Stop codon_table_copy.txt | wc -l # find Stop codons, then count number of them by counting the lines
## | is a pipe that is used to string multiple commands together
```
<center>
![](Basic_Bash_Tutorial_Files/wc_grep_examples.png)
</center>
<div align="center">Figure 5a: Examples of `wc` and `grep` outputs.</div></br>
We can also `sort` these files and pull out pieces of unique (`uniq`) information from said files, like unique gene IDs or protein names, etc. We can also be very specific as to where we search/sort by using the `cut` command to specify which sections/pieces we are cutting from.
```{bash sort_cut_uniq, eval=FALSE}
grep Stop codon_table_copy.txt | sort # sorts the order in which these lines with the word "Stop" are arranged in the output; automatically sorts alphabetically in descending order
grep Arginine codon_table_copy.txt | sort | cut -f 3 # cut by field; in this case 3rd field is kept and rest are removed
# fields are designated by whatever delimiter is used. For example, in a .csv file, the delimiter is ",", so each field is separated by a ","
## ^ searches for Arginine, sorts output of grep, and cuts third field (a column in this case) and returns output only from 3rd field
grep Arginine codon_table_copy.txt | sort | cut -f 3 | uniq # uniq returns only unique or individual instances of text in file(s) - great for checking out duplicates and the number of duplicates
cut -f3 codon_table_copy.txt | sort | uniq -c # cuts third field based on tab (\t); sorts 3rd field, then finds unique instances of each string and counts them
# cut can also cut fields based on any delimiter - the default delimiter is tab (\t)
```
<center>
![](Basic_Bash_Tutorial_Files/grep_sort_cut_examples.png)
</center>
<div align="center">Figure 5b: Examples of `grep`,`sort`, `cut`, and `uniq` outputs.</div></br>
## Transform and Streamline Editor (aka sed)
Finally, we can edit files using the transform command `tr` or the streamline editor command `sed`.
Transform or `tr` is used to make small edits across the entire file. For example, if you wanted to delete all of the spaces from a file, or if you wanted to switch all of your spaces to a tab, then you can use transform.
```{bash transform_ex, eval=FALSE}
cp codon_table_copy.txt codons.txt # make a copy of codon file so we can edit this file without changing original copy
head codons.txt # double check that this copy was successful by using head
tr a-z A-Z < codons.txt # transforms all lower case alphabetical characters to upper case
```
<center>
![](Basic_Bash_Tutorial_Files/tr_example.png)
</center>
<div align="center">Figure 6a: Example of `tr` output, where we converted all lower case letters to upper case.</div></br>
`sed` is a lot more useful for editing and replacing, but it does have somewhat of a weird syntax. The syntax is **`s/old pattern/new pattern`**, where the old pattern is the one you are editing, and the new pattern is what you are replacing the old pattern with.
`sed` is extremely powerful but can get complicated rather quickly, so for more on `sed` and using `sed` with regular expressions, please read this [Tutorials Point link](https://www.tutorialspoint.com/unix/unix-regular-expressions.htm) and the [Geeks for Geeks Link](https://www.geeksforgeeks.org/sed-command-in-linux-unix-with-examples/).
```{bash sed_ex, eval=FALSE}
sed 's/\t/ /g' codons.txt # convert all \t to spaces with sed, so that now all fields or columns are separated by spaces and not tabs
## syntax is sed 's/original_pattern/replacement_pattern/g', where "g" stands for global editing,.
### This means you can make these changes all throughout the file (hence it's global) rather than just making one or two edits in the file
```
<center>
![](Basic_Bash_Tutorial_Files/sed_example.png)
</center>
<div align="center">Figure 6b: Example of `sed` output, where we changed all tabs to spaces.</div></br>
# The End
Thank you for following my basic Bash scripting tutorial! I hope it is rewarding and helpful.
If you'd like to get a hold of me to offer me feedback about this tutorial, do not hestitate to reach out. My contact information is in the [About Me](#about-me) section.
# Version Information
```{r sessionInfo}
sessionInfo()
```
# About Me {#about-me}
My name is Linton and my pronouns are they/them. I am currently a PhD Student at UC Riverside in the [Genetics, Genomics, and Bioinformatics](https://ggb.ucr.edu/) PhD program and a member of [Dr. Emma Aronson's lab](https://profiles.ucr.edu/app/home/profile/emmaa).
If you have any questions regarding this workflow and the scripts I used, do not hesitate to contact [me](mailto:[email protected]?subject=Basic Bash Tutorial).
<div align="center">**Thank you so much for checking out my tutorial! Happy coding!! **</div></br>
<center>
![](Basic_Bash_Tutorial_Files/Bash_Logo_Colored.png)
</center>