This repository has been archived by the owner on Oct 12, 2020. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathxlsx_tutorial.Rmd
163 lines (124 loc) · 5.22 KB
/
xlsx_tutorial.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
---
title: "Hackseq2017_Proj5"
author: Peter Zhang
date: Cot 22, 2017
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
Load required R packages
```{R}
Sys.setenv(JAVA_HOME='C:/Program Files/Java/jdk-9.0.1') # for 64-bit version
library(xlsx)
library(animation)
library(readr)
library(biomaRt)
library(dplyr)
library(ggplot2)
library(pubmed.mineR)
```
Load the gene of interest
```{R}
gene<-"BRCA1"
```
Create a excel workbook project with xlsx package
```{R}
wb<-createWorkbook()
## create a bold font for column names of data tables
cs<-CellStyle(wb) + Font(wb, isBold = TRUE) + Border() + Alignment(h="ALIGN_CENTER")
```
Create a excel sheet within the workbook, we will name this sheet "Gene Info", because we will put general gene info in this first "cover sheet"
```{R}
sheet0<-createSheet(wb, sheetName = "Gene Info")
```
Now, we will use the biomaRt package to retrive basic gene info, such as genome coordinates
```{R}
mart<-useMart("ensembl", dataset="hsapiens_gene_ensembl")
a<-mart@attributes$name
##the info we retrieve with biomaRt is fully customization, see biomaRt package manual for a full list of more than 800 gene features available
a.useful<-c(a[1], a[9:12], a[61], a[97])
gene.info<-getBM(a.useful, a[60], gene, mart)
```
Now we load our gene info onto the excel sheet we created
```{R}
addDataFrame(gene.info, sheet0, colnamesStyle = cs)
```
On the same work sheet, we can add additional info, such as phenotype info
```{R}
gene.phenotype<-getBM("phenotype_description", a[60], gene, mart)
addDataFrame(gene.phenotype, sheet0, startRow = 6, colnamesStyle = cs)
```
Let say this is all we are going to put on our first sheet.
We are going to create a new work sheet for a new analysis, within the same workbook
```{R}
sheet1<-createSheet(wb, sheetName = "Expression")
##Generate an expression figure according to stage
download.file("https://github.com/hackseq/2017_project_5/raw/master/data/exprData.csv.gz", "exprData.csv.gz")
exprData<-read.csv("exprData.csv.gz")
expr.gene<-subset(exprData, exprData$GeneId==gene)
expr.gene.graph<-as.data.frame(t(expr.gene))
expr.gene.graph<-mutate(expr.gene.graph, GENE=as.character(expr.gene.graph[1,1]))
expr.gene.graph<-expr.gene.graph[2:nrow(expr.gene.graph),]
```
We can save the expression table in worksheet1
```{R}
addDataFrame(expr.gene.graph, sheet1, startRow = 1, colnamesStyle = cs)
```
Here our expression analysis can be virsualized with a figure
```{R}
##Make an expression figure
gene.boxplot<-ggplot(expr.gene.graph, aes(x=GENE, y=as.numeric(as.character(expr.gene.graph$GENE)))) +
geom_boxplot() +
labs(x = "Gene",
y = "Expression (Log2 transformed)",
title = paste0(("Expression of "),gene))+
theme(plot.title = element_text(hjust = 0.5),
axis.text.x = element_text(size=6))
```
One way of storing the figure in our work sheet is to save it to an external file first, then import the figure to our data
Here, we are saving our expression figure on worksheet1, to avoid overlapping with our data table we load on this sheet earlier, we will load it onto column 6
```{R}
ggsave(gene.boxplot, filename="Expression.figure.png")
addPicture("Expression.figure.png", sheet1, scale=1, startRow=1, startColumn=6)
```
We can store unlimited number of data sheets or figure in a single workbook, on multiple worksheets. Create as many new sheets as needed and load the results data table or figure onto them.
```{R}
sheet2<-createSheet(wb, sheetName = "CNV")
##import your CNV data results as table
CNV.info2<-"CNV.data"
addDataFrame(CNV.info2, sheet2, colnamesStyle = cs)
```
Sometimes, functions from packages will save images as .pdf as default format and .pdf can not be directly imported onto our work sheet.
In such case, we can use the function from 'animation' package to convert pdf into .png file
```
im.convert("survival.pdf", output = "survival.png",extra.opts="-density 150")
addPicture("survival.png", sheet2, scale=0.5, startRow=12, startColumn=6)
```
Some functions do not have a built-in file output. We can use R default device image capture to store images
```
dev.off()
##some script making a figure to R imaging device
dev.copy2pdf()
file.rename("Rplot.pdf")
im.convert("Rplot.pdf", output = "Rplot.png",extra.opts="-density 150")
addPicture("Rplot.png", sheet2, scale=0.5, startRow=12, startColumn=6)
```
We can also store text results, such as text-mining results to our workbook
Example of storing results of PubMed text-mining results on a worksheet
```{R}
sheet3<-createSheet(wb, sheetName = "PubMed")
##grab abstracts for gene
##compile a abs database regarding "1000 genome breast cancer" search in pub-med
ABSdata<-readabs("pubmed_result_1000genomescancerbreast.txt")
gene.abs<-getabs(ABSdata, gene, FALSE)
gene.PMID<-get_PMCIDS(gene.abs)
##extract sentence from abs for gene
gene.sentences<-Give_Sentences(gene, gene.abs)
addDataFrame(gene.PMID, sheet3, colnamesStyle = cs)
addDataFrame(unlist(gene.sentences), sheet3, startRow = 6, colnamesStyle = cs)
```
After all results are loaded onto our multi-sheet workbook, we can save our workbook as an excel sheet
``` {R}
saveWorkbook(wb, file = paste0("ALL_RESULTS.xlsx"))
```