You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The current Import and export chapter 7 includes a subsection, 7.10 that deals with APIs and making HTTP requests. This is helpful where APIs have been set up for large databases, however people are still likely to encounter and may need to extract data from databases that do not have APIs. A common scenario is a mysql database that is being used to hold disease surveillance data. The database may comprise of tables and views that users can access with their own user name and password.
To incorporate the querying and import of resultant data from an SQL database into R in a script or R markdown document, four things need to be demonstrated:
Best practice for securing login details, e.g. by using getPass() and encrypting, or by using dedicated packages such as keychain
How to connect to the database using the login details dynamically with connection strings and the DBI package or similar
How to deal with decryption of data from an encrypted database in R, either during the query or after import
How to create and send a query from R, using dplyr() primarily with a matched example of how this could be done with SQL from an R markdown document chunk
Ideally, it would be nice to set up a worked example which Epi R handbook readers could also interact with and try out themselves. This would mean creating an SQL database containing fake surveillance data (or publically available surveillance data e.g. from WHO) and building a scenario around it, such as creating a summary figure and table in an automated annual surveillance report. The database would need to be sandboxed and login details either provided in the text, or auto-generated and temporary login details given to the user. It would be important that they can see what their login details are, so that it is clear how to incorporate this into a connection string. The database could use some standard encryption so that decryption can be demonstrated as well (as this is a common practice in surveillance databases for columns containing personal identifying information).
Resources:
There is a nice e-book by POSIT on connecting to databases here. In this e-book there is a nice example of how to query a database using dplyr verbs - which is appealing because this approach will fit well with the rest of the handbook and also means that the user doesn't have to learn SQL to query their database. It might also be helpful to show how a query would be done with dplyr and then with SQL in R, for comparison (and in case there are some instances where for instance SQL is faster). Examples from the POSIT e-book are given here and here.
I would suggest this be added as a new sub-section after the current section 7.10.
The text was updated successfully, but these errors were encountered:
Scope:
cc @nsbatra
The current Import and export chapter 7 includes a subsection, 7.10 that deals with APIs and making HTTP requests. This is helpful where APIs have been set up for large databases, however people are still likely to encounter and may need to extract data from databases that do not have APIs. A common scenario is a mysql database that is being used to hold disease surveillance data. The database may comprise of tables and views that users can access with their own user name and password.
To incorporate the querying and import of resultant data from an SQL database into R in a script or R markdown document, four things need to be demonstrated:
getPass()
and encrypting, or by using dedicated packages such askeychain
DBI
package or similardplyr()
primarily with a matched example of how this could be done with SQL from an R markdown document chunkIdeally, it would be nice to set up a worked example which Epi R handbook readers could also interact with and try out themselves. This would mean creating an SQL database containing fake surveillance data (or publically available surveillance data e.g. from WHO) and building a scenario around it, such as creating a summary figure and table in an automated annual surveillance report. The database would need to be sandboxed and login details either provided in the text, or auto-generated and temporary login details given to the user. It would be important that they can see what their login details are, so that it is clear how to incorporate this into a connection string. The database could use some standard encryption so that decryption can be demonstrated as well (as this is a common practice in surveillance databases for columns containing personal identifying information).
Resources:
There is a nice e-book by POSIT on connecting to databases here. In this e-book there is a nice example of how to query a database using
dplyr
verbs - which is appealing because this approach will fit well with the rest of the handbook and also means that the user doesn't have to learn SQL to query their database. It might also be helpful to show how a query would be done withdplyr
and then withSQL
in R, for comparison (and in case there are some instances where for instance SQL is faster). Examples from the POSIT e-book are given here and here.I would suggest this be added as a new sub-section after the current section 7.10.
The text was updated successfully, but these errors were encountered: