This script has been made as an exercise to populate a PostgreSQL database with data coming from a series of CSV files regarding the management of courses in a public university. Probably something can not work, in such case report the issue to us.
PostgreSQL is available for download as ready-to-use packages or installers for various platforms and you can download it from here . Installation file includes also the GUI pgAdmin , usefull for beginners. If you are interested in a deeper explanation of how PostgreSQL works you can find the official documentation here.
In order to work with the database from Python you need an IDE (eg. Spyder or PyCharm) and you have to install some additional libraries: pandas, psycopg and SQLAlchemy.
Check which python
version is currently used by your machine. This tutorial is prepared using Python 3.9.0 but you can use also a higher version.
print( "You are using python version: ")
sys.version
If you are working locally, then it is recommended to work with a local virtual environment.
A virtual environment in Python is an isolated environment that allows you to manage and install packages independently of your system-wide Python installation. This isolation helps to avoid conflicts between dependencies and ensures that your projects have access to the specific versions of packages they need. Here are some key points about virtual environments:
-
Isolation: Each virtual environment has its own set of installed packages and dependencies, separate from other environments and the system Python installation.
-
Dependency Management: You can install and manage project-specific dependencies without affecting other projects or the global Python installation.
-
Reproducibility: Ensures that a project can be replicated on another system with the same set of dependencies, making it easier to share projects with others or deploy them on different machines.
-
Version Control: Allows you to test projects with different versions of packages without interfering with other projects.
-
Creating a Virtual Environment:
- Using
venv
(included in Python 3.3 and above):python -m venv myenv
- Using
virtualenv
(requires installation):pip install virtualenv virtualenv myenv
- Using
-
Activating the Virtual Environment:
- On Windows:
myenv\Scripts\activate
- On macOS and Linux:
source myenv/bin/activate
- On Windows:
-
Installing Packages:
- Once the virtual environment is activated, you can install packages using
pip
:pip install package_name
- Once the virtual environment is activated, you can install packages using
-
Deactivating the Virtual Environment:
- To deactivate the virtual environment and return to the global Python environment:
deactivate
- To deactivate the virtual environment and return to the global Python environment:
-
Deleting the Virtual Environment:
- To remove a virtual environment, simply delete the directory:
rm -rf myenv
- To remove a virtual environment, simply delete the directory:
-
Create a new virtual environment for your project:
python -m venv myprojectenv
-
Activate the virtual environment:
source myprojectenv/bin/activate # macOS/Linux myprojectenv\Scripts\activate # Windows
-
Install required packages:
pip install requests numpy
-
Work on your project, knowing the dependencies are isolated.
-
Deactivate the environment when done:
deactivate
Using virtual environments is a best practice for Python development, especially when managing multiple projects or working in a team.
For this tutorial, the following packages are needed:
If you are working locally, then install them preferably in your active virtual environment. If you are working in Google Colab, then install them in that Google Colab environment. As a third option, you can install them in your local machine.
Check if your current python environment has the required packages for this tutorial installed:
import psycopg
import sqlalchemy
If there is a ModuleNotFoundError, then install the missing modules, using pip install
:
pip install "psycopg[binary]"
pip install SQLAlchemy