The dataset I used in this analysis is one of the datasets provided by Udacity. It contains information about flights from 1987 to 2020. Because the full dataset is large, I used only a portion of it: 27,000 records. The dataset is available at [https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/HG7NV7].
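How the 27,000 records were selected is not stated here. As a minimal sketch of one reproducible option, assuming the full table is already loaded into a pandas DataFrame, a seeded random sample could be drawn as follows. The helper name `sample_flights`, the seed, and the synthetic columns are illustrative assumptions, not part of the original analysis.

```python
import pandas as pd


def sample_flights(df: pd.DataFrame, n: int = 27_000, seed: int = 42) -> pd.DataFrame:
    """Return a reproducible random sample of at most n rows (hypothetical helper)."""
    n = min(n, len(df))  # never request more rows than the table holds
    return df.sample(n=n, random_state=seed).reset_index(drop=True)


# Tiny synthetic stand-in for the real flight table (assumed column names).
flights = pd.DataFrame({
    "Year": [1987 + i % 34 for i in range(100)],
    "DepDelay": range(100),
})
subset = sample_flights(flights, n=10)
print(len(subset))
```

Fixing the seed means the same subset is drawn on every run, which keeps later plots and counts comparable.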
In my exploration I found three main correlations: Arrival Delay vs Departure Delay, Distance vs Air Time, and Carrier Delay vs Departure Delay. The first two relationships are strong, while the relationship between Carrier Delay and Departure Delay is not very strong.
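The three relationships above can be quantified with a pairwise correlation coefficient. The sketch below uses synthetic rows and assumes the column names `ArrDelay`, `DepDelay`, `Distance`, `AirTime`, and `CarrierDelay`; the real dataset may name them differently, and Pearson correlation is an assumed choice of measure.

```python
import pandas as pd

# Synthetic example rows; column names and values are assumptions for illustration.
flights = pd.DataFrame({
    "DepDelay": [0, 5, 10, 20, 40, 60],
    "ArrDelay": [-2, 4, 12, 18, 43, 58],
    "Distance": [200, 400, 600, 800, 1000, 1200],
    "AirTime": [40, 70, 95, 125, 150, 180],
    "CarrierDelay": [0, 5, 0, 20, 0, 15],
})

pairs = [
    ("ArrDelay", "DepDelay"),
    ("Distance", "AirTime"),
    ("CarrierDelay", "DepDelay"),
]

# Series.corr computes the Pearson correlation coefficient by default.
correlations = {(a, b): flights[a].corr(flights[b]) for a, b in pairs}

for (a, b), r in correlations.items():
    print(f"{a} vs {b}: r = {r:.2f}")
```

A coefficient close to 1 indicates a strong positive linear relationship, while a value nearer 0 indicates a weak one.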
I also found that the origins with the highest number of flights also have the highest number of departure delays, so a large number of flights goes together with a large number of delays. I expected 2020 to have the lowest number of flights because of lockdowns in many countries, but it turned out to have more flights than 1987. I also expected December to have the highest number of flights because of the holidays, but that is not the case. For monthly departure and arrival delays, August is extremely high and February is extremely low. From 2006 the number of delays declined, but this changed around 2016, with a very sharp increase from then until 2019.
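The comparison of flights and departure delays per origin can be sketched with a grouped count. This is only an illustration on synthetic rows: the column names `Origin` and `DepDelay` are assumed, and treating any flight with `DepDelay > 0` as delayed is an assumption about how a delay is defined.

```python
import pandas as pd

# Synthetic example rows; airport codes and delay values are made up.
flights = pd.DataFrame({
    "Origin": ["ATL", "ATL", "ATL", "ATL", "ORD", "ORD", "ORD", "BOI"],
    "DepDelay": [12, 0, 35, 8, 20, -3, 5, 0],
})

per_origin = (
    flights.assign(Delayed=flights["DepDelay"] > 0)  # assumed definition of a delay
    .groupby("Origin")
    .agg(flights=("DepDelay", "size"), delayed=("Delayed", "sum"))
    .sort_values("flights", ascending=False)
)
print(per_origin)
```

Dividing `delayed` by `flights` would give a delay rate per origin, which separates busy airports from airports that are delayed unusually often.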
Regarding delay reasons, NAS delays are the highest in every year, and late aircraft delays are the lowest. Weather is the predominant cause of cancellations, and January is the month most affected by weather cancellations. Cancellations due to security issues occurred only in 2020. It is surprising that security was not an issue in any previous year.
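One way to rank delay reasons per year is to total each reason's column and pick the largest. The sketch below assumes the columns `CarrierDelay`, `WeatherDelay`, `NASDelay`, `SecurityDelay`, and `LateAircraftDelay` hold delay minutes and that ranking by total minutes is the intended measure; both are assumptions, and the rows are synthetic.

```python
import pandas as pd

# Assumed names of the delay-reason columns.
delay_cols = ["CarrierDelay", "WeatherDelay", "NASDelay", "SecurityDelay", "LateAircraftDelay"]

# Synthetic example rows for two years.
flights = pd.DataFrame({
    "Year": [2018, 2018, 2019, 2019],
    "CarrierDelay": [10, 0, 5, 0],
    "WeatherDelay": [0, 15, 0, 0],
    "NASDelay": [20, 30, 25, 10],
    "SecurityDelay": [0, 0, 0, 0],
    "LateAircraftDelay": [0, 5, 0, 5],
})

# Total minutes per reason for each year, then the reason with the largest total.
minutes_by_year = flights.groupby("Year")[delay_cols].sum()
top_reason = minutes_by_year.idxmax(axis=1)
print(top_reason)
```

The same grouping with `Month` in place of `Year` would show which months are most affected by a given reason.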
There are three main correlations: Arrival Delay vs Departure Delay, Distance vs Air Time, and Carrier Delay vs Departure Delay. Flight cancellations are found only in 2020, which is due to the Covid-19 pandemic.