When using the EPC datasets we need to be careful with duplicate EPCs for the same property. This is not an enormous issue, as an EPC is valid for up to 10 years unless the property is renovated or retrofitted, but there may be multiple records, especially for rental properties that have been improved to meet recent regulations.
We should be able to handle this by identifying records that share the same UPRN (Unique Property Reference Number); I would suggest keeping the most recent record and discarding the others. I will add this feature to the R code for the energy intensity sampler.
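A minimal sketch of that rule in base R, not the sampler's actual code. The column names `UPRN`, `LODGEMENT_DATE` and `ENERGY_CONSUMPTION_CURRENT`, the ISO date format, and the helper name `dedupe_epc` are all assumptions for illustration. Rows with no UPRN are kept as they are, since they cannot be matched to another certificate.

```r
# Keep only the most recent certificate per UPRN.
# Assumed column names; adjust id_col/date_col to match the extract in use.
dedupe_epc <- function(epc, id_col = "UPRN", date_col = "LODGEMENT_DATE") {
  # Assumes ISO dates (YYYY-MM-DD); unparseable dates become NA and sort last.
  dates <- as.Date(epc[[date_col]], format = "%Y-%m-%d")
  # Sort newest first so the first row seen for each UPRN is the latest one.
  epc <- epc[order(dates, decreasing = TRUE, na.last = TRUE), , drop = FALSE]
  # Rows without a UPRN cannot be matched, so they are always retained.
  has_id <- !is.na(epc[[id_col]]) & epc[[id_col]] != ""
  keep <- !has_id | !duplicated(epc[[id_col]])
  epc[keep, , drop = FALSE]
}

# Toy data (made up): UPRN "100" has three certificates.
epc <- data.frame(
  UPRN = c("100", "100", "200", NA, "100"),
  LODGEMENT_DATE = c("2012-05-01", "2021-09-15", "2018-02-10",
                     "2019-07-04", "2016-11-30"),
  ENERGY_CONSUMPTION_CURRENT = c(320, 180, 250, 270, 260),
  stringsAsFactors = FALSE
)
latest <- dedupe_epc(epc)
```

Sorting before calling `duplicated()` is what makes "first occurrence" mean "most recent"; without the sort the retained row would depend on file order.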
I'm not sure this will have a big impact when taking a recent sample of 5000 certificates from the API, but when using the full CSV this could be a problem (my colleague has pointed out that some properties in that dataset can have four or five duplicates!).
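To gauge how much this matters for a given extract, something like the following could be run first. It is only a sketch on made-up data and assumes the identifier column is called `UPRN`; a real extract would be loaded from the CSV instead.

```r
# Hypothetical toy extract standing in for the full certificates CSV.
epc <- data.frame(
  UPRN = c("100", "100", "100", "100", "200", "300", "300"),
  stringsAsFactors = FALSE
)

per_uprn <- table(epc$UPRN)            # certificates lodged per UPRN (NAs excluded)
n_dup_properties <- sum(per_uprn > 1)  # properties with more than one certificate
n_surplus_rows <- sum(per_uprn - 1)    # rows that deduplication would drop
max_per_property <- max(per_uprn)      # worst case for a single property
```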
This update has a negligible effect on mean energy intensity estimates for most house typologies in most Local Authorities, but there are some exceptions. These exceptions often seem to be older houses that have several EPCs as a result of being renovated, refurbished and improved. You can see this in the scatter plot comparing the updated mean energy intensities with the original estimates for each house type in a sample of Midlands LADs.
@nickmalleson, I am rerunning the national sample estimation on DAFNI and will update the GeoJSON file once it has finished. This should not change the trend, especially among newer-build homes, but the estimates should be a bit more accurate and less biased.