feat: add caching for timezone offsets, significantly speeds up import #1250

Open: wants to merge 3 commits into base: master
13 changes: 13 additions & 0 deletions CONTRIBUTING.rst
@@ -179,3 +179,16 @@ Whenever the content of
the corresponding documentation table::

    dateparser_scripts/update_supported_languages_and_locales.py


Updating the Timezone Cache
----------------------------------------------------

Whenever the content of
``dateparser/timezones.py`` is modified, you need to rebuild the timezone cache.

Run this command:
``BUILD_TZ_CACHE=1 python -c "import dateparser"``

This should update
``dateparser/data/dateparser_tz_cache.pkl``.
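Reviewer note: the rebuild step above matters because the cache file stores a CRC32 fingerprint of the timezone data next to the cached values. The following is a minimal sketch of that fingerprinting idea, assuming the same ``zlib.crc32(str(...).encode("utf-8"))`` recipe used in ``timezone_parser.py`` in this PR. The ``fingerprint`` helper and the ``sample_timezones`` list are hypothetical stand-ins and are not part of dateparser.

```python
import zlib


def fingerprint(data):
    """Return a CRC32 of the textual form of `data` (hypothetical helper)."""
    return zlib.crc32(str(data).encode("utf-8"))


# Hypothetical stand-in for the real timezone_info_list.
sample_timezones = [("UTC", 0), ("EST", -18000)]
before = fingerprint(sample_timezones)

# Simulate an edit to the timezone data: one abbreviation changes.
sample_timezones[1] = ("EDT", -18000)
after = fingerprint(sample_timezones)

# A differing fingerprint is the signal that the cache must be rebuilt.
print(before != after)
```

Note that the fingerprint is taken from ``str()`` of the data, so it only changes when the printed representation changes.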
1 change: 1 addition & 0 deletions MANIFEST.in
@@ -3,6 +3,7 @@ include CONTRIBUTING.rst
include HISTORY.rst
include LICENSE
include README.rst
include dateparser/data/dateparser_tz_cache.pkl
include dateparser_data/settings.py
include requirements.txt

Binary file added dateparser/data/dateparser_tz_cache.pkl
Binary file not shown.
51 changes: 47 additions & 4 deletions dateparser/timezone_parser.py
@@ -1,4 +1,8 @@
import os
import pickle
import zlib
from datetime import datetime, timedelta, timezone, tzinfo
from pathlib import Path

import regex as re

@@ -84,8 +88,47 @@
    return offset


_search_regex_parts = []
_tz_offsets = list(build_tz_offsets(_search_regex_parts))
_search_regex = re.compile("|".join(_search_regex_parts))
_search_regex_ignorecase = re.compile("|".join(_search_regex_parts), re.IGNORECASE)
local_tz_offset = get_local_tz_offset()

_tz_offsets = None
_search_regex = None
_search_regex_ignorecase = None


def _load_offsets(cache_path, current_hash):
    global _tz_offsets, _search_regex, _search_regex_ignorecase

    try:
        with open(cache_path, mode="rb") as file:
            (
                serialized_hash,
                _tz_offsets,
                _search_regex,
                _search_regex_ignorecase,
            ) = pickle.load(file)
        if current_hash is None or current_hash == serialized_hash:
            return
    except (FileNotFoundError, ValueError, TypeError):
        pass

    _search_regex_parts = []
    _tz_offsets = list(build_tz_offsets(_search_regex_parts))
    _search_regex = re.compile("|".join(_search_regex_parts))
    _search_regex_ignorecase = re.compile("|".join(_search_regex_parts), re.IGNORECASE)

    with open(cache_path, mode="wb") as file:
        pickle.dump(
            (current_hash, _tz_offsets, _search_regex, _search_regex_ignorecase),
            file,
            protocol=5,
        )


CACHE_PATH = Path(__file__).parent.joinpath("data", "dateparser_tz_cache.pkl")

if "BUILD_TZ_CACHE" in os.environ:
    current_hash = zlib.crc32(str(timezone_info_list).encode("utf-8"))
else:
    current_hash = None

_load_offsets(CACHE_PATH, current_hash)
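
Reviewer note: the load-or-rebuild flow in this hunk can be tried in isolation. The sketch below mirrors its shape (read a pickled tuple, compare the stored hash, rebuild and rewrite on mismatch) using only the standard library. The names ``load_or_rebuild``, ``zones``, and ``build`` are hypothetical; it caches a plain dict rather than compiled ``regex`` patterns, and it catches a few more exceptions than the PR does, so treat it as an illustration under those assumptions rather than the PR's implementation.

```python
import pickle
import tempfile
import zlib
from pathlib import Path


def load_or_rebuild(cache_path, source, build, check_hash=True):
    """Return (value, cache_hit); reuse the pickle when its stored hash matches."""
    current_hash = zlib.crc32(str(source).encode("utf-8")) if check_hash else None
    try:
        with open(cache_path, "rb") as fh:
            stored_hash, value = pickle.load(fh)
        # With check_hash=False the cache is trusted without comparison.
        if current_hash is None or current_hash == stored_hash:
            return value, True
    except (FileNotFoundError, EOFError, ValueError, TypeError, pickle.UnpicklingError):
        pass  # missing or unreadable cache: fall through and rebuild

    value = build(source)
    with open(cache_path, "wb") as fh:
        pickle.dump((current_hash, value), fh, protocol=5)
    return value, False


def build(source):
    """Hypothetical builder standing in for build_tz_offsets."""
    return {name: offset for name, offset in source}


# Hypothetical data standing in for timezone_info_list.
zones = [("UTC", 0), ("EST", -18000)]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo_cache.pkl"
    first = load_or_rebuild(path, zones, build)    # no file yet: rebuild
    second = load_or_rebuild(path, zones, build)   # same data: cache hit
    changed = [("UTC", 0), ("EDT", -18000)]
    third = load_or_rebuild(path, changed, build)  # data changed: rebuild
```

One design point worth checking in review: when the hash check is skipped, a stale cache file is loaded silently, so correctness depends on the cache being rebuilt whenever the source data changes.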
2 changes: 1 addition & 1 deletion setup.py
@@ -31,7 +31,7 @@
    install_requires=[
        "python-dateutil>=2.7.0",
        "pytz>=2024.2",
        "regex>=2015.06.24,!=2019.02.19,!=2021.8.27",
        "regex>=2024.9.11",
        "tzlocal>=0.2",
    ],
    entry_points={
2 changes: 1 addition & 1 deletion tox.ini
@@ -27,7 +27,7 @@ deps =
    {[testenv]deps}
    python-dateutil==2.7.0
    pytz==2024.2
    regex==2015.06.24
    regex==2024.9.11
    tzlocal==0.2

[testenv:min-all]