Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tests for time conversions in tools package #2341

Open
wants to merge 29 commits into
base: main
Choose a base branch
from

Conversation

markcampanelli
Copy link
Contributor

@markcampanelli markcampanelli commented Dec 23, 2024

  • Closes Add tests for time conversions in tools package #2340
  • I am familiar with the contributing guidelines
  • Tests added
  • Updates entries in docs/sphinx/source/reference for API changes.
  • Adds description and name entries in the appropriate "what's new" file in docs/sphinx/source/whatsnew for all changes. Includes link to the GitHub Issue with :issue:`num` or this Pull Request with :pull:`num`. Includes contributor name and/or GitHub username (link with :ghuser:`user`).
  • New code is fully documented. Includes numpydoc compliant docstrings, examples, and comments where necessary.
  • Pull request is nearly complete and ready for detailed review.
  • Maintainer: Appropriate GitHub Labels (including remote-data) and Milestone are assigned to the Pull Request and linked Issue.

@markcampanelli markcampanelli changed the title [DRAFT] Add tests for time conversions in tools package Add tests for time conversions in tools package Dec 23, 2024
@markcampanelli
Copy link
Contributor Author

I realize no one planned/asked for this, but I think it's a good addition and ready to be reviewed by a maintainer. Increases test coverage by a whopping 0.07%!

@kandersolar kandersolar added this to the v0.11.3 milestone Dec 23, 2024
Copy link
Member

@cwhanse cwhanse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @markcampanelli

The tests expose a weakness in localize_to_utc: what happens if location has a timezone defined as an offset? That is permissible, and if done produces an unhelpful message:

    if zone.upper() == 'UTC':

AttributeError: 'int' object has no attribute 'upper'

'input,expected',
[
(
{
Copy link
Member

@cwhanse cwhanse Dec 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a dict here allows the function call to be double-splatted (**input) and may make it easier to "see" each tested set of inputs and output. It's not the style used for other tests are parameterized.

I'm interested in feedback from others: is this parameterization style easier or more difficult, as a reviewer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My 2 cents: a more compact representation of the changing values is helpful for reviewing, and is worth a bit of processing inside the test function itself. For example, it seems like the following:

(
    {
        "time": pd.DatetimeIndex(
            ["1974-06-22T18:30:15"],
            tz=ZoneInfo("Etc/GMT+5"),
        ),
        "location": location.Location(
            43.19262774396091, -77.58782907414867, tz="Etc/GMT+5"
        )
    },
    pd.DatetimeIndex(["1974-06-22T23:30:15"], tz=ZoneInfo("UTC")),
),
(
    {
        "time": pd.DatetimeIndex(["1974-06-22T18:30:15"]),
        "location": location.Location(
            43.19262774396091, -77.58782907414867, tz="Etc/GMT+5"
        )
    },
    pd.DatetimeIndex(["1974-06-22T23:30:15"], tz=ZoneInfo("UTC")),
),

could be replaced with something like this:

("1974-06-22T18:30:15", "Etc/GMT+5", 43, -77, "Etc/GMT+5", "1974-06-22T23:30:15+00:00"),
("1974-06-22T18:30:15", None, 43, -77, "Etc/GMT+5", "1974-06-22T23:30:15+00:00"),

And the test function could then contain the necessary calls to Location and pd.DatetimeIndex to construct the parameters themselves.

I think the latter format makes it easier to see what changes between each set of parameters. I found my eyes having to flick back and forth and subsequently getting lost with the former format.

@markcampanelli
Copy link
Contributor Author

markcampanelli commented Dec 23, 2024

Thanks @markcampanelli

The tests expose a weakness in localize_to_utc: what happens if location has a timezone defined as an offset? That is permissible, and if done produces an unhelpful message:

    if zone.upper() == 'UTC':

AttributeError: 'int' object has no attribute 'upper'

Good catch! Lemme see what I might do about fixing that here (if it's not too much scope creep).

BTW: I'm pretty sure this is the sort of issue that I solved (hopefully!) by simplifying the timezone representation in pvlib.location.Location in #2343

@markcampanelli
Copy link
Contributor Author

@cwhanse @kandersolar I addressed the bug Cliff spotted, and updated a Location test for that.

Due to the high repetition, I decided not to use test parameterization at all for the new tools tests. Same coverage.

Should be ready for another look.

pvlib/tools.py Outdated

Parameters
----------
time : datetime.datetime, pandas.DatetimeIndex,
or pandas.Series/DataFrame with a DatetimeIndex.
location : pvlib.Location object
location : pvlib.Location object (unused if time is localized)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
location : pvlib.Location object (unused if time is localized)
location : pvlib.Location object (unused if ``time`` is localized)

elif isinstance(tz, (int, float)):
self.tz = tz
self.pytz = pytz.FixedOffset(tz*60)
self.tz = f"Etc/GMT{int(-tz):+d}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm inclined to ask to revert this change. The docstring says tz can be a float. There are places that have a non-integer offset from UTC, e.g., the Marquesas are UTC−09:30UTC−09:30. localize_to_utc is changed in this PR to examine Location.pytz rather than Location.tz, so the error we discussed won't occur.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a point in the Location docstring that says this:

Location objects have two timezone attributes:
    * ``tz`` is a IANA timezone string.
    * ``pytz`` is a pytz timezone object.

While I appreciate my changes may break some things, I think this existing docstring suggests a much more sane approach. (Indeed, in the sequel using zoneinfo, I propose using one, and only one, internal representation of the timezone, instead of two, potentially with helper functions s to update this internal representation using int/float offsets, etc..)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not saying revert from storing a string, but revert the int portion. If tz is truncated to an integer, how are timezones such as UTC+05:30 represented?

I'm open to doing away with accepting int or float for tz.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cwhanse Sorry I misunderstood your meaning. I was aware that some locations are on half-hour offsets and such, but what is unclear to me is if/how that is represented in pytz (or zoneinfo for that matter). I do not see any fractional offsets in zoneinfo.available_timezones() and this also does not work for me: pytz.timezone("Etc/GMT+5.5").

I must be missing something here. Does something non-standard have to be done to support fractional offsets?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open to doing away with accepting int or float for tz.

Hi again @cwhanse. It looks like the standard "non-integral-hour" offsets are only supported in the IANA standard through place names such as "Asia/Kathmandu" and "Asia/Yangon", but not through Etc/GMT+X.Y style offsets.

Given that, I think the most trouble-free thing to do is to remove the int and float options and be sure to discuss this issue in the documentation tutorial. (Also, the only supported tz strings would be the IANA ones.) Sounds like you are amenable to this.

Keeping both fields tz (only as a string) and pytz (as a pytz timezone type), then I think I should add setters that change one when the other is changed after the Location object has already been already initialized.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the most trouble-free thing to do is to remove the int and float options

I'm in favor of this change. @pvlib/pvlib-maintainer @pvlib/pvlib-triage your opinions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No opinion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TMY3 files represent timezone in numeric form, so this change may complicate some workflows. Outside of that, I never use this attribute (and don't know why I would), so I have no opinion either.

Remember that we usually have a deprecation period before outright removal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some more investigating on this yesterday, and I think I see an implementation that would better support the existing int and float offsets. However, this also raised some new questions about the existing implementation. I will try to push some concrete code here by end of day for people to consider, along with the new questions raised.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I have added support back for int and non-fractional float being passed to tz in the Location constructor. I also test that pytz and zoneinfo timezones are likewise supported for initialization, and I beefed up the tests even more.

Note that the pytz Location attribute is now the single source of truth for the time zone in a Location object (which I recommend updating to a standard-library timezone.TimeZone object in #2342). self.tz simply returns the stringified self.pytz, but one can still directly set self.tz with any tz value supported by the Location intializer. Also, the tz and pytz attributes are kept in sync now, as changing one updates the other.

Ready for another review.

@markcampanelli
Copy link
Contributor Author

The two test failures appear to be unrelated to my changes.

Copy link
Member

@cwhanse cwhanse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we should switch and regard tz as the source of truth, and pytz as the optional attribute? With an eye toward replacing pytz with python's timezone.

from the pytz and zoneinfo packages), default 'UTC'.
See http://en.wikipedia.org/wiki/List_of_tz_database_time_zones for a
list of valid name strings for IANA time zones.
ints and floats must be non-fractional N-hour offsets from UTC, which
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ints and floats must be non-fractional N-hour offsets from UTC, which
ints and floats must be whole number hour offsets from UTC, which

Comment on lines 102 to 103
"floating point tz does not have zero fractional part: "
f"{tz_}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"floating point tz does not have zero fractional part: "
f"{tz_}"
"floating point tz has non-zero fractional part: "
f"{tz_}. Only whole number offsets are supported."

else:
raise TypeError(
f"invalid tz specification: {tz_}, must be an IANA time zone "
"string, a non-fractional int/float UTC offset, or a "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"string, a non-fractional int/float UTC offset, or a "
"string, a whole number int/float UTC offset, or a "

@markcampanelli
Copy link
Contributor Author

Do you think we should switch and regard tz as the source of truth, and pytz as the optional attribute? With an eye toward replacing pytz with python's timezone.

@cwhanse Ok, so we make the single source of truth the standardized IANA-supported string in the tz attribute, and make a pytz getter that converts tz to the pytz time zone using return pytz.timezone(self.tz)), and the setter that stringifies the provided pytz timezone into the tz string. I like this idea.

Question: Would it be too disruptive at this point to make the (newly implemented) pytz attribute "read only"? That is, remove the setter mentioned above, so that one can only change the time zone on an existing Location object by setting the tz attribute (with same interface as the Location initializer). I would much prefer this simplification too.

@cwhanse
Copy link
Member

cwhanse commented Jan 10, 2025

make the (newly implemented) pytz attribute "read only"?

I don't think we do this. Since pytz is in maintenance, it seems reasonable to expect to replace pytz with timezone (as you anticipate). At that time, the pytz attribute can be removed. I don't see the value in having both the timezone object and a string as attributes, since one can print the object and get the string representation.

@markcampanelli
Copy link
Contributor Author

I decided to store the single source of truth about a Location's time zone as a psuedo-private zoneinfo.ZoneInfo object. However, the time zone is still updated (only) using the tz attribute, and so that interface is unchanged (and now very consistent for new vs. existing Location objects). However, I did make the pytz attribute read only, which avoids the synchronization issue with the tz attribute. I also added a read-only zoneinfo attribute that newly allows uses to get the time zone as the standard library's zoneinfo.ZoneInfo object (which pandas is apparently moving to in 3.0). I updated the Location's docstring accordingly. This new foundation paves the ways for #2342, but with minimal current disruption.

I also made all of @cwhanse's recommended changes.

If this implementation and the removal of directly setting the pytz attribute is ok by everyone, then I would like to add a deprecation message to the pytz attribute that says zoneinfo will be the official time-zone package going forward. I also still need to update the whatsnew and verify that the new documentation renders properly.

Copy link
Member

@cwhanse cwhanse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@markcampanelli let's keep this PR to the scope described in the title. Importing and adding the zoneinfo attribute feels like half of the journey to shifting from pytz to zoneinfo and I think that should be done in its own PR.

Same for fixing asv - do that in a separate PR.

@@ -26,5 +35,4 @@ Requirements

Contributors
~~~~~~~~~~~~


* Mark Campanellli (:ghuser:`markcampanelli`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Mark Campanellli (:ghuser:`markcampanelli`)
* Mark Campanelli (:ghuser:`markcampanelli`)

@markcampanelli
Copy link
Contributor Author

@pvlib/pvlib-maintainer @pvlib/pvlib-triage Can someone review my asv CI fix here?

If this looks ok, then should I move the fix to a separate PR for better visibility?

@markcampanelli
Copy link
Contributor Author

@pvlib/pvlib-maintainer @pvlib/pvlib-triage Can someone review my asv CI fix here?

If this looks ok, then should I move the fix to a separate PR for better visibility?

Moved to #2352, which now should go in before this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add tests for time conversions in tools package
4 participants