[stdlib][proposal] Duration module proposal #4022

Open
wants to merge 1 commit into
base: main

bgreni
Contributor

@bgreni bgreni commented Feb 26, 2025

A proposal for a Duration struct inspired by std::chrono::duration from the C++ stdlib

@martinvuyk
Contributor

I think this will add some nice type safety against some otherwise very common mistakes when using APIs.

I still have the whole datetime module in my repo, waiting for the stdlib to mature more so that the workload isn't too huge. But my main goal with the design is to bundle datetime as one type rather than a separate timedelta and date object like Python does (there are many other significant design differences). Separating duration out would be like making an independent timedelta (which the Python stdlib docstrings describe as a delta between datetimes). But if we don't mix it up with datetime, I'm all for it.

@bgreni
Contributor Author

bgreni commented Feb 26, 2025

I still have the whole datetime module in my repo, waiting for the stdlib to mature more so that the workload isn't too huge. But my main goal with the design is to bundle datetime as one type rather than a separate timedelta and date object like Python does (there are many other significant design differences). Separating duration out would be like making an independent timedelta (which the Python stdlib docstrings describe as a delta between datetimes). But if we don't mix it up with datetime, I'm all for it.

I think duration and datetime would exist separately from one another, but that's a very important point to consider, so thank you for bringing it up. I'm curious what the team has to say on that front.

@Julian-J-S

I agree that having something like "Duration" or "TimeDelta" is very valuable, and much nicer and safer than throwing around integers without context.

As for the implementation, I personally like to keep it easy, intuitive, and close to Python instead of fancy Rust/C style 😆
In the end the API should be easy to use and understand.
So I would favor something like this:

# Duration object with an underlying unit resolution
d = Duration(minutes=15, seconds=30, unit='ms')

# or parameterized
d = Duration['ms'](minutes=15, seconds=30)

use(d)

Instead of:

struct Duration[R: Ratio, postfix: StringLiteral='']: ... # user questions: Ratio? postfix? I want a "normal" constructor
struct Ratio[N: UInt, D: UInt = 1]: ... # user questions: Why do I need this?

d = Minutes(15) # user questions: How would I create a "complex" duration of 15 minutes and 30 seconds?

use(d.cast[Ratio.Milli]()) # user questions: What is Ratio.Milli? Why do I need to cast?

Is there a reason we need to deviate from Python and become more complex/fancy? What are the advantages for the user?
Just my initial thoughts. Maybe you are completely right and your approach is better 😉

@martinvuyk
Contributor

I agree with @Julian-J-S, we should avoid the Ratio concept (at least on the user side) because it's a bit confusing.

I'm picturing something like this:

from time.duration import Duration, Unit

fn get_time_to_sleep() -> Duration[Unit.Seconds]: ...

fn main():
    sleep_time = get_time_to_sleep().cast[Unit.Microseconds]()
    esp_sleep_enable_timer_wakeup(sleep_time)
    esp_deep_sleep_start()

Each unit can be a ratio underneath, etc.

@bgreni
Contributor Author

bgreni commented Feb 27, 2025

@martinvuyk @Julian-J-S

Exposing the ratio in some way allows users to define arbitrary time intervals. Like this example taken from the C++ docs.

#include <chrono>
#include <iostream>
 
using namespace std::chrono_literals;
 
template<typename T1, typename T2>
using mul = std::ratio_multiply<T1, T2>;
 
int main()
{
    using microfortnights = std::chrono::duration<float,
        mul<mul<std::ratio<2>, std::chrono::weeks::period>, std::micro>>;
    using nanocenturies = std::chrono::duration<float,
        mul<mul<std::hecto, std::chrono::years::period>, std::nano>>;
    using fps_24 = std::chrono::duration<double, std::ratio<1, 24>>;
 
    std::cout << "1 second is:\n";
 
    // integer scale conversion with no precision loss: no cast
    std::cout << std::chrono::milliseconds(1s).count() << " milliseconds\n"
              << std::chrono::microseconds(1s).count() << " microseconds\n"
              << std::chrono::nanoseconds(1s).count() << " nanoseconds\n";
 
    // integer scale conversion with precision loss: requires a cast
    std::cout << std::chrono::duration_cast<std::chrono::minutes>(1s).count()
              << " minutes\n";
    // alternative to duration_cast:
    std::cout << 1s / 1min << " minutes\n";
 
    // floating-point scale conversion: no cast
    std::cout << microfortnights(1s).count() << " microfortnights\n"
              << nanocenturies(1s).count() << " nanocenturies\n"
              << fps_24(1s).count() << " frames at 24fps\n";
}

Most of the time users are just going to use the provided aliases, and I agree we could namespace common ratios into a Unit struct to make it more ergonomic so most users won't be exposed to that concept. Personally I've used std::chrono::duration many times, and I had no idea it was implemented this way until I looked to see if I could port it to Mojo.

I think Julian's idea is basically just a datetime, and in that case you should just use that.

d = Minutes(15) # user questions: How would I create a "complex" duration of 15 minutes and 30 seconds?

You would write d = Minutes(15) + Seconds(30). As I mentioned, that isn't possible in Mojo at the moment because the compiler can't yet determine that the return type of that operation should be a Seconds type, so initially you couldn't do that. I think that's OK for now, as the idea here is to provide a type-safe mechanism for representing time durations that need only a single number to represent them, so the type can be register passable and performant.
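
For comparison, this is how std::chrono, the inspiration for the proposal, handles the mixed-unit case in C++: adding minutes and seconds yields a duration that ticks in the finer unit. The helper names (`sum_in_seconds`, `sum_in_milliseconds`, `check_mixed_units`) are made up for illustration, and this is a sketch of the C++ behaviour, not of the eventual Mojo API.

```cpp
#include <cassert>
#include <chrono>
#include <ratio>
#include <type_traits>

using std::chrono::milliseconds;
using std::chrono::minutes;
using std::chrono::seconds;

// The sum of two different units is expressed in the finer unit:
// its tick period is one second (std::ratio<1>).
using Sum = decltype(minutes(15) + seconds(30));
static_assert(std::is_same<Sum::period, std::ratio<1>>::value,
              "minutes + seconds ticks in seconds");

// Hypothetical helper: 15 min 30 s as a single integer count of seconds.
inline long long sum_in_seconds() {
    return (minutes(15) + seconds(30)).count();
}

// Hypothetical helper: seconds -> milliseconds is lossless, so the
// conversion below is implicit and needs no cast.
inline long long sum_in_milliseconds() {
    milliseconds ms = minutes(15) + seconds(30);
    return ms.count();
}

inline void check_mixed_units() {
    assert(sum_in_seconds() == 930);
    assert(sum_in_milliseconds() == 930000);
}
```

Calling `check_mixed_units()` from a `main` runs the checks. The "complex" duration is still one integer, so nothing about mixed-unit construction requires a multi-field struct.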

@bgreni force-pushed the duration-proposal branch 3 times, most recently from 83ce184 to 2d983a6 (February 28, 2025 01:40)

@Julian-J-S

@bgreni thanks for the explanation. I understand from a technical point of view.

I think we are coming from different perspectives.
I come more from a Data Science/Analytics point of view, where a Duration is the difference between two DateTimes, can also be negative, and looks like "1day 2hours 3minutes" or "11s 22ms 33ns" (all stored as a single Int with a base unit).
You seem to come more from a Frequency/Refresh-Rate point of view, which is quite different I guess.

Nevertheless, I think Mojo needs to decide on a certain API style. Yours is rather close to C++/Rust, while I think staying closer to Python might be better for users.

Maybe one option is to provide both options as constructors

# Duration of 1 hour 2 minutes
Duration(
    hours=1,
    minutes=2,
)
# or Duration in terms of frequency: 1/24 s
Duration(
    N=1,
    D=24,
    base_unit="s",
)

On the other hand maybe it would be better to have 2 distinct objects for Duration and Frequency that can be converted into each other 🤔

f = Frequency(
    N=1,
    D=24,
    base_unit="s",
)
f.to_duration()

@martinvuyk
Contributor

I'll chime in here again, because I have to say I don't like timedelta, nor the concept of mixing different units of measurement in the same object. From experience I can also tell you that it complicates the implementation much more: we would need to add overloads for each parametrized unit of measurement to make building e.g. a Duration[Unit.Minutes](seconds=30) impossible.

In my implementation of datetime I made subtraction return a datetime, not a timedelta object, because it makes no sense. On underflow, the object goes to the end datetime of its given calendar.

The Frequency idea is quite interesting. I see its usefulness in many hardware-related APIs. Once the language permits, it would be interesting to just make it inherit from Duration and add a constructor which takes in Hz.
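
To make the frequency/duration relationship concrete, here is a rough C++ sketch using std::chrono. `period_of`, `Frames24` and `check_frequency` are hypothetical names chosen for this example; nothing here is taken from the proposal. It shows two ways to relate a rate to a duration: computing the period of a frequency at run time, and encoding a fixed rate in the tick period of the type.

```cpp
#include <cassert>
#include <chrono>
#include <ratio>

// Hypothetical helper (not from the proposal): the length of one cycle at
// `hz` cycles per second, as a floating-point number of seconds.
inline std::chrono::duration<double> period_of(double hz) {
    return std::chrono::duration<double>(1.0 / hz);
}

// A duration type whose tick is 1/24 s, i.e. one frame at 24 fps.
using Frames24 = std::chrono::duration<long long, std::ratio<1, 24>>;

inline void check_frequency() {
    // 4 Hz has a period of 0.25 s, which is 250 ms.
    assert(period_of(4.0).count() == 0.25);
    assert(std::chrono::duration_cast<std::chrono::milliseconds>(
               period_of(4.0)).count() == 250);
    // 24 frames at 24 fps last exactly one second.
    assert(Frames24(24) == std::chrono::seconds(1));
}
```

The second form is the ratio-based design from the proposal: the rate lives in the type, so the stored value stays a plain integer count.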

@Julian-J-S

In my implementation of datetime I made subtraction return a datetime, not a timedelta object, because it makes no sense. On underflow, the object goes to the end datetime of its given calendar.

I cannot follow 😮
The difference between two points in time (DateTime) should be some sort of Duration/TimeDelta with a direction (positive/negative).
(Almost) all languages/frameworks/implementations have this concept because it is absolutely necessary.
Some examples:

  • 2025-01-20 minus 2025-01-22
    • DateTime: 9999-xxxxx ??? # underflow to the "end datetime" makes no sense
    • Duration: -48hours (physically stored as Int in some time unit)
  • 2025-01-22 minus 2025-01-20
    • DateTime: 1970-xxxxx / 0000-00-02 ??? # makes no sense as a DateTime
    • Duration: 48hours (physically stored as Int in some time unit)

Also, a Duration between two DateTimes is always "absolute": a single quantifiable number.
If you have the result as a DateTime this is NOT true, because for example "1 year 3 months 2 days" is NOT quantifiable in terms of days/seconds. Years have different lengths in days, and even days can have different lengths (daylight saving), etc. Therefore modelling the "delta" between two points in time as a DateTime is very problematic and impossible for many use cases 🤔
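
As one data point, C++'s std::chrono models the subtraction described above: the difference of two time points is a signed duration rather than another time point. The sketch below uses day numbers 20 and 22 as stand-ins for 2025-01-20 and 2025-01-22 to avoid calendar conversions; `HourPoint`, `day_number`, `difference` and `check_signed_difference` are names assumed for this example.

```cpp
#include <cassert>
#include <chrono>

using Hours = std::chrono::hours;
// A point in time counted in whole hours since the clock's epoch.
using HourPoint = std::chrono::time_point<std::chrono::system_clock, Hours>;

// Hypothetical helper: the instant `days` whole days after the epoch.
// Day numbers stand in for calendar dates to keep the sketch short.
inline HourPoint day_number(int days) {
    return HourPoint(Hours(24 * days));
}

// Subtracting two time points yields a signed duration, not a time point.
inline Hours difference(HourPoint a, HourPoint b) {
    return a - b;
}

inline void check_signed_difference() {
    // "Jan 20 minus Jan 22" and the reverse, as day numbers 20 and 22.
    assert(difference(day_number(20), day_number(22)) == Hours(-48));
    assert(difference(day_number(22), day_number(20)) == Hours(48));
}
```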

@martinvuyk
Contributor

The difference between two points in time (DateTime) should be some sort of Duration/TimeDelta with a direction (positive/negative).

No difference in time can be absolute. It depends on the calendar it is built upon. There is a reason Python's timedelta only considers up to days. Which won't be true for places where the day doesn't last the same as the current rotation speed of the earth.

  • 2025-01-20 minus 2025-01-22
    • DateTime: 9999-xxxxx ??? # underflow to the "end datetime" makes no sense
    • Duration: -48hours (physically stored as Int in some time unit)

It's the same as with integers: an underflow is the programmer's problem, because that is not how this type is meant to be used. The dates should be converted to a format like a Duration (e.g. seconds since unix epoch) to subtract and have a negative result. My implementation has a .seconds_since_epoch() which returns the proleptic Gregorian calendar epoch by default (Python's default).

  • 2025-01-22 minus 2025-01-20
    • DateTime: 1970-xxxxx / 0000-00-02 ??? # makes no sense as a DateTime
    • Duration: 48hours (physically stored as Int in some time unit)

Idem here. Convert to seconds and subtract.

The main logical problem is that when subtracting two datetimes the concept is this:
(taken from the tests in my repo)

def test_add():
    # using python and unix calendar should have no difference in results
    alias pycal = PythonCalendar
    alias unixcal = UTCCalendar
    alias dt = DateTime[iana=False, pyzoneinfo=False, native=False]
    alias TZ = dt._tz
    tz_0_ = TZ("Etc/UTC", 0, 0)
    tz_1 = TZ("Etc/UTC-1", 1, 0)
    tz1_ = TZ("Etc/UTC+1", 1, 0, -1)

    # test february leapyear
    result = dt(2024, 2, 29, tz=tz_0_, calendar=pycal) + dt(
        0, 0, 1, tz=tz_0_, calendar=pycal
    )
    offset_0 = dt(2024, 3, 1, tz=tz_0_, calendar=unixcal)
    offset_p_1 = dt(2024, 3, 1, hour=1, tz=tz_1, calendar=unixcal)
    offset_n_1 = dt(2024, 2, 29, hour=23, tz=tz1_, calendar=unixcal)
    add_seconds = dt(2024, 2, 29, tz=tz_0_, calendar=unixcal).add(
        seconds=24 * 3600
    )
    assert_equal(result, offset_0)
    assert_equal(result, offset_p_1)
    assert_equal(result, offset_n_1)
    assert_equal(result, add_seconds)

def test_subtract():
    # using python and unix calendar should have no difference in results
    alias pycal = PythonCalendar
    alias unixcal = UTCCalendar
    alias dt = DateTime[iana=False, pyzoneinfo=False, native=False]
    alias TZ = dt._tz
    tz_0_ = TZ("Etc/UTC", 0, 0)
    tz_1 = TZ("Etc/UTC-1", 1, 0)
    tz1_ = TZ("Etc/UTC+1", 1, 0, -1)

    # test february leapyear
    result = dt(2024, 3, 1, tz=tz_0_, calendar=pycal) - dt(
        0, 0, 1, tz=tz_0_, calendar=pycal
    )
    offset_0 = dt(2024, 2, 29, tz=tz_0_, calendar=unixcal)
    offset_p_1 = dt(2024, 2, 29, hour=1, tz=tz_1, calendar=unixcal)
    offset_n_1 = dt(2024, 2, 28, hour=23, tz=tz1_, calendar=unixcal)
    sub_seconds = dt(2024, 3, 1, tz=tz_0_, calendar=unixcal).subtract(days=1)
    assert_equal(result, offset_0)
    assert_equal(result, offset_p_1)
    assert_equal(result, offset_n_1)
    assert_equal(result, sub_seconds)

It might seem quite counterintuitive to allow "weird" start and end date calendars. But using this, I actually managed to build a calendar and datetime type which can represent timedelta (that has months and years as well).

fn timedelta[
    dst_storage: ZoneStorageDST = ZoneInfoMem32,
    no_dst_storage: ZoneStorageNoDST = ZoneInfoMem8,
    iana: Bool = True,
    pyzoneinfo: Bool = True,
    native: Bool = False,
](
    years: UInt = 0,
    months: UInt = 0,
    days: UInt = 0,
    hours: UInt = 0,
    minutes: UInt = 0,
    seconds: UInt = 0,
    m_seconds: UInt = 0,
    u_seconds: UInt = 0,
    n_seconds: UInt = 0,
    tz: Optional[
        DateTime[
            dst_storage=dst_storage,
            no_dst_storage=no_dst_storage,
            iana=iana,
            pyzoneinfo=pyzoneinfo,
            native=native,
        ]._tz
    ] = None,
) -> DateTime[
    dst_storage=dst_storage,
    no_dst_storage=no_dst_storage,
    iana=iana,
    pyzoneinfo=pyzoneinfo,
    native=native,
] as output:
    """Return a `DateTime` with `ZeroCalendar`.

    Args:
        years: The years.
        months: The months.
        days: The days.
        hours: The hours.
        minutes: The minutes.
        seconds: The seconds.
        m_seconds: The milliseconds.
        u_seconds: The microseconds.
        n_seconds: The nanoseconds.
        tz: The TimeZone for the timedelta object.

    Returns:
        A `DateTime` with a calendar set to using 0000-00-00 as epoch start.
        Beware this `DateTime` kind should only be used for adding/subtracting
        for instances in the same timezone.
    """
    output = __type_of(output)(
        int(years),
        int(months),
        int(days),
        int(hours),
        int(minutes),
        int(seconds),
        int(m_seconds),
        int(u_seconds),
        int(n_seconds),
        tz,
        ZeroCalendar,
    )

We could still make it so that the subtraction/addition of two DateTimes returns a ZeroCalendar-based DateTime. It actually makes more sense, as you pointed out. And if a ZeroCalendar-based DateTime is subtracted from e.g. a PythonCalendar-based DateTime, a PythonCalendar-based DateTime is returned.

Anyway, this is deviating a lot from the proposal at hand. My design for DateTime I know will take a lot of proposals and convincing to merge. But it remains a topic for the future.

@bgreni
Contributor Author

bgreni commented Feb 28, 2025

@Julian-J-S

The proposed solution is intended more for the systems developer crowd within Mojo. While Mojo does try to be Pythonic, at its core it is a systems language, and oftentimes we have to accept implementation complexity as a tradeoff to preserve both flexibility and performance. And since Mojo has, or intends to have, the compile-time programming features to implement something like this, I think we should.

Maybe one option is to provide both options as constructors

# Duration of 1 hour 2 minutes
Duration(
    hours=1,
    minutes=2,
)
# or Duration in terms of frequency: 1/24 s
Duration(
    N=1,
    D=24,
    base_unit="s",
)

This kind of design doesn't solve the problem of building a type safe and performant API for representing time intervals. Consider this example.

fn wait_minutes(d: Duration):
    ...

fn wait_minutes(d: Minutes):
    ...

...

wait_minutes(Duration(milliseconds=10)) # quietly does nothing, or throws an exception
wait_minutes(Milliseconds(10)) # compiler error

With a keyword-based approach we lose the ability to catch incompatible durations at compile time, and have to settle for it either being a no-op or throwing a runtime error. In the proposed approach, by contrast, we can tell at compile time that milliseconds cannot be safely implicitly converted to minutes, and we produce a compile-time error. The keyword-based approach also loses the flexibility of defining arbitrary duration types, as I mentioned before, and making it a more complex struct at runtime also means it's likely a non-starter for performance-sensitive applications.
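
The same compile-time rule can be observed in std::chrono, which the proposal is modelled on. In the C++ sketch below, `wait_minutes` mirrors the example above, while `truncated_minutes` and `check_conversions` are hypothetical helpers added for illustration. The two `static_assert`s show that the implicit conversion only exists in the lossless direction.

```cpp
#include <cassert>
#include <chrono>
#include <type_traits>

using std::chrono::milliseconds;
using std::chrono::minutes;

// Coarse -> fine never loses information for integer counts: implicit.
static_assert(std::is_convertible<minutes, milliseconds>::value,
              "minutes convert implicitly to milliseconds");
// Fine -> coarse could truncate: no implicit conversion exists.
static_assert(!std::is_convertible<milliseconds, minutes>::value,
              "milliseconds do not convert implicitly to minutes");

// Hypothetical API that only accepts whole minutes.
inline long long wait_minutes(minutes d) {
    return d.count();
}

// wait_minutes(milliseconds(10));  // rejected at compile time

// The lossy conversion is still available, but must be spelled out.
inline long long truncated_minutes(milliseconds ms) {
    return std::chrono::duration_cast<minutes>(ms).count();
}

inline void check_conversions() {
    assert(wait_minutes(std::chrono::hours(1)) == 60);     // implicit, lossless
    assert(truncated_minutes(milliseconds(125000)) == 2);  // 2 min 5 s -> 2
}
```

A caller who really wants the truncation has to write the cast, which keeps the precision loss visible at the call site.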

@bgreni force-pushed the duration-proposal branch from 9b6efe9 to cfdacd3 (March 2, 2025 17:37)

@owenhilyard
Contributor

I wonder if, since Mojo is aiming for some scientific use, it would make more sense to do a unified system for handling units of measure, similar to Rust's uom. Ideally, we would default to using reasonable-size integers, and then have a parameter which lets you use other integers, floats or arbitrary precision numbers. Getting all of the SI units in there should be a decent "first issue" project if the initial push has the 7 basic SI units + more detail for time. This would allow you to store things in whatever units you have, until such time as you bother to convert it to proper SI units.

I think we do need a "Numeric" or "Real Number" trait in Mojo to abstract over combinations of dtype + arbitrary precision numbers.
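
For reference, std::chrono already takes the numeric representation as a type parameter, and the conversion rules depend on it. The C++ sketch below is only an illustration of that idea; `FloatMinutes`, `as_fractional_minutes` and `check_float_rep` are names assumed for this example.

```cpp
#include <cassert>
#include <chrono>
#include <ratio>
#include <type_traits>

using std::chrono::minutes;
using std::chrono::seconds;

// Minutes stored as a double instead of an integer.
using FloatMinutes = std::chrono::duration<double, std::ratio<60>>;

// Integer seconds -> integer minutes could truncate, so it is not implicit...
static_assert(!std::is_convertible<seconds, minutes>::value,
              "integer seconds do not convert implicitly to integer minutes");
// ...but a floating-point representation can hold the fraction, so it is.
static_assert(std::is_convertible<seconds, FloatMinutes>::value,
              "integer seconds convert implicitly to floating-point minutes");

// Hypothetical helper: a count of seconds expressed as fractional minutes.
inline double as_fractional_minutes(seconds s) {
    FloatMinutes m = s;  // implicit, because the target is floating-point
    return m.count();
}

inline void check_float_rep() {
    assert(as_fractional_minutes(seconds(90)) == 1.5);
}
```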

Ideally, I would like strings kept as far away from the API as possible. I know it's "pythonic", but that tends to turn into a massive mess and ends up requiring rules around whitespace resolution, instead of just using something like a "unit" enum and a magnitude (number or enum).

I agree Ratio is nice to have, but I think we can generalize even more by making everything else able to use the same type aliases as we use for dealing with various kinds of second.

Formatting should be a separate concern, formatting time is hard, and we should really leave that to a separate issue. Let's figure out how to store the data first.

I think that the keyword style is going to have extra overhead, which isn't something HPC users would accept. I think this needs to be as compile-time heavy as we can make it so we can use it for correctness.

@Julian-J-S I think we should have date handling out of this since that means we need to deal with time zones, which is a mess.

Similarly, @martinvuyk, let's table calendars for now and stick to SI unit durations.

@martinvuyk
Contributor

Similarly, @martinvuyk, let's table calendars for now and stick to SI unit durations.

SI units are based on universal constants, so I have no problem with using them. Hours, and hence days, aren't always the same length though.

I wonder if, since Mojo is aiming for some scientific use, it would make more sense to do a unified system for handling units of measure, similar to Rust's uom.

That project is just... wow

Maybe I'm daydreaming but this would just scratch a scientific itch in me:

alias kilo = Magnitude[10**3]()
alias Newtons = (kg * m) / s**2

fn does_material_break(material: Material, incident_force: kilo * Newtons) -> Bool:
    return material.breaking_strength < incident_force

If we go in the direction of building a whole unit framework, wouldn't that supersede the present proposal?
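
To give a feel for what such a framework involves, here is a minimal C++ sketch of compile-time dimensional analysis, where the exponents of the base units are carried in the type. It is a toy under stated assumptions: only three base units, `double` values, no magnitude prefixes, and every name (`Quantity`, `Kilograms`, `Meters`, `Seconds`, `Newtons`, `check_units`) is invented for this example rather than taken from uom or from any Mojo design.

```cpp
#include <cassert>

// A value tagged with the exponents of mass (kg), length (m) and time (s).
template <int M, int L, int T>
struct Quantity {
    double value;
};

// Multiplying quantities adds the exponents of each base unit.
template <int M1, int L1, int T1, int M2, int L2, int T2>
constexpr Quantity<M1 + M2, L1 + L2, T1 + T2> operator*(
    Quantity<M1, L1, T1> a, Quantity<M2, L2, T2> b) {
    return {a.value * b.value};
}

// Dividing quantities subtracts the exponents.
template <int M1, int L1, int T1, int M2, int L2, int T2>
constexpr Quantity<M1 - M2, L1 - L2, T1 - T2> operator/(
    Quantity<M1, L1, T1> a, Quantity<M2, L2, T2> b) {
    return {a.value / b.value};
}

// Comparison is only defined between quantities of the same dimension.
template <int M, int L, int T>
constexpr bool operator<(Quantity<M, L, T> a, Quantity<M, L, T> b) {
    return a.value < b.value;
}

using Kilograms = Quantity<1, 0, 0>;
using Meters = Quantity<0, 1, 0>;
using Seconds = Quantity<0, 0, 1>;
using Newtons = Quantity<1, 1, -2>;  // kg * m / s^2

inline bool does_material_break(Newtons breaking_strength,
                                Newtons incident_force) {
    return breaking_strength < incident_force;
}

inline void check_units() {
    Kilograms mass{2.0};
    Meters distance{3.0};
    Seconds elapsed{1.0};
    // The result type is deduced as Quantity<1, 1, -2>, i.e. Newtons.
    Newtons force = mass * distance / (elapsed * elapsed);
    assert(force.value == 6.0);
    assert(does_material_break(Newtons{5.0}, force));
    // does_material_break(Newtons{5.0}, mass);  // would not compile
}
```

In this picture a duration type is the special case with exponents (0, 0, 1), which is one way to read the question of whether a units framework would subsume the proposal.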
