Syntax idea for for_loop #456
Hi @JNmpi,
Extend .iter to macro programming
So currently we have something ever-so-slightly simpler than you posted (also updated to reflect that we're not registering nodes to self.nodes = my_path.MyNode.for_node(
iter_on="element",
obj='Al',
other=['Fe']
) Also, currently my_path.MyNode(a=1, b=2).iter(element=['Al', 'Fe']) is instantiating a node, then creating an ephemeral for-node, running it, and returning the result as a dataframe. So if we want it to do anything else, we'll need to abandon that functionality (I'm 100% fine with this, but just FYI since this is a feature you requested).
I'm afraid I need to push back a bit against this. I think we could do it with some complicated black-magic, but I'm not convinced that we're doing either the users or future maintainers any service by going through the contortions necessary to make it work. Assuming that we want to abandon the current
So far I'm totally on-board and this initially sounded good to me. But thinking about it deeper, what happens when some of the non-iterated input is coming from another node? wf.something = my_path.Something("foo")
wf.loop_node = my_path.MyNode(a=1, b=wf.something).iter(element=['Al', 'Fe']) Now this ephemeral wf.something = my_path.Something("foo")
wf.my_node = my_path.MyNode(a=1, b=wf.something, element='U')
wf.loop_node = wf.my_node.iter(element=['Al', 'Fe'])
Now
Alternative (that I also don't really like)
One thing we could do both easily and cleanly is to overload my_path.MyNode.iter(element=['Al', 'Fe'])(a=1, b=2) is equivalent to my_path.MyNode.for_node(
iter_on="element",
element=['Al', 'Fe']
)(a=1, b=2) which is in turn equivalent to Workflow.create.for_node(
body_node_class=my_path.MyNode,
iter_on="element",
element=['Al', 'Fe']
)(a=1, b=2) This is unambiguous and easy to implement and maintain, but note that the shortest-cut still differs fundamentally from the two more verbose approaches: in the latter two, because we've explicitly said which inputs are being looped on, we could move the my_path.MyNode.for_node(
iter_on="element",
element=['Al', 'Fe'],
a=1,
b=2
) For self.loop_child = my_path.MyNode.iter(element=['Al', 'Fe'])
self.loop_child.inputs.a = 1
self.loop_child.inputs.b = 2 So while this approach is explicit and easy to maintain, I still see it being a headache for new users who try something like self.loop_child = my_path.MyNode.iter(element=['Al', 'Fe'])(a=1, b=2) inside a macro definition, which doesn't work because it's actually running the node and assigning the run output as the attribute. I'm not totally convinced that we're in a "no free lunch" situation here, there might be an intuitive syntax tighter than (server comment moved to #475) |
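To make the distinction in this comment concrete, here is a minimal toy sketch. It is not the pyiron_workflow implementation: `ToyNode`, `ToyFor`, and their methods are hypothetical stand-ins, and the toy zips looped inputs together (which makes no difference with a single looped input). It only illustrates that a class-level `iter` shortcut hands back a loop node, while calling that node with the remaining inputs runs it and returns the result.

```python
class ToyFor:
    """Toy stand-in for a for-loop node (hypothetical, not the library class)."""

    def __init__(self, body, iter_on, **inputs):
        self.body = body            # function applied once per iteration
        self.iter_on = iter_on      # names of the inputs being looped over
        self.inputs = dict(inputs)  # looped and non-looped inputs together

    def run(self):
        fixed = {k: v for k, v in self.inputs.items() if k not in self.iter_on}
        looped = [self.inputs[k] for k in self.iter_on]
        # Assumption: looped inputs are zipped; the toy has no nested iteration
        return [
            self.body(**fixed, **dict(zip(self.iter_on, combo)))
            for combo in zip(*looped)
        ]

    def __call__(self, **inputs):
        # Calling the node updates its inputs and runs it immediately
        self.inputs.update(inputs)
        return self.run()


class ToyNode:
    """Toy body node exposing the verbose and the short class-level syntax."""

    @staticmethod
    def body(a, b, element):
        return f"{element}:{a + b}"

    @classmethod
    def for_node(cls, iter_on, **inputs):
        iter_on = (iter_on,) if isinstance(iter_on, str) else tuple(iter_on)
        return ToyFor(cls.body, iter_on, **inputs)

    @classmethod
    def iter(cls, **looped):
        # Shortcut: every keyword passed here is treated as a looped input
        return cls.for_node(iter_on=tuple(looped), **looped)


# Macro-style usage: keep the node, set the remaining inputs later
loop_child = ToyNode.iter(element=["Al", "Fe"])
loop_child.inputs["a"] = 1
loop_child.inputs["b"] = 2
deferred = loop_child.run()

# One-liner usage: the trailing call runs the node and returns its output
immediate = ToyNode.iter(element=["Al", "Fe"])(a=1, b=2)

# Verbose usage with the looped input named explicitly
explicit = ToyNode.for_node(
    iter_on="element", element=["Al", "Fe"], a=1, b=2
).run()
```

In this toy, `loop_child` is a node that can be stored as an attribute, whereas `immediate` is already a plain list of results, which is the pitfall described above for anyone assigning the one-liner inside a macro definition.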
Just some thoughts regarding the syntax suggestions for the for_loop. An easy and intuitive way to overcome the issues you described would be the following syntax:
gives a class object and can be used e.g. in writing a macro. To actually run it we would use:
This syntax avoids the pitfalls, and having to call run() explicitly makes it much more consistent with our existing syntax. The extra code/overhead to implement such a syntax is perfectly fine for me if it reduces the barriers for people to use our tools. |
No, I'm afraid that doesn't resolve things for me. This doesn't address my concern that Taking the rest very literally, having |
Your argument would be true if
would modify the input, and only when we run it would we have a conventional instance. My syntax suggestion is to use exactly this delayed feature to modify the input. So if we support delayed input, there is no reason not to extend it to iter etc. |
I'm afraid you've really lost me here. As you say, this instance runs behaviour held inside node = my_path.MyNode(a=1, b=2)
node.inputs.a = 3 updates the input. But then node.run() Does not return the "conventional instance"; it does not, in general, return I'm extremely, extremely uncomfortable with the thought of a node instantiation call -- Just to clarify the stakes here, any proposed changes are so that we can write my_path.MyNode(
a=1,
b=2
).iter(
element=['Al', 'Fe']
) instead of my_path.MyNode.for_node(
a=1,
b=2,
iter_on="element",
element=['Al', 'Fe']
) Honestly, the gains here just don't seem to be worth anything remotely approaching the costs. |
As in my first reply, I think the most reasonable way to accomplish this is to have the For users I still think this is a little sketchy as they can wind up destroying one of their node instances by calling an innocuous sounding method (ie one that isn't named "delete" or similar). So I'm not convinced it's great but neither do I think it's the worst. The more pressing issue is that this package has a really bad bus factor right now and for the sake of maintainability I don't think we should be adding features with this level of complexity to payoff. |
The main idea is not to create a new node or destroy the original one, but to modify the node. This is something the user experiences when calling .run(), which modifies the node by executing the function and setting the output. The same is true for another dot property - .inputs. Calling node.inputs modifies the node instead of returning a copy of the original. This behavior is made possible by delayed execution, which is a key design feature of our workflow approach. My idea is to extend exactly this existing behavior to provide high flow/control complexity with easy to understand syntax. Thus, if we made it a design criterion, applying the dot functions to the node would modify the existing node and return it (rather than a copy). This would open up a clean way to provide new paths for intuitive syntax constructs. It would also not add much complexity, since all this method would do is pass the input arguments to another function. |
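A rough sketch of what "modify the existing node and return it" could look like is given below. This is a toy under stated assumptions, not pyiron_workflow code: `ToyNode`, its `_looped` attribute, and the zip-style looping are all hypothetical and exist only to show the in-place idea.

```python
class ToyNode:
    """Toy node whose run behaviour is switched in place (hypothetical)."""

    def __init__(self, **inputs):
        self.inputs = dict(inputs)
        self.outputs = None
        self._looped = {}  # assumption: looped inputs are stored on the node

    def iter(self, **looped):
        # Record which inputs to loop over, then return the same object
        self._looped.update(looped)
        return self

    def _body(self, a, b, element):
        return f"{element}:{a + b}"

    def run(self):
        if not self._looped:
            self.outputs = self._body(**self.inputs)
        else:
            names = tuple(self._looped)
            # Looped values override the stored scalar input of the same name
            self.outputs = [
                self._body(**{**self.inputs, **dict(zip(names, combo))})
                for combo in zip(*self._looped.values())
            ]
        return self.outputs


node = ToyNode(a=1, b=2, element="U")
single = node.run()

same = node.iter(element=["Al", "Fe"])  # no copy is made
looped = same.run()
```

Here `same is node` holds, so nothing is created or destroyed. Note, though, that the output of the very same object changes from a single string to a list once `iter` has been called, so its interface is no longer fixed.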
Suppose we start with a function node First is for the The second is to change paradigms where for-looped behaviour is achieved in some fundamentally different way, and modifying the The third path is as I suggested in my first reply, that |
Since @JNmpi told me to leave a comment here, I’ll also weigh in. I’m also in favor of the syntax: wf.nodes = wf.create.my_path.MyNode(a=1, b=2).iter(element=['Al', 'Fe']) And from the impression that I got from this discussion, I think we can go for the third option that you named in the previous comment. The biggest concern that arises with this option is the kind of case that you have already named above: wf.something = my_path.Something("foo")
wf.my_node = my_path.MyNode(a=1, b=wf.something, element='U')
wf.loop_node = wf.my_node.iter(element=['Al', 'Fe']) But on a more conceptual level, I actually don’t really understand what this code snippet would mean in terms of workflow. For me it basically makes as little sense as this: wf.something = my_path.Something("foo")
wf.something.inputs.my_parameter.value = some_parameter
wf.something_else = wf.something So in my opinion there’s no need to be worried about this case. (server comment moved to #475) Alternatively, I’m also ok with the current wf.something = my_path.Something("foo")
wf.loop_node = my_path.MyNode(a=1, b=wf.something, element='U')
wf.loop_node.iter(element=['Al', 'Fe']) For me this is very much equivalent to @JNmpi’s suggestion and I guess this doesn’t create an orphan.
|
@samwaseda, good to hear your thoughts! I have been waiting in hope that someone beyond @JNmpi and me would get involved here -- if this package is going to survive as an open source project, it is critical that someone other than me be introducing new features.
Agreed! So.... do you want to do it? 😈 It's really not too hard; it's just a bit gritty, because to do it well one needs to understand a bunch of different aspects of how the nodes behave. Here is a mock-up guide for what I have in mind:
def StaticIO.iter(self, **kwargs) -> For:
node_kwargs, iterating_input = self._parse_iter_kwargs(kwargs)
original_state = self.__getstate__()
updated_state = self._fix_iterating_type_hints_and_values(original_state, iterating_input)
iterable = self.__class__.for_node(
iter_on=tuple(iterating_input.keys()),
)
iterable.__setstate__(updated_state)
if self.executor is not None:
# Executors get purged from state because they are not pickleable
iterable.executor = self.executor
broken_connections = self.disconnect()
# self._reform_connections(iterable, broken_connections)
# Honestly I don't know how self.disconnect will interact with having copied the state...
if (parent := self.parent) is not None:
# Parents get purged from state so serialized nodes don't take their whole graph with them
warnings.warn("Maybe we should let people know they are losing a child??")
parent.remove_child(self)
iterable.parent = parent
return iterable
Additionally, we'll probably want to pass annotations from the child class onto the iter method so users get docstring and tab completion about what input channel names are available. There's also a couple commented lines there with connections -- channels similarly break their connections on Anyway, only takes about 10 minutes to draft the outline of how to do it, but there are some details that will be a bit time consuming, and what I think will really take time is writing tests to make sure the edge cases are all captured. So it's all doable, but (a) I don't have time to do it right now, and (b) at some point somebody else needs to be able to maintain and debug these things, and while very easy on the user-side this is a feature that introduces complexity on the developer side.
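One small Python detail matters for the parent handling in a mock-up like this: the assignment expression `:=` binds more loosely than `is not`, so the parentheses decide whether the name holds the parent or just a boolean. The `Node` class below is a hypothetical stand-in used only to show the difference.

```python
class Node:
    """Minimal stand-in with only a parent attribute (hypothetical)."""

    def __init__(self, parent=None):
        self.parent = parent


child = Node(parent="workflow")

# Without parentheses the comparison is evaluated first, so the name
# is bound to its boolean result
if parent := child.parent is not None:
    unparenthesized = parent

# With parentheses the name is bound to the parent itself, and the
# comparison is then applied to that value
if (parent := child.parent) is not None:
    parenthesized = parent
```

Only the parenthesized form leaves `parent` usable for follow-up calls such as removing the child or re-parenting the new node.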
I have a non-blocking disagreement with you here. "Well, this is a silly way to do things so users won't do it and thus we don't need to worry about it" is really, really bad defense. Don't underestimate the ridiculousness of our users! (Sorry, users 😂) With that said, I agree that it's unlikely enough that we can get away with a warning or log entry that it looks like maybe they've lost something they cared about and leave it at that. But we shouldn't let it pass completely silently!
Yes, we could replace the current wf.something = my_path.Something("foo")
wf.body_node = my_path.MyNode(a=1, b=wf.something, element='U')
wf.loop_node = wf.body_node.iter(element=['Al', 'Fe']) In this case
I think this is related to a "new paradigm". Perhaps Joerg can speak more to it, but the only interpretation I had was that Maybe it instantiates a sort of meta-node that only metastasizes into an actual node with some fixed behaviour (i.e. run the function wrapped by the node definition directly or iterate over the wrapped function?) at runtime? I can't rule this sort of thing out exactly because it's a different paradigm, but it is definitely far from what we have, so regardless of whether or not it's a good idea it would be a very time consuming idea. |
What I meant was to apply it really in-place, so my example was as it was: wf.something = my_path.Something("foo")
wf.loop_node = my_path.MyNode(a=1, b=wf.something, element='U')
wf.loop_node.iter(element=['Al', 'Fe']) The reason is that I started seeing the point that it might be a problem that
Hmm, that was not really the impression that I got when I talked to him. I got the feeling that it was more like "you can change the input values via |
Ah, I indeed misunderstood your example. I think maybe I chose the term "orphan" poorly; I don't mean there's a node that used to have a parent and now it doesn't, I mean there's a I don't really see room for an "in place modification" that, for instance, transforms our So to me this is what the different paradigm is, it's when I can't rule out such a different paradigm, but it is certainly very different. A major disadvantage I see is that the IO type hinting (which is strictly enforced by default) differs between the looped and non-looped versions, so we couldn't make (typed) IO available at the class level as we currently do. This is just one of many things that would change if for-loops are not simply composite nodes but some other entirely different beast. I have no clear picture for what rules would govern such a different paradigm that is not based around nodes with static IO, but I'm relatively confident it would not be a quick transition, and so far don't see that it's critical to throw out the current paradigm. |
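The type-hinting point can be illustrated with plain Python annotations. The sketch below is not how pyiron_workflow represents IO; `body` and `looped_hints` are hypothetical, and wrapping looped inputs in `list[...]` is an assumption made for illustration. It only shows that the looped and non-looped versions of the same function end up with different input hints.

```python
from typing import get_type_hints


def body(a: int, b: int, element: str) -> str:
    """Hypothetical function wrapped by a node."""
    return f"{element}:{a + b}"


def looped_hints(func, iter_on):
    """Derive input hints for a looped version of func (assumed convention)."""
    hints = get_type_hints(func)
    hints.pop("return", None)
    # Inputs that are looped over take a list of the original type
    return {
        name: (list[hint] if name in iter_on else hint)
        for name, hint in hints.items()
    }


plain = {k: v for k, v in get_type_hints(body).items() if k != "return"}
looped = looped_hints(body, iter_on=("element",))
```

In this toy, `plain["element"]` is `str` while `looped["element"]` is `list[str]`, so a single object that switches between the two behaviours cannot advertise one fixed set of typed inputs at the class level.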
I think I see where the two of you are coming from -- a node does a thing when you run it, and you would like to be able to tell the node to instead loop over doing that thing when you run it. But that's just not how the current infrastructure is set up; nodes do what they do, and if you want them to do something different then you need a different node. This is coupled to all sorts of systems: to type hinting and connection validation, to parallelization, to provenance tracking, etc. And it is largely sensible because we start with this paradigm that there's nodes, we know their interface (IO) at the class level, and when you run them they do the thing they do. From this pretty simple axiom you can derive how all sorts of other pieces "ought" to behave pretty nicely, with extremely few exceptions and only a couple other axioms needed to build the entire workflow paradigm (mostly with how data/signal connections behave). Again, there might be some other internally consistent paradigm where a node-like thing sometimes runs one |
@liamhuber, below are a couple of ideas and thoughts to make the syntax for controls such as for loops and executors in pyiron_workflows easier. They are not intended as a final solution but more to start a discussion of how features that are considered by users to be (too) complex can be made easier and more intuitive.
Extend .iter to macro programming
This should be easy to achieve by taking all relevant info to perform the present definition, i.e.,
This would be much more intuitive. During the workshop people were actually trying such a construct.
(server comments moved to #475)