Workflow class in pyiron_contrib as basis for ironflow #194
Hi @JNmpi, thanks for the feedback!
Agreed. In terms of timeframe, I think I want to wait until the stuff in contrib actually gets moved to base before translating ironflow on top of it. Currently it is still evolving too quickly and missing key features the ryven+ironflow set has (e.g. ontological typing). I am finding it very useful when developing the workflow stuff in contrib to keep the eventual GUI representation in mind -- it helps to streamline the logic.
So, good news: some of these are already implemented! After our last conversation, I worked on the
Good idea! I haven't implemented this yet, but it's easy. For
This works! So does item access, e.g. `wf.my_array_or_list_node[some_starting_index:some_ending_index]`.
I also like this. It will muddy the source code a little bit, because the code that interprets connections needs an extra
This is also done! The core
I like these ideas and they seem achievable, but I haven't done enough work yet on these topics to have anything really concrete to say. The only difficult thing I see is here:
In many graphs, I think it's impossible to say what the "final" node is, so this will be difficult to fully automate. However, we could probably handle the easy cases, like when the workflow has only a single unconnected output, or allow users to manually specify which output to use when creating the workflow/macro.
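For the manual option, a hypothetical interface could look like this (`wf.nodes`, `channel.connections`, and the helper itself are all assumptions, not the current API):

```python
def resolve_final_output(wf, output_node=None):
    # Hypothetical helper: nothing like this exists in pyiron_contrib yet.
    if output_node is not None:
        # the user explicitly named the node representing the workflow result
        return wf.nodes[output_node].outputs
    unconnected = [
        channel
        for node in wf.nodes.values()
        for channel in node.outputs
        if not channel.connections
    ]
    if len(unconnected) == 1:
        return unconnected[0]  # the easy case: exactly one open output
    raise ValueError("Ambiguous final output; please specify output_node")
```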
Long story short: I agree with the principle, and I think it's possible, but the implementation will take some thinking/cleverness. This is something that Marvin and I have talked about in this tinybase PR, e.g. in this comment. Overall, I am in complete agreement that collecting IO together in composable classes is desirable. The catch is to make this compatible with a lazy graph-based approach. E.g., this works quite nicely:

```python
plt.scatter(wf.my_md_node.outputs.md_output.temperatures)
```

But if

```python
wf.create.node.scatter_node(y=wf.my_md_node.outputs.md_output.temperatures)
```

In such an example simply pointing to

I think the trick will be to get the
I know we talked about taking file paths as IO, which is a good and necessary idea, but I'm afraid this example is a little sparse for me to get what you're driving at. Could you expand?
Yep, absolutely! This is another thing where I agree 100% and think it should be very doable, but I just haven't done enough to have a particular comment.
Hi @liamhuber, thanks for your super fast reply and for translating some of the ideas so quickly into code. Below are some thoughts/comments regarding the discussions above:
Jan is pushing towards keeping pyiron_base small and actually reducing its functionality to make it easier to maintain and less complex. Thus, the best strategy would be to develop these concepts independently of pyiron_base until they reach the necessary functionality and can replace pyiron_base or be integrated into it.
Thanks! I tried the new features, including the ones you implemented yesterday, and they are really fun to use.
An idea would be that the data object is used in the function definition but 'translated' by the decorator into a real object containing all the subparts of the output. Thus, node.outputs.x would be directly available, and no magic is needed in the function call.
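A minimal sketch of the idea with a plain dataclass and a hypothetical standalone decorator (the real decorator would create output channels rather than returning a dict):

```python
from dataclasses import dataclass, fields


@dataclass
class MDOutput:
    steps: list
    temperatures: list


def unpack_dataclass_outputs(func):
    # Hypothetical: read the dataclass from the return annotation and
    # expose one output label per field, so node.outputs.temperatures
    # can exist without any magic inside the function body.
    labels = [f.name for f in fields(func.__annotations__["return"])]

    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        return {label: getattr(result, label) for label in labels}

    wrapper.output_labels = labels
    return wrapper


@unpack_dataclass_outputs
def calc_md(n_steps: int = 100) -> MDOutput:
    # stand-in for a real MD run
    return MDOutput(steps=list(range(n_steps)), temperatures=[300.0] * n_steps)
```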
Below is a sketch of the functionality I had in mind. Essentially, the workflow's transfer of data should be extended to files.
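A minimal sketch of what such a file channel could look like (the `FileObject` class here is hypothetical):

```python
from pathlib import Path
from shutil import copy2


class FileObject:
    """Hypothetical wrapper passed between nodes instead of a bare path,
    so the workflow can copy/transfer files when data crosses node
    (or machine) boundaries."""

    def __init__(self, path: str):
        self.path = Path(path)

    def copy_to(self, directory: str) -> "FileObject":
        # copy the underlying file into the receiving node's directory
        target = Path(directory) / self.path.name
        target.parent.mkdir(parents=True, exist_ok=True)
        copy2(self.path, target)
        return FileObject(str(target))
```

A Vasp-like node would then take and return FileObject instances (e.g. INCAR in, OUTCAR out) instead of raw strings.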
@liamhuber, one more issue. When playing with your latest workflow_example notebook and extending it, I encountered some strange behavior. I first extended your plot node to also show labels (it is really easy now to make such extensions):
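Roughly along these lines (the exact snippet is not shown here, so the parameter names are assumed):

```python
import matplotlib.pyplot as plt
from pyiron_contrib.workflow import Workflow


@Workflow.wrap_as.single_value_node("fig")
def scatter(x=None, y=None, plt_label=None):
    print(x, y, plt_label)
    plt.scatter(x, y, label=plt_label)
    plt.legend()
    return plt
```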
Using this modified function, I then ran your pyiron example workflow:
If I then change the element to Cu, I get a superposition of the Al and Cu graphs. This is a super-nice feature. However, in the print statement and in the label this change is somehow not reflected, which is really strange since the correct element and structure are entering the workflow:
Any idea what is going on and how to cure it?
Another small idea/suggestion: to work effectively with the new objects/nodes it would be helpful if they had nice repr info. For example, when calling
it would be nice to see which labels are available. This also applies to the outputs as well as some other objects. Generally, for every user-accessible object, it would be nice to get not the cryptic default message but the relevant info about the content of the object.

One last observation: when using label as a parameter name (e.g. in the plot example above) things were not working. I guess this is related to label being a function for inputs. Since label is a rather common name for a parameter, it would be nice to enable it, or at least throw an error message if the user wants to use this name.
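A sketch of the kind of repr meant here, assuming a container object that knows its channels (not the actual classes):

```python
class ChannelContainer:
    """Hypothetical stand-in for a node's inputs/outputs container."""

    def __init__(self, **channels):
        self.channels = channels  # label -> current value

    def __repr__(self):
        # show the available labels and values instead of the
        # default cryptic <object at 0x...> message
        lines = [f"{self.__class__.__name__} with channels:"]
        lines += [f"  {label}: {value!r}" for label, value in self.channels.items()]
        return "\n".join(lines)
```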
@JNmpi, thanks for all the great feedback!

Timeframe

Yes, I really like this direction for base. I think we're on the same page then. I mostly meant that while layering ironflow on top of this workflow stuff is absolutely the plan, it's not imminent.

IO classes

Yes, wrapping the raw data classes is a great idea! I wasn't thinking in this direction yet, so it's good you brought it up, but the wrappers/decorators are working so well for packaging nodes that I'm very optimistic about a similar attack here.

File-based functionality

Ok, I think I see where you're going now. This sort of thing is definitely necessary for operations like Vasp, and I also see how it reflects your ideas on a SnakeMake node.

Plot extension
Yep, I see the issue! The catch is that the plot argument is set to a value rather than a channel. You're actually disallowed from connecting the input of one node to the input of the other, so just removing

If you want to re-use the chemistry information, there are two generic possibilities: create an upstream source for this information and pass it to both the structure and the plot nodes, or

Here's some example code that uses the former approach (should work with the current HEAD of pyiron_contrib):

```python
from typing import Optional

import matplotlib.pyplot as plt
import numpy as np

from pyiron_contrib.workflow import Workflow


@Workflow.wrap_as.single_value_node("fig")
def scatter(
    x: Optional[list | np.ndarray] = None,
    y: Optional[list | np.ndarray] = None,
    plt_label: Optional[str] = None
):
    print(x, y, plt_label)
    plt.scatter(x, y, label=plt_label)
    plt.legend()
    return plt


@Workflow.wrap_as.single_value_node("element")
def element(element: str = "Al"):
    return element
wf = Workflow("with_prebuilt")

wf.element_chooser = element()
wf.structure = wf.add.atomistics.BulkStructure(
    repeat=3,
    cubic=True,
    element=wf.element_chooser
)
wf.engine = wf.add.atomistics.Lammps(structure=wf.structure)
wf.calc = wf.add.atomistics.CalcMd(
    job=wf.engine,
    run_on_updates=True,
    update_on_instantiation=True
)
wf.plot = scatter(
    x=wf.calc.outputs.steps,
    y=wf.calc.outputs.temperature,
    plt_label=wf.element_chooser
)
```

Then update the species to generate the layered plot with

The fact the plots get layered was actually not intentional on my part. And now you'll see it can really be a bit of a pain, because we actually get three plots! This is perfectly sensible, albeit inconvenient, because the

We could avoid this by manually managing the flow control, e.g. setting
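Presumably something like the following, assuming user-wrapped nodes accept the same run_on_updates kwarg that CalcMd takes above:

```python
wf.plot = scatter(
    x=wf.calc.outputs.steps,
    y=wf.calc.outputs.temperature,
    plt_label=wf.element_chooser,
    run_on_updates=False,  # assumed kwarg, mirroring CalcMd above
)
wf.plot.run()  # one explicit run, so exactly one plot
```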
Representations

This actually surprised me, because I thought it was already there! It turns out there were two problems. As a temporary stopgap I've added an extra method. These fixes are here, and are already available on HEAD.
"label" and naming

IMO, this is a little bit tricky from a UX perspective. I actually switched to using "label" from "name" exactly because I thought it wasn't so common! I still think that's true, but obviously I wasn't in a matplotlib state of mind 🤣

The trouble is that this variable name needs to be short, descriptive, easy to understand/remember, and not commonly used by any other developers. We could solve this pretty easily by adding specificity, i.e.

I'm not closed to changing this, I just don't currently have any ideas I like better than "label". In the meantime, you're absolutely right that a better error is needed when users try to define a node with protected channel names. This is already the case for, e.g., assigning nodes to workflows with names that conflict with workflow attributes, so I'll do something similar for nodes.
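A sketch of the kind of guard meant here (the reserved set and helper are assumptions):

```python
RESERVED_CHANNEL_NAMES = {"label", "value", "connections"}  # assumed set


def check_channel_name(name: str) -> None:
    # fail loudly with a readable message instead of breaking obscurely
    if name in RESERVED_CHANNEL_NAMES:
        raise ValueError(
            f"'{name}' is reserved for channel attributes; please choose "
            f"a different parameter name for your node function."
        )
```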
@liamhuber, thanks again for your fast and very helpful reply. Some more thoughts regarding the pyiron plot example. Based on your explanation and example I came up with the following code that behaves as intended:
and calling the workflow for different elements via:
While the above code is working, the syntax is, for my taste, not as simple and flexible as I would like. An idea would be to overload the call operator to generate new instances:
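Something along these lines (pseudocode, reusing the node names from the example above):

```python
# calling an existing workflow creates a named copy of it
wf_Ni = wf("with_prebuilt_Ni")
# every node's input is then reachable on the copy
wf_Ni.element_chooser.inputs.element = "Ni"
wf_Ni.run()
```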
The above pseudocode has, in my opinion, several advantages. It allows access to all input parameters in any node of the workflow in a highly intuitive manner. An additional bonus is that the syntax is similar to our present pyiron notation. Where I am not so sure is the first line. Effectively it combines a copy statement (to create a new workflow instance) with giving it a name. It thus uses a workflow instance as a class definition. This may be a bit unusual, but I think I could quickly get used to it. If the input is on the node level we could consider the following pseudocode:
I like your wrapper idea. Using the workflow instance as a class might be possible, e.g. by overloading `__call__`.

```python
from pyiron_contrib.workflow import Workflow

@Workflow.wrap_as.node("y")
def some_node(x):
    return x

some_node_instance = some_node(label="foo")
wf = Workflow("original", some_node_instance, bar=some_node())
# Initializes the workflow with two nodes
# the arg parsing is already working, and the node keeps the label "foo"
# kwarg parsing is not implemented, but I imagine it adds the node and re-labels it,
# in this case to "bar"
wf.input.foo_x = 5
# Currently, workflows dynamically look for unconnected channels
# and expose them as {node.label}_{channel.label}, so this works
new_wf = Workflow(workflow=wf, label="new", bar_x=6, baz=some_node())
# Where...
# the workflow kwarg lets us know we're initializing as a deep copy
# the label kwarg is optional and obvious
# (might be mandatory later, e.g. if labels set working directories or something)
# the bar_x kwarg gets found in inputs, and is used to initialize the value
# the baz kwarg is _not_ found in inputs and its value is a node, so we add it like usual
```

One thing you couldn't do this way is something like

So for your example above it would just be:

```python
new_wf = Workflow(workflow=wf, label="new_wf_name", element_chooser_element="Ni")
new_wf.run()
# (although the Workflow.run stuff still needs to be implemented)
```

Under the hood, I think this gives a relatively tight and understandable syntax without the implementation headaches of overloading `__call__`.

I'm a bit under the weather today, so unfortunately I don't think I'll get to the implementation of these ideas until tomorrow or Friday.
There is no urgency in discussing these issues. The discussion mainly serves to write down and sort some of my ideas and to check them with you. With respect to instantiating a new workflow, I see your points. What I am not yet happy with is that a node has a different look and feel in the above-suggested implementation. The picture/use case I have in mind is that the user constructs a workflow with a specific set of default parameters for each node it contains. This workflow can be uploaded to a database (e.g. a node/workflow repository, by providing a semantic path). The user could then access the node via
and providing structured input down to each node:
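In pseudocode, something like this (the repository access and input layout are entirely hypothetical):

```python
# fetch a workflow template via its semantic path from a workflow repository
wf = Workflow.from_repository("atomistics/lammps/md")  # hypothetical API
# structured input: every node's parameters are addressable without the
# workflow author having exposed them one by one
wf.input.structure.element = "Ni"
wf.input.calc.n_steps = 10_000
wf.run()
```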
What I would like to have is that all parameters are available without having to define a single_node decorator for each variable that I want to expose to the user. The original workflow with its specific parameter settings then acts as a default template, where, to modify it, I have to change only a few elements. This structure corresponds to our present job structure, only making it much easier to write and implement new job types. A further advantage is that workflow templates can be created both graphically (via ironflow) and directly from code. As you mentioned, we will probably need a few iterations to come up with an intuitive and flexible solution.
Summary
The functionality of @liamhuber's workflow class in pyiron_contrib could be a good starting point for interesting extensions and for merging pyiron_base and ironflow.
@liamhuber, I only now had a chance to install your workflow demo class and play with it. It is really great work. Even though it is presently only a demo, I see a lot of potential. In the attached Jupyter notebook I tried to briefly summarize and sketch some of my ideas. It is only a very preliminary sketch and it would be good to talk about it. The notebook contains many ideas and suggestions. Some of them are probably straightforward to implement; others may be hard or impossible. It is meant more as a collection of ideas than a list of tasks.
Node-based pyiron concept
Based on the workflow example implemented in pyiron_contrib by Liam, a few suggestions/ideas:
Make Workflow the next generation pyiron Project object
Design concepts
A node with a single output should expose this output directly at the object level
This node object should behave as similarly as possible to the output object, i.e., its representation should be the same
To set a node link, the node reference should be sufficient
The original code block can then be simplified:
New version (pseudocode):
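A sketch along these lines, borrowing the node names from the earlier example:

```python
wf = Workflow("simplified")
wf.element = element()  # a single-output node stands in for its output
wf.structure = wf.add.atomistics.BulkStructure(
    element=wf.element,  # link by node reference alone, no .outputs.<label>
    repeat=3,
    cubic=True,
)
wf.engine = wf.add.atomistics.Lammps(structure=wf.structure)
wf.calc = wf.add.atomistics.CalcMd(job=wf.engine)
wf.plot = scatter(x=wf.calc.outputs.steps, y=wf.calc.outputs.temperature)
```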
Execute workflow
Utility functions
Create a new workflow with the same structure but modified parameters
Extend data objects
In nodes such as calc_MD the number of output objects/values is rather large. It would therefore be highly attractive to replace them with data objects, which could provide additional functionality (e.g. ontology). The decorator should automatically translate the object to provide all fields. Alternatively, we could use ontology to connect individual elements (e.g. positions) with such a data object (the ontology knows that positions is part of MD_data).
Extend input and output
File objects
Make sure that files are correctly transferred, copied, etc.
Storage
Include (HDF5) storage in workflow object
When a node produces output, put the (filtered) data in the workflow data store. This should replace our present HDF5 storage.
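A minimal sketch of that hook with h5py (the callback signature and node attributes are assumptions):

```python
import h5py


def on_node_finished(workflow, node):
    # hypothetical hook: write the node's (filtered) output channels
    # into the workflow's HDF5 store after each run
    with h5py.File(f"{workflow.label}.h5", "a") as store:
        group = store.require_group(node.label)
        for label, value in node.outputs.items():
            if label in group:
                del group[label]  # overwrite on re-run
            group[label] = value
```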