Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

snapshot: gather full pci tree, handling bridges #224

Merged
merged 3 commits into from
Mar 9, 2021

Conversation

ffromani
Copy link
Collaborator

@ffromani ffromani commented Feb 5, 2021

the current implementation of the PCI device snapshot only scans the first level in the PCI tree (actually, PCI forest). Anything attached to bridges is ignored. This was unintended, and this PR aims to fix it acknowledging bridges explictely and handling them correctly. In practices, this means to scan also the devices attached to each bridge, recursively.

Note this affects only the snapshotting code.

@ffromani
Copy link
Collaborator Author

ffromani commented Feb 5, 2021

Ready for design review; docstrings are missing, and I need to review unit tests. hence the WIP. Conflicts with #223

@ffromani ffromani mentioned this pull request Feb 5, 2021
@ffromani ffromani changed the title WIP: snapshot: gather full pci tree, handling bridges snapshot: gather full pci tree, handling bridges Feb 8, 2021
ffromani added a commit to ffromani/ghw that referenced this pull request Feb 8, 2021
We want to expose the NUMA locality of the PCI devices,
much like we already do for GPU devices.
This is very important for high-end, high-performance devices
(NICs, storage controllers).

To enable testing, we update the snapshot with another one
captured with more PCI device informations.
See: jaypipes#224

Signed-off-by: Francesco Romani <[email protected]>
pkg/pci/pci_linux.go Outdated Show resolved Hide resolved
@ffromani ffromani force-pushed the snapshot-pci-tree branch 3 times, most recently from aba6ca6 to 7ec16bd Compare February 11, 2021 15:41
ffromani added a commit to ffromani/ghw that referenced this pull request Feb 11, 2021
We want to expose the NUMA locality of the PCI devices,
much like we already do for GPU devices.
This is very important for high-end, high-performance devices
(NICs, storage controllers).

To enable testing, we update the snapshot with another one
captured with more PCI device informations.
See: jaypipes#224

Signed-off-by: Francesco Romani <[email protected]>
@ffromani
Copy link
Collaborator Author

clarified (again) the commit message

@ffromani
Copy link
Collaborator Author

still not completely happy. Let me evaluate another approach, will update ASAP.

@ffromani
Copy link
Collaborator Author

I'm still testing 9b19ce6 and I'm happier with this approach so far.

@ffromani ffromani force-pushed the snapshot-pci-tree branch 2 times, most recently from fa85ffd to f08ab4b Compare February 22, 2021 09:20
ffromani added a commit to ffromani/ghw that referenced this pull request Feb 23, 2021
We want to expose the NUMA locality of the PCI devices,
much like we already do for GPU devices.
This is very important for high-end, high-performance devices
(NICs, storage controllers).

To enable testing, we update the snapshot with another one
captured with more PCI device informations.
See: jaypipes#224

Signed-off-by: Francesco Romani <[email protected]>
@ffromani
Copy link
Collaborator Author

rebased on top of #227 . I think this PR is now good enough.

@ffromani
Copy link
Collaborator Author

rebased again. No code changes.

Copy link
Owner

@jaypipes jaypipes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, @fromanirh. Just one little request inline to return an error from findPCIEntryFromPath, otherwise this is awesome.

pkg/snapshot/clonetree_net.go Show resolved Hide resolved
pkg/snapshot/clonetree_pci.go Show resolved Hide resolved
pkg/snapshot/clonetree_pci.go Outdated Show resolved Hide resolved
pkg/snapshot/clonetree_pci.go Outdated Show resolved Hide resolved
pkg/snapshot/clonetree_pci.go Show resolved Hide resolved
ffromani added 2 commits March 1, 2021 09:19
In order to maximize the code reuse, we move the code which handles
PCI addresses to its own subpackage.
This is driven by a upcoming patch, which wants to parse PCI addresses
from strings, without consuming the pci package, to avoid circular
imports.

Only trivial code movement + needed renames
e.g.: AddressFromString() -> address.FromString()

Signed-off-by: Francesco Romani <[email protected]>
let's add Warn()ings when PCI scan fails, to improve
the troubleshooting process.

Signed-off-by: Francesco Romani <[email protected]>
Modern (x86_64) systems have multiple PCI buses.
For ghw needs, considering the abstraction provided by the linux kernel,
there is no practical difference from PCI-express and PCI to date,
so we will call them all "PCI".

PCI-to-PCI (see initial note about PCI express) are common, and their
topology depends on the CPU, mainboard and system manufacturer choice.

The linux kernel does not flatten the PCI tree in sysfs.
This choice provides the closest representation to reality, and
unfortunately is also easy to overlook
when exploring simple PCI device trees, such as ones found on laptops,
but becomes evident when exploring more complex trees, like the ones
we can see on servers, even low end.

This means, in case of PCI bridges, we will have
entries inside the directory which represent a bus which
will be both devices and root of subtrees.

In other words, these entries will contain BOTH subdirectories
to represent the devices attached to that bridge, and the files
which describe the bridge device per se.

For ghw, the net effect is we cannot just scan the entries
in `/sys/bus/pci/devices` and be done with it; we need to detect
the entries which represent PCI bridges, and scan those recursively.
Otherwise, we miss to gather the data pertaining to these devices.

So we need a smarter function that does this traversal, moving away
from the nice static list of paths we previously had.

Without this patch (example taken from dev laptop):
{host bridge}
-[0000:00]-+-00.0
           +-02.0
           +-14.0
           +-14.2
           +-16.0
           +-16.3
           +-1c.0 {PCI bridge}
           +-1c.2 {PCI bridge}
           +-1d.0 {PCI bridge}
           +-1f.0
           +-1f.2
           +-1f.3
           +-1f.4
           \-1f.6

With this patch (example taken from dev laptop):
{host bridge}
-[0000:00]-+-00.0
           +-02.0
           +-14.0
           +-14.2
           +-16.0
           +-16.3
           +-1c.0 | -+-3a {PCI bridge}
                     \-00.0 ** PREVIOUSLY NOT DETECTED!
           +-1c.2 | -+-3c {PCI bridge}
                     \-00.0 ** PREVIOUSLY NOT DETECTED!
           +-1d.0 {PCI bridge - no devices here}
           +-1f.0
           +-1f.2
           +-1f.3
           +-1f.4
           \-1f.6

Signed-off-by: Francesco Romani <[email protected]>
@ffromani ffromani force-pushed the snapshot-pci-tree branch from 341af48 to 1f10653 Compare March 1, 2021 08:19
Copy link
Owner

@jaypipes jaypipes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you @fromanirh! :)

@jaypipes jaypipes merged commit b593e32 into jaypipes:master Mar 9, 2021
@jaypipes jaypipes added this to the v0.7 milestone Mar 9, 2021
@ffromani ffromani deleted the snapshot-pci-tree branch March 9, 2021 06:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants