
YouTube:https://www.youtube.com/watch?v=zgqISj_5YGI
Text:
I'll try to explain the view of EuroHPC, and especially of the scientists who use supercomputers, on how they plan to use CXL. This is based on our experience in quite a number of European projects in which CXL is expected to play a big role.

So first, starting with SiPearl: what do we do? We design microprocessors for supercomputers, this type of device. On this die we plan to integrate up to 60 billion transistors. Our partners, typically Eviden and HPE, will then integrate it into blades, racks, and finally supercomputers. A supercomputer of this type may contain between 5K and 150K microprocessors, so energy efficiency is key. With that many microprocessors, we make the chip energy efficient by betting on an advanced process node, 6 nanometers: packing more and more transistors into the same area reduces energy consumption.

Before jumping into CXL, I will introduce SiPearl, because we are quite new in the ecosystem. As I said in the introduction, our DNA is European. It starts from the fact that today one third of the supercomputers are based in Europe, but only 5% of the technology comes from Europe, and 0% of the microprocessors. For the European Union this is a concern, especially for sovereignty, IP, and security, so it was decided that Europe should become more independent. A consortium called EPI, the European Processor Initiative, was created in 2018, and SiPearl is the industrial arm of that consortium. We started in 2019, and today we have 130 employees across France, Germany, and Spain. Recently, as you may have read in the press, we closed our Series A, so we now have more and more investors. There is still European funding, from the European Investment Bank and the European Innovation Council. We also have funding from the French sovereignty fund, and Eviden and Arm are now stakeholders of SiPearl.

Our architecture is based on Arm. The reason is that with Arm we can build an energy-efficient microprocessor, and it is the best bet for a quick time to market. Arm also has a proven ecosystem through Arm SystemReady, and that is the path we want to take to be ready for the market. Today we are of course fabless, and we plan to manufacture in Taiwan with TSMC; at this technology node, that is unavoidable.

Okay, now a short slide on EPI, the ecosystem in which we operate. EPI was founded in 2018 at the initiative of Atos, now Eviden. In this ecosystem we can leverage technological input from our industrial partners. You can also see that it includes many research institutes, universities, and supercomputing centers, which are our final customers. These are the people who give us input and explain how they want to use CXL.

Okay, so EPI is not our only ecosystem. We also have very strong partnerships with leading supercomputer manufacturers like Eviden and HPE. And since we build a general-purpose processor, we also have to anticipate integration with accelerator specialists, so we have close partnerships with the four major players in this market: AMD, Graphcore, Intel, and NVIDIA.

Okay, now a snapshot of the high-level architecture of our first-generation microprocessor, which we call Rhea. It is based on a set of Arm Neoverse V1 cores arranged around Arm's mesh fabric, the CMN-700 Coherent Mesh Network. Interestingly, each core also supports SVE, the Scalable Vector Extension, which will be used to increase application performance. We have also integrated high-bandwidth memory, HBM, which gives us very large bandwidth. Because the HBM sits in the same package as the compute chips, we save energy and also get very low latency. We want to strike a balance between HBM, DDR, and CXL memory. We also have a number of PCIe/CXL controllers. I cannot disclose the exact number, but it is quite a large number of CXL interfaces, and we can configure them as either root ports or endpoints. Those are our building blocks.
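The talk does not say how data will be split between HBM, DDR, and CXL memory. As a purely illustrative sketch, one common heuristic is to put the buffers with the highest access density (accesses per GiB, as measured by profiling) in the fastest tier that still has room. The tier capacities, buffer names, and access counts below are hypothetical. They are not Rhea figures, and this is not presented as SiPearl's method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Buffer:
    name: str
    size_gib: float
    accesses: int  # hypothetical access count, e.g. from profiling a reference dataset


def place(buffers, tiers):
    """Greedy tiering: densest buffers go to the fastest tier with room left.

    tiers: list of (tier_name, capacity_gib) pairs, fastest tier first.
    Returns a dict mapping buffer name -> tier name.
    """
    remaining = {tier: capacity for tier, capacity in tiers}
    placement = {}
    # Highest accesses-per-GiB first; sorted() is stable, so ties keep input order.
    for buf in sorted(buffers, key=lambda b: b.accesses / b.size_gib, reverse=True):
        for tier, _ in tiers:
            if buf.size_gib <= remaining[tier]:
                remaining[tier] -= buf.size_gib
                placement[buf.name] = tier
                break
        else:
            raise MemoryError(f"{buf.name} does not fit in any tier")
    return placement


if __name__ == "__main__":
    # Hypothetical capacities (GiB) and profile data, for illustration only.
    tiers = [("HBM", 64), ("DDR", 256), ("CXL", 1024)]
    buffers = [
        Buffer("field", 48, 9_600),
        Buffer("particles", 100, 10_000),
        Buffer("mesh", 30, 3_000),
        Buffer("checkpoint", 500, 500),
    ]
    print(place(buffers, tiers))
```

Greedy-by-density is a heuristic and is not guaranteed to be optimal. In this example, "field" takes HBM and leaves 16 GiB free, but that space is too small for "particles" or "mesh", so both go to DDR.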

Next, let's see how European projects can leverage CXL for their use cases.

Okay, so it all comes from EuroHPC and EPI. We actually have two flagship projects. On the left is the general-purpose processor, Rhea, which we are building at SiPearl. On the other side is the EUPILOT project, which combines open-source software and hardware stacks to create infrastructure based on RISC-V chips. This SoC will be combined with our general-purpose processor, and the link between the two will be based on the CXL interface. From these flagship projects we create software development vehicles. SiPearl is anticipating this by building a server that we can distribute and make available to our customers, and along the EUPILOT path a PCIe acceleration platform will be created. These will be the inputs for downstream application projects involving the scientific community. I'm citing three of these projects here, RISER, OpenCUBE, and PLASMA, but there are many more. I will go into more detail on two of them later so that you can see how they want to leverage CXL. For the European HPC ecosystem, CXL is perceived as a unique standard that will provide a coherent link between the general-purpose processor and accelerators. I'm not saying there is no interest in Type 3 and memory expansion; that is of course important. But at this stage what we want to test and evaluate is Type 1 and Type 2. That is the main thing we want to get from CXL for now.
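For readers less familiar with the device types mentioned here: the CXL specification defines three sub-protocols, CXL.io, CXL.cache, and CXL.mem. Type 1 devices use CXL.io and CXL.cache, Type 2 devices use all three, and Type 3 memory expanders use CXL.io and CXL.mem. The small Python sketch below just encodes that mapping. The enum and function names are hypothetical and do not come from any CXL library.

```python
from enum import Enum


class Protocol(Enum):
    IO = "CXL.io"        # PCIe-based discovery, configuration, DMA
    CACHE = "CXL.cache"  # device coherently caches host memory
    MEM = "CXL.mem"      # host accesses device-attached memory


# Sub-protocols used by each CXL device type.
DEVICE_PROTOCOLS = {
    1: frozenset({Protocol.IO, Protocol.CACHE}),                # accelerator without exposed memory
    2: frozenset({Protocol.IO, Protocol.CACHE, Protocol.MEM}),  # accelerator with its own memory
    3: frozenset({Protocol.IO, Protocol.MEM}),                  # memory expander
}


def is_coherent_accelerator(device_type: int) -> bool:
    """Type 1 and Type 2 devices can cache host memory coherently via CXL.cache."""
    return Protocol.CACHE in DEVICE_PROTOCOLS[device_type]


def exposes_memory_to_host(device_type: int) -> bool:
    """Type 2 and Type 3 devices expose memory that the host maps via CXL.mem."""
    return Protocol.MEM in DEVICE_PROTOCOLS[device_type]


if __name__ == "__main__":
    for t in sorted(DEVICE_PROTOCOLS):
        names = sorted(p.value for p in DEVICE_PROTOCOLS[t])
        print(f"Type {t}: {', '.join(names)}")
```

In these terms, the coherent link between general-purpose processor and accelerator that the projects ask for requires CXL.cache. That is what `is_coherent_accelerator` checks, and Type 3 devices do not implement it.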

Now I would like to give you a snapshot of two of these projects. PLASMA is a very emblematic project because it brings together a scientific project and the supercomputing technology being developed in Europe. It targets four existing plasma simulation applications. Two of them, BIT1 and GENE, will be used to simulate plasmas in anticipation of fusion devices like ITER, which is being built in Europe. There is also Vlasiator, which aims to predict near-Earth space dynamics, typically the solar wind, which can affect spacecraft in geostationary orbit or ground-based power grids. The developers of these applications will work very closely with our engineers to take advantage of the features of our chip. First, we will adapt the existing applications to take advantage of the Scalable Vector Extension. Second, we will profile the existing applications and the reference datasets to find the right balance between HBM and DDR. Third, and most relevant to this talk, we will profile the applications and datasets so that they really take advantage of CXL. In that respect, data movement between the general-purpose processor (GPP) and the accelerator will be analyzed in depth, and the objective of the project is really to evaluate the value of CXL for this type of application.

Another project in which CXL is planned is RISER. RISER will develop the first all-European RISC-V cloud server infrastructure, again to increase Europe's independence. It will build on the technology we develop both for the general-purpose processor and for the accelerator. The project will develop two boards: a PCIe acceleration board and a server board. With the PCIe acceleration board, we want to evaluate very compute-intensive workloads such as data compression and decompression, as well as machine learning and AI. For the RISER accelerator, the plan is to use CXL Type 2 as the cache-coherent chip-to-chip link. The goal is to deliver open hardware, and for that we want to bet on standards; CXL is seen as the standard to use for this type of use case.

Okay, now let's see how this will work in detail in terms of topology. We have our Rhea-based server, on which the application executes, and a collection of accelerators. I show three here, but there may be fewer or more accelerator boards around the Rhea server. Within each accelerator, tightly coupled EUPILOT chips form a NUMA node, and all these accelerators are connected to the Rhea through CXL. In this context we will leverage cache coherency to eliminate, or at least reduce, data movement between memories.

Interestingly, we also have a low-power, low-latency chip-to-chip interconnect between the EUPILOT chips. Currently it is an in-house technology, but I think the peer-to-peer communication introduced in CXL 3.0 could be used for this in the future. When we need more accelerators than the Rhea can support natively, we plan to use CXL switches. This will allow us to connect more accelerator devices and distribute the application workload more efficiently across them. That's really what we want to do. It's quite simple in theory, and we only need a few parts of CXL. The message I also want to give is that in the description of action, meaning the specifications we receive from the European stakeholders, they consider CXL to be available off the shelf, and they think CXL Type 1 and Type 2 are already ready to develop on. In this configuration, we will also look at the effect of switch latency on the workload because, as you know, a switch adds a hop, and that may affect latency.
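One way to reason about the switch-latency question is a first-order additive model: each switch on the path adds a fixed per-hop delay to the direct-link latency. The sketch below assumes that Rhea attaches some accelerators directly to root ports and puts the rest behind a single CXL switch. The port count and latency values in the example are hypothetical placeholders. They are not measurements of Rhea or of any switch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accelerator:
    name: str
    switch_hops: int  # 0 = attached directly to a host root port


def build_topology(n_accelerators: int, direct_ports: int) -> list[Accelerator]:
    """Fill the host's free root ports first; the remaining accelerators sit
    behind one CXL switch (assumed to hang off a separate root port)."""
    return [
        Accelerator(f"acc{i}", 0 if i < direct_ports else 1)
        for i in range(n_accelerators)
    ]


def path_latency_ns(acc: Accelerator, link_ns: float, per_hop_ns: float) -> float:
    """Additive first-order model: direct-link latency plus a fixed cost per switch hop."""
    return link_ns + acc.switch_hops * per_hop_ns


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    LINK_NS, HOP_NS = 100.0, 50.0
    for acc in build_topology(n_accelerators=5, direct_ports=3):
        print(acc.name, path_latency_ns(acc, LINK_NS, HOP_NS))
```

The model ignores contention and queuing inside the switch. Its only point is that accelerators behind the switch pay the extra hop on every coherent access, so latency-sensitive work is better placed on the directly attached ones.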

Okay, this is my concluding slide. The message I want to pass is that European HPC should be considered a relevant market segment for CXL, and not only for memory disaggregation: cache coherency is really something we want to bet on. But as we go deeper into the projects and reach the specification and implementation steps, we find that reality is not aligned with expectations. There is no effective industry implementation of CXL Type 2 on the market, and not even of Type 1. And in discussions with partners, we feel the feedback on CXL Type 2 is lukewarm, which is a real worry for us and our partners in the European ecosystem. We see that the slow and late takeoff of CXL Type 1 and Type 2 may affect the success and dissemination of our projects, so we hope we can accelerate all this and perhaps find some gap-filler solutions. You mentioned your Gismo project; that may be something that could fill the gap. We also see that if CXL supports only memory expansion, it will be dedicated to the cloud market only. At SiPearl, we really encourage Type 1 and Type 2 initiatives in support of European HPC, and we want CXL to become the Swiss Army knife and not just a single-blade add-on. That's our view. Maybe it needs to be corrected, and I'm happy to hear your feedback. I'll close with this message: as Europeans, we think CXL is the standard to go with, and we will make sure that it happens.