
Device owner daemon

In document PCIe Device Lending (Page 77-79)

//Prepare message
request->msg = 42;
request->id = global_id++;
send_interrupt();

//Wait for the other host to respond
while(response->id != request->id) {
    /* Send once in a while, in case an interrupt is lost.
     * Not so often as to storm the other machine and
     * cause slowdown
     */
    send_interrupt();
}

//We have the result
result = response->msg;

Code snippet 4.7: Our simple shared memory based communication channel algorithm

void handle_interrupt() {
    if(request->id != response->id) {
        do_stuff();
        response->id = request->id;
    }
    return;
}

Code snippet 4.8: The client of the shared memory based communication channel
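The two snippets above can be combined into a small single-process simulation to see how the resend loop copes with lost interrupts. This is only a sketch under stated assumptions: the `channel_slot` layout, the `interrupts_to_drop` knob, and the `do_stuff(msg)` signature are hypothetical, and `send_interrupt()` simply calls the handler directly instead of crossing the NTB. A real implementation would also need memory barriers so that `msg` is visible before `id`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout of one shared-memory slot; the snippets above
 * only show the msg and id fields. */
struct channel_slot {
    volatile uint32_t msg;
    volatile uint32_t id;
};

struct channel_slot request_slot, response_slot;
struct channel_slot *request = &request_slot;
struct channel_slot *response = &response_slot;

uint32_t global_id = 1;       /* start at 1 so that id 0 means "no request yet" */
unsigned interrupts_to_drop;  /* simulation knob: lose the next N interrupts */
unsigned interrupts_sent;     /* counts every attempt, delivered or not */

/* Placeholder for the real work done by the responding host. */
uint32_t do_stuff(uint32_t msg) { return msg + 1; }

/* Responder side: act only on a request that has not been answered yet. */
void handle_interrupt(void)
{
    if (request->id != response->id) {
        response->msg = do_stuff(request->msg);
        response->id = request->id;   /* publish the id last */
    }
}

/* Simulated interrupt: either lost or delivered synchronously. */
void send_interrupt(void)
{
    interrupts_sent++;
    if (interrupts_to_drop > 0) {
        interrupts_to_drop--;
        return;
    }
    handle_interrupt();
}

/* Requester side: post a message and resend until it is answered. */
uint32_t channel_call(uint32_t msg)
{
    request->msg = msg;
    request->id = global_id++;
    send_interrupt();
    while (response->id != request->id)
        send_interrupt();             /* resend in case one was lost */
    assert(response->id == request->id);
    return response->msg;
}
```

Because the responder compares ids before acting, a duplicate interrupt for an already answered request is harmless, which is what makes the blind resend safe.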


4.9 Device owner daemon

The host that wants to share one of its devices needs to provide a few services to the loaner. In our implementation, these services are implemented by a user space daemon.

We want to give the loaner kernel access to the device's configuration space. This is vital for many reasons, including identifying the device type and setting up the device. On the other hand, parts of the configuration space can disrupt or otherwise adversely affect the owner host. The Linux kernel writes to some of these registers under normal circumstances to configure the device. One example is the BAR registers. When the kernel on the loaner host discovers our virtual device, it can attempt to write to the BAR registers to place the device within its own address windows. However, any such address would not be valid within the other host, and even if it were, it would disrupt the existing mappings. We need to prevent this from happening. In some cases we can bypass parts of the kernel to prevent it, but in other parts of the kernel we have no choice. In addition, the kernel will frequently read back previously written registers to confirm that the write succeeded. We need to let the kernel believe it has access to these registers and emulate them in such a way that the kernel will not detect the difference.

Our first take on this problem was to keep an in-RAM copy of the configuration space and redirect configuration space accesses to it. Accesses that the user needed to be able to physically perform on the device were, in addition, passed on to the device. In Linux, access to the configuration space is exposed to user space programs as a file. It was fairly easy to implement a first version of this solution. After a while, however, we saw that implementing it correctly can be hard: we need to understand all the different PCIe capability registers and emulate anything we cannot allow. Fortunately for us, we discovered that the VFIO interface does this. The capabilities of VFIO are described in more detail in Section 3.4.1.
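The in-RAM copy idea can be sketched as follows. This is an illustration, not the code from our implementation: the type and function names are hypothetical, the `device` array merely stands in for the real configuration space, and only the six BARs of a type 0 header (offsets 0x10 to 0x27) are treated as emulated. Writes to emulated registers land only in the shadow copy, so a read-back returns what the loaner wrote while the device keeps the owner's values.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CFG_SIZE 256   /* conventional PCI configuration space size */
#define BAR0_OFF 0x10  /* first Base Address Register (type 0 header) */
#define BAR_END  0x28  /* one past the last BAR */

struct shadow_cfg {
    uint8_t shadow[CFG_SIZE];  /* what the loaner kernel believes it wrote */
    uint8_t device[CFG_SIZE];  /* stand-in for the real device registers */
};

/* Assumption: only the BARs are emulated in this sketch. */
int is_emulated(unsigned off)
{
    return off >= BAR0_OFF && off < BAR_END;
}

void cfg_init(struct shadow_cfg *c, const uint8_t *initial)
{
    memcpy(c->device, initial, CFG_SIZE);
    memcpy(c->shadow, initial, CFG_SIZE);
}

/* Emulated registers are written to the shadow only; everything else
 * is passed on to the device. */
void cfg_write32(struct shadow_cfg *c, unsigned off, uint32_t val)
{
    assert(off % 4 == 0 && off + 4 <= CFG_SIZE);
    if (is_emulated(off))
        memcpy(&c->shadow[off], &val, sizeof val);
    else
        memcpy(&c->device[off], &val, sizeof val);
}

/* Emulated registers are read from the shadow so that a read-back
 * matches the previous write; everything else comes from the device. */
uint32_t cfg_read32(const struct shadow_cfg *c, unsigned off)
{
    uint32_t val;
    assert(off % 4 == 0 && off + 4 <= CFG_SIZE);
    if (is_emulated(off))
        memcpy(&val, &c->shadow[off], sizeof val);
    else
        memcpy(&val, &c->device[off], sizeof val);
    return val;
}
```

The sketch deliberately ignores details such as BAR sizing, where the kernel writes all ones and expects the read-only low bits to be masked on read-back. Handling such cases for every capability is the kind of work that makes a correct implementation hard.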

Since VFIO is designed for use in virtual machines and user space drivers, not for "bare metal" use, it has some shortcomings that we need to circumvent. The first we discovered was that VFIO virtualizes parts of the MSI capability and bypasses it completely, as interrupts are still delivered through the Linux kernel. There is a separate interface for controlling interrupts. This means that for some of our accesses, we need to use an access method other than VFIO.

While VFIO could provide us with increased isolation and better handling of configuration space accesses in general, the need for special handling would have increased the complexity of our proof of concept implementation. Because of this, we decided not to implement VFIO support.

Drivers and devices do not expect multiple hosts to access the same device at the same time. As we have already stated, supporting this is outside our mission statement. SR-IOV devices are the exception; they are regarded as separate devices and get no special treatment. To enforce exclusive use, the daemon must always know which other host, if any, is currently using a device. Before loaning the device to another host, it needs to disconnect the device from the current user. The current user can also be the local host. The best way to prevent the local kernel from interfering with the device is to unbind any currently bound driver from the device and bind a shim driver. The shim driver prevents another driver from binding to the device. This is the same method used by VFIO, and it is required before the VFIO interface can be used on a device.
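One way to perform the unbind and shim bind from user space is through the Linux sysfs attributes `driver/unbind`, `driver_override`, and `drivers_probe`. The sketch below assumes this sysfs route, which is our illustration rather than something the current implementation does. The function names are hypothetical, the shim driver name is supplied by the caller, and the sysfs root is a parameter so the code can be tried against a scratch directory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Write a string to a sysfs attribute. Returns 0 on success, -1 on failure. */
int sysfs_write(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = (fputs(value, f) >= 0);
    if (fclose(f) != 0)   /* sysfs reports many errors at flush/close time */
        ok = 0;
    return ok ? 0 : -1;
}

/* Build "<root>/bus/pci/devices/<bdf>/<attr>". Returns -1 if it does not fit. */
int pci_attr_path(char *buf, size_t len, const char *root,
                  const char *bdf, const char *attr)
{
    int n = snprintf(buf, len, "%s/bus/pci/devices/%s/%s", root, bdf, attr);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Detach the current driver from the device and hand it to a shim driver.
 * bdf is the PCI address, e.g. "0000:01:00.0"; shim is the driver name. */
int rebind_to_shim(const char *root, const char *bdf, const char *shim)
{
    char path[256];
    int n;

    assert(root != NULL && bdf != NULL && shim != NULL);

    /* 1. Unbind the current driver. A failure here is ignored because
     *    the device may simply have no driver bound. */
    if (pci_attr_path(path, sizeof path, root, bdf, "driver/unbind") == 0)
        (void)sysfs_write(path, bdf);

    /* 2. Restrict future binding to the shim driver. */
    if (pci_attr_path(path, sizeof path, root, bdf, "driver_override") != 0 ||
        sysfs_write(path, shim) != 0)
        return -1;

    /* 3. Ask the PCI core to probe the device again. */
    n = snprintf(path, sizeof path, "%s/bus/pci/drivers_probe", root);
    if (n <= 0 || (size_t)n >= sizeof path)
        return -1;
    return sysfs_write(path, bdf);
}
```

On a real system the root would be `/sys` and the call requires sufficient privileges, for example `rebind_to_shim("/sys", "0000:01:00.0", shim_name)`.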

One of our user space daemons will be spawned per device that is exported over the NTB link. When started, it should unbind the driver that is currently bound to the device and bind a shim driver instead. This ensures that the local host will not use the device concurrently with a remote host. To ensure that only one remote host accesses the device at a time, the daemon will only accept a single connection. The current implementation does not unbind the host driver or attempt to prevent the host from concurrently using the device. This should be implemented when the solution gets closer to being finished.
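The single-user rule can be captured in a very small piece of state kept by each per-device daemon. The following is a minimal sketch, assuming hosts are identified by an integer node id; the names `lend_state`, `lend_acquire`, and `lend_release` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_USER (-1)

/* Per-device record of who is currently using the device. */
struct lend_state {
    int current_user;   /* node id of the borrowing host, or NO_USER */
};

void lend_init(struct lend_state *s)
{
    s->current_user = NO_USER;
}

/* Grant the device to node_id only if nobody else is using it. */
bool lend_acquire(struct lend_state *s, int node_id)
{
    assert(node_id != NO_USER);
    if (s->current_user != NO_USER)
        return false;            /* a second connection is refused */
    s->current_user = node_id;
    return true;
}

/* Only the current user may give the device back. */
bool lend_release(struct lend_state *s, int node_id)
{
    assert(node_id != NO_USER);
    if (s->current_user != node_id)
        return false;
    s->current_user = NO_USER;
    return true;
}
```

A daemon built this way would call `lend_acquire` when a connection arrives and `lend_release` when it closes, refusing any connection attempt made in between.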

Providing the other host with access to the device's BARs is done using the SISCI API. The API allows us to expose the BAR areas over the NTB. The daemon also provides a reverse mapping service to the loaner. This allows the loaner to map parts of its own memory and make it available to the device, which is needed for device-to-host DMA and MSI interrupts. Upon a request, the daemon connects to the memory mapped by the other host and returns the local bus address of this mapping. This is important because it is the address the device must use to reach the mapped memory, for example when the driver orders the device to perform DMA.
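The request and reply exchanged for the reverse mapping service could look roughly like the sketch below. Everything here is an assumption made for illustration: the message layout, the names, and in particular the `connect_fn` callback, which stands in for whatever SISCI calls the daemon uses to connect to the loaner's memory. No actual SISCI function is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire format of a reverse mapping request and reply. */
struct revmap_request {
    uint32_t segment_id;  /* identifies the memory exported by the loaner */
    uint64_t size;        /* number of bytes to map */
};

struct revmap_reply {
    int      status;      /* 0 on success, -1 on failure */
    uint64_t bus_addr;    /* owner-side bus address the device must use */
};

/* Stand-in for the backend that connects to the remote memory and
 * reports the local bus address of the resulting mapping. */
typedef int (*connect_fn)(uint32_t segment_id, uint64_t size,
                          uint64_t *bus_addr);

/* Daemon side: serve one reverse mapping request. */
struct revmap_reply handle_revmap(const struct revmap_request *req,
                                  connect_fn connect)
{
    struct revmap_reply rep = { -1, 0 };

    assert(req != NULL && connect != NULL);
    if (req->size == 0)
        return rep;                       /* nothing to map */
    if (connect(req->segment_id, req->size, &rep.bus_addr) == 0)
        rep.status = 0;
    else
        rep.bus_addr = 0;                 /* never leak a partial address */
    return rep;
}

/* Fake backend for experimentation: invents a bus address per segment. */
int fake_connect(uint32_t segment_id, uint64_t size, uint64_t *bus_addr)
{
    (void)size;
    *bus_addr = 0x80000000ull + (uint64_t)segment_id * 0x100000ull;
    return 0;
}
```

The loaner's driver would place the returned `bus_addr`, not its own physical address, into the device's DMA descriptors, since the device resides on the owner's bus.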
