Old Research Activities

During my Ph.D. (2002 to 2005, in the RESO team at the LIP lab, ENS Lyon, France), I worked on distributed file systems in clusters with Loïc Prylli, Olivier Glück, and Pascal Vicat-Blanc Primet. My Ph.D. dissertation is available here.

Main Topics
Efficient Usage of the I/O Architecture when Distributing Files in Clusters
Existing projects focus on striping and parallelizing data and workload to provide scalable servers that can sustain the high-performance file access needs of clusters. The System Area Networks available in this environment are not fully used. We developed ORFA to study the impact of an efficient usage of such networks when accessing remote files. This work may later be used as an underlying layer to enhance network usage in existing parallel file systems.
Dealing with Memory Registration in the Linux File System Stack
We ported the ORFA client into the Linux kernel (ORFS) to benefit from the VFS metadata caches. Memory registration raised several issues. I developed several patches to make the GM registration model usable inside the kernel with user memory. I also developed GMKRC to enhance ORFS registration. This required adding an address-space tracing infrastructure to the Linux kernel, which is what VMASpy aims at.
MX support for in-kernel applications and file systems
MX (Myrinet Express) is the new Myrinet driver. We work with Myricom to enable in-kernel applications such as file systems to fully benefit from the MX performance improvements. This especially includes providing the entire MX API in the kernel so that user memory, kernel memory, and even physical memory may be used in communications. This work is based on our work on GM. It is already included in the MX CVS.
Transparent User-Level Remote File Access
The ORFA client is implemented as a shared library that transparently intercepts I/O calls from any legacy application. No application rewriting or recompiling is required to get full support for remote file manipulation. Moreover, local file access remains available without any library compatibility trouble.
Mixing Myrinet Networks with Standard I/O
While trying to make the ORFA server as efficient as possible, we faced the problem of efficiently mixing Myrinet I/O with standard I/O. I wrote several patches to get BIP events through the epoll interface.
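The benefit of delivering network events through epoll is that a server can block in a single wait call for both kinds of I/O. The following is a minimal sketch of that idea, not the actual BIP patches: it assumes the Myrinet event source is visible as a file descriptor (a pipe can stand in for it when experimenting), and the names `watch_source`, `wait_any`, and the `source` enum are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical tags telling the event loop which kind of source fired. */
enum source { SRC_NONE = 0, SRC_STANDARD_IO = 1, SRC_MYRINET = 2 };

/* Register fd for readability and remember which kind of source it is.
 * Returns 0 on success, -1 on error (as epoll_ctl does). */
static int watch_source(int epfd, int fd, enum source kind)
{
    struct epoll_event ev;

    assert(kind != SRC_NONE);          /* SRC_NONE is reserved for "nothing ready" */
    memset(&ev, 0, sizeof ev);
    ev.events = EPOLLIN;
    ev.data.u32 = (uint32_t)kind;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Wait up to timeout_ms for one event and return the kind of source
 * that became readable, or SRC_NONE on timeout or error. */
static enum source wait_any(int epfd, int timeout_ms)
{
    struct epoll_event ev;
    int n = epoll_wait(epfd, &ev, 1, timeout_ms);

    return n == 1 ? (enum source)ev.data.u32 : SRC_NONE;
}
```

A server loop would create the epoll instance with `epoll_create1(0)`, call `watch_source` once per descriptor, and then dispatch on the value returned by `wait_any`.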
Software
ORFA - Optimized Remote Filesystem Access
ORFA allows user-level access to remote file systems through the LD_PRELOAD environment variable on the client. Several network communication layers are available at compile time: TCP over Ethernet, and BIP and GM over Myrinet. The ORFA server provides access to a native or memory file system.
The user-level client is fully transparent for legacy applications. Almost all POSIX I/O functions work on remote files without the application being recompiled. Even fork and exec calls and multithreaded applications are supported.
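The LD_PRELOAD mechanism can be illustrated with a minimal sketch, which is not ORFA's code. It wraps a single call, `unlink`, and assumes that remote files live under a hypothetical `/orfa/` prefix; `is_remote_path` and `remote_unlink` are illustrative names, and the remote side is only a stub that fails with `ENOSYS`.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE              /* needed for RTLD_NEXT in <dlfcn.h> */
#endif
#include <assert.h>
#include <dlfcn.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical prefix: paths below it are treated as remote. */
#define REMOTE_PREFIX "/orfa/"

/* Returns 1 if the path should be handled by the remote file server. */
static int is_remote_path(const char *path)
{
    return path != NULL &&
           strncmp(path, REMOTE_PREFIX, strlen(REMOTE_PREFIX)) == 0;
}

/* Hypothetical stub standing in for the network request to the server. */
static int remote_unlink(const char *path)
{
    (void)path;
    errno = ENOSYS;              /* no real transport in this sketch */
    return -1;
}

/* Same name and signature as the libc function, so the dynamic linker
 * resolves the application's calls to this definition first. */
int unlink(const char *path)
{
    static int (*real_unlink)(const char *);

    if (is_remote_path(path))
        return remote_unlink(path);

    if (real_unlink == NULL) {
        /* Look up the next definition in search order, normally libc's. */
        real_unlink = (int (*)(const char *))dlsym(RTLD_NEXT, "unlink");
        assert(real_unlink != NULL);
    }
    return real_unlink(path);    /* local files keep their usual behavior */
}
```

Built as a shared object (for example with `gcc -shared -fPIC`, plus `-ldl` on older C libraries) and listed in LD_PRELOAD, such a library is loaded before libc, so an unmodified program picks up the wrapper without being recompiled.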
Documentation
The original design and implementation of ORFA are fully described in this paper. It was last updated on September 17th, 2003 and may thus not cover all recent modifications in ORFA. However, the main ORFA features should be covered.
Download
The last CVS snapshot (March 6th, 2004) of ORFA is now available for download here. It is released under the GPL license. It has been fully tested on Linux boxes, with hundreds of clients.
ORFS - Optimized Remote File System
ORFA provides high-bandwidth remote file access and very low request latency. Its simple client avoids a complex caching protocol but may require too many metadata accesses.
Instead of adding a cache to our client, we are moving our implementation into the kernel. The ORFA protocol is going to be combined with the VFS API to get a compromise between the high-performance data access of ORFA and the metadata caching of the VFS.
ORFS adds support for the MX driver for Myrinet networks and asynchronous file access for 2.6 kernels.
Download
ORFS comes with a custom memory file system implemented in user space, an interactive client for debugging, a real file system implemented as a kernel module for Linux 2.4 and 2.6, and a mounting program.
The last CVS snapshot of ORFS (November 12th, 2004) is now available for download here. It is released under the GPL license. It has been fully tested on Linux boxes.
GMKRC - GM Kernel Registration Cache
The high overhead of memory registration and deregistration makes it necessary to reduce their usage. GMKRC is a kernel module that sits between your kernel module registering user memory and the GM module. GMKRC implements a registration cache for user-memory buffers. It requires a kernel patched with VMA Spy and a patched GM that allows user-memory registration with a kernel port.
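The idea of a registration cache can be sketched in a few lines of user-space C. This is an illustrative model under stated assumptions, not GMKRC's implementation: the slot count, the LRU eviction policy, and every name below (`reg_cache`, `cache_acquire`, `cache_release`, the `driver_*` stand-ins) are hypothetical. Releasing a buffer leaves it registered, so a later communication on the same range skips the costly driver call; deregistration only happens when a slot has to be recycled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 4

struct reg_entry {
    uintptr_t     start;     /* first byte of the registered range */
    size_t        len;       /* length of the range in bytes */
    unsigned      users;     /* communications currently using the range */
    unsigned long last_use;  /* logical clock value, for LRU eviction */
    int           valid;     /* nonzero if the slot holds a registration */
};

struct reg_cache {
    struct reg_entry slot[CACHE_SLOTS];
    unsigned long clock;
    unsigned long registrations;    /* times the costly path was taken */
    unsigned long deregistrations;
};

/* Hypothetical stand-ins for the expensive driver calls. */
static void driver_register(struct reg_cache *c)   { c->registrations++; }
static void driver_deregister(struct reg_cache *c) { c->deregistrations++; }

/* Return a slot covering [start, start + len), registering on a miss.
 * Returns -1 if every slot is currently in use. */
static int cache_acquire(struct reg_cache *c, uintptr_t start, size_t len)
{
    int victim = -1;

    for (int i = 0; i < CACHE_SLOTS; i++) {
        struct reg_entry *e = &c->slot[i];

        if (e->valid && start >= e->start &&
            start + len <= e->start + e->len) {
            e->users++;
            e->last_use = ++c->clock;
            return i;                          /* hit: no driver call */
        }
        if (e->users != 0)
            continue;                          /* busy, cannot be recycled */
        if (victim < 0)
            victim = i;
        else if (!e->valid && c->slot[victim].valid)
            victim = i;                        /* prefer an empty slot */
        else if (e->valid && c->slot[victim].valid &&
                 e->last_use < c->slot[victim].last_use)
            victim = i;                        /* otherwise least recently used */
    }
    if (victim < 0)
        return -1;

    struct reg_entry *v = &c->slot[victim];
    if (v->valid)
        driver_deregister(c);                  /* deferred until reuse */
    driver_register(c);
    v->start = start;
    v->len = len;
    v->users = 1;
    v->valid = 1;
    v->last_use = ++c->clock;
    return victim;
}

/* Drop one user; the range stays registered for later reuse. */
static void cache_release(struct reg_cache *c, int slot)
{
    assert(slot >= 0 && slot < CACHE_SLOTS && c->slot[slot].users > 0);
    c->slot[slot].users--;
}
```

A zero-initialized `struct reg_cache` is an empty cache. The model deliberately ignores the hard part, which is noticing when a cached range is unmapped or remapped by the application and must be dropped from the cache.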
VMA Spy
GMKRC needs to trace address space modifications to keep the registration cache up to date. The 2.2 Linux kernel provided an unmap VMA operation that was called when pages were unmapped, but it was removed in 2.4 because it was unused. Moreover, GMKRC also needs to know when a VMA is duplicated (through fork).
The VMASpy patch implements a generic infrastructure to attach spies to some VMAs. Each spy may specify several callbacks that are called when the VMA is modified, that is, when it is partially or entirely unmapped, forked, or when its protection or other flags are changed. Spies are automatically propagated when VMAs are split or merged. Update: This feature is mostly superseded by the MMU Notifiers that were included in upstream Linux kernel 2.6.27 in late 2008.
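The callback model described above can be mimicked in user space. The sketch below is only an analogy: the structure and function names (`spy_ops`, `spy`, `notify_unmap`, `count_stale`) are hypothetical and are not those of the VMASpy patch, and it covers only the unmap case, with the fork hook left empty. It shows how an owner can learn which part of a watched area went away.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback table attached to a watched memory area. */
struct spy_ops {
    void (*unmapped)(void *owner, uintptr_t start, uintptr_t end);
    void (*forked)(void *owner);
};

/* A spy attached to one memory area [start, end). */
struct spy {
    uintptr_t start, end;
    const struct spy_ops *ops;
    void *owner;                 /* whoever asked to be notified */
};

/* Simulate unmapping [start, end): call the unmapped callback of every
 * spy whose area overlaps, passing only the overlapping part.
 * Returns the number of spies notified. */
static int notify_unmap(struct spy *spies, size_t n,
                        uintptr_t start, uintptr_t end)
{
    int notified = 0;

    assert(start <= end);
    for (size_t i = 0; i < n; i++) {
        struct spy *s = &spies[i];
        uintptr_t lo = start > s->start ? start : s->start;
        uintptr_t hi = end < s->end ? end : s->end;

        if (lo < hi && s->ops != NULL && s->ops->unmapped != NULL) {
            s->ops->unmapped(s->owner, lo, hi);
            notified++;
        }
    }
    return notified;
}

/* Example owner: counts how many watched bytes became stale. */
struct stale_counter { size_t bytes; };

static void count_stale(void *owner, uintptr_t start, uintptr_t end)
{
    struct stale_counter *c = owner;

    c->bytes += (size_t)(end - start);
}

static const struct spy_ops counting_ops = { count_stale, NULL };
```

In a registration cache, the unmapped callback would drop the cached registrations overlapping the reported range instead of counting bytes.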
Download
The VMASpy patch is available for download for Linux kernel 2.6.11 and 2.4.30. The latter should apply on most recent 2.4 kernels. The former won't apply on recent 2.6 kernels.
Updated on Tue Jan 26 11:00:23 2010.