Old Research Activities
During my Ph.D. (from 2002 to 2005, in the RESO team at the LIP lab, ENS Lyon, France), I worked on distributed file systems in clusters with Loïc Prylli, Olivier Glück, and Pascal Vicat-Blanc Primet.
My Ph.D. dissertation is available here.
Main Topics
Efficient Usage of the I/O Architecture when Distributing Files in Clusters
Existing projects focus on striping and parallelizing data and workload to provide scalable servers that can sustain the high-performance file-access needs of clusters.
However, the System Area Networks available in this environment are not fully exploited.
We developed ORFA to study the impact of using such networks efficiently when accessing remote files.
This work may later serve as an underlying layer to improve network usage in existing parallel file systems.
Dealing with Memory Registration in the Linux File System Stack
We ported the ORFA client into the Linux kernel (ORFS) to benefit from the VFS metadata caches.
This raised several memory registration issues.
I developed several patches to make the GM registration model usable inside the kernel with user memory.
I also developed GMKRC to improve registration in ORFS.
This required adding an address-space tracing infrastructure to the Linux kernel, which is what VMASpy provides.
MX support for in-kernel applications and file systems
MX (Myrinet Express) is the new Myrinet driver.
We work with Myricom to enable in-kernel applications such as file systems to fully benefit from the MX performance improvements.
This notably includes exposing the entire MX API in the kernel so that user memory, kernel memory, and even physical memory can be used in communications.
This builds on our earlier work on GM and is already included in the MX CVS repository.
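Exposing one API for several kinds of memory means that each buffer must say what kind of address it carries. The minimal C sketch below illustrates this idea. Every name in it (`mem_kind`, `struct seg`, `prepare_send`) is hypothetical and is not the actual MX kernel API. The sketch also assumes that kernel and physical buffers stay resident, so that only user segments need pinning before DMA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kinds of address a kernel-level send call could accept. */
enum mem_kind { MEM_USER_VIRT, MEM_KERNEL_VIRT, MEM_PHYSICAL };

/* Hypothetical segment descriptor (illustrative only, not the MX API). */
struct seg {
    enum mem_kind kind;
    uint64_t addr;   /* meaning depends on kind */
    size_t len;
};

/* Assumption of this sketch: user pages may be swapped out or remapped,
 * so they must be pinned before DMA; kernel and physical buffers are
 * assumed to stay resident. */
static int needs_pinning(const struct seg *s)
{
    return s->kind == MEM_USER_VIRT;
}

/* Walks a segment list, counts the segments that would need pinning,
 * and returns the total message length. */
size_t prepare_send(const struct seg *segs, int nsegs, int *npinned)
{
    size_t total = 0;

    *npinned = 0;
    for (int i = 0; i < nsegs; i++) {
        if (needs_pinning(&segs[i]))
            (*npinned)++;   /* a real driver would pin and translate here */
        total += segs[i].len;
    }
    return total;
}
```

With this model, a single message can mix a user buffer, a kernel buffer, and a physical page, and only the user part pays the pinning cost.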
Transparent User-Level Remote File Access
The ORFA client is implemented as a shared library that transparently intercepts I/O calls from any legacy application.
No application rewriting or recompiling is required to get full support for remote file manipulation.
Moreover, local file access remains available without any library compatibility problems.
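Interception of this kind usually relies on symbol interposition. A preloaded library defines a function with the same name as the libc one and forwards the calls it does not handle with `dlsym(RTLD_NEXT, ...)`. The C sketch below shows this for `unlink()` only. The `/orfa/` prefix, `is_remote_path()`, and the `remote_unlink()` stub are hypothetical and do not reflect how ORFA actually names remote files.

```c
#define _GNU_SOURCE            /* for RTLD_NEXT */
#include <assert.h>
#include <dlfcn.h>
#include <errno.h>
#include <string.h>

/* Hypothetical prefix under which paths are treated as remote. */
#define REMOTE_PREFIX "/orfa/"

/* Requests that would have been sent to the server (for illustration). */
static int remote_requests = 0;

/* Nonzero if the path should be handled by the remote server. */
static int is_remote_path(const char *path)
{
    return strncmp(path, REMOTE_PREFIX, strlen(REMOTE_PREFIX)) == 0;
}

/* Hypothetical stub: a real client would send an UNLINK request here. */
static int remote_unlink(const char *path)
{
    (void)path;
    remote_requests++;
    return 0;
}

/* Same name and signature as libc's unlink(): when this code is preloaded,
 * the dynamic linker resolves the application's calls to this wrapper. */
int unlink(const char *path)
{
    static int (*real_unlink)(const char *);

    if (is_remote_path(path))
        return remote_unlink(path);

    /* Local path: forward to the next definition in lookup order (libc). */
    if (real_unlink == NULL)
        real_unlink = (int (*)(const char *))dlsym(RTLD_NEXT, "unlink");
    if (real_unlink == NULL) {
        errno = ENOSYS;
        return -1;
    }
    return real_unlink(path);
}
```

To use it, build this as a shared object (for example `gcc -shared -fPIC -o libwrap.so wrap.c -ldl`; `-ldl` is unnecessary with glibc 2.34 and later) and start a program with `LD_PRELOAD=./libwrap.so`. The wrapper then receives that program's dynamically resolved calls to `unlink()`. A complete client would have to wrap many more entry points, such as `open`, `read`, and `unlinkat`.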
Mixing Myrinet Networks with Standard I/O
While trying to make the ORFA server as efficient as possible, we faced the problem of efficiently mixing Myrinet I/O with standard I/O.
I wrote several patches to deliver BIP events through the epoll interface.
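Once the network layer exposes a pollable descriptor, a server can wait on network events and on ordinary sockets or files with a single `epoll_wait()` call. Without such a descriptor, it has to alternate between two different waiting mechanisms. The C sketch below is hypothetical: an `eventfd` stands in for the descriptor that delivers BIP events, a pipe stands in for standard I/O, and the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Tags stored in epoll_event.data to tell the two sources apart. */
enum { SRC_NIC = 1, SRC_STD = 2 };

/* Waits once (up to timeout_ms) on both descriptors and returns a bitmask
 * of the sources that were readable, or -1 on error. */
int wait_mixed(int nic_fd, int std_fd, int timeout_ms)
{
    struct epoll_event ev = { .events = EPOLLIN }, ready[2];
    int mask = 0, n, ep = epoll_create1(0);

    if (ep < 0)
        return -1;
    ev.data.u32 = SRC_NIC;
    if (epoll_ctl(ep, EPOLL_CTL_ADD, nic_fd, &ev) < 0)
        goto fail;
    ev.data.u32 = SRC_STD;
    if (epoll_ctl(ep, EPOLL_CTL_ADD, std_fd, &ev) < 0)
        goto fail;

    n = epoll_wait(ep, ready, 2, timeout_ms);
    if (n < 0)
        goto fail;
    for (int i = 0; i < n; i++)
        mask |= (int)ready[i].data.u32;
    close(ep);
    return mask;
fail:
    close(ep);
    return -1;
}

/* Demo: an eventfd plays the (hypothetical) NIC event descriptor and a
 * pipe plays an ordinary socket or file. */
int demo(int nic_event, int std_data)
{
    uint64_t one = 1;
    int p[2], mask = -1;
    int nic = eventfd(0, 0);

    if (nic < 0)
        return -1;
    if (pipe(p) < 0) {
        close(nic);
        return -1;
    }
    if ((!nic_event || write(nic, &one, sizeof one) == (ssize_t)sizeof one) &&
        (!std_data || write(p[1], "x", 1) == 1))
        mask = wait_mixed(nic, p[0], 100);
    close(nic);
    close(p[0]);
    close(p[1]);
    return mask;
}
```

For example, `demo(1, 1)` reports both sources ready from one wait.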
Software
ORFA - Optimized Remote Filesystem Access
ORFA provides user-level access to a remote file system through the LD_PRELOAD environment variable on the client side.
Several network communication layers can be selected at compile time: TCP over Ethernet, and BIP or GM over Myrinet.
The ORFA server provides access to either a native file system or a memory file system.
The user-level client is fully transparent for legacy applications.
Almost all POSIX I/O functions work on remote files without the application being recompiled.
Even fork and exec calls, as well as multithreaded applications, are supported.
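A common way to make the network layer a compile-time choice is to route all transport calls through a small operations table and let a preprocessor flag decide which backend is built in. The C sketch below shows this pattern only. The flags `USE_GM` and `USE_BIP`, the table layout, and the stub send functions are hypothetical and are not ORFA's code. The stubs simply pretend that every send succeeds.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical transport interface (illustrative, not ORFA's). */
struct transport_ops {
    const char *name;
    /* Sends one request; returns the number of bytes sent or -1. */
    int (*send_request)(const void *buf, size_t len);
};

/* Exactly one backend is compiled in: -DUSE_GM, -DUSE_BIP, or neither
 * (TCP). The bodies are stubs standing in for real network I/O. */
#if defined(USE_GM)
static int gm_send(const void *buf, size_t len) { (void)buf; return (int)len; }
static const struct transport_ops transport = { "gm", gm_send };
#elif defined(USE_BIP)
static int bip_send(const void *buf, size_t len) { (void)buf; return (int)len; }
static const struct transport_ops transport = { "bip", bip_send };
#else
static int tcp_send(const void *buf, size_t len) { (void)buf; return (int)len; }
static const struct transport_ops transport = { "tcp", tcp_send };
#endif

/* Protocol code only ever calls through the table. */
int send_request(const char *msg)
{
    return transport.send_request(msg, strlen(msg));
}
```

Keeping the protocol code behind the table means that adding another network layer touches only the backend section.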
Documentation
The original design and implementation of ORFA are fully described in this paper.
It was last updated on September 17th, 2003, and may therefore not cover all recent modifications to ORFA.
However, the main ORFA features should be covered.
Download
The latest CVS snapshot of ORFA (March 6th, 2004) is now available for download here.
It is released under the GPL license.
It has been fully tested on Linux machines with hundreds of clients.
ORFS - Optimized Remote File System
ORFA provides high-bandwidth remote file access with very low request latency.
Its simple client avoids complex caching protocols but may require too many metadata accesses.
Instead of adding a cache to our client, we are moving our implementation into the kernel.
The ORFA protocol will be combined with the VFS API to reach a good compromise between the high-performance data access of ORFA and the metadata caching of the VFS.
ORFS adds support for the MX driver on Myrinet networks and for asynchronous file access on 2.6 kernels.
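A toy model shows the cost that motivates ORFS. Without a cache, every path lookup is a round-trip to the server. With a cache, repeated lookups of the same path are answered locally. The C sketch below is a hypothetical user-space illustration: a direct-mapped cache with no invalidation or consistency handling, and `server_lookup()` is a stub that returns a fake inode number. ORFS itself relies on the kernel's VFS caches instead.

```c
#include <assert.h>
#include <string.h>

#define LOOKUP_SLOTS 16

/* Hypothetical cached lookup result: path -> remote inode number. */
struct lookup_entry {
    char path[64];
    long ino;
    int valid;
};

static struct lookup_entry lcache[LOOKUP_SLOTS];
static int server_lookups;   /* simulated round-trips to the server */

/* Stub for a LOOKUP request to the server; returns a fake inode number. */
static long server_lookup(const char *path)
{
    server_lookups++;
    return (long)strlen(path);
}

/* Simple string hash selecting one cache slot. */
static unsigned hash_path(const char *s)
{
    unsigned h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % LOOKUP_SLOTS;
}

/* Returns the inode for path, contacting the server only on a miss. */
long cached_lookup(const char *path)
{
    struct lookup_entry *e = &lcache[hash_path(path)];
    long ino;

    if (e->valid && strcmp(e->path, path) == 0)
        return e->ino;                  /* hit: no network traffic */
    ino = server_lookup(path);
    if (strlen(path) < sizeof e->path) {   /* overwrite the slot */
        strcpy(e->path, path);
        e->ino = ino;
        e->valid = 1;
    }
    return ino;
}
```

The hard part, which this sketch deliberately ignores, is keeping such a cache consistent with other clients. Reusing the VFS caches avoids writing that logic from scratch.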
Download
ORFS comes with a custom memory file system implemented in user space, an interactive client for debugging, a real file system implemented as a kernel module for Linux 2.4 and 2.6, and a mounting program.
The latest CVS snapshot of ORFS (November 12th, 2004) is now available for download here.
It is released under the GPL license.
It has been fully tested on Linux machines.
GMKRC - GM Kernel Registration Cache
The high overhead of memory (de)registration makes it necessary to register memory as rarely as possible.
GMKRC is a kernel module that sits between your kernel module, which registers user memory, and the GM module.
GMKRC implements a registration cache for user-memory buffers.
It requires a kernel patched with VMA Spy and a patched GM that allows user-memory registration through a kernel port.
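A registration cache does three things: it remembers which address ranges are already registered, it reuses a registration when a new buffer falls entirely inside one of them, and it drops entries when the underlying mapping changes. The C sketch below is a hypothetical user-space model of these steps, not GMKRC's code. `nic_register()` and `nic_deregister()` stand in for the expensive GM calls, and the eviction policy is deliberately naive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 8

/* One cached registration: the range [start, end) is already registered. */
struct reg_entry {
    uintptr_t start, end;
    int valid;
};

static struct reg_entry cache[CACHE_SLOTS];
static int registrations;   /* counts expensive registration calls */

/* Hypothetical stand-ins for the costly GM (de)registration calls. */
static void nic_register(uintptr_t s, uintptr_t e) { (void)s; (void)e; registrations++; }
static void nic_deregister(uintptr_t s, uintptr_t e) { (void)s; (void)e; }

/* Makes [addr, addr+len) usable for DMA, reusing a covering entry if any. */
void cache_acquire(uintptr_t addr, size_t len)
{
    uintptr_t end = addr + len;
    int free_slot = -1;

    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].valid && cache[i].start <= addr && end <= cache[i].end)
            return;                       /* hit: nothing to register */
        if (!cache[i].valid && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0) {                  /* full: naively evict slot 0 */
        nic_deregister(cache[0].start, cache[0].end);
        free_slot = 0;
    }
    nic_register(addr, end);
    cache[free_slot] = (struct reg_entry){ addr, end, 1 };
}

/* Called when [addr, addr+len) is unmapped: drops every overlapping entry,
 * because that virtual range may later be backed by different pages. */
void cache_invalidate(uintptr_t addr, size_t len)
{
    uintptr_t end = addr + len;

    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].valid && cache[i].start < end && addr < cache[i].end) {
            nic_deregister(cache[i].start, cache[i].end);
            cache[i].valid = 0;
        }
    }
}
```

`cache_invalidate()` is the hook that requires address-space tracing. Without a notification on unmap, a later buffer at the same virtual address could hit a registration that still points to the old physical pages.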
VMA Spy
GMKRC needs to trace address-space modifications to keep the registration cache up to date.
The 2.2 Linux kernel provided an unmap VMA operation that was called when pages were unmapped, but it was removed in 2.4 because nothing used it.
Moreover, GMKRC also needs to know when a VMA is duplicated (through fork).
The VMASpy patch implements a generic infrastructure to attach spies to VMAs.
Each spy may specify several callbacks that are invoked when the VMA is modified, that is, when it is partially or entirely unmapped, forked, or when its protection or other flags are changed.
Spies are automatically propagated when VMAs are split or merged.
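The C sketch below is a user-space model of what such an infrastructure provides. Each VMA keeps a list of spies, an unmap notification reaches every spy, and splitting a VMA copies the spies to the new half. All names (`struct spy_ops`, `vma_split()`, and so on) are hypothetical and do not reproduce the actual VMASpy patch or kernel structures. Only the unmap callback is modeled; fork, protection-change, and merge handling would be additional members and functions.

```c
#include <assert.h>
#include <stdlib.h>

struct vma;

/* Hypothetical callback table; only the unmap event is modeled here. */
struct spy_ops {
    void (*unmap)(struct vma *vma, unsigned long start, unsigned long end, void *priv);
};

/* A spy: a callback table plus private data, chained per VMA. */
struct spy {
    const struct spy_ops *ops;
    void *priv;
    struct spy *next;
};

/* Toy VMA: an address range [start, end) with its list of spies. */
struct vma {
    unsigned long start, end;
    struct spy *spies;
};

int vma_attach_spy(struct vma *vma, const struct spy_ops *ops, void *priv)
{
    struct spy *s = malloc(sizeof *s);

    if (!s)
        return -1;
    s->ops = ops;
    s->priv = priv;
    s->next = vma->spies;
    vma->spies = s;
    return 0;
}

static void free_spies(struct spy *s)
{
    while (s) {
        struct spy *next = s->next;
        free(s);
        s = next;
    }
}

/* Invokes every spy's unmap callback for [start, end). */
void vma_notify_unmap(struct vma *vma, unsigned long start, unsigned long end)
{
    for (struct spy *s = vma->spies; s; s = s->next)
        if (s->ops->unmap)
            s->ops->unmap(vma, start, end, s->priv);
}

/* Splits vma at addr; the new upper half receives a copy of every spy,
 * so later events on either half still reach the same observers. */
struct vma *vma_split(struct vma *vma, unsigned long addr)
{
    struct vma *upper;

    if (addr <= vma->start || addr >= vma->end)
        return NULL;
    upper = malloc(sizeof *upper);
    if (!upper)
        return NULL;
    upper->start = addr;
    upper->end = vma->end;
    upper->spies = NULL;
    for (struct spy *s = vma->spies; s; s = s->next) {
        if (vma_attach_spy(upper, s->ops, s->priv) < 0) {
            free_spies(upper->spies);
            free(upper);
            return NULL;
        }
    }
    vma->end = addr;
    return upper;
}

/* Example spy: counts unmap notifications in the int pointed to by priv. */
static void count_unmap(struct vma *vma, unsigned long start, unsigned long end, void *priv)
{
    (void)vma; (void)start; (void)end;
    ++*(int *)priv;
}

static const struct spy_ops counting_ops = { count_unmap };
```

In a registration cache, the unmap callback would invalidate overlapping cache entries, and propagation on split ensures that no part of the original mapping escapes observation.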
Update: This feature is mostly superseded by MMU Notifiers, which were merged into the upstream Linux kernel in version 2.6.27 in late 2008.
Download
The VMASpy patch is available for download for Linux kernels 2.6.11 and 2.4.30.
The latter should apply to most recent 2.4 kernels; the former will not apply to recent 2.6 kernels.
Updated on Tue Jan 26 11:00:23 2010.