
Collaboration Topics - System Software

This collaboration focuses on research and development of parallel file system interfaces and tools, system resource management capabilities, operating system evaluation, and software for high-performance interconnects. Current activities include testing and evaluating technologies and tools associated with the Lustre parallel file system; developing and analyzing middleware that encapsulates application I/O requirements and abstracts the capabilities of the underlying parallel file system; enhancing the SLURM resource manager to support evolving extreme-scale application workflow requirements; evaluating the Portals low-level interconnect programming interface; and evaluating the Kitten lightweight kernel operating system.

Accomplishments

1)  Lustre: Successes in this area include CEA's testing of the ZFS back end developed by LLNL, as well as discussions and shared experiments on Lustre failover. In addition, presentations on tools such as Robinhood and Shine have been valuable for highlighting file system administration practices.

2)  I/O layers: The group has held discussions, strategy sessions, and exchanges of viewpoints (for example, diod versus Ganesha and 9P), as well as several meetings on application I/O layers such as Hercules, the Parallel Log-structured File System (PLFS), and the Scalable Checkpoint/Restart library (SCR). This has been a real success for this new collaboration topic.

3)  Simple Linux Utility for Resource Management (SLURM): The organization of a SLURM workshop and new features such as topology awareness, Kerberos support, and high availability are examples of the collaboration's success (all of these features were integrated into the main branch of the open-source SLURM project). Moreover, the links created between the teams (through exchanges on mailing lists) allow the group to share experience on broader topics such as batch and scheduling strategies.

4)  Operating systems: The last meeting showed that collaboration on this subject would be very useful, especially for testing and sharing experience with new operating systems (such as Kitten) and for evaluating and enhancing administration tools (such as the open-source nodediag).

Links and References

More information about Lustre can be found at http://www.lustre.org
More information about SLURM can be found at http://www.computing.llnl.gov/slurm
More information about Portals can be found at http://www.cs.sandia.gov/Portals
More information about Kitten can be found at http://software.sandia.gov/trac/kitten