Student Spotlight: Advancing Data Storage

February 27, 2008

Shobhit Dayal is helping research groups understand the demands on data storage and one day enhance system designs.

INI graduate student Shobhit Dayal (Pittsburgh MSIN, Class of 2008) is working to design better data storage systems. He studies these systems as a member of Carnegie Mellon's Parallel Data Lab (PDL) and through a research project of the Petascale Data Storage Institute (PDSI), supervised by ECE and CS Professor Garth Gibson.

His work entails collecting and analyzing statistics on static file attributes, such as file size, directory size, access times, name length and disk space utilization. Part of the effort is to provide researchers with tools and services for collecting data worldwide on static file tree attributes. This data will be aggregated into a large database that anyone can query and view.
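To give a sense of what collecting static file attributes can look like, here is a minimal Python sketch. It is an illustration only, not the PDSI team's tool: the names `FileRecord` and `scan_tree` are hypothetical, and the choice to count disk usage as `st_blocks` times 512 bytes is an assumption that holds on typical POSIX systems.

```python
import os
from dataclasses import dataclass


@dataclass
class FileRecord:
    """Static attributes of one file (hypothetical record layout)."""
    name_length: int      # characters in the file name
    size_bytes: int       # logical file size
    allocated_bytes: int  # disk space actually used, where the OS reports it
    atime: float          # last access time, seconds since the epoch


def scan_tree(root):
    """Walk a directory tree and gather static attributes.

    Returns (files, dir_entry_counts): a list of FileRecord objects and a
    list with the number of entries found in each directory visited.
    """
    files = []
    dir_entry_counts = []
    for dirpath, dirnames, filenames in os.walk(root):
        # "Directory size" is taken here to mean the number of entries.
        dir_entry_counts.append(len(dirnames) + len(filenames))
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # do not follow symbolic links
            except OSError:
                continue  # file vanished or is unreadable; skip it
            # st_blocks is missing on some platforms; fall back to the
            # logical size there (an assumption made for this sketch).
            blocks = getattr(st, "st_blocks", None)
            allocated = blocks * 512 if blocks is not None else st.st_size
            files.append(
                FileRecord(len(name), st.st_size, allocated, st.st_atime)
            )
    return files, dir_entry_counts
```

Calling `scan_tree` on a directory returns records that could then be summarized or uploaded to a shared database; only metadata is read, never file contents.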

Researchers and students like Shobhit study trends in data storage so that storage systems can evolve with them. For example, if 90 percent of the files in most file systems are smaller than 4KB, but most of the bytes come from the remaining 10 percent of files, which are large, then a designer can optimize for that common case. Understanding these trends offers insight into how to design storage systems for the future.
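The contrast between counting files and counting bytes can be made concrete with a short Python sketch. The function name `small_file_shares` is hypothetical, the 4KB threshold is assumed to mean 4,096 bytes, and the sample sizes are invented for illustration rather than taken from any survey.

```python
def small_file_shares(sizes, threshold=4096):
    """Return (share of files, share of bytes) held by files below threshold.

    sizes is a list of file sizes in bytes. The 4096-byte default is an
    assumption about what "4KB" means.
    """
    total_files = len(sizes)
    total_bytes = sum(sizes)
    small = [s for s in sizes if s < threshold]
    file_share = len(small) / total_files if total_files else 0.0
    byte_share = sum(small) / total_bytes if total_bytes else 0.0
    return file_share, byte_share


# Invented example: nine 1,000-byte files and one 1,000,000-byte file.
sample = [1000] * 9 + [1_000_000]
file_share, byte_share = small_file_shares(sample)
print(f"{file_share:.0%} of files hold {byte_share:.2%} of bytes")
```

In this made-up sample, the small files are nine out of ten by count yet hold under 1 percent of the bytes, which is the kind of skew a designer would want to know about.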

Over time, disks get faster and denser, storage gets cheaper, and users generate more data than they ever did in the past. Storage system software is also becoming faster and more reliable, and data formats for file storage keep evolving. These changes affect the amount of data that users and applications store and the way it is accessed.

"As machines are becoming more and more parallel, a newer kind of demand is being placed on storage systems," said Shobhit. "Reacting to highly concurrent accesses and modifications from thousands of parallel threads will become common soon. Petabytes of storage within a single file system is already a requirement. I'd like to help build the next generation of file systems that can manage this sort of workload."

In the past, studies of data storage have been confined to a single organization. The goal of Shobhit's PDSI research team is for users to gather and share data publicly: to collect information on static file attributes from across industries, organizations and users, and to analyze it for trends and patterns. Various national labs and research organizations are contributors, including PDL, National Energy Research Scientific Computing Center, Pacific Northwest National Laboratory, Lawrence Berkeley National Laboratory, Argonne National Laboratory and Pittsburgh Supercomputing Center.

Contributed by Eunjung Yoon, MS18