Tag: Linux

Operating Systems: File System Implementation

Operating Systems: File System Implementation

 

File System Structure

The file system is made of two things: the logical storage unit and the collection of related informations. It resides on secondary storage (disks), providing efficient and convenient access to them (disks) for storing and retrieving data. Also provide a user interface to the storage, mapping logical to physical with the purpose to create an abstraction level for the user.

The structure contains also:

  • the file control block, the storage structure to collect file informations;
  • the device driver, to control the physical device.

The organization of the file-system is layers-based:

Layered Organization for File System Implementation

Starting from the end the device drivers manage the IO devices at the IO control layer. The basic file system once received a command such as “retrieve block 123” translate this sentence into something more device driver compatible. The file-organization module understand files, logical addresses, and physical blocks. The logical file system managed metadata information.

 

Virtual File System

It provides an object-oriented way of implementing file systems, allowing the same system call inferface (API) to be used for different types of file systems. With the VFS it is important to notice the concept of vnode which holds inode of network file details.

Directory Implementation

The implementation of the directories can be done following two strategies:

  • Linear List
  • Hash Table

Let us focus on the pros and the cons of both of them. The Linear List is simple to develop but for search operations is time-consuming. A typical improvement is the B+ Tree option, an  alphabetically ordered list: in fact, in this case, when a miss occurs it is not needed to scan the whole list.

The other option is the Hash Table, which is good for search operations but collisions can occur. This second strategy is useful when the entries are of fixed size.

Allocation Method

The allocation method refers to how disk blocks are allocated for files.

  • CONTIGUOUS: in this case the file occupies a set of contiguous blocks.
    • best performance in many cases;
    • simple;
    • Problems:
      • find space for file;
      • know the file size;
      • external fragmentation;
      • need for compaction off-line or on-line;
    • Variation (Extent-Based Systems, Veritas File System):
      • allocation in extents;
      • an extent is a contiguous block of disks.

 

  • LINKED: each file is a linked list of blocks.
    • no ext. fragmentation;
    • each pointer contains pointer to next block;
    • file ends at null pointer;
    • free space management system (FSMS) called when new block needed;
    • improve efficiency by clustering blocks into groups;
    • Problems:
      • clustering blocks into groups increases internal fragmentation;
      • no reliability-oriented;
      • locate a block can take many IOs and disk seeks.
    • Variation (FAT, File Allocation Table):
      • At the beginning of the volume there is this table, indexed by block number;
      • it looks like a linked list, but faster and cachable;
      • new block allocation simple.

 

  • INDEXED: each file has its own index block(s) of pointers to its data blocks
    • needs index table;
    • random access;
    • dynamic access without external fragmentation, but have overhead of index block;

UNIX File System (UFS) Case Study

It is a file system that combine the schemes:

  • it is composed by a main 4kB block table by 32-bit addresses;
  • there are portion of the table that works as an index allocation scheme with:
    • direct indexing;
    • single indexing;
    • double indexing.

Allocation Methods: summary

The contiguous allocation method is good for sequentiality and randomness. Instead, the linked allocation method is good for sequential but not for randomness. Last but not least, the indexed is a bit more complex because a single block access could require two index block read, and then, a data block read. However, for the indexing, a possible improvment is the clustering which is always a way to improve the throughput reducing the CPU overhead.

The Sun NFS (Network File System)

It is an implementation of a software system for access remote files accross LANs (or WANs). It uses an unreliable datagram protocol (UDP).

Interconnected workstations are viewed as a set of independent machines with independent file system, which allows sharing among these file systems in a transparent manner. In fact, a remote directory is mounted over a local file system directory. With the right permissions (access-right accreditation), any file system (or directory within the file system) can be mounted remotely to a local directory.

It has been developed to operate in a heteogenous environment of different machines, oerating system, and network architectures; and this independence is achieved through the use of RPC primitives built on top of XDR.

The mount procolol is:

  1. establish connection between server and client
  2. mount operation that needs of
    1. name of remote directory
    2. name of server
  3. after a mounting request the server reply with a file hande
  4. the moun operation terminates and changes the view of the client and not the server.

This protocol does not provide concurrecncy-control mechanisms.

The NFS servers are stateless: each request has to provide a full set of arguments. Possible modificaiton has to be committed to the server’s disk before resuls are returned to the client.