Linux Kernel: The Virtual Filesystem [Repost] - 白红宇

Published: 2019-06-09

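The virtual filesystem: maybe you have heard of it, maybe not; I certainly have! In the computer industry, many things have no official authority that says: "Friend, I am in charge, and everything you build must look like this or it is illegal." In practice much is settled by sheer strength: a design gradually becomes the established fact in people's minds, and that fact has no official backing either. Here is the problem: with no authority there is no standard, with no standard there is no uniformity, and with no uniformity you get the Three Kingdoms era, a free-for-all!

What to do, especially with filesystems, where a hundred flowers bloom? The Linux kernel developers' answer is the VFS (virtual filesystem). The VFS lets user programs call open(), read(), and write() directly without caring about the specific filesystem or the physical medium. That may not sound novel, so here is the novel part: on older operating systems (DOS, for example), any access to a non-native filesystem required special tools. Linux avoids this because the kernel builds an abstraction layer on top of its low-level filesystem interface. That layer is what lets Linux support many filesystems, even ones that differ greatly in features and behavior. To do so, the VFS provides a common file model that covers the usual features and behavior of any filesystem we can think of. The VFS layer can bridge all these filesystems because it defines the basic abstract interfaces and data structures that every filesystem supports, while each actual filesystem shapes its own notions, such as "how a file is opened" and "what a directory is", to match the VFS definitions. Because the filesystem code hides its implementation details behind these uniform interfaces and data structures, all filesystems look the same to the VFS layer and to the rest of the kernel: they all support concepts such as files and directories, and operations such as creating and deleting files.

Each actual filesystem is programmed to provide the abstract interfaces and data structures the VFS expects, so the kernel can work with any filesystem without effort. How do the pieces relate to one another? Consider this example: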

                         
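write(f, &buf, len);

This call needs little explanation. The user-space call is first handled by a generic system call, sys_write(), which determines the actual write method of the filesystem on which f resides and then invokes it. That write method is part of the filesystem's implementation, and it is what finally puts the data on the medium. The flow is: user-space write() -> sys_write() in the VFS -> the filesystem's write method -> the physical medium.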
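The dispatch from a generic entry point to a filesystem-specific method can be modelled in plain user-space C. This is only a sketch: toy_file, toy_file_ops, ramfs_write, and toy_sys_write are made-up names for illustration, not kernel APIs, and the "medium" is just a small in-memory buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model of VFS dispatch; none of these names are kernel APIs. */
struct toy_file;

struct toy_file_ops {
    long (*write)(struct toy_file *f, const char *buf, size_t len);
};

struct toy_file {
    const struct toy_file_ops *f_op; /* filled in by the "filesystem" */
    char data[64];                   /* stands in for the physical medium */
    size_t pos;                      /* current file position */
};

/* One filesystem's write method: append bytes to the in-memory "medium". */
long ramfs_write(struct toy_file *f, const char *buf, size_t len)
{
    if (len > sizeof(f->data) - f->pos)
        len = sizeof(f->data) - f->pos;   /* short write when space runs out */
    memcpy(f->data + f->pos, buf, len);
    f->pos += len;
    return (long)len;
}

const struct toy_file_ops ramfs_ops = { .write = ramfs_write };

/* Generic entry point: knows nothing about ramfs, only the ops table. */
long toy_sys_write(struct toy_file *f, const char *buf, size_t len)
{
    assert(f != NULL);
    if (f->f_op == NULL || f->f_op->write == NULL)
        return -1;                        /* no write method for this file */
    return f->f_op->write(f, buf, len);
}
```

A caller would set f_op to &ramfs_ops when "opening" the file and afterwards only ever call toy_sys_write(); swapping in another ops table changes the behavior without touching the caller.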

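Below I first give an overview of the Unix (Linux) filesystem as a whole and then go into detail. Unix traditionally uses four filesystem-related abstractions (files, directory entries, inodes, and mount points); a fifth concept, the superblock, is listed with them here:

1. File: an ordered string of bytes.

2. Directory entry: files live in directories, and directories can be nested to form paths. Each component of a path is called a directory entry. A directory is itself a file, one that lists all the files in that directory.

3. Inode: a file really consists of two parts, information about the file and the file itself. The information about the file includes access permissions, size, owner, timestamps, and so on. It is also called metadata and is stored in a separate data structure called the inode (short for index node).

4. Mount point: a filesystem is mounted at a specific mount point in a global hierarchy known as the namespace, so that every mounted filesystem appears as a branch of a single tree.

5. Superblock: a data structure that holds information about the filesystem as a whole, that is, the filesystem's control information.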
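The metadata that item 3 attributes to the inode can be looked at from user space through the POSIX stat() call. The helper below is a small sketch; show_inode_info is a name made up for this example, and the fields it prints are the standard struct stat members rather than the kernel's internal structures.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Print a few inode fields for `path` using POSIX stat().
 * Returns 1 if the path is a directory, 0 if it is not, -1 on error.
 */
int show_inode_info(const char *path)
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) != 0)
        return -1;

    printf("inode number : %llu\n", (unsigned long long)st.st_ino);
    printf("mode (octal) : %o\n", (unsigned int)st.st_mode);
    printf("hard links   : %llu\n", (unsigned long long)st.st_nlink);
    printf("owner uid/gid: %ld/%ld\n", (long)st.st_uid, (long)st.st_gid);
    printf("size (bytes) : %lld\n", (long long)st.st_size);

    return S_ISDIR(st.st_mode) ? 1 : 0;
}
```

Calling show_inode_info(".") reports the current directory, which also illustrates item 2: a directory is a file with an inode of its own.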

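The VFS sits between user-space file access and the actual filesystems, so a filesystem that wants to be usable from user space through the VFS must be wrapped to present an interface that matches these concepts. Each of the elements above corresponds to an object. Each object has a structure describing its attributes and an operations structure holding the methods it supports. The details follow: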

1. Superblock object: represents a mounted filesystem. It is represented by struct super_block, defined in linux/fs.h:

struct super_block {
        struct list_head         s_list;            /* list of all superblocks */
        dev_t                    s_dev;             /* identifier */
        unsigned long            s_blocksize;       /* block size in bytes */
        unsigned long            s_old_blocksize;   /* old block size in bytes */
        unsigned char            s_blocksize_bits;  /* block size in bits */
        unsigned char            s_dirt;            /* dirty flag */
        unsigned long long       s_maxbytes;        /* max file size */
        struct file_system_type  *s_type;           /* filesystem type */
        struct super_operations  *s_op;             /* superblock methods */
        struct dquot_operations  *dq_op;            /* quota methods */
        struct quotactl_ops      *s_qcop;           /* quota control methods */
        struct export_operations *s_export_op;      /* export methods */
        unsigned long            s_flags;           /* mount flags */
        unsigned long            s_magic;           /* filesystem's magic number */
        struct dentry            *s_root;           /* directory mount point */
        struct rw_semaphore      s_umount;          /* unmount semaphore */
        struct semaphore         s_lock;            /* superblock semaphore */
        int                      s_count;           /* superblock ref count */
        int                      s_syncing;         /* filesystem syncing flag */
        int                      s_need_sync_fs;    /* not-yet-synced flag */
        atomic_t                 s_active;          /* active reference count */
        void                     *s_security;       /* security module */
        struct list_head         s_dirty;           /* list of dirty inodes */
        struct list_head         s_io;              /* list of writebacks */
        struct hlist_head        s_anon;            /* anonymous dentries */
        struct list_head         s_files;           /* list of assigned files */
        struct block_device      *s_bdev;           /* associated block device */
        struct list_head         s_instances;       /* instances of this fs */
        struct quota_info        s_dquot;           /* quota-specific options */
        char                     s_id[32];          /* text name */
        void                     *s_fs_info;        /* filesystem-specific info */
        struct semaphore         s_vfs_rename_sem;  /* rename semaphore */
};

The code for creating, managing, and destroying superblock objects lives in fs/super.c. A superblock object is created and initialized by alloc_super(). When a filesystem is mounted, it calls this function, reads its superblock from disk, and fills in the in-memory superblock object. The most important field is s_op, which points to the superblock operations table, represented by struct super_operations and defined in linux/fs.h:

struct super_operations {
        struct inode *(*alloc_inode) (struct super_block *sb);
        void (*destroy_inode) (struct inode *);
        void (*read_inode) (struct inode *);
        void (*dirty_inode) (struct inode *);
        void (*write_inode) (struct inode *, int);
        void (*put_inode) (struct inode *);
        void (*drop_inode) (struct inode *);
        void (*delete_inode) (struct inode *);
        void (*put_super) (struct super_block *);
        void (*write_super) (struct super_block *);
        int (*sync_fs) (struct super_block *, int);
        void (*write_super_lockfs) (struct super_block *);
        void (*unlockfs) (struct super_block *);
        int (*statfs) (struct super_block *, struct statfs *);
        int (*remount_fs) (struct super_block *, int *, char *);
        void (*clear_inode) (struct inode *);
        void (*umount_begin) (struct super_block *);
        int (*show_options) (struct seq_file *, struct vfsmount *);
};

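When a filesystem needs to perform an operation on its superblock, it looks up the method in the superblock object. For example, to write its superblock a filesystem calls sb->s_op->write_super(sb). Here sb is a pointer to the filesystem's superblock; following that pointer into the operations table yields the desired write_super() function, which does the actual work of writing the superblock.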
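The sb->s_op->write_super(sb) pattern can be tried out in a few lines of ordinary C. What follows is a minimal sketch under stated assumptions: the toy_ types are cut-down stand-ins invented for this example, the writes counter exists only for the demo, and the NULL checks are a defensive choice of this sketch rather than a statement about what the kernel does.

```c
#include <assert.h>
#include <stddef.h>

struct toy_super_block;

/* Hypothetical, cut-down operations table with a single method. */
struct toy_super_operations {
    void (*write_super)(struct toy_super_block *sb);
};

struct toy_super_block {
    unsigned char s_dirt;                     /* dirty flag, as in the listing */
    const struct toy_super_operations *s_op;  /* superblock methods */
    int writes;                               /* demo counter, not a kernel field */
};

/* A filesystem's implementation: "write" the superblock and clear the dirty flag. */
void toyfs_write_super(struct toy_super_block *sb)
{
    sb->writes++;
    sb->s_dirt = 0;
}

const struct toy_super_operations toyfs_sops = { .write_super = toyfs_write_super };

/* Caller side: follow sb -> s_op -> write_super, skipping clean superblocks. */
int toy_sync_super(struct toy_super_block *sb)
{
    assert(sb != NULL);
    if (!sb->s_dirt)
        return 0;                 /* nothing to do */
    if (sb->s_op == NULL || sb->s_op->write_super == NULL)
        return -1;                /* method not provided in this sketch */
    sb->s_op->write_super(sb);    /* same shape as sb->s_op->write_super(sb) */
    return 1;
}
```

The superblock is passed explicitly as an argument because C has no implicit "this"; the method would otherwise have no way of knowing which object it was invoked on.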

2. Inode object: represented by struct inode, defined in linux/fs.h:

struct inode {
        struct hlist_node       i_hash;              /* hash list */
        struct list_head        i_list;              /* list of inodes */
        struct list_head        i_dentry;            /* list of dentries */
        unsigned long           i_ino;               /* inode number */
        atomic_t                i_count;             /* reference counter */
        umode_t                 i_mode;              /* access permissions */
        unsigned int            i_nlink;             /* number of hard links */
        uid_t                   i_uid;               /* user id of owner */
        gid_t                   i_gid;               /* group id of owner */
        kdev_t                  i_rdev;              /* real device node */
        loff_t                  i_size;              /* file size in bytes */
        struct timespec         i_atime;             /* last access time */
        struct timespec         i_mtime;             /* last modify time */
        struct timespec         i_ctime;             /* last change time */
        unsigned int            i_blkbits;           /* block size in bits */
        unsigned long           i_blksize;           /* block size in bytes */
        unsigned long           i_version;           /* version number */
        unsigned long           i_blocks;            /* file size in blocks */
        unsigned short          i_bytes;             /* bytes consumed */
        spinlock_t              i_lock;              /* spinlock */
        struct rw_semaphore     i_alloc_sem;         /* nests inside of i_sem */
        struct semaphore        i_sem;               /* inode semaphore */
        struct inode_operations *i_op;               /* inode ops table */
        struct file_operations  *i_fop;              /* default inode ops */
        struct super_block      *i_sb;               /* associated superblock */
        struct file_lock        *i_flock;            /* file lock list */
        struct address_space    *i_mapping;          /* associated mapping */
        struct address_space    i_data;              /* mapping for device */
        struct dquot            *i_dquot[MAXQUOTAS]; /* disk quotas for inode */
        struct list_head        i_devices;           /* list of block devices */
        struct pipe_inode_info  *i_pipe;             /* pipe information */
        struct block_device     *i_bdev;             /* block device driver */
        unsigned long           i_dnotify_mask;      /* directory notify mask */
        struct dnotify_struct   *i_dnotify;          /* dnotify */
        unsigned long           i_state;             /* state flags */
        unsigned long           dirtied_when;        /* first dirtying time */
        unsigned int            i_flags;             /* filesystem flags */
        unsigned char           i_sock;              /* is this a socket? */
        atomic_t                i_writecount;        /* count of writers */
        void                    *i_security;         /* security module */
        __u32                   i_generation;        /* inode version number */
        union {
                void            *generic_ip;         /* filesystem-specific info */
        } u;
};

Sometimes a filesystem cannot supply everything the inode structure asks for. For example, a filesystem may not record one of the timestamps. It is then free to handle the gap however it sees fit: it can store 0 in i_ctime (the last change time, not a creation time), make i_ctime equal to i_mtime, or use any other value. The i_op field of the inode object points to the table of inode operations, struct inode_operations, defined in linux/fs.h:

struct inode_operations {
        int (*create) (struct inode *, struct dentry *, int);
        struct dentry * (*lookup) (struct inode *, struct dentry *);
        int (*link) (struct dentry *, struct inode *, struct dentry *);
        int (*unlink) (struct inode *, struct dentry *);
        int (*symlink) (struct inode *, struct dentry *, const char *);
        int (*mkdir) (struct inode *, struct dentry *, int);
        int (*rmdir) (struct inode *, struct dentry *);
        int (*mknod) (struct inode *, struct dentry *, int, dev_t);
        int (*rename) (struct inode *, struct dentry *,
                       struct inode *, struct dentry *);
        int (*readlink) (struct dentry *, char *, int);
        int (*follow_link) (struct dentry *, struct nameidata *);
        int (*put_link) (struct dentry *, struct nameidata *);
        void (*truncate) (struct inode *);
        int (*permission) (struct inode *, int);
        int (*setattr) (struct dentry *, struct iattr *);
        int (*getattr) (struct vfsmount *, struct dentry *, struct kstat *);
        int (*setxattr) (struct dentry *, const char *,
                         const void *, size_t, int);
        ssize_t (*getxattr) (struct dentry *, const char *, void *, size_t);
        ssize_t (*listxattr) (struct dentry *, char *, size_t);
        int (*removexattr) (struct dentry *, const char *);
};

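Operations are invoked the same way as before: i->i_op->truncate(i).

For reasons of space I have to split this into two parts; next time I will continue with the rest of the virtual filesystem.

Picking up where I left off, today I cover what remains of the virtual filesystem.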

3. Dentry object. The concept of a directory entry was covered in the previous part, so I will not repeat it. A dentry can also be a mount point. In the path /mnt/cdrom/foo, the components /, mnt, cdrom, and foo are all dentry objects. A dentry is represented by struct dentry, defined in linux/dcache.h:

struct dentry {
        atomic_t                 d_count;      /* usage count */
        unsigned long            d_vfs_flags;  /* dentry cache flags */
        spinlock_t               d_lock;       /* per-dentry lock */
        struct inode             *d_inode;     /* associated inode */
        struct list_head         d_lru;        /* unused list */
        struct list_head         d_child;      /* list of dentries within */
        struct list_head         d_subdirs;    /* subdirectories */
        struct list_head         d_alias;      /* list of alias inodes */
        unsigned long            d_time;       /* revalidate time */
        struct dentry_operations *d_op;        /* dentry operations table */
        struct super_block       *d_sb;        /* superblock of file */
        unsigned int             d_flags;      /* dentry flags */
        int                      d_mounted;    /* is this a mount point? */
        void                     *d_fsdata;    /* filesystem-specific data */
        struct rcu_head          d_rcu;        /* RCU locking */
        struct dcookie_struct    *d_cookie;    /* cookie */
        struct dentry            *d_parent;    /* dentry object of parent */
        struct qstr              d_name;       /* dentry name */
        struct hlist_node        d_hash;       /* list of hash table entries */
        struct hlist_head        *d_bucket;    /* hash bucket */
        unsigned char            d_iname[DNAME_INLINE_LEN_MIN]; /* short name */
};

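Because dentries are not actually stored on disk, they have no corresponding on-disk data structure; the VFS creates them on the fly from the string form of a path name. For the same reason struct dentry has no flag recording whether the object has been modified. A dentry object is in one of three states: used, unused, or negative. A used dentry corresponds to a valid inode (d_inode points to it) and has one or more users (d_count is positive). An unused dentry corresponds to a valid inode (d_inode points to it), but the VFS is not currently using it (d_count is 0). It still points to a valid object and is kept in memory in case it is needed again, which is cheaper than re-creating it. A negative dentry has no valid inode (d_inode is NULL), either because the inode was deleted or because the path name is not correct; it is kept anyway so that later lookups of the same path resolve quickly. Although negative dentries have some use, they can be destroyed when necessary.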
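The three states follow mechanically from the two fields just mentioned, which a short sketch makes explicit. The toy_dentry type and the enum names are invented for this example, and d_count is a plain int here instead of the atomic_t in the listing.

```c
#include <assert.h>
#include <stddef.h>

struct toy_inode { unsigned long i_ino; };

struct toy_dentry {
    int d_count;               /* usage count (plain int here, atomic_t in the listing) */
    struct toy_inode *d_inode; /* associated inode, or NULL */
};

enum dentry_state { DENTRY_USED, DENTRY_UNUSED, DENTRY_NEGATIVE };

/* Classify a dentry using only d_inode and d_count, as the text describes. */
enum dentry_state toy_dentry_state(const struct toy_dentry *d)
{
    assert(d != NULL);
    if (d->d_inode == NULL)
        return DENTRY_NEGATIVE;   /* no valid inode behind this name */
    if (d->d_count > 0)
        return DENTRY_USED;       /* valid inode and at least one user */
    return DENTRY_UNUSED;         /* valid inode, cached but idle */
}
```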

struct dentry_operations lists the methods the VFS can invoke on directory entries:

struct dentry_operations {
        int (*d_revalidate) (struct dentry *, int);
        int (*d_hash) (struct dentry *, struct qstr *);
        int (*d_compare) (struct dentry *, struct qstr *, struct qstr *);
        int (*d_delete) (struct dentry *);
        void (*d_release) (struct dentry *);
        void (*d_iput) (struct dentry *, struct inode *);
};

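If the VFS had to walk every element of a path name and resolve each one into a dentry object every time, it would be very expensive. The kernel therefore caches dentry objects in the dentry cache (dcache), which has three main parts:

1. Lists of "used" dentries, linked off their associated inode through the inode's i_dentry field. Because a given inode can have several links, there can be several dentry objects for it, hence a list.

2. A doubly linked "least recently used" list holding unused and negative dentry objects. Entries are inserted in time order.

3. A hash table and hash function used to resolve a given path quickly into the associated dentry object.

The hash table is the array dentry_hashtable; each element points to a list of dentries that hash to the same value. The size of the array depends on the amount of physical memory in the system. The actual hash value is computed by d_hash(), which gives each filesystem the chance to supply its own hash function. Lookups go through d_lookup(): if it finds a matching dentry in the dcache, that object is returned; otherwise it returns NULL. In a sense the dcache also caches inodes. An inode associated with a cached dentry is not freed, because the dentry keeps the inode's usage count positive, which pins the inode in memory. As long as the dentry is cached, its inode is cached too.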
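To make the hash-table part concrete, here is a toy chained hash table keyed by (parent, name). It is a sketch only: cached_dentry, toy_d_add, and toy_d_lookup are illustrative names, the hash function is an arbitrary choice for the example, and the fixed table size stands in for the memory-dependent sizing described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_HASH_SIZE 8   /* the real table is sized from physical memory */

struct cached_dentry {
    const char *name;                /* path component, e.g. "cdrom" */
    struct cached_dentry *parent;    /* parent directory entry */
    struct cached_dentry *hash_next; /* next entry in the same bucket */
};

static struct cached_dentry *toy_hashtable[TOY_HASH_SIZE];

/* Simple string hash mixed with the parent pointer; illustrative only. */
static unsigned int toy_hash(const struct cached_dentry *parent, const char *name)
{
    unsigned int h = (unsigned int)((uintptr_t)parent >> 4);

    while (*name)
        h = h * 31u + (unsigned char)*name++;
    return h % TOY_HASH_SIZE;
}

/* Insert a dentry at the head of its bucket's chain. */
void toy_d_add(struct cached_dentry *d)
{
    unsigned int b;

    assert(d != NULL && d->name != NULL);
    b = toy_hash(d->parent, d->name);
    d->hash_next = toy_hashtable[b];
    toy_hashtable[b] = d;
}

/* Return the cached entry for (parent, name), or NULL on a cache miss. */
struct cached_dentry *toy_d_lookup(const struct cached_dentry *parent, const char *name)
{
    struct cached_dentry *d = toy_hashtable[toy_hash(parent, name)];

    for (; d != NULL; d = d->hash_next)
        if (d->parent == parent && strcmp(d->name, name) == 0)
            return d;
    return NULL;
}
```

Resolving /mnt/cdrom with this table means one lookup per component: first ("mnt" under the root entry), then ("cdrom" under the mnt entry).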

4. File object: represents a file opened by a process. The file object is only the process's view of an open file; it points back to a dentry (which in turn points back to an inode), and it is the dentry that actually represents the open file. The same file can have several file objects, but its inode and dentry objects are unique. The file object is represented by struct file, defined in linux/fs.h:

struct file {
        struct list_head       f_list;        /* list of file objects */
        struct dentry          *f_dentry;     /* associated dentry object */
        struct vfsmount        *f_vfsmnt;     /* associated mounted fs */
        struct file_operations *f_op;         /* file operations table */
        atomic_t               f_count;       /* file object's usage count */
        unsigned int           f_flags;       /* flags specified on open */
        mode_t                 f_mode;        /* file access mode */
        loff_t                 f_pos;         /* file offset (file pointer) */
        struct fown_struct     f_owner;       /* owner data for signals */
        unsigned int           f_uid;         /* user's UID */
        unsigned int           f_gid;         /* user's GID */
        int                    f_error;       /* error code */
        struct file_ra_state   f_ra;          /* read-ahead state */
        unsigned long          f_version;     /* version number */
        void                   *f_security;   /* security module */
        void                   *private_data; /* tty driver hook */
        struct list_head       f_ep_links;    /* list of eventpoll links */
        spinlock_t             f_ep_lock;     /* eventpoll lock */
        struct address_space   *f_mapping;    /* page cache mapping */
};

The operations on a file object are given by struct file_operations, in linux/fs.h:

struct file_operations {
        struct module *owner;
        loff_t (*llseek) (struct file *, loff_t, int);
        ssize_t (*read) (struct file *, char *, size_t, loff_t *);
        ssize_t (*aio_read) (struct kiocb *, char *, size_t, loff_t);
        ssize_t (*write) (struct file *, const char *, size_t, loff_t *);
        ssize_t (*aio_write) (struct kiocb *, const char *, size_t, loff_t);
        int (*readdir) (struct file *, void *, filldir_t);
        unsigned int (*poll) (struct file *, struct poll_table_struct *);
        int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long);
        int (*mmap) (struct file *, struct vm_area_struct *);
        int (*open) (struct inode *, struct file *);
        int (*flush) (struct file *);
        int (*release) (struct inode *, struct file *);
        int (*fsync) (struct file *, struct dentry *, int);
        int (*aio_fsync) (struct kiocb *, int);
        int (*fasync) (int, struct file *, int);
        int (*lock) (struct file *, int, struct file_lock *);
        ssize_t (*readv) (struct file *, const struct iovec *,
                          unsigned long, loff_t *);
        ssize_t (*writev) (struct file *, const struct iovec *,
                           unsigned long, loff_t *);
        ssize_t (*sendfile) (struct file *, loff_t *, size_t,
                             read_actor_t, void *);
        ssize_t (*sendpage) (struct file *, struct page *, int,
                             size_t, loff_t *, int);
        unsigned long (*get_unmapped_area) (struct file *, unsigned long,
                                            unsigned long, unsigned long,
                                            unsigned long);
        int (*check_flags) (int flags);
        int (*dir_notify) (struct file *filp, unsigned long arg);
        int (*flock) (struct file *filp, int cmd, struct file_lock *fl);
};

Finally, besides the basic VFS objects above, the kernel uses several other data structures to manage filesystem-related data:

1. file_system_type: because Linux supports so many filesystems, the kernel needs a special structure to describe the capabilities and behavior of each one:

struct file_system_type {
        const char              *name;     /* filesystem's name */
        struct subsystem        subsys;    /* sysfs subsystem object */
        int                     fs_flags;  /* filesystem type flags */

        /* the following is used to read the superblock off the disk */
        struct super_block      *(*get_sb) (struct file_system_type *, int,
                                            char *, void *);

        /* the following is used to terminate access to the superblock */
        void                    (*kill_sb) (struct super_block *);

        struct module           *owner;    /* module owning the filesystem */
        struct file_system_type *next;     /* next file_system_type in list */
        struct list_head        fs_supers; /* list of superblock objects */
};

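The get_sb() function reads the superblock from disk and populates the in-memory superblock object when the filesystem is mounted; the remaining fields describe the filesystem's properties. There is exactly one file_system_type structure per filesystem type, no matter how many instances are mounted on the system, or whether it is mounted at all. More interesting, when a filesystem is actually mounted, a vfsmount structure is created at the mount point. That structure represents a specific instance of the filesystem, in other words, a mount point.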
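The "one type object, many mounts" relationship can be sketched with a singly linked list chained through a next pointer, much like the next field in the listing. Everything here is hypothetical: toy_fs_type, toy_register_fs, and toy_mount are names invented for this example, and the mounts counter only stands in for the per-mount objects the kernel would create.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_fs_type {
    const char *name;          /* filesystem's name */
    struct toy_fs_type *next;  /* next type in the list */
    int mounts;                /* demo counter: how many times it is mounted */
};

static struct toy_fs_type *toy_file_systems;  /* head of the registered types */

/* Walk the list; return the slot holding `name`, or the terminating NULL slot. */
static struct toy_fs_type **toy_find_fs(const char *name)
{
    struct toy_fs_type **p;

    for (p = &toy_file_systems; *p != NULL; p = &(*p)->next)
        if (strcmp((*p)->name, name) == 0)
            break;
    return p;
}

/* Register a type once; returns 0 on success, -1 if the name is taken. */
int toy_register_fs(struct toy_fs_type *fs)
{
    struct toy_fs_type **p;

    assert(fs != NULL && fs->name != NULL);
    p = toy_find_fs(fs->name);
    if (*p != NULL)
        return -1;
    fs->next = NULL;
    *p = fs;                   /* append at the end of the list */
    return 0;
}

/* "Mount" by name: the single type object is shared by every mount. */
struct toy_fs_type *toy_mount(const char *name)
{
    struct toy_fs_type *fs = *toy_find_fs(name);

    if (fs != NULL)
        fs->mounts++;
    return fs;
}
```

Mounting the same type twice returns the same toy_fs_type both times, while an unregistered name yields NULL.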

2. The vfsmount structure is defined in linux/mount.h:

struct vfsmount {
        struct list_head   mnt_hash;        /* hash table list */
        struct vfsmount    *mnt_parent;     /* parent filesystem */
        struct dentry      *mnt_mountpoint; /* dentry of this mount point */
        struct dentry      *mnt_root;       /* dentry of root of this fs */
        struct super_block *mnt_sb;         /* superblock of this filesystem */
        struct list_head   mnt_mounts;      /* list of children */
        struct list_head   mnt_child;       /* list of children */
        atomic_t           mnt_count;       /* usage count */
        int                mnt_flags;       /* mount flags */
        char               *mnt_devname;    /* device file name */
        struct list_head   mnt_list;        /* list of descriptors */
        struct list_head   mnt_fslink;      /* fs-specific expiry list */
        struct namespace   *mnt_namespace;  /* associated namespace */
};

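The linked lists kept in vfsmount track how the filesystem relates to all the other mount points. mnt_flags stores the flags given at mount time. The standard mount flags are MNT_NOSUID (ignore the setuid and setgid bits on binaries on this filesystem), MNT_NODEV (forbid access to device files on this filesystem), and MNT_NOEXEC (forbid execution of binaries on this filesystem).

These flags are most useful when mounting removable devices that the administrator does not fully trust.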
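Mount flags of this kind are ordinary bit masks, so checking them is a single AND. The sketch below assumes three flags modelled on the kernel's MNT_NOSUID, MNT_NODEV, and MNT_NOEXEC; the TOY_ constants, their numeric values, and the two helper functions are illustrative choices for this example, not the kernel's definitions.

```c
#include <assert.h>

/* Illustrative values; the kernel defines its own MNT_* constants in linux/mount.h. */
#define TOY_MNT_NOSUID 0x01  /* ignore setuid/setgid bits on binaries */
#define TOY_MNT_NODEV  0x02  /* forbid access to device files */
#define TOY_MNT_NOEXEC 0x04  /* forbid execution of binaries */

/* Return 1 if a binary on a mount with these flags may be executed. */
int toy_may_exec(int mnt_flags)
{
    assert(mnt_flags >= 0);
    return (mnt_flags & TOY_MNT_NOEXEC) ? 0 : 1;
}

/* Return 1 if setuid bits on that mount should be honoured. */
int toy_honours_setuid(int mnt_flags)
{
    assert(mnt_flags >= 0);
    return (mnt_flags & TOY_MNT_NOSUID) ? 0 : 1;
}
```

An untrusted USB stick would typically be mounted with all three bits set, so both helpers return 0 for it.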

Every process on the system has its own set of open files. Three data structures tie the VFS layer to the processes on the system: files_struct, fs_struct, and namespace.

1. files_struct: pointed to by the files field of the process descriptor:

struct files_struct {
        atomic_t    count;              /* structure's usage count */
        spinlock_t  file_lock;          /* lock protecting this structure */
        int         max_fds;            /* maximum number of file objects */
        int         max_fdset;          /* maximum number of file descriptors */
        int         next_fd;            /* next file descriptor number */
        struct file **fd;               /* array of all file objects */
        fd_set      *close_on_exec;     /* file descriptors to close on exec() */
        fd_set      *open_fds;          /* pointer to open file descriptors */
        fd_set      close_on_exec_init; /* initial files to close on exec() */
        fd_set      open_fds_init;      /* initial set of file descriptors */
        struct file *fd_array[NR_OPEN_DEFAULT]; /* default array of file objects */
};

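The fd pointer refers to the array of open file objects; by default it points to the embedded fd_array array. NR_OPEN_DEFAULT is 32 here (it is defined as BITS_PER_LONG, so the figure applies to 32-bit builds), which gives that array room for 32 file objects. If a process opens more than 32 file objects, the kernel allocates a new array and points fd at it. The value of NR_OPEN_DEFAULT can also be adjusted.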
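The "embedded array first, heap array later" idea is easy to model in user space. In this sketch toy_files, toy_open_file, and the three functions are made-up names, and doubling the table on overflow is an assumption of the example, not a description of the kernel's growth policy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_NR_OPEN_DEFAULT 32

struct toy_open_file { int id; };

struct toy_files {
    int max_fds;                                         /* capacity of fd[] */
    int next_fd;                                         /* next descriptor to hand out */
    struct toy_open_file **fd;                           /* current table */
    struct toy_open_file *fd_array[TOY_NR_OPEN_DEFAULT]; /* embedded default table */
};

void toy_files_init(struct toy_files *fs)
{
    assert(fs != NULL);
    memset(fs->fd_array, 0, sizeof(fs->fd_array));
    fs->max_fds = TOY_NR_OPEN_DEFAULT;
    fs->next_fd = 0;
    fs->fd = fs->fd_array;       /* by default fd points at the embedded array */
}

/* Install `f` in the next free slot; returns the descriptor or -1 on failure. */
int toy_fd_install(struct toy_files *fs, struct toy_open_file *f)
{
    if (fs->next_fd == fs->max_fds) {            /* table full: grow it */
        int new_max = fs->max_fds * 2;           /* doubling is this sketch's choice */
        struct toy_open_file **bigger = calloc((size_t)new_max, sizeof(*bigger));

        if (bigger == NULL)
            return -1;
        memcpy(bigger, fs->fd, (size_t)fs->max_fds * sizeof(*bigger));
        if (fs->fd != fs->fd_array)
            free(fs->fd);                        /* never free the embedded array */
        fs->fd = bigger;
        fs->max_fds = new_max;
    }
    fs->fd[fs->next_fd] = f;
    return fs->next_fd++;
}

/* Release a heap-allocated table, if any, and fall back to the embedded one. */
void toy_files_destroy(struct toy_files *fs)
{
    if (fs->fd != fs->fd_array)
        free(fs->fd);
    fs->fd = fs->fd_array;
}
```

The first 32 installs never touch the allocator, which is the point of embedding a small default array in the structure.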

2. The second structure is fs_struct, pointed to by the fs field of the process descriptor. It holds filesystem information related to the process and is defined in linux/fs_struct.h:

struct fs_struct {
        atomic_t        count;       /* structure usage count */
        rwlock_t        lock;        /* lock protecting structure */
        int             umask;       /* default file permissions */
        struct dentry   *root;       /* dentry of the root directory */
        struct dentry   *pwd;        /* dentry of the current directory */
        struct dentry   *altroot;    /* dentry of the alternative root */
        struct vfsmount *rootmnt;    /* mount object of the root directory */
        struct vfsmount *pwdmnt;     /* mount object of the current directory */
        struct vfsmount *altrootmnt; /* mount object of the alternative root */
};

This structure holds the current working directory and the root directory of the process.

3. The last one is namespace, pointed to by the namespace field of the process descriptor and defined in linux/namespace.h:

struct namespace {
        atomic_t            count; /* structure usage count */
        struct vfsmount     *root; /* mount object of root directory */
        struct list_head    list;  /* list of mount points */
        struct rw_semaphore sem;   /* semaphore protecting the namespace */
};

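The list field is a doubly linked list of the mounted filesystems; its elements make up the namespace. All of these data structures are linked from the process descriptor. For most processes, the descriptor points to its own files_struct and fs_struct. Processes created with the clone flags CLONE_FILES or CLONE_FS, however, share these structures, so several process descriptors can point to the same files_struct or fs_struct. Each structure keeps a count field as a reference count, which prevents it from being destroyed while a process is still using it. The namespace structure works the other way around: by default all processes share the same namespace, that is, they all see the same filesystem hierarchy. Only when the CLONE_NEWNS flag is passed to clone() does the process get its own copy of the namespace structure.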
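The role of the count field when structures are shared can be shown with a small reference-counting sketch. The names shared_files, toy_task, toy_clone, and toy_exit_files are hypothetical, TOY_CLONE_FILES is an illustrative value rather than the kernel's constant, and a private table starts out empty here instead of being copied from the parent.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_CLONE_FILES 0x1   /* illustrative value, not the kernel's constant */

struct shared_files { int count; /* usage count */ };
struct toy_task { struct shared_files *files; };

/* Give a task a fresh table with a single reference. Returns 0 or -1. */
int toy_task_init(struct toy_task *t)
{
    assert(t != NULL);
    t->files = malloc(sizeof(*t->files));
    if (t->files == NULL)
        return -1;
    t->files->count = 1;
    return 0;
}

/* Create a child: share the parent's table or give it a private one. Returns 0 or -1. */
int toy_clone(const struct toy_task *parent, struct toy_task *child, int flags)
{
    assert(parent != NULL && child != NULL && parent->files != NULL);
    if (flags & TOY_CLONE_FILES) {
        parent->files->count++;          /* one more task uses the same table */
        child->files = parent->files;
        return 0;
    }
    return toy_task_init(child);         /* private table (contents not modelled) */
}

/*
 * Drop a task's reference; free the table when the last user goes away.
 * Returns the number of references that remain.
 */
int toy_exit_files(struct toy_task *t)
{
    int left;

    assert(t != NULL && t->files != NULL);
    left = --t->files->count;
    if (left == 0)
        free(t->files);
    t->files = NULL;
    return left;
}
```

With a shared table, the parent exiting first leaves the count at 1, so the table stays alive until the last task that uses it has gone.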

Reposted from: https://www.cnblogs.com/EE-NovRain/archive/2012/07/31/2616746.html
