Q: UNIX Internals (Answered, 5 out of 5 stars, 0 Comments)
Question  
Subject: UNIX Internals
Category: Computers > Operating Systems
Asked by: salman1-ga
List Price: $50.00
Posted: 21 Apr 2003 20:19 PDT
Expires: 21 May 2003 20:19 PDT
Question ID: 193616
I need detailed (as much as possible) answers to the following
questions by Thursday (4/24/03):
(Note: All the questions below apply to a typical UNIX system)

=== 1 ===
Category: File System Implementation

a. What are the benefits of allocating and deallocating inodes
dynamically?
b. Why do s5fs and FFS have a fixed number of on-disk inodes in each
file system?
c. What is the purpose of the free space reserve in FFS?
d. What happens if a disk error destroys the s5fs superblock?

=== 2 ===
Category: Threads

a. Why does each LWP need a separate kernel stack? Can the system save
resources by allocating a kernel stack only when a LWP makes a system
call?
b. The proc structure and the u area contain process attributes and
resources. In a multithreaded system, which of their fields may be
shared by all LWPs of the process, and which must be per-LWP?
c. Suppose one LWP invokes fork at the same instant that another
LWP of the same process invokes exit.
  i. What would be the result if the system uses fork to duplicate all
LWPs of the process?
  ii. What if fork duplicates only one LWP?

=== 3 ===
What is the drawback of having signal handlers be persistent, i.e.,
remain installed after being invoked? Are there any specific signals
that should not have persistent handlers?

=== 4 ===
Compare the IPC functionality provided by pipes and message queues.
What are the advantages and drawbacks of each? When is one more
suitable than the other?

=== 5 ===
The majority of all Unix systems allow a process to attach the same
shared-memory region to more than one location in its address space.
Is this a bug or a feature? When would this be useful? What problems
could it cause?

=== 6 ===
Is it possible to implement resource locking through (a) signals alone
or (b) shared memory and signals? What would be the performance of
such a facility?

=== 7 ===
Discuss the relative merits and drawbacks of hint-based and
reference-based directory name lookup caches.

If it helps, these questions were taken from the following book: 
UNIX Internals - by: Uresh Vahalia (ISBN:0-13-101908-2)

Thanks
Salman.
Answer  
Subject: Re: UNIX Internals
Answered By: maniac-ga on 22 Apr 2003 05:55 PDT
Rated: 5 out of 5 stars
 
Hello Salman,

1 File Systems
a. What are the benefits / drawbacks of dynamic allocation of inodes?
Benefits : more "available storage" if fewer large files are created,
allows an "unlimited" number of small files (restricted by total
storage only).
Drawbacks: more overhead to find inodes, fragmentation of inodes
across the disk, requires allocation of an inode before creating
files, performance impacts (more disk reads / writes).

b. Why do s5fs (System V File System) and FFS (Berkeley Fast File
System) use a fixed number of on-disk inodes?
Both of these were implemented many years ago when both CPU's and
disks were far slower and both memory and disks were far smaller.
Basically, the performance impacts were the prime determining factor.
For example, refer to
  Marshall K. McKusick, William N. Joy, Samuel J. Leffler, and Robert
S. Fabry, "A Fast File System for UNIX," ACM Transactions on Computer
Systems Vol. 2, No. 3, August 1984, pp. 181-197.
which is the paper describing FFS.
I can provide similar references for s5fs if interested or search
using phrases such as
  s5fs performance
  ffs file system performance
  fixed on disk inodes s5fs [or ffs]

c. What is the purpose of the free space reserve in FFS?
To allow root processes to continue to support critical system
operations when the disk is "full". Often set to 10% of the total disk
size (e.g., man newfs on a BSD system).

d. What happens if a disk error destroys the s5fs superblock? 
Bad things, don't do that :-). Basically, this stores critical
information about the disk volume and the system will not be able to
mount the disk. More modern file systems store several copies of the
super block in different locations on the disk for just this reason.
It may be possible to recover some or all of the super block by
analyzing the contents of the disk (e.g., fsck).

2. Threads 
 
a. Why does each LWP need a separate kernel stack?
Because more than one thread can be in the kernel at the same time.
Can the system save resources by allocating a kernel stack only when a
LWP makes a system call?
Strictly yes, but this is often not a meaningful optimization. In a
low memory situation, this could cause deadlock and zombie processes.
It could also cause security problems, so the practical answer is no.

b. The proc structure and the u[ser] area contain process attributes
and resources. In a multithreaded system, which of their fields may be
shared by all LWPs of the process, and which must be per-LWP?
This actually varies by system and is a function of the scheduling
algorithm. On a 1:1 threading model such as used by Linux, a thread ==
process, most of the data is not shared. But even in that case, to get
Posix compliance, threads share signal handlers (but not signal
masks), memory, file descriptors, current working directory, and so
on. An old reference to this explanation is at:
  http://pauillac.inria.fr/~xleroy/linuxthreads/README
On Linux in particular, the threads don't share a process id (a
non-compliance with the Posix standard).

c. Suppose one LWP invokes fork at the same instant that another
LWP of the same process invokes exit.
First, the operating system must protect itself and in general one
call will be completed before the other (it will exit w/o fork, or it
will fork and then exit). This holds on both uniprocessor and
multiprocessor systems.
  i. What would be the result if the system uses fork to duplicate all
LWPs of the process?
Both the parent and child process will exit.
  ii. What if fork duplicates only one LWP? 
If the forking LWP is duplicated, the child will continue to run since
the LWP that does the exit will not run in the child process.

3. Signals

What is the drawback of having signal handlers be persistent, i.e.,
remain installed after being invoked? Are there any specific signals
that should not have persistent handlers?

Historically, this has been implemented both ways. For some details of
this behavior, check
  http://unixhelp.ed.ac.uk/CGI/man-cgi?signal+2
which describes how BSD Unix and System V implement it differently.
See the notes near the bottom describing that behavior after SIGFPE,
SIGSEGV, and SIGILL is "undefined". Using SIGSEGV as an example, this
usually indicates memory corruption and continued operation is not
suggested.
 
4. Inter Process Communication

Compare the IPC functionality provided by pipes and message queues.
What are the advantages and drawbacks of each? When is one more
suitable than the other?
 
Pipes are certainly easier to implement and use for the typical case.
Pipes are used as a key part of the function of the Unix shell where
operations such as
  grep -n value *.c | sort
are performed where the output of the first command is sent to the
second as its input. No coding was required in either grep or sort to
make this work. Message queues are a richer interface, usually used in
a client/server type of application where data must be sent in both
directions. Pipes could be used in that case, but are harder to set up
and use.

5 Shared memory

The majority of all Unix systems allow a process to attach the same
shared-memory region to more than one location in its address space.
Is this a bug or a feature? When would this be useful? What problems
could it cause?

Feature. You can specify different memory protection (e.g., read,
write, execute), and other flags (e.g., MAP_INHERIT) for the different
regions of memory. The main complication is how the operating system
manages page tables; it must reference count the physical page within
the process (not just between processes). It is hard to come up with
some good examples of why this is useful - perhaps for a computer
emulator or virtual machine implementation. I have seen some
references to techniques on Intel processors where the virtual mapping
of executable code is different than for data accesses to implement
Plex86.

6 Resource locking 

Is it possible to implement resource locking through (a) signals alone
or (b) shared memory and signals? What would be the performance of
such a facility?

It would be extremely difficult to do resource locking with signals.
It may be possible with a combination of signal masks and signals [to
ensure just a single LWP or process gets activated], but it would not
be simple to set up, inefficient, and not portable between systems.

Adding shared memory simplifies the effort quite a bit if you assume
cache coherent computers (most are). There are some cautions with
this. Some high-performance systems (e.g., DEC Alpha) also require a
memory barrier operation to ensure that memory is in sync before
operations on shared memory. [I can provide references if needed.]
With proper hardware support, the overhead of resource locking can be
quite low, on the order of microseconds if no contention is seen.
 
7 Caches

Discuss the relative merits and drawbacks of hint-based and
reference-based directory name lookup caches.

In general, caches that do not require hints provide acceptable
performance for a wide variety of applications. There are cases though
where hints will improve performance by marking items that will not be
cached (and making the cache more effective for other uses). The main
drawback to hints is that they require the application to provide the
hint, increase the complexity of the main path (additional IF
statements), and can reduce the general-case performance. If the
special case is seen frequently enough, the trade-off is acceptable.

Do not hesitate to ask for clarification if any part of this answer is
not complete to your satisfaction. Good luck on your work.

  --Maniac

Request for Answer Clarification by salman1-ga on 22 Apr 2003 08:25 PDT
Hi:

Thanks for your wonderful work. However, if possible, I will need a
little more details on 1b.

And for 3 were you able to find the drawbacks for persistent handlers?
Because, I know older UNIX systems implemented non-persistent handlers
and for that reason user handlers had to be reinstalled every time
that signal occurred. That's the drawback of non-persistent handlers;
what is the drawback of persistent handlers?

Everything else looks great
Salman.

Request for Answer Clarification by salman1-ga on 22 Apr 2003 08:30 PDT
Also on 2(b): "...a thread == process, most of the data..." does not
make sense. Maybe the line got chopped off?

Thanks

Clarification of Answer by maniac-ga on 22 Apr 2003 16:48 PDT
Hello Salman1,

To answer your specific questions in order:

1b. Why the fixed number of on disk inodes?

As I mentioned before, the choice was made primarily for performance
reasons. For example, if a directory lookup yields an inode number
(say, a 32-bit integer), you can find the contents of the s5fs file
with the following steps:
 - seek / read the i'th inode (compute as an offset from the start of
the inode area)
 - for each block in the inode, seek / read that block
This is a very simple method to find the contents of the file. The FFS
modified this by splitting the inodes into "cylinder groups" where the
inode & file contents are "usually" in the same cylinder group. This
adds a little complexity, splitting the inode number into two fields
(cylinder group, offset), but that is usually just a shift & mask
operation and does not introduce more seek / read operations.

If the inodes are dynamically allocated - you first have to find the
inode you want. That may require a number of seek / read operations to
look up the data. Seek / read operations are very expensive - adding a
single seek / read can slow transfers substantially.
 
3 Drawbacks for persistent handlers?

The main drawback is that it is not compatible with the previous
behavior. The definition of the signal interface allows for
SA_RESETHAND to specify that you want the old behavior (reset the
handler to SIG_DFL when a signal is handled). Of course, the other
possible drawback is that it does not require the signal handler to
reaffirm that it wants to handle the signal. This might be considered
a "safer" implementation, but the problem with multiple rapid signal
delivery was the more serious one.

On 2(b): "...a thread == process, most of the data..." does not make
sense. Maybe the line got chopped off?

Perhaps a better explanation is to revise slightly to read

On a 1:1 threading model such as used by Linux, a thread == process,
most of the [process attributes and resource] data is not shared.

That does not mean the values are not initially identical. For all the
gory details, check out
  http://www.die.net/doc/linux/man/man2/clone.2.html
which goes into a lot of detail with respect to what can and cannot be
shared. As a simple example, specifying CLONE_FS will make the parent
and child share file system information (both affected by chroot,
chdir, and umask). If you don't specify CLONE_FS, the initial values
are the same but the parent and child have separate copies and the
child can't affect the parent and vice versa.

Having said all that however, most Unix systems (e.g., Solaris, Irix)
use a different scheduling method where threads are mapped N:1 to
processes. In that case, almost all of the "process" data is shared by
the threads (since there is only one process), and the threads have
unique data such as the register set, stack location, signal masks,
and so on.

There is a pretty nice overview of threads and how they are mapped to
operating systems (but doesn't cover Linux) at
  http://www.nswc.navy.mil/cosip/feb99/cots0299-1.shtml
which includes some nice diagrams and describes some of the tradeoffs
involved.

  --Maniac

Request for Answer Clarification by salman1-ga on 23 Apr 2003 08:29 PDT
Hey:

The answer you provided for 7 I think is incorrect. Because, the
question is in the context of directory name lookup caches and, your
answer is in process-execution context. Hint-based caching also occurs
during loading contents into memory for application execution (which
is different from question 7).

Can you please verify that?
Thanks
Salman.

Clarification of Answer by maniac-ga on 23 Apr 2003 10:55 PDT
Hello Salman1,

With respect to #7 Hint based vs. reference based directory name
lookup caches, let me add some clarifying comments to my answer.
Application execution is another place where hints can be used [and
I'll refer to that below], but my reference to the application is
primarily since it must provide the hints [since if the OS generated
the hints, it would be part of a general purpose caching method].

A referenced based cache is usually implemented by an LRU (or Least
Recently Used) method. The cost of keeping track of the "last access"
(sometimes called atime) for a directory name [or any other item
stored on disk] is far less than the cost of accessing that same data
if the cache was  not present. A number of studies have shown the
effectiveness of this in typical usage patterns.

The effectiveness of hint-based caches [which can be used for a variety
of purposes] basically depends on the application generating the
"hints" to disclose future accesses - and allowing the file system to
make optimal global prefetch decisions. Reference:
  http://www.ipd.uka.de/~florin/Publications/dagstuhl.ps
or in Google's cache as text at
  http://216.239.53.100/search?q=cache:tUZwkwBsMd4C:www.ipd.uka.de/~florin/Publications/dagstuhl.ps+directory+name+lookup+%22hint+based%22+cache&hl=en&ie=UTF-8
In this case, it is assumed the application "knows better" than the
operating system what the upcoming references are going to be, which
does not match the typical case.

As I mentioned previously, the hints have to come from somewhere -
generally from the application. The hints can be implemented as
described above (trigger file system pre-fetch), as I mentioned
(leave these items out of the cache), or as you mentioned (loading
memory for program execution).

Other examples of where hints have been implemented include:
 - choice of "large block" vs. "small block" memory page tables
[something now being implemented in Linux] to minimize overhead for
large applications (instead of minimizing lost memory for small
applications)
 - marking file access as "streaming" so the data will be prefetched
and past data will not be cached - used in media servers
 - CPU affinity - I want this process to have a dedicated CPU to meet
response requirements and reduce CPU cache misses
 - visual display systems - the vehicle is moving "this way" (so the
visual will prefetch terrain data in that direction)
and so on. In general, all of these cases require the application to
do something "special" to provide the hints to the system level
software. The case of directory name lookup is just one case in this
spectrum of problems to be solved. The same general techniques apply
in each of these cases and most often (but not always) refer to
management of the in memory cache of data on disk. The action taken in
response to the hint may be one of
 - pre fetch data into the cache (increase blocks read, but reduce
latency at time of access)
 - leave data out of the cache (don't waste memory on data not
expected to be read "soon")
 - choose this item (instead of *that* item) to remove from the cache
and all of these methods may be driven by hints.

  --Maniac
salman1-ga rated this answer: 5 out of 5 stars
Great work.

Comments  
There are no comments at this time.
