Q: UNIX Internals (Answered, 5 out of 5 stars, 0 Comments)
Question  
Subject: UNIX Internals
Category: Computers > Operating Systems
Asked by: salman1-ga
List Price: $50.00
Posted: 21 Apr 2003 20:19 PDT
Expires: 21 May 2003 20:19 PDT
Question ID: 193616
I need detailed (as much as possible) answers to the following
questions by Thursday (4/24/03):
(Note: All the questions below apply to a typical UNIX system)

=== 1 ===
Category: File System Implementation

a. What are the benefits of allocating and deallocating inodes
dynamically?
b. Why do s5fs and FFS have a fixed number of on-disk inodes in each
file system?
c. What is the purpose of the free space reserve in FFS?
d. What happens if a disk error destroys the s5fs superblock?

=== 2 ===
Category: Threads

a. Why does each LWP need a separate kernel stack? Can the system save
resources by allocating a kernel stack only when a LWP makes a system
call?
b. The proc structure and the u area contain process attributes and
resources. In a multithreaded system, which of their fields may be
shared by all LWPs of the process, and which must be per-LWP?
c. Suppose one LWP invokes fork at the same instant that another
LWP of the same process invokes exit.
  i. What would be the result if the system uses fork to duplicate all
LWPs of the process?
  ii. What if fork duplicates only one LWP?

=== 3 ===
What is the drawback of having signal handlers be persistent, i.e.,
remain installed after being invoked? Are there any specific signals
that should not have persistent handlers?

=== 4 ===
Compare the IPC functionality provided by pipes and message queues.
What are the advantages and drawbacks of each? When is one more
suitable than the other?

=== 5 ===
The majority of all Unix systems allow a process to attach the same
shared-memory region to more than one location in its address space.
Is this a bug or a feature? When would this be useful? What problems
could it cause?

=== 6 ===
Is it possible to implement resource locking through (a) signals alone
or (b) shared memory and signals? What would be the performance of
such a facility?

=== 7 ===
Discuss the relative merits and drawbacks of hint-based and
reference-based directory name lookup caches.

If it helps, these questions were taken from the following book: 
UNIX Internals - by: Uresh Vahalia (ISBN:0-13-101908-2)

Thanks
Salman.
Answer  
Subject: Re: UNIX Internals
Answered By: maniac-ga on 22 Apr 2003 05:55 PDT
Rated: 5 out of 5 stars
 
Hello Salman,

1 File Systems
a. What are the benefits / drawbacks of dynamic allocation of inodes?
Benefits : more "available storage" if fewer large files are created,
allows an "unlimited" number of small files (restricted by total
storage only).
Drawbacks: more overhead to find inodes, fragmentation of inodes
across the disk, requires allocation of an inode before creating
files, performance impacts (more disk reads / writes).

b. Why do s5fs (System V File System) and FFS (Berkeley Fast File
System) use a fixed number of on-disk inodes?
Both of these were implemented many years ago when both CPU's and
disks were far slower and both memory and disks were far smaller.
Basically, the performance impacts were the prime determining factor.
For example, refer to
  Marshall K. McKusick, William N. Joy, Samuel J. Leffler, and Robert
S. Fabry, "A Fast File System for UNIX," ACM Transactions on Computer
Systems Vol. 2, No. 3, August 1984, pp. 181-197.
which is the paper describing FFS.
I can provide similar references for s5fs if interested or search
using phrases such as
  s5fs performance
  ffs file system performance
  fixed on disk inodes s5fs [or ffs]

c. What is the purpose of the free space reserve in FFS?
To allow root processes to continue to support critical system
operations when the disk is "full". Often set to 10% of the total disk
size (e.g., man newfs on a BSD system).

d. What happens if a disk error destroys the s5fs superblock? 
Bad things, don't do that :-). Basically, this stores critical
information about the disk volume and the system will not be able to
mount the disk. More modern file systems store several copies of the
super block in different locations on the disk for just this reason.
It may be possible to recover some or all of the super block by
analyzing the contents of the disk (e.g., fsck).

2. Threads 
 
a. Why does each LWP need a separate kernel stack?
Because more than one thread can be in the kernel at the same time.
Can the system save resources by allocating a kernel stack only when a
LWP makes a system call?
Strictly yes, but this is often not a meaningful optimization. In a
low memory situation, this could cause deadlock and zombie processes.
It could also cause security problems, so the practical answer is no.

b. The proc structure and the u[ser] area contain process attributes
and resources. In a multithreaded system, which of their fields may be
shared by all LWPs of the process, and which must be per-LWP?
This actually varies by system and is a function of the scheduling
algorithm. On a 1:1 threading model such as used by Linux, a thread ==
process, most of the data is not shared. But even in that case, to get
Posix compliance, threads share signal handlers (but not signal
masks), memory, file descriptors, current working directory, and so
on. An old reference to this explanation is at:
  http://pauillac.inria.fr/~xleroy/linuxthreads/README
On Linux in particular, the threads don't share a process id (a
non-compliance with the Posix standard).

c. Suppose one LWP invokes fork at the same instant that another
LWP of the same process invokes exit.
First, the operating system must protect itself and in general one
call will be completed before the other (it will exit w/o fork, or it
will fork and then exit). This holds on both uniprocessor and
multiprocessor systems.
  i. What would be the result if the system uses fork to duplicate all
LWPs of the process?
Both the parent and child process will exit.
  ii. What if fork duplicates only one LWP? 
If the forking LWP is duplicated, the child will continue to run since
the LWP that does the exit will not run in the child process.

3. Signals

What is the drawback of having signal handlers be persistent, i.e.,
remain installed after being invoked? Are there any specific signals
that should not have persistent handlers?

Historically, this has been implemented both ways. For some details of
this behavior, check
  http://unixhelp.ed.ac.uk/CGI/man-cgi?signal+2
which describes how BSD Unix and System V implement it differently.
See the notes near the bottom describing that behavior after SIGFPE,
SIGSEGV, and SIGILL is "undefined". Using SIGSEGV as an example, this
usually indicates memory corruption and continued operation is not
suggested.
 
4. Inter Process Communication

Compare the IPC functionality provided by pipes and message queues.
What are the advantages and drawbacks of each? When is one more
suitable than the other?
 
Pipes are certainly easier to implement and use for the typical case.
Pipes are used as a key part of the function of the Unix shell where
operations such as
  grep -n value *.c | sort
are performed where the output of the first command is sent to the
second as its input. No coding was required in either grep or sort to
make this work. Message queues are a richer interface, usually used in
a client/server type of application where data must be sent in both
directions. Pipes could be used in that case, but are harder to set up
and use.

5 Shared memory

The majority of all Unix systems allow a process to attach the same
shared-memory region to more than one location in its address space.
Is this a bug or a feature? When would this be useful? What problems
could it cause?

Feature. You can specify different memory protection (e.g., read,
write, execute), and other flags (e.g., MAP_INHERIT) for the different
regions of memory. The main complication is how the operating system
manages page tables; it must reference count the physical page within
the process (not just between processes). It is hard to come up with
some good examples of why this is useful - perhaps for a computer
emulator or virtual machine implementation. I have seen some
references to techniques on Intel processors where the virtual mapping
of executable code is different than for data accesses to implement
Plex86.

6 Resource locking 

Is it possible to implement resource locking through (a) signals alone
or (b) shared memory and signals? What would be the performance of
such a facility?

It would be extremely difficult to do resource locking with signals.
It may be possible with a combination of signal masks and signals [to
ensure just a single LWP or process gets activated], but it would not
be simple to set up, inefficient, and not portable between systems.

Adding shared memory simplifies the effort quite a bit if you assume
cache coherent computers (most are). There are some cautions with
this. Some high-performance systems (e.g., DEC Alpha) also require a
memory barrier operation to ensure that memory is in sync before
operations on shared memory. [I can provide references if needed.]
With proper hardware support, the overhead of resource locking can be
quite low, on the order of microseconds if no contention is seen.
 
7 Caches

Discuss the relative merits and drawbacks of hint-based and
reference-based directory name lookup caches.

In general, caches that do not require hints provide acceptable
performance for a wide variety of applications. There are cases though
where hints will improve performance by marking items that will not be
cached (and making the cache more effective for other uses). The main
drawback to hints is that they require the application to provide the
hint, increase the complexity of the main path (additional IF
statements), and can reduce the general-case performance. If the
special case is seen frequently enough, the trade-off is acceptable.

Do not hesitate to ask for clarification if any part of this answer is
not complete to your satisfaction. Good luck on your work.

  --Maniac

Request for Answer Clarification by salman1-ga on 22 Apr 2003 08:25 PDT
Hi:

Thanks for your wonderful work. However, if possible, I will need a
little more details on 1b.

And for 3 were you able to find the drawbacks for persistent handlers?
Because, I know older UNIX systems implemented non-persistent handlers
and for that reason user handlers had to be reinstalled every time
that signal occurred. That's the drawback of non-persistent handlers;
what is the drawback of persistent handlers?

Everything else looks great
Salman.

Request for Answer Clarification by salman1-ga on 22 Apr 2003 08:30 PDT
Also on 2(b): "...a thread == process, most of the data..." does not
make sense. Maybe the line got chopped off?

Thanks

Clarification of Answer by maniac-ga on 22 Apr 2003 16:48 PDT
Hello Salman1,

To answer your specific questions in order:

1b. Why the fixed number of on disk inodes?

As I mentioned before, the choice was made primarily for performance
reasons. For example, if a directory lookup yields an inode number
(say, a 32-bit integer), you can find the contents of the s5fs file
with the following steps:
 - seek / read the i'th inode (compute as an offset from the start of
the inode area)
 - for each block in the inode, seek / read that block
This is a very simple method to find the contents of the file. The FFS
modified this by splitting the inodes into "cylinder groups" where the
inode & file contents are "usually" in the same cylinder group. This
adds a little complexity, splitting the inode number into two fields
(cylinder group, offset), but that is usually just a shift & mask
operation and does not introduce more seek / read operations.

If the inodes are dynamically allocated - you first have to find the
inode you want. That may require a number of seek / read operations to
look up the data. Seek / read operations are very expensive - adding a
single seek / read can slow transfers substantially.
 
3 Drawbacks for persistent handlers?

The main drawback is that it is not compatible with the previous
behavior. The definition of the signal interface allows for
SA_RESETHAND to specify that you want the old behavior (reset the
handler to SIG_DFL when a signal is handled). Of course, the other
possible drawback is that it does not require the signal handler to
reaffirm that it wants to handle the signal. This might be considered
a "safer" implementation, but the problem with multiple rapid signal
delivery was the more serious one.

On 2(b): "...a thread == process, most of the data..." does not make
sense. Maybe the line got chopped off?

Perhaps a better explanation is to revise slightly to read

On a 1:1 threading model such as used by Linux, a thread == process,
most of the [process attributes and resource] data is not shared.

That does not mean the values are not initially identical. For all the
gory details, check out
  http://www.die.net/doc/linux/man/man2/clone.2.html
which goes into a lot of detail with respect to what can and cannot be
shared. As a simple example, specifying CLONE_FS will make the parent
and child share file system information (both affected by chroot,
chdir, and umask). If you don't specify CLONE_FS, the initial values
are the same but the parent and child have separate copies and the
child can't affect the parent and vice versa.

Having said all that however, most Unix systems (e.g., Solaris, Irix)
use a different scheduling method where threads are mapped N:1 to
processes. In that case, almost all of the "process" data is shared by
the threads (since there is only one process), and the threads have
unique data such as the register set, stack location, signal masks,
and so on.

There is a pretty nice overview of threads and how they are mapped to
operating systems (but doesn't cover Linux) at
  http://www.nswc.navy.mil/cosip/feb99/cots0299-1.shtml
which includes some nice diagrams and describes some of the tradeoffs
involved.

  --Maniac

Request for Answer Clarification by salman1-ga on 23 Apr 2003 08:29 PDT
Hey:

The answer you provided for 7 I think is incorrect. Because, the
question is in the context of directory name lookup caches and, your
answer is in process-execution context. Hint-based caching also occurs
during loading contents into memory for application execution (which
is different from question 7).

Can you please verify that?
Thanks
Salman.

Clarification of Answer by maniac-ga on 23 Apr 2003 10:55 PDT
Hello Salman1,

With respect to #7 Hint based vs. reference based directory name
lookup caches, let me add some clarifying comments to my answer.
Application execution is another place where hints can be used [and
I'll refer to that below], but my reference to the application is
primarily since it must provide the hints [since if the OS generated
the hints, it would be part of a general purpose caching method].

A referenced based cache is usually implemented by an LRU (or Least
Recently Used) method. The cost of keeping track of the "last access"
(sometimes called atime) for a directory name [or any other item
stored on disk] is far less than the cost of accessing that same data
if the cache was  not present. A number of studies have shown the
effectiveness of this in typical usage patterns.

The effectiveness of hint-based caches [which can be used for a variety
of purposes] basically depends on the application generating the
"hints" to disclose future accesses - and allowing the file system to
make optimal global prefetch decisions. Reference:
  http://www.ipd.uka.de/~florin/Publications/dagstuhl.ps
or in Google's cache as text at
  http://216.239.53.100/search?q=cache:tUZwkwBsMd4C:www.ipd.uka.de/~florin/Publications/dagstuhl.ps+directory+name+lookup+%22hint+based%22+cache&hl=en&ie=UTF-8
In this case, it is assumed the application "knows better" than the
operating system what the upcoming references are going to be, which
does not match the typical case.

As I mentioned previously, the hints have to come from somewhere -
generally from the application. The hints can be implemented as
described above (trigger file system pre-fetch), as I mentioned
(leave these items out of the cache), or as you mentioned (loading
memory for program execution).

Other examples of where hints have been implemented include:
 - choice of "large block" vs. "small block" memory page tables
[something now being implemented in Linux] to minimize overhead for
large applications (instead of minimizing lost memory for small
applications)
 - marking file access as "streaming" so the data will be prefetched
and past data will not be cached - used in media servers
 - CPU affinity - I want this process to have a dedicated CPU to meet
response requirements and reduce CPU cache misses
 - visual display systems - the vehicle is moving "this way" (so the
visual will prefetch terrain data in that direction)
and so on. In general, all of these cases require the application to
do something "special" to provide the hints to the system level
software. The case of directory name lookup is just one case in this
spectrum of problems to be solved. The same general techniques apply
in each of these cases and most often (but not always) refer to
management of the in memory cache of data on disk. The action taken in
response to the hint may be one of
 - pre fetch data into the cache (increase blocks read, but reduce
latency at time of access)
 - leave data out of the cache (don't waste memory on data not
expected to be read "soon")
 - choose this item (instead of *that* item) to remove from the cache
and all of these methods may be driven by hints.

  --Maniac
salman1-ga rated this answer: 5 out of 5 stars
Great work.

Comments  
There are no comments at this time.
