Google Answers: Timeline conccurncy counting

View Question

Q: Timeline conccurncy counting ( Answered 5 out of 5 stars

, 0 Comments )

Question

Subject: Timeline conccurncy counting
Category: Computers > Programming
Asked by: epoulsen-ga
List Price: $35.00

Posted: 27 Feb 2003 11:34 PST
Expires: 29 Mar 2003 11:34 PST
Question ID: 167930

Algorithm Needed: Synopsis: Given a set of input timelines (events), which have a begin and end time, I need the output data to reflect time segments with a count of the concurrent events. For Example: Input (in pseudo-perl): { begin => 10, end => 50 }, { begin => 10, end => 20 }, { begin => 15, end => 20 }, { begin => 40, end => 70 }, { begin => 100, end => 120 } I should get an output of: {begin => 10, end=>14, count=>2} {begin => 15, end=>20, count=>3} {begin => 21, end=>39, count=>1} {begin => 40, end=>50, count=>2} {begin => 51, end=>70, count=>1} {begin => 100, end=>120, count=>1} An acceptable algorithm will _NOT_ iterate through the entire numeric range of the input set. It has to be computationally efficient. Also, if this is the sort of problem this computationally / mathematically expensive to solve (like the traveling salesman problem, though I doubt it is), proof of this is an acceptable answer.
Request for Question Clarification by mathtalk-ga on 27 Feb 2003 12:19 PST Hi, epoulsen-ga: The problem is certainly not "hard" in the same sense as a travelling salesman problem, but I'm unsure what is required when you write: "An acceptable algorithm will _NOT_ iterate through the entire numeric range of the input set." Perhaps you mean no "nested" iterations? It would at least be necessary to "iterate" once through the input set (to read the input data). I can provide a computationally efficient algorithm for this problem, along the lines of a computationally efficient sorting algorithm. It's not quite linear complexity but close enough for practical purposes, esp. if the data is small enough to be held in memory. Would this be a satisfactory answer? regards, mathtalk
Clarification of Question by epoulsen-ga on 27 Feb 2003 12:45 PST hat I mean by an unacceptable iteration would be: pseudo Perl:: my $begin = 10; ## Determined from my $end = 120; ## example dataset in question my %count; for($begin .. $end) { foreach my $segment (@segments) { if ($_ >= $$segment{begin} && $_ <= $$segment{end}) { $count{$_} ++; } count[i-begin]++; } } Then use %count to find segments ...
Request for Question Clarification by mathtalk-ga on 27 Feb 2003 13:05 PST Hi, epoulsen-ga: Thanks for the prompt clarification of your question. Your example of an "unacceptable" iteration has a "nested" iteration over all input(?) segments within a loop over the integers i(?) from $begin to $end. I'm guessing that you intuitively object to two "inefficient" aspects of this approach. First, it is not necessary to work with all integers i which happen to lie between $begin and $end, only those integers which occur in connection with the input segments (either as beginning or ending "points"). Second the complexity of the nested iteration shown is quadratic, i.e. it grows with the square of the number of segments. While not a terrible operation count (if compared to travelling salesman algorithms), it is not the most efficient. I can provide an algorithm whose complexity is proportional to n log(n) for inputs with n segments. Would this be acceptable? Also, I'm wondering if your use of "pseudo-perl" suggests a preference for an actual perl implementation. I would rather aim at a pseudo-code description of the algorithm, esp. as the precise input and output characteristics are not given. regards, mathtalk
Clarification of Question by epoulsen-ga on 27 Feb 2003 13:19 PST The final implementation will be in Perl. I try not to make my examples too Perl-esque because the nuances of Perl is often lost, but a C example would be too verbose. Perferrably, any answer would be in some form of pseudo-code, as a mathematical description of the problem doesn't lend itself well to translation into code, at least not if I'm the one that has to translate. =) An english description of what I'm looking for: Essentially, I have a series of events that have a begin time, and an end time. They may or may not overlap. I need this data translated into non-overlapping segments of time where one or more events is occuring. Each segment would indicate how many events where in-progress for that segment. In other words, I need to output "between 9:05 am and 9:10 am, 3 events were occuring" "between 9:11 am and 9:20 am, 2 events where occuring" Since the input events could be separated by thousands or millions of time units, iterating through the entire range (lowest start time to higest end time of the entire input set) would be unacceptable.

Answer

Subject: Re: Timeline concurrency counting
Answered By: mathtalk-ga on 28 Feb 2003 06:54 PST
Rated: 5 out of 5 stars

Hi, epoulsen-ga:

Let's begin by stating some assumptions about the data in your problem
and the required output.

Your first sample data shows integer endpoints for the "event"
intervals.  The last example uses "times" instead of integers.

What is important about these data types is that they are discrete;
for any particular value there is a well-defined predecessor value and
a well-defined successor value.  This characteristic of the datatype
comes into play in your treatment of the input segments and output
segments as "closed" intervals.  An artifact that results from this is
that output segments may have "end points" which differ by "one" from
some "end point" that appears in the input.

This "off by one" aspect is tricky.  The input and output segments in
your illustration seem to be "closed" intervals, in the sense of
containing both their beginning and ending points.  For example, if
the input consists of two segments:

{begin 10, end 20}
{begin 20, end 30}

then judging by your example, the proper output would be three
segments:

{begin 10, end 19, count 1}
{begin 20, end 20, count 2}
{begin 21, end 30, count 1}

The "natural" data structure to perform this counting operation turns
out to be "half-open" intervals which contain their beginning point
but not their ending point.  The "first" and "last" steps of our
algorithm provide adjustments to these ending points so that the body
of the algorithm is effectively working with such half-open intervals.

A less intricate aspect of your illustrated output is that intervals
during which no events are active are suppressed in the output.

After we outline the algorithm next, we will apply it to two example
data sets.  The final section discusses possible ways to optimize the
algorithm by combining multiple "steps" into a common pass through the
data.

If you have any questions about the algorithm or examples, please use
the "Request Answer Clarification" button and I will be notified of
your concerns.

regards, mathtalk-ga


STEP 0: Input
------

Let's assume the event intervals are represented in some sequential
data file as Begin, End pairs:

{begin Tx1, end Ty1}
{begin Tx2, end Ty2}
 . . .
{begin Txn, end Tyn}

STEP 1: Separate beginnings and endings
------

Make a pass through the file, and put the "times" on separate
lines/records with increment +1 for begins and -1 for ends.  Adjust
the ending times so that these become "plus one" in whatever sense is
appropriate for the datatype:

Tx1, +1
(Ty1 + 1), -1
Tx2, +1
(Ty2 + 1), -1
 .
 .
 .
Txn, +1
(Tyn + 1), -1

This resulting data can be sent to a new text file or, if feasible, to
an array/list, depending on the facilities of your programming
environment.

STEP 2: Sort the resulting records
------

Sort the output of the first step according to "time" values, either
in a file or in memory (if the array was used).  Efficient sorting is
provided by a variety of external file utilities.  If the number of
segments in the input is not much more than 25, then an insertion sort
can be done in combination with the activities of Step 1.  If the
number of segments is typically well beyond this, it will probably pay
in the long run to implement a quicksort or heapsort algorithm.  A
typical perl interpreter under Unix would use the system provided
qsort( ) routine for sorting.

STEP 3: Combine records with same time
------

Make a pass through the sorted list of "times" and increments:

Tz1, +1
Tz2, +/-1
Tz3, +/-1
Tz4, +/-1
 .
 .
 .
TzN, -1

where N = 2n.  Naturally e1 = +1 and eN = -1, but we cannot predict in
general more than this about the pattern of increments.

There may be duplicates among the "times" given here.  As we pass
through the list, when a group of records with the same "time" Tzk is
encountered, we treat that group as a whole by summing up all the
increments.  If the total of increments for such a group is zero, it
can be ignored (skipped over).  Otherwise create a single entry for
any group of entries that has a nonzero total increment:

Tw1, e1
Tw2, e2
 . . .
TwK, eK

STEP 4: Create draft output segments
------

Initialize a counter $count = e1, the first increment of a distinct
entry, at the beginning of the pass.  Also store a "time" value
$timeOld = Tw1 of that entry.

Loop over each additional entry listed:

Set the new "time" value $timeNew = Twj and increment $incr = ej.

If $count > 0, then
   output {begin [$timeOld], end [$timeNew], count [$count]}
   
Then update the variables $count and $timeOld:

$count = $count + $incr

$timeOld = $timeNew

When the last entry pair TwK, eK is processed, we are done with this
step.

STEP 5: Adjust output ending points
------

Thus far the ending point of one segment will match the beginning
point of the next segment.  In your desired output there will not be
any "overlap" of these ending and beginning points.  To remove the
overlap, "subtract one" from the ending point of each output record.

Example 1
---------

Let's illustrate the steps of the algorithm using the simple input
described above:

{begin 10, end 20}
{begin 20, end 30}

In Step 1 we make a pass through the input, separating the beginning
and ending values.  The ending "times" are adjusted by adding 1, and
an increment of +1 or -1 is associated with the beginning and ending
times, respectively:

10, +1
21, -1
20, +1
31, -1

In Step 2 we sort the records according to "time" values:

10, +1
20, +1
21, -1
31, -1

In Step 3 we would combine records sharing a common "time", but in
this case no records are combined.

In Step 4 we create the draft output records.  After initializing
$count = 1 and $timeOld = 10 using the first pair of entries, we
proceed to output a record for each of the three remaining pairs of
entries:

{begin 10, end 20, count 1}
{begin 20, end 21, count 2}
{begin 21, end 31, count 1}

In Step 5 we adjust the ending points of these output records:

{begin 10, end 19, count 1}
{begin 20, end 20, count 2}
{begin 21, end 30, count 1}

Example 2
---------

Now let's do the more complicated data given in your initial example:
 
{begin  10, end  50 }
{begin  10, end  20 }
{begin  15, end  20 }
{begin  40, end  70 }
{begin 100, end 120 } 
 
In Step 1 we get this list of entries by separating the begin and end
"times":

10, +1
51, -1
10, +1
21, -1
15, +1
21, -1
40, +1
71, -1
100, +1
121, -1

After sorting in Step 2 we have:

10, +1
10, +1
15, +1
21, -1
21, -1
40, +1
51, -1
71, -1
100, +1
121, -1

After "telescoping" the entries in Step 3 to obtain distinct "time"
entries:

10, +2
15, +1
21, -2
40, +1
51, -1
71, -1
100, +1
121, -1

Now Step 4 creates these draft output records:

{begin  10, end  15, count 2}
{begin  15, end  21, count 3}
{begin  21, end  40, count 1}
{begin  40, end  51, count 2}
{begin  51, end  71, count 1}
  Note: We supress outputting {begin 71, end 100, count 0}
  next, because $count is zero.
{begin 100, end 121, count 1}

At last in Step 5 we adjust the ending points:

{begin  10, end  14, count 2}
{begin  15, end  20, count 3}
{begin  21, end  39, count 1}
{begin  40, end  50, count 2}
{begin  51, end  70, count 1}
{begin 100, end 120, count 1}

to get the final output.

Discussion
----------

The purpose of breaking the algorithm down into five separate steps is
partly for clarity, esp. to explain the treatment of the adjustments
to ending points, and partly to allow for flexibility.  In
implementing the algorithm it would be natural to combine various
steps so that the number of passes through the data is reduced.

Step 0 doesn't "do" anything; it merely acts to identify our input. 
Step 1 is a first pass through the data.  If one were implementing the
sort of Step 2 in custom code, e.g. as a straight insertion sort, then
Steps 1 and 2 could be naturally combined.  However for larger data
sets it would be desirable to reuse a well tested "fast" sorting
routine, and thus to keep Step 1 and 2 distinct.

Likewise Steps 3,4,5 can be combined into a single pass through the
data.  One would aggregate the first set of records and use that to
initialize $count and $timeOld.  Thereafter each set of records that
gets aggregated, summing the increments of all pairs of entries up to
one with a distinct "time" value (or to the end of the list), allows
us to create an new output record.  The ending point can be adjusted
as the output record is written.

epoulsen-ga rated this answer: 5 out of 5 stars

and gave an additional tip of: $10.00

Your answer is simple and elegant.  Thank you!

Comments

There are no comments at this time.

Important Disclaimer: Answers and comments provided on Google Answers are general information, and are not intended to substitute for informed professional medical, psychiatric, psychological, tax, legal, investment, accounting, or other professional advice. Google does not endorse, and expressly disclaims liability for any product, manufacturer, distributor, service or service provider mentioned or any opinion expressed in answers or comments. Please read carefully the Google Answers Terms of Service.

If you feel that you have found inappropriate content, please let us know by emailing us at answers-support@google.com with the question ID listed above. Thank you.

Search Google Answers for

Google Home - Answers FAQ - Terms of Service - Privacy Policy