Media Synchronization
Edward Chow
Based on the material in the IEEE JSAC Vol 14,
No. 1, Jan. 1996, survey paper
"A Media Synchronization Survey: Reference Model,
Specification, and Case Studies"
by Gerald Blakowski and Ralf Steinmetz.
Definition
-
A time-dependent media
object is presented as a media stream in which temporal relations exist between
consecutive units of the stream.
-
If the presentation
durations of all units of a time-dependent media object are equal, it is
called a continuous media object, e.g., NTSC video, audio.
-
A time-independent media
object is any kind of traditional media like text/graphic.
-
Multimedia system:
A system or application that supports the integrated processing of several
media types with at least one time-dependent medium.
-
Synchronization in a multimedia
system refers to the temporal relations between media objects in that
system.
Classification of media use in multimedia
system
-
number of media
-
type of supported media
-
degree of media integration
Synchronization is supported at different
system levels
-
The OS and lower communication layers
provide intramedia synchronization, e.g., avoiding jitter in one media stream.
-
run-time support for
synchronization of multiple streams, e.g., restrict skew between media
streams.
-
run-time support for
synchronization between time-dependent and time-independent streams. The objective
is to start or stop time-independent streams within a tolerable time interval
when predefined points of a time-dependent stream are reached, e.g.,
slide show synchronization with audio comments.
-
Authoring systems/tools,
where temporal relations may be specified explicitly when media streams
are captured or created independently.
Synchronization Relations in Multimedia Systems
-
Content relations: e.g., the data and the graphic
of a spreadsheet. A change to one should be reflected in the other.
-
Spatial relations: e.g., audio (surround sound)/text/graphic/video
(objects in a 3-D VR environment). Related standard: CSS2.
-
Temporal relations: This paper focuses on
this aspect of synchronization.
Intra-object synchronization example (25f/s
movie)
Inter-object synchronization example
Logical Data Unit (LDU) Hierarchy
An LDU is the unit of a media stream. Different
streams have different LDU sizes.
-
Granularity Level of LDU
-
content level hierarchy: conductor/musician
works on symphony/movement/note levels
-
coding level hierarchy: recording engineer
works even lower, at the coding/sampling levels
Example of Video/Audio/User-Defined LDU
Lip Synchronization Problem
Live Synchronization Configurations
Without storage
With storage (recording
the session for playing back)
The Gap Problem
-
The gap between
related LDUs of different streams may increase due to transmission
channel or processing speed differences in the system.
-
Restricted blocking can be used
to hold back the streams that are faster
-
Resampling (on-line or off-line)
to fill the gap.
-
Playing back the last LDU on
the faster stream (video OK, audio annoying)
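The last remedy above can be illustrated with a minimal sketch. It assumes a simplified model that is not part of the source: time advances in fixed ticks, the faster stream (video here) always has its next LDU available, and a boolean per tick says whether the matching LDU of the slower stream has arrived. The function name and its parameters are hypothetical.

```python
def restricted_blocking(fast_ldus, slow_arrived):
    """Return the LDU of the faster stream shown at each tick.

    fast_ldus:    LDUs of the faster stream, in presentation order.
    slow_arrived: one boolean per tick; True if the matching LDU of the
                  slower stream is available at that tick (assumed model).
    """
    shown = []
    next_index = 0   # index of the next LDU pair to present
    last_fast = None  # last LDU of the faster stream that was presented
    for arrived in slow_arrived:
        if next_index >= len(fast_ldus):
            break
        if arrived:
            # Both streams are ready: advance together.
            last_fast = fast_ldus[next_index]
            next_index += 1
        # Otherwise the faster stream is blocked and replays its last LDU.
        shown.append(last_fast)
    return shown


frames = restricted_blocking(["v0", "v1", "v2"], [True, False, False, True, True])
```

In this run the video frame `v0` is repeated for two ticks while the audio gap lasts, which a viewer perceives as a short freeze. Repeating an audio LDU in the same way would be audible, which matches the "video OK, audio annoying" remark.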
Lip Sync Experiments by IBM ENC
-
Video/Audio Experiment
-
-80<-->+80ms
are in-sync
-
>+160ms
or <-160ms are out of sync
-
in between:
annoying, but video ahead of audio can be tolerated better.
-
Telepointer/Audio (Shared Viewing)
Experiment
(Computer Supported Co-operative
Work,CSCW)
-
-500ms<-->+750ms
are in-sync
-
>+1250ms
or <-1000ms are out of sync.
-
in between:
sensed but not annoying compared with previous experiment.
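The thresholds reported for the two experiments can be captured in a small classifier. This is only a sketch: the function name, the label strings, and the `LIP_SYNC` / `POINTER_SYNC` constants are hypothetical, and the sign convention of the skew value is left to the caller because the notes do not state it.

```python
def classify_skew(skew_ms, in_sync, out_of_sync):
    """Classify a measured skew against an experiment's thresholds.

    in_sync:     (low, high) bounds of the region perceived as in sync.
    out_of_sync: (low, high) bounds beyond which skew is out of sync.
    """
    low_in, high_in = in_sync
    low_out, high_out = out_of_sync
    if low_in <= skew_ms <= high_in:
        return "in sync"
    if skew_ms < low_out or skew_ms > high_out:
        return "out of sync"
    return "in between"


# Thresholds taken from the figures quoted above (milliseconds).
LIP_SYNC = {"in_sync": (-80, 80), "out_of_sync": (-160, 160)}
POINTER_SYNC = {"in_sync": (-500, 750), "out_of_sync": (-1000, 1250)}

print(classify_skew(120, **LIP_SYNC))
print(classify_skew(120, **POINTER_SYNC))
```

The same 120 ms skew falls in the "in between" region for lip sync but is well inside the in-sync region for telepointer/audio, which is why skew requirements are specified per pair of streams.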
Four Layer Synchronization Reference
Model
-
Media Layer
(application executes a process like this)
-
Stream Layer
-
provides the abstraction of streams with timing
parameters concerning QoS for both interstream and intrastream synchronization.
-
Interstream interaction is performed by
attaching events to the continuous media stream, e.g., setcuepoint(stream/group,
at, event). The event is reported back to the application.
-
IBM Multimedia Presentation Manager (MMPM)
for OS/2: comprises a sync/stream manager (resource management, control of
registration and activities of stream handlers) and several stream handlers.
Programming example:
Stream layer implementations can be classified
by
-
support for distribution,
-
type of guarantees,
-
types of supported streams.
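The cue-point idea described for the stream layer can be sketched as follows. The `Stream` class, its methods, and the callback signature are hypothetical illustrations and are not the MMPM/2 API; playback is simulated by iterating over LDUs instead of waiting in real time.

```python
class Stream:
    """Hypothetical stream abstraction with cue points (not a real API)."""

    def __init__(self, name, ldu_duration_ms):
        self.name = name
        self.ldu_duration_ms = ldu_duration_ms
        self._cue_points = []  # list of (position_ms, callback)

    def set_cue_point(self, at_ms, callback):
        """Attach an event to the stream position at_ms."""
        self._cue_points.append((at_ms, callback))

    def play(self, ldu_count):
        """Simulate presenting ldu_count LDUs and report cue points."""
        for i in range(ldu_count):
            start = i * self.ldu_duration_ms
            end = start + self.ldu_duration_ms
            # Report every cue point that falls inside this LDU.
            for at_ms, callback in sorted(self._cue_points, key=lambda c: c[0]):
                if start <= at_ms < end:
                    callback(self.name, at_ms)


events = []
audio = Stream("audio", ldu_duration_ms=20)
audio.set_cue_point(40, lambda stream, at: events.append(("show_slide_2", stream, at)))
audio.play(ldu_count=5)
```

After the run, `events` holds one entry reported when the audio stream reached 40 ms, which is how an application could trigger a time-independent object such as a slide.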
-
Object Layer: operates on all types of media
and hides the differences between time-dependent and time-independent media.
-
offers the abstraction of a complete synchronized
presentation to applications.
-
takes a synchronization specification as input and generates
a correct schedule for the overall presentation.
-
is responsible for initiating preparatory actions
for each stream, e.g.,
-
build up buffer of continuous stream
-
prefetch time-independent stream data
-
adapt color map of output devices
-
MHEG (ISO Multimedia and Hypermedia Experts Group)
example:
provides a standard for the coded representation
of multimedia hypermedia information objects that will be interchanged
among applications and services using a variety of interchange media.
-
Object layer implementations can be classified by
-
distribution capabilities (local, based on
server structure, or no restriction)
-
type of presentation schedule computation
(run-time or compile time)
-
Specification Layer (open layer)
contains applications and tools for creating
synchronization specification.
The specification method can be classified
into
-
(time) Interval-based specification
-
Axes-based specification
-
Control flow-based specification (given synchronization
points)
-
Event-based specification (events trigger actions)
Synchronization in Distributed Environment
(DE)
We may have multiple sources and sinks in
DE.
Transport of Synchronization Specification
Three main approaches:
-
Delivery before presentation start
-
Use additional channel
-
Multiplexed data stream
Location of Synchronization Operations
-
combining/mixing objects into a new media object
(merge audio/video streams)
-
may reduce communication overhead.
-
needs to be supported at different layers.
Clock Synchronization
-
Clocks drift differently at sources and sinks
(3-60 ms per hour on workstations).
-
NTP (<10 ms resolution) or GPS can be used
to synchronize the clocks.
-
Time travel problem
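A quick calculation shows why drift matters for long presentations. The sketch below assumes, purely for illustration, that the quoted drift figure is the relative drift between a source clock and a sink clock, that the drift is linear, and that both clocks start aligned. The function name is hypothetical.

```python
def minutes_until_out_of_sync(drift_ms_per_hour, tolerated_skew_ms):
    """Minutes until linear clock drift alone uses up the tolerated skew."""
    return 60.0 * tolerated_skew_ms / drift_ms_per_hour


# Lip sync tolerates about 80 ms of skew (see the experiments above).
worst_case = minutes_until_out_of_sync(60, 80)   # 80.0 minutes
best_case = minutes_until_out_of_sync(3, 80)     # 1600.0 minutes
```

Under these assumptions, a pair of workstations at the high end of the drift range would leave the lip-sync region within the length of a feature film unless the clocks are resynchronized.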
Multiple Communication Relations
-
Many Possible Multimedia Multiparty connections
-
Multicast and broadcast mechanisms can be
used to reduce bandwidth (stream layer)
-
The object layer is responsible for efficient planning
of operation execution in each communication pattern
Multi-step Synchronization
-
synchronization during object acquisition
(during digitizing video)
-
synchronization of retrieval (access frame
in a stored video)
-
synchronization during delivery of LDUs to
the network
-
synchronization during the transport (protocol,
router in the network)
-
synchronization at the sink (delivery to the
output devices)
-
synchronization within the output devices.
Manipulation of the presentation
-
operations need to be distributed
-
network connections need to be adjusted (resolution
change -> bandwidth change)
Consequence for Synchronization
-
run-time handling of clock offset, multicast/broadcast
connections
-
may require re-planning at run-time
Summary of Synchronization Reference Model
QoS Parameters
Frame rate, Sample rate
Jitter
Jitter
Error rate, Error rate
Skew specification relationship
a video conference example with video, audio,
and presentation with (tele)pointer.
video and audio data need to meet lip
sync requirement.
audio and telepointer need to meet pointer
sync requirement.
Complex example (Language lesson: English/Spanish
Audio)
Find greatest common denominator
Select most stringent set of requirements
Compute the relationship between each individual pair
of streams
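One plausible reading of the last step is that a skew bound for a pair of streams that is not specified directly can be derived by chaining the bounds of pairs that are. The sketch below assumes skew(a, b) means "presentation time of a minus presentation time of b" and that bounds simply add along a chain; both the convention and the function name are assumptions, not taken from the source.

```python
def chain_skew(skew_ab, skew_bc):
    """Derive the implied (low, high) skew bound for streams a and c.

    skew_ab: (low, high) bound on t_a - t_b, in milliseconds.
    skew_bc: (low, high) bound on t_b - t_c, in milliseconds.
    Since t_a - t_c = (t_a - t_b) + (t_b - t_c), the bounds add.
    """
    return (skew_ab[0] + skew_bc[0], skew_ab[1] + skew_bc[1])


video_audio = (-80, 80)       # lip sync requirement
audio_pointer = (-500, 750)   # pointer sync requirement
video_pointer = chain_skew(video_audio, audio_pointer)
```

The derived bound for video and telepointer is looser than either input, so it only needs to be enforced explicitly if the application requires something tighter.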
Criteria for Assessing Multimedia Synchronization
Methods
-
support object consistency and maintenance
of synchronization spec.
-
support abstraction of content but allow reference
of individual LDUs
-
easy to describe all types of synchronization
-
integration of time-independent and time-dependent
media objects
-
support definition of QoS
-
support hierarchical levels of synchronization
(for large/complex presentations)
Interval-Based Synchronization Specifications
-
Interval: presentation duration of an object
-
There are 13 different types of temporal relations between two intervals.
-
Some types are inverses of each other, such as before and
after.
-
Figure 23 shows a reduced set of 7 non-invertible
types.
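The 13 relations between two intervals can be enumerated with a small classifier. This sketch uses the conventional names from Allen's interval algebra, which may differ from the labels in Figure 23; intervals are assumed to be (start, end) pairs with start < end, and the function name is hypothetical.

```python
def interval_relation(a, b):
    """Return the name of the temporal relation of interval a to interval b."""
    a_start, a_end = a
    b_start, b_end = b
    if a_end < b_start:
        return "before"
    if b_end < a_start:
        return "after"
    if a_end == b_start:
        return "meets"
    if b_end == a_start:
        return "met-by"
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"
    # Remaining case: the intervals partially overlap.
    return "overlaps" if a_start < b_start else "overlapped-by"


print(interval_relation((0, 3), (2, 5)))
print(interval_relation((2, 5), (0, 3)))
```

Six of the relations come in inverse pairs (before/after, meets/met-by, and so on) and "equals" is its own inverse, which is why a set of 7 is enough once the order of the operands may be swapped.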
10 operators for enhanced interval-based specification
-
Operators with one delay parameter: before,
beforeendof, cobegin, coend
-
Operators with two delay parameters: while,
delayed, startin, endin, cross
-
Operators with three delay parameters: overlaps
-
durations and delays may not be known in advance.
-
For beforeendof, delayed, startin, endin,
cross, and overlaps, the delay parameters cannot be 0.
-
A slide show with slides Slide_i (1<=i<=n)
and an audio object Audio can be specified as
Slide_1 cobegin(0) Audio
Slide_i before(0) Slide_(i+1) (1<=i<=n-1)
-
Lip sync specified as
Audio while(0,0) Video
-
Figure 13 example can be specified as
Audio1 while (0,0) Video
Audio1 before RecordedInteraction
RecordedInteraction before (0) B1
P1 before (0) P2
P2 before (0) P3
P3 before (0) Interaction
P3 before (0) Animation
Animation while (2,5) Audio2
Interaction before (0) P4.
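To make the slide show specification concrete, the sketch below turns it into start times. It assumes the following reading of the operators, which the notes do not spell out: "A cobegin(d) B" starts B a delay d after A starts, and "A before(d) B" starts B a delay d after A ends. The function name and the example durations are hypothetical.

```python
def slide_show_schedule(slide_durations, gap=0.0):
    """Compute start times (seconds) for the slide show specification.

    Slide1 cobegin(0) Audio       -> Audio starts together with Slide1.
    Slide_i before(gap) Slide_i+1 -> each slide starts gap seconds after
                                     the previous slide ends.
    """
    starts = {"Audio": 0.0}
    t = 0.0
    for i, duration in enumerate(slide_durations, start=1):
        starts[f"Slide{i}"] = t
        t += duration + gap
    return starts


schedule = slide_show_schedule([10, 5, 20])
```

With a gap of 0, as in the specification above, the slides follow each other seamlessly. The durations must be known to compute the schedule ahead of time; when they are not, as the notes point out, the start times can only be resolved at run time.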
Assessment of enhanced Interval-based Synchronization
Specification
-
easy to handle open LDUs (LDUs of unpredictable duration).
-
need to include skew specifications
-
how to detect inconsistent specifications
(a general problem to be addressed by
all specification methods).
Axes-Based Synchronization Specifications
-
the presentation events, such as start or
end of a presentation, are mapped to axes that are shared by objects of
the presentation.
1. Synchronization based on a global timer
-
all single-medium objects are attached to a time
axis that represents an abstraction of real time.
-
removing one object does not affect synchronization
of other objects.
-
problems arise when objects include LDUs with
unpredictable duration.
See Figure 25 below for the problem.
-
Additional QoS spec (including skew spec.)
is needed.
-
An audio stream is often used as the global timer.
It is more difficult to adjust audio to other streams because of re-sampling
problems.
-
How about multiple audio streams? Which one
should be selected as the global timer?
-
Example: QuickTime
Assessment
Virtual Axes Synchronization Specification
-
allow specification of coordinate systems
with user defined measurement units.
-
allow multiple virtual axes
-
used in Project Athena and the HyTime
standard.
Control Flow-Based Synchronization Specification
The flow of the concurrent presentation threads
is synchronized at predefined points
of the presentation.
-
Basic Hierarchical Specification
Based on serial/parallel
synchronization of actions.
The order of serial synchronization is
from left to right: each action begins at the conclusion of the previous action on
its left.
-
It is enhanced by the introduction of delay
as an action.
-
Limitations: each action can only be synchronized
at its beginning or end.
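A basic hierarchical specification can be evaluated with a short recursive walk. The sketch below assumes a hypothetical tuple encoding of the tree and the usual semantics: serial children run one after another, parallel children start together and the composite ends when the longest child ends, and a delay is an action that presents nothing.

```python
def schedule(node, start=0.0, starts=None):
    """Return (end_time, starts) for a serial/parallel specification tree.

    Node encodings (hypothetical):
      ("action", name, duration)
      ("delay", duration)
      ("serial", [children])
      ("parallel", [children])
    """
    if starts is None:
        starts = {}
    kind = node[0]
    if kind == "action":
        _, name, duration = node
        starts[name] = start
        return start + duration, starts
    if kind == "delay":
        return start + node[1], starts
    if kind == "serial":
        t = start
        for child in node[1]:
            t, _ = schedule(child, t, starts)
        return t, starts
    if kind == "parallel":
        end = start
        for child in node[1]:
            child_end, _ = schedule(child, start, starts)
            end = max(end, child_end)
        return end, starts
    raise ValueError(f"unknown node kind: {kind}")


spec = ("serial", [
    ("parallel", [("action", "Audio1", 10), ("action", "Video", 10)]),
    ("delay", 2),
    ("parallel", [
        ("action", "Animation", 5),
        ("serial", [("delay", 1), ("action", "Audio2", 3)]),
    ]),
])
end_time, start_times = schedule(spec)
```

The walk also illustrates the limitation noted above: a child can only be aligned with the start or end of its siblings, so a synchronization point in the middle of an action has to be expressed by splitting the action or inserting delays.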
Example where hierarchical structure is
not adequate. Here three pair-wise synchronization points exist among three
objects.
Synchronization via Reference Points
-
reference points include the start and stop
times of objects and their subunits.
-
synchronization is defined by connecting reference
points of media objects.
-
a set of connected reference points is called
a synchronization point.
-
The presentation of involved subunits must
be started or stopped when the synchronization point is reached.
-
This allows specification of temporal relations
without explicit reference to time.
-
very intuitive to use
-
accommodate the synchronization of time-dependent
media objects very well
-
more synchronization points for tighter lip
sync.
-
detecting inconsistency is more difficult.
-
real-time-based delays need to be specified.
Time Petri Nets
-
Rules for a time Petri net:
-
A transition fires if all of its input places
contain a non-blocking token
-
If a transition fires, a token is removed
from each input place and a token is added to each output place.
-
A token that is added to a new place is blocked
for the duration that is assigned to this place. (delay is associated with
place)
-
There are other time Petri net models that
associate delay with the firing of transition and specify the duration
of transitions.
-
For time-dependent media objects, each place
in the Petri net represents an LDU.
-
Lip sync can be represented by connecting
appropriate LDUs with transitions.
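The three firing rules can be exercised with a small simulator. This is a sketch under stated assumptions: delays are attached to places as in the rules above, every transition has at least one input place, the net is acyclic so the run terminates, and the place names, durations, and function name are hypothetical.

```python
def simulate(place_durations, transitions, initial_places):
    """Simulate a time Petri net and return a log of (time, transition).

    place_durations: {place: duration for which a new token is blocked}
    transitions:     {name: (input_places, output_places)}
    initial_places:  places holding a token at time 0
    """
    # tokens[place] holds the times at which its tokens become non-blocking.
    tokens = {place: [] for place in place_durations}
    for place in initial_places:
        tokens[place].append(place_durations[place])
    now = 0
    log = []
    while True:
        fired = False
        for name, (inputs, outputs) in transitions.items():
            # Rule 1: all input places contain a non-blocking token.
            if all(any(t <= now for t in tokens[p]) for p in inputs):
                # Rule 2: remove one token per input, add one per output.
                for p in inputs:
                    tokens[p].remove(min(t for t in tokens[p] if t <= now))
                for p in outputs:
                    # Rule 3: the new token is blocked for the place's duration.
                    tokens[p].append(now + place_durations[p])
                log.append((now, name))
                fired = True
        if fired:
            continue
        pending = [t for ts in tokens.values() for t in ts if t > now]
        if not pending:
            return log
        now = min(pending)  # advance to the next token release


# Two video LDUs and two audio LDUs joined by transitions (lip sync).
durations = {"v1": 40, "a1": 50, "v2": 40, "a2": 40}
net = {"t1": (["v1", "a1"], ["v2", "a2"]), "t2": (["v2", "a2"], [])}
firing_log = simulate(durations, net, ["v1", "a1"])
```

In this run the first audio LDU takes 50 ms, so transition `t1` fires at 50 rather than 40: the second video LDU waits for the audio, which is exactly the synchronization that connecting LDUs with transitions is meant to express.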
Event-Based Synchronization Specification
-
Presentation actions are initiated by synchronization
events:
-
start of a presentation
-
stop a presentation
-
prepare a presentation
-
Events can be external (timer) or internal
to a presentation (generated when a specific LDU in a time-dependent
media object is reached).
-
Creation and maintenance are more difficult.
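Event-based specification can be pictured as a table that maps events to presentation actions. The sketch below is a hypothetical illustration: the `EventSync` class, its method names, and the event names are invented for the example, and the actions are only logged, not executed.

```python
from collections import defaultdict


class EventSync:
    """Hypothetical event-to-action table for presentation control."""

    ACTIONS = ("prepare", "start", "stop")

    def __init__(self):
        self._bindings = defaultdict(list)  # event -> [(action, target)]
        self.log = []

    def on(self, event, action, target):
        """Bind a presentation action on a media object to an event."""
        if action not in self.ACTIONS:
            raise ValueError(f"unsupported action: {action}")
        self._bindings[event].append((action, target))

    def raise_event(self, event):
        """Run all actions bound to the event, in binding order."""
        for action, target in self._bindings[event]:
            self.log.append((event, action, target))


sync = EventSync()
sync.on("audio.ldu_100", "prepare", "Slide2")  # internal event: LDU reached
sync.on("audio.ldu_120", "stop", "Slide1")
sync.on("audio.ldu_120", "start", "Slide2")
sync.on("timer.60s", "stop", "Audio")           # external event: timer
sync.raise_event("audio.ldu_100")
sync.raise_event("audio.ldu_120")
```

Even in this small example the relationships are scattered across separate bindings, which hints at why creating and maintaining event-based specifications becomes difficult for larger presentations.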
Scripts
-
A textual description of a synchronization
scenario.
-
They are often full programming languages extended
by timing operations.