3-DIMENSIONAL TELEVISION
Television is the world’s favorite pastime. Remarkably, it has so far ignored the digital revolution; the way we watch television hasn’t changed since its invention 75 years ago. But the time of passive TV consumption may soon be over: advances in video acquisition technology, novel image analysis algorithms, and the pace of progress in computer graphics hardware together drive the development of a new type of visual entertainment medium. The scientific and technological obstacles to realizing 3D-TV, the experience of interactively watching real-world dynamic scenes from an arbitrary perspective, are currently being cleared away by researchers all over the world.
The 3D TV concept was put forward in the ATTEST project, which started in March 2002 as part of the Information Society Technologies (IST) programme, sponsored by the European Commission. Here, several industrial and academic partners cooperate towards a flexible, 2D-compatible and commercially feasible 3D TV system for broadcast environments.
The project covers the entire 3D-video chain: content creation, coding, transmission and display, with human 3D perception research playing a central role in optimizing the chain. The goals include the development of a new 3D camera, algorithms to convert existing 2D-video material into 3D, a 2D-compatible coding and transmission scheme for 3D video using MPEG-2/4/7, and two new autostereoscopic displays. If a workable and commercially acceptable solution can be found, the introduction of 3D TV will generate a huge replacement market for current 2D-TV sets. Within this decade, we expect that technology will have progressed far enough to make a full 3D-TV application available to the mass consumer market, including content generation, coding, transmission and display.
1. INTRODUCTION
3D TV is expected to be the next major revolution in the history of television. Both at professional and consumer electronics exhibitions, 3D-video and 3D displays always attract a lot of interest. Obviously, if a workable and commercially acceptable solution can be found, the introduction of 3D-TV will generate a huge replacement market for the current 2D-TV sets.
This paper describes the ATTEST project that started in March 2002 as part of the Information Society Technologies (IST) programme, sponsored by the European Commission. In the 2-year project, several industrial and academic partners cooperate towards a flexible, 2D-compatible and commercially feasible 3D TV system for broadcast environments. 3D TV includes real-time acquisition, coding, and transmission of dynamic scenes.
The block diagram of the entire 3D-video chain is shown in Fig. 1. We use an array of hardware-synchronized cameras to capture multiple perspective views of the scene. The need for 3D-video content will be satisfied in two different ways. First, a range camera will be converted into a broadcast 3D camera, which will require a redesign of the camera optics and electronics. Secondly, as the need for 3D content can only partially be satisfied by newly recorded material, a new algorithm will also be developed to convert existing 2D-video material into 3D. Both offline (content provider) and online (set-top box) conversion tools will be provided.
Compatibility with conventional 2D-TV is of vital importance, as 2D and 3D-TV will co-exist during the evolution period. Therefore, the coding schemes will be developed within the current MPEG-2/4/7 broadcast standards, which allow for the transmission of depth information in an enhancement layer while providing full compatibility with existing 2D decoders. For transmission, a DVB network will be used.
As the area of 3D displays is still rapidly evolving, the video chain should be adaptable to a wide range of both 2D and 3D displays. The transmitted video plus depth information allows for rendering images for many such displays. Two new 3D displays will be developed, one for a single user and one for multiple users. Both allow free viewing (without stereo glasses) over a wide viewing angle. The 3D display provides horizontal parallax with 16 independent perspective views at 1024×768 resolution. Head tracking will be used to drive the display optics such that the appropriate images are projected into the eyes of the viewers.
These three-dimensional displays hold tremendous potential for many applications in entertainment, information presentation, reconnaissance, tele-presence, medicine, visualization, remote manipulation and art.
In the following sections we will elaborate in greater detail on the individual parts of the 3D-video processing chain. We will start with a brief description of the content generation part, where we will discuss the development of the novel 3D camera as well as the 2D-to-3D conversion. Special emphasis will then be put on the coding and transmission aspects. This will be followed by some results for the novel view generation, a short description of the 3D displays and the human perceptual evaluation of the 3D-video chain. We will finish the paper with a short conclusion.
2. SUBJECT DETAILING
2.1 CONTENT CREATION
The 3D-video content will be supplied by novel 3D cameras and via conversion from existing 2D-video material. The content creation depends upon whether the object is static or dynamic.
2.1.1 NOVEL CAMERAS FOR 3D STATIC VIDEO
The 3D-video camera that will be developed is based on Zcam™, an existing depth camera. In the project, the camera will be improved to meet the resolution and accuracy demands of 3D-TV. Next we compare the new approach with more conventional approaches using multiple 2D cameras, followed by a discussion of the technology used and its challenges.
2.1.1.a DEPTH CAMERA VERSUS MULTIPLE 2D CAMERAS
The new depth camera will yield conventional 2D video accompanied by depth per pixel via a direct depth-sensing process. In conventional approaches, a stereo camera consisting of two or more conventional 2D cameras is used, and the actual 3D video is acquired via subsequent image processing such as camera calibration, correspondence estimation and stereo triangulation.
However, the accuracy of stereo triangulation deteriorates with the distance from the cameras. As video production requires the ability to shoot both close-ups and long-distance shots, the stereo accuracy will vary accordingly. Changing production demands also require changing the camera geometry, e.g. zooming via precision setting of two or more independent lenses. This is mechanically impractical and requires reliable calibration tools. Finally, correspondence estimation depends on the existence of matchable features in the scene content and is still prone to errors due to mismatches. A scene point must be visible to at least two cameras if its depth is to be recovered.
Since the cameras have different positions, it is common for some areas to be visible to a single camera only. This problem is reduced by increasing the number of 2D cameras, but this makes the stereo camera setup more difficult to handle during production and increases the number of video streams to be processed.
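The loss of stereo accuracy with distance can be made concrete with the standard relation for a rectified two-camera rig, Z = f·B/d, whose first-order error grows with the square of the depth. The sketch below is only an illustration: the focal length, baseline and one-pixel matching error are assumed values, not parameters of any system described here.

```python
def stereo_depth(focal_px, baseline_m, disparity_px):
    """Depth of a point from its disparity in a rectified stereo pair: Z = f * B / d."""
    return focal_px * baseline_m / disparity_px


def depth_error(focal_px, baseline_m, depth_m, disparity_error_px=1.0):
    """First-order depth uncertainty: dZ ~= Z^2 / (f * B) * dd."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px


# Assumed rig: 1000-pixel focal length, 10 cm baseline, 1-pixel matching error.
FOCAL_PX = 1000.0
BASELINE_M = 0.1

for z in (1.0, 5.0, 20.0):
    err = depth_error(FOCAL_PX, BASELINE_M, z)
    print(f"depth {z:5.1f} m -> approximate depth error {err:.2f} m")
```

Under these assumptions, doubling the shooting distance quadruples the depth uncertainty, which is why close-ups and long shots cannot be served equally well by a fixed stereo geometry.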
The camera developed in the ATTEST project overcomes the aforementioned obstacles. The depth camera is based on a single sensor that measures the distance from the camera to the scene at each pixel simultaneously. There are no angular differences between the color camera and the depth sensor, so each pixel of the color camera is assigned a corresponding depth value. The depth measurement is independent of the visible scene content: a depth map is generated even if the scene contains no visible features (e.g. in total darkness). This assures correct recovery of depth maps for areas of constant color, for example cloths and walls. The depth accuracy of the camera is independent of the distance from the camera. The camera measures linearly scaled depth values inside a controllable depth range. This enables the camera to handle seamless changes between long distance shots and close-ups, without affecting the quality of the recovered depth and without any change of the camera geometry (such as a change of base line).
2.1.1.b THE TECHNOLOGY OF THE DEPTH CAMERA
The operation of the camera is based on generating a “light wall” moving along the field of view (see Figure 2.1.1.a). As the light wall hits the objects, it is reflected back towards the camera carrying an imprint of the objects. The imprint contains all the information required for the construction of the depth map. The 3D information can then be extracted from the reflected, deformed “wall” by deploying a fast image shutter in front of the CCD chip and blocking the incoming light, as shown in Figure 2.1.1.a (c). This type of camera belongs to a broader group of sensors known as scanner-less LIDAR (laser radar without a mechanical scanner). The light collected at each pixel is related to depth, but also to the reflectivity of the objects. Hence, a normalization step is performed per pixel by dividing the front-portion pixel intensity by the corresponding total intensity.
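The per-pixel normalization can be sketched as follows. Dividing the gated (front-portion) intensity by the total intensity cancels the reflectivity of the surface, so a dark and a bright object at the same distance give the same ratio. The function names, the handling of pixels with no returned light, and the linear mapping from ratio to depth inside a near/far window are assumptions made for illustration; the text only states that the measured depth values are linearly scaled inside a controllable range.

```python
def normalize_gated(front, total, eps=1e-9):
    """Divide the gated ("front portion") image by the total-intensity image, per pixel.

    Pixels that returned no light are marked None (assumed convention).
    """
    return [
        [(f / t if t > eps else None) for f, t in zip(front_row, total_row)]
        for front_row, total_row in zip(front, total)
    ]


def ratio_to_depth(ratio, near_m, far_m):
    """Assumed linear mapping: ratio 1.0 -> near edge of the window, 0.0 -> far edge."""
    if ratio is None:
        return None
    r = min(max(ratio, 0.0), 1.0)
    return far_m - r * (far_m - near_m)


# Two pixels at the same distance but with different reflectivity.
front = [[50.0, 10.0]]
total = [[100.0, 20.0]]
ratios = normalize_gated(front, total)
depths = [[ratio_to_depth(r, near_m=1.0, far_m=3.0) for r in row] for row in ratios]
print(ratios, depths)
```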
Figure 2.1.1.a : (a) Light wall moving from camera to scene (b) Imprinted light wall back to camera (c) Truncated light wall containing depth information from the source.
The output of the depth camera will consist of two streams. One will be the normal broadcast-quality image and the other will be the depth image. Figure 2.1.1.b and Figure 2.1.1.c show the video and depth images taken by the current camera.
Figure 2.1.1.b & Figure 2.1.1.c : Images taken by the depth camera; (a) normal RGB image; (b) accompanying depth image (gray level inversely proportional to depth)
The technological challenge of the depth camera is twofold:
1. Fast switching of the illumination source to form the “light wall”, and fast gating of the reflected image entering the camera. In the current depth camera, a cluster of IR laser diodes and corresponding optics is used to generate homogeneous illumination. The diodes are switched on and off with rise/fall times shorter than 1 nsec. Here, super-fast driver electronics are designed to meet the requirements of fast response, small space and low cost, while maintaining high efficiency.
2. The detection of the reflected pulse has to be synchronous with the switched illuminator. For this, a special fast driver has been designed that has rise/fall times shorter than 1 nsec. The current camera uses a fast optical switch based on a so-called gated intensifier. This device is pixelized and contributes a small amount of noise, which limit the depth resolution and accuracy, respectively. In the project, we will develop a solid-state shutter, which circumvents both limitations.
2.1.2 CREATION OF A 3D DYNAMIC VIDEO
Dynamic scenes are captured using an array of hardware-synchronized cameras. There are many studios where 3D dynamic video is captured. One example is the Virtualized Reality system of Kanade et al., with 51 cameras arranged in a geodesic dome.
The Blue-C system at ETH Zürich consists of a room-sized environment with real-time capture and spatially-immersive display. The Argus research project of the Air Force uses 64 cameras that are arranged in a large semi-circle.
In the MetaVision project, shown in Fig. 2.1.2.a, three cameras capture three video signals: one from the main high-resolution colour camera and two from lower-resolution auxiliary cameras positioned on either side of the main camera. The auxiliary cameras are small monochrome cameras with normal TV resolution, whereas the main camera has HDTV resolution or beyond. The auxiliary cameras are synchronized with the main camera. All the cameras run at 25 Hz progressive.
Figure 2.1.2.a : Multiple camera setup in MetaVision Project
The system described in the ATTEST project consists of an array of 16 hardware-synchronized cameras, as shown in Fig. 2.1.2.b. Each camera captures progressive high-definition video in real time. The array uses 16 Basler A101fc color cameras with 1300×1030, 8 bits per pixel CCD sensors. The cameras are connected to the producer PCs by an IEEE-1394 (FireWire) High Performance Serial Bus. The maximum transmitted frame rate at full resolution is 12 frames per second. Two cameras are connected to each of the eight producer PCs.
All PCs in the prototype have 3 GHz Pentium 4 processors, 2 GB of RAM, and run Windows XP. The Basler camera was chosen primarily because it has an external trigger that allows complete control over the video timing. A PCI card with custom programmable logic devices (CPLD) generates the synchronization signal for all cameras. All 16 cameras are individually connected to the card, which is plugged into one of the producer PCs. Although it would be possible to use software synchronization, precise hardware synchronization is preferred for dynamic scenes. The 16 cameras are arranged in a regularly spaced linear array, as shown in Figure 2.1.2.b.
Figure 2.1.2.b : An array of 16 cameras and projectors used in ATTEST project.
The optical axis of each camera is roughly perpendicular to a common camera plane. Standard calibration procedures are used to determine the intrinsic and extrinsic camera parameters.
2.1.3 CONVERSION FROM CONVENTIONAL 2D VIDEO
The need for 3D-video content can only partially be satisfied with newly recorded material. Therefore, depth reconstruction algorithms will be developed that can be used to convert existing 2D-video material into 3D.
There are two types of conversion methods:
1. Off-line conversion tools will be provided for use at the broadcaster and content provider side. These will primarily be used to convert popular movies and impressive documentaries. As there are no real-time constraints in this case, computationally more expensive algorithms can be used, and all available video data can enter the computations. 3D information can be integrated over a whole shot, resulting in high-quality 3D reconstructions.
Another important advantage is that all image data is available at once. If necessary, 3D information obtained in one camera shot can even be used to provide depth augmentation for another.
2. On-line conversion methods (e.g. for processing in a set-top box) will allow the viewer to augment any suitable incoming 2D broadcast to 3D. This approach enables on-line depth augmentation using a set-top box at the receiver end, so that a user at home can activate 3D depth augmentation for any suitable 2D-video content that is received. In this case, computations can only be based on video frames that have already been received. Real-time implementations like these require that the developed approach can take full advantage of advanced DSP capabilities.
2.2 COMPRESSION
Multi-video recordings constitute a huge amount of raw image data. Transmitting 16 uncompressed video streams with 1300×1030 resolution and 24 bits per pixel at 30 frames per second requires 14.4 Gb/sec of bandwidth, which is well beyond current broadcast capabilities. Compression techniques must therefore be applied.
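The raw data rate quoted above can be checked with a few lines of arithmetic. The helper name below is hypothetical. Note that the figure of 14.4 is reproduced when a gigabit is taken as 2^30 bits; with the decimal convention (10^9 bits) the same stream amounts to about 15.4 Gb/sec.

```python
def raw_bitrate(num_views, width, height, bits_per_pixel, fps):
    """Uncompressed bit rate in bits per second for a set of video streams."""
    return num_views * width * height * bits_per_pixel * fps


bits_per_second = raw_bitrate(num_views=16, width=1300, height=1030,
                              bits_per_pixel=24, fps=30)

print(f"{bits_per_second} bit/s")
print(f"{bits_per_second / 2**30:.1f} Gb/s (1 Gb = 2^30 bits)")
print(f"{bits_per_second / 1e9:.1f} Gb/s (1 Gb = 10^9 bits)")
```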
The different methods for compression are:
1. Spatial encoding
2. Temporal encoding
3. Centralized processor encoding.
Motion compensation in the time domain is called temporal encoding, and disparity prediction between cameras is called spatial encoding. Results show that a combination of temporal and spatial encoding leads to good results. The Blue-C system converts the multiview video into 3D “video fragments” that are then compressed and transmitted. However, all current systems use a centralized processor for compression, which limits their scalability in the number of compressed views.
To efficiently encode the multi-video data using object geometry, the images may be regarded as object textures. In the texture domain, a point on the object surface has fixed coordinates, and its color (texture) varies only due to illumination changes and/or non-Lambertian reflectance characteristics. For model-based coding, a texture parameterization is first constructed for the geometry model. Having transformed all multi-video frames to textures, the multi-view textures are then processed to de-correlate them with respect to temporal evolution as well as viewing direction. Shape-adaptive as well as multi-dimensional wavelet coding schemes lend themselves to efficient, progressive compression of texture information. Temporarily invisible texture regions can be interpolated from previous and/or future textures, and generic texture information can be used to fill in regions that have not been recorded at all. For spacetime isosurface reconstruction, deriving one common texture parameterization for all time instants is not trivial since the reconstruction algorithm does not provide surface correspondences over time. Encoding the time-varying geometry is also more complex than in the case of model-based analysis. Current research therefore focuses on additionally retrieving correspondence information during isosurface reconstruction.
The approach specified in the ATTEST project is to reduce the data to a single view with per-pixel depth map. This data can be compressed in real-time and broadcast as an MPEG-2 enhancement layer. On the receiver side, stereo or multiview images are generated using image-based rendering. However, it may be difficult to generate high-quality output because of occlusions or high disparity in the scene. Moreover, a single view cannot capture view-dependent appearance effects, such as reflections and specular highlights. High-quality 3D TV broadcasting requires that all the views are transmitted to multiple users simultaneously. The 3D TV system described uses temporal compression only and transmits all of the views as independent MPEG-2 video streams.
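The receiver-side view generation from a single view plus per-pixel depth, and the occlusion problem it runs into, can be illustrated on a single scanline. This is a deliberately simplified sketch of depth-image-based rendering, not the renderer used in the project: it assumes a purely horizontal shift, a depth value stored as a gray level where brighter means nearer, and a hypothetical `max_disparity_px` parameter that sets the shift of the nearest pixels.

```python
def warp_scanline(colors, depth_gray, max_disparity_px):
    """Forward-warp one scanline to a neighboring viewpoint.

    Each pixel is shifted by a disparity proportional to its depth gray level
    (0..255, brighter = nearer). When two pixels land on the same target,
    the nearer one wins. Targets that receive no pixel stay None: these are
    the disocclusion holes that a single view cannot fill.
    """
    width = len(colors)
    out = [None] * width
    out_disparity = [-1] * width
    for x in range(width):
        d = round(max_disparity_px * depth_gray[x] / 255)
        tx = x + d
        if 0 <= tx < width and d > out_disparity[tx]:
            out[tx] = colors[x]
            out_disparity[tx] = d
    return out


# A near object ("c", "d") in front of a far background.
colors = ["a", "b", "c", "d", "e", "f"]
depth_gray = [0, 0, 255, 255, 0, 0]
print(warp_scanline(colors, depth_gray, max_disparity_px=2))
```

In this toy case the foreground pixels move two positions and cover background pixels, while the positions they vacate have no source data at all. Filling such holes is where additional information, such as occluded texture, becomes useful.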
2.3 CODING AND TRANSMISSION
The coding for the 3D TV system should be based on a flexible, modular and open architecture that provides important system features, such as backwards compatibility with today’s 2D digital TV, scalability in terms of receiver complexity and adaptability to a wide range of different 2D and 3D displays.
The 3D TV uses the Layered Coding Syntax shown in Fig.2.3.1.
Figure 2.3.1 : Layered Coding Syntax
The Layered Coding Syntax consists of one base layer and at least one additional enhancement layer. To achieve backwards compatibility with today’s conventional 2D digital TV, the base layer is encoded using state-of-the-art MPEG-2 and DVB standards. Thus, this layer can be decoded by standard set-top boxes designed for 2D digital TV broadcast reception. The remaining enhancement layer(s) deliver(s) the additional information to the 3D-TV receiver. The minimum information transmitted in the enhancement layer(s) is an associated depth map providing one depth value for each pixel of the base layer. However, in the case of critical video content (e.g. large-scale scenes with a large number of occlusions) it might be useful to send further information, for example segmentation masks and occluded texture. Note that the layered structure in Fig. 2.3.1 is extendable in this sense.
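The idea that a receiver simply uses the layers it understands can be sketched with a small data structure. This is not the actual bitstream syntax: the class, the layer names and the function below are hypothetical and only model the behavior described above, where a 2D set-top box decodes the base layer alone and a 3D receiver additionally picks up the depth layer.

```python
from dataclasses import dataclass, field


@dataclass
class LayeredFrame:
    """One coded frame: a 2D base layer plus optional enhancement layers."""
    base_video: bytes                                 # 2D picture, decodable by any 2D receiver
    enhancement: dict = field(default_factory=dict)   # e.g. {"depth": ..., "occluded_texture": ...}


def layers_for_receiver(frame, supported_layers):
    """Return the layers a receiver can use; layers it does not support are skipped."""
    usable = {"base": frame.base_video}
    for name, payload in frame.enhancement.items():
        if name in supported_layers:
            usable[name] = payload
    return usable


frame = LayeredFrame(b"2d-picture", {"depth": b"depth-map", "segmentation": b"masks"})
print(layers_for_receiver(frame, supported_layers=set()))       # conventional 2D set-top box
print(layers_for_receiver(frame, supported_layers={"depth"}))   # basic 3D receiver
```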
For the transmission of the enhancement layer(s), it is planned to rely as far as possible on already available MPEG-2/4/7 tools. If existing tools prove to be inadequate, input on coding will be given to appropriate standardization bodies such as the MPEG Ad-hoc Group (AHG).
Additionally, it is important to realize that stereovision is only one of the relevant depth cues and that other cues such as motion parallax, texture, brightness and geometric appearance of video objects are of comparable importance.
For scene objects that are sufficiently far away from the viewer, they can even become dominant (see Fig. 2.3.2).
Figure 2.3.2 : Importance of depth cues in dependence of the viewing distance
It is therefore a significant feature of the described layered structure that it is flexible enough to support alternative forms of depth representation. This allows for a stepwise introduction of 3D-TV receivers of different complexity. For example, an intermediate low-cost 3D-TV receiver could use the additional depth layer to render individual perspective views according to the head-tracked viewing position of the TV watcher (see also Fig. 2.3.1). By this means, broadcasters could provide the user with a first, limited depth impression through parallax viewing, even on conventional 2D-TV screens. On the other hand, users willing to invest in a 3D-TV set could enjoy a full-blown stereo reproduction of the same data on single- or even multiple-user 3D displays (see Fig. 2.3.1).
The proposed layered coding syntax will also provide scalability in terms of depth experience. This is particularly important, as perception studies have indicated that there are differences in depth appreciation over age groups. Hence in our view, the TV viewer should be in control of his depth experience. He should be able to set the depth level according to his personal preference – a feature which can also be used for graceful degradation in the case of unexpected artifacts in depth which are usually more annoying in stereovision than in parallax viewing.
The layered coding syntax has several advantages:
(a) Backward compatibility – Since the base layer is coded using MPEG and DVB standards, it can be decoded by conventional 2D receivers, ensuring that existing 2D TV owners can also watch a 3D transmission in 2D. Thus the system can plug into today’s digital TV broadcast infrastructure and co-exist with 2D TV.
(b) Scalability – The system is highly scalable in terms of depth experience: the viewer can set the depth level according to personal preference, which also allows graceful degradation when unexpected depth artifacts occur.
(c) Adaptability – The system is adaptable, as there are different enhancement layers and the user can use receivers of different complexity according to the depth requirement.
(d) Codec availability – Another advantage of using 2D coding standards is that the codecs are well established and widely available. Tomorrow’s digital TV set-top box could contain one or many decoders, depending on whether the display is 2D or multiview 3D capable.
2.4 3D DISPLAYS
The display plays an important role in the 3D TV system. There are different types of displays that can be used to produce the 3D effect.
2.4.1 HOLOGRAPHIC DISPLAYS
In holographic reproduction, light from an illumination source is diffracted by interference fringes on the holographic surface to reconstruct the light wavefront of the original object. A hologram displays a continuous analog lightfield, and real-time acquisition and display of holograms has long been considered the “holy grail” of 3D TV. The most recent device, the Mark-II Holographic Video Display, uses acousto-optic modulators, beam splitters, moving mirrors, and lenses to create interactive holograms. In more recent systems, moving parts have been eliminated by replacing the acousto-optic modulators with LCDs, focused light arrays, optically-addressed spatial modulators or digital micromirror devices. All current holo-video devices use single-color laser light. To reduce the amount of display data they provide only horizontal parallax. The display hardware is very large in relation to the size of the image (which is typically a few millimeters in each dimension). The acquisition of holograms still demands carefully controlled physical processes and cannot be done in real-time. Holographic systems are unable to acquire, transmit, and display dynamic, natural scenes on large displays.
2.4.2 VOLUMETRIC DISPLAYS
Volumetric displays use a medium to fill or scan a three-dimensional space and individually address and illuminate small voxels. However, volumetric systems produce transparent images that do not provide a fully convincing three-dimensional experience. Furthermore, they cannot correctly reproduce the lightfield of a natural scene because of their limited color reproduction and lack of occlusions. The design of large-size volumetric displays also poses some difficult obstacles. One prototype that maintains view-dependent effects such as occlusion, specularity, and reflection uses beam-splitters to emit light at focal planes at different physical distances. Two such devices are needed for stereo viewing. Since the head and viewing positions remain fixed, this prototype is not a practical 3D display solution.
2.4.3 PARALLAX DISPLAYS
Parallax displays emit spatially varying directional light. Most of them provide only horizontal parallax. There are different types of parallax displays:
(a) Parallax stereograms – These use a plate with vertical slits as a barrier over an image with alternating strips of left-eye/right-eye images.
(b) Parallax panoramagrams – To extend the limited viewing angle and restricted viewing position of stereograms, narrower slits and a smaller pitch are used between the alternating image stripes.
(c) Integral photographs – These use an array of spherical lenses instead of slits. This is frequently called a “fly’s eye” lens sheet. An integral photograph is a true planar lightfield with directionally varying radiance per pixel. Integral lens sheets can be put on top of high-resolution LCDs. Integral photographs sacrifice significant spatial resolution in both dimensions to gain full parallax.
2.4.4 LENTICULAR DISPLAYS
The most common display material used is the lenticular sheet. A lenticular sheet is a linear array of narrow cylindrical lenses called lenticules. This reduces the amount of image data by giving up vertical parallax. To improve the native resolution of the display, multi-projector lenticular displays are used. For this, the back of a lenticular sheet is painted with a diffuse paint and is used as a projection surface. Different arrangements of lenticular sheets and multi-projector arrays can be made.
For the 3D display the lenticules are arranged as either a Rear-projection or a Front-projection 3D display. The display system uses a linear array of 16 projectors and lenticular screens.
The system uses 16 NEC LT-170 projectors with 1024×768 native output resolution. This projector is chosen because of its compact form factor. Values for optimal projector separation and lens pitch have been proposed in the literature. Ideally, the separations between cameras and projectors are equal. The offset in the vertical direction between neighboring projectors leads to a slight loss of vertical resolution in the final image. The system uses eight consumer PCs and dedicates one of them as the controller.
For the rear-projection system (Figure 2.4.4.a left), two lenticular sheets are mounted back-to-back with optical diffuser material in the center. The back-to-back lenticular sheets and the diffuser fabric were composited using transparent resin that was UV-hardened after hand-alignment.
The front-projection system (Figure 2.4.4.a right) uses only one lenticular sheet with a retro-reflective front-projection screen material mounted on the back.
Figure 2.4.4.a : Projection type Lenticular 3D display
The projection-side lenticular sheet of the rear-projection display acts as a light multiplexer, focusing the projected light as thin vertical stripes onto the diffuser. A closeup of the lenticular sheet is shown in Figure 2.4.4.b. Considering each lenticule to be an ideal pinhole camera, the stripes capture the view-dependent radiance of a three-dimensional lightfield (2D position and azimuth angle). The viewer-side lenticular sheet acts as a light de-multiplexer and projects the view-dependent radiance back to the viewer. Note that the single lenticular sheet of the front-projection screen both multiplexes and de-multiplexes the light.
Figure 2.4.4.b : Formation of vertical stripes on the diffuser of the Rear-projection display. The closeup photograph shows the lenticules and stripes from one view point.
Figure 2.4.4.c shows photographs of both rear-projection and front-projection displays.
Figure 2.4.4.c : Left: Rear-projection 3D display with double-lenticular screen. Right: Front-projection 3D display with single-lenticular screen.
The two key parameters of lenticular sheets are the field of view (FOV) and the number of lenticules per inch (LPI). The lenticular sheets used here are 72” × 48” in size, with 30 degrees FOV and 15 LPI. The optical design of the lenticules is optimized for multiview 3D display. The number of viewing zones of a lenticular display is related to its FOV (see Figure 2.4.4.d). Here, the FOV is 30 degrees, leading to 180/30 = 6 viewing zones.
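The screen figures above lend themselves to a short worked calculation. The helper names are hypothetical. The number of viewing zones follows the 180/FOV relation stated in the text. The lenticule count is simply width times LPI. The angular spacing between neighboring views is an additional derived quantity that assumes the 16 views are spread evenly over one viewing zone.

```python
def viewing_zones(fov_deg):
    """Number of repeated viewing zones across the 180 degrees in front of the screen."""
    return 180.0 / fov_deg


def lenticule_count(width_inches, lenticules_per_inch):
    """Number of lenticules across the width of the sheet."""
    return width_inches * lenticules_per_inch


def view_spacing_deg(fov_deg, num_views):
    """Angular spacing between neighboring views, assuming an even spread over one zone."""
    return fov_deg / num_views


print(viewing_zones(30))           # viewing zones for a 30-degree FOV
print(lenticule_count(72, 15))     # lenticules across a 72-inch-wide sheet at 15 LPI
print(view_spacing_deg(30, 16))    # degrees between neighboring views for 16 projectors
```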
At the border between two neighboring viewing zones there is an abrupt view-image change (or “jump”) from view number 16 to view number one. This is eliminated by increasing the FOV of the display. Here each subpixel (or thin vertical stripe) in Figure 2.4.4.b is projected from a different projector, and each projector displays images from a different view.
Figure 2.4.4.d : The field of view of a lenticular display.
2.4.4.1 DISPLAY CALIBRATION
Automatic projector calibration is very important for the 3D display. The relationship between rays in space and pixels in the projected images is found by placing a camera on the projection side of the screen. Then the intensities of the projectors are equalized. For both processes, the display is covered with a diffuse screen material. Standard computer vision techniques are used to find the mapping of points on the display to camera pixels, which can be expressed by a 3×3 homography matrix. The largest common display area is computed by fitting the largest rectangle of a given aspect ratio (e.g., 4:3) into the intersection of all projected images. Even for one projector, the intensities observed by the camera vary throughout the projected image. Moreover, different projectors may project images of vastly different intensities.
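How a 3×3 homography maps a point on the display to a camera pixel can be shown in a few lines. The point is written in homogeneous coordinates, multiplied by the matrix, and divided by the third component. The matrices below are made-up examples, not calibration results, and estimating the homography from point correspondences is outside this sketch.

```python
def apply_homography(H, x, y):
    """Map the point (x, y) through the 3x3 homography H (list of three rows)."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    return xh / w, yh / w


# Example 1: scale by 2 and translate by (10, 20).
H_affine = [[2.0, 0.0, 10.0],
            [0.0, 2.0, 20.0],
            [0.0, 0.0, 1.0]]
print(apply_homography(H_affine, 5.0, 5.0))

# Example 2: a non-trivial bottom row adds the perspective division.
H_projective = [[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0],
                [0.001, 0.0, 1.0]]
print(apply_homography(H_projective, 1000.0, 500.0))
```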
The calibration procedure works as follows. First, a white image is projected into the common rectangle with each projector. The minimum intensity in this image is recorded for each projector, and the minimum of those values across all projectors is then determined. This is the maximum intensity that is used for equalization.
Next, the intensity of the image for each projector is adjusted iteratively until the observed image has an even maximum intensity.
This is possible because the correspondence between the camera pixels and the pixels of each projector is known. This process yields image-intensity masks for each projector. It is only an approximate solution, since the response of the projectors for different intensities is generally nonlinear. In the rear-projection system, a translation of the lenticular sheets with respect to each other leads to an apparent rotation of the viewing zones (see Figure 2.4.4.e).
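The intensity equalization described above can be sketched as a small loop. The target is the minimum over all projectors of the minimum intensity each one produces for a white image, and each projector's mask is then scaled until the camera sees that target everywhere. The function names, the `observe` callback standing in for the camera measurement, and the multiplicative update rule are assumptions. The update is a simple choice that behaves well when the projector response is close to linear. A strongly nonlinear response would need a damped update.

```python
def equalization_target(white_images):
    """Minimum over projectors of the minimum observed intensity of a white image."""
    return min(min(image) for image in white_images)


def compute_mask(observe, num_pixels, target, iterations=10, tol=1e-3):
    """Iteratively scale a per-pixel mask until the observed image matches the target.

    observe(mask) must return the intensities the camera sees, per projector pixel,
    when a white image attenuated by the mask is projected.
    """
    mask = [1.0] * num_pixels
    for _ in range(iterations):
        seen = observe(mask)
        if max(abs(s - target) for s in seen) <= tol:
            break
        mask = [min(1.0, m * target / s) if s > 0 else m
                for m, s in zip(mask, seen)]
    return mask


# Simulated projectors with a linear response (assumed): observed = peak * mask.
peaks = {"A": [200.0, 180.0], "B": [220.0, 150.0]}
target = equalization_target(list(peaks.values()))

masks = {}
for name, peak in peaks.items():
    def observe(mask, peak=peak):
        return [p * m for p, m in zip(peak, mask)]
    masks[name] = compute_mask(observe, len(peak), target)

print(target, masks)
```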
Figure 2.4.4.e : Apparent viewing zone rotation for rear-projection due to a shift of the lenticular screens.
To estimate this horizontal shift, each of the 16 projectors is turned on separately and the response on the viewer side of the display is measured using the camera. The camera is placed approximately in the center of the display at the same distance from the screen as the projectors. The projector that produces the maximum brightness in the camera is then identified; this is called the apparent central projector. The image data is then re-routed between decoders (producers) and consumers such that the apparent central projector receives the images of the central camera.
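Finding the apparent central projector and re-routing the views amounts to an argmax followed by a constant index shift. The sketch below makes several assumptions that the text does not specify: projectors and cameras are indexed from 0, the index of the central camera is passed in explicitly, and projectors whose shifted index falls outside the camera array are left unassigned (None).

```python
def apparent_central_projector(brightness):
    """Index of the projector that produced the brightest camera response."""
    return max(range(len(brightness)), key=lambda i: brightness[i])


def reroute(num_views, apparent_center, central_camera):
    """Map each projector to a camera so the apparent central projector shows the central camera.

    Projectors with no matching camera after the shift map to None (assumed policy).
    """
    shift = apparent_center - central_camera
    routing = {}
    for projector in range(num_views):
        camera = projector - shift
        routing[projector] = camera if 0 <= camera < num_views else None
    return routing


# Toy example with four projectors, measured one at a time.
brightness = [0.1, 0.2, 0.9, 0.4]
center = apparent_central_projector(brightness)
print(center, reroute(num_views=4, apparent_center=center, central_camera=1))
```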
2.5 PERCEPTUAL EVALUATION
The acceptance, uptake, and commercial success of any advanced technology aimed at the consumer market depend to a large extent on the users' experiences with and responses to the system. In the past, 3D video presentations in theme parks, and even 3D broadcast trials, were often intended to give viewers the ‘3D thrill of their life’. Depth impressions were also exaggerated to enhance the visual impact. Unfortunately, viewers frequently experienced eye strain, headaches, and other unpleasant side effects. It is therefore vital to have a clear understanding of the in-the-home viewing experience of 3D-TV, covering both the potential added value of the ATTEST 3D-TV systems and the potential drawbacks for users. Our aim is to arrive at a set of requirements and recommendations for an optimal 3D-TV system, and to contribute to each individual step in the 3D video chain through perceptual and usability evaluations of the proposed technological innovations.
More specifically, human-factors experiments will be performed to address the depth impression, perception of distortions, eye strain, quality, naturalness, presence, and acceptability of the 3D coding algorithms and novel 3D displays, in order to arrive at perceptually optimal image quality with minimal coding artifacts and negligible side effects. Additionally, a number of basic and novel areas surrounding 3D video perception will be investigated to enhance our understanding of the user experience. For example, user control over the depth impression in 3D video has to date received very little systematic experimental investigation. This issue will be addressed by looking at basic perceptual and cognitive effects as well as ease of use. In addition, the fundamental issue of the acceptability of 2D production grammars for 3D video will be investigated. This requires a much deeper understanding of how depth perception develops over time (for example, how tolerant viewers are to sudden disparity changes) and of how these insights relate to existing 2D and 3D video production grammars.
3. ADVANTAGES AND DISADVANTAGES
Like any other system, the 3D TV system has both advantages and disadvantages.
3.1 ADVANTAGES
1. The 3D TV display shows high-resolution (1024 × 768 pixels) stereoscopic color images for multiple viewpoints without special glasses.
2. The system is completely scalable and backward compatible in the number of acquired, transmitted and displayed views.
3. The new algorithm efficiently renders novel views from multiple dynamic video streams on a cluster of PCs.
4. The large number of views (16) and the large physical dimensions (6’ × 4’) of the display lead to a very immersive 3D experience.
5. The projector-based 3D display has a native resolution of 12 million pixels, which exceeds the 9 million pixels of the IBM T221 LCD, the highest-resolution flat-panel screen currently available.
6. The overall delay in the system from acquisition to display is less than one second.
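The 12 million pixel figure in item 5 is consistent with 16 projectors at 1024 × 768 pixels each. The short Python check below assumes that this is how the total is obtained; the IBM T221 panel resolution of 3840 × 2400 is added here for comparison and does not appear in the text above.

```python
# Assumption: the display total is the per-projector resolution
# multiplied by the number of projectors.
NUM_PROJECTORS = 16
PROJECTOR_WIDTH, PROJECTOR_HEIGHT = 1024, 768

display_pixels = NUM_PROJECTORS * PROJECTOR_WIDTH * PROJECTOR_HEIGHT

# IBM T221 panel resolution (3840 x 2400), used for comparison.
t221_pixels = 3840 * 2400

print(f"3D display: {display_pixels:,} pixels")
print(f"IBM T221:   {t221_pixels:,} pixels")
```

Under this assumption the display has about 12.6 million pixels against about 9.2 million for the T221.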
3.2 DISADVANTAGES
Some of the disadvantages of the 3D TV system are:
1. The graphics cards and projectors are not synchronized, which leads to increased motion blur for fast movements in the scene.
2. The rear-projection system has lower image quality than the front-projection system, as it exhibits Moiré artifacts on the screen.
3. The front-projection system has more difficulty representing pure blacks, and brightness or color variations between the projected images are more apparent.
4. The lenticular sheet with 15 LPI shows some visible vertical lines, which should vanish when the number of lenticules per inch is increased.
4. APPLICATIONS AND FUTURE WORK
4.1 APPLICATIONS
The 3D TV system has a number of applications. Some of them are:
1. Film and TV production: 3D technology is used in film and TV production to capture 3D information from image sequences. The potential applications fall into two classes: one requiring 3D data that can be represented as a depth map from a single viewpoint, and the other requiring a full 3D model. Two EU-funded projects are currently working on 3D data capture. The MetaVision project is considering depth-map acquisition based on a three-camera stereo system, while the ORIGAMI project is developing a multi-camera system using widely separated cameras in a studio environment.
2. Tele-conferencing: Since the delay between acquisition and display is less than one second, the system can be used for tele-conferencing if suitable multiview video compression techniques become available.
3. Medical field: 3D displays can be used in the medical field to support more effective diagnosis.
Figure 4.1. Visualization of different structures of the Visible Male knee on the 3D-LCD Philips prototype monitor.
4. Entertainment: With its resolution of 1024 × 768 pixels, the 3D TV system will give TV viewers a highly natural viewing experience, similar to looking through a window.
4.2 FUTURE WORK
Most of the key ideas for the 3D TV system presented in this paper have been known for decades, such as lenticular screens, multiprojector 3D displays, and camera arrays for acquisition. The system is the first to provide enough viewpoints and enough pixels per viewpoint to produce an immersive and convincing 3D experience without special glasses. It is also the first system that provides this experience in real-time for dynamic scenes.
There is still much that we can do to improve the quality of the 3D display. As noted before, the rear-projection system exhibits Moiré artifacts that can only be corrected by very precise vertical alignment of the lenticular sheets. The type of screen material (diffuse or retro-reflective) has a huge influence on the quality and sharpness of both rear- and front-projection screens. Experiments will be done with different lenticular sheets to improve the FOV and sharpness of the display. We also plan to improve the optical characteristics of the 3D display computationally, a concept we call the computational display. First, we plan to estimate the light transport matrix (LTM) of our view-dependent display by projecting patterns and observing them with a camera array on the viewer side. Knowing the LTM of the display will then allow us to modify the projected images to improve the quality. The viewing-side cameras could also be replaced by a user who tunes the display parameters with a remote control to find the best viewing condition for the current viewpoint. Another area of future research is precise color reproduction of natural scenes on multiview displays. A further new and exciting area of research is high-dynamic range 3D TV. High-dynamic range cameras are being developed commercially and have been simulated using stereo cameras. True high-dynamic range displays have also been developed. We plan to extend these methods to multiview camera arrays and 3D displays. We also plan to experiment with multiview displays for deformable display media, such as organic LEDs. Multiview cameras and displays that dynamically change their parameters, such as position, orientation, focus, or aperture, pose new and exciting research problems for the future.
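To make the light transport matrix idea above concrete, the following Python sketch estimates such a matrix for a tiny simulated display. It is only an illustration under strong assumptions: the display is modeled as linear (camera image c = T p for projected pattern p), one projector pixel is lit at a time, and all names and values are hypothetical. The estimation method planned for the real system is not specified in the text, and a practical system would need far fewer patterns than one per pixel.

```python
def estimate_ltm(observe, num_projector_pixels):
    """Estimate the light transport matrix T column by column.

    observe(pattern) is a hypothetical callback that projects a pattern (a list
    of projector pixel intensities) and returns the camera pixel intensities.
    Lighting one projector pixel at a time makes each observation one column of T.
    """
    columns = []
    for j in range(num_projector_pixels):
        pattern = [0.0] * num_projector_pixels
        pattern[j] = 1.0
        columns.append(observe(pattern))
    num_camera_pixels = len(columns[0])
    # Transpose the list of columns into row-major form, T[i][j].
    return [[columns[j][i] for j in range(num_projector_pixels)]
            for i in range(num_camera_pixels)]


def predict_camera_image(T, pattern):
    """Predict the camera image c = T p for a projected pattern p."""
    return [sum(t * p for t, p in zip(row, pattern)) for row in T]


# Simulated display with 3 projector pixels and 2 camera pixels (made-up values).
TRUE_T = [[0.8, 0.1, 0.0],
          [0.0, 0.2, 0.7]]


def simulated_observe(pattern):
    """Stand-in for projecting a pattern and capturing it with the camera array."""
    return predict_camera_image(TRUE_T, pattern)


T_est = estimate_ltm(simulated_observe, 3)
```

Once an estimate of T is available, the predicted camera image for any candidate projection can be compared with the desired image, which is the basis for modifying the projected images.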
5. CONCLUSION
It is impossible to convey the impression of dynamic 3D TV on paper. The 3D-TV system will be an entire 3D-video chain including content creation, coding, transmission and display. All parts will be optimized with respect to the entire chain, guided by research on human 3D perception.
We discussed the specific goals for all system parts. A new 3D camera will be developed that meets the resolution and accuracy requirements of the 3D-TV application. Both real-time and off-line algorithms will be developed to convert existing 2D-video material into 3D. For transmission, we use a 2D-compatible method in which conventional images are accompanied by depth information, coded with MPEG-2/4/7 schemes. This scheme makes it possible to address a wide range of 2D and 3D displays. Finally, two autostereoscopic displays will be developed: one optimized for a single viewer, and a second for multiple viewers.
With the combination of well-established academic and industrial partners, and building upon the technological progress obtained from earlier 3D projects, we expect to achieve the goal of developing the first commercially feasible 3-dimensional television broadcast system which will provide the most natural viewing experience.