A key decision for the VE demonstrator has been to base it almost entirely on Web technologies. This has two main advantages. First, we can leverage a mature technology stack (XML/HTML, CSS, DOM, JavaScript, etc.) for aspects such as storage and transmission formats (XML), scripting (JavaScript), an API for data handling (DOM), and user events (DOM Events). Second, the technology becomes immediately accessible to the millions of Web developers and users who work with it on a daily basis.
XML3D: A Basis for a Future 3D-Internet
Specifically, we added the capability to represent 3D scenes directly in XML/HTML5 by adding a minimal set of new elements, together called XML3D. XML3D defines elements for a 3D scene (“xml3d”), geometric objects (“mesh”), hierarchical grouping of objects (“group”), coordinate systems (“transform”), programmable material and light source properties (“shader”), and a few others. The entire approach is based on modern programmable graphics technology, which we support through data containers (“data”) that group named and typed arrays. Together, these arrays define the geometric primitives and provide (per-vertex) data for programmable shaders.
Exploiting the similarity between text and layout in 2D on the one hand and geometry and shaders in 3D on the other, we use CSS to assign coordinate systems, materials, and emission properties to geometry. Common Web links (href="...") are used to instantiate resources defined elsewhere in the same scene or externally. Similarly, the common HTML <img> and <video> elements are reused to define textures, and we plan to also support a recursive <html> element that would allow fully interactive HTML content to be used as textures. Since all XML3D elements are part of the DOM, Web developers can manipulate 3D scenes using JavaScript in exactly the same way as 2D Web pages. This means, for example, that techniques like AJAX for dynamically loading new content work without modification.
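To make the DOM-based manipulation described above concrete, the following is a minimal sketch in Python using the standard library's DOM implementation rather than a browser. The element names "xml3d", "mesh", "group", "transform", "shader", and "data" come from the text. Everything else is an assumption made for illustration only: the attribute names and values, the typed-array elements, the "#id" reference style, and the ids.

```python
from xml.dom import minidom

# Hypothetical scene: element names come from the text above; all attribute
# names, ids, typed-array elements, and values are assumptions for this sketch.
SCENE = """
<xml3d>
  <shader id="red"/>
  <transform id="t1" translation="0 0 0"/>
  <group id="car" shader="#red" transform="#t1">
    <mesh id="body">
      <data>
        <float3 name="position">0 0 0  1 0 0  0 1 0</float3>
        <int name="index">0 1 2</int>
      </data>
    </mesh>
  </group>
</xml3d>
"""


def find_by_id(root, wanted):
    """Return the first element whose id attribute equals `wanted`, else None."""
    for node in root.getElementsByTagName("*"):
        if node.getAttribute("id") == wanted:
            return node
    return None


def resolve(doc, reference):
    """Follow a '#id' style link to the element it names in the same document."""
    return find_by_id(doc, reference.lstrip("#"))


doc = minidom.parseString(SCENE.strip())
car = find_by_id(doc, "car")

# Follow the link from the group to its coordinate system, then move it.
transform = resolve(doc, car.getAttribute("transform"))
transform.setAttribute("translation", "5 0 0")

# Add a second mesh to the group, as a script might do after loading content.
wheel = doc.createElement("mesh")
wheel.setAttribute("id", "wheel")
car.appendChild(wheel)

mesh_ids = [m.getAttribute("id") for m in car.getElementsByTagName("mesh")]
```

The calls used here (getElementsByTagName, getAttribute, setAttribute, createElement, appendChild) are the generic DOM operations that a Web developer would use in the browser, which is the point of placing the scene in the DOM.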
XML3D has been implemented natively in Firefox and WebKit (the basis of Google Chrome, Apple Safari, and most open-source browsers), as well as in a portable version written in JavaScript that uses WebGL for rendering. XML3D will likely also be supported by Fraunhofer IGD, with their X3DOM approach layered on top, in a collaborative effort within the Spitzencluster “Software Innovation for the Digital Enterprise.” We are also in contact with the W3C regarding standardization (presentations at the German “W3C-Day” in Berlin and at the TPAC meeting in Lyon).
AnySL: Portable Shading for the Web
Programmable shaders are required for realistic material descriptions in virtual environments. Unfortunately, several quite similar but incompatible shading languages (SLs) are in use today (HLSL, GLSL, RSL, etc.). A shader is essentially a plug-in, but one that may be called tens of millions of times per second from the innermost loop of a renderer.
AnySL is a new technology developed jointly with Sebastian Hack's compiler group at Saarland University. Different SLs are compiled into a common and portable representation that can be referenced, for example, directly from an XML3D file. This code is highly abstract “subroutine threaded code” without concrete types and other details. At runtime, the renderer supplies its definitions of the types as well as implementations of common shader libraries and code implementing certain functionality required by the SL.
An embedded compiler (LLVM) is used to transform the joint code such that it best fits the renderer, instruction set, and hardware configuration in use, including possible vectorization/packetization for SIMD architectures. Since the compiler is embedded into the application, we can query it about the code to be compiled, inspect the computing environment (cache sizes, instruction sets, etc.), and use this information to tell the compiler how best to transform the input into efficient executable code.
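The separation between an abstract shader and the renderer-supplied definitions can be illustrated with a small sketch. This is not AnySL and involves no LLVM: it swaps in plain Python closures for compiled code, and the instruction format, the shader, the Renderer class, and the "packet" lifting are all hypothetical. It only shows the idea that one untyped sequence of subroutine calls can be bound to a scalar implementation or to one that processes several samples per call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# One instruction: (destination name, subroutine name, argument names).
Instr = Tuple[str, str, Tuple[str, ...]]

# A hypothetical diffuse shader, written only as calls to named subroutines.
# It says nothing about what a color or a vector concretely is.
DIFFUSE: List[Instr] = [
    ("ndotl", "dot", ("normal", "light_dir")),
    ("lit", "clamp01", ("ndotl",)),
    ("out", "scale", ("albedo", "lit")),
]


@dataclass
class Renderer:
    """A renderer contributes its own implementations of the subroutines."""
    name: str
    library: Dict[str, Callable]

    def bind(self, program: List[Instr]) -> Callable:
        """Resolve every subroutine name once and return a callable shader."""
        resolved = [(dst, self.library[op], args) for dst, op, args in program]

        def shader(**inputs):
            env = dict(inputs)
            for dst, fn, args in resolved:
                env[dst] = fn(*(env[a] for a in args))
            return env["out"]

        return shader


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def clamp01(x):
    return max(0.0, min(1.0, x))


def scale(color, s):
    return tuple(channel * s for channel in color)


def lanes(fn):
    """Lift a scalar subroutine so that it processes a packet (list) per call."""
    return lambda *packets: [fn(*lane) for lane in zip(*packets)]


SCALAR = Renderer("scalar", {"dot": dot, "clamp01": clamp01, "scale": scale})
PACKET = Renderer("packet", {n: lanes(f) for n, f in SCALAR.library.items()})

shade_one = SCALAR.bind(DIFFUSE)
shade_many = PACKET.bind(DIFFUSE)

single = shade_one(normal=(0.0, 0.0, 1.0), light_dir=(0.0, 0.0, 1.0),
                   albedo=(0.5, 0.25, 1.0))
batch = shade_many(
    normal=[(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)],
    light_dir=[(0.0, 0.0, 1.0), (0.0, 0.0, -1.0)],
    albedo=[(0.5, 0.25, 1.0), (0.5, 0.25, 1.0)],
)
```

The same DIFFUSE program runs unchanged under both bindings. In the system described above, the binding step would instead be performed by the embedded compiler, which can inline and vectorize the result rather than dispatching through a table.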
AnySL is fully integrated into XML3D and runs in the browser, allowing for editing shaders on the Web page at runtime to change materials. We have started to expand the domain of AnySL to also cover other areas including image processing, computer vision, geometry processing, and animations via the XFlow project.
XFlow: Data- and Task-Level Parallelism
New programming languages like CUDA, OpenCL, and others expose the data parallelism of novel processor architectures at a rather low level. They mostly operate on so-called kernels that get a number of data sets (mostly collections of arrays) as input and provide new data sets as output.
This fits nicely with the “data” element in XML3D that provides input data for geometry and vertex attributes for shaders. These kernels are well suited to provide “tools” for XML3D to implement functionality like geometry processing, morphing and skinning, animation, image processing, tone mapping, or other pre- and post-processing.
XFlow provides the high-level abstraction for configuring and instantiating predefined kernels and the data flow between them as part of XML3D. Having a declarative description of the entire flow-graph of kernels, XFlow can handle runtime data dependencies, scheduling, and memory management, and can optimize them globally across the entire graph.
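As an illustration of what a declarative flow-graph makes possible, the sketch below derives an execution order from the data dependencies alone. It is not XFlow's implementation or syntax: the two kernels, the graph layout, and the evaluate function are hypothetical, and the ordering uses Python's standard graphlib module.

```python
from graphlib import TopologicalSorter


# Hypothetical kernels: plain functions from input arrays to one output array.
def morph(base, target, weight):
    w = weight[0]
    return [b + w * (t - b) for b, t in zip(base, target)]


def translate(points, offset):
    return [p + offset[0] for p in points]


# Declarative graph: output name -> (kernel, names of its inputs).
# Note that "final" is listed first; the order of entries does not matter.
GRAPH = {
    "final": (translate, ("morphed", "offset")),
    "morphed": (morph, ("base", "target", "weight")),
}


def evaluate(graph, sources):
    """Run every kernel once, in an order that respects data dependencies."""
    deps = {out: set(inputs) for out, (_, inputs) in graph.items()}
    values = dict(sources)
    order = []
    for name in TopologicalSorter(deps).static_order():
        if name in graph:  # source arrays have no kernel to run
            kernel, inputs = graph[name]
            values[name] = kernel(*(values[i] for i in inputs))
            order.append(name)
    return values, order


values, order = evaluate(
    GRAPH,
    {"base": [0.0, 2.0], "target": [4.0, 2.0], "weight": [0.5], "offset": [1.0]},
)
```

Because the whole graph is known before anything runs, a scheduler of this kind could also decide which intermediate arrays to keep or which independent kernels to run in parallel; the sketch only shows the ordering step.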
In a next step, we will combine XFlow with the next version of AnySL. This will expose the internal structure of kernels within the global flow-graph and allow for optimizing code not only within a kernel but also across kernels. Knowing the requirements of individual kernels (cache usage, memory footprint and layout, etc.) will enable instructing the compiler to generate suitable global code, to find optimal operation parameters for scheduling, etc.
RTSG: A Renderer-Independent Scene Graph
XML3D defines only the external encoding and the access to the scene via the DOM in the browser. It does not define how the scene is represented and processed internally. For that purpose, we have developed RTSG. In contrast to other scene graphs, RTSG fully separates the scene representation from rendering. All renderers (there may be several, e.g. optical, acoustic, etc.) extract the parts of the scene that they are interested in and optimize their rendering independently of the scene graph.
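The separation described above can be sketched as a scene that only stores objects and notifies attached renderers, each of which keeps its own view of the properties it cares about. This is a hypothetical illustration, not RTSG's API: the class names, the property names, and the notification scheme are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Scene:
    """Holds scene objects only; it knows nothing about how they are rendered."""
    objects: dict = field(default_factory=dict)
    listeners: list = field(default_factory=list)

    def attach(self, renderer):
        """Register a renderer and replay the current scene content to it."""
        self.listeners.append(renderer)
        for name, props in self.objects.items():
            renderer.on_change(name, props)

    def update(self, name, **props):
        """Merge new properties into an object and notify every renderer."""
        merged = {**self.objects.get(name, {}), **props}
        self.objects[name] = merged
        for renderer in self.listeners:
            renderer.on_change(name, merged)


class Renderer:
    """Keeps a private view containing only the properties it is interested in."""

    def __init__(self, wanted):
        self.wanted = set(wanted)
        self.view = {}

    def on_change(self, name, props):
        relevant = {k: v for k, v in props.items() if k in self.wanted}
        if relevant:
            self.view[name] = relevant


scene = Scene()
optical = Renderer(["mesh", "material"])
acoustic = Renderer(["mesh", "absorption"])
scene.attach(optical)
scene.attach(acoustic)

scene.update("wall", mesh="wall.mesh", material="brick", absorption=0.3)
scene.update("lamp", material="glow")
```

Each renderer is free to build whatever internal structure suits it from its view (for example a spatial index), since the scene itself holds no rendering state.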
RTfact and GPU-RT: Highly Flexible Real-Time Ray Tracer
In the past, one had to resort to low-level optimization at the assembly-language level to achieve high-performance real-time ray tracing. However, this greatly impaired the software's flexibility, such that the code had to be rewritten even for moderate changes in the software or the hardware being used. On the other hand, using the flexibility available through commonly known software design approaches, like object-orientation and polymorphism, would cause massive inefficiencies in the generated code.
RTfact (Georgiev, 2008) circumvents this issue by using template meta-programming and highly optimized specialization of only a few basic primitives. This way, we can achieve performance levels that are often very close to those of hand-optimized code, while maintaining compile-time flexibility. Unfortunately, template meta-programming has a number of limitations, and it requires highly skilled programmers to handle the resulting architectural complexity.
As a result, we have started a collaboration with the compiler group at Saarland University (see AnySL above) with the medium-term goal of developing new programming models and tools based on embedded compiler support. These will allow us to write algorithms similarly to what we have done with RTfact but without the drawbacks of C++ templates.
We have also developed many new approaches for real-time ray tracing, fast spatial indexing of scenes, and other techniques that are well suited to specific hardware architectures such as CPUs and GPUs. They are discussed in more detail in RA7.
SORA: Combining Multimedia, Graphics, Simulations, and Other Modalities
Complex scene descriptions of virtual environments often contain content that requires more than simple visual rendering: there might be multimedia content, content in need of haptic or acoustic rendering, simulations that change existing content or create new content, etc. These components can depend on each other and must be scheduled correctly. Initially, we specified the multimedia processing directly within the scene graph and mapped such computations onto dedicated implementations either on the CPU or on the GPU. The Service-Oriented Rendering Architecture (SORA) generalizes this significantly: it uses a semantic, service-oriented approach to find the right (sequence of) rendering and simulation services for each type of content within a scene graph and in a given context. This allows different kinds of content to be combined flexibly in a scene.
DRONE: Distributed Rendering and Display
The ability to run virtual environments in the browser is very useful, but browsers currently provide no means for stereo rendering or for other forms of immersive interaction with the scene, such as tracking the user or input devices.
For this purpose, we have developed DRONE. DRONE is a framework for distributing rendering, compositing, and display within a network. In addition to rendering locally in the browser, this will allow us to scale rendering performance by distributing the workload across a cluster of machines. The images from each of the machines can then be sent to an arbitrary configuration of displays, while optimizing image transfer and compositing.
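The distribution and compositing steps can be illustrated with a deliberately naive sketch. It is not DRONE's scheduling or transport: the tiling scheme, the round-robin assignment, and the stand-in render_tile function (which runs locally and just labels pixels with the node name) are assumptions chosen to keep the example self-contained.

```python
def split_into_tiles(width, height, tile):
    """Cover a width x height frame with tiles of at most tile x tile pixels."""
    return [
        (x, y, min(tile, width - x), min(tile, height - y))
        for y in range(0, height, tile)
        for x in range(0, width, tile)
    ]


def assign(tiles, nodes):
    """Round-robin assignment of tiles to render nodes (a naive placeholder)."""
    plan = {node: [] for node in nodes}
    for i, t in enumerate(tiles):
        plan[nodes[i % len(nodes)]].append(t)
    return plan


def render_tile(node, tile):
    """Stand-in for a remote renderer: fills its tile with the node's label."""
    _, _, w, h = tile
    return [[node] * w for _ in range(h)]


def composite(width, height, plan):
    """Paste every rendered tile into its place in the final frame."""
    frame = [[None] * width for _ in range(height)]
    for node, tiles in plan.items():
        for (x, y, w, h) in tiles:
            pixels = render_tile(node, (x, y, w, h))
            for row in range(h):
                frame[y + row][x:x + w] = pixels[row]
    return frame


tiles = split_into_tiles(5, 4, 2)      # 3 columns x 2 rows of tiles
plan = assign(tiles, ["a", "b"])
frame = composite(5, 4, plan)
```

A real system would balance the assignment by the expected cost of each tile and would route tiles to the displays that need them; the sketch only shows that the frame can be reassembled regardless of which node produced which tile.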
Meru/Sirikata: Collaborative Virtual Environments
Once we have 3D environments in the browser, the obvious next step is to connect them with each other or with a server in the cloud. Here, we relied on the Meru/Sirikata project from Stanford University to provide the server components and the protocols. The Web-based client can then connect to the server via WebSockets to synchronize the local scene with that of the server and to propagate local updates back. While this work is still at an early stage, we see great potential, specifically in the context of our very general approach to 3D scene descriptions.
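The synchronization pattern described above (initial sync with the server, then propagation of local updates) can be sketched without any networking. The message format below is invented for illustration and is not the Meru/Sirikata protocol; the SceneReplica class is hypothetical, and the transport (WebSockets in the text) is replaced by passing JSON strings directly between two objects.

```python
import json


class SceneReplica:
    """A copy of the scene that can emit and apply JSON update messages."""

    def __init__(self):
        self.objects = {}

    def snapshot(self):
        """Full state, as a server might send when a client first connects."""
        return json.dumps({"type": "snapshot", "objects": self.objects})

    def local_update(self, obj_id, **props):
        """Change the local copy and return the message to send to the peer."""
        self.objects.setdefault(obj_id, {}).update(props)
        return json.dumps({"type": "update", "id": obj_id, "props": props})

    def apply(self, message):
        """Apply a message received from the peer to the local copy."""
        msg = json.loads(message)
        if msg["type"] == "snapshot":
            self.objects = msg["objects"]
        elif msg["type"] == "update":
            self.objects.setdefault(msg["id"], {}).update(msg["props"])


server = SceneReplica()
server.local_update("avatar1", position=[0, 0, 0], color="blue")

# A client connects and receives the current scene.
client = SceneReplica()
client.apply(server.snapshot())

# The client moves the avatar locally and propagates the change back.
server.apply(client.local_update("avatar1", position=[1, 0, 0]))
```

This sketch ignores conflicting concurrent updates, ordering, and authority over objects, all of which a real collaborative environment has to address.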
Software Platform and Integration
Using the technologies described above, we have implemented an entirely new graphics stack, comprising (from top to bottom): (i) the browser/Web development environment using JavaScript for writing interactive 3D applications; (ii) XML3D and XFlow for representing 3D scenes and data-parallel processing in the scene, with the DOM/XML3D-API as the main interface towards applications; (iii) Meru/Sirikata to synchronize 3D scenes in real time with each other; (iv) RTSG as the internal representation of a 3D scene with the DOM as a thin layer above RTSG; (v) AnySL for providing a platform-neutral and portable shading system with code transformations to achieve highly efficient shader implementations based on an embedded compiler; (vi) DRONE as a layer for scalable distribution of rendering and display across a cloud or clusters and as a key component for server-based rendering; (vii) SORA for a service-oriented approach to handling dependencies and scheduling of operations in complex setups including multimodal scene components; (viii) RTfact and GPU-RT as our flexible yet still highly efficient renderers for RTSG based on real-time ray tracing.
We intend to make at least large parts of this stack publicly available once it has reached a level of maturity at which we are confident that we can provide the necessary support.
Saarland Visualization Center
The new DFKI building was completed at the end of 2010. On its ground floor, it contains the Saarland Visualization Center, which is a multi-purpose facility serving as a research lab, a representative demo and presentation venue, and a technology transfer and visualization service provider for scientists, industry, and governments. It will be operated by DFKI in collaboration with the Cluster of Excellence and the Intel Visual Computing Institute, making use of almost the entire stack of technologies described here.