Performance is important for smooth gameplay and a good gaming experience. This talk goes over good programming practices and some OpenGL tricks that may allow you to increase your game’s fps and get more out of your devices.

**By Shanee Nishry, Developer Advocate, Google**

The post OpenGL [ES] Optimizations appeared first on Mali Developer Center.

It seems like every virtual reality talk or article recently has a dos and don’ts section. These can be useful to get by, but to overcome many of the platform’s challenges you need an understanding of what is actually happening under the surface. nDreams CEO Patrick O’Luanaigh and Code Manager Richard ‘Fabs’ Fabian will discuss some of the well-trodden issues of VR. From movement to frame rates, to draw calls and UI, they will outline the actual causes of these issues, how you can overcome them, and why these well-meaning recommendations aren’t always based on fact…

**Presented at the ARM Game Developer Day with TIGA by:**

**Patrick O’Luanaigh and Richard Fabian, nDreams**

The post The 'why' behind the dos and don'ts of mobile VR appeared first on Mali Developer Center.

**Date:** 14 – 18 March, 2016

**Location:** San Francisco, USA

The Game Developers Conference^{®} (GDC) is the world’s largest and longest-running professionals-only game industry event. Presented every spring in San Francisco, it is the essential forum for learning, inspiration, and networking for the creators of computer, console, handheld, smartphone, tablet, mobile, and online games.

The GDC attracts over 26,000 attendees, and is the primary forum where programmers, artists, producers, game designers, audio professionals, business decision-makers and others involved in the development of interactive games gather to exchange ideas and shape the future of the industry.

Sponsored Sessions:

Marius Bjorge | Staff Engineer, ARM

Daniele Di Donato | Senior Software Engineer, ARM

*Learn how to prepare your graphics engine for Vulkan: this session outlines Vulkan’s high-performance graphics and compute interface and its benefits for mobile platforms. The presenter will showcase the performance benefits achieved in a graphics engine using Vulkan. The talk will also explore the latest ARM® CPU and GPU features and architecture, and the features Vulkan adds to increase overall system performance and power efficiency.*

Weds. 2pm; West Hall 2000

Ivan Pedersen | Technical Artist, Geomerics

Tameem Antoniades | Co-Founder of Ninja Theory

*The current generation of games platforms has introduced beautiful, real-time rendering of environments where the player can roam freely through forests, canyons and vast, open terrain. Running such worlds at acceptable framerates has required new innovations. Dynamically streaming worlds with large draw distances and new types of geometry such as procedural terrain are examples of these. Yet combining believable global illumination with dynamic effects such as time of day remains a considerable challenge.*

*Discover how Enlighten addresses this challenge and hear first-hand how it is helping one particular studio, Ninja Theory, achieve a stunning vision for their upcoming title: Hellblade.*

Roberto Lopez Mendez | Software Graphics Engineer, ARM

Carl Callewaert | Americas Director & Global Leader of Evangelism, Unity

Patrick O’Luanaigh | CEO, nDreams

*The talk will discuss how high-quality VR graphics can be achieved on mobile devices by using Unity’s native VR support. It will identify many of the challenges surrounding developing for mobile platforms and will explore how best to overcome them.*

*The presentation will follow with a case study from nDreams showcasing the issues they faced when developing Unity-based mobile VR titles including their top rated “Perfect Beach” experience on Oculus GearVR. It will also outline how they identified the causes of their issues and how to address them.*

*Finally, the talk will cover what we might expect to see in the future of mobile VR.*

Stephen Barton | Senior Software Engineer, ARM

Stacy Smith | Demo coder, ARM

*This talk introduces you to the ARM tools and skills needed to profile and debug your application, and the optimization techniques and best practices to achieve high frame rates with high quality 3D graphics on mobile platforms.*

*The session also uses practical case studies to illustrate how the application bottlenecks were detected and the optimizations implemented to overcome the challenges.*

Discover the ecosystem that will turn your dream game into a reality on mobile at GDC booth #1624. Enjoy the ARM demos and sit for an exclusive talk in our lecture theatre from our Partners and ARM covering a range of topics that will help developers push the boundaries of mobile graphics.

The post Game Developers Conference (GDC) 2016 appeared first on Mali Developer Center.

**Date:** 10-11 February 2016

**Location:** Hollywood, US

We will be attending the Unity Vision VR/AR Summit, a unique event focused on furthering the knowledge base of anyone developing virtual and/or augmented reality content. Look out for our booth, talk to our VR experts and experience some of our latest demos!

We will also be presenting a talk at the event:

This talk will discuss how high-quality VR graphics can be achieved on mobile devices by using Unity’s native VR support, showcasing the experience of porting ARM’s *Ice Cave* demo to Samsung Gear VR. It will also cover highly optimized rendering techniques for shadows, refraction and reflections based on local cubemaps, as well as how to render stereo reflections to achieve high quality reflections in VR.

*Roberto Lopez Mendez, Senior Engineer, ARM*

*Carl Callewaert, Global Leader of Evangelism, Unity*

The post Unity Vision VR/AR Summit appeared first on Mali Developer Center.

**Date:** 22-25 February 2016

**Location:** Barcelona, Spain

We will be attending Mobile World Congress 2016, the world’s largest gathering for the mobile industry.

Visit our stand to experience our latest mobile graphics demos, including Ice Cave VR and a mobile version of Geomerics’ Subway.

**Find us in Hall 6, stand 6C10.**

The post Mobile World Congress 2016 appeared first on Mali Developer Center.

**Date:** 6-9 January 2016

**Location:** Las Vegas, US

ARM will be attending CES 2016, the global stage where next-generation innovations are introduced to the marketplace.

Come and experience our latest demos, including Ice Cave VR, and talk to some of our developers and ARM experts.

**Find us at:**

Tech East, LVCC, South Hall 2, Booth MP25246


The post CES 2016 appeared first on Mali Developer Center.

As we near the end of the year, we thought it would be good to look at the progression of mobile VR into 2016, and to see how developers (and our demo team!) have taken to it in 2015.

**Mobile VR hardware**

This year has seen the launch of possibly two of the most ‘famous’ mobile VR devices: an updated Google Cardboard and, most recently, a consumer version of Samsung’s Gear VR with support for multiple devices.

We have also seen the arrival of many alternative headsets from smaller companies. In fact, there seems to have been a steady flow of headsets from both new and old manufacturers – earlier in the year we took a look at a headset from a newly-formed company alongside one from a company with over 100 years of manufacturing experience. Read the article here.

The end of this year also saw announcements from Samsung and Huawei regarding the next generation of ARM^{®} Mali^{™} GPU, Mali-T880. Samsung unveiled the Exynos 8 Octa Application Processor, able to provide ‘life-like virtual reality experiences’; Huawei announced that the Hisilicon Kirin 950 (featuring Mali-T880) will be utilised in the upcoming Mate 8. You can find out more about these announcements on the ARM Connected Community Blog.

2015 also brought many successful VR-based Kickstarter campaigns that are expected to come to fruition in 2016, from peripherals such as the Virtuix Omni and the VRGO to all-in-one headsets like the AuraVisor, powered by the Rockchip RK3288 (ARM^{®} Cortex^{®}-A17 MP4, Mali-T760 MP4).

**Tools for developers**

The ability to create and provide VR content is fast becoming a key skill for developers. ARM and our partners are aiming to provide the best resources possible as we move into 2016.

This year our demo team ported Ice Cave, one of our most recent graphics demos, to VR, showing developers the graphical quality that can be achieved in VR with ARM Mali GPUs.

To find out more:

Read an account of how this was achieved

See how a single frame of Ice Cave VR is created

(Presented at the ARM Game Developer Day in London)

For those who would like an introduction to creating VR applications, our recently updated Mali VR SDK for Android is a good place to start. We also recommend looking at some of our sample code to get an idea of how things work in VR on ARM-based platforms.

ARM partners Epic Games and Unity now support VR in their respective game engines, Unreal Engine 4 and Unity 5. Both receive regular updates and are must-have tools for serious mobile VR development.

Watch our Unity and Unreal Engine presentations for more information on mobile optimization:

Enhancing your Unity Mobile Games

Unreal Engine on ARM CPU and GPU Architecture

Lastly, if you have ever wondered how difficult it is to create realistic fruit and vegetables in VR, our partner Starship may just answer some of your questions in their ARM Connected Community blog, ‘Rendering realistic food on mobile GPUs’.

**Into 2016**

With the impending 2016 launches of the three ‘big players’ in the PC/console-based VR world (the Oculus Rift, HTC Vive and PlayStation VR), it seems that VR could well be here to stay.

Mobile VR’s accessibility and portability means it is likely to be the first VR experience for many, and can leave a lasting impression. Perhaps, as OculusVR founder Palmer Luckey recently tweeted, ‘Mobile VR will be successful long before PC VR goes wireless’.

The post Mobile VR in 2015…and beyond appeared first on Mali Developer Center.


By downloading the packages below you acknowledge that you accept the Licence Agreement for the Mali GPU Kernel Driver.

GPLV2 LICENCE AGREEMENT FOR MALI GPUS KERNEL DEVICE DRIVERS SOURCE CODE

THE USE OF THE SOFTWARE ACCOMPANYING THIS DOCUMENT IS EXPRESSLY SUBJECT TO THE TERMS OF THE GNU GENERAL PUBLIC LICENSE VERSION 2 AS PUBLISHED BY THE FREE SOFTWARE FOUNDATION AND SET OUT BELOW FOR REFERENCE (“GPL LICENCE”). ARM IS ONLY WILLING TO DISTRIBUTE THE SOFTWARE TO YOU ON CONDITION THAT YOU ACCEPT ALL OF THE TERMS IN THE GPL LICENCE PRIOR TO MODIFYING OR DISTRIBUTING THE SOFTWARE.

Further for the period of three (3) years, ARM hereby offers to make available the source code of any part of the software program that is supplied as object code or in executable form.

GPL Licence

GNU GENERAL PUBLIC LICENSE

Version 2, June 1991

Copyright (C) 1989, 1991 Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.

Preamble

The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software–to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation’s software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Lesser General Public License instead.) You can apply it to your programs, too.

When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things.

To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it.

For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights.

We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software.

Also, for each author’s protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors’ reputations.

Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone’s free use or not licensed at all.

The precise terms and conditions for copying, distribution and modification follow.

GNU GENERAL PUBLIC LICENSE

TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION

0. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The “Program”, below, refers to any such program or work, and a “work based on the Program” means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term “modification”.) Each licensee is addressed as “you”.

Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program).

Whether that is true depends on what the Program does.

1. You may copy and distribute verbatim copies of the Program’s source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program.

You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee.

2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions:

a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change.

b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License.

c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.)

These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it.

Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program.

In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License.

3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following:

a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or,

b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or,

c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.)

The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable.

If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code.

4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance.

5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it.

6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients’ exercise of the rights granted herein.

You are not responsible for enforcing compliance by third parties to this License.

7. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program.

If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances.

It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice.

This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License.

8. If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License.

9. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns.

Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and “any later version”, you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation.

10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally.

NO WARRANTY

11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.

12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

Android Kernel Device Driver r9p0-05rel0

The post Kernel Device Driver r9p0-05rel0 appeared first on Mali Developer Center.


This sample uses OpenGL ES 3.1 and the Android extension pack to procedurally generate complex geometry in real-time with geometry shaders.

The Android extension pack adds many new features to the mobile platform. Some features have been exclusive to desktop OpenGL for a long time, such as geometry and tessellation shader stages. In this sample, we showcase the use of geometry shaders to generate meshes from procedural surfaces, in real-time.

The source for this sample can be found in the folder of the SDK.

The geometry shader is an optional programmable stage in the pipeline that allows the programmer to create new geometry on the fly, using the output of the vertex shader as input. For example, we could invoke the geometry shader with a set of N points using a single draw call,

```cpp
glDrawArrays(GL_POINTS, 0, N);
```

and output a triangle for each such point in the geometry shader,

```glsl
layout(points) in;
layout(triangle_strip, max_vertices = 3) out;

in vec4 vs_position[];

void main()
{
    vec4 position = vs_position[0];
    gl_Position = position - vec4(0.05, 0.0, 0.0, 0.0);
    EmitVertex();
    gl_Position = position + vec4(0.05, 0.0, 0.0, 0.0);
    EmitVertex();
    gl_Position = position + vec4(0.0, 0.05, 0.0, 0.0);
    EmitVertex();
    EndPrimitive();
}
```

In this demo, we use geometry shaders to output a variable number of triangles (up to 6), given a set of points in a grid as input. For a more in-depth introduction to geometry shaders, see [6].

In traditional 3D graphics, we are used to thinking about surfaces as a set of connected triangles. Modelling such surfaces involves meticulously placing points and connecting them together to make faces. For certain types of geometry this works fine, but making natural-looking, organic geometry is a very tedious process in this representation.

In this demo, we will take advantage of recent advances in GPU hardware and use a more suitable surface representation, to generate complex surfaces in real-time. Instead of describing our surface as a set of triangles, we will define our surface as the set of points where a function of three spatial coordinates evaluates to zero. In mathematical terminology, this set is known as the isosurface of the function, and the function is referred to as a potential.

It might be difficult to imagine what that would look like, but think of it like this: An elevation map can be considered as a function, which takes a point (x, y) in the plane and gives the height at that point. The set of points where the height is equal to a specific value is called an isoline, because it traces out a curve in the plane. An isosurface is the same concept, but bumped up one dimension.
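To make this concrete, here is a small Python sketch (our own illustration, not code from the sample) of a potential whose isosurface is a sphere; the sign of the function tells you which side of the surface a point is on.

```python
import math

def sphere(p, radius=1.0):
    """Potential whose zero set (isosurface) is a sphere of the given radius:
    negative inside the surface, positive outside, zero exactly on it."""
    x, y, z = p
    return math.sqrt(x * x + y * y + z * z) - radius

print(sphere((0.0, 0.0, 0.0)))  # -1.0: inside
print(sphere((1.0, 0.0, 0.0)))  #  0.0: on the isosurface
print(sphere((2.0, 0.0, 0.0)))  #  1.0: outside
```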

Isosurfaces are nothing new. They have been commonly used to visualize simulations in computational fluid dynamics, or in medical imaging to visualize regions of a particular density in a three-dimensional CT scan.

I’ve criticized triangles as being unfit for modelling organic geometry, so I better prove to you that this isosurface stuff is somehow better! To that end, let’s take a look at how you might make some simple terrain.

Let’s begin with a plane. For that we need a function which is negative below the plane and positive above it. When a point is on the plane, the function is zero.

In code this could be described as

```glsl
surface = p.y;
```

where p is the sampling point in 3D space, and surface is the function value as stored in the 3D texture. Terrain usually has some valleys, so let’s add a value that oscillates as the sampling point travels the xz-plane.

```glsl
surface = p.y;
surface += 0.1 * sin(p.x * 2.0 * pi) * sin(p.z * 2.0 * pi);
```

Getting there, but this hardly looks like something you would find in nature. A common tool in procedural modelling is noise. There are many varieties, such as simplex, Perlin or Worley noise ([8], [9], [10]), that can be used to make interesting perturbations.

```glsl
surface = p.y;
surface += 0.1 * sin(p.x * 2.0 * pi) * sin(p.z * 2.0 * pi);
surface += 0.075 * snoise(p * 4.0);
```

This is pretty good. But let’s say we want the terrain to flatten out at the top. To do this, we can intersect the surface with a plane that is facing downwards and cutting the terrain through the tops.

```glsl
surface = p.y;
surface += 0.1 * sin(p.x * 2.0 * pi) * sin(p.z * 2.0 * pi);
surface += 0.075 * snoise(p * 4.0);
surface = max(surface, -(p.y - 0.05));
```
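If you want to experiment with this build-up outside the shader, the same composition can be sketched in Python. Note that `pseudo_noise` below is a hypothetical stand-in for the shader’s `snoise`, since simplex noise is not in the standard library; any smooth, irregular function of position works for illustration.

```python
import math

def pseudo_noise(p):
    # Hypothetical stand-in for simplex noise (snoise in the shader).
    x, y, z = p
    return math.sin(12.9898 * x + 78.233 * y + 37.719 * z)

def terrain(p):
    x, y, z = p
    surface = y                                                   # ground plane
    surface += 0.1 * math.sin(x * 2.0 * math.pi) \
                   * math.sin(z * 2.0 * math.pi)                  # valleys
    surface += 0.075 * pseudo_noise((x * 4.0, y * 4.0, z * 4.0))  # perturbation
    surface = max(surface, -(y - 0.05))                           # cutting plane
    return surface
```

At the origin every oscillating term vanishes, so only the cutting-plane term contributes: `terrain((0, 0, 0))` is `0.05`.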

We encourage the reader to have a look at Inigo Quilez’s introduction to modelling with distance functions [5]. Our demo doesn’t rely on the potential function being a distance field – that is, one whose value gives the distance to the closest surface – but the article describes some mathematical tricks that still apply. Nevertheless, this short exercise should have shown how compactly we can represent the geometry, and how this representation allows for much easier experimentation than a conventional triangle description.
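The set-style tricks used above boil down to a few one-line combinators under the negative-inside sign convention. A minimal sketch (our own, following the spirit of [5]):

```python
# Combinators for potentials with the negative-inside convention:
# the result is negative (inside) where the set operation says it should be.
def union(a, b):
    return min(a, b)       # inside either shape

def intersect(a, b):
    return max(a, b)       # inside both shapes

def subtract(a, b):
    return max(a, -b)      # inside the first shape but outside the second
```

The terrain’s flattening step is exactly an intersection with a half-space: `intersect(surface, -(p.y - 0.05))` in the shader’s notation.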

Now that we have a sense of what isosurfaces are, and how we can make them, we need a way of drawing them. There are many methods for visualizing isosurfaces; some based on raytracing, some based on generating a polygonal mesh. We do not intend to give an overview of them here.

A famous method in the latter category is the marching cubes algorithm. Its popularity is perhaps best attributed to the splendid exposition by Paul Bourke [4], which does a fine job at explaining the technique, and provides a small and fast implementation as well.

In the demo, we use a different meshing technique known as surface nets. This technique appears to give similar mesh quality as marching cubes, but with considerably less geometry. Less geometry means less bandwidth, which is important to keep your mobile GPU happy.

The marching cubes technique was initially published in 1987 and is getting rather old. Not surprisingly, a lot of research has been done on the subject since then, and new techniques have emerged. Surface nets is a technique first introduced by Sarah Frisken [1] in 1999. The large time gap between the two techniques might lead you to believe that there is a similar gap in complexity. But surprisingly enough, the naive version of surface nets [7] is simple enough that you may even have thought of it yourself! Let’s first take a look at how it works in 2D.

We begin by laying a grid over the surface domain. In every cell of the grid, we sample its four corners to obtain four values of the potential function. If all values have the same sign, then the cell is either completely inside or outside the surface. But if there is a sign change between two corners, then the function must have intersected the edge between them.

If we place a single vertex in the center of each cell that contains an intersected edge, and connect neighbouring cells together, then we get a very rough approximation of the surface. The next step is to somehow smooth out the result. In the original paper, Gibson proposes an iterative relaxation scheme that minimizes a measure of the global surface “roughness”. In each pass of this routine all vertices are perturbed closer to the surface, but kept inside their original box. The latter constraint is important to preserve sharp features on the surface.

However, global optimization like that can be quite hairy to implement in real-time. A simpler approach is to compute the surface intersection point on each edge in a cell, summing them up and computing the average. The figure below shows intersection points as x’s and their average – the center of mass, if you will – as dashes.

The vertex is then perturbed to the center of mass, as shown below. This turns out to work rather well. It gives a smoother mesh, while preserving sharp features by keeping vertices inside their cells.
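The edge-intersection-and-average step for a single 2D cell can be sketched like so. This is illustrative code, not from the demo, and it assumes the surface crossing is found by linear interpolation along each sign-changing edge:

```cpp
#include <cassert>
#include <cmath>

struct vec2 { float x, y; };

// Given the potential values at the four corners of a 2D cell, find the
// linearly interpolated zero crossing on each edge that changes sign, and
// average the crossings into a "center of mass" in cell-local coordinates.
// Corner order: (0,0), (1,0), (1,1), (0,1).
vec2 cell_center_of_mass(const float v[4])
{
    static const vec2 corner[4] = {{0, 0}, {1, 0}, {1, 1}, {0, 1}};
    vec2 sum = {0, 0};
    int crossings = 0;
    for (int i = 0; i < 4; i++)
    {
        int j = (i + 1) % 4; // the edge from corner i to corner j
        if ((v[i] < 0.0f) != (v[j] < 0.0f))
        {
            // The linear interpolant crosses zero at parameter t along the edge.
            float t = v[i] / (v[i] - v[j]);
            sum.x += corner[i].x + t * (corner[j].x - corner[i].x);
            sum.y += corner[i].y + t * (corner[j].y - corner[i].y);
            crossings++;
        }
    }
    if (crossings == 0)
        return vec2{0.5f, 0.5f}; // cell not on the surface; keep the vertex centered
    return vec2{sum.x / crossings, sum.y / crossings};
}
```

For a cell with one negative corner, the two crossings sit at the edge midpoints next to it, so the vertex is pulled toward that corner while staying inside the cell.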

You can imagine how the technique extends itself to the third dimension. Each cell becomes a cube, and instead of linking neighbouring cells together by lines, they are connected by faces. The figure shows the process for a 3D sphere. On the left we show the mesh generated by linking together surface cells without the smoothing step. On the right, the vertices are perturbed towards the center of mass.

This naive implementation – as it is called [7] – lends itself very well to parallelization, as described in the next section.

Our GPU implementation works in three passes:

For each sample point in the grid (i.e. each corner of each cube): Sample the potential function, and store the result in a 3D texture.

For each cube in the grid: Fetch the function value at its 8 corners, and compute the center of mass. Store the result in another 3D texture.

For each cube in the grid that was on the surface: Construct faces by linking together neighbor cells that were on the surface too.

The first two passes are implemented in compute shaders, while the third pass uses a geometry shader to produce faces on-the-fly. It was easy to create links between surface cells for the 2D case above, but it might require a stretch of your imagination to do the same in 3D. So let’s elaborate a bit on that.

Generally, each cube on the surface can connect to its six neighbors with 12 possible triangles. A triangle is only created if both relevant neighbors are also on the surface. But considering all triangles for each cube would give redundant and overlapping triangles. To simplify, we construct faces backwards from a cube, connecting with neighbors behind, below or to the left of it. This means that we consider three edges for each cube. If an edge exhibits a sign change, the vertices associated with the four cubes that contain the edge are joined to form a quad.
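A sketch of this backwards construction follows. The helper names and the hard-coded grid size are hypothetical, not the demo's code; the idea is that each cube owns the three grid edges leaving its minimal corner, and a sign-changing edge joins the four cubes around it into one quad:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

struct Quad { std::array<uint32_t, 4> cells; };

constexpr int N = 16; // grid side length (assumed for this sketch)

// Flatten a cube coordinate into the vertex index used by the mesh.
constexpr uint32_t cell_id(int x, int y, int z)
{
    return uint32_t(z * N * N + y * N + x);
}

// The four cubes sharing the grid edge along `axis` (0 = x, 1 = y, 2 = z)
// at corner (x, y, z). Stepping backwards along the two perpendicular axes
// visits the cubes "behind, below or to the left", in winding order.
Quad quad_for_edge(int x, int y, int z, int axis)
{
    int u = (axis + 1) % 3, v = (axis + 2) % 3;
    Quad q{};
    for (int i = 0; i < 4; i++)
    {
        int p[3] = {x, y, z};
        p[u] -= (i == 1 || i == 2) ? 1 : 0; // step back along one perpendicular axis
        p[v] -= (i == 2 || i == 3) ? 1 : 0; // and along the other
        q.cells[i] = cell_id(p[0], p[1], p[2]);
    }
    return q;
}
```

Because every cube only tests its own three edges, each quad is emitted exactly once, with no redundant or overlapping triangles.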

To speed up the third pass, we only process cells that we know to be on the surface. This is done by the use of indirect draw calls and atomic counters. In the second pass, if the surface intersects the cell, we write the cell’s index to an index buffer and increment the count atomically.

```glsl
uint unique = atomicCounterIncrement(outCount);
int index = texel.z * N * N + texel.y * N + texel.x;
outIndices[unique] = uint(index);
```

Here, N is the side length of the grid. When performing the draw call for the geometry shader, we bind a buffer containing the grid of points, the index buffer containing which points to process, and an indirect draw call buffer containing the draw call parameters.

```cpp
glBindBuffer(GL_ARRAY_BUFFER, app->points_buffer);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, app->index_buffer);
glBindBuffer(GL_DRAW_INDIRECT_BUFFER, app->indirect_buffer);
glDrawElementsIndirect(GL_POINTS, GL_UNSIGNED_INT, 0);
```

The resulting mesh can look slightly bland. An easy way to add texture is to use an altitude-based color lookup. In the demo, we map certain heights to certain colors using a 2D texture lookup. The height determines the u coordinate, which leaves the v coordinate free to add some variation.

Lighting is done by approximating the gradient of the potential function in the geometry shader and normalizing, like so

```glsl
float v000 = texelFetch(inSurface, texel, 0).r;
float v100 = texelFetch(inSurface, texel + ivec3(1, 0, 0), 0).r;
float v010 = texelFetch(inSurface, texel + ivec3(0, 1, 0), 0).r;
float v001 = texelFetch(inSurface, texel + ivec3(0, 0, 1), 0).r;
n = normalize(vec3(v100 - v000, v010 - v000, v001 - v000));
```

This gives a very faceted look, as the normal is only computed per face. A smoother normal could be computed by approximating the gradient at each generated vertex, and blending between them in the fragment shader. Alternatively – but costly – you could sample the gradient in the fragment shader.

In this demo we have taken a look at how we can use potential functions to model complex geometry, and how to render such functions using mesh extraction with geometry shaders. There has been much research in the area of mesh extraction, and we have only taken a brief look at one technique.

More advanced meshing techniques, such as dual contouring [3], improve upon the shortcomings of surface nets, and allow for the use of adaptive octrees on much larger grids.

Triplanar projection mapping [2] can be used to map a texture onto all three axes of your geometry, with minimal distortion. It can add more fidelity and variation beyond simple altitude-based color lookups.

[1] Sarah F. Frisken Gibson. “Constrained Elastic Surface Nets”

[2] Ryan Geiss. “Generating Complex Procedural Terrains Using the GPU”

[3] T. Ju, F. Losasso, S. Schaefer, and J. Warren. “Dual Contouring of Hermite Data”

[4] Paul Bourke. “Polygonising a scalar field”

[5] Inigo Quilez. “Modeling with distance functions”

[6] open.gl. “Geometry shaders”

[7] Mikola Lysenko. “Smooth Voxel Terrain”

[8] Wikipedia. “Simplex noise”

[10] Wikipedia. “Worley noise”

The post Procedural modelling with geometry shaders appeared first on Mali Developer Center.

**Contents**

This sample uses OpenGL ES 3.1 and the Android extension pack to perform displacement mapping with tessellation. The sample investigates common techniques used to improve performance and visuals.

The Android extension pack brings many new features to mobile that have until now been exclusive to desktop OpenGL. New pipeline stages, such as geometry and tessellation shaders, are now available to the programmer. This sample showcases the use of tessellation on a Mali GPU.

In the sample we apply a displacement map to a coarse mesh, to generate detailed geometry on the fly. We combine this with a screen-space based LOD computation to progressively vary the level of detail, depending on how much screen area the geometry covers.

Tessellation introduces three optional pipeline stages – two of them programmable – located conveniently after the vertex shader and before the fragment shader. These stages are capable of generating additional geometry. How many triangles to generate, and where the generated geometry should be placed, is programmable in the Tessellation Control and Tessellation Evaluation shaders. The control shader runs once per output control point and the evaluation shader once per generated vertex; both can see all the vertices of a single patch.

The ability to generate additional geometry gives increased expressive power to the programmer. However, the ease of use is debatable. Realtime tessellation is notorious for requiring great care to avoid performance issues, patch gaps, mesh swimming or other visual artifacts. In this sample we will take a look at one particular usage of tessellation, as well as some tricks that can be applied to improve performance and hide artifacts. Before moving on, we briefly mention some common use cases, some of which we leave to the reader for further investigation:

Continuous level-of-detail (LOD): Geometry that covers only a handful of pixels clearly has lower requirements for detail than geometry that meets the viewer face-on. Traditionally this has been handled by dividing a mesh into several distinct meshes, each with a different level of detail. While this does the job, it is often difficult to hide the popping that occurs, when switching between discrete levels. Tessellation can be used to provide a seemingly continuous transition between detail levels.

Displacement mapping: Models that are handcrafted by artists tend to have much higher triangle counts before they are placed into a game, where they must be downgraded to meet the polygon budget. Displacement mapping stores the details of the high quality mesh as a texture that can be applied at run time to restore the fine details, using some LOD scheme as mentioned above.

Subdivision surfaces: Each polygonal mesh has a well-defined smooth surface associated with it. A refinement algorithm computes the smooth surface as the limit of a recursive process (the exact surface depends on the method used). One such method is Catmull-Clark subdivision. A popular approach by Loop and Schaefer [2] is suited for the GPU, and approximates the surface with bicubic patches.

Displaced subdivision surfaces: Lee et al. had the splendid idea of combining subdivision surfaces with displacement mapping [3]. The result is compact storage of fine-detail models, well suited for animation or rendering.

Smoothing 2D surfaces: GUI elements or text can be described by higher order geometry, such as bezier curves, to provide potentially infinite smoothness. Tessellation could be used to generate the geometry to render smooth 2D graphics.

For further reading, we refer to 10 fun things to do with tessellation [4], describing more uses such as terrain or hair rendering. For more information about the basics of tessellation, these articles ([5], [6]) provide a concise introduction.

In this sample we apply a displacement map to a spherical mesh, producing cool planetoidal-esque shapes. We begin by procedurally generating a cube mesh, where each face is predivided into a number of quadrilateral patches. A patch is a new primitive type, and defines the points that can be operated on in the tessellation shaders. These patches are further subdivided – into proper triangles – with a level of detail dictated by the control shader. As shown in the figure, we can produce a sphere from the tessellated mesh by normalizing each vertex to unit length.

**Figure 2: Each patch of the initial cube is further subdivided into triangles. Each point in the triangle is then normalized to unit length to produce a smooth sphere. Note that a uniform tessellation on the cube will be denser near the seams of the faces on the sphere.**

The displacement map is generated using the popular 3D modelling package Blender, by combining procedural textures of different types. To apply the map to the sphere, we need a mapping between vertex coordinates on the sphere and texel coordinates in the texture. Several methods for mapping a sphere exist, each with their own advantages and drawbacks. We chose a cubemap, where each side of the initial cube is mapped to one square in a texture of six.

**Figure 3: The displacement map consists of 6 squares, corresponding to each face of the cube.
Note that there are no visible seams between the faces in this figure.**

Sampling the cubemap is done by intersecting the sphere normal with the associated cube face. The mathematics for this turn out to be very simple, making cubemaps one of the more efficient mappings. Cubemap texture sampling is available as a hardware-level operation in the shaders.
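To see why the mathematics are simple, here is a sketch of the face selection behind a cubemap lookup. This is illustrative only – the face indices and uv orientations chosen here are arbitrary and do not match the exact OpenGL cubemap convention:

```cpp
#include <cassert>
#include <cmath>

struct FaceUV { int face; float u, v; };

// Select a cube face from a direction vector by its largest-magnitude
// component, then project the remaining two components onto that face.
// Face numbering here is illustrative: 0/1 = +-x, 2/3 = +-y, 4/5 = +-z.
FaceUV cubemap_lookup(float x, float y, float z)
{
    float ax = std::fabs(x), ay = std::fabs(y), az = std::fabs(z);
    FaceUV r;
    if (ax >= ay && ax >= az)      { r.face = x > 0 ? 0 : 1; r.u = y / ax; r.v = z / ax; }
    else if (ay >= ax && ay >= az) { r.face = y > 0 ? 2 : 3; r.u = x / ay; r.v = z / ay; }
    else                           { r.face = z > 0 ? 4 : 5; r.u = x / az; r.v = y / az; }
    // Remap the face coordinates from [-1, 1] to [0, 1].
    r.u = 0.5f * (r.u + 1.0f);
    r.v = 0.5f * (r.v + 1.0f);
    return r;
}
```

Only comparisons and two divisions are needed per lookup, which is why hardware can afford to do this per texture fetch.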

Care should be taken to avoid visible seams between cubemap faces. Seamless filtering is available as a texture parameter hint in OpenGL, and may avoid issues. Further improvements can be made by averaging edges in the texture beforehand.

**Figure 4: The above cubemap applied to the tessellated sphere.**

In the evaluation shader, we renormalize the generated vertices and sample the texture using the sphere normal (parallel to the vertex position in our case!). The vertex is finally displaced by an amount proportional to the color of the texel.

While the basics of the displacement mapping technique appear simple, a good result does not come along by itself. In the next sections we describe some pitfalls associated with tessellation, as well as some optimizations that can be made to improve performance and visuals.

The tessellation evaluation shader will be run for each vertex generated by the tessellator. However, many of these vertices might end up being invisible when finally rendered to the screen. It is therefore beneficial to determine whether or not we can cull a patch, before submitting it to the tessellator, where further work would be wasted.

We can cull a patch by setting its tessellation factors to zero, effectively generating no additional geometry. This is done in the control shader, by checking whether all the vertices of a patch are either offscreen or occluded by geometry in front of them. In the case of a perfect sphere, a patch is hidden if all of its normals face away from the camera in view space. That is, the z-component of each normal is negative. However, when the sphere is morphed, we may have geometry that is displaced far enough to be visible from behind the sphere. A straightforward fix is to require that the z-component is less than some handpicked bias.

Finally, we project the patch vertices to normalized device coordinates, and compare the vertices with the frustum bounds to determine if the patch is fully offscreen.
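Putting the two tests together, the culling decision might be sketched as follows. This is a hypothetical helper, not the sample's shader code; note that the frustum test is done per plane, so a patch spanning the whole screen is not falsely culled:

```cpp
#include <cassert>

struct vec3 { float x, y, z; };

// Cull a quad patch when every view-space normal points away from the camera
// (z below a hand-picked bias), or when all four projected vertices lie
// outside the same NDC frustum plane.
bool should_cull_patch(const vec3 view_normals[4], const vec3 ndc[4], float bias)
{
    bool all_backfacing = true;
    for (int i = 0; i < 4; i++)
        all_backfacing = all_backfacing && (view_normals[i].z < bias);

    bool left = true, right = true, bottom = true, top = true;
    for (int i = 0; i < 4; i++)
    {
        left   = left   && (ndc[i].x < -1.0f);
        right  = right  && (ndc[i].x >  1.0f);
        bottom = bottom && (ndc[i].y < -1.0f);
        top    = top    && (ndc[i].y >  1.0f);
    }
    return all_backfacing || left || right || bottom || top;
}
```

In the control shader, a `true` result would translate into writing zero to all `gl_TessLevelOuter` and `gl_TessLevelInner` entries.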

Patches that do not cover a large screen area need not be tessellated too much. We take advantage of this to increase performance. The method used in the sample is a naive implementation of screen-space coverage adaptive tessellation, and works as follows:

Project the patch vertices to screen space

Compute the lengths of each projected edge (the unit will be in pixels)

The tessellation level of each edge is computed as a linear function of its screen space length. An edge is maximally tessellated when its length is equal to a handpicked threshold.

The inner tessellation levels are computed as the average of the associated outer levels.
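The per-edge computation described above can be sketched as follows, with hand-picked parameters (illustrative code, not the sample's shader):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Tessellation level for one patch edge from its two projected endpoints
// (in pixels). The edge is maximally tessellated when its screen-space
// length reaches `pixels_per_max_level`; below that, the level falls off
// linearly, clamped to [1, max_level].
float edge_tess_level(float x0, float y0, float x1, float y1,
                      float pixels_per_max_level, float max_level)
{
    float dx = x1 - x0, dy = y1 - y0;
    float len = std::sqrt(dx * dx + dy * dy); // edge length in pixels
    return std::min(std::max(max_level * len / pixels_per_max_level, 1.0f), max_level);
}
```

The inner levels are then taken as the average of the four outer levels. Because the level depends only on the edge's two endpoints, two patches sharing an edge compute exactly the same value, which is what keeps the mesh crack-free.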

Care must be taken to ensure that the edge levels are computed consistently across all patches. If two neighbouring edges do not have the same tessellation level, horrible gaps or flickering may occur.

Tessellation of meshes has close ties with sampling theory. That is, the generated geometry must have a high enough sampling rate to reconstruct the geometry described by the displacement map. If the sampling rate is too low, we could attempt to reduce the frequency content of the displacement map to compensate.

In the sample code, we attempt to do this by selecting a lower quality mipmap of the texture, depending on the tessellation. Mipmaps are pre-calculated, optimized versions of the original texture, each downscaled by a factor of two from the previous level. You can think of this as halving the frequency content of the displacement geometry for each mipmap level.

A clever strategy could be to actually analyze the frequency components of the map, and select an appropriate mipmap level based on the tessellation level. In the sample we simply linearly select between some lower-bounded mipmap and the best mipmap based on camera distance.

It is possible that the displacement map simply has more detail than can be represented by the tessellated mesh. The result is aliasing, which can be painfully visible.

The following figures demonstrate the phenomenon. In both cases, the displacement map consisted of a single sharp line, dividing the map into a black and white section. In the first figure, the line was aligned with the grid, but slightly offset from the sampling center. In the second figure, the line was not aligned with the grid.

**Figure 5: The mesh suffers aliasing in one dimension due to high-frequency components in the displacement map that are insufficiently sampled by the tessellation.**

**Figure 6: The mesh suffers aliasing in two dimensions due to low sampling rate, and misalignment between the underlying sampling points and the displacement map.**

This jarring effect can be somewhat reduced by simply increasing the global tessellation factor, but that approach is not scalable. Several techniques have been developed for preventing aliasing in tessellation. Such a technique must prevent gaps between patches and mesh swimming (vertices that squirm around when the level of detail varies with camera distance). We mention some ideas:

Importance sampling: The generated tessellation points are shifted such that they align better with contours in the displacement map.

Noise and dithering: Hide the effect by adding noise to the displacement map. We employ this strategy in one of our displacement maps that initially had visible artifacts from steep hills.

**Figure 7: The effects of aliasing can be somewhat hidden by adding noise to the displacement map.**

If your mesh consists of large flat areas – such as the world’s greatest virtual concrete slab [1] – we can reduce the triangle count with no apparent loss of fidelity. Geometry adaptive tessellation does this by examining the curvature of the underlying surface, and varies the level of tessellation accordingly. A possible approach applied to subdivision surfaces is described in [8].

Hopefully, this sample has demonstrated the potential use cases of GPU accelerated tessellation, as well as the pitfalls that lie before the eager programmer. If the reader decides to go further with tessellation, it is important to consider that the Mali GPU – in its own peculiarity – does not have a dedicated tessellation unit. The performance of tessellation can be highly dependent on the underlying hardware, and it should be used with care.

[1] The Tech Report. “The world’s greatest virtual concrete slab”

[2] Loop, Schaefer. “Approximating Catmull-Clark Subdivision Surfaces with Bicubic Patches”

[3] Lee, et al.. “Displaced Subdivision Surfaces”

[4] Castaño, Ignacio. “10 Fun Things to do with Tessellation”

[5] The Little Grasshopper. “Triangle Tessellation with OpenGL 4.0”

[6] OpenGL SuperBible. “Primitive Processing in OpenGL”

[7] Rákos, Daniel. “History of hardware tessellation”

[8] GPU Gems 2. “Adaptive Tessellation of Subdivision Surfaces with Displacement Mapping”

The post Displacement mapping with tessellation appeared first on Mali Developer Center.

**Contents**

This sample will show you how to efficiently implement high quality ocean water rendering using compute shaders in OpenGL ES 3.1.

**Note**

This sample uses OpenGL ES 3.1.

This sample makes use of Tessellation shaders and RGBA16F framebuffer extensions when supported by the device.

This sample assumes good knowledge of OpenGL ES 3.1 compute shaders.

The ocean rendering technique in this sample is based on the well-known water rendering paper by J. Tessendorf [1], which implements heightmap generation by synthesizing a very large number of waves using the Fast Fourier Transform. The main principle of ocean rendering is that the ocean can be modelled very well by thinking of it as a sum of “infinite” waves at different amplitudes travelling in different directions.

Summing up a large number of sinusoids is very intensive with a naive approach, and has complexity O(M * N) when synthesizing M waves into N samples in one dimension. Extending this to two dimensions, we end up with a terrible O(M * N^2) complexity. Simpler water techniques therefore use very few waves and accumulate the sinusoids directly. For realistic looking water, we would prefer the number of waves to be the same as the number of samples.

The inverse fast Fourier transform does this excellently, using only O(2N * (N * log2(N))) complexity for a 2D NxN grid to synthesize NxN waves. For reasonable values (this sample uses N = 256), this can be done very efficiently on GPUs.

There are many excellent introductions to the Fast Fourier Transforms, so only a short introduction will be provided here. The core of the Fourier Transform is a transform which converts from the time/spatial domain to frequency domain and back. The mathematical formulation of the discrete (forward) Fourier Transform is:

X[k] = sum n from 0 to N - 1: x[n] * exp(-j * k * 2 * pi * n / N)

and inverse Fourier Transform (which we’re mostly interested in) here:

// Only real difference is the sign of j. x[n] = sum k from 0 to N - 1: X[k] * exp(j * n * 2 * pi * k / N)

j is the imaginary constant sqrt(-1), also known as “i”. Thus, the Fourier Transform is formulated with complex numbers. Taking the exponential of an imaginary number might look weird at first, but it can be shown using Taylor expansion that exp(j * x) is equivalent to

exp(j * x) = cos(x) + j * sin(x)

If we imagine that the real and imaginary numbers form a 2D plane, exp(j * x) looks very much like rotation with angle x. In fact, we can simply think of exp(j * x) as a complex oscillator.
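To make this concrete, here is a naive O(N^2) inverse DFT for reference (the sample itself uses GLFFT); it implements the sum of complex oscillators from the formula above directly, with no 1/N normalization:

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Naive inverse DFT: x[n] = sum over k of X[k] * exp(j * n * 2*pi * k / N).
// An FFT computes exactly the same result in O(N log N).
std::vector<std::complex<double>> inverse_dft(const std::vector<std::complex<double>> &X)
{
    const double pi = 3.141592653589793;
    size_t N = X.size();
    std::vector<std::complex<double>> x(N);
    for (size_t n = 0; n < N; n++)
    {
        std::complex<double> sum = 0.0;
        for (size_t k = 0; k < N; k++)
        {
            // exp(j * angle) = cos(angle) + j * sin(angle): a complex oscillator.
            double angle = 2.0 * pi * double(n) * double(k) / double(N);
            sum += X[k] * std::complex<double>(std::cos(angle), std::sin(angle));
        }
        x[n] = sum;
    }
    return x;
}
```

Feeding it a single non-zero frequency bin yields one complex sinusoid sampled across the output, which is exactly the building block the ocean heightmap sums up.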

This is the core of ocean wave synthesis. In the frequency domain, we will create waves with certain amplitudes and phases, and use the inverse Fourier transform to generate a sum of sinusoids. To move the water, we simply need to modify the phases in the frequency domain and do the inverse FFT over again.

Since the FFT assumes repeating inputs, the heightmap will be tiled with GL_REPEAT wrapping mode. As long as the heightmap is large enough, the tiling effect should not be particularly noticeable.

The GPU FFT implementation used in this sample is based on the GLFFT library [2]. The FFT implementation in GLFFT is inspired by work from E. Bainville [3] which contain more details on how FFT can be implemented efficiently on GPUs.

The ocean is a random process in that the amplitudes and phases of waves are quite random; however, their statistical distributions are greatly affected by wind and can be modelled quite well. The Tessendorf paper [1] uses the Phillips spectrum, which gives the estimated variance for waves at certain wavelengths based on wind direction and speed. Based on this formula, we generate a random two-dimensional buffer with initial phase and amplitude data for the heightmap and upload it to the GPU only at startup.

```cpp
cfloat FFTWater::phillips(vec2 k, float max_l)
{
    float k_len = vec_length(k);
    if (k_len == 0.0f)
    {
        return 0.0f;
    }
    float kL = k_len * L;
    vec2 k_dir = vec_normalize(k);
    float kw = vec_dot(k_dir, wind_dir);

    return pow(kw * kw, 1.0f) *                        // Directional
           exp(-1.0 * k_len * k_len * max_l * max_l) * // Suppress small waves at ~max_l.
           exp(-1.0f / (kL * kL)) *
           pow(k_len, -4.0f);
}

void FFTWater::generate_distribution(cfloat *distribution, vec2 size,
                                     float amplitude, float max_l)
{
    vec2 mod = vec2(2.0f * M_PI) / size;
    for (unsigned z = 0; z < Nz; z++)
    {
        for (unsigned x = 0; x < Nx; x++)
        {
            auto &v = distribution[z * Nx + x];
            vec2 k = mod * vec2(alias(x, Nx), alias(z, Nz));
            cfloat dist = cfloat(normal_dist(engine), normal_dist(engine));
            v = dist * amplitude * sqrt(0.5f * phillips(k, max_l));
        }
    }
}
```

In Fourier Transforms, we need to consider negative and positive frequencies. Before computing the spatial frequency, we alias the frequency according to Nyquist. This means that the higher values for x and z will alias to the negative frequencies. The negative frequencies represent waves travelling in the opposite direction as the positive counterparts.

```cpp
static inline int alias(int x, int N)
{
    if (x > N / 2)
        x -= N;
    return x;
}
```

Before actually doing the FFT on the GPU, we run compute shader passes which generates a new frequency domain buffer for height, normals and displacement.

```glsl
void generate_heightmap()
{
    uvec2 i = gl_GlobalInvocationID.xy;
    // Pick out the negative frequency variant.
    uvec2 wi = mix(N - i, uvec2(0u), equal(i, uvec2(0u)));

    // Pick out positive and negative travelling waves.
    vec2 a = distribution[i.y * N.x + i.x];
    vec2 b = distribution[wi.y * N.x + wi.x];

    vec2 k = uMod * alias(vec2(i), vec2(N));
    float k_len = length(k);

    const float G = 9.81;

    // If this sample runs for hours on end, the cosines of very large numbers
    // will eventually become unstable. It is fairly easy to fix this by
    // wrapping uTime, and quantizing w such that wrapping uTime does not
    // change the result. See Tessendorf's paper for how to do it.
    // The sqrt(G * k_len) factor represents how fast ocean waves at
    // different frequencies propagate.
    float w = sqrt(G * k_len) * uTime;
    float cw = cos(w);
    float sw = sin(w);

    // Complex multiply to rotate our frequency samples.
    a = cmul(a, vec2(cw, sw));
    b = cmul(b, vec2(cw, sw));
    b = vec2(b.x, -b.y); // Complex conjugate since we picked a frequency with the opposite direction.
    vec2 res = a + b;    // Sum up forward and backwards travelling waves.
    heights[i.y * N.x + i.x] = pack2(res);
}
```

While the frequency space is indeed complex, the final heightmaps we’re interested in contain real data. Since the frequency samples generated by generate_heightmap() are complex conjugated, we can use a clever two-for-one FFT scheme which can do complex-to-real FFT at close to 2x improvement in both speed and power.

For the normal map and displacement map however, we need two components, so we do a regular complex-to-complex FFT in this case.

```cpp
// Init GLFFT
fft_height = unique_ptr<FFT>(new FFT(Nx, Nz, ComplexToReal, Inverse, SSBO, ImageReal,
                                     cache, options));
fft_displacement = unique_ptr<FFT>(new FFT(Nx >> displacement_downsample,
                                           Nz >> displacement_downsample,
                                           ComplexToComplex, Inverse, SSBO, Image,
                                           cache, options));
fft_normal = unique_ptr<FFT>(new FFT(Nx, Nz, ComplexToComplex, Inverse, SSBO, Image,
                                     move(cache), options));

// Generate new FFTs
glUseProgram(prog_generate_height.get());
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, distribution_buffer.get());
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, freq_height.get());
// ...
// We only need to generate half the frequencies due to C2R transform.
glDispatchCompute(Nx / 8 + 1, Nz / 4, 1);

glUseProgram(prog_generate_displacement.get());
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, distribution_buffer_displacement.get());
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, freq_displacement.get());
// ... Since displacement is very low-frequency, compute it at a lower resolution.
glDispatchCompute((Nx >> displacement_downsample) / 4, (Nz >> displacement_downsample) / 4, 1);

glUseProgram(prog_generate_normal.get());
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, distribution_buffer_normal.get());
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, freq_normal.get());
// ...
glDispatchCompute(Nx / 4, Nz / 4, 1);

glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);

// Compute the iFFT
texture_index ^= 1;
fft_height->process(heightmap[texture_index].get(), freq_height.get());
fft_displacement->process(displacementmap[texture_index].get(), freq_displacement.get());
fft_normal->process(normalmap[texture_index].get(), freq_normal.get());
glMemoryBarrier(GL_TEXTURE_FETCH_BARRIER_BIT);
```

After generating the iFFTs, we generate heightmap normals, compute the Jacobian, and various other things, before mipmapping the textures.

*Using FP16 FFTs for bandwidth savings and performance*

Using a FP16 FFT instead of FP32 FFT works well in this sample and this saves lots of extra bandwidth and computation in the FFT implementation.

*Correctly mipmapping the heightmap*

One important detail when mipmapping the heightmap is that we cannot use a box filter. Instead, we pretend that the first texel lies at uv = (0, 0). This is necessary later when we want to render the heightmap properly. We also generate the mipmaps in compute shaders. The alternative is to compute them with fragment shaders, but on a tiled architecture such as Mali, making vertex shading (which samples the heightmap) depend on fragment work (the mip generation) often creates a bad pipeline stall.

```glsl
vec2 uv = (2.0 * vec2(gl_GlobalInvocationID.xy) + 0.5) * uInvSize;
// A typical box filter would use 1.0 offset here instead of 0.5.
mediump vec4 filtered = vec4(0.0);
filtered += 0.25 * textureLod(uNormal, uv + vec2(-D, -D) * uInvSize, float(uLod));
filtered += 0.25 * textureLod(uNormal, uv + vec2(+D, -D) * uInvSize, float(uLod));
filtered += 0.25 * textureLod(uNormal, uv + vec2(-D, +D) * uInvSize, float(uLod));
filtered += 0.25 * textureLod(uNormal, uv + vec2(+D, +D) * uInvSize, float(uLod));
imageStore(iNormal, ivec2(gl_GlobalInvocationID.xy), filtered);
```

In reality, ocean waves do not behave like pure sinusoids, but in a more “choppy” way where peaks of the waves compact a bit, to make the waves sharper.

Using only a straight heightmap this is not easy to implement; however, we can add another "displacement" map which computes displacement in the horizontal plane as well. If we compute the inverse Fourier transform of the gradient of the heightmap, we obtain a horizontal displacement vector toward which we push vertices. This gives a great choppy look to the waves. By adding choppiness, we can go from:

to:

We implement it in compute shaders by modifying generate_heightmap() slightly.

```glsl
void generate_displacement()
{
    // ...
    // Derivative of exp(j * x) is j * exp(j * x).
    vec2 grad = cmul(res, vec2(-k.y / (k_len + 0.00001), k.x / (k_len + 0.00001)));
    grads[i.y * N.x + i.x] = pack2(grad);
}
```

See [1] for mathematical details.

At wave crests, the turbulence tends to cause a more diffuse reflection, increasing overall brightness and “whiteness” of the water. By looking at the gradient of the horizontal displacement map, we can detect where the crests occur and use this to pass a turbulence factor to the fragment shader.

```glsl
// From bake_height_gradient.comp.
mediump float jacobian(mediump vec2 dDdx, mediump vec2 dDdy)
{
    return (1.0 + dDdx.x) * (1.0 + dDdy.y) - dDdx.y * dDdy.x;
}

#define LAMBDA 1.2

vec2 dDdx = 0.5 * LAMBDA * (
    textureLodOffset(uDisplacement, uv.zw, 0.0, ivec2(+1, 0)).xy -
    textureLodOffset(uDisplacement, uv.zw, 0.0, ivec2(-1, 0)).xy);
vec2 dDdy = 0.5 * LAMBDA * (
    textureLodOffset(uDisplacement, uv.zw, 0.0, ivec2(0, +1)).xy -
    textureLodOffset(uDisplacement, uv.zw, 0.0, ivec2(0, -1)).xy);
float j = jacobian(dDdx * uScale.z, dDdy * uScale.z);

imageStore(iHeightDisplacement, ivec2(gl_GlobalInvocationID.xy), vec4(h, displacement, 0.0));
imageStore(iGradJacobian, ivec2(gl_GlobalInvocationID.xy), vec4(grad, j, 0.0));
```

When the Jacobian factor is close to 0, the water is very “turbulent” and when larger than 1, the water mesh has been “stretched” out. A Jacobian of 1 is the “normal” state.
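This interpretation is easy to sanity-check on the CPU with a direct port of the jacobian() shader function above (illustrative code, not from the sample):

```cpp
#include <cassert>

struct vec2 { float x, y; };

// CPU port of the shader's jacobian(): the determinant of the Jacobian of
// the horizontal displacement. Zero displacement gradient gives J = 1
// ("normal"); compressing wave crests drive J toward 0; stretched water
// gives J > 1.
float jacobian(vec2 dDdx, vec2 dDdy)
{
    return (1.0f + dDdx.x) * (1.0f + dDdy.y) - dDdx.y * dDdy.x;
}
```

For example, a gradient that shrinks both horizontal axes by half yields J = 0.25 (turbulent crest), while one that stretches both by half yields J = 2.25.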

We combine the lower-frequency Jacobian with the normal map in the fragment shader to compute a final “turbulence” factor which creates a neat shading effect.

```glsl
// water.fs
vec3 vGradJacobian = texture(uGradJacobian, vGradNormalTex.xy).xyz;
vec2 noise_gradient = 0.30 * texture(uNormal, vGradNormalTex.zw).xy;
float jacobian = vGradJacobian.z;
float turbulence = max(2.0 - jacobian + dot(abs(noise_gradient), vec2(1.2)), 0.0);
// This is rather "arbitrary", but looks pretty good in practice.
float color_mod = 1.0 + 3.0 * smoothstep(1.2, 1.8, turbulence);
```

Rendering heightmaps (terrains) efficiently is a big topic on its own. For this sample, we implement two different approaches to rendering continuous LOD heightmaps. Both variants have a concept of patches which are adaptively subdivided based on distance to the camera.

First we place a large grid of patches in the world. The patches are roughly centered around the camera, but they do not move in lock-step with the camera; rather, they move only in units of whole patches in order to avoid the “vertex swimming” artifact.

With this scheme, we want to subdivide the patches according to their distance from the camera. The full grid is quite large, so we cannot render the entire world at full quality without taking a serious performance hit.

We also want a “continuous” LOD effect. A difficult problem is avoiding “popping” and “swimming” artifacts at the same time. Popping happens when the vertex mesh suddenly changes resolution without any kind of transition band. Vertex swimming happens if we move the heightmap (and hence the texture sampling coordinates) around without snapping it to some kind of grid.

Tessellation in OpenGL ES 3.2 [7] solves adaptive patch subdivision quite neatly.

We implement tessellation by rendering our water patches as GL_PATCHES primitives. In the tessellation control shader, we compute tessellation factors for each patch.

```glsl
float lod_factor(vec2 pos)
{
    pos *= uScale.xy;
    vec3 dist_to_cam = uCamPos - vec3(pos.x, 0.0, pos.y);
    float level = log2((length(dist_to_cam) + 0.0001) * uDistanceMod);
    return clamp(level, 0.0, uMaxTessLevel.x);
}

float tess_level(float lod)
{
    return uMaxTessLevel.y * exp2(-lod);
}

vec4 tess_level(vec4 lod)
{
    return uMaxTessLevel.y * exp2(-lod);
}
```

For the outer tessellation factors, it is vital that patches which share an edge compute exactly the same factors; otherwise, we risk cracks in the water mesh, which is never a good sign.

To make this work, we compute tessellation factors at the four corners of our patch. We then take the minimum LOD of the two corners belonging to each edge, which determines the tessellation factor for that edge.

```glsl
float l0 = lod_factor(p0 + vec2(0.0, 1.0) * uPatchSize);
float l1 = lod_factor(p0 + vec2(0.0, 0.0) * uPatchSize);
float l2 = lod_factor(p0 + vec2(1.0, 0.0) * uPatchSize);
float l3 = lod_factor(p0 + vec2(1.0, 1.0) * uPatchSize);
vec4 lods = vec4(l0, l1, l2, l3);
vPatchLods = lods;

vec4 outer_lods = min(lods.xyzw, lods.yzwx);
vec4 tess_levels = tess_level(outer_lods);
float inner_level = max(max(tess_levels.x, tess_levels.y), max(tess_levels.z, tess_levels.w));
```
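The shared-edge rule can be illustrated with a small CPU-side sketch (the names here are hypothetical, not from the sample): since both patches touching an edge see the same two corner LODs, taking the per-edge minimum yields identical outer factors on both sides.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Illustrative sketch of the shared-edge rule: each outer tessellation LOD
// is the minimum of the LODs of the two corners on that edge. Two patches
// sharing an edge share those corners, so they agree on the edge factor.
static std::array<float, 4> edge_lods(const std::array<float, 4> &corner_lods)
{
    return { std::min(corner_lods[0], corner_lods[1]),
             std::min(corner_lods[1], corner_lods[2]),
             std::min(corner_lods[2], corner_lods[3]),
             std::min(corner_lods[3], corner_lods[0]) };
}
```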

To avoid processing patches which are outside the camera frustum, we also frustum cull in the control shader and set the tessellation factors to negative values, which discards the patch outright.

In the evaluation shader, we interpolate the corner LODs calculated by the control shader to figure out which LOD we should sample the heightmap with.

```glsl
patch in vec4 vPatchLods;

mediump vec2 lod_factor(vec2 tess_coord)
{
    // Bilinear interpolation.
    mediump vec2 x = mix(vPatchLods.yx, vPatchLods.zw, tess_coord.x);
    mediump float level = mix(x.x, x.y, tess_coord.y);

    mediump float floor_level = floor(level);
    mediump float fract_level = level - floor_level;
    return vec2(floor_level, fract_level);
}
```

When sampling the heightmap, we must take extra care to avoid the swimming artifact. A simple and effective approach is to sample each mipmap between its texels, so the bilinear interpolation is smooth and has a continuous first derivative. For this reason we cannot use trilinear filtering directly, but we can easily do it ourselves. Ideally, uv = (0, 0) would land exactly on the first texel, since that would also map to the first texel of the next miplevel; however, this is not the case in OpenGL ES, so we apply the half-texel offsets independently for every level. The speed hit is negligible, since trilinear filtering generally performs two texture lookups anyway.

```glsl
mediump vec3 sample_height_displacement(vec2 uv, vec2 off, mediump vec2 lod)
{
    return mix(
        textureLod(uHeightmapDisplacement, uv + 0.5 * off, lod.x).xyz,
        textureLod(uHeightmapDisplacement, uv + 1.0 * off, lod.x + 1.0).xyz,
        lod.y);
}
```

The second thing we need to take care of is using the appropriate subdivision type. Tessellation supports equal_spacing, fractional_even_spacing and fractional_odd_spacing. We opt for fractional_even_spacing here, since it interpolates towards an even number of segments, which matches well with the textures we sample from. We want our vertex grid to line up with the texel centers of the heightmap as much as possible.

```glsl
layout(cw, quads, fractional_even_spacing) in;
```

For devices which do not support tessellation, we implement a heightmap rendering scheme inspired by tessellation, geomipmapping [4], geomorphing [5] and CDLOD [6], using only vertex shaders and instancing.

Like geomipmapping and geomorphing, we use a fixed patch size and achieve LOD by subdividing the patch with different pre-made meshes. Like CDLOD, we transition between LODs by “morphing” vertices: before switching LOD, we warp the odd vertices towards the even ones, so that the fully warped mesh equals the lower-detail mesh, which guarantees no popping artifacts. Morphing between LODs with a fixed patch size is essentially what tessellation does, so this scheme can be seen as a subset of quad-patch tessellation.

One of the main advantages of this scheme over CDLOD is its ability to use a LOD function other than pure distance, since CDLOD relies on a strict quad-tree structure based purely on distance. It is also simpler to implement, as there is no tree structure to traverse. There is also no requirement that neighboring patches differ by only one LOD level, as there is in CDLOD, so it is possible to raise the LOD where the heightmap is very detailed and reduce the vertex count for patches which are almost completely flat. This is a trivial “LOD bias” applied when processing the patches.

The main downsides of this algorithm are that more patches need to be processed individually on the CPU, since there is no natural quad-tree structure (although it is possible to accelerate this with compute and indirect drawing), and that for planetary-style terrain with massive differences in zoom (e.g. from outside the atmosphere all the way down to individual rocks), a fixed patch size is not feasible.

To allow arbitrary LODs between patches, we need to know the LOD at each of the four edges of the patch, as well as the inner LOD, before warping our vertex position. This information is stored in a per-instance uniform buffer.

```glsl
vec2 warp_position()
{
    // aLODWeights is a vertex attribute that contains either
    // (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1) or (0, 0, 0, 0).
    // It is all zero when our vertex is inside the patch, and contains a 1
    // if our vertex lies on an edge.
    // For corners, we don't really care which edge we belong to since the
    // corner vertices will never be snapped anywhere.
    // Using a dot product, this lets us "select" an appropriate LOD factor.
    // For the inner LOD, we can conditionally select this efficiently using
    // the boolean mix() operator.
    float vlod = dot(aLODWeights, patches.data[gl_InstanceID].LODs);
    vlod = mix(vlod, patches.data[gl_InstanceID].InnerLOD.x, all(equal(aLODWeights, vec4(0.0))));

    // aPosition.xy holds integer positions locally in the patch with range [0, patch_size].
    // aPosition.zw are either 0 or 1. These select in which direction we will
    // snap our vertices when warping to the lower LOD.
    // It is important that we always round towards the center of the patch,
    // since snapping to one of the edges can lead to popping artifacts.
    float floor_lod = floor(vlod);
    float fract_lod = vlod - floor_lod;
    uint ufloor_lod = uint(floor_lod);

    // Snap to grid corresponding to floor(lod) and ceil(lod).
    uvec2 mask = (uvec2(1u) << uvec2(ufloor_lod, ufloor_lod + 1u)) - 1u;
    uvec4 rounding = aPosition.zwzw * mask.xxyy; // Either round towards 0 or +inf.
    vec4 lower_upper_snapped = vec4((aPosition.xyxy + rounding) & ~mask.xxyy);

    // Then lerp between them to create a smoothly morphing mesh.
    return mix(lower_upper_snapped.xy, lower_upper_snapped.zw, fract_lod);
}
```
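The bit-mask snapping inside warp_position() can be checked in isolation with a small CPU-side port. This is an illustrative sketch; the function name is ours:

```cpp
#include <cassert>
#include <cstdint>

// CPU-side sketch of the integer snapping used in warp_position(): snap a
// local grid coordinate to the coarser grid of the given LOD, rounding
// towards 0 or towards +infinity depending on which side of the patch
// centre the vertex lies.
static uint32_t snap_to_lod(uint32_t pos, uint32_t lod, bool round_up)
{
    uint32_t mask = (1u << lod) - 1u;          // e.g. lod 2 -> mask 0b11
    uint32_t rounding = round_up ? mask : 0u;  // bias before truncation
    return (pos + rounding) & ~mask;           // clear low bits
}
```

Blending the floor(lod) and floor(lod) + 1 snaps with the fractional LOD then gives the smooth morph.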

We also want to sample our heightmap using a LOD level that closely matches the patch LOD. For this, we keep a tiny GL_R8 LOD texture with one texel per patch, from which we can linearly interpolate the LOD factors. We update this texture when processing the patches.

```glsl
mediump vec2 lod_factor(vec2 position)
{
    vec2 uv = (position + patches.data[gl_InstanceID].Offsets.zw) * uLodScaleOffset;
    mediump float level = textureLod(uLod, uv, 0.0).x * (255.0 / 32.0); // Rescale the UNORM value.
    mediump float floor_level = floor(level);
    mediump float fract_level = level - floor_level;
    return vec2(floor_level, fract_level);
}
```

Building the UBO is also quite simple. To select LODs, we look at the four neighboring patches. The edge LOD is the maximum of the two neighbors; we need the maximum because otherwise one of the patches would not have enough vertices to correctly stitch the edge together.

```cpp
for (unsigned z = 0; z < blocks_z; z++)
{
    for (unsigned x = 0; x < blocks_x; x++)
    {
        if (!patches[z * blocks_x + x].visible)
            continue;

        // Clamp to edge.
        unsigned px = x ? (x - 1) : 0;
        unsigned pz = z ? (z - 1) : 0;
        unsigned nx = min(x + 1, blocks_x - 1);
        unsigned nz = min(z + 1, blocks_z - 1);

        float left   = lod_buffer[z * blocks_x + px];
        float top    = lod_buffer[nz * blocks_x + x];
        float right  = lod_buffer[z * blocks_x + nx];
        float bottom = lod_buffer[pz * blocks_x + x];
        float center = lod_buffer[z * blocks_x + x];

        float left_lod   = max(left, center);
        float top_lod    = max(top, center);
        float right_lod  = max(right, center);
        float bottom_lod = max(bottom, center);
        int center_lod = int(center);

        auto &lod = lod_meshes[center_lod];
        unsigned ubo_offset = center_lod * blocks_x * blocks_z;

        ubo_data[ubo_offset + lod.full.instances].Offsets = vec4(
            patches[z * blocks_x + x].pos + block_offset, // Offset to world space.
            patches[z * blocks_x + x].pos);
        ubo_data[ubo_offset + lod.full.instances].LODs =
            vec4(left_lod, top_lod, right_lod, bottom_lod);
        ubo_data[ubo_offset + lod.full.instances].InnerLOD = vec4(center);
        lod.full.instances++;
    }
}
```

While in theory we could use the LOD factor generated in warp_position() to sample the heightmap, this would break the corner vertices of the patch. All vertices shared between patches must compute exactly the same vertex position, and that is not the case for the vlod factor computed in warp_position().

From here, we sample our heightmaps and so on as before, very similarly to the tessellation evaluation shader implementation.

While correctly shading ocean water is a big topic entirely on its own, this sample uses a simple approach: cubemap reflection against a pregenerated skydome. For more realistic specular shading, a Fresnel factor is added in. No actual diffuse shading is used; the only source of lighting is the skydome cubemap.

The shader samples two normal (gradient) maps: a low-resolution map generated from the heightmap, and a high-frequency gradient map which is also generated using the FFT.

[1] J. Tessendorf – Simulating Ocean Water – http://graphics.ucsd.edu/courses/rendering/2005/jdewall/tessendorf.pdf

[2] GLFFT – Fast Fourier Transform library for OpenGL – https://github.com/Themaister/GLFFT

[3] E. Bainville – OpenCL Fast Fourier Transform – http://www.bealto.com/gpu-fft.html

[4] W. H. de Boer – Fast Terrain Rendering Using Geometrical MipMapping – http://www.flipcode.com/archives/article_geomipmaps.pdf

[5] D. Wagner – Terrain Geomorphing in the Vertex Shader – https://www.ims.tuwien.ac.at/publications/tuw-138077.pdf

[6] F. Strugar – Continous Distance-Dependent Level of Detail for Rendering Heightmaps (CDLOD) – http://www.vertexasylum.com/downloads/cdlod/cdlod_latest.pdf

[7] GL_EXT_tessellation_shader (now OpenGL ES 3.2) – https://www.khronos.org/registry/gles/extensions/EXT/EXT_tessellation_shader.txt

The post Ocean Rendering with Fast Fourier Transform appeared first on Mali Developer Center.

]]>

This sample presents the GL_OVR_multiview and GL_OVR_multiview2 extensions and shows how they can be used to improve performance for virtual reality use cases.

Multiview rendering allows a single draw call to render to several layers of an array texture simultaneously. The vertex shader knows which layer it is writing to, so the rendering results can differ for each layer. This is very useful for virtual reality applications, where the same scene must be rendered from two different positions.

```cpp
bool setupFBO(int width, int height)
{
    /* Create array texture. */
    GL_CHECK(glGenTextures(1, &frameBufferTextureId));
    GL_CHECK(glBindTexture(GL_TEXTURE_2D_ARRAY, frameBufferTextureId));
    GL_CHECK(glTexParameteri(GL_TEXTURE_2D_ARRAY, GL_TEXTURE_MIN_FILTER, GL_LINEAR));
    GL_CHECK(glTexParameteri(GL_TEXTURE_2D_ARRAY, GL_TEXTURE_MAG_FILTER, GL_LINEAR));
    GL_CHECK(glTexStorage3D(GL_TEXTURE_2D_ARRAY, 1, GL_RGBA8, width, height, 2));

    /* Initialize FBO. */
    GL_CHECK(glGenFramebuffers(1, &frameBufferObjectId));

    /* Bind our framebuffer for rendering. */
    GL_CHECK(glBindFramebuffer(GL_DRAW_FRAMEBUFFER, frameBufferObjectId));

    /* Attach texture to the framebuffer. */
    GL_CHECK(glFramebufferTextureMultiviewOVR(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                                              frameBufferTextureId, 0, 0, 2));

    /* Create array depth texture. */
    GL_CHECK(glGenTextures(1, &frameBufferDepthTextureId));
    GL_CHECK(glBindTexture(GL_TEXTURE_2D_ARRAY, frameBufferDepthTextureId));
    GL_CHECK(glTexStorage3D(GL_TEXTURE_2D_ARRAY, 1, GL_DEPTH_COMPONENT24, width, height, 2));

    /* Attach depth texture to the framebuffer. */
    GL_CHECK(glFramebufferTextureMultiviewOVR(GL_DRAW_FRAMEBUFFER, GL_DEPTH_ATTACHMENT,
                                              frameBufferDepthTextureId, 0, 0, 2));

    /* Check FBO is OK. */
    GLenum result = GL_CHECK(glCheckFramebufferStatus(GL_DRAW_FRAMEBUFFER));
    if (result != GL_FRAMEBUFFER_COMPLETE)
    {
        LOGE("Framebuffer incomplete at %s:%i\n", __FILE__, __LINE__);
        /* Unbind framebuffer. */
        GL_CHECK(glBindFramebuffer(GL_DRAW_FRAMEBUFFER, 0));
        return false;
    }
    return true;
}
```

The following code shows the shaders used for multiview rendering. Only the vertex shader contains multiview-specific code: it enables the GL_OVR_multiview extension and sets the num_views variable in the layout to 2. This number must match the number of views attached to the framebuffer with glFramebufferTextureMultiviewOVR. The shader takes an array of view-projection matrices (view and projection matrices multiplied together) instead of just one, and selects the matrix to use by indexing with gl_ViewID_OVR, which gives the index of the texture layer currently being rendered to. This allows us to use different camera positions and projections for the different layers, making it possible to render from both eye positions in a VR application with one draw call. There is only one model matrix in this case, as the model does not move between layers. Since only gl_Position is affected by gl_ViewID_OVR, this shader requires only the GL_OVR_multiview extension. To also vary the normal (or other vertex outputs) based on gl_ViewID_OVR, the GL_OVR_multiview2 extension would be required.

```glsl
#version 300 es
#extension GL_OVR_multiview : enable
layout(num_views = 2) in;

in vec3 vertexPosition;
in vec3 vertexNormal;
uniform mat4 modelViewProjection[2];
uniform mat4 model;
out vec3 v_normal;

void main()
{
    gl_Position = modelViewProjection[gl_ViewID_OVR] * vec4(vertexPosition, 1.0);
    v_normal = (model * vec4(vertexNormal, 0.0f)).xyz;
}
```

```glsl
#version 300 es
precision mediump float;

in vec3 v_normal;
out vec4 f_color;

vec3 light(vec3 n, vec3 l, vec3 c)
{
    float ndotl = max(dot(n, l), 0.0);
    return ndotl * c;
}

void main()
{
    vec3 albedo = vec3(0.95, 0.84, 0.62);
    vec3 n = normalize(v_normal);
    f_color.rgb = vec3(0.0);
    f_color.rgb += light(n, normalize(vec3(1.0)), vec3(1.0));
    f_color.rgb += light(n, normalize(vec3(-1.0, -1.0, 0.0)), vec3(0.2, 0.23, 0.35));
    f_color.a = 1.0;
}
```

The program can be set up in the same way as any other program. The view-projection matrices must be set up as a matrix array uniform, as in the following code. In this example the projection matrices are identical, but for VR you would normally use a different projection matrix for each eye. The later example in this tutorial renders to more than 2 layers and uses different perspective matrices per layer, which is why the perspective matrices are kept in an array here. The camera positions are set at -1.5 and 1.5 in the x direction, both looking at the center of the scene.

```cpp
/* M_PI_2 rad = 90 degrees. */
projectionMatrix[0] = Matrix::matrixPerspective(M_PI_2, (float)width / (float)height, 0.1f, 100.0f);
projectionMatrix[1] = Matrix::matrixPerspective(M_PI_2, (float)width / (float)height, 0.1f, 100.0f);

/* Setting up view matrices for each of the eyes. */
Vec3f leftCameraPos  = {-1.5f, 0.0f,  4.0f};
Vec3f rightCameraPos = { 1.5f, 0.0f,  4.0f};
Vec3f lookAt         = { 0.0f, 0.0f, -4.0f};
Vec3f upVec          = { 0.0f, 1.0f,  0.0f};
viewMatrix[0] = Matrix::matrixCameraLookAt(leftCameraPos, lookAt, upVec);
viewMatrix[1] = Matrix::matrixCameraLookAt(rightCameraPos, lookAt, upVec);

modelViewProjectionMatrix[0] = projectionMatrix[0] * viewMatrix[0] * modelMatrix;
modelViewProjectionMatrix[1] = projectionMatrix[1] * viewMatrix[1] * modelMatrix;

multiviewModelViewProjectionLocation = GL_CHECK(glGetUniformLocation(multiviewProgram, "modelViewProjection"));
multiviewModelLocation = GL_CHECK(glGetUniformLocation(multiviewProgram, "model"));

/* Upload matrices. */
GL_CHECK(glUniformMatrix4fv(multiviewModelViewProjectionLocation, 2, GL_FALSE, modelViewProjectionMatrix[0].getAsArray()));
GL_CHECK(glUniformMatrix4fv(multiviewModelLocation, 1, GL_FALSE, modelMatrix.getAsArray()));
```

Anything rendered with this program while the multiview framebuffer object is bound will be rendered to both texture layers from different view angles without multiple draw calls. Having rendered your VR scene to a separate layer for each eye, the results now need to be drawn to the screen. This is easily done by binding the texture and rendering with it as a 2D array texture. For a VR application, two viewports can be set up, and for each viewport the relevant texture layer is rendered to the screen. This can be a simple blit of the texture, or it can apply filtering or other post-processing operations before display. As the texture is an array, the texture sampling operation needs a vec3 texture coordinate, where the last component indexes into the array. To let each draw call choose a different layer, a uniform with the layer index can be provided, as in the following fragment shader.

```glsl
#version 300 es
precision mediump float;
precision mediump int;
precision mediump sampler2DArray;

in vec2 vTexCoord;
out vec4 fragColor;
uniform sampler2DArray tex;
uniform int layerIndex;

void main()
{
    fragColor = texture(tex, vec3(vTexCoord, layerIndex));
}
```

A VR technique that can easily be achieved using multiview rendering is rendering with a higher resolution in the center of each eye’s view, and gradually lower resolution further from the center. The eye perceives more detail at the center of its view, so this can give better visual results than rendering the entire scene at one resolution.

This can be achieved with the multiview extension by rendering to more than one texture layer per eye with different fields of view, and blending the resulting layers. One texture layer is rendered using a projection matrix with a wide field of view, covering the entire scene. Another layer is rendered with a narrower field of view, so that it covers only the center of the screen, where the eye sees the highest resolution. As each layer has the same dimensions, the narrow field-of-view layer is effectively a much higher resolution version of the center of the scene. The two layers are then blended together to create an image with varying resolution.

This method can also give a performance boost, as the FBO can use half the resolution while still achieving the same dpi in the center of the screen. Even though 4 layers are rendered instead of 2, this still cuts the total number of pixels in half.

The technique also combines well with barrel distortion warping, a common virtual reality technique for making the image look correct through a lens. Barrel distortion makes objects close to the center of the viewport larger than objects at the edges. Combined with varying resolution, this gives a higher resolution for the enlarged objects and a lower resolution for the objects shrunk by the warping.
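The pixel-count claim can be checked with trivial arithmetic. This is an illustrative sketch, not part of the sample:

```cpp
#include <cassert>

// Arithmetic behind the performance claim: four layers at half the width
// and height still halve the total pixel count compared with two
// full-resolution layers.
static long total_pixels(long width, long height, long layers)
{
    return width * height * layers;
}
```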

To implement the varying resolution, we first set up a multiview framebuffer object in the same way as shown before, only with 4 layers instead of 2 as there are 2 layers per eye:

```cpp
bool setupFBO(int width, int height)
{
    /* Create array texture. */
    GL_CHECK(glGenTextures(1, &frameBufferTextureId));
    GL_CHECK(glBindTexture(GL_TEXTURE_2D_ARRAY, frameBufferTextureId));
    GL_CHECK(glTexParameteri(GL_TEXTURE_2D_ARRAY, GL_TEXTURE_MIN_FILTER, GL_LINEAR));
    GL_CHECK(glTexParameteri(GL_TEXTURE_2D_ARRAY, GL_TEXTURE_MAG_FILTER, GL_LINEAR));
    GL_CHECK(glTexStorage3D(GL_TEXTURE_2D_ARRAY, 1, GL_RGBA8, width, height, 4));

    /* Initialize FBO. */
    GL_CHECK(glGenFramebuffers(1, &frameBufferObjectId));

    /* Bind our framebuffer for rendering. */
    GL_CHECK(glBindFramebuffer(GL_DRAW_FRAMEBUFFER, frameBufferObjectId));

    /* Attach texture to the framebuffer. */
    GL_CHECK(glFramebufferTextureMultiviewOVR(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                                              frameBufferTextureId, 0, 0, 4));

    /* Create array depth texture. */
    GL_CHECK(glGenTextures(1, &frameBufferDepthTextureId));
    GL_CHECK(glBindTexture(GL_TEXTURE_2D_ARRAY, frameBufferDepthTextureId));
    GL_CHECK(glTexStorage3D(GL_TEXTURE_2D_ARRAY, 1, GL_DEPTH_COMPONENT24, width, height, 4));

    /* Attach depth texture to the framebuffer. */
    GL_CHECK(glFramebufferTextureMultiviewOVR(GL_DRAW_FRAMEBUFFER, GL_DEPTH_ATTACHMENT,
                                              frameBufferDepthTextureId, 0, 0, 4));

    /* Check FBO is OK. */
    GLenum result = GL_CHECK(glCheckFramebufferStatus(GL_DRAW_FRAMEBUFFER));
    if (result != GL_FRAMEBUFFER_COMPLETE)
    {
        LOGE("Framebuffer incomplete at %s:%i\n", __FILE__, __LINE__);
        /* Unbind framebuffer. */
        GL_CHECK(glBindFramebuffer(GL_DRAW_FRAMEBUFFER, 0));
        return false;
    }
    return true;
}
```

The shaders for rendering the scene are also the same except that the vertex shader specifies num_views to be 4 rather than 2, and takes in arrays of length 4 instead of 2 for the matrices:

```glsl
#version 300 es
#extension GL_OVR_multiview : enable
layout(num_views = 4) in;

in vec3 vertexPosition;
in vec3 vertexNormal;
uniform mat4 ModelViewProjection[4];
uniform mat4 model;
out vec3 v_normal;

void main()
{
    gl_Position = ModelViewProjection[gl_ViewID_OVR] * vec4(vertexPosition, 1.0);
    v_normal = (model * vec4(vertexNormal, 0.0)).xyz;
}
```

The following code then sets up the projection and view matrices for rendering the scene 4 times. The first two perspective matrices use a 90 degree field of view, rendering the entire scene for each eye. This will be used as the low resolution texture when creating the final image. The next two perspective matrices use a 53.13 degree field of view, as this gives a near plane that is exactly half the size of the 90 degree matrices (tan(53.13/2) * 2 == tan(90/2)). This will be used as the high resolution image. Making the high resolution near plane exactly half the size of the low resolution near plane makes interpolating between the images simpler, as the texture coordinates for the low resolution image can go from 0 to 1, while the texture coordinates for the high resolution image go from -0.5 to 1.5. The high resolution image will then only be sampled in the middle half of the screen, and its contents will match the content of the low resolution texture at the same screen coordinates, only with a higher resolution.

```cpp
/* M_PI_2 rad = 90 degrees. */
projectionMatrix[0] = Matrix::matrixPerspective(M_PI_2, (float)width / (float)height, 0.1f, 100.0f);
projectionMatrix[1] = Matrix::matrixPerspective(M_PI_2, (float)width / (float)height, 0.1f, 100.0f);
/* 0.9272952188 rad = 53.1301024 degrees. This angle gives half the size for the near plane. */
projectionMatrix[2] = Matrix::matrixPerspective(0.9272952188f, (float)width / (float)height, 0.1f, 100.0f);
projectionMatrix[3] = Matrix::matrixPerspective(0.9272952188f, (float)width / (float)height, 0.1f, 100.0f);

/* Setting up view matrices for each of the layers. */
Vec3f leftCameraPos  = {-1.5f, 0.0f,  4.0f};
Vec3f rightCameraPos = { 1.5f, 0.0f,  4.0f};
Vec3f lookAt         = { 0.0f, 0.0f, -4.0f};
Vec3f upVec          = { 0.0f, 1.0f,  0.0f};
viewMatrix[0] = Matrix::matrixCameraLookAt(leftCameraPos, lookAt, upVec);
viewMatrix[1] = Matrix::matrixCameraLookAt(rightCameraPos, lookAt, upVec);
viewMatrix[2] = Matrix::matrixCameraLookAt(leftCameraPos, lookAt, upVec);
viewMatrix[3] = Matrix::matrixCameraLookAt(rightCameraPos, lookAt, upVec);

modelViewProjectionMatrix[0] = projectionMatrix[0] * viewMatrix[0] * modelMatrix;
modelViewProjectionMatrix[1] = projectionMatrix[1] * viewMatrix[1] * modelMatrix;
modelViewProjectionMatrix[2] = projectionMatrix[2] * viewMatrix[2] * modelMatrix;
modelViewProjectionMatrix[3] = projectionMatrix[3] * viewMatrix[3] * modelMatrix;

multiviewModelViewProjectionLocation = GL_CHECK(glGetUniformLocation(multiviewProgram, "modelViewProjection"));
multiviewModelLocation = GL_CHECK(glGetUniformLocation(multiviewProgram, "model"));

/* Upload matrices. */
GL_CHECK(glUniformMatrix4fv(multiviewModelViewProjectionLocation, 4, GL_FALSE, modelViewProjectionMatrix[0].getAsArray()));
GL_CHECK(glUniformMatrix4fv(multiviewModelLocation, 1, GL_FALSE, modelMatrix.getAsArray()));
```
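The half-size near-plane relation can be checked numerically. This is a quick sketch under the standard pinhole projection model; the helper name is ours:

```cpp
#include <cassert>
#include <cmath>

// Numerical check of the near-plane relation: at ~53.13 degrees,
// tan(fov / 2) is exactly half of tan(45 degrees), so the near plane is
// half the size of the 90 degree one.
static float near_plane_half_extent(float fov_radians, float near_dist)
{
    return std::tan(0.5f * fov_radians) * near_dist;
}
```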

Every draw call using the shader and matrix setup shown above will render to all 4 layers of the framebuffer object using the different view and projection matrices. The resulting images will then be blended to the screen to create a varying resolution image for each eye using the following shaders.

```glsl
#version 300 es

in vec3 attributePosition;
in vec2 attributeLowResTexCoord;
in vec2 attributeHighResTexCoord;
out vec2 vLowResTexCoord;
out vec2 vHighResTexCoord;

void main()
{
    vLowResTexCoord = attributeLowResTexCoord;
    vHighResTexCoord = attributeHighResTexCoord;
    gl_Position = vec4(attributePosition, 1.0);
}
```

```glsl
#version 300 es
precision mediump float;
precision mediump int;
precision mediump sampler2DArray;

in vec2 vLowResTexCoord;
in vec2 vHighResTexCoord;
out vec4 fragColor;
uniform sampler2DArray tex;
uniform int layerIndex;

void main()
{
    vec4 lowResSample = texture(tex, vec3(vLowResTexCoord, layerIndex));
    vec4 highResSample = texture(tex, vec3(vHighResTexCoord, layerIndex + 2));
    // Use the squared distance to the middle of the screen for interpolating.
    vec2 distVec = vec2(0.5) - vHighResTexCoord;
    float squaredDist = dot(distVec, distVec);
    // Use only the low res texture when the distance from the center exceeds
    // 0.5 in texture coordinates (0.25 is 0.5 squared). When the distance is
    // less than 0.2 (0.04 is 0.2 squared), only the high res texture is used.
    float lerpVal = smoothstep(-0.25, -0.04, -squaredDist);
    fragColor = mix(lowResSample, highResSample, lerpVal);
}
```

A viewport for each eye is created, and for each viewport this shader program is used to draw a full screen textured quad. There are different texture coordinates for the high resolution and low resolution images, as the high resolution image should be drawn at half the size of the low resolution image and centered in the middle of the screen. This is achieved by the following texture coordinates:

```cpp
/* Textured quad low resolution texture coordinates. */
float texturedQuadLowResTexCoordinates[] = {
     0, 0,
     1, 0,
     1, 1,
     0, 0,
     1, 1,
     0, 1
};

/* Textured quad high resolution texture coordinates. */
float texturedQuadHighResTexCoordinates[] = {
    -0.5, -0.5,
     1.5, -0.5,
     1.5,  1.5,
    -0.5, -0.5,
     1.5,  1.5,
    -0.5,  1.5
};
```

The layerIndex used to select the texture array layer is set up in the same way as in the earlier example, but the shader samples both the layer at layerIndex and the one at layerIndex + 2, as the last 2 layers of the array texture contain the high resolution images. The sampled colors are then interpolated based on the distance to the middle of the screen. The shader calculates the squared distance rather than the actual distance, which removes the need for a square root operation. This works as long as the limits used in the smoothstep call are adjusted accordingly, which costs nothing extra since they are constant values.
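The blend-weight logic can be ported to C++ for illustration. This is a sketch only: smoothstep_f mirrors the GLSL smoothstep function, and the function names are ours:

```cpp
#include <algorithm>
#include <cassert>

// C++ port of the GLSL smoothstep(edge0, edge1, x) built-in.
static float smoothstep_f(float edge0, float edge1, float x)
{
    float t = std::min(std::max((x - edge0) / (edge1 - edge0), 0.0f), 1.0f);
    return t * t * (3.0f - 2.0f * t);
}

// Blend weight of the high-res layer at texture coordinate (u, v):
// smoothstep over the *squared* distance to the screen center, avoiding
// a square root by negating and adjusting the (constant) limits.
static float high_res_weight(float u, float v)
{
    float dx = 0.5f - u;
    float dy = 0.5f - v;
    float squared_dist = dx * dx + dy * dy;
    // Fully high-res inside radius 0.2 (0.04 squared),
    // fully low-res outside radius 0.5 (0.25 squared).
    return smoothstep_f(-0.25f, -0.04f, -squared_dist);
}
```

At the screen center the weight is 1 (high-res only), and at the edge of the 0.5 radius it falls to 0 (low-res only).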

After drawing a full screen quad using this shader for each eye viewport, the high and low resolution images have been blended to give higher resolution in the center of the image where the eye is focused, with the resolution gradually decreasing further away from the center, where the eye is not focusing. The image in the introduction shows the result, where 3 rotating cubes have been drawn from each eye, and the center of each eye’s viewport gets a higher resolution than the rest of the screen. The following image is a lower resolution version of the same scene, making it easier to see how the resolution increases towards the center of each eye’s viewport.

**Low resolution multiview rendering sample.**

This technique can also be used to create a movable focal point, e.g. for directing the viewer’s focus towards a certain part of the scene. To do this, the camera for the high resolution image would have to be moved around to capture different parts of the scene, and the texture coordinates used in blending adjusted accordingly.

[1] http://www.khronos.org/registry/gles/extensions/OVR/multiview.txt

[2] http://www.khronos.org/registry/gles/extensions/OVR/multiview2.txt

The post Using multiview rendering appeared first on Mali Developer Center.

]]>