Topic: Advance Deferred setup (Read 15017 times)
November 13, 2011, 02:22:29 am
I took a look at all the possible collision methods in Q3D, but I still need some advice.

I need this collision setup for my tile-based deferred renderer. It's all about screen space and 2D collision. I must divide the screen into 5*5 = 25 cells. Then I need to test the volume of every light (in this case 100 lights with quad or triangle volumes) against each of these 2D cells. So with 100 boxes (as local light volumes) and 25 cells, I end up with a fairly large number of collision tests: 25*100 = 2500. In DirectX 10/11 this can be done entirely on the GPU with a compute shader (as in BF3), but for DX9 that is probably impossible. In the end I need the ID of each colliding light per cell and the total number of colliding lights for each cell. Just wondering what the best method is for a scenario like this.

ali-rahimi.net
November 13, 2011, 02:56:45 pm
If you transform the lights' 3D positions to 2D screen coords (multiply by the view-projection matrix) it becomes trivial. You then use x and y ranges to determine which cells (if any) the lights are in. It is exactly the same as hit-testing a GUI item, i.e.

for each cell

( (light_x >= cell_min_x) && (light_x < cell_max_x) && (light_y >= cell_min_y) && (light_y < cell_max_y) ) ? 1 : 0



Edit: Corrected inv view to view. We want to project  to cam space not de-project to 3d space!
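
A minimal sketch of the projection and hit-test described in this post, written in Python purely for illustration (the thread itself uses Quest3D channels). The function names, the column-vector matrix convention, and the top-left pixel origin are all assumptions; adapt them to whatever your engine uses.

```python
def project_to_screen(pos, view_proj, width, height):
    """Project a 3D point to pixel coordinates.

    Assumption: view_proj is a 4x4 matrix given as a list of rows and is
    applied to a column vector (M * v). Returns None for points behind
    the camera.
    """
    v = (pos[0], pos[1], pos[2], 1.0)
    cx, cy, cz, cw = (sum(row[i] * v[i] for i in range(4)) for row in view_proj)
    if cw <= 0.0:
        return None
    # Perspective divide gives normalised device coords in the -1..1 range.
    ndc_x, ndc_y = cx / cw, cy / cw
    # Map to pixels; y is flipped so that (0, 0) is the top-left corner.
    sx = (ndc_x * 0.5 + 0.5) * width
    sy = (1.0 - (ndc_y * 0.5 + 0.5)) * height
    return sx, sy


def point_in_cell(sx, sy, cell_min_x, cell_min_y, cell_max_x, cell_max_y):
    """Same range test as the expression above, for one cell."""
    return cell_min_x <= sx < cell_max_x and cell_min_y <= sy < cell_max_y
```

Note the divide by w after the matrix multiply: without it the x and y values are still in clip space and the range test will not line up with the screen cells.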
November 13, 2011, 03:28:05 pm
Thanks Tom. Actually I set up something like this a few years ago for my GUI. It should work in both Expression channels and HLSL. But I am worried about its performance, especially with Expression (CPU). Any tips on the performance issue? The problem is that each cell is a separate unit, and some lights might be present in several cells at the same time. So it's a fairly heavy calculation: each cell must test each of those 100 lights (quads) separately.

ali-rahimi.net
November 13, 2011, 03:32:22 pm
Btw, for those who might be interested in this subject:

http://bps10.idav.ucdavis.edu/talks/12-lauritzen_DeferredShading_BPS_SIGGRAPH2010.pdf

Edit: I think Lua could do a better job in this situation. Sadly my Lua scripting skills are not that good ;)

ali-rahimi.net
November 14, 2011, 10:43:43 am
It seems to me that offloading to the GPU is better for performance than trying to run expression channels. The tiled technique does remind me of light indexing. By constraining the indexes over a tile you might get some speed-up, because of the "tile" based approach of GPUs.

Tom's approach sounds good. Taking actual collision into account is probably slower. Note that the whole point is to run custom shaders for multiple lights at once; without that, there is no gain.

Now to remove the expressions you might consider using a render target with reduced size and render the light area to it. Then it really comes down to light indexing into tiles. For each tile marked you then do the actual lighting for 8x8 or 16x16 pixels. The parallel nature of the GPU is better used by forcing such groups of pixels to be rendered the same way.
November 14, 2011, 04:05:07 pm
I took a look at this Light Indexed Deferred Lighting paper. It sounds very interesting. I need to think more about it.

* LightIndexedDeferredLighting1.1.pdf (297.67 KB - downloaded 448 times.)

ali-rahimi.net
November 14, 2011, 05:34:31 pm
These are interesting ideas :)

I thought I'd add that the pseudocode I posted was for determining the 2d tile that the light centre is in. If you want to use the light volume you can either (for spotlights) transform the corners of the light bounding box and check all of them, or (for pointlights) check whether the light centre +- screen-space radius is within each tile.
November 14, 2011, 06:40:27 pm
Yes, I need the light volume, not just its centre. But don't you think it might be better to set up such a huge calculation in Lua?

I read this Indexed Deferred Lighting paper several times. It's very different from my current pipeline. My pipeline is full deferred.
Here is a list of games which use full deferred and pre-pass. So as you can see, using full deferred is not that bad ;)
http://en.wikipedia.org/wiki/Deferred_shading#Deferred_lighting_in_commercial_games

ali-rahimi.net
November 14, 2011, 09:34:36 pm
It is less demanding than my system that calculates frustum culling. When you evaluate the costs you have to consider the benefits. If it costs 1ms but saves 10ms then it is a win; if it costs 10ms and saves 1ms it is a loss ;) Of course frustum culling gives a huge speed-up so it is well worth it, but your light culling and tiled render might be more marginal. You have to try it to see :)

I wouldn't do it in Lua, as you would have to write the matrix transformation functions in Lua. I'd do it the quick and easy way in channels, and test if it gives a speed-up. Then you can consider if you want to optimize it.

Using the method I immediately thought of, for 150 point lights you will need to do 150 matrix multiplies to get the centre vectors. Maybe a simple formula can give you the screen-space radius of a point light depending on the distance of the light from the camera. You can then discard lights that are not in the view frustum (coords outside the -1..1 range), then divide the x and y positions of the extents of the light circle by the x and y cell dimensions and round the results down to give you the x and y cell IDs. That is not much for a modern CPU.
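
The divide-and-round-down step can be sketched like this in Python (illustration only; `cell_range` and its parameters are made-up names, and the light circle is assumed to be already in pixel coordinates). Rounding both extents gives a range of cells instead of a single cell ID, so one light can cover several cells without testing every cell.

```python
import math


def cell_range(center_x, center_y, radius, cell_w, cell_h, cols, rows):
    """Return (col_min, col_max, row_min, row_max), inclusive, for the cells
    touched by the light's screen-space extents, or None if fully off-screen.
    """
    col_min = math.floor((center_x - radius) / cell_w)
    col_max = math.floor((center_x + radius) / cell_w)
    row_min = math.floor((center_y - radius) / cell_h)
    row_max = math.floor((center_y + radius) / cell_h)
    # Cull on the extents rather than the centre, so a light whose centre
    # is off-screen but whose circle still reaches the screen is kept.
    if col_max < 0 or row_max < 0 or col_min >= cols or row_min >= rows:
        return None
    # Clamp to the grid for partially visible lights.
    return (max(col_min, 0), min(col_max, cols - 1),
            max(row_min, 0), min(row_max, rows - 1))
```

For example, with 64-pixel cells a light at (100, 100) with a 30-pixel radius spans x and y from 70 to 130, which lands in columns 1 to 2 and rows 1 to 2.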
November 15, 2011, 12:09:44 pm
Calculating the span of the lights is just the beginning. After that you'll have to mark them in the 2D array. Then based on the marked lights you have to select the shader and shader inputs for each tile. This marking in a 2D array seems very familiar. It's almost like rendering ;)

I can really imagine that a tile based light indexing approach is a good one. First of all because your light index texture can be a lot smaller than the viewport size. This saves space and speeds up the marking and selecting process. Secondly it improves the pixel batch size. If you render a single pixel with some shader you will actually run the shader at least 4 times for a 2x2 block. The true power of a GPU is of course its parallel processing, so running at a larger block will probably not be much slower.

So, let's take 8x8 tiles. You first perform the light indexing/marking process on a target that is 8 times smaller than the viewport in each dimension. Then you can run fullscreen passes for 4 lights, 3 lights, 2 lights and 1 light. The 4 light pass marks the stencil buffer, so passes with lower light counts can skip the filled in parts quickly. If fewer than 4 lights are influencing the tile, you simply discard the pixel high up in your code. Because discarding and rendering is performed in tiles, the GPU should perform very well on this.

Since the light index texture is relatively small, you can add a second one for 8 lights. (Normally an R8G8B8A8 texture is used for up to 4 light influences out of 255 possible lights.) I really like this idea.
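
To make the "4 influences out of 255 lights" layout concrete, here is a small Python sketch of one texel of such a light index texture. It assumes that the value 0 in a channel means "no light", which is why only 255 IDs are usable per 8-bit channel; the function names are hypothetical.

```python
def pack_light_indices(indices):
    """Pack up to four light IDs (1..255) into one (R, G, B, A) texel.

    Assumption: 0 is reserved to mean "no light in this channel".
    """
    if len(indices) > 4:
        raise ValueError("an RGBA8 texel holds at most 4 light indices")
    for index in indices:
        if not 1 <= index <= 255:
            raise ValueError("light IDs must be in the range 1..255")
    padded = list(indices) + [0] * (4 - len(indices))
    return tuple(padded)


def unpack_light_indices(texel):
    """Return the light IDs stored in a texel, skipping empty channels."""
    return [index for index in texel if index != 0]
```

A second texture of the same format would hold indices 5 to 8 for tiles with more than four lights.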

The light effect is often faded out at larger distances, especially if the light only influences specular reflections and the diffuse light is already baked. This means that once a light region becomes smaller than a tile, it will probably be discarded anyway. So you're not applying it to an 8x8 tile while it only covers a few pixels.
November 15, 2011, 08:10:35 pm
I set up a simple example. Only Light 01 collides with Cell 07. But I have several issues.

1. I can not transform 3d light correctly to camera space.
2. Tom's rounding solution is the key, but I have no idea how to set it up for now.

I hope to get a good result in the end with your help.


* tiled deferred.jpg (102.69 KB, 1651x1015 - viewed 382 times.)
* Tiled Deferred 01.rar (277.8 KB - downloaded 188 times.)

ali-rahimi.net
November 16, 2011, 10:50:45 am
Quote
I can not transform 3d light correctly to camera space.

Here is an example, the red square shows the 2d screen pos of the centre of the sphere.

* Screen Coords.cgr (676.16 KB - downloaded 199 times.)
November 16, 2011, 11:37:33 am
Thanks a lot Tom. Working on it :D I know how Round works (I set it up before for my clip plan landscape). But I think with Round we can get only one active cell ID, while the light volume is shared between several cells. Could we use a loop? What about a point light circle formula instead of a quad?

Edit: I fixed the formula for quad collision. Now it's fully functional.

((Light X Pos Center+Half Size) >= Cell Min X && (Light X Pos Center-Half Size)<=Cell Max X && (Light Y Pos Center+Half Size) >= Cell Min Y && (Light Y Pos Center-Half Size)<=Cell Max Y) ? 1 : 0
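
For reference, the quad formula above can be used to build exactly what the first post asks for: the list of light IDs per cell and the number of lights per cell. This is a brute-force Python sketch for illustration only (names such as `bin_lights` are made up, and lights are assumed to be given as screen-space centre x, centre y and half size in pixels).

```python
def quad_overlaps_cell(cx, cy, half, cell_min_x, cell_min_y, cell_max_x, cell_max_y):
    """Same test as the expression above: the light quad against one cell."""
    return (cx + half >= cell_min_x and cx - half <= cell_max_x and
            cy + half >= cell_min_y and cy - half <= cell_max_y)


def bin_lights(lights, screen_w, screen_h, cols, rows):
    """Return one list of light IDs per cell, in row-major cell order.

    lights is a list of (center_x, center_y, half_size) tuples; the light
    ID is simply the position in that list.
    """
    cell_w = screen_w / cols
    cell_h = screen_h / rows
    cells = [[] for _ in range(cols * rows)]
    for light_id, (cx, cy, half) in enumerate(lights):
        for row in range(rows):
            for col in range(cols):
                if quad_overlaps_cell(cx, cy, half,
                                      col * cell_w, row * cell_h,
                                      (col + 1) * cell_w, (row + 1) * cell_h):
                    cells[row * cols + col].append(light_id)
    return cells
```

The count per cell is then just `len(cells[i])`. With 25 cells and 100 lights this is the 2500 tests mentioned at the start of the thread, each of which is only four comparisons.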

ali-rahimi.net
November 16, 2011, 04:20:29 pm
Here is an example that tests 64*64 pixel tiles against 300 point lights.

Edit, fixed a bug.

* Tiled Lighting Calculation.cgr (120.33 KB - downloaded 196 times.)
November 16, 2011, 04:34:36 pm
Quote
Here is an example that tests 64*64 pixel tiles against 300 point lights. Edit, fixed a bug.

It does suffer from a quad that is too small when you get closer to the sphere. The radius of the sphere on the screen can be larger than the cross section of the sphere. See attached image.


* Sphere_to_quad_error_01.png (18.24 KB, 527x527 - viewed 857 times.)
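
A possible explanation for the too-small quad, offered as a sketch and not as a description of what the attached .cgr actually does: under perspective projection the silhouette of a sphere is set by the tangent rays from the camera, not by the cross section through the sphere's centre. For a sphere of radius r centred on the view axis at distance d, the naive projected radius is f*r/d, while the silhouette radius is f*r/sqrt(d*d - r*r), which is always larger and grows quickly as the camera gets close. The Python below compares the two; `focal` is an assumed scale factor from view space to the screen.

```python
import math


def naive_radius(r, d, focal):
    """Projected radius of the cross section through the sphere centre."""
    return focal * r / d


def silhouette_radius(r, d, focal):
    """Projected radius of the sphere's outline as seen from the camera.

    Assumes the sphere is centred on the view axis; off-axis the outline
    becomes an ellipse and this value is only an approximation.
    """
    if d <= r:
        raise ValueError("camera is inside or touching the sphere")
    return focal * r / math.sqrt(d * d - r * r)
```

With r = 3, d = 5 and focal = 1 the naive radius is 0.6 but the outline radius is 0.75, so a quad sized from the naive value clips the edge of the light. Far away (d much larger than r) the two values are nearly identical, which fits the error only showing up near the sphere.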