Author Topic: My first deferred setup.  (Read 21944 times)
April 25, 2011, 08:18:39 am
Hi guys, I finally set up deferred rendering, and I can't describe how cool it is. Bye-bye, forward rendering. Here is the result. Now the hard part starts (for me). I hope to get some good advice from the experts here.
1. Do you prefer a 64-bit or a 32-bit G-buffer? With 64 bits I will have one more buffer to use, but I also waste a lot of memory. It's always a hard choice. One extra buffer means a lot in deferred rendering.
Here is the ideal setup in terms of memory usage.

G-Buffer:
1) Depth                       = R32F
2) Normal.xy                   = G16R16F
3) Albedo,etc                  = A8R8G8B8
4) Material Specular, ID, etc  = A8R8G8B8

But in my current setup all buffers are set to A16B16G16R16F.
That probably wastes a lot of memory, but it gives me one more buffer to use.
Anyway, if anyone knows a better setup, please let us know.
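
To put rough numbers on that trade-off, here is the arithmetic for four render targets at an assumed resolution of 1920x1080 (the resolution is an assumption for illustration, not a figure from the setup above). The 32-bit row corresponds to the R32F / G16R16F / A8R8G8B8 / A8R8G8B8 layout, the 64-bit row to four A16B16G16R16F targets:

```latex
\begin{aligned}
\text{pixels} &= 1920 \times 1080 = 2\,073\,600 \\
\text{32-bit targets} &: 4 \times 4\ \text{B} \times 2\,073\,600 = 33\,177\,600\ \text{B} \approx 31.6\ \text{MiB} \\
\text{64-bit targets} &: 4 \times 8\ \text{B} \times 2\,073\,600 = 66\,355\,200\ \text{B} \approx 63.3\ \text{MiB}
\end{aligned}
```

So the all-A16B16G16R16F setup costs about twice the memory of the mixed 32-bit layout.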

My current problem is how to pass many lights (100 max) to the shader at runtime.
I read several posts but still can't get it done. What I'm looking for right now is an array of matrices to hold the light info, or any alternative method.

Is it possible to define it like this?

Code:
float4x4 Matrix_1 : CHANNELMATRIX0;
float4x4 Matrix_2 : CHANNELMATRIX1;
float4x4 Matrix_3 : CHANNELMATRIX2;
float4x4 Matrix_4 : CHANNELMATRIX3;
float4x4 Matrix_5 : CHANNELMATRIX4;
float4x4 Matrix_6 : CHANNELMATRIX5;
float4x4 Matrix_7 : CHANNELMATRIX6;
float4x4 Matrix_8 : CHANNELMATRIX7;
float4x4 Matrix_9 : CHANNELMATRIX8;
float4x4 Matrix_10 : CHANNELMATRIX9;

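static float4x4 Light_Input[10] =
{
    Matrix_1,
    Matrix_2,
    Matrix_3,
    Matrix_4,
    Matrix_5,
    Matrix_6,
    Matrix_7,
    Matrix_8,
    Matrix_9,
    Matrix_10,
};


// function
float3 Diffuse_Lighting()
{
    float Dot = 0;

    for (int i = 0; i < 10; i++)
    {
        // A float4x4 cannot be assigned to a float3 directly, so read one row.
        // Assumption: row 0 of each matrix carries the light position.
        float3 Light_Pos = Light_Input[i][0].xyz;
    }

    return Dot;
}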
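
One alternative to ten separate channel matrices is to declare the light data directly as shader arrays. The following is only a sketch under assumptions: the names `MAX_LIGHTS`, `Light_PosRadius` and `Light_Color` are made up for illustration, the linear falloff is an arbitrary choice, and how the host application fills the arrays is not covered here. As far as I know, pixel shader model 3.0 cannot index constant registers dynamically, so the compiler has to unroll this loop; that is why the sketch uses 8 lights rather than 100.

```hlsl
// Hypothetical names and packing; not taken from any engine API.
#define MAX_LIGHTS 8

float4 Light_PosRadius[MAX_LIGHTS]; // xyz = view-space position, w = radius
float4 Light_Color[MAX_LIGHTS];     // rgb = color; leave unused slots at zero

// Sums the diffuse contribution of every light slot for one pixel.
// View_Pos and Normal are expected in view space, Normal normalized.
float3 Diffuse_Lighting(float3 View_Pos, float3 Normal)
{
    float3 Sum = 0;

    for (int i = 0; i < MAX_LIGHTS; i++)
    {
        float3 To_Light = Light_PosRadius[i].xyz - View_Pos;
        float  Dist     = max(length(To_Light), 1e-4);          // avoid divide by zero
        float  Radius   = max(Light_PosRadius[i].w, 1e-4);

        // Simple linear falloff that reaches zero at the light radius (an assumption).
        float  Atten    = saturate(1 - Dist / Radius);
        float  Lambert  = saturate(dot(To_Light / Dist, Normal));

        Sum += Light_Color[i].rgb * Lambert * Atten;
    }

    return Sum;
}
```

Packing one light into two float4 registers instead of a full float4x4 halves the constant usage, which matters because a ps_3_0 shader only has 224 float constant registers.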

Thanks in advance.


[Attachments: deffered-01.jpg, deffered-02.jpg, deffered-03.jpg, deffered-04.jpg, deffered-05.jpg, deffered-06.jpg (screenshots, 1845x1015)]

ali-rahimi.net
April 25, 2011, 11:11:17 am
Quote
G-Buffer:
1) Depth                       = R32F
2) Normal.xy                   = G16R16F
3) Albedo,etc                  = A8R8G8B8
4) Material Specular, ID, etc  = A8R8G8B8
That is the combination I went for, but it's a tight fit. It would be great to be able to add one more A8R8G8B8 target for extra material settings. One more thing about Normal.xy: I assume you intend to store the normal in camera space, based on the much-quoted statement that you then only need two coordinates. That's incorrect. In camera space the normal can still have a negative z if you use a perspective camera.
Quote
My current problem is how to pass many lights (100 max) to the shader at runtime.
I read several posts but still can't get it done. What I'm looking for right now is an array of matrices to hold the light info, or any alternative method.
In a deferred setup a light basically becomes geometry. The fastest way to render instanced low-poly geometry (low-poly meaning fewer than 500 polygons) is hardware instancing in a nature painter.
April 25, 2011, 11:52:18 am
I prefer light pre-pass to full deferred.
It requires only 1 or 2 RTTs for the G-buffer and is more flexible when you need many different materials.

Anyway.
You can encode normals into two channels with a sphere-map transform and decode them later:
half2 encode(half3 n)
{
    half p = sqrt(n.z*8+8);
    return half2(n.xy/p + 0.5);
}

half3 decode(half2 enc)
{
    half2 fenc = enc*4-2;
    half f = dot(fenc,fenc);
    half g = sqrt(1-f/4);
    half3 n;
    n.xy = fenc*g;
    n.z = 1-f/2;
    return n;
}
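
As a usage sketch, this is roughly how the encoder could be called when writing the normal target of the G-buffer. The struct and shader names (`GB_PS_INPUT`, `GBuffer_Normal_PS`) are hypothetical, and the normal is assumed to arrive in view space. Note that `encode` divides by `sqrt(n.z*8+8)`, so a normal with `n.z` exactly -1 has no valid encoding.

```hlsl
// Same encode() as above, repeated so the sketch stands alone.
half2 encode(half3 n)
{
    half p = sqrt(n.z * 8 + 8);   // undefined for n.z == -1
    return half2(n.xy / p + 0.5);
}

// Hypothetical input: the view-space normal interpolated from the vertex shader.
struct GB_PS_INPUT
{
    float3 ViewNormal : TEXCOORD0;
};

// Writes the encoded normal into the red and green channels,
// which is all a two-channel target such as G16R16F stores.
float4 GBuffer_Normal_PS(GB_PS_INPUT IN) : COLOR
{
    // Interpolation denormalizes the vector, so normalize before encoding.
    half3 n = normalize(IN.ViewNormal);
    return float4(encode(n), 0, 0);
}
```

In the lighting pass the two stored channels go straight into `decode` to recover the full normal.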


For lights I was using sphere primitives, but you can even use simple quads; they just need to cover the light volume in screen space. Instance them with a nature painter or use a global shader with a for loop under it.

Upgrade to PSSM is available for SSAO customers. Check http://www.3dvrm.com/shadows_solution/ for details.
April 25, 2011, 03:13:42 pm
This is a bit off-topic, but does the half type even work these days? I mean, it works, but as far as I know it's just treated as a float. Even worse, double is also treated as a float.

Anyway, the code stays the same, but I just have a habit of only using floats. Only as a loop counter I'll use an int. For the rest, boolean, half, double, I pretend I've never heard of those.
April 25, 2011, 07:41:43 pm
AFAIK under PS 3.0 everything is a float, even bools.

To use matrix arrays I ended up making my own shader channel, which contains an ID3DXEffect.
It was pretty simple to do. Then I could use SetRawValue, which is the fastest way to set constants in the shader.
April 25, 2011, 09:14:47 pm
Great. I see many experts are here. Thanks guys.

Quote
One more thing about Normal.xy: I assume you intend to store the normal in camera space, based on the much-quoted statement that you then only need two coordinates. That's incorrect. In camera space the normal can still have a negative z if you use a perspective camera.

Yes, the famous formula Z = sqrt(1 - X*X - Y*Y). I read some posts that say the same thing. But it also seems that some AAA games use it. It has some error, but it's not that noticeable. However, my mate and I failed to get this method working, so we use the encoded-normals solution instead, and it works fine. Now everything is in view space, which my mate said is better for deferred.
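
For reference, the two-channel reconstruction under discussion can be sketched like this. The function name is hypothetical. The square root always yields a non-negative z, so any normal whose true z was negative comes back mirrored, which is the error mentioned above.

```hlsl
// Naive two-channel storage: keep x and y, rebuild z from unit length.
// Assumes nxy holds the x and y of a normalized view-space normal.
float3 Reconstruct_Normal(float2 nxy)
{
    // saturate() guards against a slightly negative value caused by
    // storage precision, which would otherwise make sqrt() return NaN.
    float z = sqrt(saturate(1 - dot(nxy, nxy)));

    // The sign of z is lost here: the result always has z >= 0.
    return float3(nxy, z);
}
```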

Quote
In a deferred setup a light basically becomes geometry. The fastest way to render instanced low-poly geometry (low-poly meaning fewer than 500 polygons) is hardware instancing in a nature painter.

Ruslan sent me his deferred setup with the nature painter (thanks a lot to him for sharing). But it seems a bit slow. I don't know if that's because of the nature painter or something else.

Quote
I prefer light pre-pass to full deferred.

Yes, I've heard about it too. Maybe we'll try it later on, if we can.

Now this is my setup:
G-Buffer:
1) Depth                       = R32F
2) Normal.xy                   = G16R16F
3) Albedo,etc                  = A8R8G8B8
4) Material Specular, ID, etc  = A8R8G8B8

And the code. The original code was from Viktor, but we changed it a lot, so I hope nobody gets mad at me.

Code:
//-------------------------------------------------------------------------------
// Tranforms
//-------------------------------------------------------------------------------
float4x4 VW :View;
float4x4 WVP : WorldViewProjection;
float4x4 Projection : CHANNELMATRIX0;


float FarClip : CHANNELVALUE0;


float2 PIXEL_SIZE : CHANNELVECTOR0;
float3 SCREEN_SIZE : CHANNELVECTOR1;
float3 TL_corner_position : CHANNELVECTOR2;
float3 TR_corner_position : CHANNELVECTOR3;
float3 BR_corner_position : CHANNELVECTOR4;
float3 BL_corner_position : CHANNELVECTOR5;
//float3 camera_position : CHANNELVECTOR6;
float2 SM_SIZE : CHANNELVECTOR7;
float3 Sun_Position : CHANNELVECTOR8;



float3 Sun_Vector : CHANNELVECTOR13;
static float3 corners[4] =
{
    BL_corner_position,
    BR_corner_position,
    TL_corner_position,
    TR_corner_position,
};


float3 SphereMapDecode2( float2 enc )
{
    float4 nn = float4(enc, 0, 0)*float4(2, 2, 0, 0) + float4(-1, -1, 1, -1);
    float l = dot(nn.xyz, -nn.xyw);
    nn.z = -l;
    nn.xy *= sqrt(l);
    return nn.xyz*2 + float3(0, 0, 1);
}

float3 CalculateViewPos( float2 vCoord, float fDepth, float4x4 matInvProj )
{
    // Texture coordinate (0..1, y down) to clip space (-1..1, y up)
    float4 vPosProj = float4( vCoord.x * 2 - 1,
                              (1 - vCoord.y) * 2 - 1,
                              fDepth, 1.0f );
    float4 vPosView = mul( vPosProj, matInvProj );
    return vPosView.xyz / vPosView.w;
}

//-------------------------------------------------------------------------------
// Textures
//-------------------------------------------------------------------------------
texture MRT0_Depth : TEXTURE0;
sampler2D MRT0_Depth_Sampler = sampler_state
{
    Texture   = <MRT0_Depth>;
    MinFilter = POINT;
    MagFilter = POINT;
    AddressU  = CLAMP;
    AddressV  = CLAMP;
};

texture MRT1_Normal : TEXTURE1;
sampler2D MRT1_Normal_Sampler = sampler_state
{
    Texture   = <MRT1_Normal>;
    MinFilter = POINT;
    MagFilter = POINT;
    AddressU  = CLAMP;
    AddressV  = CLAMP;
};

texture MRT2_Albedo_Spec : TEXTURE2;
sampler2D MRT2_Albedo_Spec_Sampler = sampler_state
{
    Texture   = <MRT2_Albedo_Spec>;
    MinFilter = POINT;
    MagFilter = POINT;
    AddressU  = CLAMP;
    AddressV  = CLAMP;
};

texture MRT3_Proxy : TEXTURE3;
sampler2D MRT3_Proxy_Sampler = sampler_state
{
    Texture   = <MRT3_Proxy>;
    MinFilter = POINT;
    MagFilter = POINT;
    AddressU  = CLAMP;
    AddressV  = CLAMP;
};



//-------------------------------------------------------------------------------
// Structs
//-------------------------------------------------------------------------------
struct VS_INPUT
{
    float4 Pos        : POSITION;
    float2 UV         : TEXCOORD0;
};
struct PS_INPUT
{
    float4 Pos        : POSITION;
    float3 frustumRay : TEXCOORD0;
};

//-------------------------------------------------------------------------------
// Vertex Shader
//-------------------------------------------------------------------------------
PS_INPUT MAIN_VS(VS_INPUT IN)
{
    PS_INPUT OUT;
    OUT.Pos = mul(IN.Pos, WVP);

    // UV.x is used as the corner index (0..3) into corners[]
    int index = (int)IN.UV.x;
    OUT.frustumRay = corners[index];

    return OUT;
}

//-------------------------------------------------------------------------------
// Pixel Shader Deffered_Final_PP
//-------------------------------------------------------------------------------
float4 MAIN_PS(PS_INPUT IN, float2 screenPos : VPOS) : COLOR
{
    // Get the texture coordinate of this pixel, including the half-pixel offset
    float2 coord = screenPos * PIXEL_SIZE;
    coord += PIXEL_SIZE * 0.5;

    // Read the G-buffer targets
    float  MRT0_Depth       = tex2D(MRT0_Depth_Sampler, coord).r;
    float2 MRT1_Normal      = tex2D(MRT1_Normal_Sampler, coord).rg;
    float4 MRT2_Albedo_Spec = tex2D(MRT2_Albedo_Spec_Sampler, coord);
    float4 MRT3_Proxy       = tex2D(MRT3_Proxy_Sampler, coord);

    float Depth = MRT0_Depth / FarClip;

    // Position reconstructed along the interpolated frustum ray
    float3 WorldPos = Depth * IN.frustumRay;
    // float4 ViewPos = CalculateViewPos(coord, MRT0_Depth, InvProj);

    // Sun vector transformed by the view matrix (w = 0, so no translation)
    const float3 Sun = Sun_Position;
    float3 Light_Vector = mul(float4(Sun, 0), VW).xyz;
    Light_Vector = normalize(Light_Vector);

    float3 Normal  = SphereMapDecode2(MRT1_Normal);
    float  Lambert = max(0, dot(Light_Vector, Normal));

    // Output the Lambert term in all four channels
    return float4(Lambert, Lambert, Lambert, Lambert);
}

//-------------------------------------------------------------------------------
// Technique
//-------------------------------------------------------------------------------
technique Deffered_Final_PP
{
    pass P0
    {
        ZEnable       = FALSE;
        ZWriteEnable  = FALSE;
        StencilEnable = TRUE;
        StencilFunc   = EQUAL;
        StencilPass   = KEEP;
        StencilFail   = KEEP;
        StencilZFail  = KEEP;
        StencilRef    = 1;
        VertexShader  = compile vs_3_0 MAIN_VS();
        PixelShader   = compile ps_3_0 MAIN_PS();
    }
}



[Attachment: deffered-07.jpg (screenshot, 1685x1013)]

April 25, 2011, 09:59:16 pm
Quote
AFAIK under PS 3.0 everything is a float, even bools.

To use matrix arrays I ended up making my own shader channel, which contains an ID3DXEffect.
It was pretty simple to do. Then I could use SetRawValue, which is the fastest way to set constants in the shader.
More modern hardware does have native int support for loop counters, as far as I know. The rest is indeed emulated. I usually emulate booleans in a float myself. The real problem lies with the double type. It does not get emulated; it simply loses bits and becomes a float. The latest generation brags about double support, but it's kind of sad that you can only use it through a prefix on your variable name (which of course differs between ATI and Nvidia).
Quote
Great. I see many experts are here. Thanks guys.

Yes, the famous formula Z = sqrt(1 - X*X - Y*Y). I read some posts that say the same thing. But it also seems that some AAA games use it. It has some error, but it's not that noticeable. However, my mate and I failed to get this method working, so we use the encoded-normals solution instead, and it works fine. Now everything is in view space, which my mate said is better for deferred.
I found the error too big. That of course also depends on your shading system. In our case it caused the ground to change when looking up and down.

For inputs that are reused I use globals. And I just use CPU side code to do calculations, since they are faster than an expression channel and reduce the amount of inputs. So, for example:
Code:
float2 SCREEN_SIZE : GLOBAL_SCREEN_SIZE;

static float2 PIXEL_SIZE = 1.f / SCREEN_SIZE;
April 26, 2011, 07:31:12 am
Quote
And I just use CPU side code to do calculations, since they are faster than an expression channel and reduce the amount of inputs.
CPU-side code? Where is it? Is it something different from an expression channel?

April 26, 2011, 08:40:34 am
CPU side code is basically code in the HLSL that is outside the pixel or vertex shaders. It will be executed on the CPU once the shader is used. Like the second line I placed above to derive the pixel size from the screen size. The main thing to do is to define the variables as static.
April 26, 2011, 09:14:06 am
Oh, I see: static float2 PIXEL_SIZE. So you prefer that to an expression channel? But I thought the expression channel is also CPU-based. Isn't it?
Another question is about the nature painter method. When I add 40 lights, does that mean the shader loops 40 times? If so, I don't think it's a good solution. Am I right?

April 26, 2011, 10:10:30 am
The expression channel is of course also CPU-based, but I guess it runs through an interpreter, while the other code is compiled. It's similar to the difference between Java and JavaScript. (At least as it was a few years ago; these days JavaScript gets compiled too.)

If you nature paint, you don't need to build the loop yourself. Nature paint does that for you. If you use hardware instancing the code does get compiled as if looped. (Basically sending 40 times the geometry in a single pass along with 40 world matrices and indices with the vertices.)
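
To make that description concrete, here is a minimal sketch of a vertex shader that draws many copies of a light volume in one pass, picking a world matrix by a per-vertex index. All names (`MAX_INSTANCES`, `Instance_World`, `View_Projection`, the `TEXCOORD1` index) are assumptions for illustration; how the nature painter actually binds its data is not shown in this thread.

```hlsl
// Hypothetical constant-array instancing; 40 matches the light count discussed here.
#define MAX_INSTANCES 40

float4x4 Instance_World[MAX_INSTANCES]; // one world matrix per light volume
float4x4 View_Projection;

struct INST_VS_INPUT
{
    float4 Pos   : POSITION;
    float  Index : TEXCOORD1; // instance number stored in every vertex of a copy
};

struct INST_VS_OUTPUT
{
    float4 Pos   : POSITION;
    float  Index : TEXCOORD0; // passed on so the pixel shader can identify the light
};

INST_VS_OUTPUT Instanced_VS(INST_VS_INPUT IN)
{
    INST_VS_OUTPUT OUT;

    // Vertex shaders can index constant arrays dynamically, so no unrolling is needed.
    int i = (int)IN.Index;

    float4 World_Pos = mul(IN.Pos, Instance_World[i]);
    OUT.Pos   = mul(World_Pos, View_Projection);
    OUT.Index = IN.Index;

    return OUT;
}
```

Each vertex is still transformed only once, so the per-vertex cost is the same as drawing the 40 volumes separately; the saving is in draw calls.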

Quote
I prefer light pre-pass to full deferred.
It's indeed not bad. And it reduces the number of different shaders needed, because the normal/depth output is separated from the diffuse output.
April 26, 2011, 04:41:21 pm
Thanks, Jos, for your time.
About the nature painter: in Ruslan's example he connects the deferred PP directly to the nature painter channel and everything works fine, but does that mean the whole HLSL is calculated 40 times? With a complex shader, would it be better to calculate only the local lights, output the result to another RTT, and do the rest of the calculation in another single HLSL channel? What is your suggestion? Is it even worth doing that? Besides, it's limited to only 40 lights. Did you use the same method?
And about the pre-pass method: does it have anything to do with passes or a special command? As far as I understand, it must be a sort of loop: the G-buffer passes data to the deferred PP, then the deferred PP calculates the lighting and passes it back into the G-buffer. Right?

April 26, 2011, 06:31:55 pm
In deferred rendering you put all the info that you need for shading into the G-buffer (albedo, specular mask, specular power, reflection term, etc.) and basically render the geometry once.
In light pre-pass you put only the information needed to calculate the lighting terms (depth and normals) into the G-buffer, then you calculate the diffuse and specular terms for all lights, save them into another RTT, and render the geometry once more, reading the lighting information from that RTT.
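
The last two steps of light pre-pass can be sketched as follows. This is an illustration under assumptions, not a complete setup: all names are hypothetical, a single directional light is shown, the specular power is a plain global rather than a G-buffer value, and the lighting RTT is assumed to store diffuse light in rgb and a specular term in alpha.

```hlsl
// --- Lighting pass: one light adds its terms to the lighting RTT ---
float3 Light_Color;
float3 Light_Dir_View;   // normalized, pointing from the surface to the light, view space
float  Spec_Power;

// Normal and To_Eye are normalized view-space vectors rebuilt from the G-buffer.
float4 Light_Accum(float3 Normal, float3 To_Eye)
{
    float  NdotL = saturate(dot(Normal, Light_Dir_View));
    float3 H     = normalize(Light_Dir_View + To_Eye);   // Blinn-Phong half vector
    float  Spec  = pow(saturate(dot(Normal, H)), Spec_Power) * NdotL;

    // rgb = diffuse light, a = specular term
    return float4(Light_Color * NdotL, Spec);
}

// --- Material pass: geometry is rendered again and reads the lighting RTT ---
texture Light_RTT;
sampler2D Light_Sampler = sampler_state
{
    Texture   = <Light_RTT>;
    MinFilter = POINT;
    MagFilter = POINT;
};

texture Albedo_Map;
sampler2D Albedo_Sampler = sampler_state
{
    Texture = <Albedo_Map>;
};

// UV is the mesh texture coordinate, Screen_UV the pixel's position in the lighting RTT.
float4 Material_PS(float2 UV : TEXCOORD0, float2 Screen_UV : TEXCOORD1) : COLOR
{
    float4 Lit    = tex2D(Light_Sampler, Screen_UV);
    float3 Albedo = tex2D(Albedo_Sampler, UV).rgb;

    // Each material is free to combine the shared lighting in its own way.
    return float4(Albedo * Lit.rgb + Lit.a, 1);
}
```

Because the material pass is an ordinary forward-style shader, every material can have its own shader while all of them share the same accumulated lighting.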

April 26, 2011, 07:13:48 pm
So the geometry is rendered two times. Hmm. Since transparent objects need a separate forward render anyway, it might be the only solution for complex scenes with lots of materials. With a material ID it might be possible to set up many different shading models, but it seems that using many if statements is not very fast. In DX 10/11 these things can be done with better performance. I hope Act3D has some plans for that, or we might run into problems in the next 2-3 years. The DX9 era is reaching its end when it comes to things like that.

April 26, 2011, 08:39:45 pm
As far as I know, the cost of an if doesn't depend on the DirectX version; it depends on the card. Still, DirectX 10+ would be nice. More and more calls are full-screen passes, and a compute shader performs best on those.

In terms of data transfer the light prepass seems to be better than complete deferred, but you do have to pass your geometry twice.