Author Topic: Sonic Z-Treme  (Read 24319 times)

XL2

  • Sr. Member
  • ****
  • Posts: 341
  • Karma: +72/-1
    • View Profile
Re: Sonic Z-Treme
« Reply #150 on: February 22, 2018, 03:27:16 am »
I tested the demo on real hardware: 20 to 60 fps, which is great considering it's not optimized.
Here is a video with 2 more Quake maps:
https://youtu.be/2B48PMqd1zg

corvusd

  • Jr. Member
  • **
  • Posts: 81
  • Karma: +8/-0
    • View Profile
    • Personal Web Portfolio
Re: Sonic Z-Treme
« Reply #151 on: February 24, 2018, 05:37:03 pm »
I tested the demo on real hardware: 20 to 60 fps, which is great considering it's not optimized.
Here is a video with 2 more Quake maps:
https://youtu.be/2B48PMqd1zg

Wow!!! The new texture LOD in this version looks really nice, and on REAL hardware!! :D

Do you think that with real-time colored light sources, like in Exhumed/DN3D/Quake on the Slave engine, your engine could hold 30 FPS?

Greetings! :)
David Gámiz Jiménez

XL2

  • Sr. Member
  • ****
  • Posts: 341
  • Karma: +72/-1
    • View Profile
Re: Sonic Z-Treme
« Reply #152 on: February 24, 2018, 05:44:11 pm »
Probably not, but right now it wouldn't work well for sure, both on the CPU and the VDP1.

Anyway RAM is so tight that it wouldn't fit since I would need to keep space for the light vectors, so I will stick with static lights.
I don't plan to add dynamic lights.

On real hardware, the framerate goes between 20 and 30 FPS (or sometimes 60 FPS if left uncapped). The video seen here is from an emulator, which stays at 30 FPS (or 60 if I leave the framerate uncapped).
There is a lot of overdraw (sprites drawn over other sprites); if I find a solution for that, the framerate should be much better.

The CD loading functions are really terribly slow (2 minutes to load a 1.3 MB map!!!) and prone to failure, so I'm not showing real hardware footage until I can bring that loading time down to something better (15 seconds or so).

corvusd

  • Jr. Member
  • **
  • Posts: 81
  • Karma: +8/-0
    • View Profile
    • Personal Web Portfolio
Re: Sonic Z-Treme
« Reply #153 on: February 24, 2018, 10:33:26 pm »
Probably not, but right now it wouldn't work well for sure, both on the CPU and the VDP1.

Anyway RAM is so tight that it wouldn't fit since I would need to keep space for the light vectors, so I will stick with static lights.
I don't plan to add dynamic lights.

On real hardware, the framerate goes between 20 and 30 FPS (or sometimes 60 FPS if left uncapped). The video seen here is from an emulator, which stays at 30 FPS (or 60 if I leave the framerate uncapped).
There is a lot of overdraw (sprites drawn over other sprites); if I find a solution for that, the framerate should be much better.

The CD loading functions are really terribly slow (2 minutes to load a 1.3 MB map!!!) and prone to failure, so I'm not showing real hardware footage until I can bring that loading time down to something better (15 seconds or so).

Okay. Maybe you would need to use the SCU to calculate the lights, to push up the FPS with this feature on.

The Saturn has 2 MB of RAM and 1 MB of VRAM (VDP1 + VDP2) for maps like Quake's, which are about 1 MB in size with the vertex color data included. And right now you are using Gouraud shading on all sprites on screen for depth cueing. I mean, the vertices already carry color data for this effect.

For overdraw, the only way is a good culling function, like you said. Portals or a PVS, as you want, together with the current LOD + mipmapping, are the right way, I think. All of this with simple and fast code, ideally with ASM in the most complex parts to gain speed.

Loading data from the CD at this speed is very strange... The Saturn can read at 2x (360 KB/s) and has a cache system and the SH-1 to help it... it's strange. Ideally 1,300 KB could be read in 3.5 or 4 seconds. Right now 120 seconds is too long... But you load the map and all the other data: sound, engine code, etc. It's possible that you are loading more things. I don't know what is happening...
« Last Edit: February 24, 2018, 10:35:38 pm by corvusd »
David Gámiz Jiménez

XL2

  • Sr. Member
  • ****
  • Posts: 341
  • Karma: +72/-1
    • View Profile
Re: Sonic Z-Treme
« Reply #154 on: February 25, 2018, 06:17:03 am »
For the SCU: like I mentioned before, SGL doesn't work with that. You just cannot use it for lights since it's done during the polygon processing and you can't intercept it unless you do some really complicated tricks, so forget it, it won't happen. Plus it wouldn't help with the framerate, as the main framerate bottleneck is the VDP1 more than the CPUs. I might use the SCU for some more advanced physics down the line, but I'm not holding my breath.

For RAM, of course I know! Else how would I have been able to load my maps (13,000+ quads, 30,000+ vertices), fill the VDP1 RAM to the limit with my generated textures (500+ different sprites with 4 bpp CLUT)?
Plus you can't compare my engine with the Slavedriver engine: SGL takes a MINIMUM of 32 bytes per quad, EXCLUDING the vertices (12 bytes each). The Slavedriver engine takes 5 bytes per quad and 8 bytes per vertex (x, y, z and the vertex color data) plus some per-plane data.
You are comparing 2 very different things. SGL is super fast, but it wastes a lot of memory and I'm stuck with that.
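
To put rough numbers on that comparison, here is a small C sketch that applies the per-quad and per-vertex sizes quoted above to the map size mentioned in the same post (13,000 quads, 30,000 vertices). It is not code from the project: the function and constant names are made up for illustration, and the Slavedriver estimate ignores the per-plane data, so both totals are only ballpark figures.

```c
#include <stdint.h>

/* Per-element sizes quoted in the post above. The names are hypothetical
   and do not come from SGL or the Slavedriver engine. */
enum {
    SGL_BYTES_PER_QUAD     = 32, /* minimum, vertices excluded */
    SGL_BYTES_PER_VERTEX   = 12,
    SLAVE_BYTES_PER_QUAD   = 5,
    SLAVE_BYTES_PER_VERTEX = 8   /* x, y, z and vertex color */
};

/* Estimated bytes for a map stored with the SGL sizes. */
uint32_t sgl_map_bytes(uint32_t quads, uint32_t vertices)
{
    return quads * SGL_BYTES_PER_QUAD + vertices * SGL_BYTES_PER_VERTEX;
}

/* Estimated bytes for the same map with the Slavedriver sizes
   (per-plane data is ignored here). */
uint32_t slave_map_bytes(uint32_t quads, uint32_t vertices)
{
    return quads * SLAVE_BYTES_PER_QUAD + vertices * SLAVE_BYTES_PER_VERTEX;
}
```

With 13,000 quads and 30,000 vertices this gives 776,000 bytes for the SGL sizes against 305,000 bytes for the Slavedriver sizes.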

For realtime lighting: each realtime-lit polygon requires 12 bytes of extra vector data. At some 13,000 quads per map, we are talking about 156 KB. Yes, the Saturn has 2 MB of work RAM, but you need to store the code, the tables, the entities (Sonic, enemies, etc.) and other stuff. My maps are totally filling the low work RAM. I want to keep high work RAM for the PVS and I need to find a way to store PCM in audio RAM (I'm not even sure if it's possible with the Sega audio driver as I never saw any examples).
Btw, I'm not using Gouraud on everything: only on the LOD model, to reduce both CPU and VDP1 usage. Since the LOD quads are large, it doesn't look fully smooth, but realtime lighting would look terrible because of that (like one large quad getting lit all at once). Palette swaps or per-texture lighting (like in Wipeout) might be a better choice for static light, but then the issue is VDP1 RAM. The depth Gouraud only takes 64 bytes, so it's really not much.
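
The 156 KB figure above is just 12 bytes times 13,000 quads. A minimal C sketch of that arithmetic follows; the helper names are hypothetical, and the 1 MiB bank size used in the note afterwards is an assumption about the low work RAM size, not something stated in the post.

```c
#include <assert.h>
#include <stdint.h>

#define LIGHT_VECTOR_BYTES 12u /* extra vector per lit quad, as quoted above */

/* Extra bytes needed if every one of lit_quads carries a light vector. */
uint32_t light_vector_bytes(uint32_t lit_quads)
{
    return lit_quads * LIGHT_VECTOR_BYTES;
}

/* Share of a RAM bank used by `bytes`, in whole percent (rounded down). */
uint32_t percent_of_bank(uint32_t bytes, uint32_t bank_bytes)
{
    assert(bank_bytes > 0u);
    return (uint32_t)(((uint64_t)bytes * 100u) / bank_bytes);
}
```

For 13,000 quads, light_vector_bytes returns 156,000 bytes, which would be about 14% of a bank assumed to be 1,048,576 bytes.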

For the PVS or Portal, the problem is the implementation, it's not easy.

For the CD, obviously I know exactly what I'm loading as everything is tightly packed in memory, so I know I'm not loading more than 1.3 or 1.4 MB: I load the textures first to LWRAM and DMA them to the VDP1 RAM, then the actual map data, which I keep entirely in the low work RAM. The problem is that the current jo_fs_read_next_bytes function fetches 1 sector (2048 bytes) at a time, so it's not making good use of the SH-1 512 KB buffer since it keeps stopping and transferring the data. It could fetch 10 sectors at a time without stopping. The async load function, as-is, can't be used for what I do, so I'll have to see what Johannes has in mind or continue writing my own CD function using SBL.
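
To illustrate why the sector count per request matters, here is a small C sketch that only counts read requests; it does not call Jo Engine or SBL, and the function names are invented for this example. It assumes a 1.4 MB file means 1,400,000 bytes.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_BYTES 2048u /* sector payload size mentioned in the post */

/* Number of sectors needed to hold file_bytes (rounded up). */
uint32_t sectors_for(uint32_t file_bytes)
{
    return (file_bytes + SECTOR_BYTES - 1u) / SECTOR_BYTES;
}

/* Number of separate read requests when fetching sectors_per_call
   sectors per request (rounded up). */
uint32_t read_calls(uint32_t file_bytes, uint32_t sectors_per_call)
{
    uint32_t sectors;

    assert(sectors_per_call > 0u);
    sectors = sectors_for(file_bytes);
    return (sectors + sectors_per_call - 1u) / sectors_per_call;
}
```

Under those assumptions a 1,400,000-byte map spans 684 sectors, so reading one sector per request means 684 stop-and-transfer cycles, while 10 sectors per request would need only 69.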

So my available memory is: VDP2 RAM, audio RAM and SH-1 RAM. The rest is fully used or will be soon.

corvusd

  • Jr. Member
  • **
  • Posts: 81
  • Karma: +8/-0
    • View Profile
    • Personal Web Portfolio
Re: Sonic Z-Treme
« Reply #155 on: February 25, 2018, 09:07:08 pm »
For the SCU: like I mentioned before, SGL doesn't work with that. You just cannot use it for lights since it's done during the polygon processing and you can't intercept it unless you do some really complicated tricks, so forget it, it won't happen. Plus it wouldn't help with the framerate, as the main framerate bottleneck is the VDP1 more than the CPUs. I might use the SCU for some more advanced physics down the line, but I'm not holding my breath.

Okay. So gaining as much as possible on the pre-processing side (matrix transforms, lighting, physics, AI…) wouldn't help the VDP1 side much? But the real bottleneck isn't fully documented. There are games with fewer than 400 quads and a very low framerate, and games with more than 1,200 quads running at 30 FPS or more, with a lot of VDP1 effects too… I have the feeling that the real limit of the VDP1 is unknown. That's why I suggested, as a possible solution, using the SCU or more ASM code for the complex functions on the pre-processing side, to free up processing time for the VDP1 queue. But I don't want to pressure you. I know that your skills and resources have limits, and your results are BIG. Thank you.

For RAM, of course I know! Else how would I have been able to load my maps (13,000+ quads, 30,000+ vertices), fill the VDP1 RAM to the limit with my generated textures (500+ different sprites with 4 bpp CLUT)?
Plus you can't compare my engine with the Slavedriver engine: SGL takes a MINIMUM of 32 bytes per quad, EXCLUDING the vertices (12 bytes each). The Slavedriver engine takes 5 bytes per quad and 8 bytes per vertex (x, y, z and the vertex color data) plus some per-plane data.
You are comparing 2 very different things. SGL is super fast, but it wastes a lot of memory and I'm stuck with that.

32 bytes for SGL vs 8 bytes for “Slave” per quad is a big difference… Is that so that more features are ready to use?
But for lighting, the data required is the vertex data, and the difference there is smaller: 12 bytes for SGL vs 8 bytes for “Slave”. I don't know why...
Anyway, the texture data in the “Slave” engine is 15-bit RGB in Exhumed and DN3D. Quake uses a 4 bpp color LUT like your engine, with a similar amount of data, except for the LOD data. I know that you use more RAM on both sides: work RAM for the SGL mesh and vertex data, and VRAM for LOD and mipmapping. Is it possible to put the mipmap data in VDP2 VRAM? I mean using color bank mode for these textures. I have seen Gouraud or half-transparency used on color bank elements: polygons, lines, sprites… not distorted sprites… but maybe it's possible. If not, fine. For depth cueing or lighting you could use “per-texture lighting”.

For realtime lighting: each realtime-lit polygon requires 12 bytes of extra vector data. At some 13,000 quads per map, we are talking about 156 KB. Yes, the Saturn has 2 MB of work RAM, but you need to store the code, the tables, the entities (Sonic, enemies, etc.) and other stuff. My maps are totally filling the low work RAM. I want to keep high work RAM for the PVS and I need to find a way to store PCM in audio RAM (I'm not even sure if it's possible with the Sega audio driver as I never saw any examples).
Btw, I'm not using Gouraud on everything: only on the LOD model, to reduce both CPU and VDP1 usage. Since the LOD quads are large, it doesn't look fully smooth, but realtime lighting would look terrible because of that (like one large quad getting lit all at once). Palette swaps or per-texture lighting (like in Wipeout) might be a better choice for static light, but then the issue is VDP1 RAM. The depth Gouraud only takes 64 bytes, so it's really not much.
So each realtime-lit polygon requires 12 bytes of extra vector data, on top of the 32 bytes and 8 bytes mentioned before for SGL? 52 bytes in total per distorted sprite?

For the PVS or Portal, the problem is the implementation, it's not easy.

I'm aware of that. How can I help you with this? By trying to find some code examples? Or do you already have some? By testing??

For the CD, obviously I know exactly what I'm loading as everything is tightly packed in memory, so I know I'm not loading more than 1.3 or 1.4 MB: I load the textures first to LWRAM and DMA them to the VDP1 RAM, then the actual map data, which I keep entirely in the low work RAM. The problem is that the current jo_fs_read_next_bytes function fetches 1 sector (2048 bytes) at a time, so it's not making good use of the SH-1 512 KB buffer since it keeps stopping and transferring the data. It could fetch 10 sectors at a time without stopping. The async load function, as-is, can't be used for what I do, so I'll have to see what Johannes has in mind or continue writing my own CD function using SBL.

There are still many areas to implement well in order to have better tools for developing new games. Any effort is welcome. In that sense, the efforts of both of you to implement a better function that speeds up loading data from the CD are good news for the community.

So my available memory is: VDP2 RAM, audio RAM and SH-1 RAM. The rest is fully used or will be soon.

Well, the only thing I can offer is the possibility of using color bank mode for the LOD textures, with the “possible” problem that Gouraud doesn't work or works with issues. For the audio RAM, I will try to find some code examples. Ideally the sound effects would be encoded as ADPCM at a low sample rate and in mono, mainly to fit as many sound effects as possible, and maybe some songs or music. As far as I remember, the documentation says that decoding ADPCM of this quality only uses about 4% of the sound CPU. The other option is to combine low-quality PCM sound effects with ADPCM tunes in low/medium stereo quality. As I understand it, to decompress audio with the M68000 the data needs to be placed in its own RAM; otherwise it is very slow, since the sound data has to be loaded from system RAM to be processed.
David Gámiz Jiménez

XL2

  • Sr. Member
  • ****
  • Posts: 341
  • Karma: +72/-1
    • View Profile
Re: Sonic Z-Treme
« Reply #156 on: February 26, 2018, 03:29:29 am »
The VDP1 limit is pretty well known, but there are ways to speed it up: use VDP2 CRAM, avoid overdraw, use lower quality textures, avoid some effects, do preclipping, etc.

For the difference, SGL is just poorly designed for memory consumption. 4 bytes per vertex component (16.16 fixed) is fine I guess (vs 2 bytes, 8.8 fixed, for Slavedriver), but the quads are wasting way too much memory. One quad in SGL (using realtime light) takes 44 bytes, without considering the vertices. If you add an average of 2 vertices per quad (since some vertices are reused), that's 68 bytes.
There is nothing to be done, it's just how it is with SGL.
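
As a side note on the 16.16 versus 8.8 formats mentioned above, here is a hedged C sketch of converting between the two. The type and function names are invented for this example and are not SGL or Slavedriver identifiers. It shows the trade-off: an 8.8 value takes half the space, but only covers roughly -128 to +128 in steps of 1/256, so larger coordinates do not fit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t fix16_16; /* 16 integer bits, 16 fractional bits */
typedef int16_t fix8_8;   /* 8 integer bits, 8 fractional bits */

/* Converts 16.16 to 8.8. Returns 1 and writes *out when the value fits
   the 8.8 range, or returns 0 and leaves *out untouched otherwise. */
int fix16_16_to_8_8(fix16_16 value, fix8_8 *out)
{
    int32_t scaled;

    assert(out != NULL);
    scaled = value / 256; /* drop 8 fractional bits, truncating toward zero */
    if (scaled < INT16_MIN || scaled > INT16_MAX)
        return 0;
    *out = (fix8_8)scaled;
    return 1;
}

/* Converts 8.8 back to 16.16; this direction never overflows. */
fix16_16 fix8_8_to_16_16(fix8_8 value)
{
    return (fix16_16)value * 256;
}
```

For example, 1.5 is 98304 in 16.16 and 384 in 8.8, while 200.0 cannot be represented in 8.8 at all, which is presumably why a compact format like that needs small or locally scaled coordinates.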

For lighting, again that's how Slavedriver works: they use 8 bytes per vertex, 2 for each axis (x, y, z) and 2 for the color.
You can do the same thing (static light) with SGL and Gouraud shading without major issues simply by re-using the same Gouraud addresses. For dynamic light, you need another vector (12 bytes). Even if you have the normals, you still need that extra vector, as Sega found out their method had issues, so they just added something else on top of everything else.
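
The two layouts described above can be pictured as C structs. This is only a sketch: the struct and field names are hypothetical, and the real engines may order or pack their data differently. The compile-time checks simply confirm the 8-byte and 12-byte sizes quoted in the post.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical compact vertex: 2 bytes per axis plus 2 bytes of color. */
typedef struct {
    int16_t  x, y, z;
    uint16_t color;
} compact_vertex;

/* Hypothetical 12-byte vector of three 4-byte fixed-point components,
   the size quoted for a vertex or for the extra light vector. */
typedef struct {
    int32_t x, y, z;
} fixed_vector;

_Static_assert(sizeof(compact_vertex) == 8, "compact vertex should be 8 bytes");
_Static_assert(sizeof(fixed_vector) == 12, "fixed vector should be 12 bytes");

/* Bytes needed for an array of `count` compact vertices. */
size_t compact_vertices_bytes(size_t count)
{
    return count * sizeof(compact_vertex);
}
```

With the 30,000 vertices mentioned earlier in the thread, the compact layout would take 240,000 bytes.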


Duke Nukem 3D is using 16-color CLUTs, not 16-bit RGB.

For mipmap, no, I can't use the VDP2 RAM. I really want to use the CRAM, but you only have a maximum of 2048 colors, and even if you use 16 colors per sprite, you still need to have these 16 colors in the same area. When you have 500 sprites, it's nearly impossible. Using 256 colors is possible of course, but then you double the VRAM consumption.
Gouraud shading can be used with VDP2 Color bank as proved by the "Chrome" demo, but Sega writes in its documentation that they can't guarantee good results.
Per texture light is great, but then again you need to store many more sprites, which isn't easy.
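
The color RAM budget described above can be checked with a little arithmetic. The following C sketch uses made-up helper names; the 2048-color limit comes from the post, and the 32x32 texture size in the note afterwards is only an example value.

```c
#include <assert.h>
#include <stdint.h>

#define CRAM_COLORS 2048u /* maximum color count quoted in the post */

/* How many separate palettes of palette_size colors fit in color RAM. */
uint32_t palettes_in_cram(uint32_t palette_size)
{
    assert(palette_size > 0u);
    return CRAM_COLORS / palette_size;
}

/* VRAM bytes for a width x height texture at bits_per_pixel,
   rounded up to a whole byte. */
uint32_t texture_bytes(uint32_t width, uint32_t height, uint32_t bits_per_pixel)
{
    return (width * height * bits_per_pixel + 7u) / 8u;
}
```

Only 128 palettes of 16 colors fit in 2048 entries, far fewer than 500 sprites with their own colors, and a 32x32 texture grows from 512 bytes at 4 bpp to 1024 bytes at 8 bpp, which is the doubling mentioned above.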

For the PVS, I haven't seen much code online, but unless someone else here codes something first, I'll have to do it myself.


For the cd functions, the problem is that duplicating work isn't efficient; we all have jobs, girlfriends, and all, so we can't spend as much time as we'd like on these projects.

For audio RAM, for PCM, I never saw an example, but maybe some homebrew emulator does it, I really don't know.



corvusd

  • Jr. Member
  • **
  • Posts: 81
  • Karma: +8/-0
    • View Profile
    • Personal Web Portfolio
Re: Sonic Z-Treme
« Reply #157 on: February 26, 2018, 08:27:58 pm »
The VDP1 limit is pretty well known, but there are ways to speed it up: use VDP2 CRAM, avoid overdraw, use lower quality textures, avoid some effects, do preclipping, etc.

Yes, but to me it isn't a well-documented limit, or bottleneck. For example: how many elements or layers, or how much area, can use VDP1 CC half-transparency, or VDP1 CC shadow? How many cycles does Gouraud + flat polygon take? Or flat polygon + half-transparency + Gouraud? How many cycles do forced triangles that share one common vertex take? I have so many doubts. Or how many cycles does it take the CPU to read the VDP1 or VDP2 buffer for render-to-texture effects? So many questions to answer before knowing the real limits of the system.

For the difference, SGL is just poorly designed for memory consumption. 4 bytes per vertex component (16.16 fixed) is fine I guess (vs 2 bytes, 8.8 fixed, for Slavedriver), but the quads are wasting way too much memory. One quad in SGL (using realtime light) takes 44 bytes, without considering the vertices. If you add an average of 2 vertices per quad (since some vertices are reused), that's 68 bytes.
There is nothing to be done, it's just how it is with SGL.

OK, got it, it's the basic design of SGL. It wasn't designed with pushing the hardware to its limit for the lighting system in mind, like the Slave engine was.

For lighting, again that's how Slavedriver works: they use 8 bytes per vertex, 2 for each axis (x, y, z) and 2 for the color.
You can do the same thing (static light) with SGL and Gouraud shading without major issues simply by re-using the same Gouraud addresses. For dynamic light, you need another vector (12 bytes). Even if you have the normals, you still need that extra vector, as Sega found out their method had issues, so they just added something else on top of everything else.

I read about the issues in the release notes of the SGL/SBL SDK, but I understood that realtime Gouraud was fixed in the last release.

Duke Nukem 3D is using 16-color CLUTs, not 16-bit RGB.

Yes, my mistake. Sorry.

For mipmap, no, I can't use the VDP2 RAM. I really want to use the CRAM, but you only have a maximum of 2048 colors, and even if you use 16 colors per sprite, you still need to have these 16 colors in the same area. When you have 500 sprites, it's nearly impossible. Using 256 colors is possible of course, but then you double the VRAM consumption.
Gouraud shading can be used with VDP2 Color bank as proved by the "Chrome" demo, but Sega writes in its documentation that they can't guarantee good results.
Per texture light is great, but then again you need to store many more sprites, which isn't easy.

It's good news to know that Gouraud can be used with color bank, so no limit here. I understand the real problem is designing the sprite color palettes precisely enough to get similar colors between the texture CLUT and the color bank. And 16 colors are enough for the lowest, most distant mipmaps, with several palettes of 16 colors, I mean. And you would save a lot of memory by using CRAM.

For the PVS, I haven't seen much code online, but unless someone else here codes something first, I'll have to do it myself.
I will try to find something in C, which is the same language; as an example or a starting point, it's something.

For the cd functions, the problem is that duplicating work isn't efficient; we all have jobs, girlfriends, and all, so we can't spend as much time as we'd like on these projects.
I'll take your word for it... I have a child XD But this is a fun and epic "mission"! :)

For audio RAM, for PCM, I never saw an example, but maybe some homebrew emulator does it, I really don't know.
I will try to find something here also.

Thanks again and greetings!
« Last Edit: February 28, 2018, 11:01:51 am by corvusd »
David Gámiz Jiménez

XL2

  • Sr. Member
  • ****
  • Posts: 341
  • Karma: +72/-1
    • View Profile
Re: Sonic Z-Treme
« Reply #158 on: March 03, 2018, 03:01:35 am »
So I used a basic SBL CD function to load the maps instead.
The result? The maps now load in...5 seconds! 1.4 MB in 5 seconds instead of 100-120 seconds.
I couldn't believe my eyes how fast it was.
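
For context, the 5 seconds reported above is close to what a double-speed drive can deliver at best. The C sketch below estimates the ideal transfer time from the standard CD-ROM figures of 2048 data bytes per sector and 75 sectors per second at 1x; the function name is hypothetical, seeks and per-request overhead are ignored, and 1.4 MB is assumed to mean 1,400,000 bytes.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_BYTES          2048u /* data bytes per sector */
#define SECTORS_PER_SECOND_1X 75u   /* sectors per second at single speed */

/* Ideal transfer time in milliseconds (rounded down) for file_bytes
   at the given drive speed multiplier, ignoring seeks and overhead. */
uint32_t ideal_read_ms(uint32_t file_bytes, uint32_t speed_multiplier)
{
    uint64_t bytes_per_second;

    assert(speed_multiplier > 0u);
    bytes_per_second = (uint64_t)SECTOR_BYTES * SECTORS_PER_SECOND_1X * speed_multiplier;
    return (uint32_t)(((uint64_t)file_bytes * 1000u) / bytes_per_second);
}
```

At 2x this works out to about 4.6 seconds for 1,400,000 bytes, so a 5-second load leaves very little overhead on top of the raw transfer.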


corvusd

  • Jr. Member
  • **
  • Posts: 81
  • Karma: +8/-0
    • View Profile
    • Personal Web Portfolio
Re: Sonic Z-Treme
« Reply #159 on: March 03, 2018, 10:47:25 am »
So I used a basic SBL CD function to load the maps instead.
The result? The maps now load in...5 seconds! 1.4 MB in 5 seconds instead of 100-120 seconds.
I couldn't believe my eyes how fast it was.

Oh yeah!! :)

Maybe this function can be used as an example to implement some improvements on the Jo Engine side. This is really good news!

:D
David Gámiz Jiménez

XL2

  • Sr. Member
  • ****
  • Posts: 341
  • Karma: +72/-1
    • View Profile
Re: Sonic Z-Treme
« Reply #160 on: March 07, 2018, 01:05:48 am »
I'm just using the GFS_Load function from SBL.

Here is the game running on stock hardware (30 fps):
https://youtu.be/VsHibSGWKuw

Joeveno

  • Newbie
  • *
  • Posts: 9
  • Karma: +0/-0
    • View Profile
Re: Sonic Z-Treme
« Reply #161 on: March 07, 2018, 01:56:24 am »
Hello! It's me again!
Is there gonna be any kind of Homing Attack in the Z-Treme project? What about camera centralization?
What about using the Robo Blast 2 Sonic Adventure DX maps in this project? They're not as big and graphically demanding as vanilla Robo Blast 2 maps. Do you think it's possible to insert them in the project?
And what are your plans for the full game maps right now? It would be great to know :)
Cheers!
« Last Edit: March 07, 2018, 02:15:59 am by Joeveno »

XL2

  • Sr. Member
  • ****
  • Posts: 341
  • Karma: +72/-1
    • View Profile
Re: Sonic Z-Treme
« Reply #162 on: March 07, 2018, 03:45:21 pm »
No for the homing attack.
I'm not sure I understand the question about the camera.
You mean free floating camera?
Maybe, but only if an analog pad is detected; else I'll keep it fixed at 90-degree angles.

For the Robo Blast maps, sorry but it won't happen.
It's just too much work converting the maps' geometry to fit the Saturn, so I'll stick to Sonic X-Treme maps and maybe some Sonic Jam-style maps (just maybe).

corvusd

  • Jr. Member
  • **
  • Posts: 81
  • Karma: +8/-0
    • View Profile
    • Personal Web Portfolio
Re: Sonic Z-Treme
« Reply #163 on: March 07, 2018, 08:11:58 pm »
I'm just using the GFS_Load function from SBL.

Here is the game running on stock hardware (30 fps):
https://youtu.be/VsHibSGWKuw

Another step forward! Yeah! It runs very well on real hardware. Amazing!

These specific slowdowns are very weird... Is it possible to debug them in some way?? And find out what's happening?

All the rest is beautiful. The mipmapping is very well adjusted in the levels.

Go ahead! Greetings!
« Last Edit: March 07, 2018, 09:07:11 pm by corvusd »
David Gámiz Jiménez

XL2

  • Sr. Member
  • ****
  • Posts: 341
  • Karma: +72/-1
    • View Profile
Re: Sonic Z-Treme
« Reply #164 on: March 07, 2018, 08:19:27 pm »
You mean with the Jo Engine functions?
I think it's just how it reads one sector (2048 bytes) at a time and puts it in a buffer.
I just transfer everything right into memory without any intermediate buffer.
As you can see, the loading is super fast.
I could work hard to make it load 1 or 2 seconds faster, but it's not worth the extra work I think since it's super fast already.

For the mipmapping, I don't generate the texture now if the 2 quads are too different. It increases the geometry, but it looks much better.

 
