Messages - corvusd

Project announcement / Re: Sonic Z-Treme
« on: September 13, 2018, 11:50:51 am »
Quake maps do work with OKish draw distance, but have a very high polygon and vertex count. Still, 30 fps with Sonic and some 920 drawn polygons on screen isn't too bad considering all the overdraw.
I made some little progress on the bsp tree, but there are many problems to solve as nothing found online mentions quads or small polygons, so it might take a while to make it all work.

Again, great job XL2!

It is getting closer and closer to the nearly 1300 quads of Sonic R at 30 FPS. You are definitely pushing ever closer to the limits of the DSP (transformation and lighting) and the two SH2s, along with the management of calls to the VDP1, the VDP1 VRAM and the sound system. Not to mention physics, collisions or AI. That is to say, all that traffic over the B-Bus passing through the SCU. Amazing!

I am increasingly aware that this is one of the keys to REAL optimization on the SS. In fact, looking at its mirror image, the PSX, this is also part of the key to its good performance. And having a profiling system there made it really easy to expose all the bottlenecks in an engine or program.

Keep it up, and thanks again! :)

Project announcement / Re: Sonic Z-Treme
« on: September 02, 2018, 05:58:32 pm »
Hello again XL2,

I've been on vacation these days, and I have not been able to see your SAGE build until today.

Great job! I would ignore most of the reactions, especially the unfounded ones. You have done what no one else did. THANK YOU!

Looking at your work in detail:

1. A very smart implementation of Sonic's shadow with VDP1 transparency. It avoids the VDP1 redraw defect, and you take care that when the shadow covers another primitive, that primitive uses a Color LUT or RGB from VDP1 VRAM so that the transparency blends correctly.

2. Very nice, well-implemented particle effects in this latest build. Later on, when you see fit, if you have performance to spare and find a new and original way, you could use "some" type of SS transparency. Then they would be perfect.

3. I see that the metal effect (Sonic and enemies) changes according to the background. It's a totally original effect!

4. I see that you reach peaks of 750 elements on screen while keeping 30 FPS!!! Absolutely brutal!

5. I see that you still have a lot of main RAM, VDP1 VRAM and VDP2 Pattern RAM available. And I see that you have used almost 90% of the VDP2 Color RAM.

6. I also see signs of sound DSP use!! 100% of memory used! :D

7. I loved the controls. The problem, from my point of view, is that the game needs some kind of tutorial, because it is not the typical Sonic that the typical user expects. Pits and death zones are numerous, and Sonic's speed can be deadly XD. In any case, from my point of view the controls respond superbly. As a crazy idea, what would you think of having the camera always behind Sonic?

And a lot more details!!!

Greetings and thanks again!

Well, one solution that I'm currently experimenting with :
-Use the framebuffer from last frame.
-If for each pixel the bit 15 is 0, it's palette code. Just draw a line using cram palette with transparency ratio.
-If the msb is 1, use vdp1 half-transparency.
Since you set the z distance, it won't create much sorting issues and would keep artifacts low.
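The per-pixel test described above can be sketched in C as follows. This is only an illustration of the bit 15 check, not XL2's actual code: the names `classify_pixel` and `run_length` are hypothetical, and grouping pixels into runs so that each run becomes one line is an assumption about how the "draw a line" step could be batched.

```c
#include <stdint.h>

/* Blend path a pixel from last frame's framebuffer would need. */
typedef enum {
    BLEND_CRAM_RATIO,     /* bit 15 = 0: palette code, blend through CRAM with a ratio */
    BLEND_VDP1_HALF_TRANS /* bit 15 = 1: RGB pixel, use VDP1 half-transparency */
} blend_path_t;

static blend_path_t classify_pixel(uint16_t fb_pixel)
{
    return (fb_pixel & 0x8000u) ? BLEND_VDP1_HALF_TRANS : BLEND_CRAM_RATIO;
}

/* Length of the run of pixels starting at `start` that share one blend path,
 * so that each run could be issued as a single line (hypothetical batching). */
static int run_length(const uint16_t *line, int width, int start)
{
    blend_path_t path = classify_pixel(line[start]);
    int x = start + 1;
    while (x < width && classify_pixel(line[x]) == path)
        ++x;
    return x - start;
}
```

The cost XL2 mentions is not in this test itself but in fetching `line` from the framebuffer in the first place.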

The main issue is reading from the framebuffer, it's just slow.
Like I mentioned, you can also use gouraud shading, so these fake polygons could still look nice.
I'm not sure it could be done with textures because the width must be a multiple of 8 (I hate this limitation)

Forgive me, XL2, I meant to reply to this answer of yours earlier.

I have a similar idea to solve the problem. I thought that you could create a function that knows whether a part of an element (Normal, Scaled or Distorted Sprite, and lines or polylines) is on top of another VDP1 element or not. That is, the function knows whether a vertex of an element lies over a VDP1 element or over the VDP2. If it is over the VDP1, use VDP1 CC (color calculation) half-transparency, and when it is over the VDP2, use palette transparency shared with VDP2 color RAM.

There would still be the problem that certain elements, when they straddle the border, would not blend correctly.

In your idea something similar would happen, but only at the line level.

Maybe my idea is "faster" because it would be done in the transformation calculation part. By comparing common coordinates, it would not need to read the framebuffer as in your idea.
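A minimal sketch of that coordinate comparison, assuming screen-space bounding boxes are a good enough stand-in for "is on top of another VDP1 element". All the names here (`quad_bounds`, `rects_overlap`, `choose_transparency`) are hypothetical, and the box test is conservative: it can report an overlap where the actual quads do not touch.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { int16_t x, y; } point_t;
typedef struct { int16_t min_x, min_y, max_x, max_y; } rect_t;

typedef enum {
    USE_VDP2_PALETTE_TRANSPARENCY, /* nothing from VDP1 behind: blend against VDP2 */
    USE_VDP1_HALF_TRANSPARENT      /* overlaps a VDP1 element: blend inside VDP1 */
} transparency_t;

/* Screen-space bounding box of a transformed quad. */
static rect_t quad_bounds(const point_t v[4])
{
    rect_t r = { v[0].x, v[0].y, v[0].x, v[0].y };
    for (int i = 1; i < 4; ++i) {
        if (v[i].x < r.min_x) r.min_x = v[i].x;
        if (v[i].y < r.min_y) r.min_y = v[i].y;
        if (v[i].x > r.max_x) r.max_x = v[i].x;
        if (v[i].y > r.max_y) r.max_y = v[i].y;
    }
    return r;
}

static bool rects_overlap(rect_t a, rect_t b)
{
    return a.min_x <= b.max_x && b.min_x <= a.max_x &&
           a.min_y <= b.max_y && b.min_y <= a.max_y;
}

/* Pick the transparency path by comparing the transparent quad against the
 * bounds of the VDP1 elements already queued behind it. */
static transparency_t choose_transparency(const point_t quad[4],
                                          const rect_t *behind, int behind_count)
{
    rect_t q = quad_bounds(quad);
    for (int i = 0; i < behind_count; ++i)
        if (rects_overlap(q, behind[i]))
            return USE_VDP1_HALF_TRANSPARENT;
    return USE_VDP2_PALETTE_TRANSPARENCY;
}
```

Because the decision is made per element, a quad that only partly covers a VDP1 element gets a single mode for all its pixels, which is exactly the border problem mentioned above.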

Also, it may be useful to use both ideas in combination.

Who knows, maybe even together with the Burning Rangers trick.

Depending on the situation, one approach or another may give the better quality/performance trade-off.

All of them use the native processing and graphics pipeline of the SS.


What did you use?

A scaled sprite with the VDP1 color calculation function replace/shadow, which is equivalent to half-transparent. It only adds a step to convert the pattern to a monochrome mask if the pattern is in color.

Project announcement / Re: Sonic Z-Treme
« on: August 24, 2018, 01:16:46 am »
Yeah, but the solution is to subdivide the map further. The PS1 had that feature in the sdk from day one. The Saturn, as usual, doesn't and can't really do it. I'm trying to find a way to clip textures in vram to subdivide it in 4, but I'm not sure it will work.
These days I was looking into what you told me, that the SCU-DSP would not be useful for real-time tessellation. Looking at the PSX SDK, it seems that it uses the GTE for it, but I'm not sure it does so for everything. I mean, for tessellation I think the GTE is used to recalculate the UV coordinates of the new polygon or polygons, but the division itself is done on the CPU. In a technical document, a developer asks whether it is better to divide on the GTE or on the CPU, and the answer is the CPU. We can apply this to the SS, because the SCU-DSP and the GTE are very similar, mainly doing additions and multiplications for dot products, and the SH2 has a division instruction. Unless divisions can be done in the GTE and in the SCU-DSP by multiplying by fractional numbers?
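On that last question: in fixed point, a division can be replaced by a multiplication with a reciprocal that was computed once beforehand. The sketch below only shows the arithmetic in portable C with 16.16 values; it does not claim that the GTE, the SCU-DSP or any SDK does it this way, and the names `fix16_mul` and `fix16_recip` are hypothetical.

```c
#include <stdint.h>

typedef int32_t fix16_t;                 /* 16.16 fixed point */
#define FIX16_ONE ((fix16_t)1 << 16)

/* Multiply two 16.16 values: 64-bit product, keep the middle 32 bits.
 * (Right-shifting a negative value is implementation-defined in C;
 * common compilers perform an arithmetic shift.) */
static fix16_t fix16_mul(fix16_t a, fix16_t b)
{
    return (fix16_t)(((int64_t)a * b) >> 16);
}

/* Reciprocal 1/b in 16.16. This is the one real division; it could be done
 * once on the CPU or read from a precomputed table. b must not be zero. */
static fix16_t fix16_recip(fix16_t b)
{
    return (fix16_t)((((int64_t)1) << 32) / b);
}
```

With this, `a / b` becomes `fix16_mul(a, fix16_recip(b))`. The result is exact when `b` is a power of two and slightly low otherwise, for example 9 / 3 comes out as 196605 / 65536, about 2.99995.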

Incredible posts these days, XL2!

Well, it's a very creative solution on your part. I think that if you can make content creation easy with this routine, it could be put to good use in specific cases, for large crystals for example.

As for making the transparency work over both VDP1 and VDP2, I still believe you have to look at how Burning Rangers does it. I think it's the best solution, since it uses the SS pipeline.

The most I have been able to find out is that BR alternates between two drawing "spaces" in the VDP1. In the first, only the opaque elements are drawn, with their texture or color. In the second, it draws the transparent elements plus, in black, the parts of the opaque elements that cover them.

The first space is drawn with the system clipping at full size and the second at half size. For example, for a final output resolution of 320x240 non-interlaced, the VDP1 System Clipping is 319x239. The second space will be 160x120; it is then sent to VDP2 as NBG1 in 16-bit color, and a color calculation is applied to this layer over the VDP1 output. The elements drawn by the VDP1 then look transparent over the VDP1 and VDP2 at the same time.

All of this for "one" frame. BR holds a stable 20 FPS at its peaks.

Problems that still exist:

1) The VDP1 elements do not blend with each other. Could we use VDP1 H-T for that, once the redraw problems are solved? For example by using only elements that are not deformed vertically, such as Scaled Sprites, Normal Sprites, or Distorted Sprites used as billboards. Or new tricks like yours, XL2.

2) Can we manage to render the second space at full resolution?

3) In Burning Rangers the final transparency layer is on top of everything, including the UI. Could we somehow avoid this problem? For example by creating a mask over those parts, or by using a VDP2 layer for the UI.

Objective: a complete sun lens flare effect made of 3D VDP1 elements that works over both VDP1 and VDP2.

Let's go for it! :)

What's better, dithering, or over-transparency? In my opinion, I would choose something that is real vdp1 transparency, and not the same screen door method that's been used since the genesis. The designers never had the intention of the output being precise, but with s-video, I would rather stick to real transparency, simply because it gives the Saturn a chance of having transparency. Oh, yeah, and I think the n64 can only do 50% transparency because mario in sm64 uses dithering when using a secret warp, so we shouldn't get too ahead of ourselves.

The N64 has REAL alpha blending; it is a Silicon Graphics patent. The PSX and SS do not have alpha blending; they have hard-wired half-transparency. Their documentation states all of this clearly. Let's not invent theories or data... we were lied to enough in the past, with stories of millions of polygons on screen at 60 FPS. Let's stick to the clearly documented technical data. Please.

Half-transparency is REAL and totally useful on the SS. The key is to find the best way to use it without caveats: no redraw, on 3D primitives, and blending correctly across the whole pipeline (VDP1+VDP2).

An example of a real approach to start with: Burning Rangers.

In my analysis table we have 65 titles that use VDP1 CC half-transparency. We can research the advantages and disadvantages in each case (geometry, layer, quantity, pixel area, color type and VRAM pool, texture size, color calculation used...) and draw the right conclusions for our objective. Likewise, there is another column for analyzing VDP2 semi-transparency... Feel free to research and share! :)

What if we just ask an sh-2 to make a distorted version of the shadow to be displayed as a sprite with affine transformation?

Everything is possible. But are you aware of the programming implications? Time, effort and knowledge. Right now SS homebrew is at an early stage. With the public official SDKs or Jo Engine, all the things you propose are still far off.

But a shadow’s just one color, so it shouldn’t matter that much

Of course it matters. Whether it is 1 color or 32,000 colors is not the important part. The issue is how those colors are painted by the VDP1. Think about it carefully for a while and you will see it is not so simple.

To understand the problem of transparency and redraw on the SS, we have to dig deeper into our technical knowledge of the SS graphics pipeline.

Tessellating a quad will not help reduce overdraw or pixel redraw with VDP1 H-T.

Besides that, using VDP1 H-T (plain H-T, not Gouraud + H-T) comes with the following:
- If the quad is very deformed it will take up to 6 drawing cycles (according to the documentation).
- Ideally, if it does not redraw any pixel, up to 2 cycles (this is speculation on my part, since it is in essence the same as Gouraud).
- Restriction of color types to work well within VDP1.
- It will never see the VDP2.
- Simple to program.
- Unlimited blend layers.
- 1 level of transparency: 50% blend.

On the other hand, if we use VDP2 transparency with VDP1 sprites:
- It will not have the redraw problem.
- It is done in one cycle.
- Restriction of color types to work well within VDP2.
- It will never see the VDP1.
- More complicated to program.
- Up to 2 blend layers: 1 real transparency, the other the MSB shadow function with a lot of restrictions. In practice, 1 effective transparency layer.
- Up to 32 levels of transparency.
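To make the difference between "one 50% level" and "up to 32 levels" concrete, here is a small software sketch on 15-bit RGB values (5 bits per channel). It is only an illustration of the arithmetic: the function names are hypothetical, and the linear weighting `(level + 1) / 32` in `blend_ratio` is an assumption chosen for clarity, not the hardware's register encoding.

```c
#include <stdint.h>

/* Extract one 5-bit channel of a 15-bit RGB value (bit 15 is ignored). */
static unsigned channel(uint16_t c, unsigned shift)
{
    return (c >> shift) & 0x1Fu;
}

/* Half-transparency: every channel is the average of source and destination. */
static uint16_t blend_half(uint16_t src, uint16_t dst)
{
    uint16_t out = 0;
    for (unsigned shift = 0; shift <= 10; shift += 5)
        out |= (uint16_t)(((channel(src, shift) + channel(dst, shift)) >> 1) << shift);
    return out;
}

/* Ratio blend with 32 levels: level 0..31 gives the top layer a weight of
 * (level + 1) / 32 (assumed linear weighting, for illustration only). */
static uint16_t blend_ratio(uint16_t top, uint16_t bottom, unsigned level)
{
    unsigned w = (level & 31u) + 1u;
    uint16_t out = 0;
    for (unsigned shift = 0; shift <= 10; shift += 5)
        out |= (uint16_t)(((channel(top, shift) * w +
                            channel(bottom, shift) * (32u - w)) >> 5) << shift);
    return out;
}
```

In this sketch `blend_half` always gives the same result as `blend_ratio` at level 15, which is the single level that VDP1 H-T offers.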

Unless you want to hack his account and steal his code, he won't share Sonic R's code, Sega owns it.
The scu dsp is super hard because of all the restrictions around it, it has its own assembly language, you need to dma data to its own ram, it has no division unit, you need to either scu dma data somewhere or fetch it with the sh2, it has very little ram and it's running at half the clock rate of the sh2. It's probably faster 99% of the time to just use the sh2.

To corvusd, the quad count isn't that relevant. A draw command takes something like 70 cycles minimum. The rest depends on the texture size, the drawn pixels, the color calculation functions used, etc.
So lots of small polygons won't hurt performance that much, except maybe on the cpu side.
The key is to reduce overdraw.
Sonic R used a pvs, so it didn't eliminate overdraw but that's fast enough.
Slavedriver engine games reduced the overdraw to a minimum, but it came at heavy cost for the cpu, with a complex portal system.
In Sonic Z-Treme, I still don't have a pvs so there is a lot of overdraw, but thanks to the mipmapping and lod, it's still very fast, drawing something like 1000 quads at 30 fps in some situations. In some scenes, the quad count is low (300-400), but so many quads are merged that it would otherwise be maybe 1200 quads.
But the overdraw is still what's preventing even more polygons on screen, more than the cpu.
So using the scu dsp wouldn't have such an impact right now.
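A back-of-the-envelope check of why the quad count alone matters so little, taking the "something like 70 cycles" per draw command above at face value. The clock value is an assumption (an NTSC system clock of about 28.6 MHz), and everything that depends on texture size, drawn pixels and color calculation is deliberately left out, so this only bounds the fixed per-command part.

```c
#include <stdint.h>

#define VDP1_CLOCK_HZ       28636360u /* assumption: NTSC system clock, ~28.6 MHz */
#define CMD_OVERHEAD_CYCLES 70u       /* minimum per draw command, figure quoted above */

/* Cycles available in one displayed frame at the given frame rate. */
static uint32_t cycles_per_frame(uint32_t fps)
{
    return VDP1_CLOCK_HZ / fps;
}

/* Fixed command overhead as parts per thousand of the frame budget. */
static uint32_t overhead_permille(uint32_t quads, uint32_t fps)
{
    return (quads * CMD_OVERHEAD_CYCLES * 1000u) / cycles_per_frame(fps);
}
```

Under these assumptions, 1000 quads at 30 fps cost 70,000 cycles of fixed overhead out of roughly 954,545, about 7% of the frame. The rest of the budget goes to pixels, which is consistent with overdraw being the real limit.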

Good! This "70 cycles" value is my nightmare. I cannot understand this value or the rest of the formula from the SoE tutorial. Likewise, if it is possible to write a formula, it should be possible to calculate the maximum throughput to the VDP1. I really think that is very difficult, for the SS but equally for the PSX. I am convinced that the only thing that really helped to optimize for the GPU on the PSX was the Performance Analyzer, which extracts and shows a great amount of data, making it easier to see all the data flow in the system and to focus on the GPU drawing state.

Saturn SCU DSP Demonstration Program

The DSP sample program performs 3D point transformation, i.e. it multiplies a 4x3
homogeneous matrix by an arbitrary list of 3-element vectors (the fourth element of each
vector is presumed to be 1). The program attempts to take full advantage of the
parallelism built into the DSP, and the transformation matrix, the input points, and the
output points are transferred using the SCU’s DMA capability. The sample code
performs point transformations roughly a third faster than the equivalent code written in
SH2 assembly language, even allowing for the time spent transferring data into and out of
the DSP’s memory. It is hoped that this program is general and useful enough to be used
in an actual development environment.
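For reference, the operation that the sample program performs can be written in plain C as below. This is not the DSP program itself, only a sketch of the same math: a 4x3 matrix applied to 3-element points whose fourth element is presumed to be 1. The 16.16 fixed-point format, the row-vector layout (translation in the fourth row) and all names are assumptions.

```c
#include <stdint.h>

typedef int32_t fix16_t;                 /* 16.16 fixed point (assumed format) */
#define FIX16_ONE ((fix16_t)1 << 16)

typedef struct { fix16_t x, y, z; } vec3_t;
typedef struct { fix16_t m[4][3]; } mat43_t;  /* rows 0..2: rotation/scale, row 3: translation */

/* out = [x y z 1] * M for every input point. */
static void transform_points(const mat43_t *mat, const vec3_t *in,
                             vec3_t *out, int count)
{
    for (int i = 0; i < count; ++i) {
        fix16_t r[3];
        for (int j = 0; j < 3; ++j) {
            /* Sum of products in a wide accumulator, reformatted once at the end. */
            int64_t acc = (int64_t)in[i].x * mat->m[0][j]
                        + (int64_t)in[i].y * mat->m[1][j]
                        + (int64_t)in[i].z * mat->m[2][j];
            /* The implied fourth element is 1, so row 3 is simply added. */
            r[j] = (fix16_t)(acc >> 16) + mat->m[3][j];
        }
        out[i].x = r[0];
        out[i].y = r[1];
        out[i].z = r[2];
    }
}
```

Each output component is three multiplies and three adds, which is the sum-of-products pattern the DSP is built for.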

Saturn SCU DSP Tutorial

1.2 Advantages of Using the DSP
The DSP is a highly specialized processor intended to efficiently calculate sums of
products, as when performing matrix and vector calculations such as 3D point
transformations or lighting calculations. When performing the sorts of tasks for which it
was designed, the DSP can be faster than the SH2, because it can load operands for one
calculation, perform a second calculation, and store the results of a third calculation in
It can also perform a 32x32 multiply, yielding a 48-bit result, in a single cycle.
The DSP gains an additional advantage when performing fixed-point calculations, since,
when it stores its results to its data RAM, it can store either the lower or the upper 32 bits
of its 48-bit accumulator, whereas the SH2 must take time to explicitly reformat the
results of fixed-point calculations by using the “xtrct” instruction.

1.3 Disadvantages of Using the DSP
The DSP runs at half the clock speed of the SH2, so, while the DSP can multiply in a
single cycle, that cycle is twice as long as one of the SH2’s cycles.
The DSP’s doesn’t have much memory, and the memory it does have is not mapped onto
the system bus, which means that the DSP must continually take time to copy its data
between its own data RAM and the SH2’s work RAM.
The DSP is difficult to program. A routine that could be coded in SH2 assembly
language in half an hour might take half a day to write, debug, and fully optimize on the

6. Parallelism
The DSP’s two main functional units (the ALU and the multiplier) can operate in
parallel, as can its four buses and its four banks of data RAM. As a result, the DSP can
execute up to six instructions in a single cycle, including any or all of the following: one
ALU instruction, an instruction to load the RX or P register, the MOV MUL, P
instruction, an instruction to load the RY register or the accumulator, either the MOV
ALU, A instruction or the CLR A instruction, and a D1-bus instruction. These are the
only instructions that can be used in parallel with each other; other instructions require a
cycle of their very own.

On all the rest, I totally agree with you. But if we offload from the SH2 the calculations that the SCU-DSP can do asynchronously, we could use the slave SH2 with J. Burton's code to reduce the overdraw. How? Part of my idea is the following algorithm sketch, according to these criteria:

Starting data for my hypothesis:
Taking as a basis the Sonic R (or your project) in this case I have the data already:
1 player: 1,266 max @ 30 FPS = 37,980 per second
2-player split screen: 1,374 max @ 30 FPS = 41,220 per second
Of which (Rounded data):
Gouraud Distorted Sprite = 1000
  a) Light source Color = 500
  b) Light Pre-calculated = 500
Flat Distorted Sprite = 200
Sprite Scaled = 100

- Use J. Burton's software rasterizer to rasterize part of the "distorted sprites", converting them into Normal Sprites with the exact pixels, like the PSX. By my estimate, up to 200 quads (400 triangles), judging by the R on the initial screen.
- HSS always enabled for distorted or scaled sprites of 32x32 or larger.
- Do not use textures larger than 64x64, or keep as few of them on screen as possible.
- Enable pre-clipping for quads outside the System Clipping coordinates.
- Disable pre-clipping for quads that are always entirely, or for the most part, inside the System Clipping coordinates.
- User Clipping coordinates always.
- Use local user clipping in zones such as interiors, houses, tunnels, caves, etc.
- Use End Code in textures with masked or transparent areas.
- Use Transparent Pixel only for distorted sprites with texture.
- Using End Code Draw and Transparent Pixel Draw is the most optimal, but gives fewer colors.

Now, let us follow a typical view frustum clipping case:
A) Near Clipping Zone:
1) If the quad forms an angle between 90 and 45deg with the camera view:
  a) Draw with VDP1 without mipmap.
  b) Gouraud is drawn.

2) If the quad forms an angle less than 45deg with the camera view:
  a) Draw with VDP1 with level 2 of mipmap.
  b) Gouraud is drawn.

3) If the quad forms an angle less than 30deg with the camera view:
  a) Change the textured quad to a flat polygon with a precalculated base color (representative of the entire texture).
  b) Gouraud is not drawn.

B) Medium Clipping Zone:
1) If the quad forms an angle between 90 and 45deg with the camera view:
  a) Draw with VDP1, change to mipmap level 2.
  b) Gouraud is drawn.

2) If the quad forms an angle less than 45deg with the camera view:
  a) Change the textured quad to a flat polygon with a precalculated base color (representative of the entire texture).
  b) Gouraud is not drawn. Flat lighting (palette CLUT luminance levels).

3) If the quad forms an angle less than 30deg with the camera view:
  a) Raster on the slave SH2 with a precalculated flat color (representative of the entire texture).
  b) Gouraud is not drawn. No lighting.

C) Far Clipping Zone:
1) If the quad forms an angle between 90 and 45deg with the camera view:
  a) Draw with VDP1, change to a flat polygon with a precalculated base color (representative of the entire texture).
  b) Gouraud is not drawn. Flat lighting (palette CLUT luminance levels).

2) If the quad forms an angle less than 45deg with the camera view:
  a) Raster on the slave SH2 with a precalculated flat color (representative of the entire texture).
  b) Gouraud is not drawn. No lighting.

3) If the quad forms an angle less than 30deg with the camera view:
  a) Raster on the slave SH2 with a precalculated flat color (representative of the entire texture).
  b) Gouraud is not drawn. No lighting.
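The zone and angle cases above amount to a 3x3 lookup table, which could be sketched in C like this. All names are hypothetical. Two points are assumptions on top of the list: the overlapping ranges are read as three bands (45 to 90, 30 up to 45, below 30 degrees), and the near-zone case below 30 degrees, where the list only says that Gouraud is not drawn, is mapped to "no lighting".

```c
typedef enum { ZONE_NEAR, ZONE_MEDIUM, ZONE_FAR } zone_t;

typedef enum {
    DRAW_VDP1_TEXTURED,   /* full texture, no mipmap */
    DRAW_VDP1_MIPMAP2,    /* mipmap level 2 */
    DRAW_VDP1_FLAT,       /* flat polygon, precalculated base color */
    DRAW_SH2_RASTER_FLAT  /* software raster on the slave SH2, flat color */
} draw_mode_t;

typedef enum { LIGHT_GOURAUD, LIGHT_FLAT_CLUT, LIGHT_NONE } light_mode_t;

typedef struct { draw_mode_t draw; light_mode_t light; } lod_t;

/* Rows: zone. Columns: angle band (0: 45..90, 1: 30..<45, 2: <30 degrees). */
static const lod_t lod_table[3][3] = {
    { {DRAW_VDP1_TEXTURED, LIGHT_GOURAUD}, {DRAW_VDP1_MIPMAP2, LIGHT_GOURAUD},
      {DRAW_VDP1_FLAT, LIGHT_NONE} },                     /* near (lighting assumed) */
    { {DRAW_VDP1_MIPMAP2, LIGHT_GOURAUD}, {DRAW_VDP1_FLAT, LIGHT_FLAT_CLUT},
      {DRAW_SH2_RASTER_FLAT, LIGHT_NONE} },               /* medium */
    { {DRAW_VDP1_FLAT, LIGHT_FLAT_CLUT}, {DRAW_SH2_RASTER_FLAT, LIGHT_NONE},
      {DRAW_SH2_RASTER_FLAT, LIGHT_NONE} }                /* far */
};

/* angle_deg: angle between the quad and the camera view, 0..90. */
static lod_t select_lod(zone_t zone, int angle_deg)
{
    int band = (angle_deg >= 45) ? 0 : (angle_deg >= 30) ? 1 : 2;
    return lod_table[zone][band];
}
```

Keeping the rules in a table makes it cheap to evaluate per quad and easy to retune the thresholds or swap a mode without touching the selection code.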

I am convinced that we can push the optimization of the overdraw problem to the limit.

And this is just an idea. There is still room for improvement.


General discussion about the Sega Saturn / Re: Sega Basic Library 6.0
« on: August 14, 2018, 07:41:28 pm »
Well... I think about the copyright. It is possible that it is not owned by TT alone, but maybe also by SEGA. In any case, J. Burton is not all of TT... and this code is very valuable. If you want, try to contact him and see whether he would share it with the community. Let's go! :D

General discussion about the Sega Saturn / Re: Sega Basic Library 6.0
« on: August 14, 2018, 07:01:05 pm »
"Yes, there is, also the full source code."
Okay, this is good! Source code! :D

"But the main issue is that it DMA the data to vram, causing interruptions for the vdp1. It's just faster to use a buffer and do it all on sh2 like SGL does."
It's true, in another post you told me that.

"Finding a good use for the scu dsp isn't easy, but lighting was the usual choice."
It is not easy, not at all. I know. What is true is that, used well, it can boost performance a lot. The case of Sonic R is clear: it reaches 1300 visible quads at a stable 30 FPS. It is true that it also tunes the types of primitives to use the VDP1 well, but the SCU-DSP is definitely doing work in this game. Not to mention that the engine is very good, with the second SH2 used to rasterize, and everything in assembler. If J. Burton shared the code it would be great! :)

General discussion about the Sega Saturn / Re: Sega Basic Library 6.0
« on: August 14, 2018, 02:29:41 pm »
I think there may be some example code with SCU-DSP transform libraries. Some time ago, on SegaXtreme, someone released an ISO with sample games for SBL 1.1. These samples made intensive use of one SH2 and the SCU-DSP for matrix coordinate transforms. As I remember, the code and libraries are inside the ISO image. This afternoon, when I get back from work, I will try to take a closer look.

In this forum post:

You are here! :)

Very good ideas! Have you thought about how to change the shadow's shape when the camera rotates above it?
