Author Topic: Sega Basic Library 6.0  (Read 805 times)

XL2

  • Sr. Member
  • ****
  • Posts: 343
  • Karma: +75/-1
    • View Profile
Sega Basic Library 6.0
« on: November 06, 2017, 03:16:05 am »
I think it's fair to say that SBL was really overlooked (except for some functions added in SGL as well).

But it is really interesting in that it includes the source code for everything it does.
It includes a sprite (polygon) library (under SPR), with processing using the SCU DSP for matrix transformation.
It's probably slower than SGL overall, but since the source code is included, I think it's very interesting.

Has anyone ever given it a good look?

I haven't found any demos for it.

The 3d part seems more complete than SGL even if it was slower in the end, so I think it's a good starting point.

Things like "inbetween polygons" fix the issue of duplicated vertices after a map subdivision (octree, BSP, grid, etc.), and simply making use of the SCU DSP at all is something SGL didn't even bother doing (unlike high-end games such as Burning Rangers, Quake and Sonic R, which do use it).


corvusd

  • Jr. Member
  • **
  • Posts: 83
  • Karma: +8/-0
    • View Profile
    • Personal Web Portfolio
Re: Sega Basic Library 6.0
« Reply #1 on: August 14, 2018, 02:29:41 pm »
I think there may well be some example code that uses the SCU-DSP transform libraries. Some time ago, someone on SegaXtreme released an ISO with sample games for SBL 1.1. Those samples made heavy use of one SH2 and the SCU-DSP for matrix coordinate transforms. As far as I remember, the code and libraries are inside the ISO image. This afternoon, when I get back from work, I will try to take a closer look.

It is in this forum post: https://segaxtreme.net/threads/sega-saturn-sample-by-sega.24264/

Here you go! :)
« Last Edit: August 14, 2018, 02:34:58 pm by corvusd »
David Gámiz Jiménez

XL2

Re: Sega Basic Library 6.0
« Reply #2 on: August 14, 2018, 06:28:08 pm »
Yes, there is, also the full source code.
But the main issue is that it DMAs the data to VRAM, causing interruptions for the VDP1. It's just faster to use a buffer and do it all on the SH2 like SGL does.
Finding a good use for the SCU DSP isn't easy, but lighting was the usual choice.

corvusd

Re: Sega Basic Library 6.0
« Reply #3 on: August 14, 2018, 07:01:05 pm »
"Yes, there is, also the full source code."
Okay, this is good! Source code! :D

"But the main issue is that it DMAs the data to VRAM, causing interruptions for the VDP1. It's just faster to use a buffer and do it all on the SH2 like SGL does."
That's true, you told me that in another post.

"Finding a good use for the SCU DSP isn't easy, but lighting was the usual choice."
It is not easy, not at all, I know. What is true is that, used well, it can boost performance a lot. Sonic R is a clear case, reaching 1300 visible quads at a stable 30 FPS. It is true that it also chooses its primitive types to make good use of the VDP1, but the SCU-DSP is definitely doing work in this game. Not to mention that the engine is very good, using the second SH2 to rasterize, with everything in assembler. If J. Burton shared the code it would be great! :)
David Gámiz Jiménez

20EnderDude20

  • Full Member
  • ***
  • Posts: 115
  • Karma: +6/-0
  • I'm also known as "The Blender Fiddler" on Youtube
    • View Profile
    • Youtube Channel
Re: Sega Basic Library 6.0
« Reply #4 on: August 14, 2018, 07:10:48 pm »
What’s stopping us from obtaining the source code from him?

corvusd

Re: Sega Basic Library 6.0
« Reply #5 on: August 14, 2018, 07:41:28 pm »
Well... I think it's the copyright. It's possible that the code isn't owned by TT alone, but maybe by SEGA as well. In any case, J. Burton isn't all of TT... and this code is very valuable. If you want, try to contact him and see whether he will share it with the community. Let's go! :D
David Gámiz Jiménez

20EnderDude20

Re: Sega Basic Library 6.0
« Reply #6 on: August 14, 2018, 07:57:47 pm »
Why are DSPs so hard to grasp, as demonstrated especially by the infamous difficulty of programming the DSPs on the Jaguar?

XL2

Re: Sega Basic Library 6.0
« Reply #7 on: August 14, 2018, 08:25:28 pm »
Unless you want to hack his account and steal his code, he won't share Sonic R's code; Sega owns it.
The SCU DSP is super hard because of all the restrictions around it: it has its own assembly language, you need to DMA data to its own RAM, it has no division unit, you need to either SCU DMA data somewhere or fetch it with the SH2, it has very little RAM, and it runs at half the clock rate of the SH2. It's probably faster 99% of the time to just use the SH2.

To corvusd: the quad count isn't that relevant. A draw command takes something like 70 cycles minimum. The rest depends on the texture size, the drawn pixels, the color calculation functions used, etc.
So lots of small polygons won't hurt performance that much, except maybe on the CPU side.
The key is to reduce overdraw.
Sonic R used a PVS, so it didn't eliminate overdraw, but that's fast enough.
Slavedriver engine games reduced the overdraw to a minimum, but it came at a heavy cost for the CPU, with a complex portal system.
In Sonic Z-Treme, I still don't have a PVS so there is a lot of overdraw, but thanks to the mipmapping and LOD, it's still very fast, drawing something like 1000 quads at 30 fps in some situations. In some scenes, the quad count is low (300-400), but so many quads are merged that it would otherwise be maybe 1200 quads.
But the overdraw is still what's preventing even more polygons on screen, more than the CPU.
So using the SCU DSP wouldn't have such an impact right now.
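
A rough way to see why the command count matters less than the fill is to put the two terms side by side. The sketch below is only an illustration: the VDP1 clock value and the per-pixel cost are assumptions (both depend on video mode, texture format and color calculation), and only the 70-cycle figure comes from the post above.

```c
#include <stdint.h>

/* Illustrative assumptions, not measured values. */
#define VDP1_CLOCK_HZ    28636360u  /* assumed VDP1 clock; depends on video mode */
#define CMD_SETUP_CYCLES 70u        /* "something like 70 cycles minimum" per draw command */

/* Cycles the VDP1 has available for one frame at the given frame rate. */
uint32_t vdp1_cycles_per_frame(uint32_t fps)
{
    return VDP1_CLOCK_HZ / fps;
}

/* Rough frame cost: fixed per-command overhead plus a per-pixel fill term.
 * cycles_per_pixel is a hypothetical tuning parameter. */
uint32_t vdp1_frame_cost(uint32_t quads, uint32_t pixels_drawn,
                         uint32_t cycles_per_pixel)
{
    return quads * CMD_SETUP_CYCLES + pixels_drawn * cycles_per_pixel;
}
```

With these assumed numbers, 1000 commands cost 70,000 cycles of setup, which is under a tenth of the roughly 954,000 cycles in a 30 fps frame, so the pixel term (and therefore overdraw) dominates.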

20EnderDude20

Re: Sega Basic Library 6.0
« Reply #8 on: August 14, 2018, 09:06:53 pm »
I don’t understand how a barrel shifter takes up a lot of space on a die, but I guess it’s enough to even make the dsp for the n64 have no barrel shifter for the new instructions. Comparisons aside, is the RAM for the dsp slower than the RAM for the sh-2’s?

corvusd

Re: Sega Basic Library 6.0 [EDIT: Sorry, Lots of corrections]
« Reply #9 on: August 14, 2018, 11:59:10 pm »
Quote
Unless you want to hack his account and steal his code, he won't share Sonic R's code; Sega owns it.
The SCU DSP is super hard because of all the restrictions around it: it has its own assembly language, you need to DMA data to its own RAM, it has no division unit, you need to either SCU DMA data somewhere or fetch it with the SH2, it has very little RAM, and it runs at half the clock rate of the SH2. It's probably faster 99% of the time to just use the SH2.

To corvusd: the quad count isn't that relevant. A draw command takes something like 70 cycles minimum. The rest depends on the texture size, the drawn pixels, the color calculation functions used, etc.
So lots of small polygons won't hurt performance that much, except maybe on the CPU side.
The key is to reduce overdraw.
Sonic R used a PVS, so it didn't eliminate overdraw, but that's fast enough.
Slavedriver engine games reduced the overdraw to a minimum, but it came at a heavy cost for the CPU, with a complex portal system.
In Sonic Z-Treme, I still don't have a PVS so there is a lot of overdraw, but thanks to the mipmapping and LOD, it's still very fast, drawing something like 1000 quads at 30 fps in some situations. In some scenes, the quad count is low (300-400), but so many quads are merged that it would otherwise be maybe 1200 quads.
But the overdraw is still what's preventing even more polygons on screen, more than the CPU.
So using the SCU DSP wouldn't have such an impact right now.

Good! This "70 cycles" value is my nightmare. I cannot understand this value or the rest of the formula from the SoE tutorial. Likewise, if it is possible to write a formula, it should be possible to calculate the maximum amount of data the VDP1 can take. I really think that is very difficult, for the SS and equally for the PSX. I am convinced that the only thing that really helped to optimize for the GPU on PSX was the Performance Analyzer, which extracts and shows a great amount of data, making it easier to follow the whole data flow in the system and to focus on the GPU's drawing state.

EDIT:
About SCU-DSP
Quote
SEGA SATURN TECHNICAL BULLETIN #SOA-10
Saturn SCU DSP Demonstration Program

The DSP sample program performs 3D point transformation, i.e. it multiplies a 4x3
homogeneous matrix by an arbitrary list of 3-element vectors (the fourth element of each
vector is presumed to be 1). The program attempts to take full advantage of the
parallelism built into the DSP, and the transformation matrix, the input points, and the
output points are transferred using the SCU’s DMA capability. The sample code
performs point transformations roughly a third faster than the equivalent code written in
SH2 assembly language, even allowing for the time spent transferring data into and out of
the DSP’s memory. It is hoped that this program is general and useful enough to be used
in an actual development environment.
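
For reference, the operation the bulletin describes can be sketched in plain C as below. This is not the DSP sample program itself, only an illustration of the math: the 16.16 fixed-point format, the 3-row by 4-column storage with the translation in the last column, and all type and function names are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

typedef int32_t fixed_t;                 /* 16.16 fixed point (assumed format) */
#define FIXED_ONE ((fixed_t)0x10000)

typedef struct { fixed_t x, y, z; } vec3_t;
typedef struct { fixed_t m[3][4]; } mat34_t;   /* column 3 holds the translation */

/* 16.16 multiply: widen to 64 bits, then drop the extra 16 fraction bits.
 * Relies on arithmetic right shift for negative values, as common compilers do. */
fixed_t fixed_mul(fixed_t a, fixed_t b)
{
    return (fixed_t)(((int64_t)a * (int64_t)b) >> 16);
}

/* out[i] = M * (in[i].x, in[i].y, in[i].z, 1): the fourth element is presumed to be 1. */
void transform_points(const mat34_t *mat, const vec3_t *in, vec3_t *out, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        fixed_t r[3];
        for (int row = 0; row < 3; row++) {
            const fixed_t *m = mat->m[row];
            r[row] = fixed_mul(m[0], in[i].x)
                   + fixed_mul(m[1], in[i].y)
                   + fixed_mul(m[2], in[i].z)
                   + m[3];               /* translation term, multiplied by the implicit 1 */
        }
        out[i].x = r[0];
        out[i].y = r[1];
        out[i].z = r[2];
    }
}
```

Each output coordinate is a sum of three products plus a constant, which is exactly the sum-of-products shape the DSP is built for.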

Quote
SEGA SATURN TECHNICAL BULLETIN #SOA-8
Saturn SCU DSP Tutorial



1.2 Advantages of Using the DSP
The DSP is a highly specialized processor intended to efficiently calculate sums of
products, as when performing matrix and vector calculations such as 3D point
transformations or lighting calculations. When performing the sorts of tasks for which it
was designed, the DSP can be faster than the SH2, because it can load operands for one
calculation, perform a second calculation, and store the results of a third calculation in
parallel. It can also perform a 32x32 multiply, yielding a 48-bit result, in a single cycle.
The DSP gains an additional advantage when performing fixed-point calculations, since,
when it stores its results to its data RAM, it can store either the lower or the upper 32 bits
of its 48-bit accumulator, whereas the SH2 must take time to explicitly reformat the
results of fixed-point calculations by using the “xtrct” instruction.

1.3 Disadvantages of Using the DSP
The DSP runs at half the clock speed of the SH2, so, while the DSP can multiply in a
single cycle, that cycle is twice as long as one of the SH2’s cycles.
The DSP doesn’t have much memory, and the memory it does have is not mapped onto
the system bus, which means that the DSP must continually take time to copy its data
between its own data RAM and the SH2’s work RAM.
The DSP is difficult to program. A routine that could be coded in SH2 assembly
language in half an hour might take half a day to write, debug, and fully optimize on the
DSP.


6. Parallelism
The DSP’s two main functional units (the ALU and the multiplier) can operate in
parallel, as can its four buses and its four banks of data RAM. As a result, the DSP can
execute up to six instructions in a single cycle, including any or all of the following: one
ALU instruction, an instruction to load the RX or P register, the MOV MUL, P
instruction, an instruction to load the RY register or the accumulator, either the MOV
ALU, A instruction or the CLR A instruction, and a D1-bus instruction. These are the
only instructions that can be used in parallel with each other; other instructions require a
cycle of their very own.
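
Section 1.2 of the tutorial mentions that the SH2 has to reformat fixed-point results with "xtrct". A small C model of that step may make it clearer. It assumes 16.16 fixed point and the usual SH2 sequence of a signed 32x32 multiply (dmuls.l) leaving a 64-bit product in MACH:MACL; the function names are made up for the example.

```c
#include <stdint.h>

/* Model of SH2 "xtrct Rm,Rn": take the middle 32 bits of the 64-bit pair Rm:Rn. */
uint32_t xtrct(uint32_t rm, uint32_t rn)
{
    return (rm << 16) | (rn >> 16);
}

/* 16.16 multiply done the SH2 way: full 64-bit product split into
 * high (MACH) and low (MACL) halves, then reformatted with xtrct. */
int32_t fixed_mul_sh2_style(int32_t a, int32_t b)
{
    int64_t product = (int64_t)a * (int64_t)b;
    uint32_t mach = (uint32_t)((uint64_t)product >> 32);  /* upper 32 bits */
    uint32_t macl = (uint32_t)(uint64_t)product;          /* lower 32 bits */
    return (int32_t)xtrct(mach, macl);
}
```

On the SH2 this reformatting costs extra instructions after every multiply, whereas the tutorial says the DSP picks the half of the accumulator it wants as part of the store.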

I totally agree with you on all the rest. But what if we offload from the SH2 the calculations that the SCU-DSP can do asynchronously? We could then use the slave SH2 with J. Burton's code to reduce the overdraw. How? Part of my idea is the following algorithm sketch, based on these criteria:

Starting data for my hypothesis:
Taking Sonic R (or your project) as a basis, since I already have the data for it:
1 player: 1,266 quads max @ 30 FPS = 37,980 quads/s
2-player split screen: 1,374 quads max @ 30 FPS = 41,220 quads/s

Of which (rounded data):
Gouraud Distorted Sprite = 1000
  a) Light source color = 500
  b) Pre-calculated light = 500
Flat Distorted Sprite = 200
Scaled Sprite = 100

Requirements:
- Use J. Burton's software rasterizer to rasterize part of the "distorted sprites", converting them into Normal Sprites with the exact pixels, like on PSX. In my estimate, up to 200 quads (400 triangles), according to the R of the initial screen.
- HSS always enabled for distorted or scaled sprites of 32x32 or larger.
- Do not use textures larger than 64x64, or keep them to a minimum on screen.
- Use Pre-clipping Enable for quads outside the System Clipping Coordinates.
- Use Pre-clipping Disable for quads that are always entirely, or for the most part, inside the System Clipping Coordinates.
- Always use User Clipping Coordinates.
- Use User Local Clipping in zones such as interiors, houses, tunnels, caves, etc.
- Use End Code in textures with masked or transparent areas.
- Use Transparent Pixel only for textured distorted sprites.
- Using End Code Draw and Transparent Pixel Draw is the most optimal, but gives fewer colors.
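
Two of the rules above (HSS by sprite size, pre-clipping by position) can be decided per quad with a small helper. The following is a sketch only: the types and names are hypothetical, it returns plain flags rather than real VDP1 command bits, and it simplifies "inside for the most part" to "fully inside".

```c
#include <stdbool.h>

typedef struct { int x0, y0, x1, y1; } rect_t;   /* inclusive screen-space bounds */

typedef struct {
    bool high_speed_shrink;   /* HSS on for sprites of 32x32 or larger */
    bool pre_clipping;        /* pre-clipping on for quads not fully inside */
} draw_flags_t;

/* Pick the per-quad flags from the quad's screen bounding box, the system
 * clipping rectangle, and the sprite size in pixels. */
draw_flags_t choose_draw_flags(rect_t quad, rect_t system_clip,
                               int sprite_w, int sprite_h)
{
    draw_flags_t flags;
    bool fully_inside = quad.x0 >= system_clip.x0 && quad.y0 >= system_clip.y0 &&
                        quad.x1 <= system_clip.x1 && quad.y1 <= system_clip.y1;

    flags.pre_clipping = !fully_inside;
    flags.high_speed_shrink = (sprite_w >= 32 && sprite_h >= 32);
    return flags;
}
```

A real implementation would translate these flags into the draw mode word of each command and would probably want a tolerance for quads that only slightly cross the screen edge.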

Now, let's follow a typical view frustum clipping case:
A) Near Clipping Zone:
1) If the quad forms an angle between 90 and 45 deg with the camera view:
  a) Draw with VDP1 without mipmap.
  b) Gouraud is drawn.

2) If the quad forms an angle between 45 and 30 deg with the camera view:
  a) Draw with VDP1 with mipmap level 2.
  b) Gouraud is drawn.

3) If the quad forms an angle of less than 30 deg with the camera view:
  a) Change the textured quad to a flat polygon with a precalculated base color (representative of the entire texture).
  b) Gouraud is not drawn.

B) Medium Clipping Zone:
1) If the quad forms an angle between 90 and 45 deg with the camera view:
  a) Draw with VDP1, changing to mipmap level 2.
  b) Gouraud is drawn.

2) If the quad forms an angle between 45 and 30 deg with the camera view:
  a) Change the textured quad to a flat polygon with a precalculated base color (representative of the entire texture).
  b) Gouraud is not drawn. Flat lighting (palette CLUT luminance levels).

3) If the quad forms an angle of less than 30 deg with the camera view:
  a) Rasterize on the slave SH2 with a precalculated flat color (representative of the entire texture).
  b) Gouraud is not drawn. No lighting.

C) Far Clipping Zone:
1) If the quad forms an angle between 90 and 45 deg with the camera view:
  a) Draw with VDP1, changing to a flat polygon with a precalculated base color (representative of the entire texture).
  b) Gouraud is not drawn. Flat lighting (palette CLUT luminance levels).

2) If the quad forms an angle between 45 and 30 deg with the camera view:
  a) Rasterize on the slave SH2 with a precalculated flat color (representative of the entire texture).
  b) Gouraud is not drawn. No lighting.

3) If the quad forms an angle of less than 30 deg with the camera view:
  a) Rasterize on the slave SH2 with a precalculated flat color (representative of the entire texture).
  b) Gouraud is not drawn. No lighting.
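
The three zones and three angle cases above amount to a 3x3 lookup table, which could be sketched as follows. All names are hypothetical, and the code reads the three cases in each zone as non-overlapping bands (45 to 90 deg, 30 to 45 deg, below 30 deg), which is an interpretation. For the near zone below 30 deg the list only says Gouraud is not drawn, so "no lighting" there is an assumption too.

```c
typedef enum { ZONE_NEAR, ZONE_MEDIUM, ZONE_FAR } zone_t;
typedef enum { RENDER_VDP1, RENDER_SH2_SLAVE } renderer_t;
typedef enum { SURF_TEXTURE_FULL, SURF_TEXTURE_MIP2, SURF_FLAT_COLOR } surface_t;
typedef enum { LIGHT_GOURAUD, LIGHT_FLAT_CLUT, LIGHT_NONE } lighting_t;

typedef struct {
    renderer_t renderer;   /* who draws the quad */
    surface_t  surface;    /* full texture, mipmap level 2, or flat base color */
    lighting_t lighting;   /* Gouraud, flat CLUT luminance, or none */
} draw_plan_t;

/* 0: 45 deg and above, 1: 30 up to 45 deg, 2: below 30 deg. */
static int angle_band(int angle_deg)
{
    if (angle_deg >= 45) return 0;
    if (angle_deg >= 30) return 1;
    return 2;
}

/* Look up how a quad should be drawn from its depth zone and view angle. */
draw_plan_t plan_quad(zone_t zone, int angle_deg)
{
    static const draw_plan_t table[3][3] = {
        /* Near */
        { { RENDER_VDP1, SURF_TEXTURE_FULL, LIGHT_GOURAUD },
          { RENDER_VDP1, SURF_TEXTURE_MIP2, LIGHT_GOURAUD },
          { RENDER_VDP1, SURF_FLAT_COLOR,   LIGHT_NONE } },   /* lighting assumed */
        /* Medium */
        { { RENDER_VDP1,      SURF_TEXTURE_MIP2, LIGHT_GOURAUD },
          { RENDER_VDP1,      SURF_FLAT_COLOR,   LIGHT_FLAT_CLUT },
          { RENDER_SH2_SLAVE, SURF_FLAT_COLOR,   LIGHT_NONE } },
        /* Far */
        { { RENDER_VDP1,      SURF_FLAT_COLOR, LIGHT_FLAT_CLUT },
          { RENDER_SH2_SLAVE, SURF_FLAT_COLOR, LIGHT_NONE },
          { RENDER_SH2_SLAVE, SURF_FLAT_COLOR, LIGHT_NONE } },
    };
    return table[zone][angle_band(angle_deg)];
}
```

Keeping the policy in a table makes it cheap to evaluate per quad and easy to retune once real overdraw measurements are available.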

I am convinced that we can push the optimization of the overdraw problem to its limit.

And this is just an idea. There is still room for improvement.

Greetings!
« Last Edit: August 20, 2018, 01:11:44 pm by corvusd »
David Gámiz Jiménez

 
