Messages - ponut64

16
What you want to use is RBG0 of VDP2.

RBG0 is a 3D rotating, scaling, and tiling plane.

In this thread, XL2 attached a demo that uses RBG0.
https://forum.jo-engine.org/index.php?topic=864.0

There are no Jo engine abstractions to use RBG0.
Here is a more direct example of the parameter setup for an RBG0 plane:
Code: [Select]
slRparaInitSet((void *)RBG0_PRA_ADR);
slMakeKtable((void *)RBG0_KTB_ADR);
slCharRbg0(COL_TYPE_256 , CHAR_SIZE_1x1);
slPageRbg0((void *)RBG0RB_CEL_ADR , 0 , PNB_1WORD|CN_12BIT);

slPlaneRA(PL_SIZE_1x1);
sl1MapRA((void *)RBG0RA_MAP_ADR);
slOverRA(0);
slKtableRA((void *)RBG0_KTB_ADR , K_MODE0 | K_FIX | K_LINE | K_2WORD | K_ON | K_LINECOL);
Cel2VRAM(tuti_cel , (void *)RBG0RA_CEL_ADR , 65536);
Map2VRAM(tuti_map , (void *)RBG0RA_MAP_ADR , 64 , 64 , 2 , 884);
Pal2CRAM(tuti_pal , (void *)RBG0RA_COL_ADR , 160);

slRparaMode(RA);
slBack1ColSet((void *)BACK_COL_ADR , 0);

slColorCalc( CC_ADD | CC_TOP | NBG2ON | RBG0ON);
slColorCalcOn( NBG2ON | RBG0ON);
slScrTransparent(NBG2ON | RBG0ON);
slColRateLNCL(0x00);

To display & transform the RBG0 plane, you need to end the matrix it is in with the following:

Code: [Select]
slCurRpara(RA);
slScrMatConv();
slScrMatSet();

slCurRpara sets RBG0's rotation parameter type.
slScrMatConv converts the matrix to a scroll screen matrix. (Note that this consumes the matrix in the process, so do not set polygons in this matrix.)
slScrMatSet sets the converted matrix and sends it to VDP2 rather than VDP1.

In addition, you may note this command:
Code: [Select]
	
slScrAutoDisp(NBG0ON | RBG0ON );

slScrAutoDisp sets data inside of SGL (the development library) to notify VDP2 of which scroll screens are to be displayed automatically each frame.


I HIGHLY RECOMMEND YOU READ VDP2 USER'S MANUAL REGARDING THE SCROLL SCREEN, OR THE SEGA OF EUROPE TUTORIAL DOCUMENT.
Those can be found here:
https://antime.kapsi.fi/sega/docs.html

17
Project announcement / Re: what p64 does
« on: April 15, 2019, 04:05:55 am »
https://youtu.be/JkOGICAd2Ew

Notes:
The DSP is being used. I have attached the DSP program.

Performance Hints:
The best path to improved performance on the Saturn is what is frequently called "Data Oriented Design", or "DOD".
In general, your philosophy is to ensure the least amount of data is processed, moved, and accessed.
This is absolutely at odds with the prevalent modern philosophy of programming called "Object Oriented Design", or "OOD".

SGL Anomalies

SGL's documents state that SCU DMA Channel 0 and CPU DMA Channel 0 are "free" in SGL.
However, my observations indicate otherwise.
Let me try and walk you through what was happening.

First, we have a model. It's 625 vertices and 576 polygons.
Let's say we recalculate these polygons' normals and textures every frame before we send that model to be drawn with slPutPolygon.
Normally? This is actually OK. Using fast inverse square root, calculating a normal is a relatively inexpensive task (less than 100 instructions).
If we just wrote that out straight as 100, and did that 576 times, we get 57600 instructions.
Considering an SH2 has 28,000 instructions to offer per millisecond, it only takes about 2ms to perform that task.
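
To make the arithmetic above explicit, here is a small C sketch. The figures are the ones from this post (576 polygons, an upper bound of 100 instructions per normal, roughly 28,000 instructions per millisecond). The function names are made up for illustration, and memory access time is deliberately ignored.

```c
#include <assert.h>

/* Figures quoted in the post; the per-millisecond rate is the post's estimate. */
#define POLYGON_COUNT     576
#define INSTR_PER_NORMAL  100
#define SH2_INSTR_PER_MS  28000

/* Total instructions needed to recompute every normal once. */
static long normal_pass_instructions(long polygons, long instr_per_normal)
{
    return polygons * instr_per_normal;
}

/* Estimated cost in microseconds, ignoring all memory access time. */
static long normal_pass_microseconds(long polygons, long instr_per_normal,
                                     long instr_per_ms)
{
    assert(instr_per_ms > 0);
    return (normal_pass_instructions(polygons, instr_per_normal) * 1000)
           / instr_per_ms;
}
```

With POLYGON_COUNT, INSTR_PER_NORMAL and SH2_INSTR_PER_MS this works out to 57,600 instructions and about 2057 microseconds, which is the "about 2ms" figure.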

But there is a problem with this theory.
The first problem is that we have to read the vertices data from memory. This is not instantaneous.
Another problem is we have to write normals back to memory.
The final problem is the Slave SH2 needs to access this data to draw the polygon, as SGL's default behavior is to use the Slave SH2 for all polygon / matrix processing.

However, by itself, this process does not cause any major performance or synchronization problem.
Remember your program runs on the MSH2. If you have the MSH2 perform this updating normals task before slPutPolygon is reached, it will all be fine.
You might get some small bus contention if the SSH2 is currently crunching numbers on some previous model you sent, but that is not a major concern.

But let me try and read into this further.
An important part of SGL is that it is wisely set up to be sending draw commands to VDP1 in a way that wastes the least time possible.
Keep in mind that accessing VDP1's memory will halt VDP1's operation until some cycles after memory access is complete.
Because of this, you should not frequently access VDP1's memory to inform it about what to draw. Rather, SGL prepares all of your draw commands and sends them to VDP1 in one big batch.

SGL does not do this in a linear fashion, however. It is buffered so the SH2s don't waste time waiting for VDP1 to finish drawing so they can send the next frame.
As far as I understand it, your code in immediacy is running two frames behind the frame that is currently being sent to VDP1.
One thing this means is your frame-time is inevitably limited to the transfer time of the frame's data. So there is some time wasted, maybe 4ms?
So the SH2s and VDP1 do not have the full frame-time to render a frame.

This fact alone, you can reasonably ignore except for knowing that you don't actually have 16/33/50/66ms to do anything, always less.
More importantly, the Slave SH2 pretty much _always_ wants access to the memory to transfer the frame to VDP1. Maybe not always, but it is safer to assume so!
So not only does the SSH2 need memory access, it also needs a DMA channel.

Keep in mind that no two processors can access high/low memory at a time. There is a priority order.
The order is (in SGL): SCU > Master > Slave.

So what happens if SGL ABSOLUTELY MUST send the next frame on time (that is its goal: to be fast at 3D), but your code is accessing high memory, and the processor which manages this is the Slave SH2?
... You can call me out on this if you know differently, but my assumption is that the Slave SH2 will switch to using SCU DMA Channel 0 (the normal channel for slDMACopy) to gain priority over MSH2.

This causes a number of problems.
1. The SCU is slower at accessing memory than the SH2s. [Assumption, unverified, but I think it is]
2. The SCU-DSP may only DMA using SCU DMA Channel 0, so this may directly contend with a DSP program and potentially cause it to malfunction.
3. SBL's file system uses SCU DMA Channel 0 if you set GFS_TMODE_SCU.
4. Imagine any other thing that might be using SCU DMA Channel 0, an assumed "free" channel, and guess what would happen if SGL suddenly wants to use it.
5. SCU can't access low memory. So contention in this range is guaranteed wait cycles. Frankly, that's probably better!

Again, this typically is not a problem, but because SGL is a black box it is unknown when it may enter this condition. It does not fire interrupts when it is or isn't transferring the next frame's data.

How could I come to this conclusion, and where _might_ it be a problem?

In my case, I had two DSP programs and file system transfers active using SCU DMA (to of course leave a DMA channel open on MSH2).
And, lo and behold, if you calculate 576 poly normals and have file system access via SCU (to the SCSP area), every frame in which both are happening will spike to 50ms.

This is an interesting cascade of contentions.
First, MSH2 and SSH2 want to access memory at the same time.
For SSH2 to gain priority over MSH2, it commands SCU to access the memory.
Then, an unexpected DMA channel is used, which interferes with SH-1, DSP, and MSH2.
In turn, this interferes once again with the SSH2's desired actions.
All piling up to a long delay before the data ends up in VDP1's memory and we move along to the next frame.
Because we can't just send the draw commands as we go.

Another anomaly is that this contention is theoretically worse than simply making MSH2 or SSH2 wait for memory access.
The file system is independent. It's not waiting for the MSH2 to tell it to start or stop transfers on a sub-frame basis, the SH-1 manages that.
Further, the DMA method used is the SCU. The whole CPU Bus has nothing to do with that data after the read commands are sent.
The DSP is also independent and internal to the SCU. If SCU-DMA Channel 0 is used up, the DSP's default behavior is to wait for completion before continuing.
I verified the DSP being uninvolved in this contention cascade by disabling the DSP programs and instead performing the calculations on MSH2. Still happens.

Finally, I got to testing what would happen if I made the normal calculations at sprite draw end via interrupt. Instead of frame-spikes, now EVERY frame was 66ms.
Then, I tested these calculations using slSlaveFunc, which puts your function after all draw commands on the Slave SH2's stack. Now frame-time depended on render load (could be 33ms, could be 50).
In both cases, the frame-times were decoupled from the file system.

The bottom line? Memory access is bad.
If you're XL2, the bottom line is asynchronous file systems are bad :)
All this complication, but it's that simple.

The solution to this problem is to keep the Master SH2 working more inside of its own cache, rather than stretching computations out so much that they involve more memory access.
Again, computing the normals themselves, no problem for the SH2. But each normal computed is unique data that starts filling cache with junk data that won't be re-used.
Instead of calculating every normal, I solved the problem by first scanning through the polygon data, concatenating the data that changes into a single 32-bit number.
There is then less data to sort through to find out whether a polygon changed. If it DID change, I can look backwards from the direction it moved to find it again.
With that known, only as much as 24 polygons ever need new normals. (1 row in 24x24 = 576).
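
A rough C sketch of the "pack what changes into one 32-bit number" idea follows. The PolyState fields (texture_id, height) and all function names are hypothetical; the post does not say which data was packed, so treat this as an illustration of the technique and not the actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-polygon state: the two fields assumed to change at runtime. */
typedef struct {
    uint16_t texture_id;
    uint16_t height;
} PolyState;

/* Concatenate the changing fields into a single 32-bit key. */
static uint32_t pack_key(const PolyState *p)
{
    return ((uint32_t)p->texture_id << 16) | (uint32_t)p->height;
}

/* Compare each polygon's key against the key stored last frame.
 * Indices of changed polygons are written to changed_out, the stored
 * key is refreshed, and the number of changed polygons is returned. */
static size_t find_changed(const PolyState *polys, uint32_t *prev_keys,
                           size_t count, uint16_t *changed_out)
{
    size_t n = 0;
    for (size_t i = 0; i < count; i++) {
        uint32_t key = pack_key(&polys[i]);
        if (key != prev_keys[i]) {
            prev_keys[i] = key;
            changed_out[n++] = (uint16_t)i;
        }
    }
    return n;
}
```

Only the polygons listed in changed_out would then need their normals recomputed, so the expensive work touches far less memory than a full pass.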

This is an application-specific solution, but it is an example where computational simplicity was actually very much counter-productive. Instead, the program was made more complex, but faster.
Welcome to Saturn.


18
Project announcement / Re: what p64 does
« on: April 14, 2019, 05:18:17 am »

19
Project announcement / Re: what p64 does
« on: April 06, 2019, 12:20:01 pm »
Hello again,

Here is a DSP sample program for finding the normal of a polygon.

Because assembly is so hard to follow when others write it, I make as many comments as possible. It makes it difficult to crawl over the whole document but easier to follow.

/e: Hm, hesitate. I don't think it's working right.
/e2: Fixed errors.
If you are curious:
1. Line 201 was moving the low-order bits ("mov all,mc3") when it should have been moving the high-order bits ("mov alh,mc3").
2. The instruction at line 277 was modified from (mvi 16,PL) to (mvi 17,PL): our shifting output is 1 less than it should be, in comparison to the typical C logic.
3. Line 289 had an extra instruction added after it. This shifts the initial guess back right once. Because the DSP caches instructions, a "loop next instruction" (LPS) command loops its designated number of times while the next instruction has already been pre-fetched, so that instruction also executes. An LPS command will therefore execute the next command 1 more time than indicated in the LOP counter.

20
Project announcement / Re: what p64 does
« on: April 04, 2019, 04:24:46 pm »
Hi again,

Here's an SGL-compatible fast inverse square root function.

Code: [Select]
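FIXED fxisqrt(FIXED input){

	static FIXED xSR = 0;       /* input / 2 */
	static FIXED pushRight = 0; /* working copy used to find the top bit */
	static FIXED msb = 0;       /* shifts needed to bring input below 1.0 */
	static FIXED shoffset = 0;  /* shift that builds the initial guess */
	static FIXED yIsqr = 0;     /* initial guess, a power of two */

	/* Inputs at or below 1.0 (65536) are not handled; this returns the raw value 1. */
	if(input <= 65536){
		return 1;
	}

	xSR = input>>1;
	pushRight = input;
	msb = 0;
	shoffset = 0;
	yIsqr = 0;

	/* Count how many right shifts it takes to drop below 1.0. */
	while(pushRight >= 65536){
		pushRight >>=1;
		msb++;
	}

	/* Initial guess: 2^-(msb/2) in 16.16 fixed point. */
	shoffset = (16 - ((msb)>>1));
	yIsqr = 1<<shoffset;

	/* One Newton-Raphson step: y * (1.5 - (x/2) * y * y), where 98304 is 1.5. */
	return (slMulFX(yIsqr, (98304 - slMulFX(xSR, slMulFX(yIsqr, yIsqr)))));
}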
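
If you want to sanity-check the function above on a PC, a host-side sketch like the following should do. It is an assumption-laden stand-in and not SGL code: fixed_t replaces FIXED, mul_fx is my own 16.16 multiply standing in for slMulFX, and fxisqrt_host is a made-up name. The logic is the same: a power-of-two initial guess, then one Newton-Raphson step.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fixed_t;   /* stand-in for SGL's 16.16 FIXED type */
#define FX_ONE 65536       /* 1.0 in 16.16 */

/* Host stand-in for slMulFX: 16.16 multiply through a 64-bit intermediate. */
static fixed_t mul_fx(fixed_t a, fixed_t b)
{
    return (fixed_t)(((int64_t)a * (int64_t)b) >> 16);
}

/* Same steps as the posted function, written with locals. */
static fixed_t fxisqrt_host(fixed_t input)
{
    if (input <= FX_ONE) {
        return 1;                       /* same early-out as the original */
    }
    fixed_t half = input >> 1;
    int msb = 0;
    for (fixed_t v = input; v >= FX_ONE; v >>= 1) {
        msb++;                          /* shifts needed to drop below 1.0 */
    }
    fixed_t y = (fixed_t)1 << (16 - (msb >> 1));   /* guess: 2^-(msb/2) */
    assert(y > 0);
    /* One Newton-Raphson step: y * (1.5 - (x/2) * y * y). */
    return mul_fx(y, 98304 - mul_fx(half, mul_fx(y, y)));
}
```

Inputs that are exact powers of four come out exact (4.0 gives 0.5, 16.0 gives 0.25). In between, the single iteration is rougher: 2.0 gives 0.625 where the true value is about 0.707.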

21
Project announcement / Re: what p64 does
« on: March 30, 2019, 06:53:49 pm »
The DSP as a Logical Processor

I've started experimenting with the DSP, and have found that, contrary to my preconceptions, it is a logical processor. Now, of course, no one told me it wasn't, I just didn't know it was.
So to familiarize myself with its inner workings, I wrote a bitwise division program. One for signed values, one for unsigned values.

The DSP has no division instruction, so you need to write your own program for it.
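
For anyone unfamiliar with bitwise division, here is the general shift-and-subtract technique written in C. This is not the DSP program itself, just a sketch of the same idea with made-up function names: one quotient bit is produced per iteration using only shifts, compares, and subtracts.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned shift-and-subtract (restoring) division.
 * Optionally reports the remainder through remainder_out. */
static uint32_t udiv_bitwise(uint32_t dividend, uint32_t divisor,
                             uint32_t *remainder_out)
{
    assert(divisor != 0);
    uint32_t quotient = 0;
    uint64_t remainder = 0;   /* 64-bit so the left shift cannot overflow */
    for (int bit = 31; bit >= 0; bit--) {
        /* Bring down the next dividend bit, most significant first. */
        remainder = (remainder << 1) | ((dividend >> bit) & 1u);
        if (remainder >= divisor) {
            remainder -= divisor;
            quotient |= (uint32_t)1 << bit;
        }
    }
    if (remainder_out) {
        *remainder_out = (uint32_t)remainder;
    }
    return quotient;
}

/* Signed version: divide the magnitudes, then restore the sign.
 * Truncates toward zero like C's "/"; INT32_MIN / -1 wraps and is not handled. */
static int32_t sdiv_bitwise(int32_t dividend, int32_t divisor)
{
    uint32_t a = dividend < 0 ? 0u - (uint32_t)dividend : (uint32_t)dividend;
    uint32_t b = divisor < 0 ? 0u - (uint32_t)divisor : (uint32_t)divisor;
    uint32_t q = udiv_bitwise(a, b, 0);
    int negative = (dividend < 0) != (divisor < 0);
    return (int32_t)(negative ? 0u - q : q);
}
```

The signed and unsigned cases are split the same way as the two programs mentioned above: the signed one only adds sign handling around the unsigned loop.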

22
Project announcement / Re: what p64 does
« on: March 17, 2019, 03:57:40 am »

23
Project announcement / Re: what p64 does
« on: February 27, 2019, 03:58:55 am »

24
Project announcement / Re: what p64 does
« on: February 12, 2019, 12:07:32 am »
i guess i should update this cuz im not dead

https://youtu.be/4DFYG2CBmO8

many methods have changed, including moving back to FIXED-point math for precision and speed.
If it looks like I've been re-treading the same math for months, I have.

I also have one pending / one visible contribution here:
https://emeraldnova.github.io/SHREC/Jo_Engine/Jo_Engine.html
He's working on other stuff too.

25
General Jo Engine Help / Re: Compilation Error: "Too many new sections"
« on: January 28, 2019, 04:55:48 pm »
Mutual inclusion of header files is OK. Just don't throw actual function code into header files :)
Mutual inclusion of source files is something you ought to avoid.
Basically, a source file includes its header and any other headers it needs. To access data and functions from that source file, you include its header.
If you want to define data in the source file, specify the data in the header file as "extern" so other sources can access it.
If you want to define data in the header file, define it normally.
Headers themselves can become too large, so perhaps they ought to be separated like I suggest too. If you have "common" data that you typically need everywhere, you can make a common header, it should work.
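
As a minimal sketch of that layout (the file names score.h / score.c and the identifiers are made up for illustration; everything is shown in one listing with comments marking where each file would begin):

```c
#include <assert.h>

/* ---- score.h (hypothetical header): declarations only ---- */
#ifndef SCORE_H
#define SCORE_H
extern int player_score;      /* declared here, defined in score.c */
void score_add(int points);   /* prototype only, no function body */
#endif

/* ---- score.c (hypothetical source): would #include "score.h" ---- */
int player_score = 0;         /* the one and only definition */

void score_add(int points)
{
    assert(points >= 0);
    player_score += points;
}

/* ---- main.c: would #include "score.h", never score.c ---- */
static int award_bonus(void)
{
    score_add(50);
    return player_score;
}
```

Each source file is compiled on its own and only ever sees the other's header, which is what keeps any single object file from growing too large.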

26
General Jo Engine Help / Re: Compilation Error: "Too many new sections"
« on: January 28, 2019, 08:05:26 am »
You are getting this error because a single source file has too many functions (the object code is too big).

This is an artifact of compiling for SH2s, as I understand it.

You must break up your code into separate source files, compile them separately, and structure them to share data properly with header files.
To add files to the compiler, edit the makefile in your project's directory such that your additional source files are in the SRCS line. You need only put a space between each file name.
For example:
SRCS=main.c mymath.c draw.c bounder.c collision.c control.c msfs.c ZT/ZT_LOAD_MODEL.c ZT/ZT_TOOLS.c ZT/ZT_CD.c

This is why your typical Saturn project is divided between a great many source files.

27
Jo Map Editor / Re: Trouble with 3D Objects
« on: January 19, 2019, 01:44:33 am »
Make sure the stream knows XL2 is a hero! Last thing, the model converter (as far as I know) expects 32-bit TGA files. Using 24-bit should make your colors wrong. Maybe not!

I'm glad I could help. And thank you for sticking around.

28
Jo Map Editor / Re: Trouble with 3D Objects
« on: January 18, 2019, 10:44:51 am »
Have a look at this.

There is indeed something wrong with your ZTP. My best guess is that the problem is one of two things:
1. You may not have formatted the TGA files correctly. Using Paint.NET works for me.
2. Many (>4) polygons sharing a vertex can cause problems (it is probably OK, but it can), especially if said polygons are triangles. Look at the top and bottom of the ball. The system does not know how to share two verts stacked in the center, shared between 32 triangles. As a rule, don't allow that to happen. When making models, always format them in Blender as quads. Sometimes you will need triangles to express a shape, that's OK.

In the file main.c below, focus on line 109. That's where you change the model.
In addition, the ZT model loading demo was causing extra problems because it was trying to animate that model, causing vomit.


I'm sorry you are having so much trouble. Also pay attention to the size of the model. In the demo I provided, it's scaled up by 14, and you can also move closer to or farther from it with UP and DOWN.

29
Jo Map Editor / Re: Trouble with 3D Objects
« on: January 17, 2019, 10:15:45 am »
The header file and .ZTP file contain the same data. (Correction: The ZTP file has the textures in addition to the model and any encoded animation frames. The header file does not have any textures nor any animation frames)

The header file is simply a reference point for what is in the ZTP. The ZTP is a binary file which can be loaded into memory from CD, rather than being included all the time.
You only need the ZTP file. You should only use the ZTP file.

As far as step 3 goes, you need only put the .ZTP file into the CD directory somewhere. If you don't put it in any folders on the CD, you won't have to change directory. If you do, you will need to change directory before that file can be loaded.


30
Jo Map Editor / Re: Trouble with 3D Objects
« on: January 16, 2019, 04:53:23 am »
It's been a while since I've needed to convert any model. The bug is with the number of frames processed. I don't remember exactly (to process a base model with 62 animated frames, I'd have to command it to process 65 frames, after duplicating the last frame, for a total of 64 files; it might just be that I have a misplaced expectation, it's OK). When/if you do get back around to it, I'm sure you'll either notice it, or not notice it because I was just using it wrong. Anyway, thank you again for making that model converter. Any tools you share mean a lot.

And, definitely use it over the jo engine converter, for now.
