Thanks

I hope this information will be useful for someone!
I have done parallel programming with POSIX threads and always wondered how that was done on the Saturn.
I have read in several places that the shared bus is a big disadvantage because the RAM cannot be accessed by both CPUs at the same time. However, this is the case for most architectures; that's why there are several levels of CPU cache to deal with concurrent access and speed things up.
The Saturn is not entirely crippled in this regard. It's not like one CPU has to wait for the other all the time, because each of them has 4KB of local cache.
After researching the documentation and the forums for the correct way to use the slave CPU, I decided to make a simple test program in which both CPUs access the RAM concurrently (although not at the same addresses).
By splitting an array in two and processing one half on each CPU we have a test case for this scenario. This is a very academic example that might not reflect real-world usage. There is also the question of how accurately the emulator emulates shared bus access, and of how reliable clock ticks are for measuring time. However, if we can trust these measurements the results seem very positive: in the test program, using the two CPUs took practically half the time to do the same work! I will burn a CD this weekend and I'm hoping the same thing happens on a real Saturn.
It is also important to mention that even if this proves to be true, that does not mean it is easy to parallelise everything. Not all algorithms can be parallelised, and synchronizing both CPUs has a big impact on performance: if both CPUs need to constantly access the same variable we will have to do a lot of busy waiting, burning CPU time and wasting bus bandwidth.
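To make the cost of sharing one variable concrete, here is a minimal sketch written for a PC with POSIX threads and C11 atomics, not for the Saturn. It is only an analogy: the names `bump_shared` and `contended_count` and the spinlock itself are hypothetical, none of this is SGL code, and the Saturn's hardware may need a different locking mechanism.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical spinlock guarding one variable shared by both workers. */
static atomic_flag lock = ATOMIC_FLAG_INIT;
static long shared_counter;

/* Each worker increments the shared counter *arg times. */
static void *bump_shared(void *arg)
{
    long n = *(long *)arg;
    for (long i = 0; i < n; i++) {
        /* Busy wait: spin until the other worker releases the lock. */
        while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
            ;
        shared_counter++;
        atomic_flag_clear_explicit(&lock, memory_order_release);
    }
    return NULL;
}

/* Runs two workers that fight over the same variable on every iteration. */
long contended_count(long per_worker)
{
    pthread_t worker;
    shared_counter = 0;

    int rc = pthread_create(&worker, NULL, bump_shared, &per_worker);
    assert(rc == 0);
    bump_shared(&per_worker);          /* the calling thread works too */
    rc = pthread_join(worker, NULL);
    assert(rc == 0);

    return shared_counter;
}
```

Calling `contended_count(1000)` gives the right total, but every single increment has to go through the lock, so the two workers spend much of their time spinning instead of doing useful work.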
The parallelization should be done in a way that both CPUs do most of the work on their local memory, only having to synchronize at the end, basically following the Fork-Join model.
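As a rough illustration of the Fork-Join idea, here is a sketch using POSIX threads on a PC rather than the Saturn's slave CPU. This is not my Saturn test program: `slice_t`, `sum_slice` and `fork_join_sum` are hypothetical names, and on the Saturn the fork step would go through slSlaveFunc instead of `pthread_create`.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* One half of the array plus a private result slot (hypothetical type). */
typedef struct {
    const int *data;
    size_t count;
    long sum;   /* written only by the worker that owns this slice */
} slice_t;

/* Sums one slice using a local accumulator, touching no shared state. */
static void *sum_slice(void *arg)
{
    slice_t *s = arg;
    long local = 0;
    for (size_t i = 0; i < s->count; i++)
        local += s->data[i];
    s->sum = local;
    return NULL;
}

/* Splits the array in two, sums each half in parallel, then combines. */
long fork_join_sum(const int *data, size_t count)
{
    size_t half = count / 2;
    slice_t first  = { data, half, 0 };
    slice_t second = { data + half, count - half, 0 };
    pthread_t worker;

    /* Fork: hand the second half to the other thread. */
    int rc = pthread_create(&worker, NULL, sum_slice, &second);
    assert(rc == 0);

    /* The calling thread processes the first half in the meantime. */
    sum_slice(&first);

    /* Join: the only synchronization point, right at the end. */
    rc = pthread_join(worker, NULL);
    assert(rc == 0);

    return first.sum + second.sum;
}
```

Each worker only writes to its own `slice_t`, so nothing has to be locked while the work is running; the two partial results are combined once, after the join.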
I'm sorry guys, this post is getting really long. I'm going to mention just a couple more things I forgot in my first post:
- My test program works fine on SSF but not on Yabause. I'm not sure whether that is because Yabause does not emulate some things correctly, but I will verify this on a real Saturn soon.
- There are some other limitations you should take into account when using the slave CPU. I will just copy and paste the most relevant part from the SGL FAQ text file:
2-5 Cautions When Using the slSlaveFunc Function
Make sure that the functions executed by the slSlaveFunc do not overwrite the
master CPU's variables. If the variables need to be rewritten, purge the
cache on the master CPU side. In addition, do not execute functions that
issue functions to the slave CPU (such as those related to sprite control).
There is other interesting stuff in there, so I attached the text file if you want to check it out.