The VDP1 limit is pretty well known, but there are ways to speed it up: use VDP2 CRAM, avoid overdraw, use lower-quality textures, avoid some effects, do pre-clipping, etc.
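As an illustration of pre-clipping, here is a minimal sketch that rejects a quad before any VDP1 command is written for it. It assumes the quad has already been projected to integer screen coordinates. `ScreenPoint`, `Viewport` and `quad_fully_outside` are hypothetical names, not SGL functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical projected vertex, in screen pixels. */
typedef struct {
    int16_t x;
    int16_t y;
} ScreenPoint;

/* Hypothetical viewport, inclusive bounds (e.g. 0..319 x 0..223). */
typedef struct {
    int16_t left, top, right, bottom;
} Viewport;

/* Returns true when the quad's bounding box lies entirely outside the
 * viewport, i.e. all four corners are beyond the same edge. Such a quad
 * cannot draw any pixel, so it can be skipped without costing VDP1 time.
 * The test is conservative: some invisible quads may still be kept. */
static bool quad_fully_outside(const ScreenPoint q[4], const Viewport *vp)
{
    int16_t min_x = q[0].x, max_x = q[0].x;
    int16_t min_y = q[0].y, max_y = q[0].y;

    for (int i = 1; i < 4; i++) {
        if (q[i].x < min_x) min_x = q[i].x;
        if (q[i].x > max_x) max_x = q[i].x;
        if (q[i].y < min_y) min_y = q[i].y;
        if (q[i].y > max_y) max_y = q[i].y;
    }

    return max_x < vp->left || min_x > vp->right ||
           max_y < vp->top  || min_y > vp->bottom;
}
```

A bounding-box test is cheap enough to run on the SH-2 for every quad. Exact polygon-vs-screen clipping would reject more, but costs more CPU.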
As for the difference, SGL is just poorly designed when it comes to memory consumption. 4-byte coordinates (16.16 fixed point) are fine, I guess (vs. 2 bytes, 8.8 fixed point, for Slavedriver), but the quads waste way too much memory. One quad in SGL (using realtime light) takes 44 bytes, not counting the vertices. If you add an average of 2 vertices per quad (since some vertices are shared), at 12 bytes each (3 axes × 4 bytes), that's 68 bytes.
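The arithmetic above can be turned into a quick estimate of a model's size. This sketch uses only the figures quoted in this post (44 bytes per quad with realtime light, 12 bytes per 16.16 vertex). The macro and function names are hypothetical.

```c
#include <stddef.h>

/* Figures from the post: per-quad cost in SGL with realtime light,
 * excluding vertices, and one vertex as three 32-bit 16.16 coordinates. */
#define SGL_QUAD_BYTES   44u
#define SGL_VERTEX_BYTES (3u * 4u)

/* Estimated SGL model size: every quad plus the (shared) vertex pool. */
static size_t sgl_model_bytes(size_t quads, size_t vertices)
{
    return quads * SGL_QUAD_BYTES + vertices * SGL_VERTEX_BYTES;
}
```

For example, `sgl_model_bytes(1, 2)` gives the 68 bytes mentioned above. A 500-quad model with about 1000 vertices comes to roughly 34,000 bytes under the same assumptions.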
There's nothing to be done about it; that's just how SGL is.
For lighting, again, that's just how Slavedriver works: they use 8 bytes per vertex, 2 for each axis (x, y, z) and 2 for the color.
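A minimal sketch of such an 8-byte vertex follows. It assumes the 2-byte color holds the baked static light as a Saturn-style 15-bit RGB value (MSB set, then 5 bits each of blue, green and red). Slavedriver's actual color encoding isn't documented here, so `PackedVertex`, `to_fixed_8_8`, `rgb555` and `bake_light` are all hypothetical.

```c
#include <stdint.h>

/* Hypothetical 8-byte vertex: three 8.8 fixed-point coordinates plus a
 * precomputed (baked) color. Assumed layout, not Slavedriver's real one. */
typedef struct {
    int16_t  x, y, z;  /* 8.8 fixed point: 1.0 == 256 */
    uint16_t color;    /* baked static light, assumed RGB555 with MSB set */
} PackedVertex;

_Static_assert(sizeof(PackedVertex) == 8, "vertex must stay 8 bytes");

/* Float to 8.8 fixed point, truncating toward zero.
 * No range check: only valid for roughly -128.0 .. 127.99. */
static int16_t to_fixed_8_8(float v)
{
    return (int16_t)(v * 256.0f);
}

/* Saturn-style RGB: bit 15 set, blue in bits 10-14, green 5-9, red 0-4. */
static uint16_t rgb555(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(0x8000u | ((b & 31u) << 10) | ((g & 31u) << 5) | (r & 31u));
}

/* Bake a static light level into the color once, at load time.
 * level is 0..256, where 256 means full brightness (hypothetical scale). */
static uint16_t bake_light(uint8_t r, uint8_t g, uint8_t b, uint16_t level)
{
    return rgb555((uint8_t)((r * level) >> 8),
                  (uint8_t)((g * level) >> 8),
                  (uint8_t)((b * level) >> 8));
}
```

Because the light is folded into the vertex color ahead of time, no normal or light vector needs to be stored per vertex. That is where the savings over the SGL layout come from.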
You can do the same thing (static light) with SGL and Gouraud shading without major issues, simply by reusing the same Gouraud addresses. For dynamic light, you need another vector (12 bytes). Even if you already have the normals, you still need that extra vector: Sega found out their method had issues, so they just added something else on top of everything else.
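One way to picture "reusing the same Gouraud addresses": compute the static shading once at load time and deduplicate the Gouraud table, so that quads with identical corner shading point at the same entry. This is a sketch under stated assumptions. It assumes one table entry holds four 16-bit corner colors (8 bytes), as VDP1 Gouraud tables do. The table capacity and the names `GouraudTable` and `gouraud_intern` are hypothetical, and turning the returned index into the address a quad's command points at is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_GOURAUD_ENTRIES 256  /* hypothetical capacity */

/* One Gouraud table entry: one 16-bit color per quad corner (8 bytes). */
typedef struct {
    uint16_t corner[4];
} GouraudEntry;

typedef struct {
    GouraudEntry entries[MAX_GOURAUD_ENTRIES];
    size_t count;
} GouraudTable;

/* Return the index of an entry identical to e, appending it if it is new.
 * Returns -1 when the table is full. With static light this runs once at
 * load time, so no Gouraud data has to be rewritten every frame. */
static int gouraud_intern(GouraudTable *t, const GouraudEntry *e)
{
    for (size_t i = 0; i < t->count; i++) {
        if (memcmp(&t->entries[i], e, sizeof *e) == 0)
            return (int)i;
    }
    if (t->count == MAX_GOURAUD_ENTRIES)
        return -1;
    t->entries[t->count] = *e;
    return (int)t->count++;
}
```

The linear search is fine for a load-time step. A hash table would only matter for very large models.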
Duke Nukem 3D uses 16-color CLUT (color lookup table) textures, not 16-bit RGB.
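The difference matters because of bit depth: a 16-color texture uses 4 bits per pixel, while 16-bit RGB uses 16. Here is a trivial helper for comparing VRAM cost. `texture_bytes` is a hypothetical name.

```c
#include <stddef.h>

/* Bytes of VRAM needed for a w x h texture at the given bit depth:
 * 4 for 16-color CLUT/bank, 8 for 256-color, 16 for RGB. */
static size_t texture_bytes(size_t w, size_t h, unsigned bits_per_pixel)
{
    return (w * h * bits_per_pixel) / 8;
}
```

A 64×64 texture is 2048 bytes in 16-color mode and 8192 bytes in 16-bit RGB, four times as much. A 16-color lookup table only adds 32 bytes (16 entries of 2 bytes) on top of the 4-bit texture.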
As for mipmaps, no, I can't use VDP2 RAM. I'd really like to use CRAM, but it holds at most 2048 colors, and if you use 16 colors per sprite, those 16 colors still have to sit together in the same bank. That gives at most 128 palettes (2048 / 16), so with 500 sprites it's nearly impossible. Using 256 colors is possible, of course, but then you double the VRAM consumption.
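The constraint can be sketched as a bank allocator over a shadow copy of CRAM. The only way to fit 500 sprites into 128 banks of 16 colors is for sprites to share identical palettes. This sketch assumes a flat 2048-color CRAM split into contiguous 16-color banks. The actual CRAM mode and address mapping on hardware may differ, and `PaletteBanks` and `bank_alloc` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define CRAM_COLORS 2048
#define BANK_COLORS 16
#define BANK_COUNT  (CRAM_COLORS / BANK_COLORS)  /* 128 banks */

/* Hypothetical shadow copy of CRAM, split into 16-color banks. */
typedef struct {
    uint16_t colors[BANK_COUNT][BANK_COLORS];
    int used;
} PaletteBanks;

/* Reuse a bank that already holds exactly these 16 colors, or claim the
 * next free one. Returns the bank number, or -1 once all 128 are taken. */
static int bank_alloc(PaletteBanks *p, const uint16_t pal[BANK_COLORS])
{
    for (int i = 0; i < p->used; i++) {
        if (memcmp(p->colors[i], pal, sizeof p->colors[i]) == 0)
            return i;
    }
    if (p->used == BANK_COUNT)
        return -1;
    memcpy(p->colors[p->used], pal, sizeof p->colors[p->used]);
    return p->used++;
}
```

Once 128 distinct palettes are in use, every further distinct palette fails. Sprites then have to be re-quantized to share palettes or moved to 256-color mode.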
Gouraud shading can be used with VDP2 color banks, as the "Chrome" demo proves, but Sega's documentation says they can't guarantee good results.
Per-texture lighting is great, but then again you need to store many more sprites, which isn't easy.
As for the PVS (potentially visible set), I haven't seen much code online, so unless someone here writes something first, I'll have to do it myself.
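Computing the PVS offline is the hard part. The runtime side is small: one bit per (from-cell, to-cell) pair. This is a minimal sketch of that query side, assuming the level is split into at most 64 cells and the rows are stored uncompressed. Real engines often compress these rows. `Pvs`, `pvs_set` and `pvs_visible` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_CELLS 64  /* hypothetical number of level cells */

/* Row `from` holds one bit per cell, set if that cell may be visible
 * from `from`. Uncompressed, this is 64 * 8 = 512 bytes. */
typedef struct {
    uint8_t bits[MAX_CELLS][(MAX_CELLS + 7) / 8];
} Pvs;

/* Used by the offline builder (or a loader) to mark a visible pair. */
static void pvs_set(Pvs *pvs, int from, int to)
{
    pvs->bits[from][to >> 3] |= (uint8_t)(1u << (to & 7));
}

/* Runtime query: skip every cell whose bit is clear for the camera's cell. */
static bool pvs_visible(const Pvs *pvs, int from, int to)
{
    return (pvs->bits[from][to >> 3] >> (to & 7)) & 1u;
}
```

At runtime you find the camera's cell once per frame and only transform and submit quads from cells whose bit is set. This also cuts VDP1 overdraw.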
As for the CD functions, the problem is that duplicating work isn't efficient: we all have jobs, girlfriends and so on, so we can't spend as much time on these projects as we'd like.
As for using audio RAM for PCM, I've never seen an example. Maybe some homebrew emulator does it, I really don't know.