
Hi folks, not exactly on topic here, but it’s something from which FMOD could benefit:

I was looking through the FMOD Ex source code and wondering how certain parts could be sped up, and I concluded that hand-optimizing them would be a pain (we have two weeks to ship).

While looking for solutions ([hint..hint] mplayer.hq, dct_altivec.c) I found a few, and then I came across the VAST/AltiVec page: http://www.crescentbaysoftware.com/vast_altivec.html

VAST/AltiVec is basically a C source-to-source rewriter: it takes plain C code and transforms its loops to use AltiVec vector operations (vec_ld, vec_madd, vec_st and so on). Since the PS3, and even the Xbox 360, have compatibility header files for AltiVec calculations, it might come in handy.

One problem is that it's a bit expensive: $500 for the Mac OS X version, and... $3000 for the Windows NT version (for embedded development).

We might buy it; it could help us optimize some of the code running on the SPU.

It looks like good software. They even provide examples, such as:

Original code:

```c
extern float a[128], b[128], x;
void simple ()
{
    int i;
    for (i=0; i<64; i++)
        a[i] = a[i]*x + b[i];
}
```
VAST output (intermediate C code):

```c
/* Translated by Pacific-Sierra Research VAST-C AltiVec 7.4 H 14:45:33 8/ 8/00 */
/* Switches: -Valigned */
extern float a[128], b[128], x;
void simple( )
{
    int j1, j2, j3, j4, j5, j6, j7;
    int i;
    {
        {
            vector float a1v, b1v, r1v;
            vector float r2v = (vector float )(0);
            vector float a4v, b4v;
            vector float a3v, b3v;
            vector float a2v, b2v;
            *((float *)&r2v) = x;
            r1v = vec_splat(r2v, 0);
            for ( j1 = 0; j1 < (64 - 4 * 4) + 1; j1 += 4 * 4 )
            {
                j3 = j1 * sizeof(int );
                j2 = j3 + 4 * sizeof(int );
                a1v = vec_ld(j3, &a[0]);
                b1v = vec_ld(j3, &b[0]);
                a2v = vec_ld(j3 + 16, &a[0]);
                b2v = vec_ld(j3 + 16, &b[0]);
                a3v = vec_ld(j3 + 32, &a[0]);
                b3v = vec_ld(j3 + 32, &b[0]);
                a4v = vec_ld(j3 + 48, &a[0]);
                b4v = vec_ld(j3 + 48, &b[0]);
                a1v = vec_madd(r1v, a1v, b1v);
                vec_st(a1v, j3, &a[0]);
                a2v = vec_madd(r1v, a2v, b2v);
                vec_st(a2v, j3 + 16, &a[0]);
                a3v = vec_madd(r1v, a3v, b3v);
                vec_st(a3v, j3 + 32, &a[0]);
                a4v = vec_madd(r1v, a4v, b4v);
                vec_st(a4v, j3 + 48, &a[0]);
            }
        }
    }
}
```

The generated code should compile fine on the CBEA (Cell Broadband Engine Architecture, i.e. IBM's Cell chip), which is what the PS3 uses. It should also be fine on Xenon.
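For anyone without an AltiVec toolchain, the transformation VAST performs (splat the scalar, process 16 floats per iteration as four 4-wide multiply-adds) can be sketched portably with the GCC/Clang `vector_size` extension instead of AltiVec intrinsics. This is an assumption-laden sketch, not VAST output: `v4f`, `simple_scalar`, `simple_vec` and `check_against_scalar` are hypothetical names, loads go through `memcpy` so alignment is not required (unlike the `-Valigned` output above), and a scalar tail handles counts that are not a multiple of 16.

```c
#include <assert.h>
#include <string.h>

/* 4 x float = 16 bytes, the same width as an AltiVec `vector float`.
   GCC/Clang generic vector extension, not AltiVec intrinsics. */
typedef float v4f __attribute__((vector_size(16)));

float a[128], b[128], x;

/* Scalar reference: the original loop, generalized to n elements. */
void simple_scalar(float *dst, const float *src, float s, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = dst[i] * s + src[i];
}

/* Vectorized sketch: 16 floats per outer iteration (four 4-lane vectors),
   mirroring the VAST output's 4x unroll, plus a scalar remainder loop. */
void simple_vec(float *dst, const float *src, float s, int n)
{
    v4f sv = {s, s, s, s};                    /* splat, like vec_splat */
    int i = 0;
    for (; i + 16 <= n; i += 16) {
        for (int k = 0; k < 16; k += 4) {
            v4f av, bv;
            memcpy(&av, dst + i + k, sizeof av);  /* alignment-safe load */
            memcpy(&bv, src + i + k, sizeof bv);
            av = av * sv + bv;                    /* per-lane multiply-add */
            memcpy(dst + i + k, &av, sizeof av);  /* store back */
        }
    }
    for (; i < n; i++)                            /* scalar tail */
        dst[i] = dst[i] * s + src[i];
}

/* Same interface as the original example: first 64 elements only. */
void simple(void)
{
    simple_vec(a, b, x, 64);
}

/* Runs scalar and vector versions on identical inputs and asserts they agree.
   Small integer inputs keep results exact whether or not the compiler fuses
   the multiply-add. */
void check_against_scalar(void)
{
    float ref[128];
    for (int i = 0; i < 128; i++) {
        a[i] = (float)i;
        b[i] = 2.0f * (float)i;
        ref[i] = a[i];
    }
    x = 0.5f;
    simple_scalar(ref, b, x, 64);
    simple();
    for (int i = 0; i < 128; i++)
        assert(a[i] == ref[i]);
}
```

Checking against the scalar loop is worthwhile with any auto-vectorizer: the vector version only pays off if it produces the same results, and elements past index 63 must be left untouched.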


The 360 version is already fully optimized with VMX, and we will be doing our own optimizations for the PS3 soon.
