Resynthesis/Physical Modelling Synth

Request a new device/modular module, and hope that some enterprising developer grants your wish!

Moderators: valis, garyb

webbunny
Posts: 12
Joined: Sat Mar 27, 2004 4:00 pm
Location: The UK
Contact:

Post by webbunny »

Hi all - first post, be gentle :grin:

I have noticed that on the SFP platform everyone seems to have concentrated on VA subtractive synthesis. Which is nice and all that, but there's a bunch of other ways of making sounds.

I have had an idea for a synth that I was keeping to myself :smile: but now Cameleon 5000 is out and does some similar stuff, so it's not so unique any more!

I thought of combining a playback-only resynthesis engine with the 'Driver-Modifier' design of a physical modelling synth like the Korg Z1. The resynth engine would be the driver, and then various modifier engines (waveguides, vocoder-style filterbanks, soundboard emulations, VA low/high/band pass filters etc) could be bolted on after it.

The resynth data would be held as 'samples' of a sort - I got this part of the idea from the way MP3 files are stored, and how they can be played back at higher speeds with no pitch change (up to a point, see later).

Just like MP3, the original PCM audio could be split into 128-sample 'blocks' and then treated to a Fourier transform to turn it into 'frequency domain' data. But unlike MP3, that data wouldn't then have all that perceptual coding stuff done to it, which is the part that throws away information and makes MP3 'lossy'.

That data would then be stored in a table. When played back, it would need to be passed through an inverse Fourier transform to turn it back into PCM.
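To make the analyse/store/playback loop concrete, here's a rough Python/numpy sketch. The function names are mine, and it skips the windowing and overlap that a real engine would need - it just shows the shape of the scheme:

```python
import numpy as np

BLOCK = 128  # samples per analysis block, as in the MP3-style scheme above

def analyse(pcm):
    """FFT each 128-sample block into a row of frequency domain bins."""
    n = len(pcm) // BLOCK * BLOCK            # drop the ragged tail
    return np.fft.rfft(pcm[:n].reshape(-1, BLOCK), axis=1)

def resynthesise(table):
    """Inverse FFT each stored block back to PCM for playback."""
    return np.fft.irfft(table, n=BLOCK, axis=1).reshape(-1)

tone = np.sin(2 * np.pi * 1000 / 44100 * np.arange(44100))  # 1 kHz test tone
table = analyse(tone)        # the 'sample' is now a table of spectra
out = resynthesise(table)    # no perceptual coding, so nothing is lost
```

Each 128-sample block becomes 65 complex bins, and because no perceptual coding throws anything away, the round trip back to PCM is numerically lossless.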

So, what's the point of all this extra aggro? Well, since the wave data is in the frequency domain, it's fairly easy to pitch shift it BEFORE it passes through the iFFT - you get a different pitch even though there is no change in playback speed, so no need for multisamples....

At least, apart from the problem that the formant parts of the sound would shift too, so if it was a choir sound, they'd sound like munchkins! So you'd need 2 frequency domain sample tables, one for a high note, and one for a low note, and as you played back higher and higher notes, you'd crossfade the 2 sets of frequency data BEFORE it goes to the iFFT - which would sound like a morph, with no multisample jumps.

Add 2 more sample tables, one for the low note played hard and one for the high note played hard, and you could do the same trick for velocity - the whole keyboard at all velocities covered by just 4 samples!
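The pitch-shift-before-iFFT and crossfade-between-tables tricks could look something like this. It's a deliberately crude sketch (a real resynth would interpolate bins and handle phase properly), and all the names are made up:

```python
import numpy as np

def shift_bins(spectrum, semitones):
    """Crude frequency-domain pitch shift: move each bin's energy to a
    scaled bin index before the inverse FFT. Rounding to the nearest bin
    is lossy; real engines interpolate."""
    ratio = 2 ** (semitones / 12)
    shifted = np.zeros_like(spectrum)
    for i, v in enumerate(spectrum):
        j = int(round(i * ratio))
        if j < len(shifted):
            shifted[j] += v
    return shifted

def morph(low_table, high_table, key_pos):
    """Crossfade two frequency domain tables before the iFFT.
    key_pos 0.0 = the low-note sample, 1.0 = the high-note sample,
    anything between sounds like a morph rather than a multisample jump."""
    return (1 - key_pos) * low_table + key_pos * high_table
```

The same `morph` call works for the velocity axis too - crossfade the 'soft' and 'hard' tables by velocity, then pitch shift, then iFFT.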

There are all sorts of other little details to this idea that I have had - and each one would take up as much space as I've already used up! If anyone is interested, drop me a line.

Andy
Jngaelin
Posts: 465
Joined: Thu Nov 15, 2001 4:00 pm
Location: Sweden
Contact:

Post by Jngaelin »

I love what I'm reading here :grin:
scary808
Posts: 449
Joined: Mon Apr 02, 2001 4:00 pm
Location: Utah

Post by scary808 »

Have you checked out the STS5000? The realtime time stretch and pitch shift could be what you're after. It has cool formant shifting too! Also, check out Flexor. If the exotic is what you're after, then Flexor is for you!
webbunny
Posts: 12
Joined: Sat Mar 27, 2004 4:00 pm
Location: The UK
Contact:

Post by webbunny »

Hi Scary,

Yup, the STS5000 and Roland's VariPhrase stuff sound similar, but I'm sure they're not transforming the audio into the frequency domain. But you're spot on about the effect I think this would achieve.

One of the (extra) ideas I had was passing the frequency domain data as a stream to drive other parts of the synth (without turning it back into PCM), like parts of the Modifiers section. As an example, if you had 2 of these streams, you could implement a vocoder very simply as a bank of scaling units/amplifiers (!), because the 'spectrum analysis' and 'filter bank' parts of a vocoder have already been done 'offline'. It doesn't have to be a vocoder - I'm sure there's plenty of other 'Modifier' processes that would work on frequency domain data.
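The vocoder-as-amplifiers idea really is that small once both streams are already in the frequency domain - per block it's one multiply per bin. A hypothetical sketch:

```python
import numpy as np

def vocode(modulator_bins, carrier_bins):
    """Vocoder on two frequency domain streams: scale the carrier's bins
    by the modulator's per-bin magnitude. The 'spectrum analysis' and
    'filter bank' stages already happened offline when the tables were
    made, so this is the whole realtime job."""
    return carrier_bins * np.abs(modulator_bins)
```

One call per 128-sample block, then the result goes on to the iFFT like any other stream.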

Another nuts idea was to store the Attack, Decay, Sustain and Release parts of the frequency data as separate waves, so you could 'mix and match' them - have a 'slot' for each type of wave (A, D, S, R) in the oscillator, and you choose which wave from each type you put in each slot. It would need crossfading between stages to smooth things out (if you WANTED it smooth, of course!). And of course, there's no reason why the wave data for the highest and lowest keys, or velocities, has to be the SAME kind of sound... Mucho morphing :smile:

And the last idea banging around is to normalise the original audio (so it's all the same amplitude), so you are just storing the changes in harmonics, and then follow it with a multi-stage amplitude envelope....
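Normalising each block and keeping the gains as the amplitude envelope might look like this (sketch, names made up - the stored table holds only the spectral shape, the envelope holds the loudness):

```python
import numpy as np

def normalise_blocks(table):
    """Scale each frequency domain block to unit peak magnitude.
    Returns the normalised spectra (harmonic changes only) plus the
    per-block gains, which become a multi-stage amplitude envelope."""
    gains = np.abs(table).max(axis=1, keepdims=True)
    gains[gains == 0] = 1.0          # leave silent blocks alone
    return table / gains, gains.ravel()
```

On playback you'd multiply each resynthesised block by its envelope value again (or by a completely different envelope, which is the fun part).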

Phew!

I sort of figured this part wouldn't use a lot of DSP to turn the data back into PCM, because the code needed is basically a third of the code of an MP3 player, and there must be plenty of optimised MP3 playback assembler for SHARC DSPs.

I haven't thought about what would go into the Modifiers part, apart from standard filters, waveguides, that vocoder idea, and convolution. Convolution would be a problem because SFP cards have little onboard RAM, unless the impulse response was computed on the fly by some algorithm instead of stored - you could then provide editable parameters for it.
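One way the 'computed on the fly' impulse response could work: build a reverb-style exponentially decaying noise burst from a couple of editable parameters instead of storing audio. Purely illustrative, every name here is invented:

```python
import numpy as np

def synth_impulse(length, decay, seed=0):
    """Generate an impulse response algorithmically: white noise shaped
    by an exponential decay envelope. Two parameters (length, decay)
    replace a stored impulse, so it fits in tiny on-card RAM."""
    rng = np.random.default_rng(seed)        # fixed seed = repeatable IR
    env = np.exp(-decay * np.arange(length) / length)
    return rng.standard_normal(length) * env
```

Exposing `length` and `decay` as knobs gives you an editable 'room' with no sample memory at all.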

Mr Bowden, Mr Hummel, Mr Celmo, are you reading this stuff? This is good shit :grin:

Andy

<font size=-1>[ This Message was edited by: webbunny on 2004-03-28 17:50 ]</font>

scary808
Posts: 449
Joined: Mon Apr 02, 2001 4:00 pm
Location: Utah

Post by scary808 »

That's some extreme shizzle (or is it... shizzel)! Me like!
kensuguro
Posts: 4434
Joined: Sun Jul 08, 2001 4:00 pm
Location: BPM 60 to somewhere around 150
Contact:

Post by kensuguro »

it's been talked about before.. err, I was pushing really hard for an FFT synth with frequency domain data mangling capabilities. Both the STS5000 and Roland's VariPhrase use FFT resynthesis - namely, that's the only way to accomplish pitch stretching (unless you opt for Acid's granular path). SampleTank 2's "stretch" uses it, Kontakt also uses it. It's becoming quite standard these days, I think. Problem is, almost all these samplers use FFT for pitch stretch, time stretch, and formant control - basically what a phase vocoder is good at - but none of them offer cross modulation of analysis data.

It's all kind of hard tho, because of FFT's internal time resolution versus frequency resolution trade-off. Kontakt seems to work around it in a smart way by combining ReCycle-style chopping for transients with FFT for sustained sounds (drums versus vocals, etc.).
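That trade-off is easy to put numbers on: bin width and frame duration move in opposite directions as the FFT frame grows. A quick sketch, assuming 44.1 kHz (the helper name is mine):

```python
SR = 44100  # sample rate in Hz

def fft_resolution(frame_len, sr=SR):
    """Return (bin width in Hz, frame span in ms) for an FFT frame.
    Bigger frames give finer frequency bins but smear any transient
    over a longer stretch of time - you can't win on both axes."""
    return sr / frame_len, 1000.0 * frame_len / sr

short = fft_resolution(128)    # ~345 Hz bins, ~3 ms frames: good for drums
long_ = fft_resolution(4096)   # ~11 Hz bins, ~93 ms frames: good for vocals
```

Hence the Kontakt-style hybrid: chop the transients, FFT the sustains.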

This is all just step one tho. To make good use of FFT analysis data, you'd really need to load 2 or more sets and do arithmetic operations between them (fades, convolution, etc.). But that's into the future, as realtime convolution by itself is almost too heavy for even the most modern computers (3-4 instances means only 3-4 voices per CPU!). Needs lots of RAM and lots of CPU.