Charlie1 wrote:Hi Ceilidh,
Wow - that's certainly a good attempt at explaining the science behind source first. I don't think I'll comment on the detail if that's OK, as it's a bit over my head. I know Fredrik has commented before along the lines that the 'type' of distortion is important - i.e. some types affect the music more than others. Don't know if that fits in with your comments, but thought I'd mention it anyway.
Cheers for now....
Hello Charlie!
Thanks for the kind words. :D But just to be clear -- my Source First musings were just conjecture, as I don't work in this area and I don't know the underlying science!
Anyway, if you have an interest in these things, here are some data points from related areas I feel more comfortable in:
1)
Sensing the World
For a while I worked in robotics, and anyone who's spent time in robotics R&D will emphatically avow that machines will not be taking over the earth anytime soon -- the poor things are deaf, blind, and numb as all get out ("all get out" = Canadian expression meaning "really, really, really" deaf, blind, and numb) when they venture out into the cold cruel world. That is, if you work with robots and machine sensors, you quickly develop an enormous respect for biological organisms and their ability to sense and respond to surrounding stimuli. It's not that robotic sensors aren't sensitive: we use sonars, radars, magnetometers, accelerometers, laser rangefinders, etc., etc., and we can hook up photo-imaging sensors that out-resolve the human eye. The problem is that the data processing we can do on a machine is nowhere near what appears to be going on in a human brain (or in a chipmunk or bird brain -- heck, there were times we wished we could do as well as a cockroach, and our researchers were the best of the best from MIT).
That's not to say that we can't get a machine to outperform living organisms in certain very specific, very particular tasks -- but we can't do the generalized things that organisms do every day. For example, we can easily construct a bar code scanner that reads product attributes off a series of black and white lines, and we can get that scanner to work in a supermarket checkout line with dizzying speed and accuracy -- but we can't get a robot to consistently recognize a horse as a horse (or a rose as a rose, or a car as a car, etc.) in a variety of different situations.
Organisms can do such recognition -- very rapidly, with a high degree of precision and reliability in a staggering host of changing conditions -- through a process that is not merely pattern-recognition, but which instead seems to be some sort of active pattern-reconstruction, one by which subtle fragmentary clues are rapidly woven into a model of the surrounding world (that's why you can sometimes stare at something hidden or camouflaged without having a clue what it is -- and suddenly it snaps into obvious view ("Oh! It's a horse in a thicket!") and you find yourself wondering how you could have ever failed to see it). It's really quite extraordinary.
It's for the above reasons that I think it quite possible for very minute, barely-measurable differences in audio output to have noticeable effects on perceived performance. When we listen to a high end audio system, our ears and brains are trying to weave a sonic model of our surroundings from a myriad of subtle cues. If those cues are misleading, the model comes out wrong. And if the cues are not internally self-consistent, then the brain decides it's sensing nonsense, and the model collapses and disappears (just as a well-camouflaged object at some point simply "disappears" from view).
2)
Rejecting Extraneous Data
In a very related vein: however the sensory-data-processing engines work in our brain, those engines must be extremely proficient at rejecting enormous quantities of "extraneous" data. Consider the "horse recognition" scenario alluded to above: a tan horse is darker than dry grass, but lighter than bare earth; its outlines shift with its orientation, its mane and tail ruffle in the wind, its legs can disappear behind vegetation -- and yet we can watch a distant horse walk across a windswept hillside, with dust and leaves and grass blowing and billowing behind, around, and in front of it, without it ever becoming less than obvious that it's a horse on a windswept hillside. Our brains are so proficient at dealing with this sort of scenario that we can't easily imagine what could be so difficult about it -- but try to get a machine to deal with it, and you'll quickly see that most of what you can "measure" in such a situation has nothing to do with the walking horse. That is, your sensors will pick up enormous changes in ambient lighting, in reflectance, in contrast, in colour, in small scale shimmer, in motion, in speed, etc., etc. -- most of it having to do with the wind and the changing perspective. The signal that makes up the "horse" is by contrast much more subtle -- and yet our brains can easily deal with it.
It's for this reason that I'm a little cautious when someone says that if the "distortion" in one part of the audio chain is much greater than in another, then we should concentrate the bulk of our attention on the part that's distorting most. That's an excellent general approach, and if the type of distortion remains consistent at all points in the audio chain, it's arguably the only defensible one. But it breaks down if different types of distortion appear at different points in the audio chain, and in particular if the gross distortions are ones the brain is trained to ignore.
Going back to the horse example: if a cloud sweeps across the sun while I'm viewing the distant horse, the passing shadow can darken the scene by 3 or 4 camera stops (i.e., by over 90%), the colours will suddenly become "bluer", and my contrast can drop precipitously -- measurably, I'll have a huge "distortion" in my visual "signal", but I'll still easily recognize the horse. But if the horse passes through an open thicket, there will be times when I can see enough of the horse to recognize it, and other times when it seems to vanish -- and yet the measurable differences between visible-horse-in-open-thicket and vanished-horse-in-open-thicket can be extremely, extremely small. Hence if an audio component produces enormous distortion akin to shifting colours and diminished contrast, I might reasonably complain, but I can still see my horse; whereas if a different component produces a much more subtle distortion akin to branches that sometimes look like branches and sometimes a little like horses' legs, then my horse will appear and disappear from view, and at some point I might lose sight of it entirely. Because our brains are trained to ignore a lot of distortions while constructing models of the world, the type of distortion can matter much more than the magnitude, and simple distortion measurements can entirely miss the picture.
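To put a number or two behind that intuition, here's a little sketch (entirely my own illustration -- the signal and distortions are invented, not drawn from any audio measurement standard). It compares the RMS error of two distortions of the same sine wave: a uniform 3 dB level drop (large but structurally benign, like the cloud shadow dimming the whole hillside) versus light clipping of the waveform peaks (tiny in magnitude, but it reshapes every peak, a bit like branches that sometimes look like horses' legs):

```python
import math

N = 1000
signal = [math.sin(2 * math.pi * 5 * t / N) for t in range(N)]

def rms_error(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

# Distortion A: a uniform 3 dB level drop applied to the whole signal.
gain = 10 ** (-3 / 20)
dropped = [gain * x for x in signal]

# Distortion B: hard-clip the top 1% of the waveform's amplitude.
clipped = [max(-0.99, min(0.99, x)) for x in signal]

print(f"RMS error, 3 dB level drop: {rms_error(signal, dropped):.4f}")
print(f"RMS error, 1% clipping:     {rms_error(signal, clipped):.4f}")
```

The level drop "measures" as a far larger error than the clipping, yet it leaves the waveform's shape perfectly intact, while the numerically tiny clipping alters the shape of every peak. That's the sense in which a single distortion magnitude can entirely miss what matters.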
3)
An Automotive Digression
It can seem (to some people) utterly preposterous that some distortions can be worse than others (sadly, these are often technical engineer types, who growl "Distortion is Distortion" before shuffling back to their cubicles). So rather than a thought analogy, here's a (non-audio) example that most of us experience every day:
I've done some work with automotive suspensions, and one of the first things you learn with passenger vehicle suspensions is that you'd ideally like to keep the ride motions purely vertical (i.e., without pitching motions -- which is a bit of a trick, considering that the front wheels hit a bump before the rear wheels), with a bounce frequency somewhere in the range of roughly 1 to 1.5 hertz (with the faster bounce frequencies for the "sportier" cars). If you can do that, the passengers will be comfortable; if you deviate very far from the ideal, they'll complain.
Now, next time you drive near a big Mercedes or Lexus sedan / saloon at high speeds on an indifferently-paved highway / motorway, take a look at how it moves up and down on its suspension -- if you've never paid heed to such things before, you may be surprised at just how large the motions are and how almost violent the apparent accelerations appear. But if you ride in such a car (something yours truly doesn't get to do very often at all!), the journey will seem quite peaceful and controlled. That's because a big Mercedes is tuned for minimal pitch at high speeds, with close to ideal vertical bouncing in the ideal frequency range. (That is, it's not that the accelerations aren't substantial -- if you instrument such a car, you'll record quite large sensor readings -- but rather that the motions in a big Mercedes lack the fore-aft pitching and side-to-side rocking that usually unsettle lesser cars, and the vertical bounce approaches a true sinusoid.)
The question then becomes "Why do we not particularly notice vertical sinusoidal bounce at ~1-1.5 hertz?". Well, the answer has nothing to do with physics, and almost everything to do with evolutionary biology: when we walk, our heads rise and fall (a surprising amount) in a roughly-sinusoidal vertical motion at about 1 to 1.5 hertz, and we've evolved to be not bothered by it. A bounce much slower than 1 hertz induces nausea in many people (think of the slow rise and heave of a ship wallowing in a storm), while bouncing above ~2 hertz feels like unpleasant shaking -- and if our evolutionary ancestors felt shaken or nauseous every time they walked anywhere, then they wouldn't have survived very long in a harsh and dangerous environment!
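For anyone curious about the arithmetic behind those bounce frequencies, here's a back-of-envelope sketch (the spring rate and sprung mass are numbers I've invented for illustration, not specs of any actual car): the undamped ride frequency of a simple quarter-car model is f = (1/2π)·√(k/m), with k the effective wheel rate in N/m and m the sprung mass per corner in kg, classified against the comfort bands described above:

```python
import math

def ride_frequency_hz(wheel_rate_n_per_m, sprung_mass_kg):
    """Undamped natural frequency of a single-mass spring model."""
    return math.sqrt(wheel_rate_n_per_m / sprung_mass_kg) / (2 * math.pi)

def comfort_band(f_hz):
    # Bands paraphrased from the discussion above: walking-pace bounce
    # (~1-1.5 Hz) feels natural; much slower wallows; much faster shakes.
    if f_hz < 1.0:
        return "wallowing (nausea territory)"
    if f_hz <= 1.5:
        return "walking-pace bounce (comfortable)"
    if f_hz <= 2.0:
        return "firm / sporty"
    return "harsh shaking"

# Hypothetical luxury-sedan corner: ~450 kg sprung mass, ~25 kN/m wheel rate.
f = ride_frequency_hz(25_000, 450)
print(f"{f:.2f} Hz -> {comfort_band(f)}")
```

With those made-up but plausible numbers, the bounce lands right in the comfortable walking-pace band -- which is the whole game in passenger-car suspension tuning.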
In short, there are many big "signals" from the world around us that we're hard-wired to ignore. In the case of a big Mercedes, you can bomb along at high speeds, bouncing quite strongly at ~1.5 hertz, and be so unaware of the bounce that you can easily detect the minute vibrational signal of a slightly out-of-balance tire. In the audio world, well, is there an equivalent? :D
4)
Wrap up
This has turned out to be a very long letter to Charlie1 (!), but to sum up: from my non-audio experience, I think it very plausible for some types of distortion to matter much more than others, if we're talking about something as related to human perception as "musicality". If we're talking waveform analysis, then NO: if all a person cares about is whether the waveform that goes into the source component closely resembles what ultimately comes out of the speakers, then measurable distortion should be what really matters, and one can spend time fussing with loudspeakers, equalizers, and room treatments (which are said to deal with the most measurable distortion). But if we're talking not about waveform analysis but about the subtle cues that allow the human brain to reconstruct the musicality and spirit of a musical performance despite all the gross changes in a waveform as it passes down the audio chain -- cues that we're hard-wired to listen for, as opposed to those we've been evolutionarily "designed" to discount -- then we might well want to spend time and attention on components that already "measure" very, very well.
But again, this is all conjecture. :D
Have a good Columbus Day, Charlie1!
-C