psychoacoustic surveys

I need to collect data on various auditory synthesis systems. Some of this is relative quality/intelligibility assessments (e.g., to evaluate the merits of different synthesis techniques and algorithms); other aspects are more "absolute" (i.e., to evaluate overall accuracy).

The survey will be distributed electronically and results collected "remotely" -- at the subject's convenience.

It's important that the conditions under which the surveys are completed are "controlled" (e.g., the order in which, and the number of times, a sample is auditioned). But, as the subject is not *completely* observable while taking the survey (i.e., there are only some things that you can monitor without "eyes-on"), there is no way for me to know that the test conditions are being satisfied (e.g., the samples must be auditioned with headphones/earbuds, not open-air "speakers"; the quality of the transducers, of course, has some significance; the nature and extent of background sounds; etc.).

Do I:

- trust participants to do-the-right-thing?

- trust, but verify (how?)

- exercise more active control over the survey environment?

Of course, the goal is to make it easy to collect lots of data (subjects) -- but, more importantly, lots of GOOD/USEFUL data.

[*So* much easier to solve problems that have closed-form solutions! These "squishy" problems always leave you wondering about the quality of your design/solution! :< ]

Thx!

Reply to
Don Y

My only advice is to have the raw data available, with an easy way to identify and sort on the identity of the source of the data. That way, if some of the data seems "bad", anyone can look at the results with and without the suspicious entries.

Dan

Reply to
dcaster

Wouldn't it be nice if climate "scientists" did this?

Reply to
krw

(e.g., the samples must be auditioned with headphones/earbuds, not open-air "speakers")

There is HUGE variability in headphones. Some, even cheap ones, give VERY good audio fidelity, some, even expensive ones, sound like they have sand in the voice coil. So, I think that alone could totally scramble your results.

Jon

Reply to
Jon Elson

You can try to eliminate or at least minimize the hardware/environmental noise variables by including test samples that will give predictable results.

If this is the kind of survey I think it is, include things like animal sounds or musical clips all participants can be expected to be familiar with as baseline samples, kind of like the "is your full name ___" questions given by lie-detector specialists before the real questions begin.
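
For instance, a rough Python sketch of that baseline screening, assuming the answers get logged as (respondent, item, answer) tuples; the clip names and the 80% accuracy cutoff are placeholders, not anything Don has described:

from collections import defaultdict

# Expected answers for the calibration clips (names are made up).
BASELINE_KEY = {
    "cat_meow": "cat",
    "dog_bark": "dog",
    "piano_c4": "piano",
}

def screen_respondents(responses, min_accuracy=0.8):
    """responses: list of (respondent_id, item_id, answer) tuples."""
    hits = defaultdict(int)
    seen = defaultdict(int)
    for who, item, answer in responses:
        if item in BASELINE_KEY:
            seen[who] += 1
            hits[who] += (answer == BASELINE_KEY[item])
    keep, drop = set(), set()
    for who in seen:
        (keep if hits[who] / seen[who] >= min_accuracy else drop).add(who)
    return keep, drop

Anyone who lands in the "drop" set probably has a hardware or environment problem (or isn't paying attention), and their other answers can be analyzed separately.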

Mark L. Fergerson

Reply to
Alien8752

Yes, and I also track how *quickly* each response is issued. I.e., how much thought/consideration went into the response. In some cases, "more is better". In other cases, more means there is less confidence in the result.
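
For what it's worth, a minimal Python sketch of that timing capture (class and field names are mine, not from any actual survey tool):

import time

class TimedSurvey:
    """Records (question_id, answer, seconds_to_answer) for later analysis."""

    def __init__(self):
        self.records = []
        self._current = None
        self._shown_at = None

    def show_question(self, question_id):
        self._current = question_id
        self._shown_at = time.monotonic()

    def record_answer(self, answer):
        latency = time.monotonic() - self._shown_at
        self.records.append((self._current, answer, latency))

The latency is stored alongside the answer so it can be used -- or ignored -- at analysis time.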

I'm mainly looking at how critical I should be of the folks who are (hopefully) cooperating with me; my goal being to have good data even if that means disqualifying participants after-the-fact.

One way of doing this would be to make the survey *highly* active (e.g., enable -- with participants' consent -- their web cams while they are taking the survey) AS IF the respondent was participating "in the same room" as the querent.

Reply to
Don Y

Just to toss some muck in your waters...

- Would it not be better to measure the results, *given* the wide variation in delivery methods/systems (good/bad headphones, plugs, speakers)? That is: should the algorithm be designed to sound good through /all of these/ options? Or are you only interested in one or a few options?

- To cover the wide variation, you'll probably need a much wider survey than you were initially hoping. Larger still to address lazy and lying data (which hopefully can be sorted out in much the same way CAPTCHA works -- most people give honest, if lazy, responses; discard the outliers -- see the sketch after this list).

- If this is being studied for a particular application (say you were doing market research for an MP3 player that comes with its own set of headphones), then shouldn't you be running the test under much more controlled conditions (i.e., with *that particular set* of headphones)?

- Of course, a more controlled survey will also be more expensive. It doesn't need to be as big, which helps. But it may need to be big anyway, to account for variations like my first point, if you intend to support that.

- Has this been done before -- are you just repeating work, that others have done, to much greater completeness than your intended survey? Have you read the journal articles on this subject? It's a well-studied field (I think?), being of commercial importance for decades. Could you not simply license a modern codec that does basically what you're looking to do?
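
To make the second bullet's outlier idea concrete, here is a minimal Python sketch, assuming answers are logged as (respondent, item, answer) tuples; the two-sigma cutoff is an arbitrary choice of mine. On items many people answered, the majority choice stands in for "truth", and respondents who disagree with it unusually often get flagged:

from collections import Counter, defaultdict
from statistics import mean, pstdev

def flag_outlier_respondents(responses):
    """responses: list of (respondent_id, item_id, answer) tuples."""
    responses = list(responses)

    # Majority answer per item stands in for "truth".
    by_item = defaultdict(Counter)
    for _, item, answer in responses:
        by_item[item][answer] += 1
    majority = {item: c.most_common(1)[0][0] for item, c in by_item.items()}

    # Fraction of majority agreement per respondent.
    agreement = defaultdict(list)
    for who, item, answer in responses:
        agreement[who].append(answer == majority[item])
    scores = {who: mean(marks) for who, marks in agreement.items()}

    # Flag anyone more than two standard deviations below the pack.
    mu, sigma = mean(scores.values()), pstdev(scores.values())
    return {who for who, s in scores.items() if s < mu - 2 * sigma}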

You didn't mention specifics, so this is all speculative -- just fishing to see if any options give you some ideas.

Tim

--
Seven Transistor Labs, LLC 
Electrical Engineering Consultation and Contract Design 
Reply to
Tim Williams

Also, if you're set on doing this: how much experience do you have with conducting surveys? It might pay to contact (or consult) some psychologists who've done human studies and get their insight into experimental design.

(Lord knows sci.electronics.design is the last place to expect to find useful information in that field. :^) )

Tim

--
Seven Transistor Labs, LLC 
Electrical Engineering Consultation and Contract Design 
Reply to
Tim Williams

We've been doing it for years (decades?) with product concepts and sample implementations -- but, by being *in* the room with the subjects. So, we could control the environment, the tests, etc.

E.g., adjust the stiffness in buttons to see how users react to them, rearrange prompts, change keystroke sequences, the placement of controls, etc.

But, I want to canvass a much larger population than has been done in the past. Which means either "going on the road" (ick!) or having folks come here (even ickier!).

Given that the issues I want to survey can be "made portable" (e.g., no "button stiffness" tests), electronic distribution seems natural. Whether that's a remote connection to a server (that conducts and records the survey results) *or* an application that runs on a respondent's computer. The software takes on the "watching" role that we'd previously play "in person".

But, the software can't watch everything that meatware can!

I'm using SED as a source of *opinions* on "human nature", not on the experiments being proposed.

E.g., the example I mentioned elsewhere in this thread re: being able to see the entire survey/questionnaire prior to beginning it is one of my personal pet peeves. I want to see "where is this going" before I commit to any responses.

By way of practical example, I had to do some PT for an injury a few years ago. The survey was one of these "question-at-a-time" deals (conducted on an iPad). As I couldn't see the upcoming questions, my answers to the earliest ones left me very little "dynamic range" for the subsequent ones.

As a result, I consciously changed the scale that I used when answering the subsequent questions which effectively invalidated the earlier ones. The alternative would have been to render the latter responses "in the noise floor".

At the end of the PT, I was once again presented with the same survey (though I didn't expect to encounter it). Knowing what to expect, I retained the "new" scale in answering those questions. To anyone analyzing the "Before" and "After", the obvious conclusion would be that the PT had made matters *worse*!

(Sorry, it's not my job to fix YOUR survey technique; deal with THOSE results!)

Reply to
Don Y

The example I presented was just something that I assumed would be easy for folks to relate to. I.e., you could *imagine* how the choice of speakers/headphones, their placement, SPL, etc. could alter the results. And, how hard it would be for the querent to verify those things, "remotely".

The actual subjects in which I am interested are much more easily controlled:

- which is redder?

- which is higher pitch?

- which looks darker? etc.

I can collect ancillary data that can potentially help disqualify certain results (e.g., if question 1 was answered at 1:00PM and questions 2-5 were answered at 9:00PM, then the correlation between them might be suspect).
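
Something like this small Python sketch could do that flagging, assuming each answer is logged with a timestamp; the one-hour gap threshold is an arbitrary placeholder:

from datetime import timedelta

def split_by_gap(answers, max_gap=timedelta(hours=1)):
    """answers: list of (question_id, answered_at) pairs, sorted by time."""
    trusted, suspect = [], []
    bucket = trusted
    for i, (question_id, when) in enumerate(answers):
        if i and when - answers[i - 1][1] > max_gap:
            bucket = suspect    # everything after a long break is suspect
        bucket.append(question_id)
    return trusted, suspect

Nothing gets thrown away; the "suspect" answers are just tagged so the analysis can be run with and without them.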

But, I still have to rely on aspects of the experiment(s) that I can't easily control *or* observe (unless I resort to webcam surveillance).

The advantage I have is that these aren't just "random" respondents but, rather, folks who have a genuine interest in participating. However, that can backfire as they can err on the side of being *too* helpful and effectively injecting opinion instead of actual data.

[Again, you can typically see this if you are working with subjects in the room with you at the time. Hard for a piece of software to come to that conclusion; or, collect data that would allow me to come to that conclusion /post factum/]

Again, that's the "in the room with them" (and The Prototype) approach. I can't ask questions that require results like that.

OTOH, I can flash an image on the screen (for a controlled period of time) and ask them to tell me how many "wombats" they saw. Then, repeat the exercise (several) times to see when they can't notice a change in number (i.e., can they differentiate 10 from 12? 20 from 22? 50 from 52??) Note that this doesn't rely on the resolution of their monitor, frame rate, or whether or not it is color calibrated, etc.

[OTOH, taking the survey on a cell phone would undoubtedly give different results owing to the smallness of the screen]
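
For illustration, a rough Python sketch of that counting comparison; present_flash() and ask_which() are placeholder hooks for whatever UI actually draws the "wombats" for a controlled interval and records the choice:

import random

def run_counting_block(base_count, present_flash, ask_which,
                       deltas=(8, 4, 2, 1), trials_per_delta=10):
    """present_flash(n) briefly shows n items; ask_which() returns
    "first" or "second" -- both are placeholder UI hooks."""
    results = {}
    for delta in deltas:
        correct = 0
        for _ in range(trials_per_delta):
            bigger_first = random.random() < 0.5
            first = base_count + (delta if bigger_first else 0)
            second = base_count + (0 if bigger_first else delta)
            present_flash(first)
            present_flash(second)
            answer = ask_which()
            correct += (answer == ("first" if bigger_first else "second"))
        results[delta] = correct / trials_per_delta
    return results    # e.g. {8: 1.0, 4: 0.9, 2: 0.6, 1: 0.5}

The delta at which the proportion correct falls toward 0.5 (chance) is the point where the respondent can no longer tell 50 from 52.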

I've been looking at The Literature (in various fields) regarding use of The Internet for (technical, not "opinion") surveys. Surprisingly, most samples are small.

I can round up ~100 colleagues to get a respectable sample size. But, they will all share the characteristic of being highly technical people. They will tend to THINK about their answers more than might be appropriate.

E.g., when I went for my DL examination, they gave me a color-perception test: a circle with a "pointillism" representation of a "digit". "What number [sic] do you see?" I couldn't answer -- and the woman official was ready to rule me as color-blind. My problem was that I saw TOO MUCH!

Anticipating her displeasure, I blurted out: "I see a red 2, a blue 3, a yellow 4, a green 5... What do you WANT me to see??" [All were present in the "spots" but I couldn't CASUALLY examine the picture and come to the "obvious" conclusion!]

See -- though the DMV uses *one* plate that has "multiple potential images", the theory being that if you have a deficiency with one particular color, you will see a different result than someone who can perceive all colors.

Obviously, when you *use* something, you won't be spending lots of time STUDYING some aspect of its presentation. So, the results I seek are those of more *casual* users/observers -- folks that aren't going to be thinking about what they are doing while they are taking the survey.

Casually reading: PARIS IN THE THE SPRING gives a different result than being "on alert" for something "scwewwy"!

See above.

I have specific ideas as to what questions I would ask and how I would present them IF I COULD DIRECTLY OBSERVE THE RESPONDENTS. The issue is how to address the "lack of visibility" and the potential distortions that respondents could introduce to the dataset even if trying to be cooperative!

I figure I can create a "hearing survey" to give me practice on creating the *real* surveys of interest, as many of the questions I'd (intentionally!) ask on a hearing survey would be ones whose "correct" responses I could anticipate (based on common sense and other past reference work). I would NOT want to "experiment" with a REAL survey as I wouldn't be able to ask the same folks to respond to version 2 of said survey without their memory of version 1 "coloring" their responses.

[I.e., the "hearing survey" is the "make one to throw away" exercise :> ]
Reply to
Don Y

I'm directly collecting the (raw) data (well, my *software* is).

But, I (the software) can only collect *certain* data. E.g., to use the "hearing" example, I can't tell if you're wearing headphones or using speakers, if you're in a quiet room or a noisy environment, if you took longer to answer question 3 than question 2 (based on the timestamps of your responses) because you were thinking harder about your 3rd answer OR because you ran outside to shovel the driveway and your pinnae are now uncomfortably cold, etc.

OTOH, if I suspect you may want to review the "question" (i.e., sound sample) multiple times, I *could* design the survey to track the number of times you clicked "PLAY" for each question. So, if you reviewed the sample for question #3 *10* times but only auditioned question 2's sample ONCE, then I might place LESS emphasis on your response to #3 (clearly?, you had a hard time coming to your eventual conclusion).
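
A minimal sketch of that play-count bookkeeping, in Python; the 1/replays weighting at the end is just an assumption for illustration, not a claim about the right statistic:

from collections import defaultdict

class PlayTracker:
    def __init__(self):
        self.plays = defaultdict(int)

    def on_play_clicked(self, question_id):
        self.plays[question_id] += 1

    def weight(self, question_id):
        # One audition -> weight 1.0; ten auditions -> weight 0.1.
        return 1.0 / max(1, self.plays[question_id])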

"In person", you can be more mindful of what the respondent is doing in each case.

E.g., when we would "test" video games (arcade pieces) "in the wild", we would surreptitiously watch the players to see what "wowed" them, whether they were feeding quarters into the machine as soon as each game ended (or whether they walked away and the next quarter came from a new player -- was he busy WAITING for the previous player to move away? Or did he just *stumble* on the game?).

[If you set out a confection at a dinner party and find it "gone"/consumed at the end of the night, you want to know if many people just *tried* it OR if folks kept coming back for "seconds" until it was gone! If you only look at the state of the plate/bowl at the end of the evening, you lose a lot of information!]
Reply to
Don Y

So would you be okay with having the test conditions be "whatever", as long as you /knew what they were/? That being the tricky part, because there are no observers?

In the case of something like an iPhone app, there's a lot of data that can be accessed -- given that the user permits access to those services.

Accelerometer shows orientation; if there's a magnetometer, it might show bearing as well; GPS shows location; clock shows time; camera shows a snippet of the ambient environment, or the user, or whatever; if the app were, say, "login with Facebook" enabled, there could be further personal information available through a FB app too; etc.

(And of course there are ways to help out, like making the relatively invasive process a little more open, say by showing a preview of what the captured image looks like, which could be after applying a generous blur if you just want ambient lighting, say. (And hopefully the cam includes exposure settings in the metadata.) But, that of course only works on the assumption that the app is collecting this *heap* of data and transmitting only what it says it does, and that it will actually be used responsibly, blah blah blah, which isn't always a good assumption for the user to make.)

But even with that, it sounds like you'd have a hard time reaching useful conclusions..?

OTOOH, you'd get data on the range of spread over all those devices and users -- if a device has too little fidelity (or too much size, for that matter!), people aren't going to be able to recognize as many, and so you shouldn't design apps that depend upon people being able to resolve more than so-and-so for, say, a video game, or an interactive web page.

I suppose that's the holism view. Reductionism would naively ask: Suppose you have a display with exactly 384,000 pixels, or 2,073,600 pixels, or whatever. How many shapes can it display? Thousands, nearly millions. Then you'd ask, how many shapes can a viewer recognize? Uh...

It's an incommensurate sort of measure: the function shapes_people_can_view(pixels) needn't have a straightforward definition, even if you expand it for many parameters: aspect ratio, screen size/DPI, viewing distance, user eyesight...

Meanwhile, the holist would simply measure the stack of user_with_display. Dirty measurements aren't wrong, they just have to be representative of the ultimate goal.

If the goal itself is undefined and you need to have the reductionist measurement to go forward with any particular goal (or to direct its choice), yeah, that can be hard... see step 1, surveys are messy...

Ahh, hmm. Interesting...

There are certainly a vast number of surveys on the internet... but they exist largely to generate clicks, so tend to produce better marketing data than psychological or technical data. (If you're using a service, and it's free: it's not free, /you're/ the product...)

Which is a good example of the X-Y problem you talk about below. Not so much a problem as a direct application: you're quizzing the user on X, but you're really selling advertisement Y.

Yeah, I make a bad test subject because I'm too observant.

Supposedly, very few people notice things out of place, say at the grocery store. I notice errant items all the time...

In any case, good luck!

Tim

--
Seven Transistor Labs, LLC 
Electrical Engineering Consultation and Contract Design 
Reply to
Tim Williams
