Pretend advice (“liksområdgiving”) – this time: Nordea

Under the tempting title “Finn din balanse i økonomien” (“Find your balance in your finances”), Nordea has had a campaign thing running for, well… 6 months maybe? (Update 04.07.2012: it has now been on nordea.no for about 2 years.) You may have seen it yourselves:

Nordea’s “Finn balanse i din økonomi” from nordea.no

Here it looks like Nordea has managed to make something exciting that lets you get an overview of, and… balance… in, your finances. This just has to be tried!!

So I boldly set about answering a series of questions. The first thing they ask about is age. OK, that is an important factor when saying something about people’s finances:

Question no. 1: age

Nice design. A bit fiddly to hit exactly the right age with a slider like this, but I’m a gamer, so I manage to hit the exact pixel to get my age. (Wouldn’t an input field have been just as quick?)

I move on and am asked about the number of adults in the household. It doesn’t say where the cutoff for “adult” is, but that is not so hard in my case – perhaps worse for those with children aged 17-18 living at home…

Question 2: Number of adults in the household

OK, so far so good. I’m starting to wonder whether this exercise is very long. I’m on the tab called “generelt” (general) and there are 3 more tabs. As an interaction designer I wonder whether I would have dropped the stock photos at the top and shown progress some other way than tabs and dots, but fair enough, this was surely made by Nordea’s favourite ad agency, so I won’t be too picky about it. I’m going to find out exactly how my finances are doing, and for that I can put up with a little grit in the machinery!

Next question: number of children. Then comes “Husstandens månedlige nettoinntekt” (the household’s monthly net income). Hmm, that one is a bit tricky – I know my own net income roughly, but really only the annual income of my partner. I’ll have to take a chance.

Question 4: The household’s monthly net income

OK, once again a slider where 1 mm = 1,000 kroner. I have superb mouse motor skills, so this goes fine. I just hope my father-in-law, who is a Nordea customer, doesn’t have to struggle through this one.

Question 5: What is your current housing situation? Owner, share/co-op flat, or tenant. I tick Owner.

Question 6: Are you considering buying or changing home? OK, what does this have to do with balance in my finances?

Question 7: Do you own an additional home or a holiday home? Yes / Considering / No. Considering? Is that an important answer option, or is this just an answer that goes straight to Nordea’s salespeople?

Now I’m starting to get bored… this is just question after question with boxes to tick. This had better be damn good.

Question 8: The household’s total mortgage? Yet another slider with millimetre precision. Everyone with more than 2 million in total mortgage debt maxes out this slider…

Question 9: The household’s total savings capital (savings, investments and private pension savings)? A slider that starts at 0 and ends at 2 million+. Actually an incredibly hard question. Should I check my partner’s pension savings and her savings accounts plus my own and the children’s… hmm. This is pure guesswork.

Question 10: How much do you save, or want to save, each month? A slider that goes from 0 to 20,000+ per month. Gosh, and here I feel rich when I manage to set aside 500 a month. I move the slider 2 millimetres from the bottom and feel poor.

Now we are over on the “preferanser” (preferences) tab – exciting. What does that actually mean?

Question 11: I always find the resources I need to master any challenge.

Cool – look, the stock photos have changed:

Stock photo bonanza – check out the stylish mobile phone call – is that an iPhone or what!?

Question *yawn* 12: I gladly take considerable risk to achieve my goals in life. BAM, there it was – an unexpected question that shakes my foundations. Am I taking too little risk in life? I feel cowardly and click “STRONGLY disagree” – even though I happily play poker online and cycle without a helmet!

WOW, I’m done – now I’m REALLY excited about what advice Nordea can give me so that I can get BALANCE in my financial life!!! I eagerly press the “fortsett” (continue) button to get the result:

WTF!? Is this all? Where is the balance in this?

Disappointing. I’m both sad and fed up now. I have answered 12 questions and spent a lot of my valuable time on this, and all you give me is a printout of a brochure page and links to “print” and “la oss ringe deg” (“let us call you”). Well, you can damn well forget that, Nordea!

I have given you a lot of information about my finances and about whether I’m a risk-taker and all that, and you just tell me to call you.

The phone number is on your front page – you could have saved me 15 minutes if I had known this was the result.

This is a perfect example of “liksområdgiving” (pretend advice). Here are the hallmarks:

  • Fancy interface, preferably with sliders.
  • Questions that are obvious leads for the marketing department, along the lines of “do you want to sell your flat within 3 months?”
  • Stock photo bonanza. Beautiful people with a mobile phone and/or doing extreme sports and/or lying on a beach – preferably all 3 at once.
  • Questions implying that the normal user has 20,000 a month to save, but never more than 2 million in mortgage debt.
  • Questions containing the word “risk”. Something we all know is extremely hard to measure, but which banks nevertheless like to ask you about. What risk are YOU willing to take!!!?? Regardless of context, of course.
  • 10-15 questions that really have little to do with finances.

Nordea’s campaign is unfortunately downright tragic, but not a unique phenomenon in this industry, I’m afraid.

If I am now offending anyone who thinks this was a fantastic campaign, I would like to hear why you think so; failing that, I’ll take declarations of support so I can feel a bit better after spending 20 minutes (+30 minutes of blogging time) on this shoddy piece of work!!!

If you have more items I should add to the “pretend advice” list, that would be great!

Update: I checked this on nordea.no today (04.07.2012) and nothing has changed since last time.

Continued debate about the “Task Performance Indicator”

Continued debate from: http://www.iallenkelhet.no/slik-maler-du-effekten-av-nettstedet-ditt/

@Gerry McGovern and @Bjørn: I have to say that even though a web guru like McGovern uses this method and argues quite well for its advantages, I don’t trust the results and the TPI number will be… well… useless?

You both say that the “optimal time” for the task is the most difficult part of the equation. The way you calculate this number is a “black box” of mystery, as @bjørn said earlier – you use the customer, the fastest participant, your own expertise… “We take a number of issues into account” (McGovern). I’m sorry, but this doesn’t seem like something that would lead to a credible result. If you use your expertise to decide the optimal time, you introduce a qualitative factor into the quantitative method, which actually makes it less trustworthy as an indicator.

This week I did a usability test with only 1 task – it’s basically 4 screens to fill out if you do it right. The fastest participant took less than 6 minutes and did it without any trouble. The slowest participant took 23 minutes. I wouldn’t try to draw any conclusions about time on task from these results (median = 882 seconds | average time = 877 sec):

Average time on task for usability test with 6 users

The result you get from the test depends largely on the success rate. If the success rate is low, then the TPI will be low.

Success rate is a number I have dropped from my usability analysis altogether. Why? Because it’s a number that depends on a large number of factors, and even if 10 out of 10 participants actually complete a task, that says next to nothing about how easy it was, how many in the “real” world would manage to complete the task, or how good the website really is.

Usability testing is a qualitative analysis, and it’s wrong to try to mask it as something quantitative by introducing this magic number called TPI.

With 15-20 users you will be able to see a (strong) recurring pattern, no doubt about that, but it’s a long way from seeing a pattern to grading someone’s website from 0-100, calling it a Task Performance Indicator, slapping the grade on the report and forcing the client to improve the website so the number goes up!

*EDIT* Gah, so @josmag is complaining about my wrong use of the error margin – even though I posted my disclaimer 2 seconds after my main post :D Let’s fix the error margin thingy and see where that leads me:

OLD POST:

(TPI for this website would be 1 × (360/882) ≈ 0.40 = 40%. OK, and then apply the error margin of +/- 19%, so I can trust that my result is really somewhere between 21% and 59%.

That means I get a TPI that gives me either: “TPI under 30 is pretty bad. You have a big problem.” OR “TPI of 51-70 is good. It is still possible to improve your website.”

Doh? Does it suck or not? Well, the number doesn’t give me any indication really, but from what I saw in the test I would say to ignore the TPI and just fix the obvious problems.)

NEW POST:

TPI for this website would be 1 × (360/882) ≈ 0.40 = 40%, or somewhere from 1 × (360/714) to 1 × (360/1050) when the +/- 19% is applied to the median time of 882 seconds instead, which gives me a number between 50% and 34%. Not a great difference from my original post. You (@josmag) are also correct that the median will be more trustworthy if I have more users in my test – we don’t know if the number will go up or down – and if we get users who don’t complete the task I will have to lower the success rate from 100% by 5 percentage points for each user failing the task (with 20 users).
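
The arithmetic can be sketched in a few lines of Python. This is only a sketch, assuming TPI is computed the way it is used in this post: success rate multiplied by optimal time divided by median time on task. The function name `tpi` and the way the +/- 19% margin is applied to the median time are illustrative assumptions, not the actual method from Netlife Research or McGovern.

```python
def tpi(success_rate: float, optimal_seconds: float, actual_seconds: float) -> float:
    """TPI as used in this post (an assumption): success rate x optimal time / actual time."""
    return success_rate * optimal_seconds / actual_seconds


optimal = 360.0  # optimal time used in the post, in seconds (6 minutes)
median = 882.0   # median time on task from the test, in seconds
margin = 0.19    # +/- 19% margin of error, applied to the median time

point = tpi(1.0, optimal, median)
best = tpi(1.0, optimal, median * (1 - margin))   # faster median gives a higher TPI
worst = tpi(1.0, optimal, median * (1 + margin))  # slower median gives a lower TPI

print(f"TPI: {point:.0%} (range {worst:.0%} to {best:.0%})")
```

Under these assumptions the point estimate lands at roughly 40% and the range at roughly 34% to 50%, which still straddles two of the scoring bands.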

I think that the money spent on testing 20 users would be better spent if you split it across more than 1 usability test. And as @magnusrevang points out – use analytics to get the “magic numbers” for time on task and success rate.

/End NEW POST

Actually, knowing the competence of both Netlife Research and G. McGovern, I would trust their (expert) judgement a lot more than the actual number they get out of their magic black box :)

Why are 5 users enough in a usability test?

At www.iallenkelhet.no there is a debate about whether it makes sense to measure success rate and time spent on task completion as a replacement (?) for traditional usability testing. Bjørn Bergslien writes about a method for calculating the Task Performance Indicator (TPI). The author further claims that this will help you measure the effect of your website and give a score on this scale:

  • TPI under 30 is pretty bad. You have a big problem.
  • TPI of 31-50 is okay. But there is a lot to work on.
  • TPI of 51-70 is good. It is still possible to improve your website.
  • TPI over 70 is very good. But it should have been 100, right?

Magnus Revang points out that these are factors best uncovered with web statistics. I agree with Magnus on this, and further claim that a sample of 15-20 people is also too small to claim that the TPI numbers are valid. Eirik Havfer Rønjum then asks me why 15 to 20 people are not sufficient, while 5 users are enough to run a usability test.

(Feel free to correct me if you think I’m misquoting you here :) ).

This was interesting enough that I wanted to answer at some length here instead; I feel it would be too much to answer on iallenkelhet.no because I’m starting to drift a bit off topic from the TPI discussion ;)

@Eirik well, usability testing is a qualitative method. No, in a usability test you have no basis for claims about time spent, frequency, or statements such as subjective opinions*, or anything else that requires a certain number of users to be considered valid. I won’t claim that I have never used statements like “8 out of 10 users failed to complete the task”, but at the same time I never say that 80% of your users will be unable to do this. There is a difference between talking about observations and claiming something in general.

When doing usability testing you should not pretend you can say anything quantitative about anything at all, as long as the sample is as small as it usually is.

Why are 5 users sufficient for usability testing? Because what counts is not the number of people who perform an action or make a mistake on the website, but the interpretation of the action: whether it is plausible that it would apply to more users, and whether you should therefore do something to fix it. The payoff of usability testing is a qualitative assessment of what you observe.

EXAMPLE
We recently ran a usability test of a website that sells goods online. The website is Norwegian, but is in English only. We brought in 8 users for the test. Even though many of the users said things like “this website should have been in Norwegian because I’m not that good with English words within [industry]” and “I have no problems with English, but many others will”, that in itself is not enough to claim that the website must be translated… we would report that some users reacted to it, but not that it is an outright error, because none of the users who complained about the English had any trouble understanding the language. BUT then 1 of the users does something significant: he stops just as he is about to buy an item and asks “will I be charged customs duty on the goods? I’m used to paying customs duty when I shop on foreign websites; is this actually a foreign website, are the goods shipped from abroad?” (PS! The price was given in Norwegian kroner (NOK) and there was a Norwegian flag in the corner, but the text was in English.)

This is 1 user – and that is all we need to conclude that there will be users out there who have trouble with this… how much money this costs the company is then a guess, but this person had a shopping cart of about 2,000 kroner and would probably have abandoned the purchase because the uncertainty became too great.
/EXAMPLE

The example above shows that it is not the number of users that matters, but what actually happens for 1 to X unique individuals. It also partly shows that what matters is not what users say, but what they do.

Yes – 5 users are sufficient to run a usability test – sometimes ONE user is enough.

OK, you will say – what you are doing here is testing something other than what we are after – we want to see how long our users take on a task and whether they manage it – not necessarily WHAT they do when they fail. In that case I claim that 15-20 people are not sufficient. So many factors determine the time spent on a task that it is close to frivolous to claim that you get near the truth with such a small sample.

I’d like to ask how you put the sample together, but I’m afraid you will answer that you try to include as many different age groups and demographic background variables as possible… and then we’ll have to discuss that too :D

Hmm, after some research I also stumbled upon an article by good old Jakob Nielsen, who says that with 20 users you can hope for a margin of error of +/- 19%, and that you need 71 users to bring the margin of error down to +/- 10%. This applies only to the time aspect; you also throw success rate and other things into the mix, so I become very unsure about the value of TPI ;)

Quote from useit.com:

With 20 users, you’ll probably have one outlier (since 6% of users are outliers), so you’ll include data from 19 users in your average. This makes your confidence interval go from 243 to 357 seconds, since the margin of error is +/- 19% for testing 19 users.

You might say that this is still a wide confidence interval, but the truth is that it’s extremely expensive to tighten it up further. To get a margin of error of +/- 10%, you need data from 71 users, so you’d have to test 76 to account for the five likely outliers.

Jakob Nielsen (2006) on quantitative testing (http://www.useit.com/alertbox/quantitative_testing.html).
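
The two figures in the quote are consistent with the usual rule of thumb that the margin of error shrinks with the square root of the number of users. A minimal Python sketch, assuming that 1/sqrt(n) scaling and using the quoted +/- 19% at 19 users as the reference point (the function name `margin_of_error` is made up for illustration and is not from the article):

```python
import math


def margin_of_error(n_users: int, ref_margin: float = 0.19, ref_users: int = 19) -> float:
    """Scale a known margin of error to another sample size, assuming 1/sqrt(n) scaling."""
    return ref_margin * math.sqrt(ref_users / n_users)


for n in (5, 19, 71):
    print(f"{n:>2} users: +/- {margin_of_error(n):.0%}")
```

Under this assumption 71 users give roughly +/- 10%, in line with the quote, while a 5-user test would be somewhere around +/- 37% on time on task.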

Exit Tarantell – enter Making Waves

So, the company I currently work for, Tarantell, was bought by Making Waves last week.

That made me realize that nothing is forever and that keeping one’s own blog can be a good idea :) Not that I’ve been producing a ton of articles, but I have a few more blog posts at Blogandtell (the Tarantell blog) than shown here. I will now try to be a bit more faithful to my own blog. This post will be about my personal view on the merger.

I’ve been working at Tarantell since April 2001 and it has been a great place to work, but I welcome the friendly takeover by Making Waves for many reasons:

  • The new company will be a major player in the Norwegian marketplace with about 200 employees (160 in Norway and 40 in Poland).
  • Tarantell and MW are very similar in philosophy and focus, but have different customer bases: Tarantell has many projects in the banking and finance sector, while Making Waves has many public sector projects, so together we get complementary reference lists and experiences to build upon ;) Both firms seek to create great user experiences and ROI for our customers.
  • We met our new colleagues at MW over pizza and beer last Thursday. I have to admit that Tarantell won the after party, but our first social meeting was a nice experience.
  • I think that our new CEO is right when he claims that this merger is really 1+1=3. I think our customers will get a broader offering and maybe even more capacity to deliver great experiences.

BUT the next few months will be exciting times. MW has its offices near Akerselva, but that office space is too small for 200 people. Tarantell’s office space is also too small for both firms (room for about 100?). Management is working to solve that issue, and that will probably lead to the relocation of both firms. I vote for the “Barcode offices” in Bjørvika ;)

My main concern for the next few months is who’s going to leave because of the changes. It’s a known fact that nobody really likes change, but some are more affected by change than others… There will probably be structural changes within the two companies since there is not room for 2 sets of management so someone has to change their work description. That could affect both the managers and the employees that will get a new manager they may not know.

It’s easy to listen to offers that come along when you don’t quite know what you will get… I hope that all my Tarantell colleagues will stay and give the new company a fair chance.

For my own part, I will stay until the dust settles, as long as I can continue my focus on usability and user behavior studies in the new company in a proper way. Our competition should be afraid, be very afraid ;)

Exciting times!

I’m a Sofarebel!

There are currently a huge number of Facebook groups, ranging from “Stop little children watching Man Utd” (currently 75,000 members) to more serious groups like “The Nature Conservancy” (currently 95,000 members donating $260,000). This is good.

Before the social media revolution (that we are in the middle of right now) it was not that easy to show which causes you supported, and it was also difficult to find time to actually go out and support anything. Sure, we had online forums and discussion groups discussing important issues, but there was no focus on the number of members, because there were real discussions going on between those who were for something and those who opposed it.

Today we have this wonderful digital landscape with twitter, facebook, myspace, blogs and whatnot where people voice their opinion.

It’s so easy to show support.

It’s so easy to tell everyone what my views are.

I can join 100 Facebook groups today and color my face green on Twitter to show support for democracy in Iran, and I have saved myself a lot of discussions and protesting. I don’t even have to do anything but click my mouse and sit back and watch the result.

I’m a sofarebel!

From my comfortable seat I can tell the world what my opinions are, and the world listens. How easy this is. How empowering this new technology is.

This new technology makes my life so much easier that I don’t have to make much of an effort to make change happen in the world. I don’t have to give any money. I don’t have to go out in the streets risking anything. I can sit right here and be important and show support for a lot of different things all at once! The best thing is that I don’t have to discuss anything with anyone either. No one can reveal my ignorance, because I just find the biggest wave and surf on it till it ebbs out. Everyone else does it, so I’m safe.

Me, myself and my fellow sofarebels are all well and safe. As it should be.

We are not like the rebels out in the streets risking their lives on Twitter. We are smarter, we stay inside, clicking the links, coloring our faces green for democracy.

Check my green face at www.twitter.com/haakonha

(PS! Can anyone tell me when I should stop being green? It’s not very pretty tbh.)

Why using 30 observers in the same room as the participants is 99% bad*

(Thanks for taking the time to reply to my post, Jared Spool. I have huge respect for your position as a usability guru who is not afraid to go upstream with your opinions. In this case, though, I think that your article “Usability Tests with 30 Observers” points the usability testing community in the wrong direction.)

You have valid points in defense of your method, but I still don’t think 30 observers is a good idea. My main concern is (still) for the participants. But I also have some other arguments that support the two-room testing setup:

The ethics! (once again)

I still claim that this method puts unnecessary strain and stress on the test situation and on the participants. People tend to agree to any kind of disclosure you present to them; that doesn’t mean that they look forward to sitting in front of 30 observers.

(One great scene in the movie “The Elephant Man” comes to mind, where the kind doctor shows off the elephant man to his colleagues in a big auditorium. Sure, there was full consent from all parties and John Merrick did it of his own free will, but I think that when he got the reactions from the other doctors, he would rather not have come.)

I think that a big audience has a bad effect on the user’s performance. Usability professionals can probably agree that a nervous user is the worst thing that can happen to a usability test, next to a poorly written test script (?)

Jared says in his post that “a silent room is a room that is paying attention”. However, any slip-up by the observers (laughing, coughing, moans) at the wrong point can have a direct impact on the test. I’d rather have an observation room filled with cursing (tech-)monkeys than anything that will disturb and influence participant behavior :D

The value of having observers separated from the test participant

Sometimes you have observers saying something like: “oh, this user is so stupid that we can’t blame the website/interaction/design”, but after seeing the 3rd participant do the same thing they usually shut up and take the lesson. My point is that statements about the user don’t mean that the observers aren’t paying attention to the situation. The passing of notes between the observers could (for all you know) be the same comments in writing ;)

My best argument for keeping the observers in a separate room is the possibility for the observers to ask questions (but via the facilitator!) while the test is going on, without disturbing the participant at all. This can give valuable feedback to the facilitator sitting next to the user. (We use MSN chat between the facilitator in the observation room and the facilitator in the test lab to get this communication going – I see that I forgot to mention this in my previous post.)

How distracted are the participants by the cameras vs. 30 observers in the same room?

In response to the question from Daniel Szucs about how quickly the participant forgets he is being observed, Jared answers that “They always seem aware of their observers, though, with the right facilitation, they can prevent that awareness by focusing on the task at hand.” But later in the same reply he says that “Ideally, we’d give observers 5 minutes after each task and about 15 minutes at the end of the session”. This must be a surefire way to remind the participant that he has an audience – if he actually manages to block out the 30 pairs of eyes on him during the test, he will certainly be reminded every 5-7 minutes of the size of the audience :)

I have only had 1 – one – participant in all the tests I’ve done who was aware of, and seemingly distracted by, the camera standing next to the screen. I remember it well because we forgot to turn off the VNC control feature, which resulted in the facilitator in the observer room taking control of the mouse for 10 seconds by mistake… resulting in an overly suspicious participant who probably wondered when the candid camera crew would appear. This is also a reminder that the devil is in the details when it comes to making participants comfortable during the test.

I’m not a big fan of one-way mirrors either. I’ve only been invited as observer on one of these tests and my impression is that the sound isolation when using a one-way mirror is poor so that any loud discussion might be heard by the test participant. This might be a trick by the facilitator in order to keep us silent in the observation room though…

What effect will the observers’ comments and questions have on the participant’s actions (and ultimately on the result of the test)?

I think that allowing direct contact between the participant and the observers is generally bad. The observers will in many cases have a vested interest in a good result and a satisfied user. That may result in biased questioning from the observers: “Why did you click that button, when the big red button next to it was the logical choice?” This question will probably have a direct impact on how the user will perform and express himself for the rest of the test.

My advice for the best possible usability test situation:

  • Don’t intimidate the participant. A nervous participant is a useless participant. Keep the observers out of sight and out of mind!
  • Let the observers make comments in the observer room but try to keep it constructive and informative. Teach them to defend the user and blame the system (always).
  • Use instant message communication between the two facilitators to make sure that the questions from the observation room are presented to the participant in a non-biased manner.
  • Use two facilitators always.
  • Try to keep the test situation as similar as possible for each participant. How can you draw conclusions about anything if each test is unique?

* actually it doesn’t seem 99% bad, maybe only 60% bad. But it is bad, for sure!

Comment on Jared Spool’s post “Usability Tests with 30 Observers”

This is my response to Jared Spool’s post “Usability Tests with 30 Observers”:

I’ve been conducting usability testing for several years (5+), but I have never even considered mixing the observers with the participants.

The reason for this is twofold:

1. I can’t see the real benefit of doing this. You say that it seems to keep the observers alert and quiet while conducting the test, and that they don’t get bored so easily when they are in the same room. In my experience it’s often good to let the observers express some emotions about what’s happening. Sometimes they try to downplay or comment on what’s happening, and those comments can sometimes be just as enlightening as the participant’s actions. Other times you can easily explain why things happen, so that the observers don’t jump to wrong / pre-determined conclusions about what they see. A silent room of 30 observers? Why?

2. Usability test ethics. You say that:

“Make sure the participant is not surprised upon entering the room by the crowd. Talking to them before they walk in will help tremendously. If you can warn them when talking to them on the phone the day before, that’s even better.”

This must be a cultural thing. I think that if you put an average Scandinavian test person in a room with 30 observers you will have one seriously nervous participant. Nervous participants are (as you well know) not of much use to the client you do the usability test for. I don’t know if this will be just as severe with American test participants.

So I think the participant will be nervous and reserved throughout the whole session. And what if he/she really messes up in the usability test and does something really funny or stupid? No matter how well informed the 30 observers are – you will have a problem with laughter (or even anger?). This creates a very bad situation for the poor test participant, and in the worst case you get a person who is a bundle of nerves for the rest of the session.

“Having the observers in the same room as the participants means they can interact.”

This is another thing I wouldn’t like to see. I can imagine this would create a few situations where you get observers who go on with “why did you do that?”, constantly reminding the test participant that there are 30! people he has to explain the error to.

I think both situations are unethical because they make the situation awkward for the participant.

I’m not pretending I’ve got THE answer on how to do it, but here is our setup:
We use 2 cameras and live transmission of the screen/mouse movements projected on a canvas in the observer room. The setup is similar to the one you have, with loudspeakers and also headphones for the main observer who takes the (main) notes. The cameras are connected to 2 TVs, about 25″ each. So the real focus is not on the person but on the big canvas showing the actual website in action + sound. The expressions and facial tells are mainly for the trained observer and not for the bulk of the observers anyway.

I can see the entertainment effect of 30 people watching in the same room, but I can’t see the real benefit here.