The infrastructure of the internet is a mass of emotionless machine technologies. When teaching work-based studies via distance learning, tutors have to convey humanness to provide a rich and rewarding experience for students. I argue that bringing the tutor's voice into the machine world is an essential part of creating an outstanding experience.


Voices in the Machine: Reflections on the Use of Audio in Online Learning.

Text is a precise and effective tool for HE tutors facilitating student learning on online courses with a predominantly asynchronous, dialogue-based pedagogy. This paper discusses how and why tutors should augment their communications by incorporating voice recordings, and how this can be achieved efficiently and effectively. Analysis of work done over the last 12 years provides evidence of the impact of this on both tutor and student. All of the online projects from which experiential data for this paper is drawn were based on social learning and the notion of creating a ‘Community of Practice’ (COP) (Wenger, 2002).

First steps into audio

At the turn of the century, a new approach to training respiratory Specialist Registrars (SpRs) via the internet was being explored in a partnership between the British Thoracic Society (BTS) and Ultralab. At that time ISDN had arrived and downloads of up to 1 MB per minute were possible at home; however, this was not cheap and file size was very important. Computers with 8 MB of RAM and no loudspeakers were not unusual. Text is a very efficient form of communication: it can be concise, precise and quick to read, and it generates little data, just a few tens of kilobytes for hundreds of words. An audio recording, even when efficiently compressed, can be around 1 MB per minute of speech. Asynchronous text, via email or online forums, was the dominant communications method in the SpR online community. To cater for the lack of audio capability, an alternative video of keynote presentations was provided with no audio but with the narrative transcribed as overlaid text; see figure 1.

Figure 1. Screenshots from a video capture of a keynote presentation.

The asynchronous, media-rich delivery was clearly appreciated by the trainees:

"You have the fabric there of exactly what we want; we want to go home, relax and when the kids are asleep, we can say this is the time to learn."
 Respiratory SpR. November 2000.

An audio-enabled version was later viewed on newer computers with audio capability; the contrast between the two versions allows the power of vocal cues to be evaluated. The SpRs and Respiratory Consultants on the project reported that hearing the voice of the presenter enabled them to differentiate between conviction and uncertainty, and that this was critical when assessing the viability of the keyhole surgery presentation that was literally at the cutting edge of invasive thoracic strategies at that time.

Identity and authenticity
Amongst the key issues relating to success in fully online learning is that of identity and its relationship to developing trust. Text has been in use for many thousands of years; however, in many cultures it is only in the last few hundred years that reading and writing have become tools of the masses. In evolutionary terms it is a very new paradigm. Dunbar (1997) notes the importance of trust and bonding in human and animal relationships and suggests that complex speech evolved partly as a more efficient way of bonding than social grooming. Studies of several species of monkeys and apes noted that special treatment is given to grooming partners. That most people are expert in decoding emotive cues from vocal sounds was demonstrated by Aeschlimann et al. (2008), who found that non-linguistic human vocalisation sequences of just 2 seconds' duration were reliably perceived as emotionally positive, negative or neutral.

Scott (2011) illustrated the fluidity of processing and responding to speech:

 “…we take turns in conversations with barely perceptible gaps between one speaker stopping and the next person starting (a recent study reported that 45% of turns fell within a window of +/- 250ms, and 85% within +/- 750ms, in a corpus of strangers talking to each other on the phone). Speech is the currency of most social interactions, and this speech is rapidly produced and co-ordinated with the speech of the other people to whom we speak.”

The asynchronous dialogic approach used in the BTS project to develop online COPs was further developed through the Talking Heads project, as reported by Chapman and Ramondt (2005). Bradshaw, Powell and Terrell (2005) show that learning driven by text-based dialogic processes proved very successful in undergraduate work-focused learning for supporting students who are geographically distant from each other and their tutors, and who also access study in differing temporal frameworks. In line with the Ultralab findings in these projects, it is acknowledged that socially mediated learning through online discussions can be complex (Coats and Stevenson 2006). Wenger (2007) also recognised that a process of regular social interaction, negotiation and resultant ‘meaning making’ defines the identity of individuals and of the online community. Millwood, Powell and Tindal (2008a) discuss a further evolution of this approach as applied in the Ultraversity project.

A common theme is that significant effort can be required to develop trust and bonding via text-based discussions, and that conveying the authenticity and identity of students and tutors is critical in achieving this. Below are two examples that suggest the use of audio by tutors can reduce the effort required to convey identity and develop trust. An interview with an Anglia Ruskin final-year undergraduate student provided insight into the value of emotive vocal cues:

“The impact was quite strong in that it softened the process by reducing some of my anxieties. There was a realisation that we were interacting with people and not a name on the screen. Another aspect that I found energising was the inclusion of extra information through your tone, intonation and human-ness of the delivery.”

The student went on to say that after hearing several of my podcasts he could then 'hear' my tone when reading my text posts in the online community.

The image in figure 2 was created by one of my undergraduate students following initial text-based interactions over the first few weeks of semester on a fully online course. Although the implication is of a benevolent tutor, the image conveys a perception of distance between student and tutor.

Figure 2. Drawn by a student to convey initial impressions of tutor.

Following the use of audio messages (podcasts) in weeks 2 and 3 of semester, the student's perception changed: 'Ian as lofty cloud-based tutor' evolved into Ian as a feet-on-the-ground real person, friendly and approachable - a co-learner with expertise to share.

Augmenting course resources with digital storytelling
The Ultraversity project started in 2003 and led to a learning design for undergraduate courses based on assessment through patchwork text and media, described by Millwood, Powell and Tindal (2008a and b) and by Arnold et al. (2009). The Viable System Model (VSM), developed by Beer (1985), uses amplifiers and attenuators as mechanisms for controlling complexity within systems. The patchwork media approach reflects this process in that it breaks down learning activities into discrete, but related, patches.

In 2008 the VSM process was applied to further evolve our pedagogy, as student review had mentioned initial feelings of overload when faced with a mass of online text-based resources. The solution to attenuating complexity took the form of telling the story of a learning journey, expressed as an interactive map. This reduction of variety assisted with initial conceptualisation of tasks. As students click on each stage on the map, they follow the progress of an avatar and read concise text relating to each activity; augmentation via more expansive audio story clips then amplifies their understanding.

Figure 3. Screenshot of an interactive learning journey hosted in the Anglia VLE.

One student’s reaction to this approach conveyed a sense of excitement rather than overload:

“Wow.......how exciting is this!!!
The climb begins.
Good luck one and all”

Another student reflected the journey metaphor in her assignment and implied a change in motivation:

“I was ready to hang up my walking boots [after the previous module] but I pulled them on, tightened the laces and began the climb. I soon found myself at the first plateau and learning activity...I thoroughly enjoyed this module and I hope it is reflected in my work. I really liked the mountain and loved the climb as much as the descent!!!”

Although this solution was not entirely audio-based, it was the concept of storytelling that led to the conceptualisation of the journey, and the audio clips that provided a cognitive bridge to the more detailed course resources.

Audio for assessment
Feedback from final year students indicated that vocal cues are important in conveying the critical friendship aspect of peer assessment. The use of audio helps students develop the confidence to offer deep and challenging critical commentary:

“…we inadvertently stumbled upon gold when we decided to do it [peer assessment] using audio because immediately you could sense the hesitation in the voice of your critic, 'the humanness of the medium' was the phrase that I used whereas text can be very harsh because it has no tone no intonation its not organic. I think one of the reasons why video wasn't used for me is self consciousness, its nothing more not even vanity, you can see a lot in a voice…”
Julian Keith

“Hi Ian, I experimented with audio feedback in year one with Toby and Julian - I think we all found it quite an easy way to be a critical friend - without the worry of being unable to express our empathy.  I think the ability to use alternative media when providing critical feedback allows me to be far more open and honest, because I am able to communicate my perspective and that fact that it is MY perspective and not a statement of fact - hope that makes some sense.”
Sally Clifford

This next observation from a student was a response to audio feedback that accompanied digital annotation of work in progress during supervision of a final year dissertation.

“I think it is fantastic that you use voice recordings as well as written feedback, it really enabled me to understand the context in which you were giving the information. It also gave me the ability to 'sense' whether I was on the right track or not.”

This student was used to traditional course delivery and the supervision was the first contact with remote online tutoring. The authenticity transferred through emotive vocal cues was a key element in improving this personalised feedback experience.  A 2010 JISC report uncovered similar findings:

“Many learners find feedback via digital audio and video more detailed and helpful. In contrast, written feedback is perceived as brief, unclear and difficult to recall. A more personal approach to feedback adds value to learners’ experience of higher education.”

According to a more recent JISC report (2012), the literature suggests around 20% of marked work may go uncollected. Recent action inquiry on an undergraduate course has explored the value of ‘generic feedback’. This is based on compiling notes made during final assessment and aims to raise the visibility of key learning targets to all students. It is presented to the whole cohort as an mp3 file located in a subsequent module-specific discussion forum. It is difficult to assess whether students learn more, but it is clear that some students do listen more than they might read:

 “I listen to the generic feedback podcasts on the train on the way to work, I listened to one twice a day for a week, it really soaked in, I wouldn’t do that with text.”

McClean et al. (2012) uncovered similar findings, notably that students felt audio feedback was more personal than written comments and did not feel as if it came from a stock set of comments. Students also mentioned listening to audio feedback several times.

I have used audio on the following platforms:

FirstClass client - this provides a one-click approach to adding an audio interface in an email header; the recording is made in the email and the file attaches automatically once done.

Plone - here audio files were recorded outside of the platform then uploaded to contents panels; this was less efficient than FirstClass but did work well.

SharePoint - some versions allow audio to be embedded in discussion items, which is very useful as it places the audio directly in context. Other setups do not allow audio in messages, but it can still be embedded in wiki or html resource pages as well as being placed as a downloadable podcast in a documents or shared documents folder.

Turnitin/Grademark - the addition of a recording interface on feedback pages can provide a very useful supplement to text based feedback.

Telling the story

Robin (2006) provides seven key elements of digital storytelling:

Figure 4. The Educational Uses of Digital Storytelling. Robin (2006).

A powerful use of podcasts is conveying the story of a personal experience; this can provide an authentic insight with a conviction that is hard to achieve via text. For many years a text-based example of double loop learning (Argyris 1982), drawn from my own student teacher days, had been in the course resources. Converting this same story to a podcast changed the impact considerably. In an unrecorded tutorial discussion about this, several students agreed that the text version had been “something a tutor had written”, whereas the audio version transformed the story into “an important event someone had experienced”. This example of a changing perspective reinforces the value of Robin’s elements 3 and 4.

A significant barrier for many people is that the voice we hear when we speak is not the voice we hear when we listen to a recording. In addition to the external voice that others hear, we hear sounds that resonate through our skull and internal spaces such as sinuses. This in part explains why many people do not like hearing their recorded voice and are reluctant to make recordings. My first attempts at recording audio involved reading from a script. I made mistakes, got to the end of one line and went back to the start of the same line, or missed part of a sentence out. Focusing on the script resulted in many unsatisfactory takes and led to an unconvincing final recording. The emotive cues conveyed my nervousness rather than my passion for the topic. I persevered and, over several months of experimentation, found that a set of prompts proved more effective than either a script or a totally ad hoc approach. This can be seen in the video capture showing recording using Audacity on the same page as the above podcast.

McDonnell et al. (2004) argue that creating a story for an audience requires the teller to become an observer of the experience, and link the changing conceptualisation to stages in Kolb’s learning cycles. The reflective and analytical steps outlined in the diagram below enable this perspective.

Figure 5. A knowledge management process to support the recording of digital stories.

This process led to a more natural recording, as can be heard in this introduction to the second year of study, which starts with a reflective practice module.


Student feedback indicated it was both a useful initial introduction to the module and provided valuable insight into expected behaviours and standards such as increased student autonomy in managing their learning.

The original was punctuated by what I felt were highly irritating non-linguistic sounds: the umms and errs that I had not been aware of saying, many of which I edited out. When I subsequently discussed with a student whether umms and errs are distracting, he said something that proved to be a significant event for me: “...the umms give me a few moments to absorb what you have just said.”

Another observation provided further evidence of the value of informality:

“I really look forward to your audio messages Ian.  I think that is because it brings the module to life, your style of delivery is informal and this works particularly well, you are not lecturing us.  I feel that your interest in us is genuine and that you deliver in this way to add a personal dimension…I listen through the whole audio initially and then on the second playing I jot down key points that you make on Stickies and plaster them around the wall above my computer.”

When I reflected on this systematically and at length after the event (Schon 1983 and Argyris 1982), the conversation became a critical incident (Tripp, 1993) that transformed my willingness to engage in audio recording. It seems there is no need to aspire to a formal ‘BBC standard’ of recording, and whether speakers like their recorded voice or not is irrelevant. What is important is that this powerful emotive tool is not neglected.

Six key elements in capturing an authentic recording are proposed:

  1. Be natural and be yourself; anything else will sound artificial unless you are an expert at creating alternative voices.
  2. Work from prompts rather than a script.
  3. Talk as if you were speaking to a student - as if they were in the room with you.
  4. Let your voice convey your passion - enthusiasm for the topic will be appreciated.
  5. Let your voice convey praise and concern - this can reinforce the value of high quality work and the importance of improvement.
  6. If you are at all nervous about the process, avoid listening back to a recording.

The overarching problem to be solved in relation to the use of audio in fully online learning at HE level can be summarised as: Is audio an effective use of time?

Beneath this umbrella are several sub-questions:
1. How much time and effort is involved?
2. Do students learn more?
3. What should it be used for?
4. How should it be done?

The JISC 2010 overview suggests audio and screen capture may save time for tutors and improve learners’ engagement. However, the rhetoric in the blog and Twitter worlds is polarised, with the use of audio perceived as either ‘very effective’ or ‘a waste of time’. Using software such as Audacity to record and export mp3 files is a very straightforward process. This software is compatible with Mac and Windows and is free to download. For tutors involved in online learning, the effort required to explore this approach is minimal and the benefit to students is potentially significant.


