CALL: Computer Assisted Listening Lessons

by Daniel Schulstad

Listening is one of the hardest skills to master for the ESL/EFL learner (Chang, Tseng et al. 2011; Leveridge and Yang 2013). Learners in authentic situations struggle to decipher native-speaker speech in real time, as speed of verbal delivery, local accents, lack of contextual or visual cues (telephone English) and lack of vocabulary knowledge impede understanding, which in turn blocks communication. Without preparation or prior information as to content, with the additional complexity of a strong regional accent, even a native speaker can struggle to understand spoken English. Inability to communicate is de-motivating for language learners, which in turn affects learning negatively as per the “affective filter” hypothesis, which emphasizes the role of emotional state on learning (Krashen and Terrell 1983).

Due to these two factors, difficulty and limited exposure, researchers have looked to the affordances of computer-facilitated multimedia learning as a way not only to encourage more meaningful learning, but also to increase practice opportunities for learners in EFL contexts and in classrooms with a high student-to-teacher ratio (Liou and Tsing 1994; Ba 2006; Tsai 2012). Multimedia can be seen as quite a broad term, encompassing many media, from online web learning to CD-ROM products. For the purposes of this paper we are going to use Mayer and Moreno’s (2003) definition describing multimedia learning as learning facilitated through visuals and words (printed or spoken) (ibid., p. 43), as this allows us to develop principles that are applicable to both online and offline computer-facilitated materials.

Cognitive load theory allows some common sense application of theory to lesson design. While measuring types of cognitive load is problematic (Kirschner, Ayres et al. 2011), certain key terminology is needed to make distinctions between what can be changed by designers and instructors, what the learner themselves brings to the equation, and what the task itself contains. Cognitive load research has focused on “extraneous” load, in other words the mental effort that is expended unnecessarily due to poor or superfluous learning design (Adesope and Nesbit 2012). As working memory is limited, a designer’s concern is how to facilitate transfer of information from the working memory to the long-term memory as quickly and easily as possible. A dual processing theory of multimedia learning divides up working memory into an auditory and a visual system, both limited in capacity and to a certain degree independent of one another (Diao, Chandler et al. 2007; Chang, Tseng et al. 2011; Adesope and Nesbit 2012).

Figure 1. Mayer’s Cognitive Theory of Multimedia Learning (2003, p. 44)

This division has led researchers to study the interplay between auditory (words) and visual (pictures, including text) information in learning design, in order to create optimal conditions for learning. Mayer and Moreno (2003) and Diao et al. (2007) have advanced research into what is termed the “redundancy effect”, where cognitive load is increased due to mirroring of information in both the visual and auditory modes. It is this theory of a “redundancy effect”, resulting in cognitive overload, which is problematic when applied to the teaching of languages. The use of redundant text and audio in the form of closed captions and subtitles has been extensively employed in language learning practice and, in conflict with Mayer and Moreno’s theory, has been shown to be effective in improving language learning outcomes (Diao, Chandler et al. 2007; Leveridge and Yang 2013).

However, even in the field of language learning, other research has indicated that there are certain conditions under which Mayer and Moreno’s (2003) redundancy effect is also valid, for example with older language learners, or in the teaching of listening strategies as opposed to improving listening comprehension (Diao, Chandler et al. 2007). In addition, the level of proficiency of the language learner is an important variable, making measurement of learning outcomes more difficult, as many studies have not allowed for this effect, using the same listening material across a variety of learner levels and abilities to draw conclusions. The results are then predictable: lower-level learners rely heavily on captions, while more proficient learners find them distracting, adding to extraneous cognitive load (Leveridge and Yang 2013). The common teacher recommendation “watch with English subtitles” can therefore be counter-productive.

Tracing the findings of the following studies, it is possible to see an evolution of thinking about the application of cognitive load theory and verbal redundancy to language learning. Adesope and Nesbit (2012) see a need for further research to confirm the findings of their meta-analysis, and they provide a stronger foundation for the design of multimedia listening lessons.

Mayer and Moreno (2003)
Objective: Reducing cognitive load to enhance learning
Positive findings:
  • Dual-channel assumption of auditory and visual input facilitates meaningful learning
Negative findings:
  • Redundancy effect shows better transfer of learning when words are auditory rather than auditory and visual; recommends avoiding presentation of an identical stream of printed and spoken words (p. 46)

Diao et al. (2007)
Objective: Investigation of the effect of simultaneous written presentations on comprehension of spoken English as a Foreign Language
Positive findings:
  • Written subtitles and scripts aid comprehension
Negative findings:
  • No effect was observed on listening skills in general; full scripts or simultaneous subtitles result in extraneous cognitive load that interferes with learning; visual aids should be eliminated when teaching learners to listen

Adesope and Nesbit (2012)
Objective: Meta-analysis to investigate the effects of spoken-only, written-only and spoken-written presentations on learning retention and transfer
Positive findings:
  • Learners who learn from spoken-written presentations learn more than from spoken-only presentations
  • Presentations showing key terms resulted in higher intake than verbatim presentations; partial redundancy is beneficial
  • Signaling text is associated with greater learning
  • Adding text to audio narration is beneficial
Negative findings:
  • Prior knowledge is a strong variable; higher-level learners do not benefit as greatly from spoken-written presentations
  • Verbatim spoken-written presentations show no significant effect without consideration of moderating effects: reading fluency of learners; inclusion of images or animation
  • Adding audio narration to text is not beneficial
  • Split-attention effect means that the benefit of partial verbal redundancy is negated by concurrent use of visuals such as animation and diagrams

Table 1. Relation of cognitive load theory and multimedia learning to language learning

The affordances of multimedia learning design can also facilitate common ESL practice in the teaching of listening skills. Current practice includes tasks that combine “top-down” and “bottom-up” strategies when developing comprehension (Tsui and Fullilove 1998). Top-down strategies include activation of “schema” prior to listening, in the form of visuals and brainstorming. Schema theory views learned knowledge as stored in long-term memory in schematic form, allowing the addition, building and adaptation of existing frameworks. This process facilitates deeper learning (Diao, Chandler et al. 2007). “Gist” listening, designed to allow access to general understanding, is followed by tasks that require intensive listening, where understanding is built through focus on individual words and phrases; these intensive tasks are known as “bottom-up” strategies. These strategies can be implemented in multimedia design, and correlate with principles drawn from cognitive load theory involving transference of understanding into the schemas stored in long-term memory. In addition, the pre-teaching of vocabulary is a practice that lowers the “germane” cognitive load on the learner, which is defined as the amount of useful information and activity required to complete the task (Kirschner, Kester et al. 2011). By simultaneously activating existing knowledge (schema) and preventing a “blocking” effect when encountering new words (pre-teaching of vocabulary), using visuals or animation in listening lessons has been shown to contribute to effective learning (Samur 2012).

Self-pacing, by which learners are able to replay, back-track and repeat sections until they themselves are satisfied with the outcome, seems to be positive for lower-level students, while adding to cognitive load for higher levels (Koehler, Thompson et al. 2011). In a related sense, learner control over the path and order of the learning materials is not universally beneficial either (Ferney and Waller 2001). Again, lower-level learners risk disorientation or uneven learning as a result of having to make decisions related to content, while more advanced learners find choice that does not facilitate their learning more extraneous and frustrating than a system-determined path.

Indeed, compensation or customization for different levels of learners is complicated in the language-learning context, where global level determinations do not take into account individual skill levels in each area of learning. Different levels of proficiency have been shown to have a significant impact on cognitive load, reducing the effectiveness of materials across a wide range of abilities (Atkinson, Derry et al. 2000; Diao, Chandler et al. 2007; Chang, Tseng et al. 2011; Koehler, Thompson et al. 2011).

In terms of interface design, Ferney and Waller (2001) provide useful guidelines on considerations that have implications for the cognitive load and usability of multimedia materials, drawn from their experience designing a CD-ROM for an academic English context. A summary of their considerations follows:

  • use of drafting “storyboards” to best balance text and images
  • control over the amount of text optimally displayed on a computer screen, as well as optimal font size (12pt)
  • use of visual cues, chunking of text and clear signposts to aid learner navigation
  • clear menus, tables of contents, headings, summaries, page numbers and so on, to aid in navigation, as learners may be learning independently without the assistance of a tutor to help them navigate
  • consistent use of font types, sizes and color
  • clear separation of content from feedback and model answers based on color and font use (Reagan and Murray 2002)

The limitation of computer-generated multimedia feedback that largely consists of standardized model answers (Ferney and Waller 2001) may be compensated for by the inclusion of worked examples, which break down the process of task completion for the learner (Atkinson, Derry et al. 2000) and which, if provided both before and after task completion, are similar to Mayer and Moreno’s (2003) idea of pre-training for complex tasks. Atkinson et al. (2000) also propose that feedback include a cuing strategy, so that, in the case of a listening lesson for example, portions of the text containing the answer to comprehension questions can be visually or aurally highlighted. This is also called signaling, and has been seen to be beneficial for learners across different disciplines (Kirschner, Ayres et al. 2011). Vehbi (2012) takes this idea a step further, recommending that such signaling or cuing be available to the learner on demand.

Working specifically with listening texts in a multimedia context, it appears that this signaling effect can be applied to captioning in general. While full subtitling or captioning may indeed increase cognitive load, especially across a range of proficiencies (Diao, Chandler et al. 2007), supporting Mayer and Moreno’s (2003) redundancy theory, there is evidence that some use of key words with captioning can result in improved comprehension (Diao, Chandler et al. 2007; Hung 2011; Samur 2012). The showing of key information simultaneously with audio increases comprehension for learners, especially for lower-level learners who struggle to decode full captions (Li 2012). In addition, if this feature can be turned on and off, it allows individual learners to choose the mode that best suits their style. This allows the ultimate goal of teaching listening strategies to be realized: for learners to be able to listen to and comprehend authentic materials and operate in authentic situations without scaffolding.
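To make the idea of key-word captioning concrete, the following is a minimal sketch in Python, offered as an illustration rather than as a method taken from any of the studies cited. The function name keyword_caption, the sample sentence and the list of key terms are all hypothetical; in practice the key terms would be chosen by the teacher or materials designer.

```python
import re


def keyword_caption(caption: str, key_terms: set[str], show_keywords: bool = True) -> str:
    """Reduce a full caption line to its key terms, kept in spoken order.

    key_terms is an assumed, teacher-supplied set of lower-case words.
    show_keywords models the learner's on/off toggle.
    """
    if not show_keywords:
        return ""  # learner has switched the captions off
    # Split the caption into words, ignoring punctuation.
    words = re.findall(r"[A-Za-z']+", caption)
    # Keep only words the designer marked as key, preserving original casing.
    kept = [w for w in words if w.lower() in key_terms]
    return " / ".join(kept)


# Hypothetical example line from a listening text.
line = "The train to Boston departs from platform nine at noon."
terms = {"train", "boston", "platform", "noon"}

print(keyword_caption(line, terms))         # train / Boston / platform / noon
print(keyword_caption(line, terms, False))  # prints an empty line
```

The same caption data can serve every learner: lower-level learners see only the signaled words alongside the audio, while learners who find captions distracting switch them off.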

In summary, given the research on cognitive load theory and multimedia design, optimal computer facilitated listening lesson design can include:

  1. Simultaneous display of key words with audio to assist in comprehension
  2. Schema activating pre-listening activities including removal of blocking vocabulary
  3. Worked examples that explicitly illustrate the listening strategies and cognitive process leading to successful comprehension
  4. Feedback, including signaling or cuing portions of text that contain answers
  5. Consistent interface design to allow easy navigation and clear signposts
  6. In-built learner controlled scaffolding to allow for materials exploitation at different proficiency levels allowing choice of pacing, subtitling and glossary use.
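
As a rough illustration of point 6, learner-controlled scaffolding could be represented as a small settings record with a default per proficiency band that the learner may override. This is a sketch under stated assumptions: the names Scaffolding, DEFAULTS and settings_for, the two bands and the default values are hypothetical starting points, not figures drawn from the research above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Scaffolding:
    caption_mode: str   # "off", "keywords" or "full"
    allow_replay: bool  # self-pacing: replay and back-track sections
    glossary: bool      # on-demand glossary of pre-taught vocabulary


# Hypothetical defaults per proficiency band (assumed, not from the studies).
DEFAULTS = {
    "lower": Scaffolding(caption_mode="keywords", allow_replay=True, glossary=True),
    "higher": Scaffolding(caption_mode="off", allow_replay=True, glossary=False),
}


def settings_for(level: str, **learner_choices) -> Scaffolding:
    """Start from the band default, then apply the learner's own choices."""
    return replace(DEFAULTS[level], **learner_choices)


# A higher-level learner who nevertheless wants full captions for one task.
print(settings_for("higher", caption_mode="full"))
```

Keeping the defaults separate from the learner's overrides means the system can suggest a path for lower-level learners without removing choice from more advanced ones.
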
Author’s Bio: Dan is Centre Director at International House Boston. In addition to CELTA training, running a busy summer camp for teenagers and the odd carpentry job around the school, he has just completed his Master’s of IT in Education and Training. This has left him with more spare time to spend with his toddler, record collection and lowbrow detective novels (occasionally in that order).


Adesope, O. and J. Nesbit (2012). “Verbal redundancy in multimedia learning environments: A meta-analysis.” Journal of Educational Psychology 104(1): 250.

Atkinson, R. K., S. J. Derry, et al. (2000). “Learning from Examples: Instructional Principles from the Worked Examples Research.” Review of Educational Research 70(2): 181-214.

Ba, E. (2006). “A Blended-learning Pedagogical Model for Teaching and Learning EFL Successfully Through an Online Interactive Multimedia Environment.” CALICO Journal 23(3): 533.

Chang, C.-C., K.-H. Tseng, et al. (2011). “Is single or dual channel with different English proficiencies better for English listening comprehension, cognitive load and attitude in ubiquitous learning environment?” Computers & Education 57(4): 2313-2321.

Diao, Y., P. Chandler, et al. (2007). “The Effect of Written Text on Comprehension of Spoken English as a Foreign Language.” The American Journal of Psychology 120(2): 237-261.

Ferney, D. and S. Waller (2001). “Reflections on Multimedia Design Criteria for the International Language Learning Community.” Computer Assisted Language Learning 14(2): 145-168.

Hung, H.-T. (2011). “Design-Based Research: Designing a Multimedia Environment to Support Language Learning.” Innovations in Education and Teaching International 48(2): 159-169.

Kirschner, F., L. Kester, et al. (2011). “Cognitive load theory and multimedia learning, task characteristics and learning engagement: The Current State of the Art.” Computers in Human Behavior 27(1): 1-4.

Kirschner, P. A., P. Ayres, et al. (2011). “Contemporary cognitive load theory research: The good, the bad and the ugly.” Computers in Human Behavior 27(1): 99-105.

Koehler, N. A., A. D. Thompson, et al. (2011). “A design study of a multimedia instructional grammar program with embedded tracking.” Instructional Science 39(6): 939-974.

Krashen, S. and T. Terrell (1983). The Natural Approach: Language Acquisition in the Classroom. Oxford, Pergamon.

Leveridge, A. N. and J. C. Yang (2013). “Testing Learner Reliance on Caption Supports in Second Language Listening Comprehension Multimedia Environments.” ReCALL 25(2): 199-214.

Li, C.-H. (2012). “Are They Listening Better? Supporting EFL College Students’ DVD Video Comprehension With Advance Organizers In A Multimedia English Course.” Journal of College Teaching & Learning 9(4): 277-288.

Liou, H. C. and N. Tsing (1994). “Practical Considerations for Multimedia Courseware Development: An EFL IVD Experience.” CALICO Journal 11(3): 47.

Mayer, R. E. and R. Moreno (2003). “Nine ways to reduce cognitive load in multimedia learning.” Educational Psychologist 38(1): 43-52.

Reagan, N. and O. Murray (2002). “Book Reviews.” TESOL Journal 11: 49-52.

Samur, Y. (2012). “Redundancy effect on retention of vocabulary words using multimedia presentation.” British Journal of Educational Technology 43(6): E166-E170.

Tsai, S.-C. (2012). “Integration of multimedia courseware into ESP instruction for technological purposes in higher technical education.” Educational Technology & Society 15(2): 50+.

Tsui, A. and J. Fullilove (1998). “Bottom-up or Top-down Processing as a Discriminator of L2 Listening Performance.” Applied Linguistics 19(4): 432-451.

Vehbi, T. (2012). “Design of Feedback in Interactive Multimedia Language Learning Environments.” Linguistik Online 54(4): 35-50.