Developing Word Recognition Sub-skills in Listening

by Daniel Tse

One of the most frequent challenges faced by second-language learners is their inability to fully understand other speakers of English. How can teachers help them to negotiate this challenge? Before we explore several practical classroom ideas, let us turn our attention to how listening comprehension is achieved.

Listening comprehension

Contemporary views on listening comprehension are based on an integrated model of top-down and bottom-up processing. Top-down processing involves the listener’s application of schemata, or background knowledge, to speech input as the basis for comprehension (Richards 2008: 7). At the other end of the spectrum, bottom-up processing refers to the use of systematic knowledge to decode incoming acoustic signals (Richards 2008: 4).

Field (2008: 114) further illustrates in his information-processing model how meaning is derived by decoding speech into discrete units of spoken language. According to Field, the listener uses his or her phonological, lexical, and syntactical knowledge to convert speech signals into progressively larger language units, ranging from the phoneme to the whole utterance. In the decoding process, individual words are formed by mentally combining individual phonemes and syllables. Word recognition, therefore, represents one of the sub-skills required for successful bottom-up processing.

Using Field’s classification of listening sub-skills (1998: 8), word recognition can further be divided into discrimination and segmentation. The former refers to the ability to distinguish between minimal pairs, that is, pairs of words which differ by a single phoneme, such as ‘walk’ /wɔːk/ and ‘work’ /wɜːk/. The latter requires the listener to correctly identify word boundaries when arranging phonemes and syllables. An example is ‘make a cup of tea’ /meɪkə kʌpə tiː/, which may be perceived as three blocks, not five distinct words, in connected speech (ibid.: 8).

It is with the sub-skill of segmentation that learners often experience the greatest difficulties. This is primarily caused by certain features of connected speech which deviate from the pronunciation of words in their citation form. The most common features include assimilation, elision, catenation (or linking), and the unstressed reduced-form vowel /ə/ (the schwa). Without adequate training in segmentation skills, learners may apply their first-language habits indiscriminately or even fail outright to decipher words from blocks of sound. In the above chunk, ‘make a cup of tea’, consonant-vowel catenation can present problems for learners whose first language does not feature a reduction of unstressed vowels.

In order to develop learners’ word recognition sub-skills in listening, teachers can use a range of activities for both awareness-raising and skill-deploying purposes.

Discrimination tasks

In this alternate-response task type, learners are given pairs of words containing the problematic phonemes, such as /ɜː/ (‘work’) and /ɔː/ (‘walk’). The teacher says one word from each pair aloud and the learners must identify it correctly. They can respond by underlining the correct word on their worksheet, or by raising their left or right hand. In my experience, the former response format is practical only when the learners’ difficulties are already known to the teacher, whereas the latter is more effective with Young Learners and those who prefer learning kinaesthetically. Nevertheless, isolated words do not carry any meaning unless the context is clear. This task type would therefore be more suitable for ad hoc use in dealing with learners’ immediate errors.

An extension to the above task involves placing minimal pairs within sentences. This allows the learners to hear such pairs of words with their co-text as well as in connected speech. To add purposeful games to lessons with Young Learners and Teens classes, teachers can embed discrimination tasks in the Destination Maze (Hancock and McDonald 2013).

In the planning stage, teachers should select four minimal pairs based on their learners’ specific difficulties. Each word from a pair is then assigned the left or right direction for the maze. At the beginning of the game, the teacher reads aloud a sentence containing one word from a minimal pair; the learners must correctly identify the word before they choose either direction at the first crossroads. For instance, they go left upon hearing ‘work’ or right upon ‘walk’ and arrive at the second crossroads. The same step is repeated three times with different minimal pairs until the learners reach a destination city at the end of the maze. They need to successfully discriminate between a series of minimal pairs in order to get to the correct destination, and if this destination does not match what the teacher has, the teacher will instantly know that they have gone wrong somewhere.
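For teachers who like to prepare the maze on a computer, the logic of the game can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the published activity: the function names `build_maze` and `path_for`, the numbered city labels, and every minimal pair other than ‘work’/‘walk’ are hypothetical choices made for the example.

```python
from itertools import product

# Hypothetical minimal pairs: (word meaning "go left", word meaning "go right").
# Only 'work'/'walk' comes from the article; the others are illustrative.
MINIMAL_PAIRS = [
    ("work", "walk"),
    ("ship", "sheep"),
    ("hit", "heat"),
    ("bird", "board"),
]


def build_maze(pairs):
    """Map every left/right path through the maze to a numbered destination."""
    paths = product("LR", repeat=len(pairs))
    return {"".join(path): f"City {i + 1}" for i, path in enumerate(paths)}


def path_for(words_heard, pairs):
    """Turn the word identified at each crossroads into a left/right turn."""
    turns = []
    for heard, (left_word, right_word) in zip(words_heard, pairs):
        if heard == left_word:
            turns.append("L")
        elif heard == right_word:
            turns.append("R")
        else:
            raise ValueError(f"{heard!r} is not in the pair {left_word}/{right_word}")
    return "".join(turns)


maze = build_maze(MINIMAL_PAIRS)
# The words the teacher actually said at the four crossroads.
teacher = path_for(["work", "sheep", "hit", "board"], MINIMAL_PAIRS)
# A learner who mishears 'work' as 'walk' at the first crossroads.
learner = path_for(["walk", "sheep", "hit", "board"], MINIMAL_PAIRS)
print(maze[teacher], maze[learner], maze[teacher] == maze[learner])
```

With four crossroads there are sixteen possible destinations, so a single misheard word is enough to send the learner to a different city from the teacher's, which is what makes the final comparison a quick check.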


Dictation

Although dictation has its origin in the Grammar-Translation Method (Stansfield 1985), teachers can use this versatile task to raise learners’ awareness of specific features of connected speech (Field 2008: 88). If decoding the reduced-form schwa /ə/ represents the most serious issue, learners can listen to sentences containing salient instances of this feature and count the number of words in each. Comparing the task sentences with their own answers should encourage them to notice the short length and the unstressed nature of the /ə/.

To provide learners with opportunities for practising word recognition skills, teachers can dictate short sentences containing a target feature of connected speech. If the level of challenge proves too great for a specific group of learners, teachers can consider using a gap-fill to reduce the task demands. Either task type requires writing skills, as learners have to decode speech in listening and encode words in giving their responses. Once the learners have the full task sentences, teachers can play each one again while the learners follow the script. This fosters in the learner stronger mental connections between the spoken and the written forms, which underlie the sub-skills of word recognition.

In my experience, the effectiveness of dictation depends on the teacher’s careful choice of task sentences, so that the learners can focus their attention on one salient feature of connected speech at a time. Another consideration is that learners should not be required to recognise words beyond their current knowledge. In terms of duration, Field (2008: 89) favours shorter tasks which last about five minutes. This is consistent with my observation of General English learners, most of whom begin to exhibit signs of restlessness after a comparable amount of time.

Audio script

After the traditional listening stages, learners can focus on the most problematic parts by listening again to the audio accompanied by the script. Akin to dictation, this awareness-raising exercise enables the learners to connect sound to spelling. More importantly, it gives teachers an opportunity to diagnose the feature of connected speech which poses the biggest obstacle to their learners’ word recognition. Subsequently, the learners’ needs can inform the teacher’s syllabus choice and constitute the target for sub-skill practice in future listening lessons. In my experience, the process of identifying the learners’ problems with word recognition is relatively straightforward in one-to-one lessons. It is possible for the teacher to stop the audio upon the learner’s request and focus on the problematic parts. In group classes, however, there are usually different areas of difficulty or conflicting needs; teachers will need to carefully select the feature of connected speech that represents the greatest challenge for most learners.

Another type of script work involves the learners’ use of word recognition sub-skills to identify instances of assimilation, elision, catenation or reduced-form /ə/. When listening with a script, learners can mark a target feature as soon as they recognise it. For example, elision can be represented visually by a strikethrough over unsounded letters (such as the ‘t’ in ‘don’t know’), catenation by a musical slur symbol across word boundaries (‘gin‿and tonic’), and the schwa by its phonemic symbol /ə/ above the letters corresponding to this vowel sound. Similarly, instances of assimilation can be marked by using a combination of brackets and phonemic symbols, such as ‘ten[m] pounds’.


Now that we have explored a variety of classroom activities, when should teachers use bottom-up listening tasks in a skills lesson? Although these tasks can appear before or after the top-down ‘gist’ and ‘detailed comprehension’ stages (Field 2008: 97), teachers should evaluate the effects of different procedural arrangements on the learners’ experiences.

From the learners’ perspective, an early bottom-up task can help them to acclimatise their ears to different English accents and develop partial comprehension of a text. Teachers, however, would risk unintentionally creating a stand-alone listening stage if learners did nothing with the task sentences in their subsequent listening skills work. By contrast, it is more straightforward for teachers to provide a coherent learning experience if the top-down stages come before the bottom-up ones. This arrangement ensures direct relevance of the bottom-up task sentences to a listening text, as learners will by then have achieved detailed comprehension of the text. In lesson planning, teachers can therefore adapt the bottom-up gap-fill tasks in coursebooks for their learners’ specific needs in listening skills.

By using various awareness-raising and skill-deploying tasks, teachers can develop learners’ ability to recognise words in listening. This will in turn enhance their bottom-up processing skills towards achieving listening comprehension.


References

Field, J. ‘Skills and strategies: Towards a new methodology for listening’ in ELT Journal, Vol. 52/2. Oxford: Oxford University Press (1998).
Field, J. Listening in the Language Classroom. Cambridge: Cambridge University Press (2008).
Hancock, M. Pron Journey Hit v Heat (2013). Accessed on 6 February 2023.
Richards, J.C. Teaching Listening and Speaking: From Theory to Practice. Cambridge: Cambridge University Press (2008).
Stansfield, C.W. ‘A history of dictation in foreign language teaching and testing’ in The Modern Language Journal, Vol. 69/2. Hoboken (NJ): Wiley (1985).

Author Biography

Daniel Tse went into ELT in 2019 and started teaching at IH Milan and San Donato, Italy in the same year. He works with Young Learners, teens, and adults across the full range of CEFR levels. An early-career teacher, he is currently on his journey through the DELTA. He mainly teaches Cambridge/IELTS Exam Preparation and Business English courses. He has also spoken at conferences in Milan and Barcelona.