by Frank Carrizo Zirit


The adoption of multimodality in English Language Teaching (ELT) marks a significant departure from traditional, predominantly text-centric teaching methods. This approach, which integrates various modes of communication, aims to mirror the complexity and diversity of real-world communication more accurately. This shift not only enhances the teaching and learning experience but also prepares students more effectively for the multifaceted nature of contemporary communication.

Multimodality, as defined by Jewitt and Kress (2003), involves the use of multiple semiotic modes in communication, each contributing to the meaning-making process. These modes include linguistic, visual, audio, gestural, and spatial elements, all of which play a significant role in how we understand and interact with the world (Kress, 2010). In ELT, the incorporation of these modes seeks to engage learners in a more holistic and interactive learning experience, recognising that language learning extends beyond mere textual comprehension.

Multimodality versus Multiple Intelligences: A Clarification

Despite the initial enthusiasm, some reservations about multimodal approaches arise, especially when they are set against multiple intelligences (MI) theory, which has been challenged for resting on insufficient evidence and for possibly pigeonholing students into predetermined learning styles. At first glance, multimodality appears to share MI theory's basic assumption, namely that learners receive stimuli through different senses.

MI theory, proposed by Howard Gardner in 1983, suggests that individuals possess different kinds of intelligences, including linguistic, logical-mathematical, musical, and spatial (Gardner, 1983). As commonly applied in classrooms, it holds that lessons should be tailored to these intelligences, or at least to the predominant learning style in the classroom, for learning to take place. Despite its seemingly intuitive logic, the theory has faced criticism for its lack of empirical validation and its tendency to categorise learners in a way that might limit their learning potential (Waterhouse, 2006). Similarly, Willingham (2004) argues that the theory lacks a strong research base and suggests that educators should be cautious about adopting it uncritically.

In contrast, the multimodal approach in ELT does not attempt to classify students based on supposed innate intelligences. Instead, it focuses on providing varied and rich learning experiences that align with the naturally diverse ways in which information is communicated and processed in real-world settings (Bezemer & Kress, 2016). This implies that content should be presented in the mode best suited to it, whoever the learner is. This approach is supported by research that underscores the benefits of engaging multiple senses and modes of communication in the learning process (Mayer, 2009).

Empirical Support and Practical Applications

The shift towards multimodal teaching and learning in ELT is not merely about adding variety or "lots of stuff" to lessons for the sake of engagement. It is grounded in a pedagogical framework that recognises the diverse ways in which individuals perceive, interpret, and understand information. Multimodality goes beyond surface-level engagement, aiming to create deeper learning experiences by integrating various communication modes that align with how communication occurs in real life.

Mayer’s (2009) Cognitive Theory of Multimedia Learning highlights the importance of using multiple modes of representation to enhance learning. According to this theory, learning is more effective when both visual and auditory materials are presented, as it allows learners to process information through dual channels, leading to better comprehension and retention. For example, when teaching vocabulary, an instructor might use a combination of images (visual), spoken words (auditory), and text (linguistic) to reinforce the meaning and use of new words. This multimodal approach caters to different learning channels, making it easier for students to retain and recall vocabulary.

Furthermore, Jewitt and Kress (2003) emphasise that multimodality is not just about using different types of media or sensory stimuli but about understanding how these different modes contribute to meaning-making. They argue that language is just one mode of communication and that effective communication and understanding require the integration of multiple modes such as visual, gestural, spatial, and linguistic. A practical application in an ELT classroom could involve analysing a movie scene. Students would not only listen to the dialogue (aural) but also interpret body language and facial expressions (gestural), understand the setting (spatial), and engage in discussions or written responses (linguistic). This holistic approach enables students to appreciate the nuances of language use in different contexts.

Moreover, Bezemer and Kress (2016) explore the idea that multimodality in education is not only about keeping students interested or busy; it is about understanding and engaging with the different ways they learn and make sense of information. One way to show this is a narrative writing lesson in which students read a short story (linguistic), then watch a related short film (visual and auditory), and then work in groups to make a storyboard for their own narrative (spatial, gestural, and linguistic). In this way, the lesson may turn out more interesting and help students learn narrative structure through different modes.

Multimodality in ELT Material Writing and Coursebook Design

The concept of multimodality has significantly influenced ELT material writing and coursebook design, moving beyond traditional text-based approaches to incorporate a variety of communicative modes. This shift aligns with the growing recognition of the diverse ways students engage with and understand language, as well as the need to prepare them for the multimodal nature of contemporary communication.

Stec’s (2019) exploration of verbal and visual modalities in ELT materials for young learners highlights an important trend in material writing. Modern coursebooks increasingly incorporate diverse cultural content, integrating images, audio clips, and interactive elements alongside traditional text. This integration aims to provide a richer context for language learning, helping students understand language within the framework of different cultures and settings. The use of diverse imagery and real-world contexts in coursebooks helps learners connect linguistic concepts with their applications in varied cultural scenarios.

The research by Wei et al. (2022), although it comes from computer vision rather than language pedagogy, demonstrates the potential of multimodal approaches in enhancing visual recognition tasks. Applying this idea loosely to material writing, coursebooks are now featuring more integrated visual aids, such as infographics, interactive videos, and augmented reality elements. These tools are not just supplementary but are integrated into the core content, facilitating a more immersive and engaging learning experience. By combining visual elements with linguistic instruction, these materials aid in the retention and understanding of language concepts.

A recent review on the integration of technology into ELT (Yanti & Nurhidayah, 2020) provides insight into how digital tools are being leveraged in multimodal teaching. ELT materials are increasingly being designed with technology integration in mind, incorporating digital platforms for interactive learning experiences. This includes online exercises, language learning apps, and digital storytelling tools, allowing for a more dynamic interaction with the language. These digital components encourage active participation and provide instant feedback, which is crucial for language acquisition.

Implications for Material Writers and Publishers

For material writers and publishers, embracing multimodality means rethinking how language is taught and presented. It involves a shift from purely text-based materials to a more integrated approach that combines text, audio, visuals, and interactive digital elements. This approach not only caters to different learning mediums but also reflects the real-world scenarios in which the language is used.

The challenge for writers and publishers is to create materials that effectively balance these different modes without overwhelming the learner. It requires a nuanced understanding of how different modes interact and complement each other to facilitate effective language learning. Additionally, there is a need for continuous research and feedback from educators and learners to refine and adapt materials to changing educational needs and technological advancements.

Addressing Challenges and Looking Forward

The integration of multimodality in ELT has opened new vistas for educators and learners alike, promising a more engaging and comprehensive approach to language acquisition. However, this promising paradigm is not without its challenges, and a critical reflection on these can illuminate paths for more effective implementation.

One of the primary challenges in integrating multimodality into ELT is the disparity in resource availability. As Cimasko and Shin (2008) note, not all educational settings have equal access to the technological and material resources required for a truly multimodal approach. This disparity raises concerns about equity in language education, potentially widening the gap between well-resourced and under-resourced learning environments.

What is more, the effective implementation of multimodal strategies in ELT goes beyond superficial enhancements such as using PowerPoint for visual appeal or adding a separate module on technology use to a CELTA or DELTA course. As highlighted by Walsh (2010), the challenge lies not only in the adoption of new tools but also in a profound shift in teacher training and pedagogical approaches. Many educators, traditionally trained in text-centric methodologies, may find it daunting to navigate and integrate multiple modes of communication effectively. This transition demands a deeper understanding of how different modes contribute to the acquisition of knowledge. It is about comprehensively rethinking the role of the teacher and the pedagogical strategies in a multimodal classroom. Educators must be equipped not just with the technical know-how of using various tools but with the pedagogical insight to understand how these tools and modes can be seamlessly and meaningfully integrated into the learning process. This involves recognising that each mode, whether visual, auditory, or kinesthetic, has its own way of facilitating learning and that these modes must be used strategically to complement and enhance the linguistic content of the lessons.

Another critical challenge is the development of assessment methods that accurately reflect students' multimodal learning experiences. Traditional assessment methods may not capture the breadth of skills and competencies developed through multimodal learning. Developing new assessment frameworks that can holistically evaluate these skills is crucial but remains a complex and ongoing challenge (The New London Group, 1996).

While there is growing enthusiasm for multimodal approaches, there is still a need for more empirical research to substantiate their effectiveness in ELT. As Mayer (2009) argues, while multimedia learning theories provide a foundation, specific research focusing on language acquisition through multimodality is necessary. Such research can inform best practices and help educators implement multimodality more effectively.

Addressing these challenges requires a collaborative effort among educators, researchers, material developers, and policymakers. As Jewitt (2008) suggests, a shared understanding and continuous dialogue among these stakeholders are essential for the successful integration of multimodality in ELT.

In the pursuit of effectively implementing multimodality in ELT, several critical considerations arise, demanding careful reflection and strategic planning.

Inclusivity and Cultural Sensitivity: Central to the discussion is the question of how multimodal ELT can be tailored to embrace cultural sensitivity and inclusivity. This consideration is paramount, given the diverse backgrounds of language learners. It invites an exploration of ways in which multimodal resources can represent a wide spectrum of cultural narratives, ensuring that all students find relevance and connection in their learning materials. The challenge lies in creating content that not only acknowledges but also celebrates diversity, thereby fostering an inclusive learning environment.

Learner Autonomy and Engagement: Another dimension to consider is the potential of multimodality to enhance learner autonomy and engagement. This aspect provokes a deeper inquiry into how various modes of communication can be employed to empower learners, facilitating a shift from passive reception to active participation in the learning process. The implications for learner motivation and self-efficacy are substantial, as a more engaged and autonomous learner is likely to exhibit higher levels of motivation and confidence in their language learning journey.

Balancing Modes and Overload: Lastly, the effective balance of multiple modes in teaching without leading to cognitive overload is an area that warrants careful consideration. The goal is to identify strategies that allow educators to harness the benefits of multimodal teaching while avoiding the pitfalls of overstimulation. Finding this equilibrium is crucial, as it ensures that the integration of various modes enhances, rather than detracts from, the learning experience. The challenge is to discern the optimal blend of modalities that enriches the learning process while maintaining a clear and focused educational trajectory.

These considerations highlight the nuanced complexities involved in implementing multimodal approaches in ELT. They underscore the need for a thoughtful, well-informed approach to multimodal curriculum design and delivery, ensuring that the integration of various communicative modes achieves its intended educational outcomes.


Multimodality in ELT has the potential to transform language education, as it goes beyond the limitations of conventional methods. This change of perspective towards combining different modes of communication aims to improve the learning experience, reflecting more accurately the intricacies of real-world communication. Multimodality should not be dismissed as a passing fad full of meta-language and catchphrases, as the multiple intelligences movement was; it should be recognised as a solid, pedagogically sound approach backed by empirical research.

That said, the successful integration of multimodality in ELT faces several challenges, including resource disparities, the need for revised teacher training methodologies, and the development of new assessment frameworks. These challenges demand a collaborative and ongoing effort across the educational spectrum, involving educators, researchers, material developers, and policymakers. Critical reflection on these challenges, coupled with a proactive approach to addressing them, is essential for realising the full potential of multimodal education.

Moreover, multimodality brings to the forefront important considerations such as inclusivity, cultural sensitivity, learner autonomy, and the need to balance various modes to avoid cognitive overload. These considerations are not merely theoretical; they are practical necessities that shape the effectiveness of multimodal approaches in diverse educational settings.

In conclusion, while multimodality in ELT presents an exciting opportunity to enrich language teaching and learning, it requires a thoughtful, evidence-based, and inclusive approach. Embracing this complexity will enable educators to provide a more engaging, comprehensive, and effective language learning experience that prepares students for the intricacies of modern communication. As the field of ELT continues to evolve, multimodality stands as a pivotal element in the ongoing journey towards educational innovation and excellence.


References

Jewitt, C., & Kress, G. (2003). Multimodal literacy. Peter Lang.

Kress, G. (2010). Multimodality: A social semiotic approach to contemporary communication. Routledge.

Gardner, H. (1983). Frames of mind: The theory of multiple intelligences. Basic Books.

Waterhouse, L. (2006). Inadequate evidence for multiple intelligences, Mozart effect, and emotional intelligence theories. Educational Psychologist, 41(4), 247-255.

Willingham, D. T. (2004). Reframing the mind: Howard Gardner and the theory of multiple intelligences. Education Next, 4(3), 19-24.

Bezemer, J., & Kress, G. (2016). Multimodality, learning and communication: A social semiotic frame. Routledge.

Mayer, R. E. (2009). Multimedia learning. Cambridge University Press.

Stec, M. (2019). Identity and multimodality of cultural content in ELT coursebooks for YLs. The European Proceedings of Social & Behavioural Sciences, 72, 274-288.

Wei, L., Xie, L., Zhou, W., Li, H., & Tian, Q. (2022). MVP: Multimodality-guided visual pre-training. European Conference on Computer Vision, 1-17.

Yanti, G. S., & Nurhidayah, R. (2020). Practices on technology integration in ELT: A review on existing researches. Briliant: Jurnal Riset dan Konseptual, 5(2), 292-306.

Cimasko, T., & Shin, D.-s. (2008). Multimodal composition in a college ESL class: New tools, traditional norms. Computers and Composition, 25(4), 376-395.

Walsh, M. (2010). Multimodal literacy: What does it mean for classroom practice? The Australian Journal of Language and Literacy, 33(3), 211-239.

The New London Group. (1996). A pedagogy of multiliteracies: Designing social futures. Harvard Educational Review, 66(1), 60-92.

Jewitt, C. (2008). Multimodality and literacy in school classrooms. Review of Research in Education, 32(1), 241-267.

Author Biography

Frank Carrizo Zirit is an experienced DELTA-certified English teacher specialising in teaching English as a foreign language to adults. He holds qualifications from prestigious institutions such as Cambridge Assessment English, Arizona State University, and the University of California, Irvine. With over 20 years of experience, Frank is highly skilled in academic and business writing, linguistic studies, and teaching English for exam preparation. He is a Speaking Examiner for Cambridge Exams and has trained new examiners as a Team Leader. Additionally, Frank has conducted professional development workshops and hosted a popular podcast on exam preparation called “What You Say in English.” His research interests include exams, bilingualism, varieties of English, and teaching English as a Lingua Franca.