Instructional techniques that are highly effective with inexperienced learners can lose their effectiveness and even have negative consequences when used with more experienced learners.
In this edition, I summarize The Expertise Reversal Effect (2003) from Sweller, J., Ayres, P. L., Kalyuga, S. & Chandler, P. A.
This is reposted from my site, with some edits for clarity. I’m wondering if I should write more paper summaries like this, so please comment or reply if you like it!
The paper walks through a number of pedagogical strategies based on Cognitive Load Theory and demonstrates that most of them have a marked ‘flip’ in results depending on the level of expertise of the learner.
The full paper (pdf) is only 8 pages, so it’s a quick read (if you’ve got the appropriate schemas 😉).
Background: Cognitive Load Theory
Instructional effectiveness and expertise
Expertise Reversal and the Split Attention and Redundancy effects
Text Processing and Expertise Reversal
What about multiple modalities?
Worked Examples and Expertise Reversal
Interacting elements and expertise reversal
Imagination effect and expertise reversal
1. Cognitive Load Theory
To make sense of this paper, you need to know about Cognitive Load Theory. Here’s the Wikipedia entry on Cognitive Load if you want a quick primer.
The paper describes lots of instructional techniques. The cognitive science explanations for why the techniques work all come down to limits on working memory.
Cognitive Load Theory says that there are only a few “slots” available for “chunks” of information to fit in working memory at a time. If you try to hold too many chunks in your head at once, you get reduced processing ability.
The fix is schemas. Learners build schemas that make concepts fit into less working memory.
Because of the limited capacity of working memory, the proper allocation of available cognitive resources is essential to learning. If a learner has to expend limited resources on activities not directly related to schema construction and automation, learning may be inhibited.
In Cognitive Load Theory, all learning is framed in light of working memory limits. Learners acquire new schemas, then through practice, make schema use automatic instead of effortful. Schema use reduces the working memory burden. Expertise means having lots of schemas, practiced to the level of automatic use. Experts can handle complex tasks because their schemas reduce working memory demands.
This all makes intuitive sense to me. When there are too many new facts and terms, I get overwhelmed and can’t really process the material. After spending time learning, I am comfortable with how the ideas fit together, so I can fit in new facts more easily without getting overwhelmed.
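Here’s a toy sketch of the chunks-and-schemas idea in Python. It is not a model from the paper: the capacity of 4 slots, the circuit example, and the names `chunks_needed` and `is_overloaded` are all assumptions I made up for illustration. The only point is that the same material costs fewer slots once schemas group the pieces.

```python
# Toy model only: the capacity and the schema contents are illustrative
# assumptions, not figures or examples taken from the paper.
WORKING_MEMORY_SLOTS = 4


def chunks_needed(elements, schemas):
    """Count the working-memory chunks needed to hold `elements`.

    Elements covered by a known schema collapse into one chunk per schema;
    every remaining element occupies a chunk of its own.
    """
    remaining = set(elements)
    chunks = 0
    for schema_elements in schemas.values():
        covered = remaining & set(schema_elements)
        if covered:
            chunks += 1
            remaining -= covered
    return chunks + len(remaining)


def is_overloaded(elements, schemas, capacity=WORKING_MEMORY_SLOTS):
    """True when the material needs more chunks than the assumed capacity."""
    return chunks_needed(elements, schemas) > capacity


circuit = ["battery", "switch", "resistor", "lamp", "ammeter", "voltmeter"]

novice_schemas = {}  # no schemas yet: every element is its own chunk
expert_schemas = {
    "series loop": ["battery", "switch", "resistor", "lamp"],
    "meters": ["ammeter", "voltmeter"],
}

novice_chunks = chunks_needed(circuit, novice_schemas)
expert_chunks = chunks_needed(circuit, expert_schemas)
```

With these made-up inputs, the novice needs six chunks and goes over the assumed limit, while the expert needs only two.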
2. Instructional effectiveness and expertise
The goal of instruction is to scaffold the construction of schemas.
Novice learners have fewer schemas in place, and therefore, less ability to organize new information. Effective instruction can substitute for missing schemas by structuring new information, like “pre-chewing” the tough new knowledge to make it easier to digest. Instructors can also model the schemas for learners. Instructors show how they use the schemas, and their example helps learners build schemas faster. Without structuring from an instructor, learners are more prone to cognitive overload, which limits learning.
Expert learners already have some schemas in place to guide them in dealing with a new task. If instruction provides guidance that’s helpful for novices, it may be redundant for experts. Experts still need to process the redundant information. The added guidance still requires attention, i.e. it takes up working memory, so it might be distracting.
The overlap of schema-based guidance already in experts’ heads and guidance from instruction can lead to cognitive overload. That’s the Expertise Reversal effect in brief: guidance that is useful for novices can be negative for experts.
The rest of the paper explores situations where this effect shows up.
3. Expertise Reversal and the Split Attention and Redundancy effects
First, let’s define what these effects are.
The Split Attention Effect
Separating sources adds cognitive load, because learners have to search and match between representations.
If you have a diagram and explanatory text side-by-side, readers have to scan back and forth to match up concepts. The scanning back and forth adds cognitive load, limiting learning. If you integrate the text with the diagram, it reduces the load from the searching and matching.
This effect is similar for text shown now and text shown later. If you have to think about a previous slide in a deck and compare it to the current slide in your mind, that adds cognitive load, compared to showing them at once.
Spatially and chronologically integrated materials reduce cognitive load for new learners.
The Redundancy Effect
If multiple sources of info are necessary for learning a concept, integrating them is good. However, if they could stand on their own, eliminating the redundant one is better.
At a glance, this is surprising! Adding more information can hurt learning? On reflection, it makes sense. If many representations of an idea are all shown at once, that’s more to process, which makes it harder.
One source alone is better than redundant sources. I would naively expect that repetition, and presenting information in multiple modalities (as text and as an image), would make something easier to learn. But, counter to that expectation, adding redundant sources of information increases cognitive load and limits learning.
The way I understand this effect is to think about the level of cognitive load in a particular moment. Over time, seeing something repeated or presented in different ways might aid learning, but not if it leads to cognitive overload. In the space of a single slide, adding more information can be overwhelming, and therefore worse for learning.
Okay, acknowledging the Split Attention and Redundancy Effects, how do you design an individual slide? The Split Attention Effect says that you should present the necessary information together, to reduce the cognitive load from scanning and matching. But, the Redundancy Effect says to not present more information at once than is necessary.
This seeming contradiction is what the Expertise Reversal Effect attempts to resolve.
A source of information that is essential for a novice may be redundant for someone with more domain-specific knowledge.
Inexperienced trainees benefitted from textual explanations integrated into the diagrams (to reduce split attention). However, more experienced trainees performed significantly better with the diagram-only format. For these more knowledgeable learners, the textual information, rather than being essential and so best integrated with the diagram, was redundant and so best eliminated.
This is the Expertise Reversal Effect. Adding expertise as a dimension resolves the contradiction between the Split Attention and Redundancy effects.
4. Text Processing and Expertise Reversal
When reading about new concepts, verbose and detailed explanations can help inexperienced learners. Learners with more expertise get distracted by the additional explanatory text, and benefit from minimal text.
Less knowledgeable learners benefited from additional explanatory material, but more knowledgeable learners were better able to process the material without the additions.
Text that is minimally coherent for novices may well be fully coherent for experts. Providing additional text is redundant for experts and will have negative rather than positive effects.
Expertise Reversal again! Another domain where novice learners need something that would be detrimental to expert learners.
5. What about multiple modalities?
I mentioned above that presenting information in multiple modalities could help learners — in a way that seems to contradict the Redundancy Effect. The authors have a CLT-based explanation for why multi-modal learning works:
[The] capacity to process information is distributed over several partly independent subsystems.
Spreading the load across different systems means more total working memory available. The authors consider visual and auditory systems as having semi-independent working memory capacities:
Many studies have demonstrated that learners can integrate words and diagrams more easily when the words are presented in auditory form rather than visually.
Visual working memory is one ‘bucket’ that gets filled by an image or diagram, so doesn’t have capacity for a textual explanation of that diagram. But there’s another bucket — the auditory working memory bucket — that students can use for the words.
Seeing and hearing at the same time spreads the load across more available cognitive resources. Less chance of cognitive overload, so better learning.
What about the Redundancy Effect, and Expertise Reversal?
auditory explanations may also become redundant when presented to more experienced learners
Adding auditory explanation can detract from learning, depending on learners’ level of experience! Experts get distracted by additional material that would benefit novices.
For an instructor or experience designer, the lesson is that adding explanatory material or extra modalities will make things better for novices, but might hurt more advanced learners.
6. Worked Examples and Expertise Reversal
Worked examples are problems presented along with their solution steps. Worked examples are often more effective than other problem-solving based teaching strategies. For instance, guided tutorials work better than unstructured exploration for introducing a concept to beginners.
However, for experts, worked examples add cognitive load. Advanced learners do better working through the problems on their own, without having the steps laid out for them.
The description of the experiment from the paper:
Inexperienced mechanical trade apprentices were presented with either a series of worked examples to study or problems to solve. On subsequent tests, inexperienced trainees benefited most from the worked examples condition. Trainees who studied worked examples performed better with lower ratings of mental load than similar trainees who solved problems, duplicating a conventional worked example effect. With more experience in the domain, the superiority of worked examples disappeared. Eventually, with sufficient experience, additional learning was facilitated more by problem solving than through studying worked examples. The worked examples became redundant and problem solving proved superior, demonstrating another expertise reversal effect.
Before you’ve seen someone do something, it’s overwhelming to be ‘thrown in the deep end’ and try to solve things on your own. However, after you’ve seen someone else demonstrate a skill, you benefit most from trying it on your own.
Many types of support reduce cognitive load for beginners, but add cognitive load for experts. A further implication is that when instructors give too much help, they miss an opportunity to let more expert learners practice their schemas.
Inexperienced learners benefited most from an instructional procedure that placed a heavy emphasis on guidance. Any additional instructional guidance (e.g., indicating a goal or subgoals associated with a task, suggesting a strategy to use, providing solution examples, etc.) should reduce cognitive load for inexperienced learners, especially in the case of structurally complex instructional materials. At the same time, additional instructional guidance might be redundant for more experienced learners and require additional working memory resources to integrate the instructional guidance.
Unsurprisingly, experience is a gradient. For all of these effects, we see a gradual fading out and then crossing over, not a sudden flip.
7. Interacting elements and expertise reversal
Systems with lots of interacting elements are hard to learn.
There’s a double bind — you need to keep all the pieces in your head in order to understand how the pieces work individually, but you become cognitively overloaded from trying to keep all the new things in your head at once.
It’s a chicken-and-egg problem for instructors: learners need the schemas to reduce cognitive load, but before they have the schemas, they have too much cognitive load to learn effectively.
For example, learning the syntax of a foreign language requires you to keep track of the relations between all the different parts of speech in your head. You need to hold ‘how nouns work’ and ‘how verbs work’ and ‘how modifiers work’ all at once, which is hard (and prone to cognitive load failure). Conversely, learning a new language’s vocabulary, while time-intensive, only requires a few new items in working memory at a time. It might be boring, but it’s not confusing or overwhelming the way that learning grammar is.
The instructional solution for teaching concepts with interacting parts is to present a simplified (but false) model to help learners build a partial schema. That way, learners have some tools in place when they encounter the full system with all its interacting parts.
This matches my mental image of scaffolding. The instructor puts up a fake structure to hang on to, so that students can eventually handle the real, complex model. Instructors help students manage complexity by hand-waving, over-simplifying, and ignoring what they can, until students have built the necessary schemas.
Interestingly, the instructional strategy suggested here did not result in the full expertise reversal effect. Experienced learners showed no difference in effectiveness between the mixed approach (isolated elements followed by interacting elements) and the conventional method (interacting elements instruction during both stages).
Since there’s no added cognitive load for advanced learners, it’s safe for teachers to explore false-but-suggestive isolated elements models with students of any level as a way of building up to the truer interacting elements model. Cool!
8. Imagination effect and expertise reversal
This effect has a great name, and seems like an underrated strategy in general, so I’m glad the authors bring it up. Here’s how the technique works:
Instead of giving students actual problems to work through, or having them study worked examples, the instructor prompts students to imagine the steps they would take to solve a problem. Imagining the steps encourages automation of schemas, which improves learning. Provided that the students have enough experience with the concept, the technique is more effective than worked examples!
The imagination effect “turns on” as learners gain more experience. When students don’t have sufficient experience, worked examples are more effective. The explanation in the paper is in terms of working memory overload — imagining works if schemas are in place. Without the schemas, learners’ working memory gets overloaded, since they have to process too many individual components.
To me, it seems like this is probably a recall issue rather than working memory overload. Students who don’t have enough experience aren’t overloaded by too much to process; they simply don’t have enough in their heads to usefully imagine a procedure.
The working memory explanation seems like it’s probably just the authors interpreting everything through the working memory lens. It’s super hard to inspect the contents of learners’ minds as they are instructed to imagine a problem solving procedure. We can’t tell whether they’re imagining but overloaded, or just squeezing their eyes closed and pretending to imagine.
Instructional design should account for cognitive load. But, if you knew about CLT before, you already knew that. So, what’s new?
A lot of it can be summed up as “know your student”. If your target student is a novice, add more support. If your target student has incorporated some schemas already, that same support could be distracting.
Materials should recognize the level of the learner in terms of 1) the schemas they have incorporated and 2) the level of automation with those schemas. One idea that struck me for instructional design practice was a ‘cognitive load audit’. Count the number of new concepts learners have to hold in memory at any given time in a course, and learners’ expected level of automaticity of schemas, and figure out where the course is prone to cognitive overload. The goal is to always present the material “just right” for the learner — not too much information, not too little.
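To make the ‘cognitive load audit’ idea concrete, here’s a minimal sketch in Python. It’s my own rough take, not a procedure from the paper: the `Segment` structure, the `audit` function, the threshold of 4 new concepts, and the sample course are all hypothetical. It counts the concepts in each segment that the target learner hasn’t automated yet, and flags segments that go over the threshold.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One slide, page, or lesson step in a course (hypothetical structure)."""
    name: str
    concepts: list = field(default_factory=list)


def audit(segments, automated, capacity=4):
    """Return (segment name, new-concept count) for segments likely to overload.

    `automated` is the set of concepts the target learner already uses
    automatically; this sketch treats those as costing no working memory.
    `capacity` is an assumed threshold to tune, not a value from the paper.
    """
    flagged = []
    for seg in segments:
        new = [c for c in seg.concepts if c not in automated]
        if len(new) > capacity:
            flagged.append((seg.name, len(new)))
    return flagged


# Made-up course outline for illustration.
course = [
    Segment("Intro", ["variable", "value", "assignment"]),
    Segment("Loops", ["variable", "loop", "condition",
                      "counter", "range", "indentation"]),
]

novice_report = audit(course, automated=set())
expert_report = audit(course, automated={"variable", "condition", "indentation"})
```

For a learner with no automated schemas, only the “Loops” segment gets flagged (six new concepts at once). For a learner who has already automated three of those concepts, nothing is flagged. A fuller audit would also need to flag support that is redundant for experienced learners, which this sketch ignores.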
Since the paper demonstrates expertise reversal for so many different effects, it also serves as a survey of findings in the broader Cognitive Load Theory literature. The Split Attention, Redundancy, Text Processing, Worked Examples, Interacting Elements, and Imagination Effects are each worth digging into and applying on their own.
Now that you’ve got some of the schemas in place (and if you’re still curious), maybe read the paper itself: The Expertise Reversal Effect (2003). If you want to dig into this literature more, the references section is a jumping-off point to tons of other interesting papers.
Thanks for reading!
Let me know if paper summaries like this are interesting or helpful to you, and I’ll do more. If there are particular papers you think are interesting, send me a link!