Cognitive Load and the PI theory

Theoretical Background

Tao Yu
(edited by Rik Min)

This text was originally one of the chapters of the master's thesis of Yu Tao
(project leader: Rik Min; supervisor: J. Moonen)

Web-based learning works not only as a new concept but also as a new model, a new style and a new way of thinking about learning. In today's web-based learning environments, there is no doubt that the integration of traditional learning theories and newly built theories will continue for a long time and have a great influence. Here, let us take a brief overview of the philosophies and learning theories behind, or connected with, the concept of Parallelism and the PI theory.

In section 1, I introduce the philosophies of learning. Section 2 describes the theories related to Parallelism and the PI theory. Finally, in section 3, I give examples of designs that follow, or relate to, Parallelism and the PI theory.

1. Philosophies about learning

Learning design is an applied, decision-oriented field. As a novice in this field, I undoubtedly need a stronger philosophical and theoretical base for my ability to engage in design, redesign and application. Philosophies and theories are the source of principles from which many of the prescriptions for design and redesign arise, and understanding this base will help me learn to implement the knowledge. Of course, treating philosophy and theory involves the relationship of specialists and scholars to their field of study and practice, along with my own relationship to that field, which allows me to share ideas with other professionals in it. I believe these philosophies and theories can provide a rationale for many of my decisions.

1.1 Constructivism

Constructivism is an educational philosophy within a larger category of philosophies that are described as "rationalist." A rationalist philosophy is characterized by the belief that reason is the primary source of knowledge and that reality is constructed rather than discovered. Most rationalists would propose that there is not a single reality to be discovered, but that each individual has constructed a personal reality (Smith & Ragan, 1999).

A foundational tenet of constructivism is the assumption that "knowledge is not transmitted: it is constructed." Smith and Ragan (1999) summarize most constructivist assumptions under three major positions, as follows:

Individual Constructivism

The key assumptions of individual constructivism are the following:

Knowledge is constructed from experience;
Learning results from a personal interpretation of knowledge;
Learning is an active process in which meaning is developed on the basis of experience

Social Constructivism

Its key assumption is the following:

Learning is collaborative with meaning negotiated from multiple perspectives.

Contextualism

The key assumptions of contextualism are the following:

Learning should occur (or be "situated") in realistic settings;
Testing should be integrated into the task rather than being a separate activity.

Here I agree with Cobb's (1996) conclusions regarding three major instructional implications of constructivism:

Priority should be given to the development of meaning and understanding rather than the training of behavior.
Researchers and teachers should assume that students are rational, given the way that they currently make sense of things.
Students’ errors and unanticipated responses should be viewed as occasions to learn about students’ understanding.

Tenets of constructivism encourage instructional designers to consider the intentionality of learners more carefully. In our design and redesign of the experiment, we can follow many important principles offered by constructivism.

1.2 Empiricism

Empiricism, sometimes termed objectivism, postulates that knowledge is acquired through experience. Most empiricists would propose that this experience allows an individual to come to know a reality that is objective and singular. That is, most experience is defined as sensory experience, as opposed to any "experience" that one might obtain through a "mental life" of reconceptualization and interpretation. Empiricism is also often typified by "reductionism", efforts to reduce complex entities to their simpler components, and "associationism", a tendency to relate ideas if they are experienced contiguously in either space or time (Smith & Ragan, 1999). Since my project is an empirical study, I will follow the beliefs of empiricism, employ experimentation and seek to draw generalizations based upon data, because I agree that a valuing of experimentation and generalization is clearly a quality of empiricists.

1.3 Pragmatism

Pragmatism might be considered a "middle ground" between rationalism (constructivism) and empiricism (Driscoll, 1994). Although pragmatists, like empiricists, believe that knowledge is acquired through experience, they believe that this knowledge is interpreted through reason and is temporary and tentative. Pragmatists propose that the question of whether there is a "real" reality is an unproductive one, since, if there is a reality, it can never be totally known. Pragmatists suggest that knowledge in a particular field be negotiated based upon an agreement of experts as to a common interpretation of experience. They would describe knowledge in terms of "truth for now". Pragmatists therefore propose that knowledge be built up by testing this "truth for now" hypothesis and revising or discarding this "truth" as common experience and interpretation imply it should be modified (Smith & Ragan, 1999).

This philosophy is relevant to virtually every instructional and learning design, since it places such strong emphasis on testing and checking. But I still share with the empiricists a valuing of testing knowledge through the accumulation of data, and a belief that there are some generalizable principles of learning that can be "discovered."

2. Theories concerned with Parallelism and the PI theory

After reviewing the learning philosophies, I would like to introduce some relevant theories on which the PI theory is based or to which it is closely related. An overview of these theories can both deepen the comprehension of the concept of Parallelism and the PI theory and serve as a reference for, and preparation of, the experiment design. Since Parallelism and the PI theory emphasize the importance of parallel, simultaneous information, the timing, the sequence and the distance of the output of information are the keys. In the following theories, some important concepts such as cognitive load, cognitive architecture and split attention will be introduced.

2.1 Overview of the Cognitive Load Theory

The Cognitive Load Theory (CLT) is an internationally well-known and widespread theory, which has been empirically confirmed in numerous studies. The Cognitive Load Theory suggests that learning happens best under conditions that are aligned with human cognitive architecture. In his article, Kirschner (2001) gave an overview of CLT.

Cognitive architecture: memory and schemas

Short-term or working memory (STM or WM) is what you are using at this very moment to process this text (stimuli have entered your sensory register through attention and recognition). You use it for all of your conscious activities, and it is the only memory that you can monitor. Everything else, both content and function, is concealed until brought into working memory. A problem, especially for instructional designers, is that working memory is limited to about seven items or elements of information at any one time (Miller, 1956; Baddeley, 1992). Furthermore, because working memory is also used to organize, contrast, compare or work on that information, you can probably process only two or three items of information simultaneously, as opposed to merely holding that information. Finally, working memory is seen not as one monolithic structure but as a system embodying at least two mode-specific components, a visual-spatial sketchpad and a phonological loop, coordinated by a central executive. Long-term memory (LTM), in contrast, is what you use to make sense of and give meaning to what you are doing now. It is the repository for more permanent knowledge and skills and includes all things in memory that are not currently being used but which are needed for understanding (Bower, 1975). Most cognitive scientists believe that the storage capacity of LTM is unlimited and that it is a permanent record of everything that you have learnt. You are not directly conscious of LTM; awareness of its contents and functioning is filtered through working (conscious) memory.

Human cognition, thus, places its primary emphasis on the ability to store seemingly unlimited amounts of information, including large, complex interactions and procedures, in LTM. Human intellect comes from this stored knowledge and not from long, complex chains of reasoning in working memory, which is incapable of such highly complex interactions using elements not previously stored in LTM. It follows that instruction (and instructional design) requiring learners to engage in complex reasoning processes involving combinations of unfamiliar elements is likely to present problems and not work well. Instruction, thus, must consider how this information is stored and organized in LTM so that it is accessible when and where it is needed.

According to schema theory, knowledge is stored in LTM in schemata. Schemata categorize information elements according to how they will be used (Chi, Glaser, & Rees, 1982). A schema can hold a huge amount of information, yet is processed as a single unit in working memory. Schemata can integrate information elements and production rules and become automated, thus requiring less storage and controlled processing. Skilled performance consists of building increasing numbers of increasingly complex schemas by combining elements consisting of lower level schemas into higher level schemas.

Schemas can also reduce working-memory load. Although working memory can process only a limited number of elements at a time, the size, complexity and sophistication of those elements are not limited. A schema can be anything that has been learnt and is treated as a single entity. If learning has occurred over a long period of time, a schema may incorporate a huge amount of information. In summary, schema construction aids the storage and organization of information in long-term memory and reduces working memory load.

Cognitive load

The CLT assumes a limited working memory connected to an unlimited long-term memory (Baddeley, 1986). As a result of this limitation, instruction should be designed such that working memory is capable of processing the instruction. The CLT, thus, is concerned with the limitations of working-memory capacity and the measures that can be taken to promote learning, that is, the construction of schemata, by imposing adequate levels of cognitive load (CL).

Working memory load is affected by the inherent nature of the material (intrinsic CL) and by the manner in which the material is presented (extraneous and germane CL). The following is a short explanation of these three aspects of CL. Learning, reflected by performance change, requires working-memory capacity; that is, it imposes a germane CL on the learner (Sweller, van Merrienboer, & Paas, 1998). Germane CL is required for the construction and storage of schemata in long-term memory. The construction of adequate and rich schemata is especially important in complex learning tasks, where it requires more effort because the elements contained in the to-be-learned material are highly interconnected. This is referred to as intrinsic CL, the portion of load that is imposed by the intrinsic characteristics of the task or subject matter. According to CLT, the limitations of working memory are rarely taken into account in conventional instruction. Conventional instruction tends to impose an extraneous CL on working memory, whereas learning something requires a shift from extraneous to germane CL.

The CLT states that instructional interventions cannot change the intrinsic CL, because this is, ceteris paribus, intrinsic to the material being dealt with. Extraneous and germane CL, however, are determined by the instructional interventions. Extraneous CL is the effort required to process poorly designed instruction, whereas germane CL is the effort that contributes, as stated, to the construction of schemas. Appropriate instructional designs decrease extraneous CL but increase germane CL, provided that the total CL stays within the limits of working memory.
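A common, simplified reading of this paragraph treats the three types of load as additive. The block below is a hedged illustration under that assumption only; the symbols CL_int, CL_ext, CL_ger and C_WM (working-memory capacity) are introduced here for illustration and are not taken from the sources cited above. Rearranging the capacity constraint shows why, with intrinsic load fixed, lowering extraneous load is what leaves room for germane load:

```latex
\begin{aligned}
CL_{\text{int}} + CL_{\text{ext}} + CL_{\text{ger}} &\le C_{\text{WM}} \\
\Rightarrow\quad CL_{\text{ger}} &\le C_{\text{WM}} - CL_{\text{int}} - CL_{\text{ext}}
\end{aligned}
```

Because CL_int is fixed by the material, CL_ext is the only term on the right-hand side that instructional design can lower in order to raise the ceiling on CL_ger.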

The difference between an expert and a novice is that a novice hasn't acquired the schemas of an expert. Learning requires a change in the schematic structures of long term memory and is demonstrated by performance that progresses from clumsy, error-prone, slow and difficult to smooth and effortless. The change in performance occurs because as the learner becomes increasingly familiar with the material, the cognitive characteristics associated with the material are altered so that it can be handled more efficiently by working memory.

From an instructional perspective, information contained in instructional material must first be processed by working memory. For schema acquisition to occur, instruction should be designed to reduce working memory load. Cognitive Load theory is concerned with techniques for reducing working memory load in order to facilitate the changes in long term memory associated with schema acquisition.

Sweller's theories are best applied in the area of instructional design of cognitively complex or technically challenging material. His concentration is on the reasons that people have difficulty learning material of this nature. Cognitive load theory has many implications in the design of learning materials which must, if they are to be effective, keep cognitive load of learners at a minimum during the learning process. While in the past the theory has been applied primarily to technical areas, it is now being applied to more language-based discursive areas.

Principles:

Specific recommendations relative to the design of instructional material include:

1. Change problem solving methods to avoid means-ends approaches that impose a heavy working memory load, by using goal-free problems or worked examples.
2. Eliminate the working memory load associated with having to mentally integrate several sources of information by physically integrating those sources of information.
3. Eliminate the working memory load associated with unnecessarily processing repetitive information by reducing redundancy.
4. Increase working memory capacity by using auditory as well as visual information under conditions where both sources of information are essential (i.e. non-redundant) to understanding. [Online available at http://tip.psychology.org/sweller.html]

The Cognitive Load Theory (CLT) (according to Sweller and others) gives us a theoretical basis for the Parallel Instruction Theory of Min (the PI theory), grounded in theories of human memory. According to the Cognitive Load Theory, users may experience information overload when too much information is presented in parallel (Schneider and Shiffrin, 1977; Shiffrin and Schneider, 1977). However, if the effort used to process large portions of information enhances learning, cognitive load theory speaks of germane cognitive load (Min, 2001).

Several instructional rules, with respect to parallel presentation of information, were derived from the CLT:

Split-Attention Effect: if two sources of information that have to be integrated to be understood are provided separately (for example, a diagram and accompanying statements), 'students' perform worse than if these sources of information are integrated physically (Mousavi, Low, and Sweller, 1995).

Redundancy Effect: if two sources of information that each represent the same information in a different manner are provided simultaneously, 'students' perform worse than if only one of these sources is used.

Modality effect: if visual information is presented simultaneously with auditory information, information processing is better than if visual information is presented simultaneously with visual information (Baddeley, 1992).

2.2 Overview of the Split Attention Theory

The Split Attention Theory, sometimes simply called the Split-Attention Effect, was developed by Sweller and his colleagues. Simply put, the split-attention effect indicates that the use of different information sources causes a higher cognitive load on working memory and therefore impedes the learning process. For example, when a text and the corresponding parts of a picture are not perceived simultaneously, a split-attention effect results. Users may have a split-attention problem because they need to attend to different media simultaneously.

According to the cognitive theory of multimedia learning, the visual and verbal channels are limited in capacity. When words are presented as on-screen text, they must be processed, at least initially, through the visual system along with the animation. In this way, the text competes for visual attention with the animation, creating what Mousavi, Low, and Sweller (1995) call a split-attention effect. When words are presented as narration, they can be processed in the verbal channel, thereby freeing capacity in the visual channel that can be devoted to processing the animation more deeply. In this way, spoken text serves to reduce the load on the visual channel and to increase the chances for deeper cognitive processing (Mayer & Moreno, 2000).

The rationale for presenting only animation and narration is that the addition of on-screen text could overload visual working memory. Adding on-screen text can create a split-attention effect in which students must look both at the animation and the text, thereby missing out on some of the presented material. When visual working memory is overloaded, there is less cognitive energy to build connections between visual and verbal representations.

Principle: When giving a multimedia explanation, present words as auditory narration rather than as visual on-screen text.

According to Baddeley (1992), working memory consists of an 'auditory part' (phonological loop) and a 'visual part' (visual-spatial scratch pad). These memory parts are considered to be independent. As a consequence, a combination of visual and auditory information results in better processing than the combination of visual (text) and visual (picture) information (Chandler & Sweller, 1996).
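The argument above can be made concrete with a toy model. The following Python sketch is purely illustrative and rests on stated assumptions: each channel (visual and auditory) is given a hypothetical capacity of 4 arbitrary load units, and each information source a hypothetical load; none of these numbers come from Baddeley (1992) or Chandler and Sweller (1996). What it shows is structural: text plus picture stacks both loads on the visual channel, while narration plus picture spreads them over two independent channels.

```python
from dataclasses import dataclass

# Hypothetical capacities in arbitrary "load units"; not measured values.
CHANNEL_CAPACITY = {"visual": 4, "auditory": 4}


@dataclass
class Source:
    name: str
    channel: str  # "visual" or "auditory"
    load: int     # hypothetical processing demand


def channel_loads(sources):
    """Sum the load per channel, treating the channels as independent."""
    loads = {channel: 0 for channel in CHANNEL_CAPACITY}
    for source in sources:
        loads[source.channel] += source.load
    return loads


def overloaded_channels(sources):
    """Return the channels whose summed load exceeds their capacity."""
    loads = channel_loads(sources)
    return [ch for ch, load in loads.items() if load > CHANNEL_CAPACITY[ch]]


picture = Source("picture", "visual", 3)
on_screen_text = Source("on-screen text", "visual", 2)
narration = Source("narration", "auditory", 2)

print(overloaded_channels([picture, on_screen_text]))  # ['visual']
print(overloaded_channels([picture, narration]))       # []
```

Treating the two channels as separate budgets is the simplifying assumption that makes the modality effect fall out of the arithmetic; the sketch says nothing about how large the real capacities or loads are.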

As for the parallel presentation of visual information, its benefits and losses are:

Benefits:

Serves as external memory;
Provides context information;
Decreases search for relevant information;
Increases possibility to compare and associate related information.

Losses:

Shallower information processing;
Increases chance of an information overload.

2.3 Overview of the Dual Coding Theory

The dual coding theory proposed by Paivio (1986) attempts to give equal weight to verbal and non-verbal processing. Paivio stated: "Human cognition is unique in that it has become specialized for dealing simultaneously with language and with nonverbal objects and events. Moreover, the language system is peculiar in that it deals directly with linguistic input and output (in the form of speech or writing) while at the same time serving a symbolic function with respect to nonverbal objects, events, and behaviors. Any representational theory must accommodate this dual functionality" (Paivio, 1986).

The theory assumes that there are two cognitive subsystems, one specialized for the representation and processing of nonverbal objects/events (i.e., imagery), and the other specialized for dealing with language. Paivio also postulates two different types of representational units: ‘imagens’ for mental images and ‘logogens’ for verbal entities which he describes as being similar to ‘chunks’ as described by Miller. Logogens are organized in terms of associations and hierarchies while imagens are organized in terms of part-whole relationships.

The Dual Coding theory identified three types of processing: (1) representational, the direct activation of verbal or non-verbal representations, (2) referential, the activation of the verbal system by the nonverbal system or vice-versa, and (3) associative processing, the activation of representations within the same verbal or nonverbal system. A given task may require any or all of the three kinds of processing.

(http://tip.psychology.org/paivio.html)

The Dual Coding theory has been applied to many cognitive phenomena including mnemonics, problem solving, concept learning and language. The Dual Coding theory accounts for the significance of spatial abilities in theories of intelligence. Paivio (1986) provides a dual coding explanation of bilingual processing. Clark & Paivio (1991) present dual coding theory as a general framework for educational psychology.

The principle of the Dual Coding theory emphasizes that recall and recognition are enhanced by presenting information in both visual and verbal form. Parallelism and the PI theory also take this point as a key idea. Min consistently pays attention to making full use of manuals, instruction sheets and books to present instruction in parallel. In our design and redesign of the experiment, the instruction information will be delivered on sheets by the coach. To a certain degree, we can conclude that the Dual Coding theory is a theoretical base of the concept of Parallelism and the PI theory and influences the PI theory deeply.

2.4 Overview of the Cognitive Flexibility Theory

The Cognitive Flexibility theory focuses on the nature of learning in complex and ill-structured domains. Spiro & Jehng (1990, p. 165) state: "By cognitive flexibility, we mean the ability to spontaneously restructure one's knowledge, in many ways, in adaptive response to radically changing situational demands... This is a function of both the way knowledge is represented (e.g., along multiple rather [than] single conceptual dimensions) and the processes that operate on those mental representations (e.g., processes of schema assembly rather than intact schema retrieval)."

The theory is largely concerned with transfer of knowledge and skills beyond their initial learning situation. For this reason, emphasis is placed upon the presentation of information from multiple perspectives and use of many case studies that present diverse examples. The theory also asserts that effective learning is context-dependent, so instruction needs to be very specific. In addition, the theory stresses the importance of constructed knowledge; learners must be given an opportunity to develop their own representations of information in order to learn properly. [Online available at http://tip.psychology.org/spiro.html]

Principles of the Cognitive Flexibility Theory:

1. Learning activities must provide multiple representations of content.
2. Instructional materials should avoid oversimplifying the content domain and support context-dependent knowledge.
3. Instruction should be case-based and emphasize knowledge construction, not transmission of information.
4. Knowledge sources should be highly interconnected rather than compartmentalized. [Online available at http://tip.psychology.org/spiro.html]

Among the principles of the Cognitive Flexibility theory, principles one and four are very close to the principles of Parallelism and the PI theory, which propose providing parallel information and making the "knowledge sources" interconnected. Comparing and contrasting these two theories, we find that they share the same or similar points of view on instruction and learning methodology, which could be applied to a larger scope of the education domain.

2.5 Overview of Information Processing Theory

George A. Miller has provided two theoretical ideas that are fundamental to cognitive psychology and the information-processing framework.

The first concept is "chunking" and the capacity of short-term memory. Miller (1956) presented the idea that short-term memory could only hold 5-9 chunks of information (seven plus or minus two) where a chunk is any meaningful unit. A chunk could refer to digits, words, chess positions, or people's faces. The concept of chunking and the limited capacity of short-term memory became a basic element of all subsequent theories of memory.
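Chunking can be illustrated with a small sketch. Assuming, for illustration only, a fixed span of 7 units (the midpoint of Miller's 5-9 range), a 12-digit string held digit by digit exceeds the span, while the same digits recoded as three familiar years fit comfortably. The helper names `chunk` and `fits_in_stm` are hypothetical, and the sketch only counts units; for a real learner, a group of digits counts as one chunk only if it is already meaningful in long-term memory.

```python
# Illustrative span: the midpoint of Miller's (1956) range of 5-9 chunks.
STM_SPAN = 7


def chunk(sequence: str, size: int) -> list[str]:
    """Split a string into consecutive groups of `size` characters."""
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]


def fits_in_stm(units: list[str], span: int = STM_SPAN) -> bool:
    """True if the number of units does not exceed the assumed span."""
    return len(units) <= span


digits = "149217761812"
single_digits = list(digits)  # 12 separate items
years = chunk(digits, 4)      # recoded as three familiar years

print(len(single_digits), fits_in_stm(single_digits))  # 12 False
print(years, fits_in_stm(years))                       # ['1492', '1776', '1812'] True
```

The recoding step is where the learner's prior knowledge does the work: splitting the same string into arbitrary four-digit groups would reduce the count just as well on paper, but would not help a learner for whom those groups carry no meaning.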

The second concept is TOTE (Test-Operate-Test-Exit) proposed by Miller, Galanter & Pribram (1960). Miller et al. suggested that TOTE should replace the stimulus-response as the basic unit of behavior. In a TOTE unit, a goal is tested to see if it has been achieved and if not an operation is performed to achieve the goal; this cycle of test-operate is repeated until the goal is eventually achieved or abandoned. The TOTE concept provided the basis of many subsequent theories of problem solving (e.g., GPS) and production systems. (http://tip.psychology.org/miller.html)
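The TOTE cycle is essentially a feedback loop, so it translates naturally into code. The following is a minimal Python sketch under stated assumptions: the function name `tote`, its parameters and the `max_cycles` cut-off used to model abandoning a goal are hypothetical choices made here, not part of Miller, Galanter and Pribram's formulation. The usage line is a toy version loosely modeled on the nail-hammering illustration associated with TOTE units.

```python
from typing import Callable, Tuple, TypeVar

S = TypeVar("S")


def tote(state: S,
         test: Callable[[S], bool],
         operate: Callable[[S], S],
         max_cycles: int = 100) -> Tuple[S, bool]:
    """Run one TOTE unit and return (final_state, goal_achieved)."""
    for _ in range(max_cycles):
        if test(state):           # Test: is the goal state reached?
            return state, True    # Exit: goal achieved
        state = operate(state)    # Operate: act to reduce the incongruity
    # Assumption of this sketch: after max_cycles operations the goal is
    # abandoned; the last state is tested once more before exiting.
    return state, test(state)


# Toy nail-hammering example: height of the nail head above the wood
# (hypothetical units); each blow lowers it by one unit.
final_height, achieved = tote(3, test=lambda h: h <= 0, operate=lambda h: h - 1)
print(final_height, achieved)  # 0 True
```

Because `operate` can itself call `tote`, TOTE units nest inside one another, which mirrors the hierarchical organization of behavior listed among the principles of information processing theory.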

Information processing theory has become a general theory of human cognition; the phenomenon of chunking has been verified at all levels of cognitive processing. The focus should be on how to overcome the limitation of short-term memory and make full use of 'meaningful chunks'. Parallelism and the PI theory can help on this point by providing parallel, meaningful information, which can reduce the extraneous cognitive load and encourage learners to engage in conscious cognitive processing. As for the concept of TOTE, designs based on Parallelism and the PI theory could adopt this model to improve and develop appropriate instruction.

Principles of Information Processing Theory:

Short-term memory (STM) (or attention span) is limited to seven chunks of information.
Planning (in the form of TOTE units) is a fundamental cognitive process.
Behavior is hierarchically organized (e.g., chunks, TOTE units) (http://tip.psychology.org/miller.html).

3. Examples relevant to the application of Parallelism and the PI theory

Theories come from experience and practice and can be examined in practice. Many experiments and recent web sites offer examples that are guided by, or related to, the application of Parallelism and the PI theory.

3.1 Experiment guided by the PI theory

Ter Burg and Groenewoud investigated in their report the effect of information presented simultaneously on multiple screens (condition 1), compared with information presented on one screen with multiple windows (condition 2). On the screen with multiple windows the information was not continuously presented in parallel, but it was continuously and directly available (second-order parallelism). Subjects performed a task with a simulation program, CARDIO, about the cardiovascular system and the regulation of blood pressure in the human body. In this program medicines had to be administered to change the blood pressure, and these changes had to be recorded in tables. Results showed that task performance (percentage of correct answers) was better in the first condition, with multiple screens. A questionnaire showed that the subjects (students at the University of Twente) also slightly preferred information presented on two screens. In addition, although no differences were found in working speed or in perceived usability and comfort, subjects thought they had worked faster in the multiple-screen condition. Thus, for this particular task, the first-order presentation elicited better task performance than the second-order parallel presentation.

As mentioned earlier, these research results cannot be generalized to all tasks. Research by Benshoof and Hooper (1993) illustrates this: their experiments showed that second-order information presentation elicited better task performance from high-ability students than first-order parallelism, while low-ability students performed equally well with single-window (first-order) and multiple-window (second-order) treatments. In this case the task was more memory-intensive than performing actions such as recording the information obtained (Burg & Groenewoud, 1996). Here, subjects had to apply a formula in different situations (rule-based). The formula consisted of digits that were represented by symbols. To calculate sums, subjects had to remember or look up which digit was represented by a certain symbol. In the multiple-window case subjects had to rely more on their short-term memory than in the condition where the meaning of the symbols was available on the same page as the sums.

3.2 Elastic Windows: a web site relating to the PI theory

"Elastic Windows," one of several new software programs developed at the University of Maryland at College Park, displays many windows in full view at the same time, rather than one behind another. The hope is that users will be able to switch among tasks more quickly and easily.

The authors, Kandogan and Shneiderman (1997), observe that computer users frequently need to jump between related pieces of text or graphics when working on a task. For example, programmers have to jump from procedural code to data declarations, or from an application window to a debugger window. In general, many users need to access multiple pieces of information rapidly while minimally disrupting their concentration on the task. With large, higher-resolution displays, a large number of related documents can be displayed simultaneously, but with current approaches the visibility and arrangement of multiple windows is a problem.

According to the authors, almost all current systems follow the independent overlapping windows approach, in which windows are allowed to overlap each other, operations on windows are performed one at a time, and the size and location of each window are independent. They developed a new windowing approach called Elastic Windows.

It is based on three principles:

Hierarchical windows, to give web pages an organization that decreases disorientation and supports users in structuring their work environment according to the task.

Multiple window operations, to examine multiple pages and reorganize them on the screen efficiently, decreasing the number of window operations and so lessening the cognitive load on users.

Hierarchicons, to provide quick access to pages and to use the screen space more productively.

The screen is divided into two main windows. The window on the left displays new incoming mail, and the window on the right contains old correspondence, represented as icons with pictures of the senders, grouped hierarchically in separate windows. A default icon is used for people with no picture in the system. The border coloring indicates the nesting: it gets gradually darker deeper in the hierarchy. Figure xx shows the web page with Elastic Windows. [Online available at http://www.cs.umd.edu/hcil/elastic-windows/]

Figure xx. Example of Elastic window

The principles of Elastic Windows are very close to the concept of Parallelism and the PI theory. A web page designed with Elastic Windows principles is also similar to a product guided by the PI theory. Maybe "great minds think alike" is the explanation. Of course, this is a new approach and more tests are needed. My final project could, in a sense, be taken as one of the first attempts.

As a newly developed theory for instructional design, the PI theory relates closely to these learning philosophies and theories. To understand the concept of Parallelism and the PI theory in depth, it is necessary to review the related philosophies and theories, which can also provide useful guidance for the coming design and redesign task. In the next chapter (chapter 3), the general design and redesign of the project, including the specific context, the decisions, the objectives and the development, will be described.

See also chapters 3, 4, 5 and 6, including the raw data in the appendices, elsewhere on this site.

Enschede, Sept. 3, 2002; updated Feb. 2003