Language Learning & Technology
Vol. 1, No. 2, January 1998, pp. 23-40

INPUT VS. OUTPUT PRACTICE IN EDUCATIONAL SOFTWARE FOR SECOND LANGUAGE ACQUISITION

Noriko Nagata
Department of Modern and Classical Languages
University of San Francisco

ABSTRACT

This paper presents an experiment concerning the relative effectiveness of computer assisted comprehension practice and production practice in the acquisition of a second language. Two computer programs were developed: (a) an input-focused program providing students with explicit grammatical instruction and comprehension exercises and (b) an output-focused program providing the same grammatical instruction together with production exercises. The results of the study show that the output-focused group performed significantly better than the input-focused group for the production of Japanese honorifics and equally well for the comprehension of these structures. The study supports Swain's argument that there are roles for output in second language acquisition that are independent of comprehensible input.

INTRODUCTION

There is increasing interest in the use of computer-assisted language instruction, for obvious reasons. The extensive exercises and drills required in second language instruction place significant demands on class time, and students must wait for feedback on their exercises until the instructor corrects them. Computer-assisted language instruction, in conjunction with contemporary natural language processing technology, holds out the promise of unlimited, immediate feedback pinpointed to the specific grammatical errors made by the student (Nagata, 1993, 1995, 1996, 1997b). But, even though it is technologically feasible for a computer to provide individualized grammatical feedback, there remains an important empirical question about how the exercises should be formatted to optimize their instructional effectiveness in promoting different sorts of competence for different types of target structures. Several such studies have been performed.

Doughty (1991) compares three kinds of computerized instruction, in which all subjects were presented the same reading texts on the computer, but the rule-oriented instructional group received explanations of the grammatical rules in relative-clause constructions, the meaning-oriented instructional group was encouraged to focus on both the content and structure, and the control group was merely exposed to the reading texts. While both the rule-oriented instructional group and the meaning-oriented instructional group improved equally well in relativization ability and significantly better than the control group, the meaning-oriented instructional group performed best in comprehending the reading texts.

N. Ellis (1993) performed a computerized experiment to compare the effectiveness of explicit (Rule), structured (Rule & Instances), and implicit (Random) programs to teach the soft mutation construction in Welsh. The Rule group and the Rule & Instances group received instruction in grammatical rules, but only the Rule & Instances group was shown how each rule applied to two instances of vocabulary. Afterward, all three groups were presented Welsh phrases on the computer screen and were asked to type in the appropriate English translation for each phrase. The Rule & Instances group performed best on well-formedness judgments and the Random group performed worst.

DeKeyser's study (1995) employed a computer program to compare explicit-deductive instruction with implicit-inductive instruction. Both the explicit-deductive group and the implicit-inductive group were presented pictures with corresponding sentences in Implexan (a miniature linguistic system) on the computer, but only the explicit-deductive group was provided explanations on Implexan grammatical rules. The grammatical rules included two types: straightforward ("categorical") rules and fuzzy rules ("prototypicality patterns" that cannot be completely reduced to an abstract rule). On the final production test, the explicit-deductive subjects performed significantly better than the implicit-inductive subjects for the straightforward rules, while no such advantage was observed in the fuzzy rules1.

Robinson's study (1996) employed computerized instruction to teach both simple and complex structures of English, under several conditions. All subjects were presented the same target sentences on the computer, but, for example, the rule-instructed subjects were asked metalinguistic questions regarding the sentences, the rule-search subjects were asked if they identified any rule in the given sentences, and the implicit subjects were instructed to memorize the target sentences. The rule-instructed subjects performed significantly better than the rule-search subjects and the implicit subjects for the simple structure on the grammaticality judgment test. The rule-instructed subjects also outperformed the other groups for the complex structure although the difference was statistically significant only between the rule-instructed subjects and the rule-search subjects.

Nagata's study (1997a) employed a computer program providing fill-in-the-blank exercises to practice Japanese particles. Two types of feedback were implemented: metalinguistic feedback (explaining metalinguistic rules in response to particle errors) and English translations (providing English equivalents to the Japanese particles). The results of the study suggest that ongoing metalinguistic feedback is more effective than first-language translation feedback in producing the Japanese particles.

This paper presents a new study investigating the relative effectiveness of production (output) exercises and comprehension (input) exercises presented and graded by personal computers. Although the target structures are Japanese honorifics, the results should interest anyone concerned with computer-assisted language instruction (CALI) and the role of input and output practice in second language acquisition.

THE STUDY

Theoretical Background

Many studies have investigated the role of input in second language acquisition (e.g., Ellis, R., 1981; Faerch & Kasper, 1986; Gass & Madden, 1985; Krashen, 1980, 1985, 1987; Loschky, 1994; Sharwood Smith, 1993; White, 1987). The role of output appears to have received less attention. According to Krashen (1987), "comprehensible input"2 and the affective state are the true causes of language acquisition. On this hypothesis, production exercises would be relevant to language acquisition only insofar as they lower affective barriers or provide additional comprehensible input. VanPatten and Cadierno (1993a, 1993b) examined the effects of two types of instruction, traditional instruction and processing instruction, in both interpreting and producing Spanish object pronouns in OVS and OV order. The traditional instruction involved grammatical explanations and output practice, while the processing instruction involved grammatical explanations and comprehension practice. These two kinds of instruction also differed in the grammatical information provided3 and the instructional approach adopted.4 The results of their study indicate that the processing group performed significantly better than the traditional group on comprehension post-tests and equally well on production post-tests. VanPatten and Cadierno conclude that "instruction is apparently more beneficial when it is directed at how learners perceive and process input rather than when instruction is focused on practice via output" (1993a, p. 54; 1993b, p. 240).5

Swain (1985, p. 248), however, argues that "there are roles for output in second language acquisition that are independent of comprehensible input," (see also Swain and Lapkin, 1995). The results of her study (1985) indicate that sixth-grade French immersion students perform similarly to native speakers on those aspects of discourse and sociolinguistic competence which do not rely heavily on grammar for their realization but their grammatical performance is not equivalent to that of native speakers (p. 251). The immersion students in her study received enough comprehensible input, but their "comprehensible output"6 was very limited. Swain conjectures that producing the language, as opposed to simply comprehending the language, may force the learner to move from semantic processing to syntactic processing, thereby facilitating more grammatical competence. Swain also refers to the phenomenon of individuals who can understand a language and yet can only produce limited utterances in it: a ninth-grade immersion student said, "I understand everything anyone says to me, and I can hear in my head how I should sound when I talk, but it never comes out that way," (p. 248). This indicates that comprehension does not necessarily transfer to production.

DeKeyser and Sokalski (1996) replicated VanPatten and Cadierno's study using two different target structures: the Spanish direct object clitics (the same structure used in VanPatten and Cadierno's study) and the Spanish conditional, which is more complex and difficult to produce. DeKeyser and Sokalski's study eliminated extra variables by providing the same grammatical instruction and exercise content, so the comparison was entirely between comprehension practice and production practice. The results of the immediate post-test show that for object clitics, the input practice group performed better in the comprehension tasks and the output practice group performed better in the production tasks. For the conditional, the output practice group outperformed the input practice group in both the production and the comprehension tasks. These differences faded in the long term, however. The results indicate that "the relative effectiveness of production versus comprehension practice depends on the morphosyntactic complexity of the structure in question as well as on the delay between practice and testing" (DeKeyser and Sokalski, 1996).

The present study investigates the relative effectiveness of comprehension and production practice in the acquisition of Japanese honorifics, which are both formally and conceptually complex structures of Japanese.7 The following section briefly describes the Japanese honorific system.

The Japanese Honorific System

Japanese honorifics (keigo) have traditionally been sub-classified into respectful words (sonkeigo) and humble words (kenzyoogo).8 The use of honorifics depends on the notions of "out-group" and "in-group." The distinctions between "out-group" and "in-group" may be understood in terms of differences in rank, age, affiliation, intimacy, and so forth. Typically, the "out-group" includes the speaker's superiors (e.g., teachers, supervisors, etc.) and the "in-group" includes the speaker and the speaker's family members or subordinates (e.g., assistants, secretaries, etc.). Japanese honorifics are used in both spoken and written contexts. There are irregular and regular honorific verbal forms. The regular, respectful form of a verb is constructed using the fixed pattern "o + verb stem + ni narimasu."9 This form is used when an out-group person is the subject who performs the action in a sentence (e.g., Sensee ga kono hon o o-kaki-ni-narimasita (respectful), 'My teacher wrote this book'). The regular, humble form of a verb is arrived at using the pattern "o + verb stem + simasu." This form is used, for example, when the speaker or the speaker's in-group member is the subject who performs the action in a sentence and an out-group person is the object/direction/goal of the action (e.g., Sensee ni o-ai-simasu (humble), 'I will meet my teacher'). The irregular honorific forms are not arrived at by these patterns and must be memorized for each verb (e.g., Sensee ga irassyaimasu (respectful), 'My teacher will come'; Watasi ga mairimasu (humble), 'I will come').

In short, a speaker needs to choose honorific forms depending on who the subject of the sentence is, on who the object of the sentence is, on whom the speaker is talking to, and so forth.10 Verbs may take regular honorific forms, irregular honorific forms, or both regular and irregular honorific forms. Japanese honorifics are fairly complicated structures which represent a major hurdle for second-language learners of Japanese.
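The two regular patterns and the irregular exceptions described above can be illustrated with a short sketch. This is not part of the BANZAI: HONORIFICS programs; the function names, the romanized hyphenated output, and the lookup table keyed by verb stem are assumptions made for illustration, and the choice of form is reduced to the subject/object cases named in the text.

```python
# Hypothetical lookup keyed by verb stem. Only the irregular pair cited in
# the text (for kimasu, 'come') is listed; a real table would be much larger.
IRREGULAR_FORMS = {
    "ki": {"respectful": "irassyaimasu", "humble": "mairimasu"},
}


def choose_kind(subject_group, target_group=None):
    """Pick an honorific category from group membership (simplified)."""
    if subject_group == "out":
        return "respectful"   # an out-group person performs the action
    if subject_group == "in" and target_group == "out":
        return "humble"       # in-group actor, out-group object/direction/goal
    return None               # assumption: no honorific form is called for


def honorific_form(stem, kind):
    """Build the nonpast honorific form of a verb from its stem."""
    if kind not in ("respectful", "humble"):
        raise ValueError(f"unknown honorific kind: {kind!r}")
    if stem in IRREGULAR_FORMS:
        return IRREGULAR_FORMS[stem][kind]
    # Regular patterns: "o + verb stem + ni narimasu" / "o + verb stem + simasu"
    ending = "ni-narimasu" if kind == "respectful" else "simasu"
    return f"o-{stem}-{ending}"


print(honorific_form("kaki", choose_kind("out")))       # o-kaki-ni-narimasu
print(honorific_form("ai", choose_kind("in", "out")))   # o-ai-simasu
print(honorific_form("ki", "humble"))                   # mairimasu
```

The sketch ignores tense, the addressee, and verbs that allow both regular and irregular forms, all of which a learner (or a full program) would also have to handle.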

Subjects

Fourteen students in a second-semester Japanese course at the University of San Francisco participated in this study. The students were paired based on the scores they obtained on the mid-term exam and were randomly divided into two groups,11 so that the two groups had no significant difference in the level of achievement in the course, prior to the experiment (t = 1.07, p = 0.324).12 Each group consisted of three males and four females. The students' first language was English, except for one student in the input-focused group whose first language was Korean but who was also fluent in English.13
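The matched-pair assignment can be sketched as follows. The pairing rule (adjacent ranks on the mid-term exam) and the function name are assumptions; the text states only that students were paired by mid-term score and then randomly divided. Any scores passed in would be hypothetical, since the individual mid-term scores are not reported here.

```python
import random


def matched_pair_assignment(scores, seed=None):
    """Split students into two groups of matched pairs.

    scores: dict mapping a student label to a mid-term score.
    Students are ranked by score, adjacent ranks form a pair (an assumed
    pairing rule), and each pair is split at random between the groups.
    """
    if len(scores) % 2 != 0:
        raise ValueError("an even number of students is needed to form pairs")
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get, reverse=True)
    input_group, output_group = [], []
    for i in range(0, len(ranked), 2):
        pair = ranked[i:i + 2]
        rng.shuffle(pair)  # random assignment within the matched pair
        input_group.append(pair[0])
        output_group.append(pair[1])
    return input_group, output_group
```

With fourteen students this yields seven matched pairs and two groups of seven, which is the structure that a paired comparison of the groups relies on.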

Materials

Two computer programs were developed in HyperCard (these programs are called BANZAI: HONORIFICS). One is an input-focused program which provides explicit grammatical instruction together with comprehension exercises and the other is an output-focused program which provides the same grammatical instruction augmented by production exercises. Macintosh computers were used in this experiment. The target structures, Japanese honorifics, were new to the students. Four lessons of grammar notes and exercises were implemented in both programs.14 The grammar note includes grammatical/conceptual descriptions of the Japanese honorific system accompanied by examples. The exercises include five types of tasks, in order to develop the learners' comprehension or production skills through word-level, sentence-level, and paragraph-level practice. Appendices A, B, C, D, and E describe the five types of comprehension and production exercises respectively.

Figure 1 illustrates one of the type 3 comprehension exercises, as presented on the computer screen. Every exercise in the input-focused program provides the students with a choice of three answers.15 Suppose a student clicks #1 or #2 to answer the question in Figure 1 (the input-focused program). He/she is informed that "O-KAKI-NI-NARIMASITA is a respectful form, so the subject who performs the action in the sentence should be an out-group member (e.g., superior)."16 (Capitalized Japanese words are presented on the computer screen in the Japanese kana and kanji writing systems.) If #3 is selected, the student is informed that the answer is correct and a Japanese pronunciation of the correct answer is provided.



Figure 2 illustrates one of the type 3 production exercises as it appears on the computer screen. The output-focused program involves the same sentence used in the input-focused program, but the exercise is to "produce" the sentence. For example, the student is asked to type a sentence in the box at the bottom of the screen and push the "Check Answer" button to check whether or not his/her response is correct. The program uses a Japanese word processor so that the students can type their responses in the Japanese writing system.17 The correct answer for this question is something like Sensee ga o-kaki-ni-narimasita, 'My teacher wrote it.' If the student types Sensee ga o-kaki-simasita, he/she is informed that "O-KAKI-SIMASITA is a humble form, but the subject who performs the action in the sentence is an out-group member (e.g., superior). Use a respectful form."18 If the student in the output-focused group fails to provide a correct response three times, an "Answer" button appears. Pressing the "Answer" button provides the student with the correct answer. If the student types a correct response, he/she is informed that the answer is correct and a Japanese pronunciation of the correct answer is provided. In short, both the input-focused program and the output-focused program provide the same content of exercises with on-going grammatical feedback in response to the learners' correct and incorrect answers. Both programs also provide a "Grammar" button to open the grammar note which the students read at the beginning of the session and a "Vocabulary Hint" button to see the list of words used in each exercise.
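The feedback cycle of the output-focused program can be approximated by the sketch below. It is an illustration only: the programs themselves were written in HyperCard, and the string matching on romanized input, the single accepted answer, the class and function names, and the generic fallback message are all assumptions.

```python
from dataclasses import dataclass

MAX_FAILURES_BEFORE_ANSWER = 3  # the "Answer" button appears after three failures


def verb_form(verb):
    """Classify a romanized regular verb form (crude string heuristic)."""
    if verb.startswith("o-") and "-ni-nari" in verb:
        return "respectful"
    if verb.startswith("o-") and verb.endswith(("-simasu", "-simasita")):
        return "humble"
    return "plain"


@dataclass
class ProductionExercise:
    correct: str              # one accepted answer (an assumption)
    failed_attempts: int = 0

    def check(self, response):
        """Return the feedback message for a typed response."""
        if response.strip().lower() == self.correct.lower():
            return "Correct."
        self.failed_attempts += 1
        words = response.split()
        verb = words[-1] if words else ""
        expected = verb_form(self.correct.split()[-1])
        if verb_form(verb) == "humble" and expected == "respectful":
            return (f"{verb.upper()} is a humble form, but the subject who "
                    "performs the action in the sentence is an out-group "
                    "member (e.g., superior). Use a respectful form.")
        return "Incorrect. Try again."  # placeholder for other error messages

    @property
    def answer_button_visible(self):
        return self.failed_attempts >= MAX_FAILURES_BEFORE_ANSWER


exercise = ProductionExercise("Sensee ga o-kaki-ni-narimasita")
print(exercise.check("Sensee ga o-kaki-simasita"))
```

Only the humble-for-respectful message quoted in the text is reproduced; the other error types would each need their own diagnosis and message.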

A pilot study was conducted with two third-semester Japanese students. The result of the pilot study indicated that comprehension exercises require less time for the students to complete than the production exercises. In order to equalize the total instructional time for both groups, the input-focused program provided more exercises than the output-focused program.19 Through four computer sessions (approximately one hour per session), the students in both groups received 137 exercises.20 In addition, the input-focused group received 130 more exercises.21

Procedure

The students in both groups participated in four one-hour computer sessions over the course of two weeks: one group received the input-focused program and the other received the output-focused program.22 In each computer session, the input-focused group spent five to seven minutes reading a grammar note (i.e., a short text describing the grammatical structures on the computer), and then proceeded to the comprehension exercises. The output-focused group also first spent five to seven minutes reading the same grammar note, and then moved to the production exercises.

A questionnaire was administered at the end of the last computer session. On the questionnaire, the students were asked to rate 23 items on a 5-point scale (1 strongly disagree, 2 disagree, 3 undecided, 4 agree, and 5 strongly agree) and to write comments about the computer program.

Two days after the last computer session, the students took an achievement test in their usual classroom. The achievement test included both comprehension and production tasks similar to the ones provided in the computer sessions. The comprehension tasks consisted of a total of 21 questions (in which 4 questions were of exercise type 1, 4 questions of type 2, 7 questions of type 3, 2 questions of type 4, and 4 questions of type 5). The production tasks consisted of a total of 20 questions (in which 9 questions were of type 2, 7 questions of type 3, 2 questions of type 4, and 2 questions of type 5).23 The achievement test was not conducted on the computer because only the output-focused group used the Japanese word processor and it was suspected that if the production tasks were performed on the computer, the input-focused group might have difficulty in typing Japanese on the Japanese word processor. It was also optional whether to use Japanese or roman characters to write the answers. A perfect score on the comprehension tasks was 42 and that on the production tasks was 43. The following scoring system was used for the achievement test. For the comprehension tasks, 2.0 points were deducted for any incorrect choice. For the production tasks, points were deducted according to the relative importance of errors in the given questions. For example, 1.0 points were deducted for an incorrect or missing verb (although when the error was on the verbal forms such as using humble for respect, plain for honorific, perfective for imperfective, or negative for affirmative, only 0.5 points were deducted), and 0.5 points were deducted for other incorrect or missing words/particles and for incorrect word order. A spelling mistake also resulted in a 0.2 point reduction. The students' scores were converted into percentages.
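The scoring scheme can be written out as a small sketch. The deduction values and perfect scores come from the description above; the error-category labels, the assumption that each deduction applies per occurrence, and the floor at zero are illustrative choices, not details reported for the study.

```python
# Deductions for production tasks, as described in the text.
# The category labels are hypothetical names chosen for this sketch.
DEDUCTIONS = {
    "verb": 1.0,              # incorrect or missing verb
    "verb_form": 0.5,         # e.g., humble for respectful, plain for honorific
    "word_or_particle": 0.5,  # other incorrect or missing word/particle
    "word_order": 0.5,        # incorrect word order
    "spelling": 0.2,          # spelling mistake
}


def production_percentage(errors, perfect_score=43.0):
    """Convert a list of error labels into a percentage score."""
    lost = sum(DEDUCTIONS[label] for label in errors)
    raw = max(perfect_score - lost, 0.0)  # assumption: scores do not go below 0
    return 100.0 * raw / perfect_score


def comprehension_percentage(n_incorrect, n_questions=21):
    """Two points per question, so 21 questions give a perfect score of 42."""
    perfect_score = 2.0 * n_questions
    return 100.0 * (perfect_score - 2.0 * n_incorrect) / perfect_score


print(round(comprehension_percentage(2), 1))
print(round(production_percentage(["verb_form", "spelling"]), 1))
```

The same functions cover the later tests by changing `perfect_score` or `n_questions`, since those tests are described as following the same scoring system.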

After the students took the achievement test, they were away from class for one week during the university's spring break period. The students in both groups were asked to write a one-page letter to the teacher, using honorific forms, about what they did during the break. After they submitted the homework, it was corrected and returned to them. A week after the spring break, the students were given another one-page written assignment: translating English conversations into Japanese. The conversations were between a teacher and a student, so translating them required the use of appropriate honorific forms. In the following class, the students practiced oral conversations on the basis of this homework for about thirty minutes. During the rest of the course, no further special activity was provided regarding honorifics.

A month after the computer sessions, the students took a retention test. The retention test consisted of comprehension tasks (4 questions of exercise type 1, 3 questions of type 2, and 5 questions of type 3) and production tasks (7 questions of type 3), all of which had appeared on the achievement test. Fewer questions were given on the retention test than on the achievement test because the retention test was administered together with the written final exam, so space was limited. The retention test focused on sentence-level comprehension and production tasks. The perfect score on both the comprehension tasks and the production tasks was 24. The retention test followed the same scoring system used in the achievement test. The students' scores were converted into percentages.

A week after the retention test (i.e., five weeks after the computer sessions), the students took an oral conversation test involving Japanese honorifics. In the oral conversation test, each student performed conversations together with the instructor and the tutor of the Japanese course for about 5 minutes. The students were asked ten questions such as whether they came to school yesterday, what they did, whether they met their friends or teacher, what the teacher did, etc. The conversation was recorded and graded based on ten sentences that the students were supposed to produce using honorifics. A perfect score on the oral test was 14. The oral test followed the same scoring system used for production tasks on the achievement test. The students' scores were converted into percentages.

ANALYSIS

The two-sample dependent t-test (the paired two-sample t-test) was applied to examine whether there is a significant difference between the input-focused group and the output-focused group in their scores on the achievement test, the retention test, and the oral test. All statistical analysis in this study employed two-tailed t-tests.
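For readers unfamiliar with the paired t-test, the statistic is the mean of the within-pair differences divided by its standard error. The sketch below shows the computation under the assumption that the scores are listed pair by pair; the function name is hypothetical, and no scores from the study are used.

```python
import math
from statistics import mean, stdev


def paired_t(input_scores, output_scores):
    """Return (t, degrees of freedom) for matched pairs of scores.

    The i-th entries of the two lists must belong to the same matched pair.
    """
    if len(input_scores) != len(output_scores):
        raise ValueError("both groups need one score per matched pair")
    n = len(input_scores)
    if n < 2:
        raise ValueError("at least two pairs are required")
    diffs = [out - inp for inp, out in zip(input_scores, output_scores)]
    spread = stdev(diffs)  # sample standard deviation of the differences
    if spread == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (spread / math.sqrt(n))
    return t, n - 1
```

With seven matched pairs, as in this study, the statistic has six degrees of freedom, and the two-tailed p-value is read from the t distribution with that many degrees of freedom.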

RESULTS AND DISCUSSION

Table 1 shows the means and standard deviations of the two groups for the comprehension tasks and the production tasks on the achievement test, and the results of the corresponding t-tests. The results of the achievement test show that there is no significant difference between the two groups in the comprehension tasks, while the difference between the two groups in the production tasks is statistically significant at the 0.002 level, favoring the output-focused group. The results suggest that given the same grammatical instruction, output-focused practice is more effective than input-focused practice for the development of skill in producing Japanese honorifics and is equally effective for the comprehension of these structures.

Table 1: The results of the comprehension tasks and the production tasks on the achievement test (scores out of 100)

                        Input group       Output group      T-test
Achievement test        Mean     SD       Mean     SD       t        Sig of t
Comprehension tasks     92.4     9.0      95.2     3.8      1.15     NS
Production tasks        72.3     16.1     85.4     13.4     5.67     0.002

Figure 3 presents the means of the two groups for each type of comprehension task (type 1 through type 5) on the achievement test. The scores for each type of comprehension task were converted into percentages. The results show no significant difference between the two groups for any type of comprehension task.

Figure 3: The result of each type of comprehension task on the achievement test (scores out of 100)

Figure 4 presents the means of the two groups for each type of production task (type 2 through 5) on the achievement test. (As noted, type 1 tasks were omitted, due to time considerations and their similarity to type 2 tasks.) The scores for each type of production task were converted into percentages. A significant difference was found in types 3 and 4 (p < 0.001 for type 3 and p < 0.05 for type 4), but not in types 2 and 5. Type 2 production tasks asked the students to fill in the blank with a verbal predicate (as illustrated in example (4) in Appendix B). This type of question involves only production of a verb form, while type 3 production tasks require full-sentence production (see example (6) in Appendix C). Therefore, type 3 production tasks involve more complex syntactic processing than type 2 tasks. Type 4 production tasks are also more complex than type 2 tasks because they require the students to read the text and to revise verbs with appropriate honorific forms when necessary (see example (8) in Appendix D). In this type of task, the learners need to understand discourse context and to recover some unstated subjects and objects from context in order to determine appropriate honorific forms. The results suggest that when more complex syntactic and discourse processing is involved in production tasks, it becomes more difficult for learners to apply their learning from comprehension exercises directly to the production tasks. Type 5 is a semi-production task because the students were presented a few Japanese sentences orally and were asked to write them down, which is different from constructing sentences by themselves. For this task, an oral cue was provided three times for each sentence, and the students were given enough time to write down each sentence. Since the nature of the task was half receptive, this might be one reason the input-focused group performed as well as the output-focused group in type 5 production tasks.

Figure 4: The result of each type of production task on the achievement test (scores out of 100)

The retention test results (Table 2) are consistent with the achievement test results: there is no significant difference between the two groups in the comprehension tasks, while the difference between the two groups in the production tasks is statistically significant at the 0.02 level, favoring the output-focused group.24 The result of the oral conversation test (Table 3) also exhibits a statistically significant difference, favoring the output-focused group (p < 0.02).25 These results indicate that given the same follow-up written assignments and oral conversation practice, output-focused practice is more effective than input-focused practice in the long term in both written and oral production of Japanese honorific sentences.

Table 2: The results of the comprehension tasks and the production tasks on the retention test (scores out of 100)

                        Input group       Output group      T-test
Retention test          Mean     SD       Mean     SD       t        Sig of t
Comprehension tasks     88.1     11.6     96.4     4.5      1.62     NS
Production tasks        74.4     15.9     84.8     16.2     3.18     0.02

Table 3: The result of the oral conversation test (scores out of 100)

                        Input group       Output group      T-test
Oral conversation test  Mean     SD       Mean     SD       t        Sig of t
Oral production         68.8     19.3     83.2     11.0     3.29     0.02

Table 4 presents the means and standard deviations of the ratings for each item on the questionnaire (1 strongly disagree, 2 disagree, 3 undecided, 4 agree, and 5 strongly agree). The result of the questionnaire shows that both groups had very positive attitudes toward the computer program, regardless of whether they received the input-focused program or the output-focused program. The ratings for items 3, 9, 18, 19, 20, and 21 show some differences between the two groups, although the differences are not statistically significant.26 The reason item 3 ("I didn't have technical problems when working on the program,") was rated 0.9 lower by the output-focused group might be related to the fact that the output-focused group used the Japanese word processor and the students often hit the wr