AN OVERVIEW OF THE HISTORY OF
THE ILR LANGUAGE PROFICIENCY SKILL LEVEL DESCRIPTIONS AND SCALE

Dr. Martha Herzog


HOW DID THE LANGUAGE PROFICIENCY SCALE GET STARTED?

The United States has traditionally had special problems defining foreign language competence because of the historic inattention to languages in our general educational programs. Faced with these academic gaps, the Government has had to fill them for its own purposes. Fortunately, some of the lessons learned by the Government have been used by others.

The foreign language competence of U.S. Government employees was not examined during the first 175 years of our history. However, in the early 1950s, as the war with Japan was followed by a war in Korea, the United States' lack of preparation in foreign languages was recognized as a serious problem. In 1952 the Civil Service Commission was directed to inventory the language ability of Government employees and develop a register of these employees' language skills, background, and experience.

Unfortunately, the Commission had no system for conducting an inventory, no proficiency test, and no criteria for test construction. Available, instead, were employees’ grades in language courses and self-reports on job applications. Self-reports were likely to state something like “fluent in French” or “excellent German,” and there has never been standardized grading across academic institutions in this country. The Commission concluded that the United States Government needed a system that was objective, applicable to all languages and all Civil Service positions, and unrelated to any particular language curriculum. Because the academic community did not have such a system, the Government had to develop its own.

Initially the concept met resistance. Some Government agencies feared loss of autonomy, and everyone understood that test results could embarrass many employees who claimed to be “fluent” or “excellent.”

Nevertheless, the Foreign Service Institute (FSI) began to work on solving the problem under the leadership of its Dean, Dr. Henry Lee Smith. He headed an interagency committee that devised a single scale ranging from 1 to 6; that is, the first scale did not distinguish among the four skills but simply rated "language." Although other Government agencies lost interest for a time, FSI continued to refine the scale.

In 1955 a survey of all Foreign Service officers based on the new scale showed that fewer than half reported a level of language “useful to the service.” The extent of the problem was further highlighted in 1956, when only 25% of entering Foreign Service Officers were tested at a “useful” level of proficiency in any foreign language. In November of 1956, the Secretary of State announced a new language policy, including the requirement that language ability “will be verified by tests.” In 1958, language proficiency tests became “mandatory” for all Foreign Service Officers.

FSI’s first efforts to test according to the scale were not reliable. The faculty found it difficult to apply the scale consistently, so results varied from tester to tester. Tests were considered subjective and thought to be much easier in some languages than others. However, many valuable lessons were learned from initial tests. FSI built upon this experience to revise the scale. One extremely important decision involved changing the single scale for “language” to separate scales for each skill. The scale was eventually standardized to six base levels, ranging from 0 (= no functional ability) to 5 (= equivalent to an educated native speaker).

Equally important was the creation in 1958 of an independent testing office at FSI headed by Frank Rice and Claudia Wilds, who had studied with John B. Carroll. Professor Carroll, then at Harvard, served as a consultant as the test was designed. The FSI Testing Unit developed a structured interview in direct support of the six-point scale. Standardized factors were developed for scoring, and the interview format ensured that all factors were tested. The interaction of test format and rating factors was crucial to the success of the test. Emphasis on a well-structured interview reduced the problems associated with the earlier tests. The development of standardized rating factors reduced subjectivity. The factors provided a basis for testers' agreement on important aspects of test performance and helped to focus their attention during testing and rating. This innovation created the framework for checking interrater reliability, and a high degree of consistency in scoring resulted.

The interview soon became the standard method of testing at FSI. For many years it was known world-wide as the FSI interview, or just “the FSI.” The interview and the scale gained wide recognition, and many other Government agencies adopted the system, including the Peace Corps for the testing of all its overseas volunteers. In 1968 several agencies cooperatively wrote formal descriptions of the base levels in four skills—speaking, listening, reading, and writing. The resulting scale became part of the United States Government Personnel Manual. The original challenge to inventory Government employees’ language ability could finally be met.

New developments continued. In 1976 NATO adopted a language proficiency scale related to the 1968 document. By 1985 the U.S. document had been revised under the auspices of the Interagency Language Roundtable (ILR) to include full descriptions of the "plus" levels that had gradually been incorporated into the scoring system. (Since then, the official Government Language Skill Level Descriptions have been known as the "ILR Scale" or the "ILR Definitions.") Although specific testing tasks and procedures now differ somewhat from one agency to another for operational reasons, all U.S. Government agencies adhere to the ILR Definitions as the standard measuring stick of language proficiency.

Also in the 1980s, the American Council on the Teaching of Foreign Languages (ACTFL) developed and published for academic use Proficiency Guidelines based on the ILR Definitions. Like the ILR scale, the ACTFL guidelines have undergone refinement. ACTFL also developed an oral proficiency interview (OPI) similar to the Government test and began training educators to test according to its scale. ACTFL and the Government have worked together closely for almost twenty years to ensure that the two proficiency testing systems are complementary.




 
Copyright 2007 Interagency Language Roundtable