Drawing on the second language performance assessment item specifications created in Technical Report 18, this volume describes the creation and validation of actual performance assessment instruments. After briefly reviewing the background to the project, the book explains the test and rating scale development processes and the administration of the resulting seven-task tests to 60 students from the University of Hawai'i (30 on Form P and 30 on Form Q) and 30 students from the Kanda University of International Studies (on Form J). The results based on task-dependent, task-independent, and self-rating scales are examined in terms of descriptive statistics, multi-faceted Rasch model analyses, reliability estimates, and correlational analyses. These results are discussed at length in terms of the effects of test revision and of comparisons among the task-dependent, task-independent, and self-rating scales, especially with regard to their reliability and validity.