Master's Theses - Daytona Beach

Assessing Reliability of Expert Ratings Among Judges Responding to a Survey Instrument Developed to Study the Long Term Efficacy of the ABET Engineering Criteria, EC2000

Tracy L. Litzinger, Embry-Riddle Aeronautical University

Date of Award

Spring 5-2006

Document Type

Thesis - Open Access

Degree Name

Master of Science in Human Factors & Systems

Department

Human Factors and Systems

Committee Chair

Shawn Michael Doherty

Committee Member

Elizabeth L. Blickensderfer

Committee Member

Rosemarie Reynolds

Abstract

In today’s assessment processes, especially those evaluations that rely on humans to make subjective judgements, it is necessary to analyze the quality of the raters' judgements. The psychometric issues associated with assessment provide the lens through which researchers interpret results and make important decisions. Therefore, inter-rater agreement (IRA) and inter-rater reliability (IRR) are prerequisites for rater-dependent data analysis. A survey instrument cannot provide “good” information if it is not reliable; in other words, reliability is central to the validation of an instrument. When judges cannot be shown to reliably rate a performance, item, or target, the question becomes why the judges’ responses differ from one another. If the judges’ ratings covary unreliably because the construct is poorly defined or the rating framework is defective, then the resulting scores will have questionable meaning. On the other hand, if the judges’ ratings differ because the judges hold a true difference of opinion, this is of importance to the researcher and does not necessarily diminish the validity of the scores. The intraclass correlation coefficient (ICC) is the most efficient method to assess these rater differences and identify the specific sources of inconsistency in measurement. This study examined how ICCs can inform researchers of the extent to which legitimate differences of opinion may appear as a lack of reliability and/or agreement, demonstrating the need to analyze survey data beyond standard descriptive statistics. Overall, both the IRA and IRR correlations, as calculated by ICC, ranged from .79 to .91, indicating high levels of agreement and consistency among the judges' ratings. When group membership was accounted for, the IRA values increased, suggesting that judges within the same group agreed more closely than judges whose perspectives differed.
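The distinction the abstract draws between agreement (IRA) and consistency (IRR) corresponds to two families of two-way ICCs: absolute-agreement ICCs penalize systematic offsets between judges, while consistency ICCs ignore them. The sketch below computes both from a two-way ANOVA decomposition (targets as rows, judges as columns). It is illustrative only: the thesis abstract does not state which ICC forms or software were used, the function name `icc_two_way` is hypothetical, complete data with no missing ratings is assumed, and the example matrix is the textbook six-target, four-judge data set from Shrout and Fleiss (1979), not data from this study.

```python
def icc_two_way(ratings):
    """Two-way ICCs (consistency and absolute agreement).

    Hypothetical helper for illustration. `ratings` is a list of rows,
    one row per target (e.g. survey item) and one column per judge.
    Assumes at least 2 targets, at least 2 judges, and no missing values.
    """
    n = len(ratings)        # number of targets
    k = len(ratings[0])     # number of judges
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for targets (rows), judges (columns), and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    # Mean squares.
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return {
        # Consistency: judge mean differences are ignored.
        "consistency_single": (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error),
        "consistency_average": (ms_rows - ms_error) / ms_rows,
        # Absolute agreement: judge mean differences count as disagreement.
        "agreement_single": (ms_rows - ms_error)
        / (ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n),
        "agreement_average": (ms_rows - ms_error)
        / (ms_rows + (ms_cols - ms_error) / n),
    }


# Shrout & Fleiss (1979) example: 6 targets rated by 4 judges.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]

if __name__ == "__main__":
    scores = icc_two_way(example)
    print({name: round(value, 2) for name, value in scores.items()})
```

On this example the consistency ICCs (about .71 single, .91 average) are much higher than the agreement ICCs (about .29 single, .62 average) because the judges rank the targets similarly but use different parts of the scale. That gap is the kind of pattern the abstract describes, where judges can be consistent without agreeing in absolute terms.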

Scholarly Commons Citation

Litzinger, Tracy L., "Assessing Reliability of Expert Ratings Among Judges Responding to a Survey Instrument Developed to Study the Long Term Efficacy of the ABET Engineering Criteria, EC2000" (2006). Master's Theses - Daytona Beach. 123.
https://commons.erau.edu/db-theses/123

Included in

Industrial and Organizational Psychology Commons
