Last updated: May 20, 2019

When examining the data they have collected, researchers must determine to what degree their data is consistent. There are many methods by which researchers can assess the reliability of their instruments. Through different statistical procedures, researchers compute a reliability coefficient to ascertain the extent to which their data is consistent. The reliability coefficient is generally a numerical value between 0.00 and 1.00, and coefficients close to 1.00 indicate high reliability.

The test-retest approach to reliability assesses how consistent subjects' scores on the same measuring instrument remain over time. Using this method, a researcher administers the same test twice to the same group of people, with the two administrations separated by a period of time. The researcher then correlates the two sets of scores; the resulting correlation coefficient is known as the test-retest reliability coefficient, or coefficient of stability. Note that the amount of time elapsed between the two tests is likely to affect the coefficient of stability.
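The coefficient of stability is an ordinary correlation between the two sets of scores. The sketch below assumes Pearson's correlation coefficient is the one used and runs it on made-up scores for five students; the function name `pearson_r` and all the data are illustrative only.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length score lists with at least two scores")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products of deviations, and sums of squared deviations
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for the same five students, tested two weeks apart
first_test = [72, 85, 90, 64, 78]
retest = [70, 88, 91, 66, 75]

stability = pearson_r(first_test, retest)
print(f"Coefficient of stability: {stability:.2f}")
```

With these invented scores the coefficient comes out at about 0.97, a high degree of stability.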

The interval between the two administrations matters. For instance, if a group of students took the same mathematics test two weeks apart, their scores would likely be similar. However, if the students took the test once at the beginning of the school year and once at the end, their scores might differ, because the lessons taught in the interim would give them knowledge that enables them to perform better on the second administration. Another disadvantage of the test-retest approach is that subjects may remember questions from the first test, which could inflate their scores on the retest.

The test-retest method is often used for cognitive tests and assessments of skills or traits. Generally, this approach is useful when the characteristic being assessed is likely to remain stable over time. Unlike test-retest reliability, equivalent-forms reliability measures consistency using two forms of the same test. Using this technique, a researcher administers both forms to one group of people and correlates the two sets of scores. The two forms are comparable in that they are designed to measure the same thing, but they differ in the exact questions presented.
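The equivalent-forms coefficient is likewise a correlation, here between scores on the two forms. Because the forms are meant to be interchangeable, the sketch below also compares their mean scores as a rough check that they are of similar difficulty. It assumes Pearson's correlation; the data and the names `pearson_r`, `form_a`, and `form_b` are hypothetical.

```python
from math import sqrt
from statistics import mean
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length score lists with at least two scores")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores of six people who took both Form A and Form B
form_a = [14, 18, 11, 20, 16, 9]
form_b = [15, 17, 12, 19, 15, 10]

equivalence_r = pearson_r(form_a, form_b)  # consistency across the two forms
mean_gap = mean(form_a) - mean(form_b)     # near 0 if the forms are equally difficult
print(f"Equivalent-forms coefficient: {equivalence_r:.2f}")
print(f"Difference in mean scores: {mean_gap:.2f}")
```

A high correlation with a sizable mean gap would suggest that one form is harder than the other even though both rank people similarly.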

The two forms are expected to yield similar scores, which would indicate a high degree of consistency. This approach eliminates the problem of subjects remembering questions from the previous test, which affects test-retest reliability. However, it is often difficult to create two tests that are truly equivalent. Researchers use the internal consistency method to determine the degree to which the items of a measuring instrument consistently measure the same thing. To assess internal consistency, a researcher administers a test once to a group of individuals.

The scores are then analyzed with a statistical procedure that yields a coefficient indicating the level of internal consistency. Several techniques are available, including the split-half method, in which the researcher divides the test into two halves, generally by grouping the odd-numbered and even-numbered items. Each half is scored separately, the two sets of half-test scores are correlated, and the Spearman-Brown formula is then applied to that correlation to obtain the split-half reliability coefficient. This method is most applicable to lengthy tests.
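A minimal sketch of the odd-even split-half procedure, assuming each item is scored 0 or 1 and that Pearson's correlation is used between the two half-test totals. The Spearman-Brown step, 2r / (1 + r), estimates the reliability of the full-length test from the correlation r between its halves. The response matrix and the function names are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half: float) -> float:
    """Estimate full-length reliability from the correlation between two halves."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(responses: Sequence[Sequence[int]]) -> float:
    # responses[p][i] is person p's score on item i (index 0 is item 1),
    # so odd-numbered items (1, 3, 5, ...) sit at indices 0, 2, 4, ...
    odd_totals = [sum(row[0::2]) for row in responses]
    even_totals = [sum(row[1::2]) for row in responses]
    return spearman_brown(pearson_r(odd_totals, even_totals))


# Hypothetical right/wrong (1/0) answers: 6 examinees x 8 items
responses: List[List[int]] = [
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0, 0, 0],
]

print(f"Split-half reliability: {split_half_reliability(responses):.2f}")
```

Here the two halves correlate at about 0.92, and the Spearman-Brown correction raises the estimate for the full eight-item test to about 0.96.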

Because reliability tends to be higher for longer tests, and the correlation between the halves reflects the reliability of only half the test, the Spearman-Brown formula is needed to estimate the reliability of the full-length test. The Kuder-Richardson formula 20 (KR-20) is a more advanced alternative to the split-half method for tests whose items are scored as right or wrong. Its result does not depend on how the items are ordered or divided, so the coefficient is not affected by an arbitrary choice of split.
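KR-20 can be written as KR-20 = (k / (k - 1)) * (1 - sum of p_i * q_i / variance of total scores), where k is the number of items, p_i is the proportion of examinees answering item i correctly, and q_i = 1 - p_i. The sketch below applies it to hypothetical 0/1 data. It uses population variances throughout, which is an assumption, since textbooks differ on this convention, and it checks that reversing the item order leaves the result unchanged. The name `kr20` is illustrative.

```python
from statistics import pvariance
from typing import List, Sequence


def kr20(responses: Sequence[Sequence[int]]) -> float:
    """Kuder-Richardson formula 20 for items scored 1 (right) or 0 (wrong)."""
    n_people = len(responses)
    k = len(responses[0])  # number of items
    # Sum of p * q over items: proportion correct times proportion incorrect
    sum_pq = 0.0
    for i in range(k):
        p = sum(row[i] for row in responses) / n_people
        sum_pq += p * (1 - p)
    # Population variance of the examinees' total scores
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum_pq / total_var)


# Hypothetical right/wrong (1/0) answers: 6 examinees x 8 items
responses: List[List[int]] = [
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0, 0, 0],
]

print(f"KR-20: {kr20(responses):.2f}")

# Reordering the items leaves the coefficient unchanged
reordered = [list(reversed(row)) for row in responses]
print(f"KR-20 after reversing item order: {kr20(reordered):.2f}")
```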

Another method for assessing internal consistency is Cronbach’s alpha, which is similar to the KR-20 approach except that it allows researchers to use instruments made up of items that can be scored with three or more possible values. The alpha approach is commonly used with surveys or questionnaires whose answers range from "strongly disagree" to "strongly agree." Interrater reliability is another approach to measuring consistency; it applies when two or more raters evaluate the same set of items, applicants, or pictures.
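Cronbach’s alpha extends the KR-20 idea to items with more than two possible scores: alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores). For items scored 0 or 1, it gives the same value as KR-20 when the same variance convention is used. The sketch below uses hypothetical 1-to-5 ratings from five respondents on four survey items; `cronbach_alpha` is an illustrative name, not a library function.

```python
from statistics import pvariance
from typing import List, Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a people-by-items score matrix."""
    k = len(responses[0])  # number of items
    # Variance of each item's scores across respondents
    item_vars = [pvariance([row[i] for row in responses]) for i in range(k)]
    # Variance of each respondent's total score
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical answers: 1 = strongly disagree ... 5 = strongly agree
survey: List[List[int]] = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [1, 2, 1, 2],
]

print(f"Cronbach's alpha: {cronbach_alpha(survey):.2f}")
```

The ratio of variances is the same whether population or sample variances are used, as long as one convention is applied to both, so the choice of `pvariance` does not change alpha.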

Interrater reliability is important in the Olympics, where judges rate the performances of athletes such as gymnasts and figure skaters. The percent-agreement measure is the simplest form of interrater reliability. Using this technique, researchers calculate the percentage of cases in which the raters assign the same rating. One disadvantage of this measure is that raters may sometimes choose the same rating purely by chance. As with any approach to measuring reliability, researchers should evaluate the available methods and weigh the advantages and disadvantages of each.
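Percent agreement between two raters is simply the share of cases in which they give identical ratings. The judges' scores below are hypothetical, and `percent_agreement` is an illustrative name. The function makes no correction for agreement that occurs by chance, which is the weakness of this measure noted above.

```python
from typing import Sequence


def percent_agreement(rater_a: Sequence[object], rater_b: Sequence[object]) -> float:
    """Percentage of cases in which two raters assigned identical ratings."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return 100.0 * matches / len(rater_a)


# Hypothetical scores two judges gave the same eight routines
judge_1 = [8, 7, 9, 6, 8, 7, 9, 8]
judge_2 = [8, 7, 8, 6, 8, 6, 9, 8]

print(f"Percent agreement: {percent_agreement(judge_1, judge_2):.1f}%")
```

The judges agree on six of the eight routines, so the sketch reports 75.0%.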

Validity refers to the degree to which a measuring instrument accurately assesses the specific concept that a researcher wants to measure. Several methods are commonly used to assess validity. Content validity concerns the degree to which a measuring instrument represents the specific subject being studied. It is commonly considered in education, where teachers regularly test their students to assess how well they understand the material covered in the lessons. When examining tests, questionnaires, or inventories, researchers assess how well the instrument covers the subject being studied.

Generally, the content validity of a measure is determined by having experts examine the items to see whether they cover the specific topic being explored. For instance, if students were tested on mathematics but the test consisted only of addition problems, experts would conclude that the test lacked content validity, since it did not cover the full range of mathematical ability. Establishing content validity can be difficult, because it is not always clear which questions will adequately represent the topic being assessed.
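Content validity rests on expert judgment rather than on a single statistic, but once experts have classified each item by topic, the comparison with the intended content can be made explicit. The blueprint, item labels, and `coverage_report` function below are hypothetical and simply mirror the addition-only example.

```python
from typing import Dict, List, Set, Tuple


def coverage_report(blueprint: Set[str], item_topics: Dict[str, str]) -> Tuple[float, List[str]]:
    """Return the share of blueprint topics the items cover and the topics left out."""
    covered = set(item_topics.values()) & blueprint
    missing = sorted(blueprint - covered)
    return len(covered) / len(blueprint), missing


# Hypothetical content the mathematics test is supposed to represent
blueprint = {"addition", "subtraction", "multiplication", "division"}

# Hypothetical expert classification of each test item
addition_only_test = {"Q1": "addition", "Q2": "addition", "Q3": "addition", "Q4": "addition"}

share, missing = coverage_report(blueprint, addition_only_test)
print(f"Topics covered: {share:.0%}; missing: {', '.join(missing)}")
```

The report makes the experts' conclusion concrete: only one of the four intended topics is represented.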

In criterion-related validity, researchers assess the validity of their instruments by comparing scores on their test with scores on a relevant criterion variable. The criterion variable is often a more established, widely accepted measure. For instance, a college may administer an entrance exam (the measuring instrument) and compare the students' scores with their SAT scores (the criterion variable). There are two forms of criterion-related validity. Concurrent validity measures the relationship between scores on the new test and criterion scores obtained at about the same time.
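In criterion-related validation, the validity coefficient is typically the correlation between scores on the new instrument and scores on the criterion. A minimal sketch, assuming Pearson's correlation and using made-up entrance-exam and SAT scores for six applicants; the computation is the same whether the criterion is gathered at the same time or later. All names and numbers are hypothetical.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: the college's entrance exam (new instrument)
# and the same applicants' SAT scores (criterion variable)
entrance_exam = [62, 75, 58, 88, 70, 81]
sat_scores = [1100, 1150, 1050, 1380, 1220, 1240]

validity_coefficient = pearson_r(entrance_exam, sat_scores)
print(f"Validity coefficient: {validity_coefficient:.2f}")
```

With these invented scores the validity coefficient is about 0.93, meaning applicants who do well on the entrance exam also tend to do well on the SAT.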

In concurrent validation, researchers administer their new test at around the same time as they gather the criterion data. Predictive validity, by contrast, measures how well scores on the new test predict a criterion measured in the future. For instance, if PSAT scores have predictive validity, students who score high on the PSAT should tend to score high on the SAT they take later. Construct validity, which is often examined in the social sciences and psychology, concerns whether a test or experiment actually measures the trait, or construct, that it is intended to measure.

Construct validity is relevant when assessing constructs such as intelligence or emotion. It is often difficult to assess construct validity properly, since most of the concepts involved are theoretical and abstract. A further difficulty is that people tend to behave differently when they are observed or under pressure. In addition, researchers may unintentionally influence examinees' behavior by giving cues through body language. As with any test, researchers must be aware of the potential pitfalls associated with their experiment so that they can test for validity effectively.
