Introduction

The requirements of industry, government and the service sector regarding the quality of testing and calibration have expanded over the past few years. As a result, laboratories in different areas have sought to comply with accreditation criteria.

Within the context of the qualification of laboratories, there are the requirements of ISO/IEC 17025 and ISO 15189, in which it is explicit that the laboratory should monitor the validity of tests and calibrations performed through a procedure of quality control. Such monitoring can be accomplished through participation in a proficiency test (PT) [1].

PT are programs that support the reliability of tests and calibrations by comparing results among a group of laboratories, with the goal of evaluating technical competence in performing a testing or calibration method [2]. After participating in a PT, the laboratory has evidence about the quality of its measurements and can verify its proficiency.

In Brazil, participation in PT is a prerequisite for requesting accreditation by the National Institute of Metrology, Quality, and Technology (INMETRO). PT are part of the routine of entities seeking qualification and third-party recognition [3]. Laboratories have difficulty validating methods and evaluating measurement uncertainty, and these activities can be supported by PT [4].

PT are conducted through a system that aims to support testing and calibration laboratories, assuring the services offered and providing important information to the quality management of the company [5]. Through PT, it is possible to evaluate the performance of laboratories for specific tests or measurements, identify analytical problems, establish the comparability of testing or calibration methods, provide additional assurance to laboratory customers, train participants based on the results of interlaboratory comparisons, validate declared uncertainties and assign values to reference materials [6].

The comparison programs may vary according to the needs of the industry in which they are used, sample characteristics, methods in use and the number of participants. The nature of the test or measurement taken in PT defines the method of comparison of performance, which can be quantitative, qualitative or interpretive [6].

Within this context, we highlight the following research question: What are the main practices and knowledge developed and implemented in the performance assessment of laboratories in PT?

The purpose of this study is to identify and analyze the knowledge and leading practices developed and implemented in PT. Our specific goal is to identify the key trends in this area and the theoretical gaps in the development of PT. This article is structured in four sections: introduction, description of the protocol of the systematic review, analysis of results and conclusions.

Protocol of the systematic review

This research is applied in nature, has exploratory goals and depends on knowledge from the primary sources consulted. The approach is qualitative. The logic proposed for the systematic review is described in Fig. 1, which shows the method used for the search, critical appraisal and synthesis of the selected information.

Fig. 1 Protocol of the systematic review

The proposed method is based on the concepts presented by Akonbeng [7]. The systematic review was chosen because, according to that author, this kind of work incorporates a larger number of relevant results rather than being limited to the conclusions of a few authors, which allows the results to be generalized.

The first stage of the protocol consisted of formulating the research question that underlies the research proposal. The next step was to define the language of the search, which was restricted to English. The survey was conducted in six scientific databases, with an initial focus on papers published in indexed journals.

The search keywords were combined using Boolean logic, with operators such as OR, AND and *. The survey period was limited to 2005 through June 2012. The first search returned 10563 papers. Subsequently, areas that dealt only with specific matters, without addressing the research question and the theme of PT, were excluded, which reduced the number to 2354 papers. As an inclusion criterion, priority was given to journals with PT in their scope and to papers directly related to the research topic, which led to 125 studies. The articles were then analyzed through a critical reading of their abstracts, and 12 papers were discarded because they were not directly related to the research question. Thus, 113 papers were selected.

The last step was a secondary search in the references of selected papers, identifying key standards, guides and recommendations from entities related to Metrology, Accreditation and Quality areas, in which 34 more references were added. Other details about the protocol of the systematic review are described in Fig. 1.

As Fig. 1 demonstrates, the implementation of the protocol generated a total of 113 articles and 34 additional references. The next section, which discusses the results of the study, presents a critical analysis of selected documents as well as the major theoretical and practical contributions identified, in order to develop considerations on the subject exposed.

Analysis of the selected documents

The analysis of the references researched was divided into two distinct parts: papers published in journals and standards and other documents in the field of PT. Following are the key concepts and practices identified.

Papers published in journals

We selected 113 papers published in journals that addressed PT and were directly related to the research. The main journals selected were: Accreditation and Quality Assurance (80), Flow Measurement and Instrumentation (2), IEEE Transactions on Instrumentation and Measurement (9), Measurement Transaction (11), and Metrologia/BIPM (11). The selected studies were critically analyzed and classified into four sub-areas, according to their approach and the application of the research conducted, namely: performance assessment in PT; calculation method for performance assessment in PT; use of PT for validation and/or estimation of measurement uncertainty; and management and improvements obtained in PT. Some papers relate to more than one approach; each of these was assigned to the approach with which it correlates most strongly. This classification is presented in Table 1.

Table 1 Approaches and selected papers

Papers about performance assessment

Several of the publications analyzed concern the use of PT for performance assessment of laboratories, where PT are used to confirm modifications or improvements made to measuring methods and may also be used to assess different measurement systems [9, 11, 13, 14, 23, 40, 49]. Such PT usually involve reference laboratories, which can come from National Metrology Institutes (NMI) [10]. Comparisons are also frequently made between NMI; these are called key comparisons, and their purpose is to support the equivalence of the measurements made by NMI [11]. Comparisons with long rounds can use NMI reference laboratories and also pivot laboratories, which make intermediate measurements, are considered sub-references and can participate in the stability study of the artifact [13].

Another common practice is to perform bilateral comparisons, generally made between two laboratories, where the one that has the best measurement capability, that is, the lowest uncertainty, is designated as the Reference [15–18, 21]. Bilateral comparisons can be made with or without the presence of an NMI.

PT performance assessment also allows predictions concerning the analytical performance of laboratories in one country or a large organization. Research indicates that, in the field of microbiology, it was possible to assess the performance of Belgian and Canadian laboratories in a project for technical improvement of laboratories [60]. A similar approach was presented in comparisons made in other countries such as Croatia, Finland, France, Germany, Hungary, Russia, Slovenia, Spain and Switzerland, where one can have an overview of the participants and can assess the quality of results issued broadly, identifying regional deficiencies [66].

The performance evaluation of laboratories can also be accomplished through the use of Certified Reference Materials (CRM), with property values already known. In this case, as the number of participating laboratories increases, the cost of PT increases, since these materials are expensive [63]. Another possibility is to use a consensus value or historical value of other PT. This approach is also discussed in the next section.

Bilateral comparisons are more frequent in the area of calibration or physical measurement systems. In the testing area, more specifically in the chemical and biological areas, the most frequent type of PT is the simultaneous scheme, with rounds of comparison that involve several laboratories (usually more than 20).

Papers about calculation method for PT performance assessment

Most of the papers analyzed discuss the statistical methods used to evaluate the performance of laboratories in PT. Surveys indicate that there is reasonable harmonization in the use of indexes such as the z-score and the Normalized Error (E_n), but the procedures used to set the assigned value and the standard deviation (s) or uncertainty of reference are not harmonized [24, 41, 43, 46, 61, 68, 71, 73, 90, 98].

A result is considered satisfactory when the absolute value of the z-score is less than or equal to 2, questionable when it is between 2 and 3, and unsatisfactory when it is equal to or larger than 3. For the normalized error, |E_n| should be smaller than 1 for the result to be satisfactory. The estimated reference standard deviation and measurement uncertainty need to be reliable. When they are not correctly estimated, the performance evaluation can be considered inconsistent [46].
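
The z-score criteria above can be illustrated with a minimal sketch. The function names, the symbol sigma_pt for the reference standard deviation and the numerical values are assumptions introduced here for illustration; they are not taken from the papers reviewed.

```python
def z_score(result, assigned_value, sigma_pt):
    """z = (x - X) / sigma_pt, with sigma_pt the reference standard deviation."""
    if sigma_pt <= 0:
        raise ValueError("sigma_pt must be positive")
    return (result - assigned_value) / sigma_pt


def classify_z(z):
    """Apply the thresholds quoted in the text to a z-score."""
    magnitude = abs(z)
    if magnitude <= 2:
        return "satisfactory"
    if magnitude < 3:
        return "questionable"
    return "unsatisfactory"


# Hypothetical round: assigned value 10.0, reference standard deviation 0.5
for lab, x in {"A": 10.4, "B": 11.2, "C": 8.4}.items():
    z = z_score(x, 10.0, 0.5)
    print(lab, round(z, 2), classify_z(z))
```

The classification depends directly on sigma_pt, so an unreliable reference standard deviation changes the verdict even when the laboratory result is unchanged.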

There are different approaches to obtaining the assigned values in PT. The safest way is to obtain the value of a known sample, such as a CRM, or a reliable reference laboratory such as an NMI. Accredited laboratories could also be considered to be a reference, but for this, they should, in addition to accreditation, provide a suitable measurement capability (a reduced uncertainty) [41]. In the latter case, a prior demonstration of proficiency would also be advisable.

One of the common approaches in terms of calculation methods for PT performance assessment is the use of consensus value, calculated by classical or robust statistics. The reliability of the determination of the consensus value is relevant, since the mean, median or mode calculated will be designated as the reference value for PT. The estimated standard deviation also plays a key role in the evaluation of performance, so it must be assessed by the PT provider with caution [59, 61].
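
The difference between classical and robust consensus statistics can be sketched as follows. The median with a scaled median absolute deviation is used here as one simple robust estimator; this choice, the function name and the data are assumptions for illustration, and PT providers may use other robust procedures.

```python
import statistics


def robust_consensus(results):
    """Return (median, scaled MAD) as robust estimates of location and spread."""
    if len(results) < 2:
        raise ValueError("at least two results are required")
    center = statistics.median(results)
    mad = statistics.median([abs(x - center) for x in results])
    # The factor 1.4826 makes the MAD comparable to a standard deviation
    # when the data follow a normal distribution.
    return center, 1.4826 * mad


# Hypothetical round with one discrepant laboratory (14.0)
results = [10.1, 9.9, 10.0, 10.2, 9.8, 14.0]
center, spread = robust_consensus(results)
print("classical:", round(statistics.mean(results), 3), round(statistics.stdev(results), 3))
print("robust:   ", round(center, 3), round(spread, 3))
```

In this example the single discrepant result shifts the mean and inflates the classical standard deviation, while the robust pair stays close to the bulk of the data.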

A study of PT providers from different European countries and the USA (in the health area, with hemoglobin and leukocyte analyses) indicated that the calculation method used for performance assessment is not standardized [66]. The providers surveyed excluded outliers, but with different procedures: in Russia, values deviating by more than 2s were considered outliers; in Finland, Spain, France, Hungary and Slovenia, values deviating by more than 3s; and in Germany, values deviating from the median by more than 40 %. In the same survey, the assigned value was designated in different ways: in Germany and Slovenia reference laboratories were involved; in Croatia, France, Russia, Spain and Finland the mean value was applied, while in Belgium and Switzerland the median was used; and in Hungary the result of a specialist laboratory was used as the assigned value. This demonstrates the lack of standardization among providers. The criterion for satisfactory results also varied. Half of the countries surveyed used a percentage deviation from the designated target value, which ranged between 3 % and 25 %; these values are usually stipulated by the legislation of those countries. The other providers used a criterion based on the standard deviation of the PT, with the satisfactory range varying from s to 2s [66]. Despite the differences observed, the choice of PT scheme design for performance evaluation is neither a cultural nor a regional issue: it is made by the PT scheme provider, except where the design is set by regulation.

Several studies indicate that the probability distribution of the PT data, when working with consensus value, should also be considered [30, 32, 50, 116]. Ideally, it should follow a Gaussian distribution, that is, symmetrical. If the associated probability distribution is not normal, the assessment by consensus value may be impaired (in the case of bimodal or asymmetric distributions, for instance) [30].

Another important issue arises when the number of laboratories in a PT is small (<30, for example): more care is needed in performance assessment, since the reliability of the estimated reference standard deviation tends to decrease significantly [57]. Another case that deserves special attention is when the amounts of the analytes of interest are very low, because the consensus standard deviation may then not be the best alternative. Studies indicate that the Horwitz proposal, or the determination of deviations from historical data of previous rounds, considering the mass fraction of the element analyzed, are the most appropriate alternatives for designating the reference standard deviation [68, 73, 89].
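
As an illustration of the Horwitz proposal, the sketch below uses the commonly quoted form of the Horwitz function, in which the predicted reproducibility relative standard deviation (in percent) is 2 raised to (1 − 0.5·log10 C), with C the dimensionless mass fraction. The function names are assumptions, and modified versions of the function that some authors apply at very low mass fractions are not covered.

```python
import math


def horwitz_rsd_percent(mass_fraction):
    """Predicted reproducibility RSD in percent: 2 ** (1 - 0.5 * log10(C))."""
    if not 0 < mass_fraction <= 1:
        raise ValueError("mass fraction must be in (0, 1]")
    return 2 ** (1 - 0.5 * math.log10(mass_fraction))


def horwitz_sigma(mass_fraction):
    """Reference standard deviation in the same units as the mass fraction."""
    return mass_fraction * horwitz_rsd_percent(mass_fraction) / 100


# Mass fractions of 1 %, 1 mg/kg and 1 ug/kg
for c in (1e-2, 1e-6, 1e-9):
    print(c, round(horwitz_rsd_percent(c), 1), horwitz_sigma(c))
```

Because the predicted relative standard deviation depends only on the mass fraction, it gives a reference standard deviation that does not rely on the spread of the participants in a given round.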

Researchers have also conducted simulations with the Monte Carlo method to verify the suitability of consensus values in PT [41, 79]. These studies make clear that the concentration of the analyte, the method bias, the laboratory bias and its repeatability can affect the consensus value. Even so, the consensus value approach was considered adequate across the different simulated scenarios. It is worth highlighting that studies comparing the use of a consensus value with the use of a CRM as the reference value were also conducted.
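
A Monte Carlo check of this kind can be sketched in a few lines. The model below is a deliberate simplification and is not the design used in [41, 79]: results are assumed to be normally distributed around the true value plus a common method bias, and the median is taken as the consensus value. All names and parameter values are assumptions.

```python
import random
import statistics


def simulate_consensus_error(true_value, n_labs, lab_sd, method_bias,
                             n_rounds=2000, seed=1):
    """Return mean and standard deviation of (consensus - true value) over simulated rounds."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    errors = []
    for _ in range(n_rounds):
        results = [rng.gauss(true_value + method_bias, lab_sd) for _ in range(n_labs)]
        errors.append(statistics.median(results) - true_value)
    return statistics.mean(errors), statistics.stdev(errors)


# Hypothetical scenarios: many or few laboratories, with and without a common method bias
for n_labs, bias in ((30, 0.0), (5, 0.0), (30, 1.0)):
    mean_err, sd_err = simulate_consensus_error(100.0, n_labs, 2.0, bias)
    print(n_labs, bias, round(mean_err, 2), round(sd_err, 2))
```

Under these assumptions the consensus value becomes less stable as the number of laboratories decreases, and a bias shared by all participants passes directly into the consensus value, which is why a comparison against a CRM remains informative.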

Performance assessment by E_n was found to be more frequent in the calibration area. This index is the absolute value of the ratio between the difference of the value measured by a laboratory and that of a reference laboratory, and the square root of the sum of the squares of the expanded uncertainties (of the laboratory being assessed and of the reference laboratory). Usually, this index must be less than 1 to be satisfactory, but the criterion may be set at less than 2 when working with standard uncertainties [28, 39]. However, performance cannot be assessed by E_n on purely mathematical grounds. The index is valid only if the uncertainty of the reference value is less than or equal to the uncertainty of the laboratory being assessed. Studies show that even laboratories with E_n < 1 may still have inadequate results compared to the others [59]. Other publications also call for caution when comparing results with high uncertainty, since the index benefits laboratories with a high random error. Moreover, using only the z-score in PT is problematic, because it evaluates only the trueness of the laboratory and does not account for its repeatability. Therefore, it is always necessary to consider uncertainty or its components in a consistent performance evaluation [65].
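
The definition of E_n given above can be written as a short sketch. The function name and the numerical values are hypothetical; the second call illustrates the caution raised in the literature, namely that a large declared uncertainty can make a poor result appear satisfactory.

```python
import math


def normalized_error(lab_value, ref_value, U_lab, U_ref):
    """E_n = (x_lab - x_ref) / sqrt(U_lab**2 + U_ref**2), with expanded uncertainties."""
    denominator = math.hypot(U_lab, U_ref)
    if denominator == 0:
        raise ValueError("at least one uncertainty must be non-zero")
    return (lab_value - ref_value) / denominator


# Small deviation with a realistic uncertainty: |E_n| is about 0.6
print(round(abs(normalized_error(10.03, 10.00, 0.04, 0.03)), 2))

# Large deviation hidden by a very large declared uncertainty: |E_n| is still below 1
print(round(abs(normalized_error(10.50, 10.00, 1.00, 0.03)), 2))
```

The sketch returns the signed value so that the direction of the deviation is preserved; the satisfactory criterion is applied to its absolute value.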

Other approaches related to methods for PT performance assessment can also be highlighted, such as: applications of a new statistical method, ordinal analysis of variance (ORDANOVA), for interlaboratory comparisons with measurement or semi-quantitative (ordinal) and qualitative (binary) test results [78]; development of methods for quantitative analysis of PT [118], taking as an example the case of the Organization for the Prohibition of Chemical Weapons, which works with PT to verify that laboratories are able to identify prohibited chemical substances and hazardous samples [76]; use of an average weighted by the uncertainty of the laboratories to create a consensus value, and weighted averages with different criteria [26, 37]; application of ANOVA and ISO 5725 for performance assessment of PT participants [42]; and use of the results of PT participants to assess the homogeneity and stability of the data from comparison rounds [44], among others.
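
One of the approaches listed above, the average weighted by the uncertainty of the laboratories, can be sketched with inverse-variance weights. This weighting is the most common choice but is an assumption here; the papers cited [26, 37] discuss weighted averages with different criteria. Names and values are hypothetical.

```python
import math


def weighted_consensus(values, standard_uncertainties):
    """Inverse-variance weighted mean and its standard uncertainty."""
    if len(values) != len(standard_uncertainties) or not values:
        raise ValueError("values and uncertainties must be non-empty and of equal length")
    weights = [1 / u ** 2 for u in standard_uncertainties]
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, values)) / total
    return mean, 1 / math.sqrt(total)


# Two hypothetical laboratories; the one with the smaller uncertainty dominates
mean, u_mean = weighted_consensus([10.0, 10.2], [0.1, 0.2])
print(round(mean, 3), round(u_mean, 4))
```

With this weighting, a laboratory that understates its uncertainty gains undue influence on the consensus value, which is one reason the declared uncertainties need to be credible.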

Papers about method validation and estimation of measurement uncertainty

Papers classified within the approach of this section are linked to the use of PT in the validation of a method and to support the estimation of its uncertainty. It is possible to estimate measurement uncertainty through PT [20] using alternative approaches in which the comparison data are combined with data from the internal quality control of a laboratory, thus combining different sources of variability to obtain a reasonable estimate of the uncertainty of a test. Different authors also comment on the use of PT in the validation of methods that have been modified from their original proposal: after a comparison with other laboratories, the changes may be considered consistent and appropriate [25]. These two uses of PT schemes are not mentioned in ISO/IEC 17043.

Still regarding method validation, PT results could be used as an alternative to meet certain requirements such as analytical precision, trueness and uncertainty [43]. Furthermore, PT samples could be used in internal quality control. This additional use of PT can help laboratories to reduce the financial impact of their quality assurance procedures [43].

The adequacy of the performance assessment performed in a PT is linked to the uncertainty of the assigned value. Within this context, it is possible to work with a “target uncertainty.” The importance of implementing the “target measurement uncertainty” was indicated in different areas (testing and calibration). For a proper comparison, it was recommended that the target uncertainty be at least three times smaller than the uncertainty of the participating laboratory [53]. In this way, the laboratory can identify whether or not its uncertainty is appropriate [64].

Since the publication of the Guide to the Expression of Uncertainty in Measurement (GUM), many projects have been carried out to develop an alternative practice when it is technically or economically difficult to obtain a suitable mathematical model of the measurement [62]. Many laboratories are also reluctant to apply the law of propagation of uncertainty with its apparent mathematical complexity. These alternative practices can use the experimental data available from laboratories, such as repeatability, reproducibility, control charts, PT, among others. The only point to be noted in this approach is the fact that the standard uncertainty used based on the PT may be higher, because this proposal takes into account all the variability introduced by the different analytical methods. A more promising method for estimating uncertainty would be to use a combination of PT data and internal validation data of the method or quality control [62].
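
The combination of PT data with internal quality control data can be illustrated as follows. The sketch follows one commonly described scheme, in which a within-laboratory reproducibility term from control charts is combined with a bias term derived from the deviations observed in PT rounds. It is an assumption introduced here and not necessarily the procedure of [62]; all names and values are hypothetical, and all quantities are relative values in percent.

```python
import math


def combined_uncertainty(u_rw, pt_biases, u_cref):
    """Combine internal reproducibility with a bias term estimated from PT rounds.

    u_rw      -- within-laboratory reproducibility from internal quality control (%)
    pt_biases -- relative deviations from the assigned values in past rounds (%)
    u_cref    -- typical standard uncertainty of the assigned values (%)
    """
    if not pt_biases:
        raise ValueError("at least one PT round is required")
    rms_bias = math.sqrt(sum(b ** 2 for b in pt_biases) / len(pt_biases))
    u_bias = math.hypot(rms_bias, u_cref)
    return math.hypot(u_rw, u_bias)


u_c = combined_uncertainty(u_rw=1.2, pt_biases=[2.0, -1.0, 1.5], u_cref=0.5)
print(f"combined standard uncertainty: {u_c:.2f} %")
print(f"expanded uncertainty (k = 2): {2 * u_c:.2f} %")
```

Using only the between-laboratory spread of a PT would tend to overstate the uncertainty of a single laboratory, whereas this kind of combination reflects its own precision and its own observed bias.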

A mathematical model was tested to estimate the uncertainty of a laboratory, relating the standard deviation of the measurement to the concentration of the analyte. This model was evaluated through a meta-analysis considering different PT, where its wide applicability was evident. The proposed function may be represented by the square root of the sum of the squares of α and C·β, where C is the analyzed concentration [67]. The parameter α is connected to the detection limit of the method, and β to the relative accuracy of the method.

With these two parameters, a curve can be developed with the mass fraction of the element being analyzed on the x-axis and the standard deviation related to the concentration on the y-axis. Thus, it is possible to obtain the constants α and β of the mathematical model mentioned above and to obtain the reproducibility standard deviation of the measurement system for any concentration value. This can be done with different analytical parameters. A good estimate of the model parameters depends on PT covering different concentrations, preferably with a large number of participants. The reproducibility standard deviation is the major component of the standard uncertainty, from which the expanded uncertainty is obtained by multiplying by the coverage factor k; in most cases k = 2 is chosen for a confidence level of approximately 0.95 [67]. Other research on the same topic claims that this approach is useful and, if applied appropriately, yields equations describing the performance of different analytical methods, in addition to allowing the measurement uncertainty to be estimated for different concentrations [80]. It is worth highlighting that these equations can be used to obtain an indication of the average quality of analytical results in a specific field, and can be used by regulatory bodies to formulate legislative requirements according to the quality of existing measurements in the area [80].
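
The model described in this section, s(C) = sqrt(α² + (β·C)²) with expanded uncertainty U = k·s, can be sketched as follows. The fitting step uses an unweighted straight-line fit of s² against C², which is a simplification assumed here and not necessarily the procedure of [67]; the function names and data are hypothetical.

```python
import math


def reproducibility_sd(concentration, alpha, beta):
    """s(C) = sqrt(alpha**2 + (beta * C)**2)."""
    return math.hypot(alpha, beta * concentration)


def expanded_uncertainty(concentration, alpha, beta, k=2.0):
    """U = k * s(C); k = 2 corresponds to a level of confidence of about 0.95."""
    return k * reproducibility_sd(concentration, alpha, beta)


def fit_alpha_beta(concentrations, sds):
    """Unweighted least-squares fit of s**2 = alpha**2 + beta**2 * C**2."""
    x = [c ** 2 for c in concentrations]
    y = [s ** 2 for s in sds]
    n = len(x)
    if n < 2:
        raise ValueError("at least two concentration levels are required")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    # Negative estimates can arise from noisy data; they are clipped to zero here.
    return math.sqrt(max(intercept, 0.0)), math.sqrt(max(slope, 0.0))


# Hypothetical PT rounds: concentration levels and observed reproducibility standard deviations
levels = [1.0, 5.0, 10.0, 50.0]
observed = [reproducibility_sd(c, 0.5, 0.1) for c in levels]
alpha, beta = fit_alpha_beta(levels, observed)
print(round(alpha, 3), round(beta, 3), round(expanded_uncertainty(20.0, alpha, beta), 3))
```

At low concentrations the constant term α dominates, while at high concentrations the standard deviation becomes proportional to C through β, which is why rounds at several concentration levels are needed for a good fit.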

Finally, other researchers indicate that the two most important concepts in metrology are the traceability of the standards used and measurement uncertainty, and that both concepts are related to PT schemes [75]. In areas such as chemistry and biology, many problems remain to be resolved to support international agreements related to these concepts. Therefore, NMI laboratories in these areas have developed strategies so that conclusions in PT are feasible and increasingly frequent [75], given the importance of PT and its connection with traceability and uncertainty.

Papers about management and improvement of PT

The PT is developed by providers, who must also have proven their qualifications through an assessment of an accreditation body. These assessments are relatively recent, beginning through pilot programs, mainly in Europe, in 2005 [48, 88]. In Brazil, this activity became an official accreditation only in 2011, after the implementation of a pilot project by INMETRO.

International research conducted with 160 different providers from 32 countries shows a strong tendency toward the accreditation of PT [47]. According to these surveys, this type of evaluation is based on various combinations of normative documents, which may illustrate a lack of harmonization among accreditation bodies. Furthermore, some customers urge their providers to seek accreditation. However, among the providers consulted, less than half expect an improvement in their quality through accreditation, and more than half expect a significant increase in their costs [47].

Another interesting approach is the possibility of organizing interlaboratory collaborative studies with the purpose of assessing the performance of the analytical test method, and not only that of the laboratories [72]. Within this context, researchers recommend care in the management and conduct of a trial intended to assess the performance of methods, as well as in its statistical analysis. Issues such as the choice of participating laboratories and the designation of the assigned values are important. It is therefore possible to establish a standard method of analysis through rounds of interlaboratory collaborative studies, with greater assurance that the developed method provides reproducibility under different operating conditions [72].

Requirements applicable to PT are similar to those considered in the production of reference materials [78]. The samples of PT should have a degree of homogeneity and stability for the purpose of identifying differences between the laboratories. Based on this logic, the process used to prepare the samples held by the provider must be appropriate and shall ensure the quality of the items that will be sent to laboratories in the comparison rounds [78]. Tests for homogeneity and stability are essential in this context.
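
A homogeneity check of this kind can be sketched as follows. The sketch assumes a design in which each of several PT items is measured in duplicate, estimates the between-sample standard deviation by removing the analytical (within-sample) contribution, and compares it with a fraction of the reference standard deviation. The 0.3 fraction is recalled from general statistical guidance for PT and is stated here as an assumption; names and data are hypothetical.

```python
import math
import statistics


def between_sample_sd(duplicates):
    """Estimate the between-sample standard deviation from duplicate results.

    duplicates -- list of (first, second) results, one pair per PT item
    """
    g = len(duplicates)
    if g < 2:
        raise ValueError("at least two items are required")
    averages = [(a + b) / 2 for a, b in duplicates]
    var_averages = statistics.variance(averages)
    var_within = sum((a - b) ** 2 for a, b in duplicates) / (2 * g)
    # The variance of an average of two replicates carries half the within-sample variance.
    return math.sqrt(max(0.0, var_averages - var_within / 2))


def is_sufficiently_homogeneous(duplicates, sigma_pt, fraction=0.3):
    """True when the between-sample SD does not exceed fraction * sigma_pt."""
    return between_sample_sd(duplicates) <= fraction * sigma_pt


# Hypothetical items measured in duplicate
pairs = [(10.0, 10.0), (10.4, 10.4), (10.2, 10.2)]
print(round(between_sample_sd(pairs), 3))
print(is_sufficiently_homogeneous(pairs, sigma_pt=1.0))
print(is_sufficiently_homogeneous(pairs, sigma_pt=0.5))
```

The same items can pass or fail depending on the reference standard deviation of the scheme, so the check is tied to the intended performance evaluation and to the parameter chosen for testing.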

Normally, PT are performed in rounds that occur during 1 year. Studies in the field of occupational medicine indicate that 28 % of PT run with 4 rounds per year [72]. Similar results were observed in hematology and microbiology, with a median of 3 rounds per year. The median of biochemistry was 6 rounds per year, where 33 % of the PT have intervals of 1 month. The number of samples per round varied between 1 (31 %) and over 20 (0.5 %), where most providers offer between 1 and 3 samples per round (83 %) [72].

PT are implemented across a wide range of areas. Initially, they were most in demand in the area of calibration, being performed mainly by reference laboratories. The medical area also adopted compulsory participation in PT early on, due to its importance. According to accreditation bodies, the demand for PT in different areas is now greater than their supply and availability. The expansion of PT is increasingly perceived in chemical, biological, geological and agricultural testing, and even in the veterinary area [51]. Nowadays, most of the PT conducted in the world are in medical areas.

Different international regulatory agencies also consider PT an appropriate way to ensure the reliability of laboratory results and, on several occasions, make participation in these activities compulsory [86]. In addition, research indicates that laboratories participating in PT tend to improve their results over time, while providers improve the management and reliability of their programs [99, 101].

Providers also had to adapt and focus on better management of their activities, seeking compliance with standards such as ISO/IEC 17043 [117, 119, 120]. This standard addresses technical and managerial issues that should be followed by PT providers; however, its use is still not compulsory in many countries. Meeting this standard in isolation, without assessment by a third party such as an accreditation body, does not guarantee proper operation of the PT developed, since adequate managerial capacity within the organization and appropriate technical knowledge of the subject are also necessary.

Other selected references

The second stage of the systematic review focused on the search for standards and guidelines from renowned entities in the PT area. We selected the most-cited references in the articles that were considered in the previous step. Another 34 references were identified, from International Organization for Standardization (41.2 %), American Society for Testing and Materials (14.7 %), Asia Pacific Laboratory Accreditation Cooperation (14.7 %), International Union of Pure and Applied Chemistry (5.9 %), European Co-operation for Accreditation (5.9 %), European Federation of National Associations of Measurement, Testing and Analytical Laboratories (5.9 %), NORDTEST (2.9 %), Bureau International des Poids et Mesures (2.9 %), International Laboratory Accreditation Cooperation (2.9 %) and InterAmerican Accreditation Cooperation (2.9 %).

The selected references were classified into three approaches. The division performed is shown in Table 2. After the classification, a summary of the approach of these documents according to their classification is shown.

Table 2 Approaches and other selected publications

Definitions, management, operation and use of PT programs

Standards that address definitions of PT are mostly published by ISO. Some standards are for guidance [126, 128, 134–137], addressing specific PT in technical areas such as tissues, microbiology, petroleum products, among others. There are, in this group, standards that are used to accredit laboratories [1, 129] and that address PT in the field of quality assurance of testing or calibration.

Other standards are also used in the accreditation of reference material producers and providers of PT [6, 124], the latter of which establishes the technical and management requirements that must be followed to conduct a PT appropriately. Reference material producers and PT providers are different types of organizations, and they should not be confounded. Among the surveyed standards, ISO/IEC 17043 is the most complete and is used globally by different providers in different areas [6].

In this category, there are also standards [130] and other documents published by organizations that establish major policies for the accreditation process for laboratories and PT providers [144–146]. These documents establish the minimum frequency of participation in PT, the policies that bodies should adopt to assess inadequate results obtained in PT, and how these factors may influence an accreditation process.

Statistical methods for PT performance assessment

Several standards and guideline documents have different approaches to PT performance assessment [122, 123, 125, 127, 131–133, 138, 139]. Most documents converge on the same indicators for performance rating, the most common being the z-score (and its variations such as the z′-score, zeta-score, etc.) and E_n. However, the methods for calculating or estimating reference values diverge considerably and show a relative lack of standardization. The standards usually present examples of the application of their procedures to set the assigned values, but these are general. Each specific PT scheme commonly needs a “fit for purpose” adaptation.
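
The z-score variations mentioned above differ only in the denominator. The sketch below uses the forms in which these scores are usually defined: the z′-score adds the standard uncertainty of the assigned value to the reference standard deviation, and the zeta-score uses the standard uncertainty declared by the laboratory instead. Function names, symbols and values are assumptions for illustration.

```python
import math


def z_prime_score(x, x_pt, sigma_pt, u_x_pt):
    """z' = (x - x_pt) / sqrt(sigma_pt**2 + u(x_pt)**2)."""
    return (x - x_pt) / math.hypot(sigma_pt, u_x_pt)


def zeta_score(x, x_pt, u_x, u_x_pt):
    """zeta = (x - x_pt) / sqrt(u(x)**2 + u(x_pt)**2), with standard uncertainties."""
    return (x - x_pt) / math.hypot(u_x, u_x_pt)


# Hypothetical result 10.5 against an assigned value of 10.0
print(round(z_prime_score(10.5, 10.0, sigma_pt=0.4, u_x_pt=0.3), 2))
print(round(zeta_score(10.5, 10.0, u_x=0.1, u_x_pt=0.3), 2))
```

Unlike the z-score, the zeta-score depends on the uncertainty reported by the participant, so it also indicates whether that uncertainty is consistent with the observed deviation.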

Most documents propose the evaluation of repeatability, reproducibility and accuracy of the results of the participating laboratories in comparisons, but in a general way. Still, regarding the tests of homogeneity and stability of the items that are compared (samples or artifacts), we emphasize that the references do not provide details regarding how many analyses/parameters should be selected to consider testing representative and consistent. The documents cited in this section do not address in detail the influence that the probability distribution of the data may have on the results of PT.

Use of PT to estimate measurement uncertainty

Documents for the estimation of measurement uncertainty were also frequently referenced in the articles selected in this systematic review. None of them is focused only on PT, since they address methods for estimating uncertainty in testing or calibration [140–143].

Furthermore, some documents suggest alternative approaches to calculating uncertainty, considering the results of PT [141, 142]. These approaches should be selected carefully, as the result of uncertainty can be strongly influenced by the performance of the participants of the comparison. Still, these alternative approaches are recommended when there is little information on the sources of variation of the method or when getting values associated with measurement accuracy is complex.

Regarding the uncertainty of the assigned values, we consider that this is a point to be improved. ISO 13528 gives a very simple approach to establishing the uncertainty of the assigned value when the provider uses a consensus value. In this case, the uncertainty can vary drastically according to the number of laboratories in the round [127].
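
The dependence on the number of laboratories can be seen in the simple expression usually quoted for this case, u = 1.25·s*/√p, where s* is the robust standard deviation of the results and p the number of participants. The formula is recalled here as an assumption about the simple approach referred to above; the function name and values are hypothetical.

```python
import math


def consensus_uncertainty(robust_sd, n_labs, factor=1.25):
    """Standard uncertainty of a consensus assigned value: factor * s* / sqrt(p)."""
    if n_labs < 1:
        raise ValueError("at least one laboratory is required")
    return factor * robust_sd / math.sqrt(n_labs)


# Same spread of results (s* = 0.4), different numbers of participants
for p in (8, 16, 64, 100):
    print(p, round(consensus_uncertainty(0.4, p), 3))
```

For the same spread of results, a round with 8 participants gives an uncertainty more than three times larger than a round with 100, which affects any score that includes the uncertainty of the assigned value.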

Identification of gaps to be exploited

The importance of the topic and the increasing demand for participation in PT are evident, whether participation is required by government, accreditation bodies or conformity assessment bodies. Because laboratories work in numerous areas, providers are not yet prepared to meet all existing demands. There is also a perceived need for these organizations to structure themselves, with adequate standards in the area and agile management, to meet market demands. Several PT are developed in different countries and areas, but approaches that help providers manage PT from a project perspective were not found in the sources researched. The main reference standard of the area, ISO/IEC 17043, also does not address the development and management of PT from a project perspective, so this is likely an issue to be explored. The standard does not consider areas such as risk, cost, strategy and time management, which are typical of project management knowledge and could be useful in PT schemes.

Although publications related to the topic often address the link between method validation, measurement uncertainty and PT, no document was found that presents a logical interface between these themes. This creates doubts and leaves the actual intended use of PT unclear.

Another important issue, discussed by different researchers, is the impact that the probability distribution of the data can have on performance assessment. These issues, in most cases, are not considered by the providers and may have a high impact on the statistical treatment of data, especially when working with consensus value (with references generated with data from the participants of the PT). Still, the standards of the area neither report details on this fact nor report procedures for assessing probability distributions obtained in PT.

A factor cited in different studies was the homogeneity and stability of the samples prepared in PT and the need to ensure them in order to increase confidence in the round of comparison. However, the standards and publications do not make clear what the criteria for selecting parameters for these tests should be, or how many parameters would be representative for an adequate test of the homogeneity and stability of the samples. This deserves attention, since a false sense of homogeneity or stability may compromise trust in a PT.

Finally, the ISO standards related to PT are sometimes general rather than specific, because there is an enormous variety of measurement fields, national regulations and “fit for purpose” needs: one laboratory’s needs for accuracy and precision are not always the same as another’s.

Conclusions

This study presented a systematic review that covered the period from 2005 to 2012 (June) considering publications related to the theme PT. A total of 147 references were selected, including articles, standards and guideline documents.

Thus, the research objective is considered to have been achieved, since we analyzed the knowledge and main practices related to PT in the research sources listed above. The following gaps were identified: management of PT projects; analysis of the link between validation, PT and measurement uncertainty; preliminary evaluation of the probability distribution of PT data; and selection of variables for testing homogeneity and stability. The gaps are not limited to these topics; this analysis is based on our perception of the main factors analyzed. Future research or reviews on this theme would benefit from including published PT reports from international cooperations (for example IMEP and APLAC) and from private schemes that are offered internationally.