
Evidence-based Software Engineering

Barbara A. Kitchenham¹,³, Tore Dybå²,⁴, Magne Jørgensen²

¹ National ICT Australia, Locked Bag 9013 Alexandria, NSW 1435, Australia
² Simula Research Laboratory, Box 134, NO-1325 Lysaker, Norway
³ Dept., Keele University, Staffordshire ST5 5BG, UK
⁴ Dept., SINTEF ICT, NO-7465 Trondheim, Norway

Abstract

Objective: Our objective is to describe how software engineering might benefit from an evidence-based approach and to identify the potential difficulties associated with the approach.

Method: We compared the organisation and technical infrastructure supporting evidence-based medicine (EBM) with the situation in software engineering. We considered the impact that factors peculiar to software engineering (i.e. the skill factor and the lifecycle factor) would have on our ability to practise evidence-based software engineering (EBSE).

Results: EBSE promises a number of benefits by encouraging integration of research results with a view to supporting the needs of many different stakeholder groups. However, we do not currently have the infrastructure needed for widespread adoption of EBSE. The skill factor means software engineering experiments are vulnerable to subject and experimenter bias. The lifecycle factor means it is difficult to determine how technologies will behave once deployed.

Conclusions: Software engineering would benefit from adopting what it can of the evidence approach provided that it deals with the specific problems that arise from the nature of software engineering.

Introduction

In the last decade, medical research has changed dramatically as a result of adopting an evidence-based paradigm. In the late 80s and early 90s, studies showed on the one hand that failure to organise medical research in systematic reviews could cost lives [5] and on the other hand that the clinical judgement of experts compared unfavourably with the results of systematic reviews [1]. Since the publication of these influential papers, medical researchers have adopted the evidence-based paradigm. Sackett et al. [14] report that the number of articles about evidence-based practice grew from 1 publication in 1992 to about a thousand in 1998, and international interest has led to the development of 6 evidence-based journals specialising in systematic reviews.

The success of evidence-based medicine has prompted many other disciplines that provide services to members of the public to attempt to adopt a similar approach, including for example psychiatry¹.

We do not suggest that software engineers should adopt a new practice just because “everyone else is doing it”, particularly since the evidence-based movement has its critics. For example, Hammersley points out that research is fallible, relies on generalisations that may be difficult to interpret, and is often insufficient for determining appropriate means for delivering best practice [6]. However, we believe that a successful innovation in a discipline that attempts to harness scientific advances for the benefit of society [...]

Thus, in this paper we discuss the possibility of evidence-based software engineering using an analogy with medical practice. We describe the scientific and technical infrastructure needed to support EBSE. We also identify two areas where the analogy with medicine breaks down. This allows us to identify a number of serious problems that need to be resolved if EBSE is to become a reality.

¹ www.med.nagoya-cu.ac.jp/psych.dir/ebpcenter.htm
² www.york.ac.uk/healthsciences/centres/evidence/cebn.htm
³ www.evidencenetwork.org
⁴ cem.dur.ac.uk/ebeuk/EBEN.htm

The goal of EBSE

The goal of evidence-based medicine (EBM) is “the integration of best research evidence with clinical expertise and patient values” [14]. By analogy, we suggest that the goal of evidence-based software engineering (EBSE) should be: to provide the means by which current best evidence from research can be integrated with practical experience and human values in the decision making process regarding the development and maintenance of software.

Why evidence is important in software engineering

Initially it is worthwhile considering why evidence would be beneficial to software developers, users and other stakeholders, e.g. certification bodies and the general public. EBSE is potentially important because of the central place software-intensive systems are starting to take in everyday life. For example, current plans for advanced life-critical systems such as drive-by-wire applications for cars and wearable medical devices have the potential for immense economic and social benefit but can also pose a major threat to industry. If systems are reliable, the quality of life of individual citizens will be enhanced. However, there are far too many examples of systems that have not only wasted large amounts of public money but have also caused harm to individual citizens (e.g. the automated command and control system for the London Ambulance Service). Individual citizens have a right to expect their governments to properly administer tax revenues used to commission new software systems and put in place controls to minimise the risk of such systems causing harm.

There are many strategies to improve the dependability of software involving the adoption of “better” software development procedures and practices. At a high level, the Capability Maturity Model and SPICE suggest procedures for improving the software production process. In addition, the professional bodies are establishing procedures for certification of individual software engineers. However, the high level process and the individual engineers are constrained by the specific technologies (methods, tools and procedures) they use. In most cases software is built with technologies for which we have insufficient evidence to confirm their suitability. Thus, it is difficult to be sure that changing software practices will necessarily be a change for the better.

It is possible that EBSE can provide the mechanisms needed to assist practitioners to adopt appropriate technologies and to avoid inappropriate technologies. Thus EBSE would provide:

• A common goal for individual researchers and research groups to ensure that their research is directed to the requirements of industry and other stakeholder groups.
• A means by which industry practitioners can make rational decisions about technology adoption.
• A means to improve the dependability of software-intensive systems, as a result of better choice of development technologies.
• A means to increase the acceptability of software-intensive systems that interface with individual citizens.
• An input to certification processes.

Practising EBSE

Sackett et al. [14] identify five steps for practising evidence-based medicine. These steps are shown in the second column of Table 1. Although there are some detailed medical references, it is easy to reformulate the steps to address evidence-based software engineering (see column 3 of Table 1). In fact, we would hazard a guess that at least part of the attraction that evidence-based medicine has for other disciplines is the ease with which the basic steps can be adapted to other fields.

However, it is important to remember that even if the high-level process steps for evidence-based practice appear to be similar for medicine and software engineering, this does not guarantee that the underlying scientific, technological and organisational mechanisms that support evidence-based medicine apply to evidence-based software engineering. For this reason we consider each step in more detail below.

The first point to note is that Sackett et al. [...] For EBSE our viewpoint is likely to be somewhat different. In software engineering organizations, individual developers seldom have the option to pick and choose the technologies they are going to use. Technology adoption is often decided either by project managers on a project by project basis, or by senior managers on a departmental or organizational basis. Furthermore, our concern is not usually the specific task to which the technology is applied but the outcome of the project of which the task is a part.

Table 1. Five steps used in Evidence-based Medicine and (by analogy) in Evidence-based Software Engineering.

Step | Evidence-based Medicine | Evidence-based Software Engineering
---- | ---- | ----
1 | Converting the need for information (about prevention, etc.) into an answerable question. | Converting the need for information (about development and maintenance methods, management procedures etc.) into an answerable question.
2 | Tracking down the best evidence with which to answer that question. | Tracking down the best evidence with which to answer that question.
3 | Critically appraising that evidence for its validity (closeness to the truth), and applicability (usefulness in our clinical practice). | Critically appraising that evidence for its validity (closeness to the truth), and applicability (usefulness in software development practice).
4 | Integrating the critical appraisal with our clinical expertise and with our patient's unique biology. | Integrating the critical appraisal with our software engineering expertise and with our stakeholders’ values and circumstances.
5 | Evaluating our effectiveness and efficiency in executing Steps 1-4 and seeking ways to improve them both for next time. | Evaluating our effectiveness and efficiency in executing Steps 1-4 and seeking ways to improve them both for next time.

Thus, in addition to the viewpoint of the individual practitioner, there are two other viewpoints that are important for EBSE in practice:

1. That of a project manager who wants to achieve a favourable outcome for a particular project.
2. That of a senior manager who wants to improve the performance of a department or organization as a whole.

In addition, we believe EBSE (and indeed EBM) places requirements on researchers:

• To improve the standard of individual empirical studies and systematic reviews of such studies.
• To identify outcome measures that are meaningful to practitioners.
• To report their results in a manner that is accessible to practitioners.
• To perform and report replication studies.

Defining an answerable question

Sackett et al. [14] describe a well-formulated question as having three parts:

• The study factor (e.g. [...]).
• The population (the disease group or spectrum of the well population).
• The outcomes.

Medical practitioners are usually interested in all forms of outcomes, so they usually concentrate on the first two parts. This would be the same for EBSE. EBM researchers point out that it is important that the question is broad enough to allow examination of variation in the study factor and across populations.

In EBSE, the study factor would be the technology of interest. The technology should not be specified at too high a level of abstraction, but must be general enough to identify the majority of relevant empirical studies, e.g. Agile methods. For some questions it may be necessary to be even more precise, e.g. Contract-based specifications, Pair-programming, or Statistically-derived estimation models.

It is even more difficult to determine the correct level of abstraction for specifying the population of interest. The population of interest may be categorised in many dimensions based on the experience of technology users and the types of problem addressed by the technology. However, even fairly broad categories may be counter-productive if useful empirical evidence is lost by restrictions imposed by such categorisation.

Finding the best evidence

One of the reasons for formulating the question precisely is to help researchers and practitioners to find all relevant studies. According to the Australian National Health and Medical Research Council, there are over 20,000 journals in the biomedical field. A major problem for EBM is finding relevant papers from the massive amount of published work. Medical researchers and practitioners use a two-stage process:

1. They look for already published systematic reviews, i.e. papers that have already assembled all relevant reports on a particular topic.
2. They use the question of interest to construct search strings aimed at finding relevant individual studies.
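
As a rough illustration of the second stage applied to an EBSE question, a search string can be assembled from the study factor and the population. The sketch below is hypothetical: the helper name `build_search_string`, the synonym lists and the boolean AND/OR syntax are all assumptions, and each real abstracting service has its own query syntax.

```python
def build_search_string(study_factor_terms, population_terms):
    """Combine synonyms with OR inside each part and AND between the parts."""
    def any_of(terms):
        # Quote each term so that multi-word phrases are searched as phrases.
        return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

    parts = [any_of(study_factor_terms)]
    if population_terms:
        # The population part is optional: leaving it out keeps the search broad.
        parts.append(any_of(population_terms))
    return " AND ".join(parts)


# Hypothetical example terms; a real search would use a vetted synonym list.
query = build_search_string(
    ["pair programming", "pair-programming"],
    ["professional developers", "students"],
)
print(query)
```

Dropping the population terms broadens the search, which matches the caution that restrictive categories can lose useful evidence.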

However, they have a large amount of technological and scientific infrastructure to support them:

• There are several organisations (in particular the international Cochrane Collaboration, see www.cochrane.com) that assemble systematic reviews of studies of drugs and medical procedures. To provide a central information source for evidence, the Cochrane Collaboration publishes systematic reviews in successive issues of The Cochrane Database of Systematic Reviews. These reviews are continually revised both as new experimental results become available and as a result of valid criticisms of the reports. The Cochrane Collaboration actively solicits comments on its reports (subject to published house rules).
• Some countries have established central abstracting services for medical research papers. The largest and most well-known is the Medline database (www.nlm.nih.gov), which provides references and abstracts from 4600 biomedical journals.
• To reduce the problem of “publication bias”, the Cochrane Collaboration provides a database for researchers to register that they are intending to perform a controlled trial. Publication bias is the phenomenon that more “positive” results are published than “negative” results. This can lead to an overestimation of the effect size in systematic reviews and an under-reporting of risks. The Cochrane Collaboration Groups use the register to follow up all trials whether or not they are published.
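
The way publication bias inflates effect sizes can be seen in a small simulation. This is only a sketch under stated assumptions: every study estimates the same true effect with normally distributed error, and only studies whose estimate lies more than 1.96 standard errors above zero are "published". The function name and all parameter values are hypothetical.

```python
import random
import statistics


def simulate_publication_bias(true_effect=0.2, std_error=0.2,
                              n_studies=2000, seed=0):
    """Return (mean of all observed effects, mean of 'published' effects)."""
    rng = random.Random(seed)
    observed = [rng.gauss(true_effect, std_error) for _ in range(n_studies)]
    # Assumption: only significantly positive results reach publication.
    published = [e for e in observed if e / std_error > 1.96]
    return statistics.mean(observed), statistics.mean(published)


all_mean, published_mean = simulate_publication_bias()
print(f"all studies: {all_mean:.2f}, published only: {published_mean:.2f}")
```

A review that sees only the published subset averages estimates that all exceed the significance threshold, so its pooled effect is necessarily larger than the true effect. A register of intended trials lets reviewers recover the unpublished remainder.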

Although we have no equivalent to the Cochrane Collaboration, there are many abstracting services that provide access to software engineering articles. Organizations such as the IEEE, with its database IEEE Xplore, and the ACM, with its Digital Library, provide access to databases of articles. The articles are indexed by author names and keywords, and usually have links to abstracts and sometimes access to the original articles. Such indexing makes it easier to search for information regarding a problem area or find an answer to a specific question.

Critically appraising the evidence

The work of the Cochrane Collaboration and other national medical organisations (e.g. the Australian National Health and Medical Research Council) has radically changed the nature of medical research. Medical research has recognised that single studies (even the most rigorous double-blinded randomised controlled trials, RCTs) are insufficient to properly qualify a medical treatment. The emphasis now is on the accumulation of evidence from many independent experiments.

Critical appraisal in EBM has been supported by improved methodology both for systematic reviews and individual studies:

• Several organisations have produced guidelines for systematic reviews and evaluating evidence. The Cochrane Collaboration publishes a handbook (The Cochrane Reviewers Handbook, March 2003). The Australian National Health and Medical Research Council publish a series of more general handbooks that consider experimental methods other than just RCTs (see www.health.gov.au/mrc). Importantly, the Australian NHMRC makes a distinction between collating experimental evidence and packaging the evidence into tailored guidelines for various stakeholders.
• Medical journals have pressed for improvements in the conduct and reporting of individual experiments. A particular example is the CONSORT statement, which defines the standards for randomised controlled trials. This statement has been adopted as the standard for reporting RCTs by all the most important medical journals.

This can be contrasted with the situation in empirical software engineering. Currently the evidence that is available about software engineering technologies is:

• Fragmented and limited. Many individual research groups undertake valuable empirical studies. However, because the goal of such work is individual publications and/or post-graduate theses, there is sometimes little sense of overall purpose to such studies. Without a research culture that strongly advocates systematic reviews and replication, it is easy for researchers to undertake research in their own areas of interest rather than contribute to a wider research agenda.

Currently, there are no agreed standards for systematic reviews. Thus, although most PhD students undertake reviews of the “State of the Art” in their topic of interest, the quality of such reviews is variable, and they do not as a rule lead to published papers. There is little appreciation of the value of systematic reviews; there is only one Computing journal that solicits reviews (ACM Surveys). Furthermore, if we consider “meta-analysis”, which is a more statistically rigorous form of systematic review, there have been few attempts to apply meta-analytic techniques to software engineering, not least because of the limited number of replications. In general there are few incentives to undertake replication studies in spite of their importance in terms of the scientific method [11].
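
To make concrete what a meta-analysis adds over a narrative review, the sketch below pools effect estimates from several replications using a fixed-effect, inverse-variance weighted mean. This is one common pooling technique, chosen here for illustration and not a method prescribed by the paper; the function name and the effect sizes are hypothetical.

```python
import math


def fixed_effect_pool(effects, variances):
    """Inverse-variance weighted mean of study effects and its standard error."""
    if len(effects) != len(variances) or not effects:
        raise ValueError("need one variance per effect, and at least one study")
    # More precise studies (smaller variance) receive proportionally more weight.
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    std_error = math.sqrt(1.0 / sum(weights))
    return pooled, std_error


# Hypothetical effect sizes from an original study and two replications.
pooled, se = fixed_effect_pool([0.30, 0.10, 0.20], [0.04, 0.02, 0.04])
print(f"pooled effect = {pooled:.3f}, standard error = {se:.3f}")
```

The pooled standard error shrinks as studies are added, which is why a shortage of replications limits what meta-analysis can offer.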

There are no generally accepted guidelines or standard protocols for conducting individual experiments. The recent dispute between Berry and Tichy [4] and Sobel and Clarkson [17] over the conduct of an experiment into formal methods [16] makes it clear that empirical software engineering is badly in need of guidelines and protocols. Kitchenham et al. have proposed guidelines for empirical studies. However, they do not address observational studies. Furthermore, because they attempt to address several different types of empirical study, the guidelines are neither as specific nor as detailed as the CONSORT statement.

Integrating the critical appraisal with software engineering expertise

In EBM, a doctor is expected to relate evidence to the needs of the specific patient. For example, the advice given to a patient with a particular disease may differ according to his/her age and gender and the severity of the symptoms he/she displays.

Although there are opportunities for individual software engineers and managers to adopt EBSE principles, the decision to adopt a technology is often an organisational issue that is influenced by factors such as the organizational culture, the experience and skill of the individual software developers, and the extent of training required. Thus, using EBSE in practice may be more demanding than EBM because the decision-making structure and adoption process are often more complex.

We believe that EBSE would work well in an organization that has a strong commitment to process improvement, based on the recommendations in [3]. However, currently this does not appear to be happening. Research results are:

• Not in widespread use in industry. In our opinion, researchers often address issues that are not perceived to be of relevance to industry or present their results in a way that is virtually incomprehensible to decision makers in industry.
• Not of perceived value to stakeholders. Certification bodies and consumer groups should all be concerned about the quality of the techniques used to build software products. It is likely that any trust such groups have in the quality of software-intensive products would be substantially undermined if they were aware that the choice of development techniques is based on fashion and hype rather than scientific evidence.

Evaluation of the process

Sackett et al. [...] For EBSE, this would involve propagating successful technologies throughout a company and preventing the spread of technologies that are unsuccessful. This concept fits well with the goals of software process improvement.

However, there is a broader level of feedback in medicine. For example, individual doctors are responsible for reporting unanticipated side-effects of drugs. This contributes to the evidence associated with a particular treatment and may lead to further basic research. It would be useful if this model could be applied in software engineering. However, there is little incentive for individual companies to assist their competitors by reporting good and bad experiences with new technologies. In addition, it is difficult for experiences with a technology to be disentangled from the particular context in which it was used.

Implications for EBSE

It is clear that a full-scale implementation of EBSE is an extremely ambitious goal. It cannot be achieved without extensive collaboration and long-term commitment among individual research groups worldwide, and active support from other stakeholders such as practitioners in industry. Furthermore, it cannot be achieved without initial financial support from research funding agencies to enable the basic technological and methodological infrastructure to be established.

It is clear that individual practitioners and researchers can use some of the ideas of EBSE without extensive technical support. However, Sackett et al. suggest that the support infrastructure is a major reason for the widespread adoption of the evidence-based paradigm in medicine [14].

Scientific foundations

With enough resources, technological and organisational support for evidence-based software engineering could be put in place. However, such support would be of little value if there were fundamental differences between medicine and software engineering that would make evidence-based software engineering difficult or impossible.

The skill factor

One area where there is a major difference between medicine and software engineering is that most software engineering methods and techniques must be performed by skilled software practitioners. In contrast, although medical practitioners are skilled individuals, the treatments they prescribe (e.g. medicines and other therapeutic remedies) do not usually require skill to administer or to receive. Furthermore, it is noticeable that evidence-based surgery, which is far more analogous to software engineering than other types of medical practice, is far less advanced than evidence-based medicine [18].

Skill presents a problem because it prevents adequate blinding. In medical experiments (particularly drug-based experiments), the gold standard experiment is a double-blind randomised controlled trial (RCT). In a double-blind experimental trial neither the doctor nor the patient knows which treatment the patient is receiving. Double-blinded trials are required to prevent patients' and doctors' expectations from biasing the results. Such experimental protocols are impossible in software engineering experiments that rely on a subject performing a human-intensive task.

There are two complementary approaches we can adopt to address this issue:

1. We can develop and adopt experimental protocols that reduce experimenter and subject bias.
2. We can accept that our experiments are bound to be less rigorous than medical trials and attempt to qualify our experiments appropriately.

5.1.1. Experimental protocols.

Although we cannot usually blind experimenters or subjects, we can use blinding in a number of ways to reduce the opportunity for bias by reducing the direct interaction between subjects and experimenters during the course of an experiment [8]:

• Blind allocation to treatment groups. When we run experiments to compare different techniques, computerised methods can be used to automate the random allocation of subjects to each technique.
• Blind distribution of material. Linked to blind allocation, computers can be used to distribute experimental materials to subjects.
• Blind or automated marking. If the task results cannot be linked directly to the treatment, e.g. where the subjects are asked to answer some questions that test their understanding of a software document, the marker(s) should be blind to which treatment was used by the subjects (i.e. neither the subject nor the treatment should be identified to the marker). Occasionally marking can be computerised.
• The results should be coded so the analyst does not know which treatment group is which.
• Computerised systems can be used when information is required from subjects. This can also improve the accuracy of such data. Subjects will usually be more accurate in reporting information to a computer system, particularly if they are guaranteed anonymity.
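
A computerised allocation step of the kind described above might look like the following sketch. It is a minimal illustration, not a protocol from the paper: the function name, the opaque group codes (`G1`, `G2`) and the treatment names are hypothetical. Subjects are dealt at random into near-equal groups, and the mapping from group code to treatment is kept separately so that markers and analysts see only the codes.

```python
import random


def blind_allocate(subject_ids, treatments, seed=None):
    """Randomly allocate subjects to treatments in near-equal group sizes.

    Returns (allocation, key). `allocation` maps subject -> opaque group code,
    which is all the marker and analyst see. `key` maps code -> treatment and
    is kept sealed until the analysis is complete.
    """
    rng = random.Random(seed)
    codes = [f"G{i + 1}" for i in range(len(treatments))]
    shuffled_treatments = list(treatments)
    rng.shuffle(shuffled_treatments)   # hide which code means which treatment
    key = dict(zip(codes, shuffled_treatments))
    subjects = list(subject_ids)
    rng.shuffle(subjects)              # random order, then deal round-robin
    allocation = {s: codes[i % len(codes)] for i, s in enumerate(subjects)}
    return allocation, key


allocation, key = blind_allocate(
    [f"S{n:02d}" for n in range(1, 13)], ["inspection", "testing"], seed=42
)
print(sorted(allocation.items()))
```

Passing a recorded seed makes the allocation reproducible for audit, while leaving `seed=None` draws a fresh allocation each run.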

In addition, we need to ensure that experimental designs allow for systematic subject differences due to skill, for example through cross-over designs (where each subject acts as their own control).

Last but not least, we should encourage replication studies by experimenters who have no vested interest in the outcome of the study. However, we need to make sure replications are not too similar. It is important to vary experimental designs and experimental materials to avoid the risk of any common cause bias in replications [8].
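
The benefit of letting each subject act as their own control can be shown with a small paired analysis. The sketch below is illustrative only: the function name and the defect counts are hypothetical, and a real cross-over study would also randomise the order in which subjects meet the techniques and check for learning or period effects, which this sketch ignores.

```python
import statistics


def crossover_effect(scores_a, scores_b):
    """Mean within-subject difference (A minus B) and its standard error."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need paired scores for at least two subjects")
    # Differencing within each subject cancels that subject's overall skill.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    std_error = statistics.stdev(diffs) / len(diffs) ** 0.5
    return mean_diff, std_error


# Hypothetical defect counts found by four subjects of very different skill.
technique_a = [12, 25, 7, 18]
technique_b = [10, 22, 6, 14]
mean_diff, se = crossover_effect(technique_a, technique_b)
print(f"mean difference = {mean_diff:.2f}, standard error = {se:.2f}")
```

The raw scores vary widely between subjects, but the within-subject differences vary much less, so the comparison of techniques is not swamped by differences in skill.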

5.1.2. Evaluating experiment quality.

Even in EBM, it is recognised that it is sometimes impossible to perform randomised trials and evidence from other types of experiment may need to be considered. The Australian National Health and Medical Research Council have published guidelines for evaluating the quality of evidence [2]. They consider:

• The strength of the evidence. This has three elements: Level, Quality, and Statistical Precision. Level relates to the choice of study design and is used as an indicator of the degree to which bias has been eliminated by design. Quality refers to the methods used by the investigators to minimize bias within the study design. Statistical Precision refers to the P-value or the confidence interval.
• The distance of the estimated treatment effect from the null value and the inclusion of clinically important effects in the confidence interval.
• The usefulness of the evidence in clinical practice, particularly the appropriateness of the outcome measures used.

These criteria appear to be equally valid for software engineering evidence.
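
The precision and effect-size criteria can be checked mechanically once an effect estimate and its standard error are available. The sketch below assumes a normal approximation for the confidence interval; the function name, the numbers and the threshold for a practically important effect are all hypothetical.

```python
from statistics import NormalDist


def appraise_precision(effect, std_error, important_effect,
                       null_value=0.0, level=0.95):
    """Normal-approximation confidence interval plus two simple checks."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    low, high = effect - z * std_error, effect + z * std_error
    return {
        "interval": (low, high),
        # Is the estimate distinguishable from "no effect"?
        "excludes_null": not (low <= null_value <= high),
        # Is a practically important effect still plausible?
        "includes_important_effect": low <= important_effect <= high,
    }


# Hypothetical numbers: estimated effect 0.25, standard error 0.10,
# and 0.40 taken as the smallest practically important effect.
result = appraise_precision(0.25, 0.10, important_effect=0.40)
print(result)
```

For small samples a t-distribution interval would be more appropriate than the normal approximation used here.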

The lifecycle issue

The other major difference between software engineering and medicine is that most software engineering techniques impact a part of the lifecycle in a way that makes the individual effect of a technique difficult to isolate:

• They interact with many other development techniques and procedures. For example, a design method depends on a preceding requirements analysis. It must consider constraints imposed by the software and hardware platform and programming languages. It must be integrated with appropriate coding and testing techniques. Thus, it would be difficult to confirm that a design technique had a significant impact on final product reliability. In general, it is difficult to determine a causal link between a particular technique and a desired project outcome when the application of the technique and the final outcome are temporally removed from one another, and there are many other tasks and activities that could also affect the final outcome.
• The immediate outcomes of a software engineering technique will not necessarily have a strong relationship with final project outcomes. For example, if you are interested in the effect design techniques have on application reliability (i.e. probability of failure in a given time period under defined operational conditions), measures of the design product (or design process) have no obvious relationship with the desired outcome. There are no good surrogate measures of product reliability that can be measured at the end of the design process.

There seem to be two major approaches to this issue:

1. We can experiment with individual techniques isolated from o