<?xml version="1.0" encoding="ISO-8859-1"?><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id>0717-5000</journal-id>
<journal-title><![CDATA[CLEI Electronic Journal]]></journal-title>
<abbrev-journal-title><![CDATA[CLEIej]]></abbrev-journal-title>
<issn>0717-5000</issn>
<publisher>
<publisher-name><![CDATA[Centro Latinoamericano de Estudios en Informática]]></publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id>S0717-50002015000100005</article-id>
<title-group>
<article-title xml:lang="en"><![CDATA[Simulation Based Studies in Software Engineering: A Matter of Validity]]></article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Bernard Nicolau de França]]></surname>
<given-names><![CDATA[Breno]]></given-names>
</name>
<xref ref-type="aff" rid="A01"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Horta Travassos]]></surname>
<given-names><![CDATA[Guilherme]]></given-names>
</name>
<xref ref-type="aff" rid="A01"/>
</contrib>
</contrib-group>
<aff id="A01">
<institution><![CDATA[Universidade Federal do Rio de Janeiro, COPPE]]></institution>
<addr-line><![CDATA[Rio de Janeiro ]]></addr-line>
<country>Brasil</country>
</aff>
<pub-date pub-type="pub">
<day>00</day>
<month>04</month>
<year>2015</year>
</pub-date>
<pub-date pub-type="epub">
<day>00</day>
<month>04</month>
<year>2015</year>
</pub-date>
<volume>18</volume>
<numero>1</numero>
<fpage>5</fpage>
<lpage>5</lpage>
<copyright-statement/>
<copyright-year/>
<self-uri xlink:href="http://www.scielo.edu.uy/scielo.php?script=sci_arttext&amp;pid=S0717-50002015000100005&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.edu.uy/scielo.php?script=sci_abstract&amp;pid=S0717-50002015000100005&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.edu.uy/scielo.php?script=sci_pdf&amp;pid=S0717-50002015000100005&amp;lng=en&amp;nrm=iso"></self-uri><abstract abstract-type="short" xml:lang="en"><p><![CDATA[CONTEXT: Despite the possible lack of validity when compared with other science areas, Simulation-Based Studies (SBS) in Software Engineering (SE) have supported the achievement of some results in the field. However, as it happens with any other sort of experimental study, it is important to identify and deal with threats to validity aiming at increasing their strength and reinforcing results confidence. OBJECTIVE: To identify potential threats to SBS validity in SE and suggest ways to mitigate them. METHOD: To apply qualitative analysis in a dataset resulting from the aggregation of data from a quasi-systematic literature review combined with ad-hoc surveyed information regarding other science areas. RESULTS: The analysis of data extracted from 15 technical papers allowed the identification and classification of 28 different threats to validity concerned with SBS in SE according to Cook and Campbell’s categories. Besides, 12 verification and validation procedures applicable to SBS were also analyzed and organized according to their ability to detect these threats to validity. These results were used to make available an improved set of guidelines regarding the planning and reporting of SBS in SE. CONCLUSIONS: Simulation based studies add different threats to validity when compared with traditional studies. They are not well observed and therefore, it is not easy to identify and mitigate all of them without explicit guidance, as the one depicted in this paper.]]></p></abstract>
<abstract abstract-type="short" xml:lang="pt"><p><![CDATA[CONTEXTO: Apesar da possível falta de validade quando comparado com outras áreas da ciência, Estudos Baseados em Simulação (EBS) em Engenharia de Software (ES) têm apoiado a geração de resultados na área. Entretanto, como ocorre em quaisquer outros tipos de estudos experimentais, é importante identificar e tratar as ameaças à validade visando aumentar a qualidade da evidência e reforçar a confiança nos resultados. OBJETIVO: Identificar potenciais ameaças à validade de EBS em Engenharia de Software e sugerir formas de mitigá-las. MÉTODO: Aplicar técnicas de análise qualitativa em um conjunto de dados obtidos a partir da agregação dos resultados de uma quasi-revisão sistemática da literatura juntamente com informações coletadas por meio de uma pesquisa ad-hoc envolvendo outras áreas de pesquisa. RESULTADOS: A análise dos dados extraídos de 15 artigos permitiu a identificação e classificação de 28 diferentes ameaças à validade relacionadas a EBS em ES de acordo com as categorias de Cook e Campbell. Ainda, doze procedimentos para verificação e validação aplicáveis a EBS foram também analisados e organizadas em função da sua habilidade de identificar estas ameaças à validade. Tais resultados foram utilizados para disponibilizar um novo conjunto de diretrizes para o planejamento e relato de EB em ES. CONCLUSÕES: Estudos Baseados em Simulação acrescentam diferentes ameaças à validade quando comparados aos estudos tradicionais. Estas ameaças não são tratadas na literatura técnica existente e, portanto, não é trivial identificar e mitigar todas elas sem orientação explícita, como a apresentada neste artigo.]]></p></abstract>
<kwd-group>
<kwd lng="en"><![CDATA[Simulation-based studies]]></kwd>
<kwd lng="en"><![CDATA[simulation models]]></kwd>
<kwd lng="en"><![CDATA[threats to validity]]></kwd>
<kwd lng="pt"><![CDATA[Estudos Baseados em Simulação]]></kwd>
<kwd lng="pt"><![CDATA[modelos de simulação]]></kwd>
<kwd lng="pt"><![CDATA[ameaças à validade]]></kwd>
</kwd-group>
</article-meta>
</front><body><![CDATA[ <p lang="en-US" class="western" align="center" style="margin-bottom: 0.5cm; font-variant: normal; font-style: normal; line-height: 100%; orphans: 0; widows: 0"><a name="_GoBack"></a> <font face="Verdana, sans-serif"><font size="4" style="font-size: 16pt"><b><font size="4" style="font-size: 14pt">Simulation Based Studies in Software Engineering: </font></b></font></font> </p>     <p lang="es-ES" class="western" align="center" style="margin-bottom: 0.5cm; font-variant: normal; font-style: normal; line-height: 100%; orphans: 2; widows: 2"> <font face="Verdana, sans-serif"><font size="4" style="font-size: 14pt"><b>A Matter of Validity</b></font></font></p>     <p lang="es-ES" class="western" align="center" style="margin-bottom: 0.5cm; line-height: 100%; orphans: 2; widows: 2"> <span style="font-variant: normal"><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span style="font-style: normal"><b>Breno Bernard Nicolau de Fran&ccedil;a, Guilherme Horta Travassos</b></span></font></font></span></p>     <p lang="es-ES" class="western" align="center" style="margin-bottom: 0.5cm; font-variant: normal; font-style: normal; line-height: 100%; orphans: 2; widows: 2"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">Universidade Federal do Rio de Janeiro, COPPE,</font></font></p>     <p lang="es-ES" class="western" align="center" style="margin-bottom: 0.5cm; font-variant: normal; font-style: normal; line-height: 100%; orphans: 2; widows: 2"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">Rio de Janeiro, Brasil, </font></font> </p>     <p lang="es-ES" class="western" align="center" style="margin-bottom: 0.5cm; line-height: 100%; orphans: 2; widows: 2"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><i>{<a class="western" href="mailto:bframca@cos.ufrj.br">bfranca</a>, <a class="western" href="mailto:ght@cos.ufrj.br">ght</a>}@cos.ufrj.br</i></font></font></p>     <p 
lang="en-US" class="western" align="left" style="margin-left: 1.59cm; margin-right: 1.59cm; margin-top: 0.42cm; margin-bottom: 0.21cm; line-height: 100%; page-break-inside: avoid; orphans: 0; widows: 0; page-break-after: avoid"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><b>Abstract</b></font></font></p>     <p lang="es-ES" class="western" align="justify" style="margin-left: 1.59cm; margin-right: 1.59cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">CONTEXT: Despite the possible lack of validity when compared with other science areas, Simulation-Based Studies (SBS) in Software Engineering (SE) have supported the achievement of some results in the field. However, as it happens with any other sort of experimental study, it is important to identify and deal with threats to validity aiming at increasing their strength and reinforcing results confidence. OBJECTIVE: To identify potential threats to SBS validity in SE and suggest ways to mitigate them. METHOD: To apply qualitative analysis in a dataset resulting from the aggregation of data from a </span><span lang="en-US"><i>quasi</i></span><span lang="en-US">-systematic literature review combined with </span><span lang="en-US"><i>ad-hoc</i></span> <span lang="en-US">surveyed information regarding other science areas. RESULTS: The analysis of data extracted from 15 technical papers allowed the identification and classification of 28 different threats to validity concerned with SBS in SE according to Cook and Campbell&rsquo;s categories. Besides, 12 verification and validation procedures applicable to SBS were also analyzed and organized according to their ability to detect these threats to validity. These results were used to make available an improved set of guidelines regarding the planning and reporting of SBS in SE. 
CONCLUSIONS: Simulation based studies add different threats to validity when compared with traditional studies. They are not well observed and therefore, it is not easy to identify and mitigate all of them without explicit guidance, as the one depicted in this paper. </span></font></font> </p>     <p lang="es-ES" class="western" align="justify" style="margin-left: 1.59cm; margin-right: 1.59cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US"><b>Abstract (Portuguese)</b></span></font></font></p>     <p lang="es-ES" class="western" align="justify" style="margin-left: 1.59cm; margin-right: 1.59cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">CONTEXTO: Apesar da poss&iacute;vel falta de validade quando comparado com outras &aacute;reas da ci&ecirc;ncia, Estudos Baseados em Simula&ccedil;&atilde;o (EBS) em Engenharia de Software (ES) t&ecirc;m apoiado a gera&ccedil;&atilde;o de resultados na &aacute;rea. Entretanto, como ocorre em quaisquer outros tipos de estudos experimentais, &eacute; importante identificar e tratar as amea&ccedil;as &agrave; validade visando aumentar a qualidade da evid&ecirc;ncia e refor&ccedil;ar a confian&ccedil;a nos resultados. OBJETIVO: Identificar potenciais amea&ccedil;as &agrave; validade de EBS em Engenharia de Software e sugerir formas de mitig&aacute;-las. M&Eacute;TODO: Aplicar t&eacute;cnicas de an&aacute;lise qualitativa em um conjunto de dados obtidos a partir da agrega&ccedil;&atilde;o dos resultados de uma quasi-revis&atilde;o sistem&aacute;tica da literatura juntamente com informa&ccedil;&otilde;es coletadas por meio de uma pesquisa ad-hoc envolvendo outras &aacute;reas de pesquisa. 
RESULTADOS: A an&aacute;lise dos dados extra&iacute;dos de 15 artigos permitiu a identifica&ccedil;&atilde;o e classifica&ccedil;&atilde;o de 28 diferentes amea&ccedil;as &agrave; validade relacionadas a EBS em ES de acordo com as categorias de Cook e Campbell. Ainda, doze procedimentos para verifica&ccedil;&atilde;o e valida&ccedil;&atilde;o aplic&aacute;veis a EBS foram tamb&eacute;m analisados e organizadas em fun&ccedil;&atilde;o da sua habilidade de identificar estas amea&ccedil;as &agrave; validade. Tais resultados foram utilizados para disponibilizar um novo conjunto de diretrizes para o planejamento e relato de EB em ES. CONCLUS&Otilde;ES: Estudos Baseados em Simula&ccedil;&atilde;o acrescentam diferentes amea&ccedil;as &agrave; validade quando comparados aos estudos tradicionais. Estas amea&ccedil;as n&atilde;o s&atilde;o tratadas na literatura t&eacute;cnica existente e, portanto, n&atilde;o &eacute; trivial identificar e mitigar todas elas sem orienta&ccedil;&atilde;o expl&iacute;cita, como a apresentada neste artigo.</span></font></font></p>     ]]></body>
<body><![CDATA[<p lang="es-ES" class="western" align="justify" style="margin-left: 1.59cm; margin-right: 1.59cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">Keywords: Simulation-based studies, simulation models, threats to validity.</span></font></font></p>     <p lang="es-ES" class="western" align="justify" style="margin-left: 1.59cm; margin-right: 1.59cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">Palavras-chave: Estudos Baseados em Simula&ccedil;&atilde;o, modelos de simula&ccedil;&atilde;o, amea&ccedil;as &agrave; validade.</span></font></font></p>     <p lang="es-ES" class="western" align="justify" style="margin-left: 1.59cm; margin-right: 1.59cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">Received: 2014-07-29 Revised: 2015-02-26 Accepted: 2015-02-26</span></font></font></p> <h1 lang="es-ES" class="western" align="justify" style="margin-bottom: 0.21cm"> <span style="font-variant: normal"><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US"><b>1 Introduction</b></span></font></font></span></h1>     <p lang="en-US" class="western" align="justify" style="margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">Simulation-Based Studies (SBS) consist of a series of activities aiming at observing a phenomenon instrumented by a simulation model. Thomke <a id="br1">[</a><a href="#r1">1</a>] reported the adoption of this sort of study as an alternative strategy to support experimentation in different areas, such as the automotive industry and drug development. Criminology is another field where research has been supported by SBS <a id="br2">[</a><a href="#r2">2</a>]. 
</font></font> </p>     <p lang="es-ES" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">Pursuing these potential benefits, the Software Engineering (SE) community has also presented some initiatives in using SBS to support investigation in the field. Indeed, apart from some interesting results, the SBS presented in the context of SE <a id="br3">[</a><a href="#r3">3</a>] allowed us to observe their early maturity stage when compared with SBS concerned with the aforementioned areas. Lack of research protocols, </span><span lang="en-US"><i>ad-hoc</i></span> <span lang="en-US">experimental designs and output analysis, and relevant information missing from the reports are some examples of the issues observed in this context. </span></font></font> </p>     <p lang="en-US" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">Based on the findings of our previous review <a id="br3">[</a><a href="#r3">3</a>] and on existing Empirical Software Engineering (ESE) guidelines for other investigation strategies, such as case studies and experiments, and simulation guidelines from other research areas, we proposed a preliminary set of guidelines aiming at providing guidance to researchers when reporting SBS in the SE context <a id="br4">[</a><a href="#r4">4</a>]. Later, we performed a first assessment of this set of reporting guidelines based on the approach presented in <a id="br5">[</a><a href="#r5">5</a>]. 
As a result, these guidelines have evolved to cover planning issues such as the problem, goal, context and scope definitions; model description and validation; experimental design and output analysis issues; the supporting environment and tools; and reporting issues such as background knowledge and related works, applicability of results, conclusions and future works. </font></font> </p>     <p lang="es-ES" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">One expected contribution of the guidelines&rsquo; application is the identification of potential threats to validity that may bias an SBS in software engineering. The identification of threats and their mitigation, from the initial problem and goals definition to the output analysis, is one of the guidelines&rsquo; concerns, reducing the risk of misinterpreted results. However, in order to realize such benefits, we believe it is worth organizing a body of knowledge concerned with threats to validity already reported by the SE community when performing SBS. In addition, it is also important to depict the differences between common threats to validity (as those usually observed in </span><span lang="en-US"><i>in vivo</i></span> <span lang="en-US">and </span><span lang="en-US"><i>in vitro </i></span><span lang="en-US">studies), and highlight those specifically identified in </span><span lang="la-VA"><i>in virtuo</i></span> <span lang="en-US">and </span><span lang="la-VA"><i>in silico</i></span> <span lang="en-US">studies. Therefore, we have conducted a secondary analysis of the data collected in <a id="br3">[</a><a href="#r3">3</a>] under the perspective of potential threats to validity found in SBS, which we are now presenting in this paper. As far as we are aware, there is no other work like this in the context of Experimental Software Engineering involving SBS. 
Such threats to validity compose the body of knowledge, organized as the new version of the proposed guidelines. Additionally, we have related these threats to Verification and Validation (V&amp;V) procedures for simulation models previously identified in the technical literature in order to illustrate how to deal with such threats in SBS. Finally, we deliver some recommendations for using such body of knowledge when planning and reporting SBS, which also are going to compose a bigger set of guidelines (in progress). </span></font></font> </p>     <p lang="en-US" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">The remaining sections of this methodological paper are organized as follows. Section 2 presents the background for our research. Section 3 presents the adopted research methodology. Section 4 presents the threats to validity identified through a qualitative analysis performed on a set of SBS, both in the SE technical literature and papers from other areas discussing this subject. Section 5 presents a list of technical V&amp;V procedures applicable for simulation models. Section 6 presents the analysis on how the threats and the V&amp;V procedure relate in order to provide more reliable SBS and deliver some recommendations in this sense. 
Finally, section 7 presents the final remarks and the way ahead.</font></font></p> <h1 lang="es-ES" class="western" align="justify" style="margin-bottom: 0.21cm"> <span style="font-variant: normal"><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US"><b>2 Background</b></span></font></font></span></h1>     <p lang="es-ES" class="western" align="justify" style="margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">The work presented in this paper is part of a broader effort to organize a body of knowledge regarding SBS in the context of SE. Apart from earlier motivations and previous work on simulation models <a id="br6">[</a><a href="#r6">6</a>]<a id="br7">[</a><a href="#r7">7</a>], we undertook a systematic literature review aiming at characterizing how different simulation approaches have been used in SE studies, following the guidelines proposed by <a id="br8">[</a><a href="#r8">8</a>] and adopting the PICO <a id="br9">[</a><a href="#r9">9</a>] strategy to structure the search string. In this review, the population, intervention and outcome dimensions are considered to support answering the research question. The comparison dimension is not used because, as far as we are aware, there is no baseline to allow comparison. Therefore, this secondary study is classified as a </span><span lang="en-US"><i>quasi</i></span><span lang="en-US">-systematic literature review. 
The search string had been calibrated by using nine control papers, previously identified through an ad-hoc review.</span></font></font></p>     <p lang="en-US" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">We searched for simulation-based studies in SE (population), using simulation models as instruments based on different simulation approaches (intervention). From them, we expected to obtain characteristics (outcome) from both the simulation models and studies in which they were used as instruments.</font></font></p>     ]]></body>
<body><![CDATA[<p lang="en-US" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">This way, we applied the search string in three digital search engines due to the high coverage they usually offer: Scopus, EI Compendex and Web of Science. After applying the selection criteria by reading the titles and abstracts, we selected 108 studies, including two other secondary studies regarding SBS. Our inclusion criteria encompassed only papers available on the Web; written in English; discussing simulation-based studies; belonging to a Software Engineering domain; and mentioning one or more simulation models. Papers failing any of these criteria were excluded.</font></font></p>     <p lang="es-ES" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">The information extracted from these research papers included the simulation approach, model purpose and characteristics, tool support, the Software Engineering domain, verification and validation procedures used to evaluate the simulation model, the study purpose and strategy (controlled experiment, case study, among others), output analysis procedure and instruments, and main results including applicability of the approach and accuracy of results. Such information has been organized with the </span><span lang="en-US"><i>JabRef</i></span> <span lang="en-US">tool <a id="br10">[</a><a href="#r10">10</a>].</span></font></font></p>     <p lang="en-US" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">After reading the full papers, only 57 SBS were found among the 108 selected research papers, distributed over 43 research papers. 
The remaining papers are limited to simulation model proposals. In other words, it was not possible to identify an investigation context, with a well-defined problem and research questions, for them. A quality assessment took place and the main criteria regarded the existence or not of relevant information in the reports. The overall quality assessment indicated a poor quality of reports regarding SBS due to the lack of relevant information such as research goals, study procedures and strategy, and the experimental design.</font></font></p>     <p lang="en-US" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">We identified a number of issues regarding reporting concerns, which led us to propose a set of reporting guidelines for simulation studies in SE. Besides, we also observed issues regarding the methodological aspects involving the lack of (1) definition of research protocols for SBS, since aspects of research planning are usually overlooked when performing SBS; (2) proposals and application of V&amp;V procedures for simulation models (Ahmed et al. <a id="br11">[</a><a href="#r11">11</a>] also mention this topic in a survey involving modeling and simulation practitioners regarding software processes); (3) analysis and mitigation of threats to validity in SBS, which is strongly related to the validity of SE simulation models; (4) definition of criteria for quality assessment of SBS and the type of evidence we can acquire from it; (5) replication in simulation-based studies, given the absence of relevant information on studies reports.</font></font></p>     <p lang="es-ES" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">Given these methodological issues and 
challenges, we addressed the planning needs of SBS research protocols by starting with the basics, such as the definition of the context, problem, and research goals and questions. However, as we advanced, some issues about how to deal with the model validity and potential threats to validity in simulation experiments came up. It thus became clear to us that we needed to identify the main and recurrent threats to validity in simulation-based studies and to understand how such threats can be mitigated. For that, we primarily based our search on the outcomes from the </span><span lang="en-US"><i>quasi-</i></span><span lang="en-US">systematic literature review. However, as previously observed, the terminology is not consensual and authors in this field rarely discuss threats to validity using terms such as &ldquo;threats to validity&rdquo; or related ones. Thus, we decided to apply a systematic approach to handle the threats descriptions under the same perspective. For that, we adopted the qualitative procedures described in the next section.</span></font></font></p> <h1 lang="es-ES" class="western" align="justify" style="margin-bottom: 0.21cm"> <span style="font-variant: normal"><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US"><b>3 Research Methodology</b></span></font></font></span></h1>     <p lang="es-ES" class="western" align="justify" style="margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">During the execution of the </span><span lang="en-US"><i>quasi</i></span><span lang="en-US">-systematic literature review <a id="br3">[</a><a href="#r3">3</a>], our interests were in characterizing simulation models and how SE researchers and/or practitioners usually organize, execute and report SBS. 
So, there was no focus on threats to validity at that moment.</span></font></font></p>     <p lang="es-ES" class="western" align="justify" style="margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">Systematic Literature Reviews (SLR) appeared in Software Engineering in the early 2000s, inspired on the Evidence-Based paradigm <a id="br35">[</a><a href="#r35">35</a>]. Earlier works on this topic used to name all reviews performed with some systematic process as Systematic Reviews. However, many of them did not follow specific fundamental aspects or characteristics usually expected in systematic reviews, such as comparison among the outcomes w.r.t. their quality and possibilities of synthesis or aggregation. In this context, the term </span><span lang="en-US"><i>quasi</i></span><span lang="en-US">-systematic review <a id="br34">[</a><a href="#r34">34</a>] appeared as a definition for reviews following SLR guidelines, but not covering at least one aspect, which is the case for our review. So, the &ldquo;</span><span lang="en-US"><i>quasi</i></span>&rdquo; <span lang="en-US">term stands for the unfeasibility of comparing outcomes due to lack of knowledge on the field or specific domain of investigation, also limiting the definition of quality profile for the available evidence, based on a hierarchy for evidence in Software Engineering.</span></font></font></p>     <p lang="es-ES" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">After analyzing the content of 57 SBS, we proposed a preliminary set of reporting guidelines for SBS in Software Engineering, with the purpose of orientating researchers in such simulation studies activities <a id="br4">[</a><a href="#r4">4</a>]. 
In addition, we expect these guidelines can help researchers to identify (a priori) potential threats to the study validity. For that, in the current paper, we performed a secondary analysis over the 57 studies (distributed over 43 research papers), making use of procedures from a qualitative approach, namely the Constant Comparison Method <a id="br13">[</a><a href="#r13">13</a>], to identify common threats to validity across the studies. We also performed an </span><span lang="en-US"><i>ad-hoc</i></span> <span lang="en-US">review in order to identify whether other research areas outside SE have already discussed threats to validity in SBS, since we perceived the need for additional sources: SE simulation studies rarely refer to threats to validity using such terminology. On this occasion, we identified and included in our analysis two research papers <a id="br2">[</a><a href="#r2">2</a>]<a id="br26">[</a><a href="#r26">26</a>] discussing threats to simulation studies validity. </span></font></font> </p>     <p lang="en-US" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">The Constant Comparison Method (CCM) <a id="br13">[</a><a href="#r13">13</a>] comprises several procedures interleaving data collection and analysis to generate a theory emerging from such collected and analyzed data. It is important to note that we have no ambition in this work to generate theories, only to use the analysis procedures from CCM to support the identification of threats to simulation studies validity.</font></font></p>     <p lang="es-ES" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">Concepts are the basic unit of analysis in CCM. 
To identify concepts, the researcher needs to break down the data and to assign labels to it. The labels are constantly revisited in order to assure a consistent conceptualization. Such analytic process is called </span><span lang="en-US"><i>coding</i></span><span lang="en-US">, and it appears in the method in three different types: </span><span lang="en-US"><i>open coding</i></span><span lang="en-US">, </span><span lang="en-US"><i>axial coding</i></span> <span lang="en-US">and </span><span lang="en-US"><i>selective coding</i></span><span lang="en-US">.</span></font></font></p>     ]]></body>
<body><![CDATA[<p lang="es-ES" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US"><i>Open coding </i></span><span lang="en-US">is the analytic process by which data are broken down and conceptually labeled with codes. The codes may represent actions, events, properties, and so on. It leads the researcher to reconsider the collected data under different interpretations. In </span><span lang="en-US"><i>open coding</i></span><span lang="en-US">, the concepts are constantly compared with each other to find similarities and then grouped together to form categories. At a higher level of abstraction, in </span><span lang="en-US"><i>axial coding</i></span><span lang="en-US">, categories are associated with their subcategories and such relationships are tested against the collected data. This is also done continually as new categories emerge. Finally, </span><span lang="en-US"><i>selective coding</i></span> <span lang="en-US">consists of unifying all categories around a central core category, while categories needing more explanation are filled in with descriptive detail.</span></font></font></p>     <p lang="es-ES" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">For data collection, we defined an additional information extraction form containing the study environment, whether </span><span lang="en-US"><i>in virtuo </i></span><span lang="en-US">or </span><span lang="en-US"><i>in silico</i></span> <span lang="en-US"><a id="br12">[</a><a href="#r12">12</a>], and the description of potential threats (identified in the research papers as limitations, assumptions or threats to validity). 
The environment matters because </span><span lang="en-US"><i>in virtuo</i></span> <span lang="en-US">contexts are considered risky, mainly because they involve human subjects. </span></font></font> </p>     <p lang="es-ES" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">First, we extracted the descriptions of threats to validity and grouped them by paper. Only 13 of the 43 research papers contain relevant information regarding threats to validity. For the two additional research papers concerned with threats to simulation studies, we interleaved data collection with the analysis of the papers obtained through the </span><span lang="en-US"><i>quasi-</i></span><span lang="en-US">systematic literature review. Unlike the SE studies, these two papers use terminology consistent with the current terminology as presented in [reference source not found], which led us to repeatedly revisit the adopted SE terminology and to search for discussions in which threats to validity, limitations or assumptions could be recognized.</span></font></font></p>     <p lang="en-US" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">After that, we performed an initial (open) coding, assigning concepts to chunks of the extracted text. 
For each new code, we compared it with the existing ones to determine whether it referred to the same concept.</font></font></p>     <p lang="en-US" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><a href="#f1">Figure 1</a> presents two example threat descriptions (A and B). </font></font> </p>     <p lang="en-US" class="western" align="center" style="margin-top: 0.18cm; margin-bottom: 0.78cm; line-height: 0.39cm; page-break-after: avoid"> <a name="f1"> <img src="/img/revistas/cleiej/v18n1/1a05f1.jpg"> </a>     <br> <font face="Times New Roman, serif"><font size="2" style="font-size: 9pt"><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><b>Figure 1. </b></font></font><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span style="font-weight: normal">Open coding example, including repeated codes.</span></font></font></font></font></p>     <p lang="en-US" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">On the right side of <a href="#f1">Figure 1</a> are the codes assigned to chunks of text describing relevant aspects of the threats. Both descriptions share the code &ldquo;Poorly defined constructs and metrics&rdquo;. These codes lead to a threat defined in the axial code (highlighted text below the text description). 
The main idea in this part of the analysis relates to the surrogate measures defined for the interested constructs not really representing the concepts under investigation.</font></font></p>     <p lang="es-ES" class="western" align="center" style="margin-top: 0.18cm; margin-bottom: 0.78cm; line-height: 0.39cm; page-break-after: avoid"> <a name="f2"> <img src="/img/revistas/cleiej/v18n1/1a05f2.jpg"> </a>     <br> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US"><b>Figure 2. </b></span></font></font><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US"><span style="font-weight: normal">Example of axial coding.</span></span></font></font></p>     ]]></body>
<body><![CDATA[<p lang="en-US" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">Furthermore, we reviewed the codes and then established relationships among them by reasoning about the threat descriptions, generating the categories, which are the threats to validity. Each line of reasoning is thus written as a threat to validity, and the category gives the name of the threat in the next section. For instance, <a href="#f2">Figure 2</a> shows a code that emerged from the interaction of three other codes.</font></font></p>     <p lang="en-US" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">For the threat presented in <a href="#f2">Figure 2</a>, the inconclusive results for software development and the use of the model as the object of study limit the results to the model itself and do not allow behaviors to be extrapolated from the model to explain the real phenomena. This shows one of the implications of lacking information about the model&rsquo;s validity.</font></font></p>     <p lang="en-US" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">Finally, we grouped these open codes into four major categories (axial coding), namely conclusion, internal, construct and external validity, based on the classification of threats to experimental validity proposed by Cook and Campbell, as presented in [<a href="#r14">14</a>]; this classification could have been extended had we found it necessary. 
We did not perform selective coding, since the main goal was to identify and categorize the threats to validity.</font></font></p>     <p lang="en-US" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">The main result of this secondary analysis is therefore a list of the potential threats to SBS validity, labeled using the grounded codes and organized according to the classification proposed by Cook and Campbell, as presented in <a id="br14">[</a><a href="#r14">14</a>].</font></font></p>     <p lang="en-US" class="western" align="left" style="margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">Additionally, we performed an analysis matching threats to validity with V&amp;V procedures for simulation models. The matching is based on the input and focus of each V&amp;V procedure and of each threat to validity. The goal of this analysis is to identify whether the procedures can fully prevent the threats from occurring. Finally, we deliver some recommendations on how to avoid them, all grounded in the findings of the systematic review and in additional information collected from the literature on Simulation.</font></font></p> <h1 lang="en-US" class="western" align="justify" style="margin-bottom: 0.21cm; font-variant: normal"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><b>4 Threats to Simulation Studies Validity</b></font></font></h1>     <p lang="es-ES" class="western" align="justify" style="margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">The identified threats to validity are organized according to the classification presented in <a id="br14">[</a><a href="#r14">14</a>], in the following subsections. 
The title (in bold) of each threat to validity reflects the codes (categories) generated in the qualitative analysis. It is important to note that we did not analyze threats to validity for each study, but only collected the reported ones. Indeed, other potential threats to validity may be observed in each study, but we decided not to judge them based on the research paper alone. To avoid repeating threats already discussed in other Experimental Software Engineering forums, we concentrate on threats more related to </span><span lang="en-US"><i>in virtuo </i></span><span lang="en-US">and </span><span lang="en-US"><i>in silico </i></span><span lang="en-US">studies and not yet discussed in SE papers.</span></font></font></p>     <p lang="es-ES" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">The 28 identified threats to validity are distributed over the subsets of conclusion validity (four), internal validity (ten), construct validity (ten) and external validity (four). We neither found nor were able to classify any threat into a different subset. The SE technical literature has already discussed most of the identified threats to validity regarding </span><span lang="en-US"><i>in virtuo </i></span><span lang="en-US">studies, which strongly relate to the presence of human subjects &ldquo;disturbing in some sense&rdquo; the study. The expression &ldquo;disturbing in some sense&rdquo; refers to the uncontrollable aspects of human behavior that we typically address as internal validity issues. On the other hand, threats to </span><span lang="en-US"><i>in silico</i></span> <span lang="en-US">experiments concentrate more on construct validity. This way, one may be tempted to point out this perspective as more critical. 
However, some threats can be more severe depending on the simulation goals.</span></font></font></p> <h2 lang="es-ES" class="western" align="justify" style="margin-bottom: 0.21cm"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">4.1 Conclusion Validity</span></font></font></h2>     <p lang="en-US" class="western" align="justify" style="margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">This validity refers to the statistical confirmation (significance) of a relationship between the treatment and the outcome, in order to draw correct conclusions about such relations. Threats to conclusion validity involve the use of inappropriate instruments and assumptions to perform the simulation output analysis, such as wrong statistical tests, number of required scenarios and runs, independence between factors, among others. For instance, stochastic simulations always deal with pseudo-random components representing uncertainty of elements or behaviors of the real world. Therefore, experimenters need to verify whether the model is able to reproduce such behavior across and within simulation scenarios due to the actual model configuration or caused by internal and natural variation. 
The main threats to conclusion validity identified in SBS are:</font></font></p> <ul> 	<li/>     <p lang="es-ES" class="western" align="justify" style="margin-bottom: 0.21cm; line-height: 100%"> 	<font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US"><b>Considering 	only one observation when dealing with stochastic simulation</b></span><span lang="en-US">, 	</span><span lang="en-US"><b>rather than central tendency and 	dispersion measures</b></span> <span lang="en-US"><a id="br2">[</a><a href="#r2">2</a>]: different 	from the threats previously mentioned, we observed it into </span><span lang="en-US"><i>in 	silico</i></span> <span lang="en-US">context, where the whole 	experiment happens into the computer environment: the simulation 	model. It involves the use of a single run or measure to draw 	conclusions about a stochastic behavior. Given such nature, it has 	some intrinsic variation that may bias the results if not properly 	analyzed. We present an example of this threat from <a id="br2">[</a><a href="#r2">2</a>], where the 	authors say, &ldquo;</span><span lang="en-US"><i>If the simulation 	contains a stochastic process, then the outcome of each run is a 	single realization of a distribution of outcomes for one set of 	parameter values. Consequently, a single outcome could reflect the 	stochastic process, rather than the theoretical processes under 	study. 
To be sure that the outcome observed is due to the process, 	descriptive statistics are used to show the central tendency and 	dispersion of many runs</i></span>&rdquo;<span lang="en-US">.</span></font></font></p> 	<li/>     <p lang="es-ES" class="western" align="justify" style="margin-bottom: 0.21cm; line-height: 100%"> 	<font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US"><b>Not 	using statistics when comparing simulated to empirical distributions</b></span> 	<span lang="en-US"><a id="br2">[</a><a href="#r2">2</a>]: also observed into the </span><span lang="en-US"><i>in 	silico</i></span> <span lang="en-US">context, this threat involves 	the use of inappropriate procedures for output analysis. It should 	be avoided comparing single values from simulated to empirical 	outcomes. It is recommended to use proper statistical tests or 	measures to compare distributions with a certain level of 	confidence.</span></font></font></p>     ]]></body>
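<body><![CDATA[<p lang="en-US" class="western" align="justify" style="margin-bottom: 0.21cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">The two <i>in silico</i> threats above can be illustrated with a minimal sketch in Python (standard library only). It replicates a stochastic model over many seeds, reports central tendency and dispersion instead of a single observation, and compares simulated and empirical samples with a two-sample Kolmogorov-Smirnov statistic. The model <i>run_simulation</i>, its lognormal task effort, the number of replications and the placeholder empirical sample are assumptions made only for illustration; the critical value uses the usual large-sample approximation, and the 0.05 level is an arbitrary choice.</font></font></p>
<pre>
```python
import bisect
import math
import random
import statistics


def run_simulation(seed, n_tasks=50):
    """Hypothetical stochastic model: total effort over n_tasks tasks.

    Each task effort is drawn from a lognormal distribution; this stands in
    for any simulation model with pseudo-random components.
    """
    rng = random.Random(seed)
    return sum(rng.lognormvariate(1.0, 0.5) for _ in range(n_tasks))


def summarize(outcomes):
    """Central tendency and dispersion over many runs."""
    return statistics.mean(outcomes), statistics.stdev(outcomes)


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between ECDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def ks_critical_value(n, m, alpha=0.05):
    """Large-sample approximation of the two-sample KS critical value."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c_alpha * math.sqrt((n + m) / (n * m))


# Many replications with distinct seeds instead of a single observation.
simulated = [run_simulation(seed) for seed in range(200)]
mean, spread = summarize(simulated)

# Placeholder "empirical" sample; in a real study this comes from field data.
empirical = [run_simulation(seed) for seed in range(1000, 1060)]

d = ks_statistic(simulated, empirical)
crit = ks_critical_value(len(simulated), len(empirical))
distributions_differ = d > crit
print(f"mean={mean:.1f} sd={spread:.1f} D={d:.3f} critical={crit:.3f}")
```
</pre>
<p lang="en-US" class="western" align="justify" style="margin-bottom: 0.21cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">The Kolmogorov-Smirnov comparison is only one option; any statistical test suited to the output distribution could take its place.</font></font></p>]]></body>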
<body><![CDATA[</ul>     <p lang="es-ES" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">We also observed other threats to conclusion validity at </span><span lang="en-US"><i>in virtuo</i></span> <span lang="en-US">environments, for instance, a small population sample hampering the application of statistical tests <a id="br16">[</a><a href="#r16">16</a>], which is similar to the one mentioned by Wohlin et al <a id="br14">[</a><a href="#r14">14</a>] as &ldquo;Low statistical power&rdquo;. Besides, we identified the uneven outcome distribution (high variance) due to purely random subjects assignment <a id="br16">[</a><a href="#r16">16</a><a id="br17">-</a><a href="#r17">17</a>], which is mentioned in <a id="br14">[</a><a href="#r14">14</a>] as &ldquo;Random heterogeneity of subjects&rdquo;.</span></font></font></p> <h2 lang="es-ES" class="western" align="justify" style="margin-bottom: 0.21cm"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">4.2 Internal Validity</span></font></font></h2>     <p lang="es-ES" class="western" align="justify" style="margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">This validity refers to the assurance that the treatment causes the outcome, rather than an uncontrolled external factor, i.e., avoid the indication of a false relationship between treatment and outcome when there is none. As the experimental setting in SBS often relies on different input parameters configurations, the uncontrolled factors may be unreliable supporting data, human subjects manipulating the model when performing </span><span lang="en-US"><i>in virtuo</i></span> <span lang="en-US">experiments or bias introduction by the simulation model itself. 
Events or situations that may impose threats in these inputs are to skip data collection procedures or to aggregate different context data, not giving an adequate training for subjects or lacking knowledge regarding the simulated phenomenon, and the lack of explanation for the phenomenon occurrence, respectively. Thus, the main internal validity identified threats in SBS are:</span></font></font></p> <ul> 	<li/>     <p lang="es-ES" class="western" align="justify" style="margin-bottom: 0.21cm; line-height: 100%"> 	<font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US"><b>Inappropriate 	experimental design (missing factors) <a name="br16">[</a><a href="#r16">16</a><a name="br19">-</a><a href="#r19">19</a>]: </b></span><span lang="en-US">apart 	from disturbing factors, the experimental design plays an important 	role on the definition of which variables (both </span><span lang="en-US"><i>in 	virtuo</i></span> <span lang="en-US">and </span><span lang="en-US"><i>in 	silico</i></span> <span lang="en-US">experiments) are relevant to 	answer the research questions. We observed this threat occurring 	only into </span><span lang="la-VA"><i>in virtuo</i></span> <span lang="en-US">context, 	all of them from replications of the same research protocol, 	regarding to unexpected factors related to human subjects 	manipulating the simulation models. It is not common to miss factors 	on </span><span lang="la-VA"><i>in silico</i></span> <span lang="en-US">studies, 	especially in SE simulations where models are mainly limited in 	number or input parameters. 
However, it is important to be cautious 	when dropping factors to simplify the experimental design, as in 	fractional factorial designs.</span></font></font></p> 	<li/>     <p lang="es-ES" class="western" align="justify" style="margin-bottom: 0.21cm; line-height: 100%"> 	<font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US"><b>Simulation 	model simplifications (assumptions) forcing the desired outcomes 	<a name="br2">[</a><a href="#r2">2</a><a name="br20">,</a><a href="#r20">20</a><a name="br21">,</a><a href="#r21">21</a><a name="br22">,</a><a href="#r22">22</a><a name="br23">,</a><a href="#r23">23</a><a name="br24">,</a><a href="#r24">24</a></b></span><span lang="en-US">]: this is the most 	recurrent threat reported in the analyzed papers. Always identified 	in the </span><span lang="en-US"><i>in silico</i></span> <span lang="en-US">context, 	it concerns the simulation model itself. In this threat, 	the simulation model contains assumptions implemented in such a way that 	they directly affect the response variables, either by establishing the 	intended behavior or hypothesis as truth directly from the input to 	the output variables, or by giving alternative results no chance to 	occur. 
For instance, in one of the six studies in which we observed this 	threat (reported as an assumption), the authors say </span></font></font> 	</p>     </ul>     <p lang="es-ES" class="western" align="justify" style="margin-left: 2.5cm; margin-bottom: 0.21cm; line-height: 100%"> &ldquo;<font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US"><i>In order to introduce the Test-First Development practice into the FLOSS simulation model, we make the following assumptions: (1) The average time needed to write a line of production code increases; (2) The number of defects injected during coding activities decreases; (3) The debugging time to fix a single defect decreases</i></span></font></font><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="es-ES">&rdquo;</span></font></font><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">. </span></font></font> </p>     <p lang="en-US" class="western" align="justify" style="margin-left: 1.27cm; margin-bottom: 0.21cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">In this case, the hypotheses (or beliefs) that, under Test-First Development, coding productivity decreases, quality increases, and maintenance time decreases are introduced directly into the model as assumptions. This runs counter to the purpose of SBS, where there is a theory with a defined mechanism that explains a phenomenon, i.e., how these interactions between variables occur. In such a case, there is no room for simulation, since the outcomes are predictable without running the simulations. 
Such a black box (without mechanisms) approach is the typical situation where in vitro experiments are more applicable.</font></font></p> <ul> 	<li/>     <p lang="es-ES" class="western" align="justify" style="margin-bottom: 0.21cm; line-height: 100%"> 	<font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US"><b>Different 	datasets (context) for model calibration and experimentation <a id="br25">[</a><a href="#r25">25</a>]: 	</b></span><span lang="en-US">it is difficult to realize how 	external or disturbing factors may influence a controlled computer 	environment (</span><span lang="en-US"><i>in silico</i></span><span lang="en-US">). 	Nevertheless, the supporting dataset, often required by the 	simulation models, may disturb the results whether data from 	different contexts have been compared. This is the case when 	calibrating the simulation model with a specific dataset, reflecting 	the context of a particular project, product, or organization and 	using the same calibration to run experiments for another 	(different) context. For example, try to use cross-company data to 	simulate the behavior of a specific company.</span></font></font></p>     </ul>     ]]></body>
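<body><![CDATA[<p lang="en-US" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">To make the threat of assumptions forcing the desired outcomes more concrete, the following minimal Python sketch encodes the three Test-First Development assumptions quoted above as fixed multipliers on the inputs of a toy model. Everything in it is hypothetical: the function <i>simulate_project</i>, the baseline rates and the multiplier values are illustrative assumptions, not the model or data of the cited study. The sketch only shows that, without an explanatory mechanism, the simulated effects simply restate the assumed multipliers.</font></font></p>
<pre>
```python
import math


def simulate_project(loc, minutes_per_loc, defects_per_kloc, fix_minutes):
    """Hypothetical deterministic model of coding and debugging effort."""
    coding_time = loc * minutes_per_loc
    defects = loc / 1000.0 * defects_per_kloc
    debugging_time = defects * fix_minutes
    return {"coding_time": coding_time, "defects": defects,
            "debugging_time": debugging_time}


# Illustrative baseline parameters (assumed values, not empirical data).
BASELINE = {"minutes_per_loc": 2.0, "defects_per_kloc": 20.0, "fix_minutes": 30.0}

# The three assumptions, encoded as multipliers applied straight to the inputs.
ASSUMED_EFFECT = {"minutes_per_loc": 1.2, "defects_per_kloc": 0.7, "fix_minutes": 0.8}

treatment = {name: value * ASSUMED_EFFECT[name] for name, value in BASELINE.items()}

base_run = simulate_project(10000, **BASELINE)
tdd_run = simulate_project(10000, **treatment)

# Each "simulated" effect is just an assumed multiplier (or a product of them).
defect_ratio = tdd_run["defects"] / base_run["defects"]
coding_ratio = tdd_run["coding_time"] / base_run["coding_time"]
debugging_ratio = tdd_run["debugging_time"] / base_run["debugging_time"]

outcome_restates_assumption = (
    math.isclose(defect_ratio, ASSUMED_EFFECT["defects_per_kloc"])
    and math.isclose(coding_ratio, ASSUMED_EFFECT["minutes_per_loc"])
    and math.isclose(
        debugging_ratio,
        ASSUMED_EFFECT["defects_per_kloc"] * ASSUMED_EFFECT["fix_minutes"],
    )
)
print(outcome_restates_assumption)
```
</pre>]]></body>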
<body><![CDATA[<p lang="es-ES" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">We also observed other seven threats to internal validity, regarding </span><span lang="en-US"><i>in virtuo</i></span> <span lang="en-US">studies, similar to the ones already mentioned in <a id="br14">[</a><a href="#r14">14</a>]. It is the case of lack of SE knowledge hiding possible implications due to unknown disturbing factors <a id="br16">[</a><a href="#r16">16</a><a id="br19">-</a><a href="#r19">19</a>], insufficient time to subjects&rsquo; familiarization with the simulation tool and premature stage of the simulation tool (instrumentation effect) <a id="br16">[</a><a href="#r16">16</a><a id="br19">-</a><a href="#r19">19</a>]. Also, non-random subjects&rsquo; dropout after the treatment application (mortality) <a id="br16">[</a><a href="#r16">16</a><a id="br19">-</a><a href="#r19">19</a>], different number of simulation scenarios (instruments) for each treatment <a id="br16">[</a><a href="#r16">16</a><a id="br19">-</a><a href="#r19">19</a>] and available time to their performing <a id="br16">[</a><a href="#r16">16</a><a id="br19">-</a><a href="#r19">19</a>], maturation effect by the application of same test both before and after treatments <a id="br16">[</a><a href="#r16">16</a><a id="br19">-</a><a href="#r19">19</a>] and different level of expertise required by the instruments for both control and treatments groups (instrumentation effect) <a id="br16">[</a><a href="#r16">16</a><a id="br19">-</a><a href="#r19">19</a>].</span></font></font></p> <h2 lang="es-ES" class="western" align="justify" style="margin-bottom: 0.21cm"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">4.3 Construct Validity</span></font></font></h2>     <p lang="es-ES" class="western" align="justify" style="margin-bottom: 0.5cm; 
line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">This validity refers to the assurance that experimental setting (simulation model variables) correctly represents the theoretical concepts (constructs), mostly observed into the </span><span lang="en-US"><i>in silico</i></span> <span lang="en-US">context, where the simulation model plays the main role in the study. Threats to construct validity may occur due to the lack of model variables precision and relationships definition (and their respective equations), representing human properties, software products or processes, so the collected measures do not actually represent the desired characteristics. Davis et al <a id="br26">[</a><a href="#r26">26</a>] claim that the nature of simulation models tends to improve construct validity, since it requires formally defined constructs (and their measurement) and algorithmic representation logic for the theoretical mechanism, which explains the phenomenon under investigation. However, we could observe some threats to construct validity into the context of SBS, which are:</span></font></font></p> <ul> 	<li/>     <p lang="es-ES" class="western" align="justify" style="margin-bottom: 0.21cm; line-height: 100%"> 	<font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US"><b>Naturally 	different treatments (unfair) comparison <a id="br16">[</a><a href="#r16">16</a><a id="br19">-</a><a href="#r19">19</a>]</b></span><span lang="en-US">: 	this happens when comparing simulation models to any other kind of 	model not only in terms of their output variables, but also in 	nature, like analytic models. 
We observed this threat occurring only 	into </span><span lang="la-VA"><i>in virtuo</i></span> <span lang="en-US">context, 	all of them from replications of the same research protocol.</span></font></font></p> 	<li/>     <p lang="es-ES" class="western" align="justify" style="margin-bottom: 0.21cm; line-height: 100%"> 	<font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US"><b>Inappropriate 	application of simulation <a id="br16">[</a><a href="#r16">16</a><a id="br19">-</a><a href="#r19">19</a>]</b></span><span lang="en-US">: in 	the </span><span lang="la-VA"><i>in virtuo</i></span> <span lang="en-US">context, 	it is possible to identify situations where the model building can 	be more effective than the model usage, considering that SBS 	involves both parts. We observed this threat occurring only into </span><span lang="la-VA"><i>in 	virtuo</i></span> <span lang="en-US">context, all of them from 	replications of the same research protocol.</span></font></font></p> 	<li/>     <p lang="es-ES" class="western" align="justify" style="margin-bottom: 0.21cm; line-height: 100%"> 	<font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US"><b>Inappropriate 	cause-effect relationships definition <a id="br20">[</a><a href="#r20">20</a>]</b></span><span lang="en-US">: 	this threat is associated to the proper implementation of the causal 	relationships between simulation model constructs explaining the 	mechanism under study.</span></font></font></p> 	<li/>     <p lang="es-ES" class="western" align="justify" style="margin-bottom: 0.21cm; line-height: 100%"> 	<font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US"><b>Inappropriate 	real-world representation by model parameters <a id="br20">[</a><a href="#r20">20</a>]</b></span><span lang="en-US">: 	the choice of input parameters should reflect real-world situations, 	assuming suitable values that can be observed in practice and 
are 	worthy for the analysis.</span></font></font></p> 	<li/>     <p lang="es-ES" class="western" align="justify" style="margin-bottom: 0.21cm; line-height: 100%"> 	<font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US"><b>Inappropriate 	model calibration data and procedure <a id="br20">[</a><a href="#r20">20</a>]</b></span><span lang="en-US">: 	it involves, as the previous one, data used to perform the study, 	mainly to instantiate the simulation model, i.e., to calibrate the 	model using data from the corresponding real world. It may cause 	unrealistic distributions or equations, scaling the effects up or 	down.</span></font></font></p> 	<li/>     <p lang="es-ES" class="western" align="justify" style="margin-bottom: 0.21cm; line-height: 100%"> 	<font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US"><b>Hidden 	underlying model assumptions <a id="br20">[</a><a href="#r20">20</a>]</b></span><span lang="en-US">: if 	assumptions are not explicit in model description, results may be 	misinterpreted or bias the conclusions, and may not be possible to 	judge at what extent they correspond to the actual phenomena. </span></font></font> 	</p> 	<li/>     <p lang="es-ES" class="western" align="justify" style="margin-bottom: 0.21cm; line-height: 100%"> 	<font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US"><b>Invalid 	assumptions regarding the model concepts <a id="br27">[</a><a href="#r27">27</a>]</b></span><span lang="en-US">: 	this threat regards to the validity of the assumptions made in the 	model development. Once they are invalid, the conclusions may also 	be corrupted. 
Every assumption made on a simulation model must be 	checked later, it is not an adequate &ldquo;device&rdquo; by which 	one can reduce model complexity or scope.</span></font></font></p> 	<li/>     <p lang="es-ES" class="western" align="justify" style="margin-bottom: 0.21cm; line-height: 100%"> 	<font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US"><b>The 	simulation model does not capture the corresponding real world 	building blocks and elements <a id="br20">[</a><a href="#r20">20</a>]</b></span><span lang="en-US">: 	this threat concerns with model compliance with real world 	constructs and phenomenon representation. If there is no evidence of 	theoretical mechanism&rsquo;s face validity, it is possible that the 	simulation model has been producing right outcomes, through wrong 	explanations.</span></font></font></p> 	<li/>     ]]></body>
<body><![CDATA[<p lang="es-ES" class="western" align="justify" style="margin-bottom: 0.21cm; line-height: 100%"> 	<font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US"><b>The 	lack of evidence regarding model validity reduces the findings only 	to the simulation model <a id="br28">[</a><a href="#r28">28</a>]</b></span><span lang="en-US">: this 	threat concerns simulation studies in which a simulation model is 	chosen without proper information about its validity. Therefore, no 	conclusion can be drawn about the phenomenon, but only about the 	model itself. Hence, the simulation model plays the role of an 	object of study, rather than an instrument. As an example, the 	authors in <a id="br28">[</a><a href="#r28">28</a>] say: &ldquo;</span><span lang="en-US"><i>Though the 	experimentation described herein was originally undertaken with the 	idea that it might reveal something about the software production 	systems modeled, the results do not support conclusions about 	software development </i></span><span lang="en-US">[inconclusive 	results]</span><span lang="en-US"><i>. Therefore, we refrained from 	making inferences about software development and drew conclusions 	only about the models. Since our findings pertain only to the 	models, no particular level of model validation has been assumed 	</i></span><span lang="en-US">[lack of validity evidence].&rdquo;</span></font></font></p>     </ul>     <p lang="es-ES" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">We can also identify inappropriate measurements for observed constructs in SBS <a id="br27">[</a><a href="#r27">27</a>]. Wohlin et al. 
<a id="br14">[</a><a href="#r14">14</a>] have already reported it as &ldquo;inadequate preoperational explication of constructs&rdquo;, and it was the only threat observed in both </span><span lang="en-US"><i>in virtuo</i></span> <span lang="en-US">and </span><span lang="en-US"><i>in silico</i></span> <span lang="en-US">contexts.</span></font></font></p> <h2 lang="es-ES" class="western" align="justify" style="margin-bottom: 0.21cm"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">4.4 External Validity</span></font></font></h2>     <p lang="es-ES" class="western" align="justify" style="margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">This type of validity concerns the possibility of generalizing results beyond the scope of the experimental settings. In simulation studies, it is particularly interesting to know whether different simulation studies can reproduce similar results (called simulated external validity <a id="br2">[</a><a href="#r2">2</a>]) or whether a simulation can predict real-world results (called empirical external validity <a id="br2">[</a><a href="#r2">2</a>]). For instance, external validity is threatened when a software process simulation model cannot reproduce the results observed in one organization or cannot obtain consistent results across different calibration datasets. 
Thus, the four threats to external validity that we identified (all concerned with the </span><span lang="en-US"><i>in silico</i></span> <span lang="en-US">context) are:</span></font></font></p> <ul> 	<li/>     <p lang="es-ES" class="western" align="justify" style="margin-bottom: 0.21cm; line-height: 100%"> 	<font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US"><b>Simulation results are context-dependent, since there is a need for calibration <a id="br20">[</a><a href="#r20">20</a>]</b></span><span lang="en-US">: simulation modeling involves the definition of both conceptual and executable models. Therefore, to run simulations, the model needs to be calibrated using data that represent the context about which the experimenter will draw conclusions. Results are only as general as the supporting data. In other words, simulation results are only applicable to the specific organization, project, or product data.</span></font></font></p> 	<li/>     <p lang="es-ES" class="western" align="justify" style="margin-bottom: 0.21cm; line-height: 100%"> 	<font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US"><b>Simulation may not be generalizable to other simulations of the same phenomenon <a id="br2">[</a><a href="#r2">2</a>]</b></span><span lang="en-US">: this threat refers to the emulation of a theoretical mechanism across different simulations. Such simulations may differ in terms of calibration and input parameters, but the results are only generalizable if they appear in such different settings. 
In other words, the mechanism has to explain the phenomenon under different configurations to achieve such external validity.</span></font></font></p> 	<li/>     <p lang="es-ES" class="western" align="justify" style="margin-bottom: 0.21cm; line-height: 100%"> 	<font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US"><b>Simulation results differ from the outcomes of empirical observations <a id="br2">[</a><a href="#r2">2</a><a id="br20">,</a><a href="#r20">20</a>]</b></span><span lang="en-US">: when simulation outcomes differ substantially from empirical outcomes, we may say that the simulated results have no external validity. One example of such a threat appears in <a id="br20">[</a><a href="#r20">20</a>]: &ldquo;First, the results are only partly consistent with empirical evidence about the effects of performing V&amp;V activities. While code quality can always be improved by adding V&amp;V activities, it is not always true that adding V&amp;V activities in earlier development is better than adding them in later phases&rdquo;.</span></font></font></p> 	<li/>     <p lang="es-ES" class="western" align="justify" style="margin-bottom: 0.21cm; line-height: 100%"> 	<font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US"><b>Simulation model not based on empirical evidence <a id="br26">[</a><a href="#r26">26</a><a id="br29">,</a><a href="#r29">29</a>]</b></span><span lang="en-US">: if the model constructs and propositions are all conjectural, i.e., not grounded, wholly or partially, in field studies or empirical experiments, it is very important to invest effort in validation procedures, since the model itself cannot show any external validity <a id="br26">[</a><a href="#r26">26</a>].</span></font></font></p>     </ul> <h2 lang="en-US" class="western" align="justify" style="margin-bottom: 0.21cm"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">4.5 Lifecycle 
Perspective</font></font></h2>     <p lang="en-US" class="western" align="justify" style="margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">The threats to validity discussed above can also be viewed from the perspective of the SBS lifecycle. Such studies are often <a id="br30">[</a><a href="#r30">30</a><a id="br31">-</a><a href="#r31">31</a>] organized in an iterative process (<a href="#f3">Figure 3</a>) comprising phenomenon observation (or data collection), development and validation of the simulation model (conceptual and executable), model experimentation (planning and execution of simulation experiments), and output analysis. Other activities may appear in specific processes, but these are the traditional ones.</font></font></p>     ]]></body>
<body><![CDATA[<p lang="en-US" class="western" align="center" style="margin-top: 0.18cm; margin-bottom: 0.78cm; line-height: 0.39cm; page-break-after: avoid"> <a name="f3"> <img src="/img/revistas/cleiej/v18n1/1a05f3.jpg"> </a>     <br> <font face="Times New Roman, serif"><font size="2" style="font-size: 9pt"><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><b>Figure 3. </b></font></font><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span style="font-weight: normal">Simulation studies lifecycle and threats to validity.</span></font></font></font></font></p>     <p lang="en-US" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">For instance, consider a software process simulation model (SPSM) that aims to identify process bottlenecks that can compromise the project schedule and that contains pseudo-random variables defining the probability of success of a certain verification activity (review or test). The likelihood of success is based on an empirical distribution of historical effectiveness and efficiency records of the applied verification technique. If any verification activity succeeds (i.e., identifies defects in the verified artifact), a correction effort is added. 
An experimental design for analyzing how verification effectiveness and efficiency affect the project schedule will require more than a single simulation run for each scenario, in order to capture the internal variation of the verification success rate, and the output analysis has to use proper statistical instruments to compare scenarios across the multiple runs.</font></font></p>     <p lang="en-US" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">In general, the effects of any threat are perceived in the output analysis stage. However, some of them can be identified in earlier steps. </font></font> </p>     <p lang="en-US" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">Threats to conclusion validity tend to show up in downstream activities, more specifically in experimental design and output analysis (see Figure 3). Statistical expertise plays an important role in whether they occur, since the design of the simulation experiment and the output analysis are strongly related <a id="br32">[</a><a href="#r32">32</a>]. For the previously mentioned case, a threat to conclusion validity can be the use of a single run for each of two scenarios (using high and low success rates). This choice will not allow the experimenter to determine which scenario performs better, since the results depend on the amount of variation in the empirical distribution defining the pseudo-random variables.</font></font></p>     <p lang="es-ES" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">Planning of simulation experiments may also pose threats to internal validity. 
This type of threat has many possible causes, and it may be avoided or identified at all stages of the lifecycle (see <a href="#f3">Figure 3</a>). Since data collected for calibration is one of the sources of threats to internal validity, the first stage of the simulation lifecycle should be performed systematically and with caution, allowing triangulation of data from different sources in order to assess data quality and validity. We observed threats to internal validity mainly in the </span><span lang="en-US"><i>in virtuo</i></span> <span lang="en-US">context, e.g., when human subjects pilot simulation models. In fact, all identified threats came from one single study protocol, which was replicated across different populations <a id="br16">[</a><a href="#r16">16</a><a id="br19">-</a><a href="#r19">19</a>]. In this case, it is clear that the experimental design impacts negatively on the study results, since every threat to internal validity is identified in all replications. Besides, different parts of the design have contributed to this scenario, from the selection of a software process simulator and an analytical model (COCOMO) to represent levels of the learning instrument factor, which are not comparable from a learning perspective, to several instrumentation effects, for instance the premature stage of the simulation tool.</span></font></font></p>     <p lang="es-ES" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">From the </span><span lang="en-US"><i>in silico</i></span> <span lang="en-US">perspective, we can use our fictional SPSM to illustrate a potential threat to internal validity. 
This threat arises when the simulation model is calibrated with a historical dataset, which also drives the generation of the pseudo-random variables for verification effectiveness and efficiency and, consequently, of the executable model, while the simulation experiment takes its input parameters from a new project involving a new team with a different background and expertise. Comparing such distinct contexts does not allow the experimenter to determine what is really causing the main effect, since the context may influence the outputs.</span></font></font></p>     <p lang="es-ES" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">Threats to </span><span lang="en-US"><i>construct</i></span> <span lang="en-US">validity are all associated with the model development and validation activities (see <a href="#f3">Figure 3</a>). In this phase, the conceptualization of constructs into model variables, and of propositions into relationships between variables, represents the translation of observations into a simulation language. Such translation should be performed carefully, involving domain experts as much as possible to check whether important real-world variables are missing. As an example, consider the threat to </span><span lang="en-US"><i>construct</i></span> <span lang="en-US">validity regarding inappropriate measurement of observed constructs in the SPSM case. This threat concerns the alignment of the measurement program with the simulation model development. The model variables, such as the verification effectiveness and efficiency, need to be associated with metrics defined in the measurement plan for the software projects under investigation. 
This alignment ensures that every model variable and relationship can be traced to the collected data, and it avoids incorrectly tying a different surrogate metric to a model variable, which risks biasing or hiding contextual information in the output analysis.</span></font></font></p>     <p lang="en-US" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">Apart from the lack of empirical evidence to support the simulation model development, threats to external validity are difficult to identify before output analysis (see <a href="#f3">Figure 3</a>). Of the four threats we identified, three can be identified when analyzing the simulation output. On the other hand, if the model development is broken down into multiple iterations, the model developer can detect which model increment introduces the unexpected behavior. For instance, if the fictional SPSM has been developed over multiple iterations, and in the second iteration the model does not replicate the reference behavior from the organization dataset, the variables and relationships (or their equations) of the second increment are assuming or implementing a wrong construct or relationship.</font></font></p> <h1 lang="es-ES" class="western" align="justify" style="margin-bottom: 0.21cm"> <span style="font-variant: normal"><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US"><b>5 Verification and Validation of Simulation Models</b></span></font></font></span></h1>     <p lang="en-US" class="western" align="justify" style="margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">Among the possible approaches to avoid the threats to validity mentioned in the previous section are the procedures adopted to verify and validate the simulation model and the experimental 
design. This reflects the nature of the computer-based controlled environment, where executing the simulation model enables observation of the phenomenon under investigation. In this setting, the only possible changes are in the input data or the simulation model. Consequently, the validity aspects concentrate on the validity of both the simulation model and the data. Within the scope of this paper, we consider mainly the issues of model validity that affect study validity. In addition, we did not analyze possible interactions among these threats to validity, in the sense that mitigating one threat may give rise to others. However, we believe that threats related to model validity, specifically those that can be mitigated by V&amp;V procedures, do not interact in this way, since these procedures, when performed together, increase the level of validity without affecting the results of applying any one of them. Other kinds of threats, such as those caused by issues in the experimental design or supporting data, may present side effects.</font></font></p>     ]]></body>
<body><![CDATA[<p lang="en-US" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">Since SBS validity is highly affected by the validity of the simulation model, using a model that cannot be considered valid will yield invalid results, regardless of the mitigation actions applied to deal with other possible validity threats. In other words, the simulation model itself represents the main threat to the study validity.</font></font></p>     <p lang="en-US" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">In <a id="br3">[</a><a href="#r3">3</a>], we identified nine verification and validation (V&amp;V) procedures applied to simulation models in the context of SE, in 52 different research papers (included in Appendix A <a id="br3">[</a><a href="#r3">3</a>]). In addition, we merged these procedures with the ones in <a id="br15">[</a><a href="#r15">15</a>], which are twelve V&amp;V procedures often performed for discrete-event simulation models in several domains. In fact, Sargent <a id="br15">[</a><a href="#r15">15</a>] presents fifteen procedures for V&amp;V. However, we understand that three of them are useful instruments to perform verification and validation activities, rather than procedures or techniques. These three concern the use of animations to graphically display the model behavior, operational graphics to present values for the model variables and outputs, and traces of the simulation runs to describe how all variables change in every cycle. This way, <a href="#t1">Table 1</a> presents the merge of the remaining twelve procedures with the ones identified in the systematic literature review. 
The merge process was based on reasoning about the procedures&rsquo; descriptions, and some of them were grouped together.</font></font></p>     <p lang="en-US" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">The procedure &ldquo;Comparison to Other Models&rdquo; was identified in both the review and the list presented in <a id="br15">[</a><a href="#r15">15</a>]. In addition, we merged the software-testing-related procedures into the procedure &ldquo;Testing structure and model behavior&rdquo;, where we grouped &ldquo;Degenerate Tests&rdquo; and &ldquo;Extreme Condition Tests&rdquo; from <a id="br15">[</a><a href="#r15">15</a>].</font></font></p>     <p lang="en-US" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">Face validity is an expert-based evaluation approach. However, it does not have a systematic script or a set of steps. A review, an interview, or even a survey may work in the same way, asking the expert how reasonable the model and its outputs are. </font></font> </p>     <p lang="en-US" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">Most comparisons between simulated and actual data rely on historical or predictive validation. 
Sargent <a id="br15">[</a><a href="#r15">15</a>] also mentions a group called &ldquo;Historical Methods&rdquo;, which is composed of three V&amp;V approaches for simulation models: Rationalism; Empiricism, which &ldquo;requires every assumption and outcome to be empirically validated&rdquo;; and Positive Economics, which &ldquo;requires that the model be able to predict the future, rather than concerned with model&rsquo;s assumptions or causal relationships (mechanism)&rdquo;.</font></font></p>     <p lang="en-US" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">We agree that Rationalism may contribute to the V&amp;V of simulation models. Empiricism, however, has a general description and seems to be just a characteristic or a type of verification, since it can be reworded as the Historical Validation or Predictive Validation procedures, for instance. The same is true of Positive Economics, which is a matter of perspective or abstraction. Finally, Sargent <a id="br15">[</a><a href="#r15">15</a>] also presents the &ldquo;Multistage Validation&rdquo; procedure, which consists of performing the &ldquo;Historical Methods&rdquo;, namely Rationalism, Empiricism, and Positive Economics, sequentially.</font></font></p>     <p lang="es-ES" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">As an example of the application of such V&amp;V procedures, Abdel-Hamid <a id="br21">[</a><a href="#r21">21</a>] submitted his model to several of them. The basis for developing his Software Project Integrated Model, using the System Dynamics (SD) approach, was field interviews with software project managers in five organizations, supplemented by an extensive database of empirical findings from the technical literature. 
Additionally, the author performed tests to verify the fit between the rate/level/feedback structure of the model and the essential characteristics of the dynamics of real software projects. The project managers involved in the study confirmed this fit. However, the paper does not describe the procedures used for the tests and reviews, nor does it report their results. So, one may ask, among other questions, &ldquo;</span><span lang="en-US"><i>What kinds of test were performed? How many discrepancies were identified by the project managers?</i></span>&rdquo;</font></font></p>     <p lang="en-US" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">Another procedure performed is the comparison against reference behaviors. In this case, the author textually and graphically describes the behavior and presents the model representation using System Dynamics diagrams. The reference behavior in this case is the 90% syndrome, where developers tend to miscalculate the required effort for a task and always underestimate it.</font></font></p>     <p lang="en-US" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">In addition, the simulation results in <a id="br21">[</a><a href="#r21">21</a>] were plotted in sequence run charts to compare against the expected behavior. The results seem to indicate a fit between the reference behavior and the simulation results. Reference behaviors reproduced by the model included a diverse set of behavior patterns, both observed in the organizations studied and reported in the literature. 
</font></font> </p>     <p lang="en-US" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">The author also reports extreme condition simulations, i.e., to &ldquo;test whether the model behaves reasonably under extreme conditions or extreme policies&rdquo; <a id="br21">[</a><a href="#r21">21</a>]. </font></font> </p>     ]]></body>
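<body><![CDATA[<p lang="en-US" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">As a hedged illustration of what an extreme condition test may look like (this sketch is not taken from [<a href="#r21">21</a>]), consider a toy schedule model loosely inspired by the fictional SPSM of Section 4.5. The function simulate_schedule, its parameters, and the chosen durations are all hypothetical assumptions. The checks drive the detection probability to 0 and to 1, and the number of tasks to 0, which are settings where the expected output is known without running the model.</font></font></p>
<pre>
```python
import random


def simulate_schedule(n_tasks, detection_prob, base_days=5.0, rework_days=2.0, seed=0):
    """Toy, hypothetical schedule model (not the model from the cited study).

    Each task takes base_days. Its verification activity succeeds (finds a
    defect) with probability detection_prob, which adds rework_days.
    """
    rng = random.Random(seed)  # seeded so that every run is reproducible
    total_days = 0.0
    for _ in range(n_tasks):
        total_days += base_days
        if detection_prob > rng.random():  # rng.random() lies in [0, 1)
            total_days += rework_days
    return total_days


def extreme_condition_checks():
    """Check the model at extremes where the expected output is known."""
    # No tasks at all: the schedule must be empty.
    assert simulate_schedule(0, 0.5) == 0.0
    # Verification never finds a defect: no rework is ever added.
    assert simulate_schedule(100, 0.0) == 100 * 5.0
    # Verification always finds a defect: every task incurs rework.
    assert simulate_schedule(100, 1.0) == 100 * (5.0 + 2.0)
    # Any intermediate probability must stay between the two extremes.
    assert 700.0 >= simulate_schedule(100, 0.5) >= 500.0
    return True


if __name__ == "__main__":
    print(extreme_condition_checks())
```
</pre>
<p lang="en-US" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">A model that fails one of these checks behaves unreasonably at a boundary, which is the kind of evidence this procedure aims to surface. Passing them does not, by itself, establish model validity.</font></font></p>     ]]></body>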
<body><![CDATA[<p lang="en-US" class="western" align="center" style="margin-top: 0.78cm; margin-bottom: 0.18cm; line-height: 0.39cm; page-break-after: avoid"> <font face="Times New Roman, serif"><font size="2" style="font-size: 9pt"><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><b><a name="t1">Table</a> 1:</b></font></font> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span style="font-weight: normal">Verification and Validation Procedures for Simulation Models</span></font></font></font></font>     <br> <img src="/img/revistas/cleiej/v18n1/1a05t1.jpg"> </p>     <p lang="en-US" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">Additionally, the author conducted a case study at NASA. According to him, the DE-A project case study, which was conducted after the model was completely developed, forms an important element in validating model behavior as NASA was not part of the five organizations studied during model development. <a id="br21">[</a><a href="#r21">21</a>]</font></font></p>     <p lang="en-US" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">It is important to note, as also pointed out by the author, that one of these procedures alone may not provide enough validity for this model. 
However, taken together, they can represent a solid group of positive results <a id="br21">[</a><a href="#r21">21</a>].</font></font></p> <h1 lang="es-ES" class="western" align="justify" style="margin-bottom: 0.21cm"> <span style="font-variant: normal"><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US"><b>6 Recommendations for the improvement of Simulation Studies</b></span></font></font></span></h1>     <p lang="en-US" class="western" align="justify" style="margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">We now relate the V&amp;V procedures mentioned in the previous section to the threats to validity identified in the context of SE simulation studies (section 4). The goal of this matching is (1) to explain how performing specific V&amp;V procedures can avoid the different biases imposed by the threats and (2) to highlight that using such procedures cannot avoid all the threats to the validity of simulation studies. From these explanations, we derive some recommendations to guide researchers in planning SBS.</font></font></p>     <p lang="en-US" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">One general threat, not directly related to any specific recommendation given in this section, concerns the lack of evidence regarding model validity, which reduces the findings only to the simulation model. Obviously, one can avoid such a threat by successfully applying a subset of the V&amp;V procedures presented in <a href="#t1">Table 1</a>. 
The main issue is that every attempt to validate a simulation model should be made available to enable a proper output analysis from the experimenter&rsquo;s perspective.</font></font></p>     <p lang="en-US" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">It is possible to divide the V&amp;V procedures presented in the previous section into two perspectives: black box and white box. The Face Validity procedure is the only one in <a href="#t1">Table 1</a> with a white box perspective. This procedure enables the investigation of internal properties and behaviors of a simulation model, rather than treating it as a black box, in which only the combinations of input and output are evaluated. Usually, experts review the simulation model based on their own knowledge, using both conceptual models (cause-effect diagrams, process descriptions, and other notations or languages) and executable models (calibrated models, simulation tools and outputs) to discuss their understanding with the model developers in terms of variables, relationships, and behaviors. Among the expected results are unrealistic model assumptions and simulation scenarios, misfits between concepts and measurements, and unexpected output patterns and behaviors, among others. It can be worthwhile to perform this V&amp;V procedure at two different moments: first during conceptual model development, to avoid bias toward desired results, and later when analyzing the match between input and output values. Thus, threats to construct validity, involving the mechanisms explaining the phenomenon captured by the simulation model, are potentially identifiable by domain experts in advance. 
Examples of such threats are the failure to capture the corresponding real world building blocks and elements, and the inappropriate definition of cause-effect relationships.</font></font></p>     <center> 	<table width="462" cellpadding="4" cellspacing="0"> 		<col width="452"> 		<tr> 			<td width="452" valign="top" style="border: 1px solid #00000a; padding-top: 0.05cm; padding-bottom: 0.05cm; padding-left: 0.11cm; padding-right: 0.1cm"> 				    <p lang="es-ES" class="western" align="justify"><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US"><i><b>Recommendation 1</b></i></span><span lang="en-US"><i>. Make use of Face Validity procedures, involving domain experts, to assess the plausibility of the conceptual and executable models and of the simulation outcomes, using proper diagrams and statistical charts as the respective instruments.</i></span></font></font></p> 			</td> 		</tr> 	</table> </center>     <p lang="es-ES" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">Grounding the model propositions or causal relationships on empirical evidence can also help to mitigate the second threat: it is advisable to have at least one report of empirical evidence regarding the embedded cause-effect relationships, showing some external validity <a id="br26">[</a><a href="#r26">26</a>].</span></font></font></p>     ]]></body>
<body><![CDATA[<p lang="es-ES" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">Ara&uacute;jo </span><span lang="en-US"><i>et al</i></span> <span lang="en-US"><a id="br6">[</a><a href="#r6">6</a>] performed a set of systematic literature reviews aiming at reinforcing the validity of their SD model for observation of software evolution. In that opportunity, the reviews supported the identification of sixty reports of evidence for different relationships among the characteristics (e.g., eight reports of evidence for the relationship between characteristics Complexity and Maintainability) defined in their model.</span></font></font></p>     <center> 	<table width="475" cellpadding="4" cellspacing="0"> 		<col width="465"> 		<tr> 			<td width="465" valign="top" style="border: 1px solid #00000a; padding-top: 0.05cm; padding-bottom: 0.05cm; padding-left: 0.11cm; padding-right: 0.1cm"> 				    <p lang="es-ES" class="western" align="justify"><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US"><i><b>Recommendation 				2.</b></i></span> <span lang="en-US"><i>Try to support model 				(causal) relationships, as much as possible, with empirical 				evidence to reinforce their validity and draw conclusions that 				are more reliable.</i></span></font></font></p> 			</td> 		</tr> 	</table> </center>     <p lang="es-ES" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">Using </span></font></font><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US"><i>Face Validity</i></span></font></font><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="es-ES"> </span></font></font><font face="Verdana, 
sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">in combination with </span></font></font><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US"><i>Sensitivity Analysis</i></span></font></font><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="es-ES"> </span></font></font><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">can assist in the proper selection of the model&rsquo;s input parameters. The values of sensitive parameters should be determined accurately before the simulation model is used. </span></font></font> </p>     <center> 	<table width="475" cellpadding="4" cellspacing="0"> 		<col width="465"> 		<tr> 			<td width="465" valign="top" style="border: 1px solid #00000a; padding-top: 0.05cm; padding-bottom: 0.05cm; padding-left: 0.11cm; padding-right: 0.1cm"> 				    <p lang="es-ES" class="western" align="justify"><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US"><i><b>Recommendation 3.</b></i></span> <span lang="en-US"><i>Use results from Sensitivity Analysis to select valid parameter settings when running simulation experiments, rather than model &ldquo;fishing&rdquo;.</i></span></font></font></p> 			</td> 		</tr> 	</table> </center>     <p lang="es-ES" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">In the same sense, Face Validity can be used along with Rationalism to assess the model&rsquo;s assumptions regarding the underlying concepts. The concern with verifying assumptions tends to make them explicit. However, when the model assumptions are hidden or not clearly stated, Face Validity cannot be applied. 
For these cases, procedures like </span></font></font><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US"><i>Comparison to Reference Behaviors</i></span></font></font><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="es-ES"> </span></font></font><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">and </span></font></font><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US"><i>Testing Structure and Model Behavior</i></span></font></font><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="es-ES"> </span></font></font><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">are more suitable. The baseline or expected behaviors can give insights about how the hidden model assumptions are affecting its results. Additionally, when using these two black box approaches, the design of the validation experiments needs to involve the most sensitive parameters regarding the specific model assumptions. For instance, suppose an SPSM assumes that requirements are always independent of each other (see <a id="br23">[</a><a href="#r23">23</a>] for a concrete example). In this case, validation experiments need to involve scenarios, and consequently input parameters, that enable the experts to observe whether outcomes are similar enough to expected behaviors in which they are confident about the dependency between requirements, so that they can accept such an assumption.</span></font></font></p>     <p lang="en-US" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">The use of Rationalism to develop or verify the simulation model can be hampered by the lack of proven (or assumed to be true) assumptions, particularly in the Software Engineering context. 
Thus, this procedure should be combined with empirical evidence, which is an approach similar to the &ldquo;Historical Methods&rdquo; mentioned by Sargent in <a id="br15">[</a><a href="#r15">15</a>].</font></font></p>     <p lang="en-US" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">In discrete-event models, it is usual to assume theoretical distributions to define how often events are dispatched. This kind of assumption can be tested using the Event Validity procedure to verify whether the event occurrences in the simulation model are similar to those of the real phenomenon, such as events representing defect detection rates, requirements change requests, and others. </font></font> </p>     <center> 	<table width="462" cellpadding="4" cellspacing="0"> 		<col width="452"> 		<tr> 			<td width="452" valign="top" style="border: 1px solid #00000a; padding-top: 0.05cm; padding-bottom: 0.05cm; padding-left: 0.11cm; padding-right: 0.1cm"> 				    ]]></body>
<body><![CDATA[<p lang="es-ES" class="western" align="justify"><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US"><i><b>Recommendation 4.</b></i></span> <span lang="en-US"><i>Always verify model assumptions, so that the results of simulated experiments become more reliable.</i></span></font></font></p> 			</td> 		</tr> 	</table> </center>     <p lang="es-ES" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">From the black box perspective, </span></font></font><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US"><i>Comparison to Reference Behaviors</i></span></font></font><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="es-ES"> </span></font></font><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">can also help to identify situations where simulation results differ from the outcomes of empirical observations. 
However, for those cases in which there is a mismatch between the simulated and empirical outcomes, procedures like </span></font></font><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US"><i>Historical Data Validation</i></span></font></font><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="es-ES"> </span></font></font><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">and </span></font></font><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US"><i>Predictive Validation</i></span></font></font><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="es-ES"> </span></font></font><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">are more suitable, as long as enough data are available and both the simulation output and the empirical data share a common measurement context.</span></font></font></p>     <p lang="en-US" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">Black box approaches may also assist with data validity issues. Often, simulation models have a calibration procedure, and using it inappropriately may cause strange behaviors or invalid results. Turing tests may help in these situations, since the simulated results should resemble the actual ones. If an expert in the phenomenon cannot tell the difference between them, the results have an acceptable degree of confidence. Another possibility is to use other models, instead of experts, as a comparison baseline, through the Comparison to Other Models procedure. 
Specifically for event-driven simulation, the Event Validity procedure can help to improve the data validity of input distributions or pseudo-random variables.</font></font></p>     <center> 	<table width="462" cellpadding="4" cellspacing="0"> 		<col width="452"> 		<tr> 			<td width="452" valign="top" style="border: 1px solid #00000a; padding-top: 0.05cm; padding-bottom: 0.05cm; padding-left: 0.11cm; padding-right: 0.1cm"> 				    <p lang="es-ES" class="western" align="justify"><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US"><i><b>Recommendation 5.</b></i></span> <span lang="en-US"><i>When comparing actual and simulated results, be aware of data validity and make sure that the data under comparison come from the same or similar measurement contexts. </i></span></font></font> 				</p> 			</td> 		</tr> 	</table> </center>     <p lang="es-ES" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">Stochastic simulation models have their particularities, and the main difference to be validated is the amount of internal variation in the outcomes. The threat of considering only one observation when dealing with stochastic simulation, rather than central tendency and dispersion measures, can bias or blind the user or experimenter in the interpretation of results. 
The V&amp;V procedure &ldquo;</span></font></font><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US"><i>Internal Validity</i></span></font></font><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="es-ES">&rdquo; </span></font></font><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">(the term adopted by Sargent <a id="br15">[</a><a href="#r15">15</a>] coincides with a category in the Cook and Campbell classification of threats to validity, as presented in <a id="br14">[</a><a href="#r14">14</a>], but the two have completely different meanings) helps in understanding and measuring the amount of internal variation of stochastic models by running the model several times with the same input configuration and calculating both central tendency and dispersion statistics. The results should be compared to observations of the real phenomenon to understand whether both amounts of variation are proportional. </span></font></font> </p>     <p lang="en-US" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">Performing one procedure or another can bring some validity to the study. Simulation models should be shown to be valid, based on evidence regarding their validity. 
This is important so that the findings are not reduced only to the simulations themselves.</font></font></p>     <center> 	<table width="475" cellpadding="4" cellspacing="0"> 		<col width="465"> 		<tr> 			<td width="465" valign="top" style="border: 1px solid #00000a; padding-top: 0.05cm; padding-bottom: 0.05cm; padding-left: 0.11cm; padding-right: 0.1cm"> 				    <p lang="es-ES" class="western" align="justify"><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US"><i><b>Recommendation 6.</b></i></span> <span lang="en-US"><i>Make use of proper statistical tests and charts to analyze outcomes from several runs, to compare them to actual data, and to quantify the amount of internal variation embedded in the (stochastic) simulation model, augmenting the precision of results. </i></span></font></font> 				</p> 			</td> 		</tr> 	</table> </center>     <p lang="es-ES" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">Having established that V&amp;V procedures may help to perform more trustworthy simulation studies, it should also be pointed out that they are not silver bullets. We can still mention a series of threats that do not directly relate to such procedures, but rather to the experimental design adopted for the study and to the output analysis procedures and instruments. Examples are threats regarding conclusion validity, such as considering only one observation when dealing with stochastic simulation and not using proper statistics when comparing simulated to empirical distributions (already considered in Recommendation 6). 
To mitigate threats like these, the experimenter needs a clear understanding of what to observe in the outcomes and of the statistical instruments available to perform such analysis, since single values can capture neither the real trends and variance in stochastic simulations nor the difference between actual data and simulations.</span></font></font></p>     ]]></body>
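<body><![CDATA[<p lang="en-US" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">As a minimal sketch of what analyzing several runs could look like, the Python fragment below replicates a stochastic model with the same input configuration and reports central tendency and dispersion measures instead of a single observation. The toy model (simulate_defects_found), its parameters and the helper names are hypothetical, introduced only for illustration; they are not taken from any of the reviewed studies.</font></font></p>

```python
import random
import statistics


def simulate_defects_found(n_reviews, detection_prob, defects_per_review, rng):
    """Toy stochastic model (hypothetical): defects detected over n_reviews."""
    found = 0
    for _ in range(n_reviews):
        for _ in range(defects_per_review):
            # Each seeded defect is detected with probability detection_prob.
            if detection_prob > rng.random():
                found += 1
    return found


def replicate(n_runs, seed, **config):
    """Run the model n_runs times with the same input configuration."""
    rng = random.Random(seed)  # fixed seed keeps the experiment repeatable
    return [simulate_defects_found(rng=rng, **config) for _ in range(n_runs)]


def summarize(outcomes):
    """Central tendency and dispersion of the replicated outcomes."""
    mean = statistics.mean(outcomes)
    stdev = statistics.stdev(outcomes)  # sample standard deviation
    return {
        "runs": len(outcomes),
        "mean": mean,
        "median": statistics.median(outcomes),
        "stdev": stdev,
        # Coefficient of variation: relative amount of internal variation.
        "cv": stdev / mean if mean else float("nan"),
        "min": min(outcomes),
        "max": max(outcomes),
    }


if __name__ == "__main__":
    # Assumed input configuration, used for every run.
    config = {"n_reviews": 20, "detection_prob": 0.6, "defects_per_review": 5}
    outcomes = replicate(n_runs=30, seed=42, **config)
    for name, value in summarize(outcomes).items():
        print(name, round(value, 3))
```

<p lang="en-US" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">In an actual study, such a summary of the simulated outcomes would be compared with the same statistics computed from empirical observations, using a statistical test suited to the distributions involved.</font></font></p>     ]]></body>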
<body><![CDATA[<center> 	<table width="462" cellpadding="4" cellspacing="0"> 		<col width="452"> 		<tr> 			<td width="452" valign="top" style="border: 1px solid #00000a; padding-top: 0.05cm; padding-bottom: 0.05cm; padding-left: 0.11cm; padding-right: 0.1cm"> 				    <p lang="es-ES" class="western" align="justify"><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US"><i><b>Recommendation 7.</b></i></span> <span lang="en-US"><i>When designing the simulation experiment, consider as factors (and levels) not only the simulation model&rsquo;s input parameters, but also internal parameters, different sample datasets and versions of the simulation model, implementing alternative strategies to be evaluated. </i></span></font></font> 				</p> 			</td> 		</tr> 	</table> </center>     <p lang="es-ES" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">Additionally, threats to external validity, such as simulation results being context-dependent (since there is a need for calibration) and the possibility of not generalizing the results to other simulations of the same phenomena, are other examples of threats not handled by V&amp;V procedures. In such cases, the experimental design should provide scenarios exploring different situations, to check whether a behavior is consistent across different contexts, through different datasets in which it is possible to observe the phenomenon under investigation and through scenarios using balanced combinations of factors and levels. 
</span></font></font> </p>     <p lang="en-US" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">Still other threats cannot be mitigated using V&amp;V procedures, but rather by carefully planning the simulation experiments, tying the goals to the research questions and to the design, and also verifying the feasibility of adopting simulation as an alternative support for experimentation. This is the case for threats such as: missing factors; different datasets (contexts) for model calibration and experimentation; (unfair) comparison of naturally different treatments; and inappropriate use of simulation.</font></font></p>     <p lang="en-US" class="western" align="center" style="margin-top: 0.78cm; margin-bottom: 0.18cm; line-height: 0.39cm; page-break-after: avoid"> <font face="Times New Roman, serif"><font size="2" style="font-size: 9pt"><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><b><a name="t2">Table</a> 2: </b></font></font><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span style="font-weight: normal">Threats to validity associated with each recommendation</span></font></font> </font></font>     <br> <img src="/img/revistas/cleiej/v18n1/1a05t2.jpg"> </p>     <p lang="es-ES" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">Finally, there is a recurrent threat to internal validity that is hard to identify: the simulation model simplifications (assumptions) forcing desired outcomes, rather than producing them based on the determination of proper scenarios and explained by a causal chain of events and actions. 
This constitutes a threat to internal validity since it reflects the model developer embedding the desired behavior into the simulation model, not allowing different results to occur when different scenarios are set. In other words, there is no way of assuring that the treatment (represented by the input parameters) is really causing the outcomes. It is like knowing the answer to the research questions before running the simulation model and having no explanation, from the simulation results, for why that behavior was observed. From the viewpoint of simulation outputs compared to empirical observations, this does not appear to be a threat: when all empirical and simulated values are statistically similar, everything seems to be perfect. The problem lies in such a limited black box view. The reason for reaching the desired output cannot be explained by a reasonable causal model or mechanism, but only by an explicit generation of the output variables from the input parameters. Therefore, one is not able to explain how to get such outcomes in real life, since there is no mechanism for a theoretical explanation. In summary, there is no way of making interventions to reproduce such behavior in the real world, because the reasoning is missing and the result has probably occurred by chance. Comparison-based procedures cannot capture this type of threat. Only white box procedures like </span><span lang="en-US"><i>Face Validity</i></span> <span lang="en-US">involving simulation experts may help to identify such a threat.</span></font></font></p>     <p lang="en-US" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">The adoption of all these recommendations has an impact on the effort and costs of the simulation study. 
This becomes clear when realizing that the hours spent with domain experts in meetings for model reviews, the effort demanded by some data collection procedures (for example, in Predictive Validation), and the gathering of evidence to reinforce causal relationships may be excessive. Thus, what is important is that both the goals of the simulation study and the expected benefits should drive the balance between the effort for mitigating threats to validity and the risk of not doing so. Such benefits may be expressed in terms of the meaningfulness (quality) of results and conclusions or the amount of risk involved in taking actions to implement the results.</font></font></p>     <p lang="en-US" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">Although they are discussed throughout this section, we also present the associations among threats to validity, recommendations on how to deal with them, and V&amp;V procedures (<a href="#t2">Table 2</a>). 
In addition, it is relevant to highlight that V&amp;V procedures cannot mitigate four threats, since they are related to other planning issues such as simulation feasibility, data collection and experimental design definition.</font></font></p> <h1 lang="es-ES" class="western" align="justify" style="margin-bottom: 0.21cm"> <span style="font-variant: normal"><font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US"><b>7 Final Remarks</b></span></font></font></span></h1>     <p lang="es-ES" class="western" align="justify" style="margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">Taking simulation as a complementary research strategy for the evolution of Software Engineering knowledge, mainly in contexts where </span><span lang="en-US"><i>in vivo</i></span> <span lang="en-US">or </span><span lang="en-US"><i>in vitro</i></span> <span lang="en-US">experiments are unfeasible or risky, researchers should be aware of the possible threats involved in this sort of study. The results reported in this paper advance the current state of ESE by exposing such threats to SBS validity and matching them to V&amp;V procedures for simulation models. Besides, seven recommendations, all of them grounded in data acquired from the technical literature, emerged to support the planning of tasks intended to reduce the possibility of threats to validity occurring. </span></font></font> </p>     ]]></body>
<body><![CDATA[<p lang="es-ES" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">We believe that the identification and compilation of such threats, complemented by their discussion and analysis, offer an evolved perspective that can contribute to the maturity of SBS, where most of the time the main tasks have been performed </span><span lang="en-US"><i>ad hoc</i></span> <span lang="en-US">due to the lack of guidance, especially regarding model experimentation. Additionally, the possibility of detecting some of these threats by using V&amp;V procedures, the understanding of how to avoid them, and the presentation of a set of recommendations constitute an interesting contribution. As far as we are aware, there is no other work offering this sort of discussion in the experimental software engineering technical literature. </span></font></font> </p>     <p lang="en-US" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">The organization, through secondary studies, of the knowledge available in the technical literature regarding simulation studies in SE has directed our efforts. This organization involves synthesis and knowledge representation as guidelines for the planning and reporting of SBS, which is not a simple task. </font></font> </p>     <p lang="es-ES" class="western" align="justify" style="text-indent: 0.64cm; margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="en-US">As future directions, we are investigating how </span><span lang="en-US"><i>Design of Experiments</i></span> <span lang="en-US">can contribute to improving the quality of, and confidence in, simulation-based studies in SE. 
We pursue this not only from the perspective presented by <a id="br28">[</a><a href="#r28">28</a>] and <a id="br33">[</a><a href="#r33">33</a>], but also as an enabler to explore more ambitious results than just anticipating </span><span lang="en-US"><i>in vitro</i></span> <span lang="en-US">and </span><span lang="en-US"><i>in vivo</i></span> <span lang="en-US">experiments.</span></font></font></p>     <p lang="en-US" class="western" align="left" style="margin-right: 0.02cm; margin-top: 0.42cm; margin-bottom: 0.21cm; line-height: 100%; page-break-inside: avoid; orphans: 0; widows: 0; page-break-after: avoid"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><b>Acknowledgements</b></font></font></p>     <p lang="en-US" class="western" align="justify" style="margin-bottom: 0.5cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt">The authors thank CNPq (Grants 141152/2010-9 and 304795/2010-0) for supporting this research.</font></font></p>     <p lang="en-US" class="western" align="left" style="margin-right: 0.02cm; margin-top: 0.42cm; margin-bottom: 0.21cm; line-height: 100%; page-break-inside: avoid; orphans: 0; widows: 0; page-break-after: avoid"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><b>References</b></font></font></p>     <!-- ref --><p lang="es-ES" class="western" align="left" style="margin-left: 0.5cm; text-indent: -0.5cm; margin-bottom: 0.1cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><a id="r1">[</a><a href="#br1">1</a>] S. Thomke, Experimentation Matters: Unlocking the Potential of New Technologies for Innovation. Harvard Business School Press, Boston, 2003.    </font></font></p>     <p lang="es-ES" class="western" align="left" style="margin-left: 0.5cm; text-indent: -0.5cm; margin-bottom: 0.1cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><a id="r2">[</a><a href="#br2">2</a>] J. E. 
Eck and L. Liu, &ldquo;Contrasting simulated and empirical experiments in crime prevention,&rdquo; J Exp Criminol, vol. 4, pp. 195-213, 2008.</font></font></p>     <p lang="es-ES" class="western" align="left" style="margin-left: 0.5cm; text-indent: -0.5cm; margin-bottom: 0.1cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><a id="r3">[</a><a href="#br3">3</a>] B. B. N. de Fran&ccedil;a and G. H. Travassos, &ldquo;Are We Prepared for Simulation Based Studies in Software Engineering Yet?&rdquo; CLEI electronic journal, vol. 16, no. 1, paper 8, Apr. 2013.</font></font></p>     ]]></body>
<body><![CDATA[<p lang="es-ES" class="western" align="left" style="margin-left: 0.5cm; text-indent: -0.5cm; margin-bottom: 0.1cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><a id="r4">[</a><a href="#br4">4</a>] B. B. N. de Fran&ccedil;a and G. H. Travassos, &ldquo;Reporting guidelines for simulation-based studies in software engineering,&rdquo; in Proc. 16th International Conference on Evaluation &amp; Assessment in Software Engineering, pp. 156 &ndash; 160, Ciudad Real, Spain, 2012.</font></font></p>     <p lang="es-ES" class="western" align="left" style="margin-left: 0.5cm; text-indent: -0.5cm; margin-bottom: 0.1cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><a id="r5">[</a><a href="#br5">5</a>] B. A. Kitchenham, H. Al-Kilidar, M. Ali Babar, M. Berry, K. Cox, J. Keung, F. Kurniawati, M. Staples, H. Zhang, L. Zhu, &ldquo;Evaluating guidelines for reporting empirical software engineering studies,&rdquo; Empirical Software Engineering, vol.13, no. 1, pp. 97-121. 2008.</font></font></p>     <p lang="es-ES" class="western" align="left" style="margin-left: 0.5cm; text-indent: -0.5cm; margin-bottom: 0.1cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><a id="r6">[</a><a href="#br6">6</a>] M. A. P. Ara&uacute;jo, V. F. Monteiro, G. H. Travassos, &ldquo;Towards a model to support in silico studies of software evolution,&rdquo; in Proc. of the ACM-IEEE international symposium on empirical software engineering and measurement, pp. 281-290, New York, 2012.</font></font></p>     <p lang="es-ES" class="western" align="left" style="margin-left: 0.5cm; text-indent: -0.5cm; margin-bottom: 0.1cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><a id="r7">[</a><a href="#br7">7</a>] M. O. Barros, C. M. L. Werner, G. H. 
Travassos, &ldquo;A system dynamics metamodel for software process modeling,&rdquo; Software Process: Improvement and Practice, vol. 7, no. 3-4, pp. 161-172, 2002.</font></font></p>     <p lang="es-ES" class="western" align="left" style="margin-left: 0.5cm; text-indent: -0.5cm; margin-bottom: 0.1cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="es-ES"><a id="r8">[</a><a href="#br8">8</a>] J. Biolchini, P. G.Mian, A. C. Natali, G. H. Travassos, &ldquo;Systematic Review in Software Engineering: Relevance and Utility,&rdquo; PESC-COPPE/UFRJ, Brazil. Tech. Rep. <a class="western" href="http://www.cos.ufrj.br/uploadfiles/es67905.pdf">http://www.cos.ufrj.br/uploadfiles/es67905.pdf</a>. 2005.</span></font></font></p>     <p lang="es-ES" class="western" align="left" style="margin-left: 0.5cm; text-indent: -0.5cm; margin-bottom: 0.1cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><a id="r9">[</a><a href="#br9">9</a>] M. Pai, M. McCulloch, J. D. Gorman, &ldquo;Systematic Reviews and meta-analyses: An illustrated, step-by-step guide,&rdquo; The National Medical Journal of India, vol. 17, n.2, 2004.</font></font></p>     <!-- ref --><p lang="es-ES" class="western" align="left" style="margin-left: 0.5cm; text-indent: -0.5cm; margin-bottom: 0.1cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="es-ES"><a id="r10">[</a><a href="#br10">10</a>] JabRef reference manager, available at: <a class="western" href="http://jabref.sourceforge.net/">http://jabref.sourceforge.net</a>.    </span></font></font></p>     <p lang="es-ES" class="western" align="left" style="margin-left: 0.5cm; text-indent: -0.5cm; margin-bottom: 0.1cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><a id="r11">[</a><a href="#br11">11</a>] R. Ahmed, T. Hall, P. Wernick, S. Robinson, M. 
Shah, &ldquo;Software process simulation modelling: A survey of practice,&rdquo; Journal of Simulation, vol. 2, pp. 91 &ndash; 102, 2008.</font></font></p>     <p lang="es-ES" class="western" align="left" style="margin-left: 0.5cm; text-indent: -0.5cm; margin-bottom: 0.1cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><a id="r12">[</a><a href="#br12">12</a>] G. H. Travassos and M. O. Barros, &ldquo;Contributions of In Virtuo and In Silico Experiments for the Future of Empirical Studies in Software Engineering&rdquo;, in Proc. WSESE03, Fraunhofer IRB Verlag, Rome, 2003.</font></font></p>     ]]></body>
<body><![CDATA[<!-- ref --><p lang="es-ES" class="western" align="left" style="margin-left: 0.5cm; text-indent: -0.5cm; margin-bottom: 0.1cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><a id="r13">[</a><a href="#br13">13</a>] J. Corbin and A. Strauss, Basics of qualitative research: Techniques and procedures for developing grounded theory. Sage Publications. 2007.    </font></font></p>     <!-- ref --><p lang="es-ES" class="western" align="left" style="margin-left: 0.5cm; text-indent: -0.5cm; margin-bottom: 0.1cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><a id="r14">[</a><a href="#br14">14</a>] C. Wohlin, P. Runeson, M. Host, M. Ohlsson, B. Regnell, A. Wesslen,&nbsp;Experimentation in Software Engineering: An Introduction. Kluwer Academic Publishers. 2000.    </font></font></p>     <p lang="es-ES" class="western" align="left" style="margin-left: 0.5cm; text-indent: -0.5cm; margin-bottom: 0.1cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><a id="r15">[</a><a href="#br15">15</a>] R. G. Sargent, &ldquo;Verification and Validation of Simulation Models,&rdquo; in Proc. of the Winter Simulation Conference, pp. 166 &ndash; 183, Baltimore. 2010.</font></font></p>     <p lang="es-ES" class="western" align="left" style="margin-left: 0.5cm; text-indent: -0.5cm; margin-bottom: 0.1cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><a id="r16">[</a><a href="#br16">16</a>] D. Pfahl, M. Klemm, G. Ruhe, &ldquo;A CBT module with integrated simulation component for software project management education and training,&rdquo; Journal of Systems and Software, vol. 59, n. 3, pp. 283 &ndash; 298. 
2001.</font></font></p>     <p lang="es-ES" class="western" align="left" style="margin-left: 0.5cm; text-indent: -0.5cm; margin-bottom: 0.1cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><a id="r17">[</a><a href="#br17">17</a>] D. Pfahl, O. Laitenberger, J. Dorsch, G. Ruhe, &ldquo;An Externally Replicated Experiment for Evaluating the Learning Effectiveness of Using Simulations in Software Project Management Education,&rdquo; Empirical Software Engineering, vol. 8, n. 4, pp. 367 &ndash; 395. 2003.</font></font></p>     <p lang="es-ES" class="western" align="left" style="margin-left: 0.5cm; text-indent: -0.5cm; margin-bottom: 0.1cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><a id="r18">[</a><a href="#br18">18</a>] D. Rodr&iacute;guez, M. &Aacute;. Sicilia, J. Cuadrado-Gallego, D. Pfahl, &ldquo;e-learning in project management using simulation models: A case study based on the replication of an experiment,&rdquo; IEEE Transactions on Education, vol. 49, n. 4, pp. 451&ndash;463. 2006.</font></font></p>     <p lang="es-ES" class="western" align="left" style="margin-left: 0.5cm; text-indent: -0.5cm; margin-bottom: 0.1cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><a id="r19">[</a><a href="#br19">19</a>] D. Pfahl, O. Laitenberger, G. Ruhe, J. Dorsch, T. Krivobokova, &ldquo;Evaluating the learning effectiveness of using simulations in software project management education: Results from a twice replicated experiment,&rdquo;. Infor. and Software Technology, vol. 46, n. 2, pp. 127&ndash;147. 2004.</font></font></p>     <p lang="es-ES" class="western" align="left" style="margin-left: 0.5cm; text-indent: -0.5cm; margin-bottom: 0.1cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><a id="r20">[</a><a href="#br20">20</a>] V. Garousi, K. Khosrovian, D. 
Pfahl, &ldquo;A customizable pattern-based software process simulation model: Design, calibration and application,&rdquo; Software Process Improvement and Practice, vol. 14, n. 3, pp. 165 &ndash; 180. 2009.</font></font></p>     ]]></body>
<body><![CDATA[<p lang="es-ES" class="western" align="left" style="margin-left: 0.5cm; text-indent: -0.5cm; margin-bottom: 0.1cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><a id="r21">[</a><a href="#br21">21</a>] Abdel-Hamid, T, &ldquo;Understanding the &ldquo;90% syndrome&rdquo; in software project management: A simulation-based case study,&rdquo; Journal of Systems and Software, vol. 8, pp. 319-330. 1988.</font></font></p>     <p lang="es-ES" class="western" align="left" style="margin-left: 0.5cm; text-indent: -0.5cm; margin-bottom: 0.1cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><a id="r22">[</a><a href="#br22">22</a>] T. Thelin, H. Petersson, P. Runeson, C. Wohlin, &ldquo;Applying sampling to improve software inspections,&rdquo; Journal of Systems and Software, vol. 73, n. 2, pp. 257 &ndash; 269. 2004.</font></font></p>     <p lang="es-ES" class="western" align="left" style="margin-left: 0.5cm; text-indent: -0.5cm; margin-bottom: 0.1cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><a id="r23">[</a><a href="#br23">23</a>] M. Melis, I. Turnu, A. Cau, G. Concas, &ldquo;Evaluating the impact of test-first programming and pair programming through software process simulation,&rdquo; Software Process Improvement and Practice, vol. 11, pp. 345 &ndash; 360. 2006.</font></font></p>     <p lang="es-ES" class="western" align="left" style="margin-left: 0.5cm; text-indent: -0.5cm; margin-bottom: 0.1cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><a id="r24">[</a><a href="#br24">24</a>] I. Turnu, M. Melis, A. Cau, A. Setzu, G. Concas, K. Mannaro, &ldquo;Modeling and simulation of open source development using an agile practice,&rdquo; Journal of Systems Architecture, vol. 52, n. 11, pp. 
610 &ndash; 618, 2006.</font></font></p>     <p lang="es-ES" class="western" align="left" style="margin-left: 0.5cm; text-indent: -0.5cm; margin-bottom: 0.1cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><a id="r25">[</a><a href="#br25">25</a>] F. Alvarez, G. A. Cristian, &ldquo;Applying simulation to the design and performance evaluation of fault-tolerant systems,&rdquo; in Proc. of the IEEE Symposium on Reliable Distributed Systems, pp. 35&ndash;42, Durham, 1997.</font></font></p>     <!-- ref --><p lang="es-ES" class="western" align="left" style="margin-left: 0.5cm; text-indent: -0.5cm; margin-bottom: 0.1cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><a id="r26">[</a><a href="#br26">26</a>] Davis, J. P.; Eisenhardt, K. M.; Bingham, C. B.: Developing Theory Through Simulation Methods. Academy of Management Review, vol. 32, n. 2, pp. 480-499. 2007.    </font></font></p>     <p lang="es-ES" class="western" align="left" style="margin-left: 0.5cm; text-indent: -0.5cm; margin-bottom: 0.1cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><a id="r27">[</a><a href="#br27">27</a>] B. Stopford, S. Counsell, &ldquo;A Framework for the Simulation of Structural Software Evolution,&rdquo; ACM Transactions on Modeling and Computer Simulation, vol. 18, 2008.</font></font></p>     <p lang="es-ES" class="western" align="left" style="margin-left: 0.5cm; text-indent: -0.5cm; margin-bottom: 0.1cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><a id="r28">[</a><a href="#br28">28</a>] D. X. Houston, S. Ferreira, J. S. Collofello, D. C. Montgomery, G. T. Mackulak, D. L. Shunk, &ldquo;Behavioral characterization: Finding and using the influential factors in software process simulation models,&rdquo; Journal of Systems and Software, vol. 59, pp. 
259-270, 2001.</font></font></p>     <p lang="es-ES" class="western" align="left" style="margin-left: 0.5cm; text-indent: -0.5cm; margin-bottom: 0.1cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><a id="r29">[</a><a href="#br29">29</a>] H. Rahmandad, D. M. Weiss, &ldquo;Dynamics of concurrent software development,&rdquo; System Dynamics Review, vol. 25, n. 3, pp. 224&ndash;249, 2009.</font></font></p>     ]]></body>
<body><![CDATA[<p lang="es-ES" class="western" align="left" style="margin-left: 0.5cm; text-indent: -0.5cm; margin-bottom: 0.1cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><a id="r30">[</a><a href="#br30">30</a>] O. Balci, &ldquo;Guidelines for successful simulation studies,&rdquo; in Proc. Winter Simulation Conference, pp. 25-32, 1990.</font></font></p>     <p lang="es-ES" class="western" align="left" style="margin-left: 0.5cm; text-indent: -0.5cm; margin-bottom: 0.1cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><a id="r31">[</a><a href="#br31">31</a>] C. Alexopoulos, &ldquo;Statistical analysis of simulation output: State of the art,&rdquo; in Proc. Winter Simulation Conference , pp. 150 &ndash; 161, Dec. 2007. DOI = 10.1109/WSC.2007.4419597.</font></font></p>     <p lang="es-ES" class="western" align="left" style="margin-left: 0.5cm; text-indent: -0.5cm; margin-bottom: 0.1cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><span lang="es-ES"><a id="r32">[</a><a href="#br32">32</a>] J. P. C. Kleijnen, S. M. Sanchez, T. W. Lucas, T. M. Cioppa, &ldquo;State-of-the-Art Review: A User&rsquo;s Guide to the Brave New World of Designing Simulation Experiments,&rdquo; INFORMS Journal on Computing, vol. 17, n. 3, pp. 263-289, 2005. <a class="western" href="http://dx.doi.org/10.1287/ijoc.1050.0136">http://dx.doi.org/10.1287/ijoc.1050.0136</a> </span></font></font> </p>     <p lang="es-ES" class="western" align="left" style="margin-left: 0.5cm; text-indent: -0.5cm; margin-bottom: 0.1cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><a id="r33">[</a><a href="#br33">33</a>] W. W. Wakeland, R. H. Martin, D. 
Raffo, &ldquo;Using Design of Experiments, sensitivity analysis, and hybrid simulation to evaluate changes to a software development process: A case study,&rdquo; Software Process Improvement and Practice, vol. 9, pp. 107&ndash;119, 2004.</font></font></p>     <p lang="es-ES" class="western" align="left" style="margin-left: 0.5cm; text-indent: -0.5cm; margin-bottom: 0.1cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><a id="r34">[</a><a href="#br34">34</a>] G. H. Travassos, P. S. M. dos Santos, P. G. M. Neto, J. Biolchini, &ldquo;An environment to support large scale experimentation in software engineering,&rdquo; in Proc. of the 13th IEEE International Conference on Engineering of Complex Computer Systems, pp. 193-202, Mar. 2008.</font></font></p>     <p lang="es-ES" class="western" align="left" style="margin-left: 0.5cm; text-indent: -0.5cm; margin-bottom: 0.1cm; line-height: 100%"> <font face="Verdana, sans-serif"><font size="2" style="font-size: 10pt"><a id="r35">[</a><a href="#br35">35</a>] B. A. Kitchenham, T. Dyba, M. J&oslash;rgensen, &ldquo;Evidence-Based Software Engineering,&rdquo; in Proc. 26th ICSE, pp. 273-281, May 2004.</font></font></p>      ]]></body><back>
<ref-list>
<ref id="B1">
<label>1</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Thomke]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
</person-group>
<source><![CDATA[Experimentation Matters: Unlocking the Potential of New Technologies for Innovation]]></source>
<year>2003</year>
<publisher-loc><![CDATA[Boston ]]></publisher-loc>
<publisher-name><![CDATA[Harvard Business School Press]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B2">
<label>2</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Eck]]></surname>
<given-names><![CDATA[J. E.]]></given-names>
</name>
<name>
<surname><![CDATA[Liu]]></surname>
<given-names><![CDATA[L]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Contrasting simulated and empirical experiments in crime prevention]]></article-title>
<source><![CDATA[J Exp Criminol]]></source>
<year>2008</year>
<volume>4</volume>
<page-range>195-213</page-range></nlm-citation>
</ref>
<ref id="B3">
<label>3</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[de França]]></surname>
<given-names><![CDATA[B. B. N]]></given-names>
</name>
<name>
<surname><![CDATA[Travassos]]></surname>
<given-names><![CDATA[G. H.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Are We Prepared for Simulation Based Studies in Software Engineering Yet?]]></article-title>
<source><![CDATA[CLEI electronic journal]]></source>
<year>2013</year>
<volume>16</volume>
<numero>1</numero>
<issue>1</issue>
</nlm-citation>
</ref>
<ref id="B4">
<label>4</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[de França]]></surname>
<given-names><![CDATA[B. B. N.]]></given-names>
</name>
<name>
<surname><![CDATA[Travassos]]></surname>
<given-names><![CDATA[G. H.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Reporting guidelines for simulation-based studies in software engineering]]></article-title>
<source><![CDATA[]]></source>
<year></year>
<conf-name><![CDATA[ Proc. 16th International Conference on Evaluation and Assessment in Software Engineering]]></conf-name>
<conf-date>2012</conf-date>
<conf-loc> </conf-loc>
</nlm-citation>
</ref>
<ref id="B5">
<label>5</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Kitchenham]]></surname>
<given-names><![CDATA[B. A.]]></given-names>
</name>
<name>
<surname><![CDATA[Al-Kilidar]]></surname>
<given-names><![CDATA[H]]></given-names>
</name>
<name>
<surname><![CDATA[Ali Babar]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
<name>
<surname><![CDATA[Berry]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
<name>
<surname><![CDATA[Cox]]></surname>
<given-names><![CDATA[K]]></given-names>
</name>
<name>
<surname><![CDATA[Keung]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
<name>
<surname><![CDATA[Kurniawati]]></surname>
<given-names><![CDATA[F]]></given-names>
</name>
<name>
<surname><![CDATA[Staples]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
<name>
<surname><![CDATA[Zhang]]></surname>
<given-names><![CDATA[H]]></given-names>
</name>
<name>
<surname><![CDATA[Zhu]]></surname>
<given-names><![CDATA[L]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Evaluating guidelines for reporting empirical software engineering studies]]></article-title>
<source><![CDATA[Empirical Software Engineering]]></source>
<year>2008</year>
<volume>13</volume>
<numero>1</numero>
<issue>1</issue>
<page-range>97-121</page-range></nlm-citation>
</ref>
<ref id="B6">
<label>6</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Araújo]]></surname>
<given-names><![CDATA[M. A. P.]]></given-names>
</name>
<name>
<surname><![CDATA[Monteiro]]></surname>
<given-names><![CDATA[V. F.]]></given-names>
</name>
<name>
<surname><![CDATA[Travassos]]></surname>
<given-names><![CDATA[G. H.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Towards a model to support in silico studies of software evolution]]></article-title>
<source><![CDATA[]]></source>
<year></year>
<conf-name><![CDATA[ Proc. of the ACM-IEEE international symposium on empirical software engineering and measurement]]></conf-name>
<conf-date>2012</conf-date>
<conf-loc>New York </conf-loc>
</nlm-citation>
</ref>
<ref id="B7">
<label>7</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Barros]]></surname>
<given-names><![CDATA[M. O.]]></given-names>
</name>
<name>
<surname><![CDATA[Werner]]></surname>
<given-names><![CDATA[C. M. L.]]></given-names>
</name>
<name>
<surname><![CDATA[Travassos]]></surname>
<given-names><![CDATA[G. H.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[A system dynamics metamodel for software process modeling]]></article-title>
<source><![CDATA[Software Process: Improvement and Practice]]></source>
<year>2002</year>
<volume>7</volume>
<numero>3-4</numero>
<issue>3-4</issue>
<page-range>161-172</page-range></nlm-citation>
</ref>
<ref id="B8">
<label>8</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Biolchini]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
<name>
<surname><![CDATA[Mian]]></surname>
<given-names><![CDATA[P. G.]]></given-names>
</name>
<name>
<surname><![CDATA[Natali]]></surname>
<given-names><![CDATA[A. C.]]></given-names>
</name>
<name>
<surname><![CDATA[Travassos]]></surname>
<given-names><![CDATA[G. H.]]></given-names>
</name>
</person-group>
<source><![CDATA[Systematic Review in Software Engineering: Relevance and Utility]]></source>
<year>2005</year>
<publisher-name><![CDATA[PESC-COPPE/UFRJ]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B9">
<label>9</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Pai]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
<name>
<surname><![CDATA[McCulloch]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Gorman]]></surname>
<given-names><![CDATA[J. D.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Systematic Reviews and meta-analyses: An illustrated, step-by-step guide]]></article-title>
<source><![CDATA[The National Medical Journal of India]]></source>
<year>2004</year>
<volume>17</volume>
<numero>2</numero>
<issue>2</issue>
</nlm-citation>
</ref>
<ref id="B10">
<label>10</label><nlm-citation citation-type="">
<source><![CDATA[JabRef reference manager, available at]]></source>
<year></year>
</nlm-citation>
</ref>
<ref id="B11">
<label>11</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Ahmed]]></surname>
<given-names><![CDATA[R]]></given-names>
</name>
<name>
<surname><![CDATA[Hall]]></surname>
<given-names><![CDATA[T]]></given-names>
</name>
<name>
<surname><![CDATA[Wernick]]></surname>
<given-names><![CDATA[P]]></given-names>
</name>
<name>
<surname><![CDATA[Robinson]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
<name>
<surname><![CDATA[Shah]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Software process simulation modelling: A survey of practice]]></article-title>
<source><![CDATA[Journal of Simulation]]></source>
<year>2008</year>
<volume>2</volume>
<page-range>91-102</page-range></nlm-citation>
</ref>
<ref id="B12">
<label>12</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Travassos]]></surname>
<given-names><![CDATA[G. H.]]></given-names>
</name>
<name>
<surname><![CDATA[Barros]]></surname>
<given-names><![CDATA[M. O.]]></given-names>
</name>
</person-group>
<source><![CDATA[Contributions of In Virtuo and In Silico Experiments for the Future of Empirical Studies in Software Engineering]]></source>
<year>2003</year>
<publisher-loc><![CDATA[Rome ]]></publisher-loc>
<publisher-name><![CDATA[Verlag]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B13">
<label>13</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Corbin]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
<name>
<surname><![CDATA[Strauss]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
</person-group>
<source><![CDATA[Basics of qualitative research: Techniques and procedures for developing grounded theory]]></source>
<year>2007</year>
<publisher-name><![CDATA[Sage Publications]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B14">
<label>14</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Wohlin]]></surname>
<given-names><![CDATA[C]]></given-names>
</name>
<name>
<surname><![CDATA[Runeson]]></surname>
<given-names><![CDATA[P]]></given-names>
</name>
<name>
<surname><![CDATA[Host]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
<name>
<surname><![CDATA[Ohlsson]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
<name>
<surname><![CDATA[Regnell]]></surname>
<given-names><![CDATA[B]]></given-names>
</name>
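<name>
<surname><![CDATA[Wesslen]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
</person-group>
<source><![CDATA[Experimentation in Software Engineering: An Introduction]]></source>
<year>2000</year>
<publisher-name><![CDATA[Kluwer Academic Publishers]]></publisher-name>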
</nlm-citation>
</ref>
<ref id="B15">
<label>15</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Sargent]]></surname>
<given-names><![CDATA[R. G.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Verification and Validation of Simulation Models]]></article-title>
<source><![CDATA[]]></source>
<year></year>
<conf-name><![CDATA[ Proc. of the Winter Simulation Conference]]></conf-name>
<conf-date>2010</conf-date>
<conf-loc>Baltimore </conf-loc>
</nlm-citation>
</ref>
<ref id="B16">
<label>16</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Pfahl]]></surname>
<given-names><![CDATA[D]]></given-names>
</name>
<name>
<surname><![CDATA[Klemm]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
<name>
<surname><![CDATA[Ruhe]]></surname>
<given-names><![CDATA[G]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[A CBT module with integrated simulation component for software project management education and training]]></article-title>
<source><![CDATA[Journal of Systems and Software]]></source>
<year>2001</year>
<volume>59</volume>
<numero>3</numero>
<issue>3</issue>
<page-range>283-298</page-range></nlm-citation>
</ref>
<ref id="B17">
<label>17</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Pfahl]]></surname>
<given-names><![CDATA[D]]></given-names>
</name>
<name>
<surname><![CDATA[Laitenberger]]></surname>
<given-names><![CDATA[O]]></given-names>
</name>
<name>
<surname><![CDATA[Dorsch]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
<name>
<surname><![CDATA[Ruhe]]></surname>
<given-names><![CDATA[G]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[An Externally Replicated Experiment for Evaluating the Learning Effectiveness of Using Simulations in Software Project Management Education]]></article-title>
<source><![CDATA[Empirical Software Engineering]]></source>
<year>2003</year>
<volume>8</volume>
<numero>4</numero>
<issue>4</issue>
<page-range>367-395</page-range></nlm-citation>
</ref>
<ref id="B18">
<label>18</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Rodríguez]]></surname>
<given-names><![CDATA[D]]></given-names>
</name>
<name>
<surname><![CDATA[Sicilia]]></surname>
<given-names><![CDATA[M. Á.]]></given-names>
</name>
<name>
<surname><![CDATA[Cuadrado-Gallego]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
<name>
<surname><![CDATA[Pfahl]]></surname>
<given-names><![CDATA[D]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[e-learning in project management using simulation models: A case study based on the replication of an experiment]]></article-title>
<source><![CDATA[IEEE Transactions on Education]]></source>
<year>2006</year>
<volume>49</volume>
<numero>4</numero>
<issue>4</issue>
<page-range>451-463</page-range></nlm-citation>
</ref>
<ref id="B19">
<label>19</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Pfahl]]></surname>
<given-names><![CDATA[D]]></given-names>
</name>
<name>
<surname><![CDATA[Laitenberger]]></surname>
<given-names><![CDATA[O]]></given-names>
</name>
<name>
<surname><![CDATA[Ruhe]]></surname>
<given-names><![CDATA[G]]></given-names>
</name>
<name>
<surname><![CDATA[Dorsch]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
<name>
<surname><![CDATA[Krivobokova]]></surname>
<given-names><![CDATA[T]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Evaluating the learning effectiveness of using simulations in software project management education: Results from a twice replicated experiment]]></article-title>
<source><![CDATA[Information and Software Technology]]></source>
<year>2004</year>
<volume>46</volume>
<numero>2</numero>
<issue>2</issue>
<page-range>127-147</page-range></nlm-citation>
</ref>
<ref id="B20">
<label>20</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Garousi]]></surname>
<given-names><![CDATA[V]]></given-names>
</name>
<name>
<surname><![CDATA[Khosrovian]]></surname>
<given-names><![CDATA[K]]></given-names>
</name>
<name>
<surname><![CDATA[Pfahl]]></surname>
<given-names><![CDATA[D]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[A customizable pattern-based software process simulation model: Design, calibration and application]]></article-title>
<source><![CDATA[Software Process Improvement and Practice]]></source>
<year>2009</year>
<volume>14</volume>
<numero>3</numero>
<issue>3</issue>
<page-range>165-180</page-range></nlm-citation>
</ref>
<ref id="B21">
<label>21</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Abdel-Hamid]]></surname>
<given-names><![CDATA[T]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Understanding the '90% syndrome' in software project management: A simulation-based case study]]></article-title>
<source><![CDATA[Journal of Systems and Software]]></source>
<year>1988</year>
<volume>8</volume>
<page-range>319-330</page-range></nlm-citation>
</ref>
<ref id="B22">
<label>22</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Thelin]]></surname>
<given-names><![CDATA[T]]></given-names>
</name>
<name>
<surname><![CDATA[Petersson]]></surname>
<given-names><![CDATA[H]]></given-names>
</name>
<name>
<surname><![CDATA[Runeson]]></surname>
<given-names><![CDATA[P]]></given-names>
</name>
<name>
<surname><![CDATA[Wohlin]]></surname>
<given-names><![CDATA[C]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Applying sampling to improve software inspections]]></article-title>
<source><![CDATA[Journal of Systems and Software]]></source>
<year>2004</year>
<volume>73</volume>
<numero>2</numero>
<issue>2</issue>
<page-range>257-269</page-range></nlm-citation>
</ref>
<ref id="B23">
<label>23</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Melis]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
<name>
<surname><![CDATA[Turnu]]></surname>
<given-names><![CDATA[I]]></given-names>
</name>
<name>
<surname><![CDATA[Cau]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
<name>
<surname><![CDATA[Concas]]></surname>
<given-names><![CDATA[G]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Evaluating the impact of test-first programming and pair programming through software process simulation]]></article-title>
<source><![CDATA[Software Process Improvement and Practice]]></source>
<year>2006</year>
<volume>11</volume>
<page-range>345-360</page-range></nlm-citation>
</ref>
<ref id="B24">
<label>24</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Turnu]]></surname>
<given-names><![CDATA[I]]></given-names>
</name>
<name>
<surname><![CDATA[Melis]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
<name>
<surname><![CDATA[Cau]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
<name>
<surname><![CDATA[Setzu]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
<name>
<surname><![CDATA[Concas]]></surname>
<given-names><![CDATA[G]]></given-names>
</name>
<name>
<surname><![CDATA[Mannaro]]></surname>
<given-names><![CDATA[K]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Modeling and simulation of open source development using an agile practice]]></article-title>
<source><![CDATA[Journal of Systems Architecture]]></source>
<year>2006</year>
<volume>52</volume>
<numero>11</numero>
<issue>11</issue>
<page-range>610-618</page-range></nlm-citation>
</ref>
<ref id="B25">
<label>25</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Alvarez]]></surname>
<given-names><![CDATA[F.]]></given-names>
</name>
<name>
<surname><![CDATA[Cristian]]></surname>
<given-names><![CDATA[G. A.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Applying simulation to the design and performance evaluation of fault-tolerant systems]]></article-title>
<source><![CDATA[]]></source>
<year></year>
<conf-name><![CDATA[ Proc. of the IEEE Symposium on Reliable Distributed Systems]]></conf-name>
<conf-date>1997</conf-date>
<conf-loc>Durham </conf-loc>
</nlm-citation>
</ref>
<ref id="B26">
<label>26</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Davis]]></surname>
<given-names><![CDATA[J. P]]></given-names>
</name>
<name>
<surname><![CDATA[Eisenhardt]]></surname>
<given-names><![CDATA[K. M]]></given-names>
</name>
<name>
<surname><![CDATA[Bingham]]></surname>
<given-names><![CDATA[C. B]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Developing Theory Through Simulation Methods]]></article-title>
<source><![CDATA[Academy of Management Review]]></source>
<year>2007</year>
<volume>32</volume>
<numero>2</numero>
<issue>2</issue>
<page-range>480-499</page-range></nlm-citation>
</ref>
<ref id="B27">
<label>27</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Stopford]]></surname>
<given-names><![CDATA[B]]></given-names>
</name>
<name>
<surname><![CDATA[Counsell]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[A Framework for the Simulation of Structural Software Evolution]]></article-title>
<source><![CDATA[ACM Transactions on Modeling and Computer Simulation]]></source>
<year>2008</year>
<volume>18</volume>
</nlm-citation>
</ref>
<ref id="B28">
<label>28</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Houston]]></surname>
<given-names><![CDATA[D. X.]]></given-names>
</name>
<name>
<surname><![CDATA[Ferreira]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
<name>
<surname><![CDATA[Collofello]]></surname>
<given-names><![CDATA[J. S.]]></given-names>
</name>
<name>
<surname><![CDATA[Montgomery]]></surname>
<given-names><![CDATA[D. C.]]></given-names>
</name>
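<name>
<surname><![CDATA[Mackulak]]></surname>
<given-names><![CDATA[G. T.]]></given-names>
</name>
<name>
<surname><![CDATA[Shunk]]></surname>
<given-names><![CDATA[D. L.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Behavioral characterization: Finding and using the influential factors in software process simulation models]]></article-title>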
<source><![CDATA[Journal of Systems and Software]]></source>
<year>2001</year>
<volume>59</volume>
<page-range>259-270</page-range></nlm-citation>
</ref>
<ref id="B29">
<label>29</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Rahmandad]]></surname>
<given-names><![CDATA[H]]></given-names>
</name>
<name>
<surname><![CDATA[Weiss]]></surname>
<given-names><![CDATA[D. M.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Dynamics of concurrent software development]]></article-title>
<source><![CDATA[System Dynamics Review]]></source>
<year>2009</year>
<volume>25</volume>
<numero>3</numero>
<issue>3</issue>
<page-range>224-249</page-range></nlm-citation>
</ref>
<ref id="B30">
<label>30</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Balci]]></surname>
<given-names><![CDATA[O]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Guidelines for successful simulation studies]]></article-title>
<source><![CDATA[]]></source>
<year>1990</year>
<conf-name><![CDATA[ Proc. Winter Simulation Conference]]></conf-name>
<conf-loc> </conf-loc>
<page-range>25-32</page-range></nlm-citation>
</ref>
<ref id="B31">
<label>31</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Alexopoulos]]></surname>
<given-names><![CDATA[C]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Statistical analysis of simulation output: State of the art]]></article-title>
<source><![CDATA[]]></source>
<year>2007</year>
<month>12</month>
<day>00</day>
<conf-name><![CDATA[ Proc. Winter Simulation Conference]]></conf-name>
<conf-loc> </conf-loc>
<page-range>150-161</page-range></nlm-citation>
</ref>
<ref id="B32">
<label>32</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Kleijnen]]></surname>
<given-names><![CDATA[J. P. C.]]></given-names>
</name>
<name>
<surname><![CDATA[Sanchez]]></surname>
<given-names><![CDATA[S. M.]]></given-names>
</name>
<name>
<surname><![CDATA[Lucas]]></surname>
<given-names><![CDATA[T. W.]]></given-names>
</name>
<name>
<surname><![CDATA[Cioppa]]></surname>
<given-names><![CDATA[T. M.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[State-of-the-Art Review: A User’s Guide to the Brave New World of Designing Simulation Experiments]]></article-title>
<source><![CDATA[INFORMS Journal on Computing]]></source>
<year>2005</year>
<volume>17</volume>
<numero>3</numero>
<issue>3</issue>
<page-range>263-289</page-range></nlm-citation>
</ref>
<ref id="B33">
<label>33</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Wakeland]]></surname>
<given-names><![CDATA[W. W.]]></given-names>
</name>
<name>
<surname><![CDATA[Martin]]></surname>
<given-names><![CDATA[R. H.]]></given-names>
</name>
<name>
<surname><![CDATA[Raffo]]></surname>
<given-names><![CDATA[D]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Using Design of Experiments, sensitivity analysis, and hybrid simulation to evaluate changes to a software development process: A case study]]></article-title>
<source><![CDATA[Software Process Improvement and Practice]]></source>
<year>2004</year>
<volume>9</volume>
<page-range>107-119</page-range></nlm-citation>
</ref>
<ref id="B34">
<label>34</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Travassos]]></surname>
<given-names><![CDATA[G. H.]]></given-names>
</name>
<name>
<surname><![CDATA[dos Santos]]></surname>
<given-names><![CDATA[P. S. M.]]></given-names>
</name>
<name>
<surname><![CDATA[Neto]]></surname>
<given-names><![CDATA[P. G. M.]]></given-names>
</name>
<name>
<surname><![CDATA[Biolchini]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[An environment to support large scale experimentation in software engineering]]></article-title>
<source><![CDATA[]]></source>
<year>2008</year>
<month>03</month>
<day>00</day>
<conf-name><![CDATA[ Proc. of the 13th IEEE International Conference on Engineering of Complex Computer Systems]]></conf-name>
<conf-loc> </conf-loc>
<page-range>193-202</page-range></nlm-citation>
</ref>
<ref id="B35">
<label>35</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Kitchenham]]></surname>
<given-names><![CDATA[B. A.]]></given-names>
</name>
<name>
<surname><![CDATA[Dyba]]></surname>
<given-names><![CDATA[T]]></given-names>
</name>
<name>
<surname><![CDATA[Jørgensen]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Evidence-Based Software Engineering]]></article-title>
<source><![CDATA[]]></source>
<year>2004</year>
<month>05</month>
<day>00</day>
<conf-name><![CDATA[ Proc. 26th ICSE]]></conf-name>
<conf-loc> </conf-loc>
<page-range>273-281</page-range></nlm-citation>
</ref>
</ref-list>
</back>
</article>
