<?xml version="1.0" encoding="ISO-8859-1"?><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id>0717-5000</journal-id>
<journal-title><![CDATA[CLEI Electronic Journal]]></journal-title>
<abbrev-journal-title><![CDATA[CLEIej]]></abbrev-journal-title>
<issn>0717-5000</issn>
<publisher>
<publisher-name><![CDATA[Centro Latinoamericano de Estudios en Informática]]></publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id>S0717-50002012000200007</article-id>
<title-group>
<article-title xml:lang="en"><![CDATA[On the Analysis of Human and Automatic Summaries of Source Code]]></article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Moreno]]></surname>
<given-names><![CDATA[Laura]]></given-names>
</name>
<xref ref-type="aff" rid="A01"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Aponte]]></surname>
<given-names><![CDATA[Jairo]]></given-names>
</name>
<xref ref-type="aff" rid="A02"/>
</contrib>
</contrib-group>
<aff id="A01">
<institution><![CDATA[Universidad Nacional de Colombia]]></institution>
<addr-line><![CDATA[Bogotá ]]></addr-line>
<country>Colombia</country>
</aff>
<aff id="A02">
<institution><![CDATA[Universidad Nacional de Colombia]]></institution>
<addr-line><![CDATA[Bogotá ]]></addr-line>
<country>Colombia</country>
</aff>
<pub-date pub-type="pub">
<day>00</day>
<month>08</month>
<year>2012</year>
</pub-date>
<pub-date pub-type="epub">
<day>00</day>
<month>08</month>
<year>2012</year>
</pub-date>
<volume>15</volume>
<numero>2</numero>
<fpage>2</fpage>
<lpage>2</lpage>
<copyright-statement/>
<copyright-year/>
<self-uri xlink:href="http://www.scielo.edu.uy/scielo.php?script=sci_arttext&amp;pid=S0717-50002012000200007&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.edu.uy/scielo.php?script=sci_abstract&amp;pid=S0717-50002012000200007&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.edu.uy/scielo.php?script=sci_pdf&amp;pid=S0717-50002012000200007&amp;lng=en&amp;nrm=iso"></self-uri></article-meta>
</front><body><![CDATA[ <div class="maketitle">    <font face="Verdana" size="2">On the Analysis of Human and Automatic Summaries of Source Code</font>    <div class="author"> <font face="Verdana" size="2"> <span class="cmbx-12">Laura Moreno</span>     <br>   <span class="cmr-12">Universidad Nacional de Colombia, Departamento de Ingenier&iacute;a de Sistemas e Industrial,</span>     <br>                                      <span class="cmr-12">Bogot&aacute;, Colombia</span>     <br>                            <span class="cmti-12"><a href="mailto:lvmorenoc@unal.edu.co">lvmorenoc@unal.edu.co</a> </span><br class="and">  <span class="cmbx-12">Jairo Aponte</span>     <br>   <span class="cmr-12">Universidad Nacional de Colombia, Departamento de Ingenier&iacute;a de Sistemas e Industrial,</span>     <br>                                      <span class="cmr-12">Bogot&aacute;, Colombia</span>     <br>                                   <span class="cmti-12"><a href="mailto:jhapontem@unal.edu.co">jhapontem@unal.edu.co</a> </span>   </font></div>  <font face="Verdana" size="2">      <br>      <br>  </font>  </div>           ]]></body>
<body><![CDATA[<div class="abstract">     <div class="center"> <font face="Verdana" size="2">     <br>  </font>      <p> </p>      <div class="minipage">     <div class="center"> <font face="Verdana" size="2">     <br>  </font>      <p> </p>      <p><font face="Verdana" size="2"><span class="cmbx-10">Abstract</span></font></p>  </div>   <font face="Verdana" size="2">       <br>  </font>      ]]></body>
<body><![CDATA[<p><font face="Verdana" size="2">Within the software engineering field, researchers have investigated whether it is possible and useful to summarize software artifacts, in order to provide developers with concise representations of the content of the original artifacts. As an initial step towards automatic summarization of source code, we conducted an empirical study in which a group of Java developers provided manually written summaries for a variety of source code elements. These summaries were analyzed and used to evaluate some summarization techniques based on Text Retrieval.<br class="newline">  This paper describes the main features of the summaries written by developers, the kind of information that should (ideally) be included in automatically generated summaries, and the internal quality of the summaries generated by some automatic methods. </font> </p>      <div class="center"> <font face="Verdana" size="2">     <br>  </font>      <p> </p>      <p><font face="Verdana" size="2"><span class="cmbx-10">Spanish abstract</span></font></p>  </div>   <font face="Verdana" size="2">       <br>  </font>      <p><font face="Verdana" size="2">En el campo de ingenier&iacute;a de software, los investigadores han estudiado si es posible y &uacute;til resumir artefactos de software, con el fin de proveer a los desarrolladores con representaciones concisas del contenido de los artefactos originales. Como un primer paso hacia el resumen autom&aacute;tico de c&oacute;digo fuente, realizamos un estudio emp&iacute;rico en el que un grupo de desarrolladores de Java proporcion&oacute; res&uacute;menes escritos manualmente de diferentes elementos de c&oacute;digo fuente. Dichos res&uacute;menes fueron analizados y utilizados para evaluar algunas t&eacute;cnicas de resumen basadas en Recuperaci&oacute;n de Informaci&oacute;n de Textos. 
Este documento reporta las principales caracter&iacute;sticas de los res&uacute;menes escritos por los desarrolladores, el tipo de informaci&oacute;n que debe ser (idealmente) incluido en los res&uacute;menes generados autom&aacute;ticamente, y la calidad interna de algunos res&uacute;menes generados a trav&eacute;s de m&eacute;todos autom&aacute;ticos.</font></p>  </div>  </div>   </div>   <font face="Verdana" size="2">       <br>  </font>      <p><font face="Verdana" size="2"><span class="cmbx-10">Keywords: </span>Software comprehension, source code, summarization, text retrieval.<br class="newline">       <br>   <span class="cmbx-10">Spanish keywords: </span>Comprensi&oacute;n de software, c&oacute;digo fuente, res&uacute;menes, recuperaci&oacute;n de informaci&oacute;n de textos.&nbsp;</font></p>      ]]></body>
<body><![CDATA[<p>   <font face="Verdana" size="2">Received 2011-12-15, Revised 2012-05-16, Accepted 2012-05-16 </font>     </p>      <p><font face="Verdana" size="2"><span class="titlemark">1   </span> <a id="x1-10001"></a>Introduction</font></p>   <font face="Verdana" size="2">       <br>  </font>      <p><font face="Verdana" size="2">As a broad concept, summarization is the process of reducing large volumes of information in entities like texts, speeches, or films, to short abstracts comprising the main points or the gist in a concise form <span class="cite">(<a href="#c1">1</a>)</span><a name="c1."></a>. Currently, one of the most promising applications of summarization is its use as a complement or a second level of abstraction for text retrieval tools, since they often return a large number of documents that overwhelm their users. For instance, automated summarizing tools are needed by internet users who would like to utilize summaries as an instrument for knowing the structure or content of the returned documents in advance, and eventually, be able to effectively filter out irrelevant results.&nbsp;</font></p>      <p>   <font face="Verdana" size="2">Recently, software engineering researchers have begun to explore the use of summarization technologies, mainly as a potential instrument for supporting software comprehension tasks. These tasks are key developer activities during software evolution, accounting for more than half of the time spent on software maintenance <span class="cite">(<a href="#c2">2</a>)</span><a name="c2."></a>. In the case of source code artifacts, developers are often faced with software systems with thousands or millions of lines of code, and before attempting any changes to those systems, they must find and understand some specific parts of them. 
We argue that offering developers descriptions of source code entities can reduce the time and effort needed to browse files and to locate and understand the part of the system that they need to modify. Ideally, such summaries will be informative enough to be used for filtering irrelevant artifacts, and even as substitutes for the detailed reading of full artifacts. Even when they do not convey enough information to replace the originals, they could be useful <span class="cmti-10">indicative summaries</span>, a type of abstract built to indicate to the user which documents would be worth studying in more detail. In the worst-case scenario, developers would have to read both the summary and the original artifact. Even in this case, the extra reading would be helpful, since the summary can provide a preview of the original document (e.g., its structure or an initial idea of its content).&nbsp;</font></p>      <p>   <font face="Verdana" size="2">The major challenge in automatic software summarization is to handle mixed software artifacts such as source code, where information is encoded differently than in natural-language text documents. One key issue that we need to address is determining what is relevant in source code documents and should therefore be included in the summaries. The answer may be different for various types of source code entities (e.g., class vs. method) <span class="cite">(<a href="#c3">3</a>)</span><a name="c3."></a>, and may also differ between programming languages. 
Some feasible ways to address this issue are (1) to study how developers create summaries of source code artifacts, (2) to analyze the summaries they generate, and (3) to use their expertise to determine what information should be included in the summaries of source code artifacts.&nbsp;</font></p>      <p>   <font face="Verdana" size="2">A second aspect of software summarization research is the evaluation of the quality and usefulness of automatically generated summaries. It is essential to know whether the generated abstracts are useful in supporting development or maintenance tasks and how this positive effect can be assessed. Moreover, some metrics are required to determine whether the summaries convey the most relevant information in the original artifact. As a result, two broad types of evaluation have been considered: <span class="cmti-10">extrinsic </span>and <span class="cmti-10">intrinsic </span>evaluation <span class="cite">(<a href="#c4">4</a>)</span><a name="c4."></a>. The former aims at determining whether the summaries are good instruments to support real users&rsquo; work, whereas the latter measures internal properties of the abstracts such as semantic informativeness, coherence, and redundancy.&nbsp;</font></p>      <p>   <font face="Verdana" size="2">From a research standpoint, intrinsic evaluation is important because it allows us to assess the results of a summarization system, compare the results of different approaches, and identify and understand the drawbacks of a particular summarization procedure. In this kind of evaluation, the quality of a summary can be established mainly through two approaches. In the first one, the <span class="cmti-10">peer summary </span>(i.e., the summary being evaluated) is reviewed and rated by human judges, using some pre-established guidelines. 
The second alternative is to measure the similarity between the peer summary and some reference abstract given by experts, which is often called the <span class="cmti-10">gold standard</span> <span class="cmti-10">summary</span>.&nbsp;</font></p>      <p>   <font face="Verdana" size="2">In this paper, we present the results of an empirical study in which a group of developers (1) generated two types of manually written summaries for various kinds of source code artifacts, and (2) answered questions about what they think should be included in a summary. Additionally, we propose the use of these human-generated summaries and some well-known text retrieval measures to carry out an intrinsic evaluation of several automatic summarization approaches.&nbsp;</font></p>      <p>   <font face="Verdana" size="2">The remainder of the paper is organized as follows. Section <a href="#x1-20002">2</a> presents the problem and the research questions we want to answer. The details of the empirical study are described in Section <a href="#x1-30003">3</a>. Section <a href="#x1-70004">4</a> sums up the most important results of the study, and discusses possible explanations and their implications for automatic summarization of source code. Section <a href="#x1-160005">5</a> discusses related work, and Section <a href="#x1-170006">6</a> draws some conclusions and remarks regarding extensions of this research.&nbsp;</font></p>      ]]></body>
<body><![CDATA[<p>     </p>      <p><font face="Verdana" size="2"><span class="titlemark">2   </span> <a id="x1-20002"></a>Definition and Research Questions of the Case Study</font></p>   <font face="Verdana" size="2">       <br>  </font>      <p><font face="Verdana" size="2">The goal of this experiment was to analyze code summaries written by developers, and to use these summaries as a test-bed for a comparative evaluation of Text Retrieval (TR) techniques when they are used as automatic code summarizers. The quality focus was on improving tool support for software comprehension tasks, as well as providing a stable evaluation framework to measure whether automatically generated summaries convey the most relevant information in the original software artifacts. The perspective was that of researchers who need to gain insight into (1) how developers analyze and summarize various kinds of source code entities; (2) what structural elements they consider should be included in a summary; and (3) how intrinsic summary evaluation methods can be used to evaluate automatic code summarization approaches.&nbsp;</font></p>      <p>   <font face="Verdana" size="2">Therefore, within this case study the following research questions were formulated:&nbsp;</font></p>      <p>      </p>  <dl class="description">    <dd><font face="Verdana" size="2">     <span class="cmbx-10">RQ1</span> </font></dd>    <dd class="description"><font face="Verdana" size="2">How long are the summaries generated by developers?   </font>      </dd>    <dd><font face="Verdana" size="2">     <span class="cmbx-10">RQ2</span> </font></dd>    <dd class="description"><font face="Verdana" size="2">What type of structural information do developers include in their summaries?   
</font>      </dd>    <dd><font face="Verdana" size="2">     <span class="cmbx-10">RQ3</span> </font></dd>    <dd class="description"><font face="Verdana" size="2">What are the main characteristics of the terms most selected by developers? Can they be considered      as gold-standard summaries for generating and evaluating automatic summarization tools?   </font>      </dd>    <dd><font face="Verdana" size="2">     <span class="cmbx-10">RQ4</span> </font></dd>    <dd class="description"><font face="Verdana" size="2">How good are text retrieval techniques as automatic summarizers of source code artifacts? How much      do these automatic summaries resemble the human-generated ones?</font></dd>  </dl>   <font face="Verdana" size="2">       <br>  </font>      <p>   <font face="Verdana" size="2">Answers to RQ1 and RQ2 will give us valuable information about the data that should be (ideally) included in automatically generated summaries, and how they can be created. RQ3 and RQ4 aim at exploring TR techniques as code summarizers, and intrinsic evaluation methods as suitable approaches to measure the quality of their outcomes.&nbsp;</font></p>      <p>    </p>      <p><font face="Verdana" size="2"><span class="titlemark">3   </span> <a id="x1-30003"></a>Context of the Case Study</font></p>   <font face="Verdana" size="2">       ]]></body>
<body><![CDATA[<br>  </font>      <p><font face="Verdana" size="2">This section begins with a description of the resources selected to perform the empirical study, i.e., the system and the participants. Then, it presents the layout of the experiment.&nbsp;</font></p>      <p>    </p>      <p><font face="Verdana" size="2"><span class="titlemark">3.1   </span> <a id="x1-40003.1"></a>Objects</font></p>   <font face="Verdana" size="2">       <br>  </font>      <p><font face="Verdana" size="2">The system selected to carry out the experiment was aTunes, an open source project that manages and plays audio files. It is a small-medium sized Java system whose application domain is easy to understand and often interesting for almost any developer. Table <a href="#x1-40011">1</a> sums up the main features of the selected version. </font>    </p>      <div class="table">  <font face="Verdana" size="2">      <br>  </font>      <p>   </p>  <hr class="float">     <div class="float">        ]]></body>
<body><![CDATA[<div class="caption"><font face="Verdana" size="2"><span class="id"><a id="x1-40011">Table&nbsp;1: </a></span><span class="content">Main characteristics of the aTunes system</span></font></div>  <font face="Verdana" size="2">      <br>   </font>       <div class="pic-tabular"> <font face="Verdana" size="2"> <img src="/img/revistas/cleiej/v15n2/2a07t1.jpg"></font></div>       </div>  <hr class="endfloat">    </div>   <font face="Verdana" size="2">       <br>  </font>      <p>   <font face="Verdana" size="2">Since we are interested in analyzing how various types of source code entities are summarized, and we think the content and structure of summaries are affected by the size and type of the summarized artifact, we selected two methods, two classes, two groups of methods (each group consists of three methods which, as a calling sequence, implement a specific feature of the system) and one package.&nbsp;</font></p>      <p>   <font face="Verdana" size="2">We focused on entities dealing with the business logic (not GUI or data layer classes), and excluded entities that were too short, such as methods with fewer than 10 lines, classes with fewer than 3 attributes or only a few short methods, and packages with only one class. Table <a href="#x1-40022">2</a> shows the basic features of the selected artifacts. It is important to point out that the aTunes version selected for the study contains very few heading comments; in fact, the artifacts used in the study do not contain such comments, but they do contain some inline comments. </font>    </p>      <div class="table">  <font face="Verdana" size="2">      <br>  </font>      <p>   </p>  <hr class="float">     <div class="float">        ]]></body>
<body><![CDATA[<div class="caption"><font face="Verdana" size="2"><span class="id"><a id="x1-40022" href="/img/revistas/cleiej/v15n2/2a07t2.jpg">Table&nbsp;2: </a></span><span class="content">Features of the artifacts selected from the aTunes system</span></font></div>  <font face="Verdana" size="2">      <br>       </font>       </div>  <hr class="endfloat">    </div>           <p><font face="Verdana" size="2"><span class="titlemark">3.2   </span> <a id="x1-50003.2"></a>Subjects</font></p>   <font face="Verdana" size="2">       <br>  </font>      <p><font face="Verdana" size="2">The subjects of this study were twelve graduate and undergraduate students from the Computer Science Department at Universidad Nacional de Colombia. Senior undergraduate students were recruited mainly from two courses: Software Engineering and Software Architecture; both courses are part of the second half of the undergraduate degree program in Computer Science. Only two of the participants were master&rsquo;s students who were also working as professional Java developers. We excluded from the analysis the subjects who did not complete the tasks, and those who made mistakes because they misunderstood the experiment instructions. Consequently, the analysis is based on the responses of only nine of the subjects.&nbsp;</font></p>      <p>   <font face="Verdana" size="2">On average, the subjects had 4.6 years of programming experience. Regarding their knowledge of the Java programming language, they reported an experience of 2.2 years on average, and considered their skills as satisfactory, good or very good in all cases. Just two of the subjects evaluated their experience in understanding and evolving systems as less than satisfactory. Since the experiment was carried out using the Eclipse IDE, we asked them about their abilities with this programming environment. Three of them considered their expertise with this particular IDE as poor or very poor. 
However, we think this issue did not have a major impact on the experiment&rsquo;s results because they used only searching and browsing commands, and they also mentioned extensive experience with NetBeans, a similar programming environment. Finally, subjects assessed their English proficiency as at least satisfactory in all cases.&nbsp;</font></p>      <p>    </p>      <p><font face="Verdana" size="2"><span class="titlemark">3.3   </span> <a id="x1-60003.3"></a>Experiment layout</font></p>   <font face="Verdana" size="2">       <br>  </font>      <p><font face="Verdana" size="2">As a preliminary experiment setup, Eclipse and aTunes were installed on each computer used in the experiment, and the following documents were created: </font>      </p>  <ul class="itemize1">        <li class="itemize"><font face="Verdana" size="2">A description of the main functionality of aTunes that was used as an introduction to the system.      </font>      </li>        <li class="itemize"><font face="Verdana" size="2">Slides for explaining the tasks to be done within each stage of the experiment.      </font>      </li>        <li class="itemize"><font face="Verdana" size="2">A form to collect information related to the programming experience and skills of each participant.      </font>      </li>        <li class="itemize"><font face="Verdana" size="2">Forms to collect the summaries of each source code artifact.      </font>      </li>        <li class="itemize"><font face="Verdana" size="2">Forms to collect feedback at the end of each session of the experiment.</font></li>      ]]></body>
<body><![CDATA[</ul>   <font face="Verdana" size="2">       <br>  </font>      <p>   <font face="Verdana" size="2">Participants attended three sessions, each lasting around 2 hours. All sessions began with a brief explanation of the tasks to be done. In particular, a general explanation of the whole experiment was given at the beginning of the initial session.&nbsp;</font></p>      <p>   <font face="Verdana" size="2">The first session was a training session, where participants filled out an individual form about their English and programming skills. After that, a 10-minute presentation of the software system was given, which included a demo of its most important functionality. The subjects spent the remaining time reading documentation of the system and getting familiar with the organization of the source code.&nbsp;</font></p>      <p>   <font face="Verdana" size="2">During the second session, subjects generated English sentence-based summaries for each artifact using the forms we prepared for this task. A <span class="cmti-10">sentence-based </span>summary is an unrestricted natural language description of the software entity (<span class="cmti-10">abstractive summary</span>), and within this experiment, it was used as a sanity-check instrument, i.e., a basic test to quickly verify that the answers of a participant did not contain elementary mistakes or impossibilities, and were not based on invalid assumptions. Each subject finished the session by answering the post-experiment questionnaire regarding the tasks done and the kind of analysis performed.&nbsp;</font></p>      <p>   <font face="Verdana" size="2">During the last session, participants summarized the artifacts in Spanish and then created term-based summaries. A <span class="cmti-10">term-based summary </span>is a set of unique words or identifiers selected from the source code of the software entity (<span class="cmti-10">extractive summary</span>). 
They selected relevant terms for each artifact, by enclosing them within source code files, using a set of predefined tags. Finally, each subject answered the post-experiment questionnaire regarding the tasks done and the usefulness of various parts of the code when doing term-based summarization.&nbsp;</font></p>      <p>    </p>      <p><font face="Verdana" size="2"><span class="titlemark">4   </span> <a id="x1-70004"></a>Results and Discussion</font></p>   <font face="Verdana" size="2">       <br>  </font>      <p>    </p>      ]]></body>
<body><![CDATA[<p><font face="Verdana" size="2"><span class="titlemark">4.1   </span> <a id="x1-80004.1"></a>Brief description of artifacts&rsquo; content</font></p>   <font face="Verdana" size="2">       <br>  </font>      <p><font face="Verdana" size="2">As stated in Section <a href="#x1-40003.1">3.1</a>, four types of artifacts were used in the experiment. Although all these entities are composed of a great number of identifiers and keywords, just some of them are useful to describe the content of the artifact. In fact, within a source code unit some terms are repeated constantly in different identifiers. For example, the method M1 consists of 231 terms, but only 70 of those are unique. The same information for the other artifacts is presented in Table <a href="#x1-80013">3</a>. The length in this case refers to the number of terms in the artifact, including keywords and identifiers, after a splitting process. For instance, the method name <span class="cmti-10">timeInSeconds </span>is transformed into the terms <span class="cmti-10">time In Seconds</span>. As another example, the variable name <span class="cmti-10">DEFAULT_LANGUAGE_FILE </span>is split into <span class="cmti-10">DEFAULT LANGUAGE FILE</span>. The rationale behind splitting is that we want to know the exact vocabulary used within each artifact. 
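</font></p>      <p><font face="Verdana" size="2">The splitting and counting step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tooling used in the study: the names <span class="cmti-10">split_identifier</span> and <span class="cmti-10">term_statistics</span> are hypothetical, the splitting rule (break on underscores and on camel-case boundaries) is inferred from the two examples given, and counting unique terms case-insensitively is an assumption made here.</font></p>

```python
import re

# Assumed rule: a term is a run of capitals not followed by a lowercase letter
# (e.g. DEFAULT), an optionally capitalized lowercase run (e.g. time, Seconds),
# or a run of digits. Underscores and other separators are simply skipped.
_TERM = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def split_identifier(identifier):
    """Split an identifier into its terms, preserving the original case."""
    return _TERM.findall(identifier)


def term_statistics(identifiers):
    """Return (length, unique_terms, ratio) for a list of identifiers.

    length is the total number of split terms; unique terms are counted
    case-insensitively (an assumption, not stated in the study).
    """
    terms = [term for ident in identifiers for term in split_identifier(ident)]
    unique = {term.lower() for term in terms}
    ratio = len(unique) / len(terms) if terms else 0.0
    return len(terms), len(unique), ratio


if __name__ == "__main__":
    print(split_identifier("timeInSeconds"))
    print(split_identifier("DEFAULT_LANGUAGE_FILE"))
    print(term_statistics(["timeInSeconds", "timeInMinutes"]))
```

<p><font face="Verdana" size="2">With counts such as those reported for M1 (70 unique terms out of 231), the ratio of unique terms to length is roughly 0.30. 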
</font>    </p>      <div class="table">  <font face="Verdana" size="2">      <br>  </font>      <p>   </p>  <hr class="float">     <div class="float">        <div class="caption"><font face="Verdana" size="2"><span class="id"><a id="x1-80013">Table&nbsp;3</a>: </span><span class="content">Identifiers and unique terms of each selected artifact sorted by length</span></font></div>  <font face="Verdana" size="2">      <br>   </font>       <div class="pic-tabular"> <font face="Verdana" size="2"> <img src="/img/revistas/cleiej/v15n2/2a07t3.jpg"></font></div>       </div>  <hr class="endfloat">    </div>   <font face="Verdana" size="2">       ]]></body>
<body><![CDATA[<br>  </font>      <p>   <font face="Verdana" size="2">At first glance, it can be observed that the ratio between unique terms and length is low. Furthermore, this ratio decreases as the size of the artifact increases. In detail, the ratio is higher for methods, possibly because they encapsulate functions over specific objects. Sequences also perform specific functions, but their ratio of unique terms depends on the classes their methods belong to: if the methods belong to the same class the ratio is lower; otherwise it is higher. A possible explanation for this is that the more classes are involved in an artifact, the more topics it touches, and by extension, the more unique terms there are in its source code. </font>    </p>      <p><font face="Verdana" size="2"><span class="titlemark">4.2   </span> <a id="x1-90004.2"></a>General description of developers&rsquo; summaries</font></p>   <font face="Verdana" size="2">       <br>  </font>      <p><font face="Verdana" size="2">In this phase of the study, we focused the analysis on summaries written in English, in order to compare and contrast them with term-based summaries and also with the declarations of the artifacts, both of which are written in English. Once again, the length of a summary is calculated as the number of split terms it contains.&nbsp;</font></p>      <p>   <font face="Verdana" size="2">Regarding the sentence-based summaries, there is no significant relationship between their size and the length or type of artifact they describe (p-value for Pearson&rsquo;s correlation = 0.359). Even the average number of unique terms in this kind of summary is almost the same for all types of artifacts (Table <a href="#x1-90014">4</a>). In the case of sequences, however, the length of the summaries is slightly greater, as is their number of unique terms; this suggests that sequences are harder to describe or deserve more detailed descriptions. 
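</font></p>      <p><font face="Verdana" size="2">For readers who want to reproduce this kind of check, the following Python sketch computes Pearson&rsquo;s correlation coefficient between two paired samples (for instance, artifact lengths and summary lengths) together with the associated t statistic. It is a minimal illustration, not the analysis script of the study: the function names are hypothetical, and turning the t statistic into a p-value requires the Student t distribution with n - 2 degrees of freedom, which is left to a statistics package.</font></p>

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two paired samples."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two samples of equal size with at least 3 pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        # Perfect correlation: the statistic diverges.
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))
```

<p><font face="Verdana" size="2">A p-value above the chosen significance level, such as the 0.359 reported here, means that the hypothesis of no linear relationship cannot be rejected. 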
</font>    </p>      <div class="table">  <font face="Verdana" size="2">      <br>  </font>      <p>  </p>  <hr class="float">     <div class="float">        ]]></body>
<body><![CDATA[<div class="caption"><font face="Verdana" size="2"><span class="id"><a id="x1-90014">Table&nbsp;4: </a> </span><span class="content">Length properties of sentence- and term-based summaries by artifact</span></font></div>  <font face="Verdana" size="2">      <br>   </font>       <div class="minipage">     <div class="pic-tabular"> <font face="Verdana" size="2"> <img src="/img/revistas/cleiej/v15n2/2a07t4.jpg"></font></div>  </div>       </div>  <hr class="endfloat">    </div>   <font face="Verdana" size="2">       <br>  </font>      <p>   <font face="Verdana" size="2">On the other hand, for term-based summaries we found direct relationships with respect to lengths (p-value for Pearson&rsquo;s correlation <img src="/img/revistas/cleiej/v15n2/2a074x.png" alt="&lt;  " class="math"> 0.01). Specifically, developers tend to mark more terms when they are analyzing packages, fewer terms when the artifacts are classes or sequences, and even fewer terms when they are dealing with methods, as shown in Table <a href="#x1-90014">4</a>. This situation indicates that all kind of artifacts cannot be summarized with the same fixed number of terms: as granularity level increases, the amount of terms needed to describe an artifact decreases. The high standard-deviation values obtained in this case indicate that developers hardly ever mark similar number of terms. </font>    </p>      <p><font face="Verdana" size="2"><span class="titlemark">4.3   </span> <a id="x1-100004.3"></a>Origin of relevant terms</font></p>   <font face="Verdana" size="2">       <br>  </font>      <p><font face="Verdana" size="2">In a post-experiment questionnaire, we asked the developers about the usefulness of structural information from source code when doing term-based summarization. To that end, the participants rated different locations of source code (e.g. class name, attribute type, attribute name, etc.) 
on a 1-to-4 Likert scale <span class="cite">(<a href="#c5">5</a>)</span><a name="c5."></a>, where <span class="cmti-10">1 </span>represented <span class="cmti-10">totally</span> <span class="cmti-10">useless </span>and <span class="cmti-10">4 </span>represented <span class="cmti-10">very useful</span>. We did not used the 5-level scale in order to avoid the tendency to mark non-committal answers (i.e., neither useful nor useless).&nbsp;</font></p>      <p>   <font face="Verdana" size="2">Not surprisingly, through the questionnaire we found that when summarizing methods, classes and packages, their respective names were considered as the most useful parts of code. Other locations equally useful when describing methods were invoked methods, which together with methods names, were assessed as very useful when dealing with classes. In the case of sequences (and similar to methods), the parts of code considered as most useful were the names of the invoked methods and the variables and parameters names. We also noticed two striking facts on the questionnaire:&nbsp;</font></p>      ]]></body>
<body><![CDATA[<p>      </p>  <ol class="enumerate1">        <li class="enumerate" id="x1-10002x1"><font face="Verdana" size="2">Source code comments were not considered valuable, especially for packages and sequences, where      they were rated as totally useless information. </font>      </li>        <li class="enumerate" id="x1-10004x2"><font face="Verdana" size="2">Method and class names were considered useful when summarizing all four types of artifacts.</font></li>      </ol>   <font face="Verdana" size="2">       <br>  </font>      <p>   <font face="Verdana" size="2">Furthermore, packages were recognized as more difficult to summarize than other artifacts, and only package names, class names and method names were rated as useful information, whereas the rest of the locations were marked as moderately or totally useless. This may explain the tendency to mark a greater number of terms for this type of artifact, and additionally suggests that a multi-document approach, where each class of the package is treated as an individual document, may be adequate for summarizing packages.&nbsp;</font></p>      <p>   <font face="Verdana" size="2">In order to go deeper into the origin of relevant terms and contrast the questionnaire answers, we identified where every marked term in the term-based summaries came from. These origins were classified into the same categories used in the questionnaire (Table <a href="#x1-100055">5</a>). We discovered that developers consistently marked local variable names when summarizing all artifacts, even packages, where they had been classified as moderately useless information. We also noticed that terms from comments were hardly used when building summaries for sequences, classes and packages; this is consistent with the uselessness of this location reported by developers. 
In the case of methods, however, terms in comments were chosen even more frequently than attribute names and parameter types, although the latter were ranked as very useful for this kind of artifact.&nbsp;</font></p>      <p>   <font face="Verdana" size="2">Additionally, for method summaries, the origin extraction confirmed the usefulness that developers attributed to method call names and method names; nevertheless, we observed that parameter names were also frequently used when summarizing methods. In contrast, well-ranked locations such as parameter types, class names and attribute types were not used at all when summarizing methods.&nbsp;</font></p>      <p>   <font face="Verdana" size="2">With regard to sequences, developers considered method names, invoked method names and local variable names as relevant, and this was confirmed by the terms marked for their summaries. It is worth mentioning that although attribute types, parameter types and method return types were considered useful in the questionnaire, they were never used when summarizing sequences.&nbsp;</font></p>      <p>   <font face="Verdana" size="2">In the case of class summarization, developers rated class names as very useful information, which was confirmed by the origin extraction. These same names were considered useful for all artifacts, but in practice they were rarely marked when summarizing methods and sequences. </font>    </p>      <div class="table">  <font face="Verdana" size="2">      <br>  </font>      ]]></body>
<body><![CDATA[<p>   </p>  <hr class="float">     <div class="float">        <div class="caption"><font face="Verdana" size="2"><span class="id"><a id="x1-100055" href="/img/revistas/cleiej/v15n2/2a07t5.jpg">Table&nbsp;5</a>: </span><span class="content">Percentage of terms&rsquo; origins marked by developers</span></font></div>  <font face="Verdana" size="2">      <br>       </font>       </div>  <hr class="endfloat">    </div>   <font face="Verdana" size="2">       <br>  </font>      <p>   <font face="Verdana" size="2">Concerning packages, their names, together with the names of classes and methods, were consistently used to summarize this type of artifact, just as developers stated in the questionnaire. Nonetheless, other locations such as the names of attributes, variables and parameters, which were ranked as useless by developers, were in fact often used in package summaries.&nbsp;</font></p>      <p>   <font face="Verdana" size="2">Surprisingly, the types of variables, attributes and parameters were barely used in most cases, although they were considered useful when summarizing methods and sequences. Even so, for sequences and classes the scores given by developers to parts of the code correlated with the actual proportion of terms&rsquo; locations in term-based summaries. This means that only these kinds of artifacts preserved a high coherence between developers&rsquo; opinions about the usefulness of the locations and their actual summarization choices.&nbsp;</font></p>      <p> <font face="Verdana" size="2">Finally, developers used additional categories not included in the questionnaire, such as literal data in string constants or system logs, which were marked more often than other origins previously considered. For instance, the literal texts &ldquo;Exporting process done&rdquo; and &ldquo;Exporting songs&rdquo; were marked by some developers to describe the sequence S2. 
</font> </p>      <p><font face="Verdana" size="2"><span class="titlemark">4.4   </span> <a id="x1-110004.4"></a>Approximation to gold standard summaries</font></p>   <font face="Verdana" size="2">       <br>  </font>      ]]></body>
<body><![CDATA[<p><font face="Verdana" size="2">As shown in Table <a href="#x1-110016">6</a>, around 35% of the unique terms in a sentence-based summary are provided by the declaration of the artifact, regardless of its type. At first sight, this suggests that extractive approaches are not enough to generate the summaries automatically, given the large number of new terms in the descriptions provided by developers.&nbsp;</font></p>      <p>   <font face="Verdana" size="2">Nonetheless, the overlap between the term-based summaries (which are composed exclusively of terms found in the declaration of the artifacts) and the terms in sentence-based summaries reveals that relevant terms can be extracted from source code as the basis for a short description. Here, the relevancy of a term is defined by the number of developers who chose it to be part of the summary, i.e., the agreement among subjects. Since nine developers participated in the experiment, the relevancy scale goes from 0 to 9, where 0 represents totally irrelevant and 9 means totally relevant. </font>    </p>      <div class="table">  <font face="Verdana" size="2">      <br>  </font>      <p>   </p>  <hr class="float">     <div class="float">        <div class="caption"><font face="Verdana" size="2"><span class="id"><a id="x1-110016" href="/img/revistas/cleiej/v15n2/2a07t6.jpg">Table&nbsp;6</a>: </span><span class="content">Average overlap between declaration, sentence- and term-based summaries by artifact </span>   </font></div>  <font face="Verdana" size="2">      <br>      </font>      </div>  <hr class="endfloat">    </div>   <font face="Verdana" size="2">       <br>  </font>      <p>   <font face="Verdana" size="2">For the terms in this overlap, we found that about 75% of them were chosen by five or more developers. 
This percentage could increase if we take into account the use of synonyms in the free-form summaries; as a case in point, the term <span class="cmti-10">encode </span>found in method M1, was replaced by words such as <span class="cmti-10">transform </span>and <span class="cmti-10">convert </span>in the sentence-based summaries. Thus, some text retrieval techniques might be suitable for identifying the most prominent terms within source code artifacts, as was proposed by <span class="cite">(<a href="#c6">6</a>)</span><a name="c6."></a>.&nbsp;</font></p>      ]]></body>
<body><![CDATA[<p>   <font face="Verdana" size="2">For each artifact, the relevant terms in the intersection between term- and sentence-based summaries could be considered a <span class="cmti-10">gold standard summary</span>, i.e., a reference or ideal description that contains the important information of the entity under analysis. Nevertheless, the results show that the terms chosen by five or more developers in the term-based summaries form, in fact, a better approximation to those standard summaries. Some of their properties are presented in Table <a href="#x1-110027">7</a>. The length of these summaries depends on the type of artifact they describe, and also on the length of that artifact (p-value for Pearson&rsquo;s correlation <img src="/img/revistas/cleiej/v15n2/2a077x.png" alt="&lt;  " class="math"> 0.01). Therefore, the gold summaries of packages and classes are longer than those that describe sequences and methods.&nbsp;</font></p>      <p>   <font face="Verdana" size="2">Regarding the terms&rsquo; origins, the proportions observed in term-based summaries remain stable for ideal summaries, with few exceptions. For example, in the case of sequences, attribute names and literal texts are not part of the gold summaries. The same occurs with variable names for classes and packages. The principal term locations for each artifact are presented in Table <a href="#x1-110027">7</a>. 
</font>    </p>      <div class="table">  <font face="Verdana" size="2">      <br>  </font>      <p>   </p>  <hr class="float">     <div class="float">        <div class="caption"><font face="Verdana" size="2"><span class="id"><a id="x1-110027" href="/img/revistas/cleiej/v15n2/2a07t7.jpg">Table&nbsp;7</a>: </span><span class="content">Properties of gold standard, term-based summaries by artifact</span></font></div>  <font face="Verdana" size="2">      <br>        </font>        </div>  <hr class="endfloat">    </div>   <font face="Verdana" size="2">       <br>  </font>      <p>   <font face="Verdana" size="2">Since they represent the core of both types of abstracts, the gold standard summaries obtained in the experimental study represent the main target of the automatic summarizer we aim to achieve. Moreover, they are suitable to assess its results through intrinsic evaluation measures <span class="cite">(<a href="#c4">4</a>)</span>. </font>    </p>      ]]></body>
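<body><![CDATA[<p>   <font face="Verdana" size="2">As an illustration of the agreement criterion described above, the following minimal Python sketch derives a gold standard summary from the term-based summaries of one artifact by keeping the terms marked by five or more developers. The function name <span class="cmti-10">gold_standard</span>, its parameters and the sample terms are hypothetical and are not taken from the material of the study.&nbsp;</font></p>
<pre>
```python
from collections import Counter


def gold_standard(term_summaries, min_agreement=5):
    """Return the terms marked by at least `min_agreement` developers.

    term_summaries: one collection of marked terms per developer.
    """
    # Relevancy of a term = number of developers who marked it.
    relevancy = Counter()
    for marked in term_summaries:
        relevancy.update(set(marked))  # count each developer once per term
    return {term for term, votes in relevancy.items() if votes >= min_agreement}


# Hypothetical marks made by nine developers on the same method.
marks = [
    {"encode", "string", "url"},
    {"encode", "string"},
    {"encode", "charset"},
    {"encode", "string", "url"},
    {"encode", "string"},
    {"string", "url"},
    {"encode", "result"},
    {"encode", "string", "charset"},
    {"encode", "url"},
]

print(sorted(gold_standard(marks)))
```
</pre>
<p>   <font face="Verdana" size="2">In this hypothetical data, only two terms reach the agreement of five developers. Lowering <span class="cmti-10">min_agreement</span> yields longer gold summaries with less agreement behind each term.&nbsp;</font></p>      ]]></body>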
<body><![CDATA[<p><font face="Verdana" size="2"><span class="titlemark">4.5   </span> <a id="x1-120004.5"></a>Evaluating automatically generated summaries</font></p>   <font face="Verdana" size="2">       <br>  </font>      <p><font face="Verdana" size="2">In text processing, the quality of the content of a summary is usually determined by comparing the <span class="cmti-10">peer summary </span>(i.e., the summary to be evaluated) with an ideal summary (i.e., a gold standard summary). In the specific case of extractive summaries, the primary metrics for this task are <span class="cmti-10">precision </span>and <span class="cmti-10">recall</span>. These metrics are based on the relevant content of the summary, and are defined as follows: </font>    </p>  <center class="par-math-display"> <font face="Verdana" size="2"> <img src="/img/revistas/cleiej/v15n2/2a079x.png" alt="precision = \frac{|summary_{gold} \cap summary_{peer}|}{|summary_{peer}|}" class="par-math-display"></font></center>   <font face="Verdana" size="2">       <br>  </font>      <p>    </p>  <center class="par-math-display"> <font face="Verdana" size="2"> <img src="/img/revistas/cleiej/v15n2/2a0710x.png" alt="recall = \frac{|summary_{gold} \cap summary_{peer}|}{|summary_{gold}|}" class="par-math-display"></font></center>   <font face="Verdana" size="2">       <br>  </font>      <p> </p>      <p>   <font face="Verdana" size="2">The range of both metrics is <img src="/img/revistas/cleiej/v15n2/2a0711x.png" alt="[0,1]  " class="math">. A precision value equal to one means that all the terms in the peer summary are relevant, although there could be relevant terms missing. On the other hand, a recall value equal to one means that the peer summary contains all the relevant terms, though it could also contain some irrelevant terms. 
In general, the shorter the peer summary, the higher the precision, whereas the longer the peer summary, the higher the recall.&nbsp;</font></p>      <p>   <font face="Verdana" size="2">A third metric, called F-score, measures the balance between precision and recall: </font>    </p>  <center class="par-math-display"> <font face="Verdana" size="2"> <img src="/img/revistas/cleiej/v15n2/2a0712x.png" alt="F = 2 \cdot \frac{precision \cdot recall}{precision + recall}" class="par-math-display"></font></center>   <font face="Verdana" size="2">       <br>  </font>      ]]></body>
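<body><![CDATA[<p>   <font face="Verdana" size="2">The three metrics above can be computed directly on sets of terms. The following Python sketch is only an illustration: the function name, the sample terms and the convention of returning zero when a summary is empty are assumptions, not part of the evaluation setup of the study.&nbsp;</font></p>
<pre>
```python
def precision_recall_f(gold, peer):
    """Precision, recall and F-score of a peer summary against a gold one.

    Both summaries are treated as sets of terms.
    """
    gold, peer = set(gold), set(peer)
    common = len(gold.intersection(peer))
    # Assumed convention: an empty summary scores zero instead of failing.
    precision = common / len(peer) if peer else 0.0
    recall = common / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical gold and peer summaries for one method.
gold = {"encode", "string", "url", "charset"}
peer = {"encode", "string", "buffer", "index", "result"}

p, r, f = precision_recall_f(gold, peer)
print(round(p, 2), round(r, 2), round(f, 2))
```
</pre>
<p>   <font face="Verdana" size="2">For this hypothetical pair, two of the five peer terms are relevant (precision 0.4) and two of the four gold terms are recovered (recall 0.5), which gives an F-score of about 0.44.&nbsp;</font></p>      ]]></body>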
<body><![CDATA[<p> </p>      <p>   <font face="Verdana" size="2">The highest F-score, i.e., the highest value of the harmonic mean of precision and recall, indicates the best achievable balance between the two metrics.&nbsp;</font></p>      <p>    </p>      <p><font face="Verdana" size="2"><span class="titlemark">4.5.1   </span> <a id="x1-130004.5.1"></a>Text Retrieval Techniques in Source Code Summarization</font></p>   <font face="Verdana" size="2">       <br>  </font>      <p><font face="Verdana" size="2">In <span class="cite">(<a href="#c3">3</a>)</span>, three techniques from Text Retrieval were proposed to summarize source code artifacts: the Vector Space Model (VSM), Latent Semantic Indexing (LSI), and a combination of VSM and lead summarization. Lead summarization is based on the hypothesis that the first sentences of a document are a good summary of it.&nbsp;</font></p>      <p>   <font face="Verdana" size="2">The <span class="cmti-10">Vector Space Model </span>is one of the most common algebraic models in Text Retrieval for representing text corpora. It treats a corpus as a set of documents <img src="/img/revistas/cleiej/v15n2/2a0713x.png" alt="D  " class="math">, from which the set of terms <img src="/img/revistas/cleiej/v15n2/2a0714x.png" alt="T  " class="math"> (i.e., the vocabulary) is extracted. Then, it represents this corpus as a matrix <img src="/img/revistas/cleiej/v15n2/2a0715x.png" alt="M_{|T| \times |D|}" class="math">, where the row <img src="/img/revistas/cleiej/v15n2/2a0716x.png" alt="i  " class="math"> corresponds to the term <img src="/img/revistas/cleiej/v15n2/2a0717x.png" alt="t_i \in T" class="math">, and the column <img src="/img/revistas/cleiej/v15n2/2a0718x.png" alt="j  " class="math"> corresponds to the document <img src="/img/revistas/cleiej/v15n2/2a0719x.png" alt="d_j \in D" class="math">. 
Accordingly, the value in the cell <img src="/img/revistas/cleiej/v15n2/2a0720x.png" alt="i,j  " class="math"> is the weight of the term <img src="/img/revistas/cleiej/v15n2/2a0721x.png" alt="t_i" class="math"> in the document <img src="/img/revistas/cleiej/v15n2/2a0722x.png" alt="d_j" class="math">.&nbsp;</font></p>      <p>   <font face="Verdana" size="2">The basic weighting scheme is the Boolean one, which assigns one to the cell <img src="/img/revistas/cleiej/v15n2/2a0723x.png" alt="m_{i,j}" class="math"> if the term <img src="/img/revistas/cleiej/v15n2/2a0724x.png" alt="t_i" class="math"> occurs in the document <img src="/img/revistas/cleiej/v15n2/2a0725x.png" alt="d_j" class="math">, or zero otherwise. Other weighting schemes consider local and global weights, i.e., the contribution of the term <img src="/img/revistas/cleiej/v15n2/2a0726x.png" alt="t_i" class="math"> to the document <img src="/img/revistas/cleiej/v15n2/2a0727x.png" alt="d_j" class="math"> and to the entire set of documents <img src="/img/revistas/cleiej/v15n2/2a0728x.png" alt="D  " class="math">. For example, the popular scheme <span class="cmti-10">tf-idf </span>determines the weight of the term <img src="/img/revistas/cleiej/v15n2/2a0729x.png" alt="t_i" class="math"> by multiplying its frequency of occurrence in the document <img src="/img/revistas/cleiej/v15n2/2a0730x.png" alt="d_j" class="math"> by its inverse document frequency, as follows: </font>     </p>  <center class="par-math-display"> <font face="Verdana" size="2"> <img src="/img/revistas/cleiej/v15n2/2a0731x.png" alt="m_{i,j} = f(t_i, d_j) \times idf(t_i)" class="par-math-display"></font></center>   <font face="Verdana" size="2">       <br>  </font>      <p> </p>      ]]></body>
<body><![CDATA[<p>   <font face="Verdana" size="2">where </font>    </p>  <center class="par-math-display"> <font face="Verdana" size="2"> <img src="/img/revistas/cleiej/v15n2/2a0732x.png" alt="idf(t_i) = \log\left(\frac{|D|}{|\{d : t_i \in d \wedge d \in D\}|}\right)" class="par-math-display"></font></center>   <font face="Verdana" size="2">       <br>  </font>      <p> </p>      <p>   <font face="Verdana" size="2">When summarizing source code, the documents are code artifacts such as methods or classes. The <img src="/img/revistas/cleiej/v15n2/2a0733x.png" alt="k  " class="math"> terms with the highest weight in the vector <img src="/img/revistas/cleiej/v15n2/2a0734x.png" alt="d_j" class="math"> form the summary of the document. This <img src="/img/revistas/cleiej/v15n2/2a0735x.png" alt="k  " class="math"> value is usually called the <span class="cmti-10">constant threshold</span>.&nbsp;</font></p>      <p>   <font face="Verdana" size="2"><span class="cmti-10">Latent Semantic Indexing </span>(LSI) is based on a dimensionally reduced version of the vector space produced by VSM, which is intended to recover the underlying semantics of the corpus. To that end, LSI uses <span class="cmti-10">Singular Value Decomposition</span> (SVD) to decompose the matrix <img src="/img/revistas/cleiej/v15n2/2a0736x.png" alt="M  " class="math"> into the left and right singular matrices <img src="/img/revistas/cleiej/v15n2/2a0737x.png" alt="U  " class="math"> and <img src="/img/revistas/cleiej/v15n2/2a0738x.png" alt="V  " class="math"> (which represent the terms and documents, respectively), and a diagonal matrix of singular values <img src="/img/revistas/cleiej/v15n2/2a0739x.png" alt="&Sigma;  " class="math">. 
Then, <img src="/img/revistas/cleiej/v15n2/2a0740x.png" alt="M = U \Sigma V^*" class="math">, where <img src="/img/revistas/cleiej/v15n2/2a0741x.png" alt="V^*" class="math"> is the transpose of <img src="/img/revistas/cleiej/v15n2/2a0742x.png" alt="V  " class="math">. The dimensions of these matrices can be reduced by choosing the first <img src="/img/revistas/cleiej/v15n2/2a0743x.png" alt="C  " class="math"> columns of <img src="/img/revistas/cleiej/v15n2/2a0744x.png" alt="U  " class="math"> and <img src="/img/revistas/cleiej/v15n2/2a0745x.png" alt="V  " class="math">, and the <img src="/img/revistas/cleiej/v15n2/2a0746x.png" alt="C  " class="math"> highest singular values in <img src="/img/revistas/cleiej/v15n2/2a0747x.png" alt="&Sigma;  " class="math">, which leads to <img src="/img/revistas/cleiej/v15n2/2a0748x.png" alt="M_C = U_C \Sigma_C V_C^*" class="math">, i.e., to an approximation of the matrix <img src="/img/revistas/cleiej/v15n2/2a0749x.png" alt="M  " class="math">.&nbsp;</font></p>      <p>   <font face="Verdana" size="2">The corpus representation produced by LSI makes it possible to compute the similarity between terms and documents. 
The summary of the document <img src="/img/revistas/cleiej/v15n2/2a0750x.png" alt="d_j \in D" class="math"> is formed by the <img src="/img/revistas/cleiej/v15n2/2a0751x.png" alt="k  " class="math"> terms in <img src="/img/revistas/cleiej/v15n2/2a0752x.png" alt="T  " class="math"> with the highest cosine similarity to the vector of the document <img src="/img/revistas/cleiej/v15n2/2a0753x.png" alt="d_j" class="math">.&nbsp;</font></p>      <p>   <font face="Verdana" size="2">Based on the results of an informal evaluation, in which humans assessed the output of some TR-based summarizers, <span class="cite">(<a href="#c3">3</a>)</span> concluded that the combination of lead summarization and VSM (from now on called <span class="cmti-10">lead+VSM</span>) produces better summaries than LSI or VSM alone. Broadly, the <span class="cmti-10">lead</span> <span class="cmti-10">summaries </span>consist of the first <img src="/img/revistas/cleiej/v15n2/2a0754x.png" alt="k  " class="math"> terms that appear in the target documents. In the case of source code artifacts, these first terms often contain the artifact type and name, which are rarely found in the VSM summaries. Thus, the combined summaries contain complementary information from both techniques.&nbsp;</font></p>      <p>   </p>  <hr class="figure">     <div class="figure">   <font face="Verdana" size="2">       <br>    </font>        ]]></body>
<body><![CDATA[<div class="caption"><font face="Verdana" size="2"><span class="id"> <a name="x1-130011" href="/img/revistas/cleiej/v15n2/2a07f1.jpg">Figure&nbsp;1:</a> </span><span class="content">Average precision for VSM, LSI, lead+VSM, lead and random summaries. The <img src="/img/revistas/cleiej/v15n2/2a0755x.png" alt="x  " class="math">-axis represents the length of the summary, and the <img src="/img/revistas/cleiej/v15n2/2a0756x.png" alt="y  " class="math">-axis represents average precision values</span></font></div>  <font face="Verdana" size="2">&nbsp;    <br>  </font>      <p>   </p>  </div>  <hr class="endfigure">         <p><font face="Verdana" size="2"><span class="titlemark">4.5.2   </span> <a id="x1-140004.5.2"></a>Evaluating Text Retrieval Techniques through Intrinsic Measures</font></p>   <font face="Verdana" size="2">       <br>  </font>      <p><font face="Verdana" size="2">In order to evaluate the aforementioned techniques in software summarization, we computed the precision, recall and F-score of their resulting summaries. To perform this task, we used the gold standard summaries described in section <a href="#x1-110004.4">4.4</a>. In addition, we considered two baseline summarization methods, namely lead and <span class="cmti-10">random</span>. The latter generates summaries consisting of <img src="/img/revistas/cleiej/v15n2/2a0757x.png" alt="k  " class="math"> terms randomly chosen from the target documents.&nbsp;</font></p>      <p>   <font face="Verdana" size="2">In <span class="cite">(<a href="#c3">3</a>)</span>, method and class summaries of length 5 and 10 are considered. In contrast, we evaluated summaries whose length varies from 5 to 20, since we were interested in analyzing the influence of length on the content quality metrics, and because we considered other kinds of code artifacts (i.e., sequences and packages). 
Thus, our objects of study were summaries composed of five to twenty terms, generated by the VSM, LSI, lead+VSM, lead and random methods, for each artifact described in Table <a href="#x1-40022">2</a>.&nbsp;</font></p>      <p>   <font face="Verdana" size="2">For these summarization techniques, we observed that, as usual, the precision decreased as the recall and length of the summaries increased. However, in exceptional cases (artifacts M1, C2 and S2) the precision of LSI summaries showed an upward trend as the length of the summaries increased. As expected, the precision of random summaries was low for all kinds of artifacts and lengths, although in several cases LSI summaries had the lowest precision values, which confirms the results obtained in <span class="cite">(<a href="#c3">3</a>)</span>. This was clearly observed in the artifacts C2 and P1, where LSI summaries had lower precision than random summaries. On average, lead, VSM and lead+VSM had significantly higher precision and recall than random and LSI, regardless of the kind of artifact being summarized. This can be observed in Fig. <a href="#x1-130011">1</a> and Fig. <a href="#x1-140012">2</a>.&nbsp;</font></p>      <p>   </p>  <hr class="figure">     <div class="figure">   <font face="Verdana" size="2">       ]]></body>
<body><![CDATA[<br>    </font>        <div class="caption"><font face="Verdana" size="2"><span class="id"><a name="x1-140012" href="/img/revistas/cleiej/v15n2/2a07f2.jpg">Figure&nbsp;2</a>: </span><span class="content">Average recall for VSM, LSI, lead+VSM, lead and random summaries. The <img src="/img/revistas/cleiej/v15n2/2a0758x.png" alt="x  " class="math">-axis represents the length of the summary, and the <img src="/img/revistas/cleiej/v15n2/2a0759x.png" alt="y  " class="math">-axis represents average recall values</span></font></div>  <font face="Verdana" size="2">&nbsp;    <br>  </font>      <p>   </p>  </div>  <hr class="endfigure"> <font face="Verdana" size="2">     <br>  </font>      <p>   <font face="Verdana" size="2">When analyzing precision, we found that it was low for methods summaries having more than 10 terms, and for sequences and classes summaries having more than 15 terms. Considering the ranges where precision values were high, we found that in the method case, lead+VSM technique achieved the best results. This same technique together with lead got good precision values for sequences summaries. In the case of classes and packages, lead+VSM, lead and VSM summaries had similar precision, making it difficult to determine which technique was better for these kinds of artifacts. Furthermore, such techniques had a high precision (above <img src="/img/revistas/cleiej/v15n2/2a0760x.png" alt="0.5  " class="math">), even for long summaries.&nbsp;</font></p>      <p>   <font face="Verdana" size="2">Regarding recall values, once again lead+VSM, lead and VSM outperformed LSI and random, and in some cases, random summaries achieved higher recall than LSI summaries (e.g., for artifacts C2 and P1). 
In addition, for every summarization technique, the recall values remained constant for the summaries consisting of more than 15 terms, with few exceptions, such as the lead+VSM summaries of the artifacts C1, S1 and P1, whose recall continued to increase when <img src="/img/revistas/cleiej/v15n2/2a0761x.png" alt="k &gt; 15  " class="math">.&nbsp;</font></p>      <p>   <font face="Verdana" size="2">All these results suggest that in order to get the gist of source code artifacts automatically, the length of a term-based summary should be in the interval <img src="/img/revistas/cleiej/v15n2/2a0762x.png" alt="[10,20]  " class="math">. For methods, the number of terms in the summary is nearer to the lower bound (10), while for packages, this number is nearer to the upper bound (20). This means that automatic summaries are approximately <img src="/img/revistas/cleiej/v15n2/2a0763x.png" alt="25%  " class="math"> longer than the gold standard summaries, which is not an issue if they capture the intent of the code and remain shorter than the artifact they describe. These results were confirmed by the F-score values, which were acceptable and stable in the range <img src="/img/revistas/cleiej/v15n2/2a0764x.png" alt="[10,20]  " class="math"> for all kinds of artifacts when the summaries were generated by the lead+VSM, lead and VSM techniques. The average F-score values are presented in Fig. <a href="#x1-140023">3</a>.&nbsp;</font></p>      <p>   <font face="Verdana" size="2">Additionally, the intrinsic evaluation showed that LSI based on tf-idf and random are not appropriate techniques to summarize source code artifacts, which confirms the results in <span class="cite">(<a href="#c3">3</a>)</span>. Although on average lead+VSM outperformed lead and VSM, none of these techniques is especially suitable or unsuitable for summarizing a specific kind of artifact. 
In fact, the performance of these three techniques is similar in all cases.&nbsp;</font></p>      <p>   </p>  <hr class="figure">     ]]></body>
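<body><![CDATA[<p>   <font face="Verdana" size="2">To make the compared techniques concrete, the following Python sketch builds tf-idf weights as defined in section <a href="#x1-130004.5.1">4.5.1</a> and derives VSM, lead and lead+VSM summaries of a given length. It is a simplified illustration, not the implementation used in <span class="cite">(<a href="#c3">3</a>)</span> or in this study: the toy corpus and all function names are hypothetical, the raw term count is used as the term frequency, and the way lead and VSM terms are combined (the first <span class="cmti-10">n_lead</span> lead terms followed by the best-weighted VSM terms not yet included) is an assumption. LSI is omitted.&nbsp;</font></p>
<pre>
```python
import math
from collections import Counter


def tf_idf_weights(corpus):
    """Map each document name to a {term: tf-idf weight} dictionary.

    corpus: {document name: list of terms in order of appearance}.
    """
    n_docs = len(corpus)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter()
    for terms in corpus.values():
        doc_freq.update(set(terms))
    weights = {}
    for name, terms in corpus.items():
        counts = Counter(terms)  # raw count used as f(t_i, d_j)
        weights[name] = {
            t: counts[t] * math.log(n_docs / doc_freq[t]) for t in counts
        }
    return weights


def vsm_summary(weights, name, k):
    """The k terms with the highest weight in the document vector."""
    # Ties are broken alphabetically so that the output is deterministic.
    ranked = sorted(weights[name].items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:k]]


def lead_summary(corpus, name, k):
    """The first k distinct terms that appear in the document."""
    seen = []
    for t in corpus[name]:
        if len(seen) >= k:
            break
        if t not in seen:
            seen.append(t)
    return seen


def lead_vsm_summary(corpus, weights, name, k, n_lead=None):
    """Assumed combination: n_lead lead terms, then unseen VSM terms, cut at k."""
    if n_lead is None:
        n_lead = k // 2
    combined = lead_summary(corpus, name, n_lead)
    for t in vsm_summary(weights, name, k):
        if t not in combined:
            combined.append(t)
    return combined[:k]


# Hypothetical corpus: each method is a document made of its identifiers.
corpus = {
    "encode": ["public", "string", "encode", "string", "url", "encode", "buffer"],
    "decode": ["public", "string", "decode", "url", "buffer"],
    "export": ["public", "void", "export", "songs", "file"],
}

weights = tf_idf_weights(corpus)
print(vsm_summary(weights, "encode", 3))
print(lead_summary(corpus, "encode", 2))
print(lead_vsm_summary(corpus, weights, "encode", 4))
```
</pre>
<p>   <font face="Verdana" size="2">In this toy corpus the term <span class="cmti-10">public</span> occurs in every document, so its tf-idf weight is zero and it can only enter a summary through the lead part. This mirrors the observation that the first terms of an artifact are rarely found in VSM summaries.&nbsp;</font></p>      ]]></body>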
<body><![CDATA[<div class="figure">  <font face="Verdana" size="2">      <br>    </font>        <div class="caption"><font face="Verdana" size="2"><span class="id"> <a name="x1-140023" href="/img/revistas/cleiej/v15n2/2a06f3.jpg">Figure&nbsp;3:</a> </span><span class="content">Average f-score for VSM, LSI, lead+VSM, lead and random summaries. The <img src="/img/revistas/cleiej/v15n2/2a0765x.png" alt="x  " class="math">-axis represents the length of the summary, and the <img src="/img/revistas/cleiej/v15n2/2a0766x.png" alt="y  " class="math">-axis represents average f-score values</span></font></div>  <font face="Verdana" size="2">      <br> &nbsp; </font>     <p>   </p>  </div>  <hr class="endfigure">         <p><font face="Verdana" size="2"><span class="titlemark">4.6   </span> <a id="x1-150004.6"></a>Threats to validity</font></p>   <font face="Verdana" size="2">       <br>  </font>      <p><font face="Verdana" size="2">As in any empirical study in software engineering, we cannot generalize the outcomes. Therefore, we consider these results only as useful heuristics to guide the development of automated summarization and documentation tools.&nbsp;</font></p>      <p>   <font face="Verdana" size="2">The number of participants is always an issue for this type of experiment and in our case, nine developers is clearly a small group. Moreover, although subjects reported some experience in programming and evolving systems, they cannot be considered as professional developers. We plan to work with other research groups, and perform similar but larger studies involving more experienced subjects in order to gain more confidence in the results.&nbsp;</font></p>      <p>   <font face="Verdana" size="2">Equally important, we selected only two methods, two method sequences, two classes and one package from a single system. While we tried to vary their properties, they may not necessarily be the most representative of each type of artifact. 
Moreover, the aTunes system has high-quality, self-explanatory identifiers and a very simple and clear domain. Therefore, we cannot estimate what the results would be for systems with poor identifier naming or a more complex domain.&nbsp;</font></p>      ]]></body>
<body><![CDATA[<p>   <font face="Verdana" size="2">During the summarization sessions, developers had to study the same artifacts several times and write three different types of summaries for each of them. Consequently, a learning effect is possible. We did not try to measure or mitigate it.&nbsp;</font></p>      <p>    </p>      <p><font face="Verdana" size="2"><span class="titlemark">5   </span> <a id="x1-160005"></a>Related Work</font></p>   <font face="Verdana" size="2">       <br>  </font>      <p><font face="Verdana" size="2">The automatic summarization of natural language text has been widely investigated by researchers and many approaches have been proposed, which are based mostly on Text Retrieval (TR), machine learning, and natural language processing techniques <span class="cite">(<a href="#c1">1</a>)</span>. The summarization of software artifacts is still in its early stages, but there are promising results. For instance, abstracts of bug report discussions, generated using conversation-based classifiers, were proposed as a suitable instrument during bug report triage activities <span class="cite">(<a href="#c7">7</a>)</span><a name="c7."></a>; the summarization of the content of large execution traces was suggested as a tool that can help programmers to understand the main behavioral aspects of a software system <span class="cite"><a href="#c8">(8</a>)<a name="c8."></a></span>.&nbsp;</font></p>      <p>   <font face="Verdana" size="2">Regarding automatic summarization of source code, <span class="cite">(<a href="#c9">9</a>)</span><a name="c9."></a> proposed an abbreviated and accurate description of the effect of a software change on the run-time behavior of a program, in order to help developers validate software changes and understand modifications. 
High level descriptions of software concerns were designed for raising the level of abstraction and improving the productivity of developers, while working on evolution tasks <span class="cite">(<a href="#c10">10</a>)</span><a name="c10."></a>. The text retrieval based approaches for source code summarization, first introduced in <span class="cite">(<a href="#c6">6</a>)</span>, were applied for summarizing whole source code artifacts, with the purpose of aiding developers in comprehension tasks <span class="cite">(<a href="#c3">3</a>)</span>. A form of structural summarization of source code has also been proposed in <span class="cite">(<a href="#c11">11</a>)</span><a name="c11."></a>, which presented two techniques, i.e., the software reflection model and the lexical source model extraction for a lightweight summarization of software. These two techniques are complementary to the approaches we investigated and we envision combining them in the near future.&nbsp;</font></p>      <p>   <font face="Verdana" size="2">By the same token, some TR techniques are used in <span class="cite">(<a href="#c12">12</a>)<a name="c12."></a></span> to cluster source code and relevant terms from each cluster are extracted to form labels. A similar approach is used in <span class="cite">(<a href="#c13">13</a>)<a name="c13."></a></span>, where TR is used to extract the most relevant set of terms to a group of methods returned as the result of a search. These terms are treated as attributes, which are used to cluster the methods. In each case, the labels and attributes can be considered as (partial) summaries.&nbsp;</font></p>      <p>   <font face="Verdana" size="2">Another related research thread is on source code tagging and annotations <span class="cite">(<a href="#c14">14</a>)<a name="c14."></a></span>. 
These mechanisms could help developers create and represent manual summaries of the code (in addition to comments).&nbsp;</font></p>      <p>   <font face="Verdana" size="2">Although several alternatives have been explored to summarize various types of software artifacts, the evaluation of the generated summaries has been mostly informal. For example, <span class="cite">(<a href="#c15">15</a>)<a name="c15."></a></span> presents an approach to summarize methods by identifying and lexicalizing the most relevant units. The generated summaries in this case were evaluated by asking developers how accurate, adequate, and concise those descriptions were.&nbsp;</font></p>      <p>   <font face="Verdana" size="2">An exception to this informal situation is <span class="cite">(<a href="#c7">7</a>)</span>, where bug report summaries were evaluated using intrinsic measures such as precision, recall, F-score, and pyramid precision to assess informativeness, redundancy, irrelevant content, and coherence. These results were then compared against scores assigned by human judges to the same features. However, from a practical point of view, this study is a case of text summarization, and therefore its evaluation mode is not really novel.&nbsp;</font></p>      ]]></body>
<body><![CDATA[<p>   <font face="Verdana" size="2">In the same vein, the term-based summaries generated in <span class="cite">(<a href="#c6">6</a>)</span> from source code using information-retrieval techniques were evaluated using the Pyramid method. Also, the descriptions of source code produced in <span class="cite">(<a href="#c3">3</a>)</span> underwent intrinsic-online evaluation to assess the agreement between developers.&nbsp;</font></p>      <p>    </p>      <p><font face="Verdana" size="2"><span class="titlemark">6   </span> <a id="x1-170006"></a>Conclusions and Future Work</font></p>   <font face="Verdana" size="2">       <br>  </font>      <p><font face="Verdana" size="2">The presented case study analyzed two kinds of summaries created by Java developers for several source code entities, with the purpose of studying how programmers create descriptions of source code. In addition, we asked developers what they think should be included in a summary.&nbsp;</font></p>      <p>   <font face="Verdana" size="2">When developers create natural language descriptions of source code artifacts, the length of the descriptions is similar for all types of entities. We obtained slightly longer summaries for sequences of calling methods. This result may indicate that this kind of artifact is harder to describe or deserves more detailed explanations.&nbsp;</font></p>      <p>   <font face="Verdana" size="2">On the other hand, the length of a term-based summary is correlated with the length of the artifact it summarizes. This result suggests that term-based summaries (extractive summaries) are inherently less informative than sentence-based summaries, and therefore, they are not enough to fully describe source code artifacts. 
This is corroborated by the low percentage of words used in sentence-based summaries that correspond to terms selected within term-based summaries.&nbsp;</font></p>      <p>   <font face="Verdana" size="2">Consequently, although textual information is essential, automatic code summarizers cannot rely exclusively on the identification of relevant terms contained in software entities. The precision, recall, and F-score values achieved by some TR-based techniques show that the semantic information by itself is not enough to generate high-quality code summaries. However, the outcomes of these techniques can be considered a good starting point for source code summarization, and they can be improved using structural information and natural language processing tools.&nbsp;</font></p>      <p>   <font face="Verdana" size="2">The experiment also gave us clues about what should be included in a summary. For instance, local variable names can be considered useful pieces of information for describing all types of entities; names and invoked method names are quite relevant for summarizing methods; invoked method names and variable names are relevant for explaining sequences of calling methods; the name of a class is essential for describing its purpose. The results also suggest that summarization of packages is often problematic. This could indicate that packages cannot be considered as units, and consequently, a multi-document approach, where a package would be conceived as a group of related documents (i.e., classes), is more appropriate.&nbsp;</font></p>      <p>   <font face="Verdana" size="2">Overall, the results obtained represent valuable information for building and evaluating automatic summarization tools. The gold-standard summaries characterize the main target of our envisioned summarizer, which will consider structural and textual information of artifacts. 
Since the text retrieval methods studied here achieved only acceptable results, we plan to investigate and apply other text retrieval techniques in code summarization, as well as some multi-document summarization approaches for large artifacts such as packages. Moreover, new user studies will be conducted to assess the effect of summaries on several development and maintenance tasks.&nbsp;</font></p>      ]]></body>
<body><![CDATA[<p>    </p>      <p><font face="Verdana" size="2"><a id="x1-180006"></a>Acknowledgment</font></p>   <font face="Verdana" size="2">       <br>  </font>      <p><font face="Verdana" size="2">We express our gratitude to Andrian Marcus and Sonia Haiduc, members of the SEVERE (<a href="http://www.cs.wayne.edu/%7Esevere/" class="url"><span class="cmtt-10">http://www.cs.wayne.edu/~severe/</span></a>) research group at Wayne State University, for their support and advice. Also, a special thanks to the students at Universidad Nacional de Colombia who are participating as Java developers within the user studies related to source code summarization research.&nbsp;</font></p>      <p>    </p>      <p><font face="Verdana" size="2"><a id="x1-190006"></a>References</font></p>   <font face="Verdana" size="2">       <br>  </font>      <p>     </p>      <div class="thebibliography">           <p><font face="Verdana" size="2"><span class="biblabel"><a name="c1"></a>   (<a href="#c1.">1</a>)<span class="bibsp">&nbsp;&nbsp;&nbsp;</span></span>K.&nbsp;S.  Jones,  &ldquo;Automatic  summarising:  The  state  of  the  art,&rdquo;  <span class="cmti-10">Information  Processing  and</span>     <span class="cmti-10">Management: an International Journal</span>, vol.&nbsp;43, no.&nbsp;6, 2007. </font>     </p>            ]]></body>
<body><![CDATA[<p><font face="Verdana" size="2"><span class="biblabel"><a name="c2"></a>   (<a href="#c2.">2</a>)<span class="bibsp">&nbsp;&nbsp;&nbsp;</span></span>T.&nbsp;A. Corbi, &ldquo;Program understanding: challenge for the 1990&rsquo;s,&rdquo; <span class="cmti-10">IBM Systems Journal</span>, vol.&nbsp;28, pp.     294&ndash;306, June 1989. </font>     </p>            <p><font face="Verdana" size="2"><span class="biblabel"><a name="c3"></a>   (<a href="#c3.">3</a>)<span class="bibsp">&nbsp;&nbsp;&nbsp;</span></span>S.&nbsp;Haiduc, J.&nbsp;Aponte, L.&nbsp;Moreno, and A.&nbsp;Marcus, &ldquo;On the use of automated text summarization     techniques  for  summarizing  source  code,&rdquo;  in  <span class="cmti-10">WCRE  &rsquo;10:  Proceedings  of  the  2010  17th  Working</span>     <span class="cmti-10">Conference on Reverse Engineering</span>.   Washington, DC, USA: IEEE Computer Society, 2010. </font>     </p>            <p><font face="Verdana" size="2"><span class="biblabel"><a name="c4"></a>   (<a href="#c4.">4</a>)<span class="bibsp">&nbsp;&nbsp;&nbsp;</span></span>J.&nbsp;Steinberger  and  K.&nbsp;Je&#382;ek,  &ldquo;Text  summarization:  An  old  challenge  and  new  approaches,&rdquo;  in     <span class="cmti-10">Foundations of Computational Intelligence</span>, ser. Studies in Computational Intelligence, A.&nbsp;Abraham, A.-E. Hassanien, D.&nbsp;Leon, and V.&nbsp;Sn&aacute;&#353;el, Eds. Springer Berlin / Heidelberg, 2009, vol. 206, pp. 127&ndash;149. </font> </p>            <p><font face="Verdana" size="2"><span class="biblabel"><a name="c5"></a>   (<a href="#c5.">5</a>)<span class="bibsp">&nbsp;&nbsp;&nbsp;</span></span>R.&nbsp;Likert, &ldquo;A technique for the measurement of attitudes,&rdquo; <span class="cmti-10">Archives of Psychology</span>, vol.&nbsp;22, no. 140,     pp. 1&ndash;55, 1932. 
</font>     </p>            <p><font face="Verdana" size="2"><span class="biblabel"><a name="c6"></a>   (<a href="#c6.">6</a>)<span class="bibsp">&nbsp;&nbsp;&nbsp;</span></span>S.&nbsp;Haiduc,  J.&nbsp;Aponte,  and  A.&nbsp;Marcus,  &ldquo;Supporting  program  comprehension  with  source  code     summarization,&rdquo; in <span class="cmti-10">ICSE &rsquo;10: Proceedings of the 32nd ACM/IEEE International Conference on Software</span>     <span class="cmti-10">Engineering</span>, vol.&nbsp;2.   New York, NY, USA: ACM, 2010, pp. 223&ndash;226. </font>     </p>            <p><font face="Verdana" size="2"><span class="biblabel"><a name="c7"></a>   (<a href="#c7.">7</a>)<span class="bibsp">&nbsp;&nbsp;&nbsp;</span></span>S.&nbsp;Rastkar, G.&nbsp;C. Murphy, and G.&nbsp;Murray, &ldquo;Summarizing software artifacts: a case study of bug     reports,&rdquo;  in  <span class="cmti-10">ICSE &rsquo;10:  Proceedings  of  the  32nd  ACM/IEEE International  Conference  on  Software</span>     <span class="cmti-10">Engineering</span>.   New York, NY, USA: ACM, 2010, pp. 505&ndash;514. </font>     </p>            <p><font face="Verdana" size="2"><span class="biblabel"><a name="c8"></a>   (<a href="#c8.">8</a>)<span class="bibsp">&nbsp;&nbsp;&nbsp;</span></span>A.&nbsp;Hamou-Lhadj and T.&nbsp;Lethbridge, &ldquo;Summarizing the content of large traces to facilitate the     understanding of the behaviour of a software system,&rdquo; in <span class="cmti-10">ICPC &rsquo;06: Proceedings of the 14th IEEE</span>     <span class="cmti-10">International Conference on Program Comprehension</span>.  Washington, DC, USA: IEEE Computer Society,     2006. </font>     </p>            <p><font face="Verdana" size="2"><span class="biblabel"><a name="c9"></a>   (<a href="#c9.">9</a>)<span class="bibsp">&nbsp;&nbsp;&nbsp;</span></span>R.&nbsp;Buse  and  W.&nbsp;R.  
Weimer,  &ldquo;Automatically  documenting  program  changes,&rdquo;  in  <span class="cmti-10">ASE  &rsquo;10:</span>     <span class="cmti-10">Proceedings of the IEEE/ACM International Conference on Automated Software Engineering</span>.    New     York, NY, USA: ACM, 2010, pp. 33&ndash;42. </font>     </p>            <p><font face="Verdana" size="2"><span class="biblabel"><a name="c10"></a>  (<a href="#c10.">10</a>)<span class="bibsp">&nbsp;&nbsp;&nbsp;</span></span>S.&nbsp;Rastkar, &ldquo;Summarizing software concerns,&rdquo; in <span class="cmti-10">ICSE &rsquo;10: Proceedings of the 32nd ACM/IEEE</span>     <span class="cmti-10">International Conference on Software Engineering</span>.   New York, NY, USA: ACM, 2010, pp. 527&ndash;528. </font>     </p>            <p><font face="Verdana" size="2"><span class="biblabel"><a name="c11"></a>  (<a href="#c11.">11</a>)<span class="bibsp">&nbsp;&nbsp;&nbsp;</span></span>G.&nbsp;C.  Murphy,  &ldquo;Lightweight  structural  summarization  as  an  aid  to  software  evolution,&rdquo;  Ph.D.     dissertation, University of Washington, Seattle, WA, USA, 1996. </font>      </p>            ]]></body>
<body><![CDATA[<p><font face="Verdana" size="2"><span class="biblabel"><a name="c12"></a>  (<a href="#c12.">12</a>)<span class="bibsp">&nbsp;&nbsp;&nbsp;</span></span>A.&nbsp;Kuhn, S.&nbsp;Ducasse, and T.&nbsp;G&icirc;rba, &ldquo;Semantic clustering: Identifying topics in source code,&rdquo; <span class="cmti-10">Information and Software Technology</span>, vol.&nbsp;49, no.&nbsp;3, pp. 230&ndash;243, 2007. </font>     </p>            <p><font face="Verdana" size="2"><span class="biblabel"><a name="c13"></a>  (<a href="#c13.">13</a>)<span class="bibsp">&nbsp;&nbsp;&nbsp;</span></span>D.&nbsp;Poshyvanyk and A.&nbsp;Marcus, &ldquo;Combining formal concept analysis with information retrieval for     concept location in source code,&rdquo; in <span class="cmti-10">ICPC &rsquo;07: Proceedings of the 15th IEEE International Conference</span>     <span class="cmti-10">on Program Comprehension</span>.   Washington, DC, USA: IEEE Computer Society, 2007, pp. 37&ndash;48. </font>     </p>            <p><font face="Verdana" size="2"><span class="biblabel"><a name="c14"></a>  (<a href="#c14.">14</a>)<span class="bibsp">&nbsp;&nbsp;&nbsp;</span></span>M.&nbsp;A. Storey, L.&nbsp;T. Cheng, I.&nbsp;Bull, and P.&nbsp;Rigby, &ldquo;Shared waypoints and social tagging to support     collaboration  in  software  development,&rdquo;  in  <span class="cmti-10">CSCW  &rsquo;06:  Proceedings  of  the  2006  20th  anniversary</span>     <span class="cmti-10">conference on Computer Supported Cooperative Work</span>.   New York, NY, USA: ACM, 2006, pp. 195&ndash;198. 
</font>     </p>            <p><font face="Verdana" size="2"><span class="biblabel"><a name="c15"></a>  (<a href="#c15.">15</a>)<span class="bibsp">&nbsp;&nbsp;&nbsp;</span></span>G.&nbsp;Sridhara, E.&nbsp;Hill, D.&nbsp;Muppaneni, L.&nbsp;Pollock, and K.&nbsp;Vijay-Shanker, &ldquo;Towards automatically     generating  summary  comments  for  Java  methods,&rdquo;  in  <span class="cmti-10">ASE  &rsquo;10:  Proceedings  of  the  IEEE/ACM</span>     <span class="cmti-10">International Conference on Automated Software Engineering</span>.   New York, NY, USA: ACM, 2010, pp.     43&ndash;52. </font> </p>       </div>             ]]></body><back>
<ref-list>
<ref id="B1">
<label>1</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Jones]]></surname>
<given-names><![CDATA[K. S]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Automatic summarising: The state of the art]]></article-title>
<source><![CDATA[Information Processing and Management: an International Journal]]></source>
<year>2007</year>
<volume>43</volume>
<numero>6</numero>
<issue>6</issue>
</nlm-citation>
</ref>
<ref id="B2">
<label>2</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Corbi]]></surname>
<given-names><![CDATA[T. A]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Program understanding: challenge for the 1990&#8217;s]]></article-title>
<source><![CDATA[IBM Systems Journal]]></source>
<year>1989</year>
<month>06</month>
<volume>28</volume>
<page-range>294-306</page-range></nlm-citation>
</ref>
<ref id="B3">
<label>3</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Haiduc]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
<name>
<surname><![CDATA[Aponte]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
<name>
<surname><![CDATA[Moreno]]></surname>
<given-names><![CDATA[L]]></given-names>
</name>
<name>
<surname><![CDATA[Marcus]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[On the use of automated text summarization techniques for summarizing source code]]></article-title>
<source><![CDATA[]]></source>
<year></year>
<conf-name><![CDATA[ WCRE &#8217;10: Proceedings of the 2010 17th Working Conference on Reverse Engineering]]></conf-name>
<conf-date>2010</conf-date>
<conf-loc>Washington DC</conf-loc>
</nlm-citation>
</ref>
<ref id="B4">
<label>4</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Steinberger]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
<name>
<surname><![CDATA[Je&#382;ek]]></surname>
<given-names><![CDATA[K]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Text summarization: An old challenge and new approaches]]></article-title>
<person-group person-group-type="editor">
<name>
<surname><![CDATA[Abraham]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
<name>
<surname><![CDATA[Hassanien]]></surname>
<given-names><![CDATA[A.-E]]></given-names>
</name>
<name>
<surname><![CDATA[Leon]]></surname>
<given-names><![CDATA[D]]></given-names>
</name>
<name>
<surname><![CDATA[Sná&#353;el]]></surname>
<given-names><![CDATA[V]]></given-names>
</name>
</person-group>
<source><![CDATA[Foundations of Computational Intelligence: ser. Studies in Computational Intelligence]]></source>
<year>2009</year>
<volume>206</volume>
<publisher-loc><![CDATA[Berlin / Heidelberg ]]></publisher-loc>
<publisher-name><![CDATA[Springer]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B5">
<label>5</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Likert]]></surname>
<given-names><![CDATA[R]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[A technique for the measurement of attitudes]]></article-title>
<source><![CDATA[Archives of Psychology]]></source>
<year>1932</year>
<volume>22</volume>
<numero>140</numero>
<issue>140</issue>
<page-range>1-55</page-range></nlm-citation>
</ref>
<ref id="B6">
<label>6</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Haiduc]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
<name>
<surname><![CDATA[Aponte]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
<name>
<surname><![CDATA[Marcus]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Supporting program comprehension with source code summarization]]></article-title>
<source><![CDATA[]]></source>
<year></year>
<conf-name><![CDATA[ ICSE &#8217;10: Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering]]></conf-name>
<conf-date>2010</conf-date>
<conf-loc>New York NY</conf-loc>
</nlm-citation>
</ref>
<ref id="B7">
<label>7</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Rastkar]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
<name>
<surname><![CDATA[Murphy]]></surname>
<given-names><![CDATA[G. C]]></given-names>
</name>
<name>
<surname><![CDATA[Murray]]></surname>
<given-names><![CDATA[G]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Summarizing software artifacts: a case study of bug reports]]></article-title>
<source><![CDATA[]]></source>
<year></year>
<conf-name><![CDATA[ ICSE &#8217;10: Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering]]></conf-name>
<conf-date>2010</conf-date>
<conf-loc>New York NY</conf-loc>
</nlm-citation>
</ref>
<ref id="B8">
<label>8</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Hamou-Lhadj]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
<name>
<surname><![CDATA[Lethbridge]]></surname>
<given-names><![CDATA[T]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Summarizing the content of large traces to facilitate the understanding of the behaviour of a software system]]></article-title>
<source><![CDATA[]]></source>
<year></year>
<conf-name><![CDATA[ ICPC &#8217;06: Proceedings of the 14th IEEE International Conference on Program Comprehension]]></conf-name>
<conf-date>2006</conf-date>
<conf-loc>Washington DC</conf-loc>
</nlm-citation>
</ref>
<ref id="B9">
<label>9</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Buse]]></surname>
<given-names><![CDATA[R]]></given-names>
</name>
<name>
<surname><![CDATA[Weimer]]></surname>
<given-names><![CDATA[W. R]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Automatically documenting program changes]]></article-title>
<source><![CDATA[]]></source>
<year></year>
<conf-name><![CDATA[ ASE &#8217;10: Proceedings of the IEEE/ACM International Conference on Automated Software Engineering]]></conf-name>
<conf-date>2010</conf-date>
<conf-loc>New York NY</conf-loc>
</nlm-citation>
</ref>
<ref id="B10">
<label>10</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Rastkar]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Summarizing software concerns]]></article-title>
<source><![CDATA[]]></source>
<year></year>
<conf-name><![CDATA[ ICSE &#8217;10: Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering]]></conf-name>
<conf-date>2010</conf-date>
<conf-loc>New York NY</conf-loc>
</nlm-citation>
</ref>
<ref id="B11">
<label>11</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Murphy]]></surname>
<given-names><![CDATA[G. C]]></given-names>
</name>
</person-group>
<source><![CDATA[Lightweight structural summarization as an aid to software evolution]]></source>
<year>1996</year>
</nlm-citation>
</ref>
<ref id="B12">
<label>12</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Kuhn]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
<name>
<surname><![CDATA[Ducasse]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
<name>
<surname><![CDATA[Gîrba]]></surname>
<given-names><![CDATA[T]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Semantic clustering: Identifying topics in source code]]></article-title>
<source><![CDATA[Information and Software Technology]]></source>
<year>2007</year>
<volume>49</volume>
<numero>3</numero>
<issue>3</issue>
<page-range>230-243</page-range></nlm-citation>
</ref>
<ref id="B13">
<label>13</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Poshyvanyk]]></surname>
<given-names><![CDATA[D]]></given-names>
</name>
<name>
<surname><![CDATA[Marcus]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Combining formal concept analysis with information retrieval for concept location in source code]]></article-title>
<source><![CDATA[]]></source>
<year></year>
<conf-name><![CDATA[ ICPC &#8217;07: Proceedings of the 15th IEEE International Conference on Program Comprehension]]></conf-name>
<conf-date>2007</conf-date>
<conf-loc>Washington DC</conf-loc>
</nlm-citation>
</ref>
<ref id="B14">
<label>14</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Storey]]></surname>
<given-names><![CDATA[M. A]]></given-names>
</name>
<name>
<surname><![CDATA[Cheng]]></surname>
<given-names><![CDATA[L. T]]></given-names>
</name>
<name>
<surname><![CDATA[Bull]]></surname>
<given-names><![CDATA[I]]></given-names>
</name>
<name>
<surname><![CDATA[Rigby]]></surname>
<given-names><![CDATA[P]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Shared waypoints and social tagging to support collaboration in software development]]></article-title>
<source><![CDATA[]]></source>
<year></year>
<conf-name><![CDATA[ CSCW &#8217;06: Proceedings of the 2006 20th anniversary conference on Computer Supported Cooperative Work]]></conf-name>
<conf-date>2006</conf-date>
<conf-loc>New York NY</conf-loc>
</nlm-citation>
</ref>
<ref id="B15">
<label>15</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Sridhara]]></surname>
<given-names><![CDATA[G]]></given-names>
</name>
<name>
<surname><![CDATA[Hill]]></surname>
<given-names><![CDATA[E]]></given-names>
</name>
<name>
<surname><![CDATA[Muppaneni]]></surname>
<given-names><![CDATA[D]]></given-names>
</name>
<name>
<surname><![CDATA[Pollock]]></surname>
<given-names><![CDATA[L]]></given-names>
</name>
<name>
<surname><![CDATA[Vijay-Shanker]]></surname>
<given-names><![CDATA[K]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Towards automatically generating summary comments for Java methods]]></article-title>
<source><![CDATA[]]></source>
<year></year>
<conf-name><![CDATA[ ASE &#8217;10: Proceedings of the IEEE/ACM International Conference on Automated Software Engineering]]></conf-name>
<conf-date>2010</conf-date>
<conf-loc>New York NY</conf-loc>
</nlm-citation>
</ref>
</ref-list>
</back>
</article>
