
NOVEMBER/DECEMBER 2007 (Vol. 24, No. 6) Web Extra
0740-7459/07/$25.00 © 2007 IEEE
Published by the IEEE Computer Society

Voice of Evidence
Method: How We Selected and Analyzed the Studies

This material supplements the Nov./Dec. Voice of Evidence column, "Are Two Heads Better than One? On the Effectiveness of Pair Programming."

Tore Dybå, Simula Research Laboratory and SINTEF ICT
Erik Arisholm, Dag I.K. Sjøberg, and Jo E. Hannay, Simula Research Laboratory and the University of Oslo
Forrest Shull, Fraunhofer Center for Experimental Software Engineering, Maryland

We followed general procedures for performing systematic reviews,1 which are based largely on standard meta-analytic techniques.

Inclusion and exclusion criteria

We examined all published English-language studies of pair programming (PP) in which a comparison was made

  • between isolated pairs and individuals or
  • in a team context.

We excluded studies that didn’t compare PP to an alternative approach.

Data sources and search strategy

We searched the ACM Digital Library, Compendex, IEEE Xplore, and the ISI (Institute for Scientific Information) Web of Science with the following basic search string: “pair programming” OR “collaborative programming”. In addition, we hand-searched all volumes of these thematic conference proceedings (a sketch of how the search string screens records follows the list):

  • XP (the International Conference on Agile Processes in Software Engineering and Extreme Programming),
  • XP Agile Universe, and
  • the Agile Development Conference.
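
To make the screening criterion concrete, here is a minimal Python sketch of how such a search string could be applied to a record’s title, abstract, and keywords. The record structure, function name, and sample data are illustrative assumptions; the actual searches were run through each database’s own query interface.

    def matches_search_string(record):
        """Check a record's title, abstract, and keywords against the
        basic search string: "pair programming" OR "collaborative programming".
        (Hypothetical record format; not the databases' actual API.)"""
        text = " ".join([
            record.get("title", ""),
            record.get("abstract", ""),
            " ".join(record.get("keywords", [])),
        ]).lower()
        return "pair programming" in text or "collaborative programming" in text

    candidates = [
        {"title": "The Case for Collaborative Programming"},
        {"title": "A Survey of Code Inspection Practices"},
    ]
    print([c["title"] for c in candidates if matches_search_string(c)])
    # -> ['The Case for Collaborative Programming']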

Study identification and selection

The identification and selection process consisted of three major stages. First, we applied the search terms to the titles, abstracts, and keywords of the articles in the electronic databases and conference proceedings. We didn’t search editorials, prefaces, article summaries, interviews, news items, correspondence, discussions, comments, readers’ letters, or summaries of tutorials, workshops, panels, and poster sessions. We found 214 unique citations.

Next, Jo Hannay and Tore Dybå evaluated the titles and abstracts of those 214 studies for relevance to the review. If they couldn’t tell from the title, abstract, and keywords whether a study conformed to our inclusion criteria, they performed a detailed review. During this second stage, we included all studies that indicated some comparison of PP with an alternative. This resulted in 52 citations that were passed on to the next stage.

Finally, Hannay and Dybå retrieved and reviewed the full text of all 52 citations. We included all studies that compared PP with an alternative in isolation or in a team context. This left 19 articles, four of which we later excluded because they didn’t report enough information to compute standardized effect sizes. So, 15 studies (all experiments) met our inclusion criteria and were included in the review (see table A).
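
The staged selection can be pictured as a simple funnel. The sketch below is purely illustrative: the real screening decisions were made by the reviewers reading titles, abstracts, and full texts, and the field names and sample data here are hypothetical.

    def screen(citations, stages):
        """Apply successive inclusion predicates and report the counts."""
        remaining = list(citations)
        for stage_name, keep in stages:
            remaining = [c for c in remaining if keep(c)]
            print(f"after {stage_name}: {len(remaining)} citations")
        return remaining

    stages = [
        ("title/abstract screening", lambda c: c["indicates_pp_comparison"]),
        ("full-text review", lambda c: c["compares_pp_with_alternative"]),
        ("effect-size data check", lambda c: c["reports_effect_size_data"]),
    ]

    citations = [
        {"indicates_pp_comparison": True,
         "compares_pp_with_alternative": True,
         "reports_effect_size_data": True},
        {"indicates_pp_comparison": True,
         "compares_pp_with_alternative": False,
         "reports_effect_size_data": False},
    ]
    screen(citations, stages)  # in the actual review: 214 -> 52 -> 19 -> 15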

Data extraction and checking

In a spreadsheet, we collected information from the 15 articles, including the

  • study type and duration;
  • type of treatment, system, and tasks;
  • number of groups and the group assignment;
  • type of subjects and their experience with PP;
  • number of pairs and of individuals; and
  • outcome variable, means, standard deviations, counts, percentages, and p-values.

We read each article thoroughly, and three authors (Hannay, Dybå, and Erik Arisholm) extracted and cross-checked the data. Discrepancies were resolved through discussion among all the authors.
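
For readers who want to replicate the extraction, a record along the lines of the following Python dataclass captures the spreadsheet fields listed above. The schema and field names are our own hypothetical rendering, not the original spreadsheet’s.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class ExtractedStudy:
        """One row of the extraction spreadsheet (hypothetical schema)."""
        code: str                  # e.g., "P98" or "S06a"
        study_type: str            # all included studies were experiments
        duration: str              # e.g., "45 minutes", "six weeks"
        treatment: str             # PP vs. the alternative compared against
        system_and_tasks: str
        num_groups: int
        group_assignment: str      # e.g., "random"
        subject_type: str          # "students" or "professionals"
        pp_experience: str
        num_pairs: int
        num_individuals: int
        outcome_variable: str
        mean_pp: Optional[float] = None
        mean_alt: Optional[float] = None
        sd_pp: Optional[float] = None
        sd_alt: Optional[float] = None
        p_value: Optional[float] = None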

Statistical analysis

We used the Comprehensive Meta-Analysis software, version 2, to calculate effect-size estimates in terms of Hedges’ g2 for all comparisons in all articles that reported enough descriptive statistics or raw data for such calculations. We summarized the results using fixed-effects meta-analysis and forest plots.
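
As a rough guide to the calculation, the sketch below computes Hedges’ g (the bias-corrected standardized mean difference) and an inverse-variance-weighted fixed-effects mean, following the standard formulas in Lipsey and Wilson.2 It is a minimal illustration with made-up numbers; the actual analysis was done with the Comprehensive Meta-Analysis package, not this code.

    import math

    def hedges_g(m1, m2, sd1, sd2, n1, n2):
        """Hedges' g: standardized mean difference with the
        small-sample bias correction."""
        s_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2)
                             / (n1 + n2 - 2))
        d = (m1 - m2) / s_pooled            # Cohen's d
        j = 1 - 3 / (4 * (n1 + n2) - 9)     # small-sample correction factor
        return j * d

    def fixed_effects_mean(gs, n1s, n2s):
        """Inverse-variance-weighted mean effect (fixed-effects model)."""
        num = den = 0.0
        for g, n1, n2 in zip(gs, n1s, n2s):
            v = (n1 + n2) / (n1 * n2) + g**2 / (2 * (n1 + n2))  # var of g
            num += g / v
            den += 1 / v
        return num / den

    # Illustrative numbers only (not data from the review):
    g = hedges_g(m1=4.5, m2=5.2, sd1=1.1, sd2=1.3, n1=28, n2=26)
    print(round(g, 3))
    print(round(fixed_effects_mean([g, 0.4], [28, 14], [26, 13]), 3))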

Table A. The 15 Studies Included in the Meta-Analysis.

Reference code | Study | Subjects | Total no. of subjects | No. of pairs | No. of individual programmers | Study setting
P98 | John Nosek3 | Professionals working individually | 15 | 5 | 5 | Had 45 minutes to solve one programming task (a script for checking database consistency)
S00 | Laurie Williams et al.4 | Students working individually | 41 | 14 | 13 | A six-week course where students had to deliver four programming assignments
S01 | Jerzy Nawrocki and Adam Wojciechowski5 | Students working individually | 15 | 5 | 5 | Four lab sessions over a winter semester, as part of a university course; wrote four C/C++ programs ranging from 150–400 LOC
S02 | Prashant Baheti, Edward Gehringer, and David Stotts6 | Students working in teams | 98 | 16 | 9 | Five weeks to complete a curricular OO programming project; teams had distinct projects
P02 | Matevz Rostaher and Marjan Hericko7 | Professionals working individually | 16 | 6 | 4 | Six small user stories filling one day
S03 | Sven Heiberg et al.8 | Students working in teams | 84, 66* | 23, 16* | 19, 17* | Four sessions over four weeks, involving two programming tasks to implement a component for a large gaming system
S05a | Gerardo Canfora, Aniello Cimitile, and Corrado Aaron Visaggio9 | Students working individually | 24 | 12 | 24 | Two applications each with two tasks (run 1 and run 2)
S05b | Matthias Müller10 | Students working individually | 38 | 19 | 23 | Two runs of one programming session each on two initial programming tasks (polynomial and shuffle-puzzle) producing about 150 LOC
S05c | Jari Vanhanen and Casper Lassenius11 | Students working in teams | 16 | 2 | 2 | A nine-week student project in which each subject spent 100 hours (400 hours per team of four); 1,500–4,000 LOC were written
S06a | Lech Madeyski12 | Students working individually | 188 | 28 | 31, 35* | Eight laboratory sessions involving one initial programming task in a finance accounting system (27 user stories)
S06b | Müller13 | Students working individually | 18, 16* | 4, 5* | 6 | One session involving initial design and programming tasks for an elevator system
S06c | Monvorath Phongpaibul and Barry Boehm14 | Students working in teams | 95 | 7 | 7 | Had 12 weeks to complete four phases of development and inspection
S06d | Shaochun Xu and Vaclav Rajlich15 | Students working individually | 12 | 4 | 4 | Two sessions with pairs and one session with individuals; one initial programming task produced 200–300 LOC
P07b | Canfora et al.16 | Professionals working individually | 18 | 5, 4* | 8, 10* | Study session and two runs (totaling 390 minutes) involving four maintenance tasks (grouped in two assignments) to modify design documents (use case and class diagrams)
P07a | Erik Arisholm et al.17 | Professionals working individually | 295 | 98 | 99 | 10 experimental sessions with individuals over three months; 17 sessions with pairs over five months (each session lasted one day and included different subjects); modified two systems of 200–300 Java LOC each

*Multiple tests were included from this source, with different numbers of subjects.


References

  1. B.A. Kitchenham, Procedures for Performing Systematic Reviews, joint tech. report TR/SE-0401, Computer Science Dept., Keele Univ., and 0400011T.1, Nat’l ICT Australia, 2004.
  2. M.W. Lipsey and D.B. Wilson, Practical Meta-Analysis, Sage Publications, 2000.
  3. J.T. Nosek, “The Case for Collaborative Programming,” Comm. ACM, vol. 41, no. 3, 1998, pp. 105–108.
  4. L. Williams, R.R. Kessler, W. Cunningham, and R. Jeffries, “Strengthening the Case for Pair Programming,” IEEE Software, vol. 17, no. 4, 2000, pp. 19–25.
  5. J. Nawrocki and A. Wojciechowski, “Experimental Evaluation of Pair Programming,” Proc. European Software Control and Metrics Conf. (ESCOM 01), 2001, pp. 269–276.
  6. P. Baheti, E. Gehringer, and D. Stotts, “Exploring the Efficacy of Distributed Pair Programming,” Extreme Programming and Agile Methods—XP/Agile Universe 2002, LNCS 2418, Springer, 2002, pp. 208–220.
  7. M. Rostaher and M. Hericko, “Tracking Test First Pair Programming—An Experiment,” Proc. XP/Agile Universe 2002, LNCS 2418, Springer, 2002, pp. 174–184.
  8. S. Heiberg, U. Puus, P. Salumaa, and A. Seeba, “Pair-Programming Effect on Developers Productivity,” Extreme Programming and Agile Processes in Software Eng.—Proc. 4th Int’l Conf. XP 2003, LNCS 2675, Springer, 2003, pp. 215–224.
  9. G. Canfora, A. Cimitile, and C.A. Visaggio, “Empirical Study on the Productivity of the Pair Programming,” Extreme Programming and Agile Processes in Software Eng.—Proc. 6th Int’l Conf. XP 2005, LNCS 3556, Springer, 2005, pp. 92–99.
  10. M.M. Müller, “Two Controlled Experiments Concerning the Comparison of Pair Programming to Peer Review,” J. Systems and Software, vol. 78, no. 2, 2005, pp. 169–179.
  11. J. Vanhanen and C. Lassenius, “Effects of Pair Programming at the Development Team Level: An Experiment,” Proc. Int’l Symp. Empirical Software Eng. (ISESE 05), IEEE CS Press, 2005, pp. 336–345.
  12. L. Madeyski, “The Impact of Pair Programming and Test-Driven Development on Package Dependencies in Object-Oriented Design—An Experiment,” Product-Focused Software Process Improvement—Proc. 7th Int’l Conf. (Profes 06), LNCS 4034, Springer, 2006, pp. 278–289.
  13. M.M. Müller, “A Preliminary Study on the Impact of a Pair Design Phase on Pair Programming and Solo Programming,” Information and Software Technology, vol. 48, no. 5, 2006, pp. 335–344.
  14. M. Phongpaibul and B. Boehm, “An Empirical Comparison between Pair Development and Software Inspection in Thailand,” Proc. Int’l Symp. Empirical Software Eng. (ISESE 06), ACM Press, 2006, pp. 85–94.
  15. S. Xu and V. Rajlich, “Empirical Validation of Test-Driven Pair Programming in Game Development,” Proc. Int’l Conf. Computer and Information Science and Int’l Workshop Component-Based Software Eng., Software Architecture and Reuse (ICIS-COMSAR 06), IEEE CS Press, 2006, pp. 500–505.
  16. G. Canfora, A. Cimitile, F. Garcia, M. Piattini, and C.A. Visaggio, “Evaluating Performances of Pair Designing in Industry,” J. Systems and Software, vol. 80, no. 8, 2007, pp. 1317–1327.
  17. E. Arisholm, H. Gallis, T. Dybå, and D.I.K. Sjøberg, “Evaluating Pair Programming with Respect to System Complexity and Programmer Expertise,” IEEE Trans. Software Eng., vol. 33, no. 2, 2007, pp. 65–86.

Correction: The article as published in the Nov./Dec. 2007 issue of IEEE Software contained an error. On page 12, 3rd column, last paragraph, the text reads: "The one exception is study S06b" and cites reference 9. The text should read: "The one exception is study S06c," and reference 9 should refer to M. Phongpaibul and B. Boehm, “An Empirical Comparison between Pair Development and Software Inspection in Thailand,” Proc. Int’l Symp. Empirical Software Eng. (ISESE 06), IEEE CS Press, 2006, pp. 85–94. We regret the error.

Cite this article:
Tore Dybå, Erik Arisholm, Dag I.K. Sjøberg, Jo E. Hannay, and Forrest Shull, "Method: How We Selected and Analyzed the Studies," IEEE Software, vol. 24, no. 6, Nov./Dec. 2007.