The Role of Deliberate Artificial Design Elements in Software Engineering Experiments
DOI: http://doi.ieeecomputersociety.org/10.1109/TSE.2008.13
IEEE Transactions on Software Engineering, March/April 2008 (vol. 34, no. 2), pp. 242-259
   
Increased realism in software engineering experiments is often promoted as an important means to increase generalizability and industrial relevance. In this context, artificiality, e.g., the use of constructed tasks in place of realistic tasks, is seen as a threat. In this article, we examine the opposite view: that deliberately introduced artificial design elements may increase knowledge gain and enhance both generalizability and relevance. In the first part of the article, we identify and evaluate arguments and examples for and against deliberately introducing artificiality into software engineering experiments. In the second part, we summarize a content analysis of articles reporting software engineering experiments published over the ten-year period 1993-2002. The analysis reveals a striving for realism and external validity, but little awareness of when, and for what purposes, various degrees of artificiality and realism are appropriate. We conclude that increased awareness and deliberation in these respects are essential. However, arguments in favor of artificial design elements should not be used to justify studies that are badly designed or that address research questions of low relevance.

Index Terms:
Software Engineering, Surveys of historical development of one particular area
Citation:
Jo Hannay, Magne Jørgensen, "The Role of Deliberate Artificial Design Elements in Software Engineering Experiments," IEEE Transactions on Software Engineering, vol. 34, no. 2, pp. 242-259, Mar./Apr. 2008, doi:10.1109/TSE.2008.13