Contents lists available at ScienceDirect
Science of Computer Programming
www.elsevier.com/locate/scico
Technical Debt tracking: Current state of practice
A survey and multiple case study in 15 large organizations
Antonio Martini a,b,∗, Terese Besker c, Jan Bosch c
a CA Technologies Strategic Research Team, Barcelona, Spain
b University of Oslo, Programming and Software Engineering, Oslo, Norway
c Computer Science and Engineering, Chalmers University of Technology, Göteborg, Sweden
Article info
Article history:
Received 5 November 2017
Received in revised form 20 March 2018
Accepted 25 March 2018
Available online 29 March 2018
Keywords:
Technical Debt; Change management; Software process improvement; Survey; Multiple case study
Abstract
Large software companies need to support continuous and fast delivery of customer value both in the short and long term. However, this can be hindered if both the evolution and maintenance of existing systems are hampered by Technical Debt. Although a lot of theoretical work on Technical Debt has been produced recently, its practical management lacks empirical studies. In this paper, we investigate the state of practice in several companies to understand what the cost of managing TD is, what tools are used to track TD, and how a tracking process is introduced in practice. We combined two phases: a survey involving 226 respondents from 15 organizations and an in-depth multiple case study in three organizations including 13 interviews and 79 Technical Debt issues. We selected the organizations where Technical Debt was better tracked in order to distill best practices. We found that the development time dedicated to managing Technical Debt is substantial (an average of 25% of the overall development), but mostly not systematic: only a few participants (26%) use a tool, and only 7.2% methodically track Technical Debt. We found that the most used and effective tools are currently backlogs and static analyzers. By studying the approaches in the companies participating in the case study, we report how companies start tracking Technical Debt and what the initial benefits and challenges are. Finally, we propose a Strategic Adoption Model for the introduction of tracking Technical Debt in software organizations.
© 2018 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
1. Introduction
Large software companies need to support continuous and fast delivery of customer value both in the short and long terms. However, this can be hindered if both the evolution and maintenance of the systems are hampered by Technical Debt.
Technical Debt (TD) has been studied recently in the software engineering literature [1–4]. TD is composed of a debt, which is a sub-optimal technical solution that leads to short-term benefits as well as to the future payment of interest, which is the extra cost due to the presence of TD (for example, slow feature development or low quality) [5]. The principal is regarded as the cost of refactoring TD. Although accumulating Technical Debt might prove useful in some cases, in others, the interest might largely surpass the short-term gain, for example, by causing development crises in the long term [6].
* Corresponding author. E-mail addresses: antonio.martini@ifi.uio.no (A. Martini), besker@chalmers.se (T. Besker), jan.bosch@chalmers.se (J. Bosch).
https://doi.org/10.1016/j.scico.2018.03.007
0167-6423/© 2018 The Authors. Published by Elsevier B.V.
In this paper, we therefore aim at addressing the following RQs:
RQ1: How much of the software development time is estimated to be employed in managing TD?
It is also important to understand how a TD tracking process is introduced and implemented in large software companies:
RQ2: To what extent are software practitioners familiar with the term Technical Debt?
RQ3: To what extent are software practitioners aware of the TD present in their system?
RQ4: To what extent do software organizations track TD?
RQ5: Is there a difference between individual and collective management of TD?
RQ6: Does the background of the respondents influence the way in which TD is managed?
RQ7: What tools are used to track TD?
RQ8: How do software organizations introduce a TD tracking process?
RQ9: What are the initial benefits and challenges when large organizations start tracking TD?
To shed light on these questions, we have conducted a survey in 15 organizations with 226 participants, and we have carried out a multiple case study in three companies that have started tracking TD. In this context, we have interviewed 13 practitioners responsible for tracking TD and analyzed 79 TD items from a pool of 597 improvements. Our findings include the following contributions:
1. The cost of managing TD in large software organizations is substantial, and it is estimated to be, on average, 25% of the whole development time.
2. We list the tools that are currently used to track TD, and we provide a first assessment of which ones create less management overhead.
3. We report the state of practice related to the introduction of a TD management process in 15 Scandinavian organizations.
4. We report the lessons learned from three companies that have started tracking Technical Debt: their starting process, the perceived benefits, and the challenges.
5. We propose a Strategic Adoption Model for Tracking Technical Debt (SAMTTD), aimed at helping companies assess their Technical Debt management process and make decisions on its improvement. The model also defines the next research challenges to be addressed in theory and to be evaluated in practice.
This paper adds new and more in-depth results to the findings reported in a previous paper [10]. In particular, we address new research questions (RQ2, RQ3, RQ5, RQ6, RQ7), and we add new insights on the relationship between RQ4 and RQ7 (that is, we study how the practitioners’ perception of tracking Technical Debt is related to their usage of tools).
The remainder of the paper reports our methodology in section 2 and the results in section 3. We then discuss the results in section 4 and conclude in section 5.
2. Methodology
For the execution of this study, we aimed at combining different sources of data (source triangulation) and different methodologies (methodology triangulation) to obtain reliable results [11]. To fulfill these triangulation strategies, we surveyed 226 participants. The different sources included 15 large organizations and different roles, that is, developers, architects, and managers. To complement such quantitative investigation, we followed up with a qualitative, in-depth multiple case study at three of the companies involved in the survey and that have started tracking TD. Here, we conducted interviews with 13 employees, and we analyzed documents including 79 TD issues out of a pool of 597 improvements present at the companies.
2.1. Survey
In this study, we have involved 15 software organizations belonging to eight distinct large software companies. We consider a large software company an organization with more than 250 employees. As shown in the descriptive statistics in Table 2, 91.6% of the respondents reported working for an organization bigger than 250 employees. The remaining 8.6% were consultants from small/medium organizations working on the same systems and projects developed by the large organizations participating in the survey. The latter can, therefore, be considered as working in the same context as the other 91.6% of the participants.
Table 1
Kinds of Technical Debt recognized in [3,9,10].
Survey entry | Source and literature term
Lack or low quality of testing | Test Debt [3]
Low code quality | Source Code Debt [3]
Lack or low quality of requirement | Requirement Debt [3]
Lack or low quality of documentation | Documentation Debt [3]
Dependency violations | Architecture Debt [3,12]
Complex architectural design | Architecture Debt [3,12]
Too many different patterns and policies | Architecture Debt [3,12]
Dependencies on external resources/software | Architecture Debt [3,12]
Lack of reusability in design | Architecture Debt [3,12]
Uneasy/Tensed social interactions between different stakeholders | Social Debt [3,13]
Lack of adequate environment and infrastructure during development | Infrastructure Debt [3]
Seven out of eight companies developed embedded software, while another one developed software for optimization (company D). The companies are anonymized and named A–H, and the sub-organizations are called B1, B2, F1–F4, and G1–G4.
2.1.1. Survey data collection
In the first part of the survey, we asked about the participants’ background information:
• Software development experience: “<2 years,” “2–5 years,” “5–10 years,” “>10 years”
• Role: “Product Manager,” “Project Manager,” “Software Architect,” “Developer,” “Tester,” “Expert,” “Other (Specify)”
• Gender
• Education
• Team size
• Organization size
• Size of their current project in MLOC (Millions of Lines of Code)
In the second part of the survey, we asked for and analyzed the data related to the effort caused by several Technical Debt challenges. To make sure that the respondents did not misinterpret the question, the challenges were listed as reported in current literature and not as generic “Technical Debt.” Table 1 reports the different kinds of TD together with their scientific names and the related academic source. This improved the construct validity of our survey, as we reduced the subjectivity of the respondents interpreting “Technical Debt.”
It is important to notice that the details and the results from the questions in the second part of the survey are not included in this paper because the data has been used to cover a different scope and to answer different questions related to Technical Debt in another work [14]. Therefore, the only questions overlapping between the papers are the ones related to the background of the respondents.
In the third part of the survey, we asked the following questions, some of which can be mapped directly to the RQs. Some of the following items are statements rather than questions. In those cases, we asked the participants how much they agreed with the proposition.
Q1. “How much of the overall development effort is usually spent on TD management activities?”
Q2. “How familiar are you with the term ‘Technical Debt’?”
Q3. “I am aware of how much Technical Debt we have in our system.”
Q4. “All team members are aware of the level of Technical Debt in our system.”
Q5. “I track (using tools, documentation, etc.) Technical Debt in our system.”
Q6. “All team members participate in tracking Technical Debt in our system.”
Q7. “I have access to the output of the tracking of the Technical Debt in our system.”
Q8. “All team members have access to the output of Technical Debt in our system.”
Q9. “If you track Technical Debt in your project, what kind of tool(s) do you use?”
The formulation of Q1 was slightly different, as we did not mention “TD,” but we referred to the challenges mentioned in the second part of the survey (see Table 1). However, we use the formulation in Q1 in the rest of the paper for the sake of readability.
After question Q2, the survey included the following definition of Technical Debt:
“[…] up a technical context that can make a future change more costly or impossible. Technical debt is a contingent liability whose impact is limited to internal system qualities, primarily maintainability and evolvability.”
In our definition, we omitted the second part of the Dagstuhl definition. However, by enumerating the different kinds of TD in the second part of the survey (excluding external qualities from the questionnaire), we can be sure that the second part of the Dagstuhl definition was also covered, although not explicitly mentioned.
This gave the participants a good means to understand what Technical Debt meant when we asked about its management. However, we cannot guarantee that the practitioners read and understood the definition.
For question Q1, since we wanted to quantify the amount of effort related to TD faced by the companies, we provided a scale including the following options: “<10%,” “10–20%” . . . “80–90%,” and “I don’t know.” This question was aimed directly at answering RQ1.
For Q2, we provided the answers “Not at all familiar,” “Slightly familiar,” “Moderately familiar,” “Very Familiar,” and “Extremely Familiar.” The answers were mapped on a 5-grade Likert scale, respectively 0–4. This question aimed directly at answering RQ2.
For Q5–Q8, we asked the respondents to report their agreement on a 6-grade Likert scale: “strongly disagree,” “disagree,” “somewhat disagree,” and the symmetric scale for agreement. These statements were aimed at answering RQ3, RQ4, and RQ5. In particular, we wanted to understand if tracking Technical Debt was an individual activity (by asking the same questions for the individual and about the whole team) and if there was a discrepancy between the awareness of the practitioners and their tracking process.
As for question Q9, we asked the participants to report the tools used in a qualitative way (text-box). The input was then post-processed and compiled in the resulting word cloud. This question was used, together with the previous ones, to answer RQ7.
2.1.2. Survey data analysis
First, we analyzed the answers from Q1 to understand the magnitude of the estimated effort spent by the respondents on managing TD. We transformed the answers from categorical to numerical: for example, we parsed “<10%” to 5, “10–20%” to 15, and so on. After the calculations, we can reapply the tolerance interval of +5/−5, and the various means and so forth would not change. When calculating the means, we did not consider the “I don’t know” answers. However, only a small portion of the answers was of this kind (11.5%).
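The midpoint transformation described above can be illustrated with a short sketch. This is not the authors' script (the paper only states that the tool R was used for the statistical tests); the function name `midpoint`, the exact answer strings, and the sample answers are assumptions made for illustration.

```python
from statistics import mean, median

# Assumed literal text of the "don't know" option; adapt to the real export.
DONT_KNOW = "I don't know"


def midpoint(answer):
    """Map a categorical effort answer to the midpoint of its interval.

    '<10%' becomes 5.0, '10–20%' becomes 15.0, and so on.
    The "I don't know" option is returned as None so it can be excluded.
    """
    if answer == DONT_KNOW:
        return None
    text = answer.replace("%", "").replace("–", "-").strip()
    if text.startswith("<"):
        low, high = 0.0, float(text[1:])
    else:
        low_text, high_text = text.split("-")
        low, high = float(low_text), float(high_text)
    return (low + high) / 2


# Hypothetical answers, not data from the survey.
answers = ["<10%", "10–20%", "20–30%", DONT_KNOW, "30–40%"]

# "I don't know" answers are dropped before computing the statistics.
values = [m for m in map(midpoint, answers) if m is not None]
average_effort = mean(values)
median_effort = median(values)
```

Because every answer is replaced by the center of a 10-point interval, the resulting mean carries the same +5/−5 tolerance mentioned in the text.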
To avoid the bias introduced by different roles answering the questionnaire, we ran a cross-tabulation chi-square test of independence to understand whether the role of the participants affected the answers.
The second step was to apply frequency analysis on questions Q5–Q8. To do so, we transformed the categorical data to a Likert scale (1–6), where “strongly disagree” was mapped to 1 and “strongly agree” to 6. As for Q5, we also reported the grouped answers in three main intervals, “No tracking” {1–2}, “Somewhat tracking” {3–4}, and “Tracking” {5–6}. We used these aggregated intervals only for the last results related to the adoption model SAMTTD.
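A minimal sketch of this mapping and grouping follows. The three agreement labels are inferred from the "symmetric scale" mentioned in the survey description, and the sample responses are hypothetical; only the 1–6 coding and the three intervals come from the text.

```python
from collections import Counter

# 6-grade Likert coding: "strongly disagree" = 1 ... "strongly agree" = 6.
# The agreement labels are assumed to mirror the disagreement labels.
LIKERT = {
    "strongly disagree": 1,
    "disagree": 2,
    "somewhat disagree": 3,
    "somewhat agree": 4,
    "agree": 5,
    "strongly agree": 6,
}


def tracking_group(score):
    """Aggregate a 1-6 score for Q5 into the three intervals used for SAMTTD."""
    if score <= 2:
        return "No tracking"
    if score <= 4:
        return "Somewhat tracking"
    return "Tracking"


# Hypothetical answers to Q5, not data from the survey.
responses = ["disagree", "strongly disagree", "somewhat agree", "agree", "disagree"]

scores = [LIKERT[r] for r in responses]
frequencies = Counter(scores)                        # frequency analysis per grade
groups = Counter(tracking_group(s) for s in scores)  # aggregated intervals
```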
For some of the results, we used a standard boxplot. The boxplot is a comprehensive way to visualize various descriptive statistics at a glance. We used this method when we aimed at showing differences in the distribution of the data with respect to two specific variables. For example, Fig. 4 shows the comparison, with respect to different companies, of the distribution of the management effort: we can compare the medians (the black lines in the middle), but we can also see different percentiles (where most of the answers were concentrated) and outliers.
In other cases, we compared the different variables using statistical tests. For example, it seemed interesting to compare how much the respondents were aware of TD with respect to how much they were tracking it. To do so, we performed a number of tests for linear correlation using the tool R. Most of the numerical variables did not have a strong linear correlation with each other, except the answers for Q5, Q6, Q7, and Q8. This is not surprising because, if TD is not tracked by an individual, it is probably not tracked by the team, and the output will not be visible to the individual or to the others as well. The Pearson tests for linear correlation gave results from 0.72 up to 0.89 with p-value vastly lower than 0.05. This can be considered a good test for the reliability of the answers. Since these variables all strongly correlate, in the remainder of the paper, when studying different variables, we will use only the “tracking” variable without considering whether the output was available or not.
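For readers who want to reproduce this kind of check, the Pearson coefficient between two Likert-coded variables can be computed as sketched below. The study itself used R; this plain-Python version only computes the coefficient (not its p-value), and the two answer vectors are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson linear correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("samples must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return covariance / (spread_x * spread_y)


# Hypothetical Likert-coded answers (1-6) to Q5 and Q6 from six respondents.
q5_individual_tracking = [1, 2, 2, 3, 5, 6]
q6_team_tracking = [1, 1, 2, 3, 4, 6]

r = pearson_r(q5_individual_tracking, q6_team_tracking)
```

A coefficient close to 1, as in this toy sample, is the pattern the text reports for Q5–Q8, which is why a single "tracking" variable is used afterwards.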
We also wanted to understand whether the results depended on a specific variable. For example, we tested whether developers answered differently from architects or managers. Thus, to answer RQ6, we ran several chi-squared tests of independence between the background variables of the participants and their answers related to questions Q5–Q8. For example, we wanted to know if the familiarity, awareness, tracking, and so on of the respondents would depend on their background, such as their affiliation with a company, their education, and so on.
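The chi-squared test of independence used here compares observed counts in a cross-tabulation with the counts expected if the two variables were unrelated. The sketch below computes the test statistic and degrees of freedom and compares the statistic with standard 5% critical values; the contingency table (roles by tracking group) is hypothetical, and a statistics package would normally be used to obtain the exact p-value.

```python
# Standard critical values of the chi-squared distribution at alpha = 0.05.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom) for a contingency table.

    `table` is a list of rows of observed counts. Every row and column is
    assumed to contain at least one observation (no zero expected counts).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical cross-tabulation: rows are roles (developer, architect,
# manager); columns are "No tracking", "Somewhat tracking", "Tracking".
observed = [
    [30, 12, 5],
    [14, 7, 3],
    [11, 5, 2],
]

statistic, dof = chi_square_independence(observed)
# Independence is rejected at the 5% level only above the critical value.
reject_independence = statistic > CRITICAL_05[dof]
```

When the statistic stays below the critical value, the null hypothesis of independence cannot be rejected, which is the outcome the paper reports for the background variables.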
We finally analyzed the qualitative answers from Q9 to understand better the results answering RQ7. We selected the answers in which the respondents reported that tools were explicitly used (61/226, 27% of the respondents), and we compared the respective levels of awareness, tracking, and familiarity. This was done to understand better what the respondents meant by “tracking.” We also created a word-cloud representation of the qualitative answers for Q9. This, we found, could represent quite well which tools were the most used and in what way. To do so, we processed the qualitative data, removing terms that would appear in the word cloud but would not make sense from the tool point of view, for example, “code” and “Technical Debt.” Finally, from the coding of the qualitative answers, we could also identify the frequencies of the tools used. To do so, we manually coded the 61 answers in the following seven categories:
• Comments: These are usually “TODO” comments, left by the developers in the code or other artifacts. These are useful for the developers to know that something is left to do, but it does not imply a systematic monitoring of the TD reported in the comments.
• Documentation: From the qualitative answers, this represents a text or spreadsheet where issues are listed and explained in a semi-systematic format. Another example could be a wiki. However, such documentation is different from a backlog as it is more difficult to monitor, and it does not use a specific technology to manage and perform operations on the backlog.
• Issues: Using the same ticket system for bug fixing, but usually down-prioritizing the issues related to Technical Debt.
• Backlog: This is either a dedicated backlog for TD issues or the usual feature backlog where TD items are mixed with features. This practice usually involves a technology such as project management tools.
• Static analyzer: These are tools such as SonarQube, SonarGraph, Klockwork, and so on used to analyze the source code in search for Technical Debt. In a few cases, respondents report that they built their own metrics tools. These tools usually check (language-specific) rules or patterns that can warn the developers of the presence of TD. These tools are used as trackers by the developers, with the limitation that they cover only part of the TD.
• Lint: These are also static analyzers but are used more to find potential bugs and security issues rather than Technical Debt.
• Test coverage: Some of the respondents measure test coverage, and they consider a low test coverage as presence of test debt.
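The word-cloud preprocessing described before the category list (counting terms in the free-text answers after removing terms that are not tools) might look like the following sketch. The answers, the function name `term_frequencies`, and most of the noise terms are assumptions for illustration; only “code” and “Technical Debt” are named in the text as removed terms, and the assignment of answers to categories was done manually in the study.

```python
import re
from collections import Counter

# Terms removed before building the word cloud. "code", "technical", and
# "debt" follow the text; the remaining filler words are assumed.
NOISE_TERMS = {
    "code", "technical", "debt",
    "a", "and", "in", "the", "with", "items", "checked",
}


def term_frequencies(answers, noise=NOISE_TERMS):
    """Count lower-cased words across free-text answers, skipping noise terms."""
    counts = Counter()
    for answer in answers:
        words = re.findall(r"[a-z][a-z0-9]*", answer.lower())
        counts.update(word for word in words if word not in noise)
    return counts


# Hypothetical free-text answers to Q9, not data from the survey.
q9_answers = [
    "Backlog in Excel",
    "SonarQube and a backlog",
    "Technical Debt items in the backlog, code checked with Klockwork",
]

frequencies = term_frequencies(q9_answers)
most_used = frequencies.most_common(1)[0][0]
```

The resulting counts are what a word-cloud renderer would scale font sizes by.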
2.2. Multiple case study
To understand better to what extent companies tracked TD (RQ4) and how the tracking process was introduced (RQ8–9), we conducted a multiple case study, investigating some of the companies involved in the survey. We interviewed 13 employees from cases B1 (project manager, system architect, and two developers), F1 (three software architects responsible for TD management in three different teams), and F4 (two system architects, two project managers, and two developers). In particular, to understand what was considered “good tracking,” we had the opportunity to interview the participants, belonging to company F1, who answered “strongly agree” (the highest level of tracking) to question Q5. This gave us an idea of what was considered as current best practices for tracking TD. To support the interviews, we also analyzed 79 out of 597 TD backlog items used for tracking improvements (and thus including TD items) in companies B1, F1, and F4.
2.2.1. Interviews
2.2.1.1. Data collection. The interview questions were designed to cover taxonomies we found in the pre-study concerning the reason for initiation, the activities within the TD management process, and the process implementation. All interviews were audio-recorded, and the results of the interviews were organized by different questions and activities for later analysis.
We formulated the interview questions in three sections.
• The first section contains questions about the profile of the interviewees and their companies.
• The second section focused the questions on the initialization of the process for managing TD. “What was the main reason for implementing a TD management process?” “Who decided that the process should be implemented?” “What negative effects did you experience in your system due to TD?” (RQ8).
• In the third section, we asked about the outcome of the implemented process (RQ4) and how the companies experienced the implementation of the process in terms of the most obvious benefits and challenges (RQ9).
2.2.1.2. Data analysis. The data analysis used an inductive approach based on open coding [17]. We were looking for activities related to the introduction of a TD management process in the company. For this purpose, we followed the points in [18], which is a well-known study on change management in software engineering. The data were coded using a Qualitative Data Analysis (QDA) software tool called Atlas.ti. Such a tool supports keeping track of the links between taxonomies, codes, and quotations. Based on the taxonomies, we developed a coding scheme that contains a corresponding set of codes and sub-codes. Fig. 1 shows an example of our code hierarchy and how the codes were mapped to the taxonomy. The graph is part of the overall data collection model (not completely displayed here for space limitations).
Fig. 1. The coding process.
Fig. 2. Number of participants per organization.
As an example of how the coding was conducted, we present a quotation from one of the interviewees which was mapped to the Motivation sub-code. “We realized that for each and every release it took much time correcting or fixing problems with additional patches and it took more and more time adding new features on top of the system.”
2.2.2. Document analysis
To gain more evidence on how the companies were tracking TD (RQ4), we investigated the existing documentation. Also, we had access to the TD backlogs of the studied teams: 26 items in the organization B1, 451 items in F1, and 20 items in F4. We analyzed the TD items’ fields, values, and how they were ranked. We did not analyze all items in company F1, as 451 items also included improvements that were not TD. We randomly selected 30 items that corresponded to the definition of TD; we analyzed them, and then we tested our assumptions by randomly looking at other items in the backlog. We used the backlogs in the interviews (see previous section) to ask follow-up questions of the participants. Also, we analyzed the documentation that was created by the organizations to explain TD to the users of the tracking process.
3. Results
3.1. Demographics and background of the respondents
In total, we obtained 226 complete answers. The total respondents were 259, which gives us a completion rate of 87%. We aimed at having a similar number of respondents from each organization (Fig. 2). The participants were almost all experienced practitioners, since 156 respondents (69%) had more than 10 years of experience, while only 8 (3.5%) had less than two years of experience (the remaining 62, 27.5%, had between two and 10 years of experience). Several roles participated in the survey: 37 managers (16%), 52 software architects (23%), 105 developers (46%), seven testers (2.65%), 14 experts (5.75%), and nine system engineers (4%) completed the survey.
As shown in Table 2, we can infer the following characteristics of the studied sample:
• Experience: Most of the respondents had more than two years of experience, while 69% of them had more than 10 years of experience. The estimations can, therefore, be considered reasonably reliable, as they are made by expert practitioners used to estimating their work (more discussion in the threats to validity section).
• Education: Most of the respondents had a bachelor’s or master’s degree. The level of education is therefore quite high. However, the sample does not include many practitioners involved in research projects.
• Team size: Although many of the teams are small (1–10 members), the sample includes a substantial number of respondents working in large teams as well.
• Organization size: As mentioned in the analysis made in section 2.1, the organization of the respondents is large. This was chosen by design. We wanted to restrict our results to large organizations. This imposes a limitation on our study: we cannot generalize these results to small organizations.
• Age of the current system: The distribution of the different systems is quite even, as the sample covers almost equally all the different phases of the system. This raises the degree of generalizability of our results, as it assures that our data cover both “young” and “old” systems.
Table 2
Background data related to the respondents, with the percentage, the number of respondents, and the relative distribution.
3.2. Estimation of management cost of TD (RQ1)
First, we report the answers to Q1 from the survey. In Fig. 3, we show the distribution of the respondents with respect to the different levels of estimated effort that were reported. By picking the middle values, as explained in the methodology section (e.g., 10–20% was transformed into 15), we calculated that the average cost of managing the TD was estimated by 215 respondents to be 25.9% with a median of 25% of the whole development time.
From the results, we can see how most of the respondents answered between 0 and 40%, while half of them are between 10 and 30%. However, some respondents report spending more than 40% of their time managing TD.
Looking at the comparison of medians (bold lines) and percentiles among the companies (boxplot in Fig. 4), we cannot see a big difference in how the respondents answered, apart from the slight difference for E, F1, and F3. This means that the amount of time spent managing TD is largely independent of the organization.
A chi-square test of independence, aggregating the intervals over 50% in the same category (the lack of values would have invalidated the chi-square test), yielded a p-value of 0.144, so we could not reject the null hypothesis that the answers are independent of the role of the respondents. This means that the answers did not vary significantly across the roles, contrary to what one might expect, considering different views and experiences of different roles in the organizations.
Fig. 3. Distribution of respondents for Q1: “How much of the overall development effort is usually spent on TD management activities?”
Fig. 4. Comparison of companies with respect to Q1: “How much of the overall development effort is usually spent on TD management activities?”
Fig. 5. Distribution of respondents according to their answers to Q2: “How familiar are you with the term ‘Technical Debt’?”
3.3. Familiarity with the term “Technical Debt” (RQ2)
The respondents seem to be, in total, moderately familiar with the term Technical Debt (Fig. 5). The mean is 2.26, while the median is 2. From the graph below, we can see that there are more respondents who are very familiar with respect to the other ones.
From the comparison among the companies, we can see how they are mostly on the same level: F4 is above all the rest, while the organizations B2 and G4 are not very familiar with the TD concept. However, since we did not have access to the practitioners working in these two organizations, we cannot tell what the cause of this lack of familiarity was. We omit the test of independence, as the results are clearly visible in Fig. 6.
3.4. Awareness of Technical Debt present in the system (RQ3 and RQ5)
When assessing the level of awareness of the TD present in their system, the respondents, on average, somewhat agree that they are aware of how much TD they have in their system (mean = 3.69, median = 4). Almost half of them (45%) somewhat agree, while only 21% feel more confident (they agree or strongly agree) and the remaining 32% disagree or somewhat disagree. Only 3% of the respondents were not aware of TD.
On the other hand, the practitioners seemed less convinced that the whole team would be aware of how much TD is present in the system. Here, the mean is 2.8, while the median is 3, both close to a mild disagreement. The comparison of the answers is reported in Fig. 7. The chi-square test of independence confirmed that the distributions are not dependent, with a p-value < 2.2e–16.
Fig. 6. Level of familiarity with the term Technical Debt for each organization (answering Q2: “How familiar are you with the term ‘Technical Debt’?”).
Fig. 7. Comparison of answers for Q3: “I am aware of how much Technical Debt we have in our system” (Individual Awareness) and Q4: “All team members are aware of the level of Technical Debt in our system” (Team Awareness).
Fig. 8. Distribution of answers with respect to Q3: “I am aware of how much Technical Debt we have in our system.” 1–6 correspond to “strongly disagree” to “strongly agree.”
As for the different companies, they are quite aligned with each other on awareness. Once again, B2 seems to have a somewhat lower level of awareness. The results suggest that belonging to one or the other organization would not have an impact on the level of awareness of the employees (Fig. 8).
3.5. Tracking Technical Debt (RQ4)
In this section, we report the results from Q5: “I track (using tools, documentation, etc.) Technical Debt in our system.” The average tracking level, reported by 219 respondents, is 2.3 with a median of 2. On the team level, it seemed to be just slightly worse, as shown in Fig. 9 and discussed below.
Based on the results of a chi-square test of independence between the role and the tracking level, we could not reject the null hypothesis of independence (p-value 0.63), so we found no evidence that the role of the respondents influenced their answer with respect to tracking. In Fig. 10, we show the comparison among different companies. We can see how the different companies answered similarly, apart from company F4 and partly company D. However, the test for independence did not show any significant relationship between the variable company and the answer given in the survey with respect to Q5 (tracking TD).
Finally, there is very little difference between tracking on an individual (Q5) or team level (Q6). Only some of the individuals track TD more than the rest of their team. This is strongly confirmed by a Wilcoxon test, which rejected the null hypothesis (p-value = 2.008e–05) that the difference in the two paired distributions is given by chance. In other words, the same participant answered very similarly when asked Q5 and Q6, and this is not because of randomness, which means that if someone in the team tracks TD, it is very probable that the whole team is involved in the tracking.
Finally, as observed in the methodology section, the results from Q7 and Q8 (related to who in the team has access to the outcome of TD tracking) very strongly correlated with the answers to Q5, so we do not report the exact results here. In other words, this means that the respondents who track TD also have access to its output (e.g., backlogs, dashboards, etc.).
Fig. 9. Distribution of answers related to Q5: “I track (using tools, documentation, etc.) Technical Debt in our system” and Q6: “All team members participate in tracking Technical Debt in our system.”
Fig. 10. Distribution of answers for Q5: “I track (using tools, documentation, etc.) Technical Debt in our system.”
3.6. Influence of the background of respondents on the management of TD (RQ6)
We have partly answered RQ6 (“Does the background of the respondents influence the way in which TD is managed?”) in the previous sections, especially with respect to the variables roles and organizations. However, we had several other variables in the background section, and we investigated whether any of those variables would help in understanding what causes a more or less mature TD tracking. To answer this question, we ran several statistical chi-squared tests of independence between the background variables (education, team size, etc.) and the variables of interest (familiarity, awareness, and tracking of TD). However, none of the statistical tests yielded a significant answer. Technically, for none of these tests could we reject the null hypothesis that the answers are independent of the background of the respondents. Since the results would include several combinations of p-values that would not add any meaning to the manuscript, we decided to omit such a table.
In conclusion, the management of TD depends on some factors that have not been captured by the surveyed background variables. However, in the next sections, we provide some answers that could not be found in the quantitative data but seem to be related to the historical and social context where the participants work. More information is given in sections 3.8 and 3.9.
3.7. Tools used to track Technical Debt (RQ7)
In this section, we analyzed whether the respondents who used some tools to track TD were also more aware of TD or tracked it more than the others. To do so, we considered only the answers from the 61 participants (27%) who answered the qualitative question Q9 (specifying what tools they used). We also report the boxplot for the questions Q5 and Q3: we compared the answers of the participants who used a tool with the ones who did not. Fig. 11 illustrates the results we found: “Awareness” is the awareness of the respondents who did not use a tool, while “Awareness_Tool” is the one for the ones using a tool (same for “Tracking”). It seems that, indeed, if the participants used a tool to track TD, then they would report a high perception of tracking TD. A chi-squared test of independence confirms a strong difference in the distribution of the answers (p-value < 2.2e–16), strongly confirming this claim. However, more surprisingly, their perception of the level of awareness of how much TD is present in the system would only slightly change. This is confirmed by a chi-squared test of independence (p-value of 0.59), which did not show any difference in the distribution of the answers between the participants using a tool or not. Very similar results were found at the team tracking level, so we do not report them in the boxplot below.
Given the high difference in tracking between the respondents who claimed to use a tool and the ones who did not, we can safely claim that the respondents tracking TD also use a tool. This result confirms that we captured most of the respondents’ answers related to the tool that they used. The respondents who did not input an answer for the tool also most probably don’t use any tool, since they have in general a much lower level of tracking. Therefore, we can further validate the result that only 26% of the participants used a tool to track TD. This is also important for the reliability of our results related to the SAMTTD model explained in the next sections.
Fig. 11. Distribution of answers for Awareness and Tracking and comparison if a tool is used or not.
Fig. 12. Word cloud of the tools used by the participants to track TD.
Fig. 13. Number of participants using a specific kind of tool.
From the qualitative data, we could also report what tools were used in practice. After removing some of the words that would just create noise (such as “Technical Debt,” see methodology section for more details), we obtained the following word cloud, which shows the distribution of tools used among the respondents (Fig. 12).
By coding the qualitative answers into the categories comments, documentation, issues, backlog, static analyzer, lint, and test coverage, we can also analyze the frequencies. The most used tool is a backlog (dedicated to TD or the same one used for feature development), followed by documentation, static analyzers, and issue trackers (Fig. 13).
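The coding and counting step can be illustrated with a minimal sketch. It assumes a keyword-based mapping, which is a simplification: the study does not state that the coding was automated, and the keyword lists, the function name `code_answer`, and the sample answers are all hypothetical.

```python
from collections import Counter

# HYPOTHETICAL keyword lists per category; the category names follow the
# text, the keywords are assumptions made for this illustration.
CATEGORY_KEYWORDS = {
    "backlog": ["backlog", "jira", "hansoft"],
    "documentation": ["document"],
    "static analyzer": ["sonar", "static analy"],
    "lint": ["lint"],
    "issues": ["issue", "bug tracker"],
    "comments": ["comment", "todo"],
    "test coverage": ["coverage"],
}


def code_answer(answer):
    """Return the set of categories whose keywords occur in a free-text answer."""
    text = answer.lower()
    return {
        category
        for category, keywords in CATEGORY_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    }


# HYPOTHETICAL free-text answers, not taken from the survey.
answers = [
    "We keep a TD backlog in Jira",
    "SonarQube and lint warnings",
    "TODO comments in the code",
    "Jira backlog plus a Word document",
]

# Each respondent counts at most once per category.
frequencies = Counter()
for answer in answers:
    frequencies.update(code_answer(answer))

for category, count in frequencies.most_common():
    print(category, count)
```

Using a set per answer means a respondent who mentions both "Jira" and "backlog" is counted once for the backlog category, which matches counting participants rather than word occurrences.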
We then analyzed the distributions of the respondents for Awareness and Tracking levels (Fig. 14) with respect to the different kinds of tools. On the other hand, by analyzing the kind of used tool with respect to the mean amount of effort spent on management activities (Fig. 15), we can see a quite clear difference. Although this difference could not be statistically tested (the chi-square tests did not report significant difference, but this could be due to the small sample), backlog and static analyzers are the ones that seem to create less overhead.
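The comparison of management effort per kind of tool amounts to grouping respondents by tool category and averaging their reported effort. The sketch below shows that computation under stated assumptions: the rows are hypothetical values (not the survey data), and the function name `mean_effort_by_tool` is introduced here for illustration.

```python
from collections import defaultdict
from statistics import mean

# HYPOTHETICAL rows: (kind of tool, % of development time spent on TD
# management). These values are made up and are NOT the survey data.
responses = [
    ("backlog", 18.0),
    ("backlog", 20.0),
    ("static analyzer", 19.0),
    ("documentation", 35.0),
    ("documentation", 31.0),
    ("issues", 27.0),
]


def mean_effort_by_tool(rows):
    """Group the reported effort by kind of tool and return the mean per group."""
    grouped = defaultdict(list)
    for tool, effort in rows:
        grouped[tool].append(effort)
    return {tool: mean(values) for tool, values in grouped.items()}


effort = mean_effort_by_tool(responses)
# List the kinds of tools from lowest to highest mean effort.
for tool, value in sorted(effort.items(), key=lambda item: item[1]):
    print(f"{tool}: {value:.1f}%")
```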
In conclusion, the following considerations on the tools can be made:
• Comments in the code help awareness, but they are not considered tracking, and they are used by just 1% of the respondents. This is probably because they are not used in a document that can be monitored by the team outside the code.
Fig. 14. Distributions of levels of TD tracking (“_tr”) and awareness (“_aw”) reported by the users of each kind of tool.
Fig. 15. Mean of management effort for each kind of tool.
• Documentation increases TD awareness, but it is not considered as a high level of tracking, and it has the highest overhead. The main tools used here were Microsoft Excel or Word. We can infer that this practice is not recommendable in comparison with the other ones.
• Using a bug system for tracking TD is not considered as contributing to a better level of awareness or tracking compared to the other techniques, and it has a slightly higher overhead. We would infer that this is also not the best way of tracking TD.
• Backlogs, static analyzers and “lint” programs all increase the tracking level, but we cannot see a big difference among them (although static code analyzers seem to contribute better to the participants’ awareness). They are also the ones with the least overhead. They therefore seem to be the best practices at the moment to track TD.
• Backlogs are the most used tool among the participants. In particular, the most used backlog tools are Jira, Hansoft, and Excel.
• Test coverage does not seem to contribute much to the awareness and tracking level, although it does not involve much overhead. This might be because test coverage is related to only a small part of TD.
3.8. Why and how do companies start tracking TD? (RQ3)
First, we report why the companies decided to start tracking TD, that is, their motivation. Then, because we found that the preparation activity was critical to start tracking TD, we report the main steps involved in this practice.
3.8.1. Motivation for start of TD tracking
The main reasons behind the start of tracking TD were related to experiencing the interest of TD, that is, too many bugs to fix, decreased feature development, and performance issues:
“Because we realized that for each and every release it took much time correcting or fixing problem with additional patches and it took more and more time adding new features on top of the system. [...] The system became more and more inefficient.” These statements confirm our previous results [6], as one of the architects also mentioned: “After some time the TD was increasing and we had a crisis situation.”
In other words, the main motivation was related to the negative impact experienced by the practitioners, that is, the perception of the interest associated with the TD.
3.8.2. Preparation of the tracking process
From the cases investigated, it was clear that adopting a TD tracking process requires some initial activities and time to implement the process. From B1, we understood that they “Have done this for 1.5 years more or less, switching from reactive to more proactive. It’s a better information about the status of the system.” The preparation includes the following aspects. Although we used [11] to code these results, we prefer to report them in a way that is more readable in the context of Technical Debt management:
• Initiative: In all three cases, the tracking process started from an individual initiative: a manager, a system architect, an experienced developer, and so forth. In other words, tracking TD requires a champion in the sub-organization who is aware of TD and is willing to promote the adoption of the practices.
• Budget: Tracking TD needs both an initial effort and a continuous effort. Company B1 started with 150 hours, in the beginning, for a development unit (i.e., a sub-organization responsible for a sub-system, which includes a few teams). However, this was “ok just to start the backlog, but not to go in depth investigation.” The continuous time allocated to tracking TD varied across our cases: it ranged from 10% (company F) to 30% (company A). The cases also show how the continuous allocation of resources to manage TD could be dynamic, varying according to newly identified items, as suggested for Architectural TD in [19].
• Management involvement: Although the initiative can start from anyone in the organization, tracking TD requires an initial and a continuous investment (budget). This entails the need of involving a manager who understands the importance of TD and who can grant a budget for this activity.
• Benefits: As the previous point entails, there is a need for the management to understand the benefits of tracking TD given the initial and continuous budget allocation. Such benefits need to be communicated and continuously evaluated to justify such investment.
• Measurement setup: According to company B1, an amount of time is needed to set up measurements (e.g., complexity) and TD identification (static code analyzers). In other companies, such as F1, we found that a developer set up a specific analysis tool to measure complexity and bug density: this activity was supported by a team dedicated to the measurements in the organization.
• Explanation and alignment: The champion for the TD tracking activity needs to communicate well to the teams what TD is and what needs to be reported (to avoid overhead). The interviewees mentioned that they conducted a first workshop for explaining TD and its tracking, and they also produced some documentation. It is also important to have a validation workshop in which the teams bring up some TD issues to align their understanding with the main TD concepts such as Principal and Interest.
• Appointing of a Sub-System TD Responsible (SSTR): TD tracking needs someone responsible across the organization who can take the initiative to support the tracking process. In all the studied cases, the people responsible for collecting and maintaining a list of TD issues were chosen as experienced developers on a given sub-system. The sub-system TD responsible, however, needs to be supported by the knowledge of the teams when tracking the issues because different practitioners have better and more detailed views of different parts of the system.
• Breaking down and distributing TD items: The SSTR needs to allocate the TD items to the teams according to their competences and their responsibilities with respect to the system. Architecture items were explicitly appointed to an experienced developer to be analyzed and estimated.
• Communication of TD to management: Once the first TD backlog was prepared, it was communicated to a manager connected to the evaluated (sub-)system. This was supposed to show management the risk associated with such a system due to TD.
In summary, quite a few activities are necessary to set up a TD tracking process; this requires the organizations to take the initial decision of allocating some budget to TD tracking.
3.9. What are the benefits and challenges of tracking TD? (RQ4)
3.9.1. Benefits
When we evaluated the tracking process together with the teams, they mentioned several benefits of tracking TD. The backlog gave them a long-term perspective, not only the short-term one given by the feature backlog. The respondents did not think that the TD backlog was hard to maintain. This is supported by the lower management overhead reported in the survey with respect to the other practices.
One of the architects in organization F4 mentioned that, after an important architectural TD item was refactored, “The evidence was visible in the next release with positive impact when adding new features on top of the one we fixed. Easier to add and no side effect, cleaner architecture.” According to the project manager interviewed in company B1, the initiative was overall successful, but it needed to be continuously supported to be really effective. “Yes it was worth it, but it is important to follow it up now and to make sure that parts of the list are done [refactored].”
3.9.2. Challenges
Although the respondents mentioned several benefits, some issues with the current approaches were also reported. The most important one was the acceptance, from the managers, of the need for refactoring. Even with the list updated, the information about the risk and benefits of performing a refactoring was not always clear to the managers. This meant that, especially for large TD items, it was difficult to receive the needed budget for TD repayment.
One of the major problems in starting to track TD was that the first step needed a substantial amount of effort to collect all the existing items. Although this would be only a one-time effort, in some teams the managers would not concede the necessary budget. A challenge mentioned by all the participants was that the refactoring became more difficult to prioritize and completely repay when several items and several teams were involved. It required “double” the effort to prioritize the item with different managers (who could disagree on the necessity of refactoring), and the coordination of the refactoring was considered quite risky and a dangerous overhead. For example, TD issues involving interfaces were more time-consuming to estimate and prioritize, because they required more discussions involving more stakeholders from more teams.
Another challenge in the prioritization activity was the difficulty of prioritizing among TD items, especially where an explicit risk/impact value was not calculated. The participants reported that it was generally difficult to show an actual gain from the cost/benefit analysis to the managers, even with a field explicitly represented in the backlog. In general, the intuitive values used for the risk/interest (usually not including a systematic calculation) worked only sometimes, and more explanations and indicators were required by the managers to accept a costly refactoring.
The respondents mentioned the difficulty of coordinating the different teams in using a standardized process for tracking TD. In some cases, it was difficult to “make them care” about reporting TD, while for other teams the TD list was created with enthusiasm.
Finally, the participants mentioned that in some cases the TD backlog itself did not make the case for refactoring more convincing to the management, but it served as a reminder for the teams to take care of TD, which would otherwise remain invisible and overlooked.
3.10. Strategic Adoption strategy
As a final result from the combination of the various analyses performed so far, we aggregated the results and combined them with the roadmap related to the current literature on TD. This led to the Strategic Adoption Model for Tracking Technical Debt (SAMTTD, Fig. 16). The first four steps in the model represent the results from the survey on the current state of practice in the companies.
We used the results from Q4 to create the first step: If the respondents were not familiar with the TD concept, they could not be on a higher level. Then, we defined three more levels of TD tracking maturity. To discern between the different levels, we mapped practices that we found used or not and that correlated with different levels of tracking (e.g., the usage of a tool). We additionally used the results from the interviews where it was clear what different practices were introduced to track TD.
• Unaware: There is no awareness of what Technical Debt is and therefore how to manage it. According to our survey data, only 8.4% of the participants are in this stage. This datum is related to the respondents that answered “Not familiar at all” with the term Technical Debt, as visible in Fig. 5.
• No tracking: In this stage, the software engineers are aware of the TD metaphor, and there is a general understanding of the negative effects brought by having TD in the system, but there is no initiative to track TD, which remains invisible. Around 65.6% of the respondents report being on this level, by (strongly) disagreeing about tracking TD. The percentage was calculated by taking the total answers minus the answers from Q4, counted previously as the unaware respondents, and the ones who use tools, counted in the next levels (26%). Therefore, this yielded 100 - 8.4 - 26 = 65.6.
• Ad-hoc: In this stage, the software engineers are aware of what TD is, and some of the individuals have started tracking TD on their own. This makes the TD management process ad-hoc, since, without a dedicated budget, such individuals use what is available, in terms of tools and processes, for other activities. For example, according to the qualitative answers related to Q3, the sprint or product backlog, a common issue tracker or a simple Excel spreadsheet can be used for tracking TD. Static analysis tools might be in use but are limited to individual usage. According to the survey, approximately 26% of the respondents are at least on this stage (61 participants, 26%, were using tools, see section 3.6). However, from these ones, we need to take away around 7% that we place on the next level (see next point). In total, we therefore report around 19% of respondents on this level.
Fig. 16. The Strategic Adoption Model for Tracking Technical Debt: the main milestones and the state of practice (% of respondents per category).
• Systematic tracking: The company in this level has acknowledged the importance of tracking TD also on a management level (see Preparation section). Therefore, there is a budget generically associated with the management of TD. This amount usually ranges between 10% and 30%. According to the document analysis of the TD items from the case study, a specific backlog and documentation related to TD is necessary, with TD-specific values useful to analyze the principal and the risk/interest. The TD is understood by the participants, who have been instructed by a person responsible for the process (see Preparation). There is an iterative process in place to monitor TD (identify, estimate, prioritize, and repay it), and such process is subjected to continuous improvement. 7.2% of the respondents are on this stage, actively tracking TD. This is the maximum level achieved by the companies, as confirmed by the interviewees. This amount can be obtained when taking into consideration the respondents who answered “Agree” or “Strongly Agree” to Q5 (see Fig. 9).
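The shares reported for the four observed levels follow from simple arithmetic on three survey figures. The sketch below reproduces that arithmetic; the variable names are assumptions introduced here, and the exact 7.2% is used where the text says "around 7%".

```python
# Survey figures quoted in the text (percent of respondents).
unaware = 8.4        # answered "Not familiar at all" (Q4)
tool_users = 26.0    # reported using a tool to track TD
systematic = 7.2     # answered "Agree" or "Strongly Agree" to Q5

# No tracking: everyone who is neither unaware nor a tool user.
no_tracking = round(100 - unaware - tool_users, 1)

# Ad-hoc: tool users who are not placed on the systematic level.
# Assumption: the text subtracts "around 7%"; the exact 7.2 is used here.
ad_hoc = round(tool_users - systematic, 1)

levels = {
    "Unaware": unaware,
    "No tracking": no_tracking,
    "Ad-hoc": ad_hoc,
    "Systematic tracking": systematic,
}

for name, share in levels.items():
    print(f"{name}: {share:.1f}%")

# The four shares should account for all respondents.
total = round(sum(levels.values()), 1)
print(f"Total: {total:.1f}%")
```

The ad-hoc share comes out at 18.8%, which the text rounds to "around 19%".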
We do not have evidence that companies have better processes and tools in place. However, based on current literature on TD [3] and related work on change management [18], we hypothesize future maturity steps that can be reached by the companies once the results of research are put in place. We identify the following three steps:
• Measured: In this stage, identification tools for TD are in place, for example, the use of the tool SonarQube for source code TD (such as McCabe complexity) or dependency checkers on the architecture level (as reported in company F1). The measurement of the interest is also in place, for example, there are indicators that show the amount of interest paid or predicted if the refactoring is not conducted. Such tools are not employed in practice yet and should be integrated to provide overall indicators that help the stakeholders estimate and prioritize TD. The authors of this paper are actively working on introducing such tools and indicators, as explained in our recent work [20].
• Institutionalized: According to change management [18], a process is mature when it is spread and standardized across the whole organization. This would allow an aligned prioritization of TD across the system. This would also allow the practitioners to plan the allocation of resources according to the quality of the (sub-)systems in order to plan for the life-cycle of the product. As an example, the reader can consider a team who needs to build a feature on a sub-system developed by other teams: knowing how much TD is present in such sub-system would allow the team to estimate whether refactoring is needed or the lead time for the features to reach the customer.
• Fully automated: In this stage, the decisions on the refactoring are completely data-driven, making use of statistics collected on historical data or by benchmarking the system against a collection of reference systems. For this purpose, however, the previous steps are necessary.
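The seven SAMTTD steps form an ordered scale, which can be sketched as an ordered enumeration together with a rule that places a survey respondent on one of the four observed levels. This is one possible reading of the bullets above, not the authors' implementation: the names `TrackingLevel` and `classify` and the boolean inputs are hypothetical.

```python
from enum import IntEnum


class TrackingLevel(IntEnum):
    """SAMTTD steps in increasing order of maturity."""
    UNAWARE = 1
    NO_TRACKING = 2
    AD_HOC = 3
    SYSTEMATIC = 4
    # The remaining three steps are hypothesized in the text, not observed.
    MEASURED = 5
    INSTITUTIONALIZED = 6
    FULLY_AUTOMATED = 7


def classify(familiar_with_td, uses_tool, agrees_tracking):
    """Place a respondent on one of the four observed levels.

    familiar_with_td: did not answer "Not familiar at all" to Q4
    uses_tool:        reported a tool for tracking TD
    agrees_tracking:  answered "Agree" or "Strongly Agree" to Q5
    """
    if not familiar_with_td:
        return TrackingLevel.UNAWARE
    if agrees_tracking:
        return TrackingLevel.SYSTEMATIC
    if uses_tool:
        return TrackingLevel.AD_HOC
    return TrackingLevel.NO_TRACKING


# Example: a respondent who knows TD and uses a backlog, but does not
# report systematic tracking, lands on the ad-hoc level.
level = classify(familiar_with_td=True, uses_tool=True, agrees_tracking=False)
print(level.name)
```

Using `IntEnum` keeps the levels comparable, so a check such as `level >= TrackingLevel.SYSTEMATIC` expresses "at least systematic tracking" directly.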
4. Discussion
The combination of data from 226 participants in 15 large software organizations with the in-depth case study provided an overall picture of the current state of practice with respect to TD tracking. In this section, we discuss the contributions in this manuscript with respect to practitioners and researchers, we compare our results with existing literature, and we report limitations and threats to validity related to our study.
4.1. Current state of practice of tracking TD and implications for practitioners and researchers
The results related to RQ1 tell us that software companies spend, on average, around 25% of their development time on TD management activities. The boxplots (Fig. 4 and Fig. 10) show some consistency in the companies: The medians range
almost one out of five do it in an ad-hoc way (19%), that is, by using tools that are not made for TD tracking and therefore are not effective. Finally, only 7% of the participants track TD in a more dedicated way.
An interesting observation is that the results are not significantly affected by the background and the role of the respondents. This datum increases the reliability of the results: Independent of the organization and the background of the participants, we found very similar results across the respondents, which can be considered also more general. In other words, the means and the variance across different practitioners are similar in different organizations.
However, this also led us to consider the following: Different roles with different priorities and views (e.g., managers and developers) agreed on the estimated amount of effort done to keep TD at bay, as well as on the fact that such effort is not systematic (TD is mostly not tracked). Then, an unanswered question is: If TD is so painful, why do organizations not track TD more systematically? One possible answer is that employees do not know how to track TD effectively. This is supported by the fact that most of those who track TD do not use proper tools or documentation, while the few who systematically track TD still do so manually and rarely use basic measurements. For this reason, we found it important to propose the SAMTTD model, to help practitioners understand what it means to track TD and what is necessary to implement a tracking process in practice.
Another answer to the current lack of TD tracking, despite the management effort, might be found in the results related to RQ8 and RQ9 concerning the necessity of a Preparation phase and its cost, which is critical for the introduction of a TD tracking process in the companies. At the outset, the initiative needs to be conducted by one or more champions in the organization. An initial budget should be allocated to allow the first activities related to the TD inventory, and this entails a need for a commitment by management, which is achieved by communicating how a systematic TD management process would bring benefits to the organization. Unfortunately, this is one of the challenges reported by the practitioners, who claim that there is a lack of good instruments and publicly available results to advocate for the need of systematic TD management. Other activities include the communication and alignment of what should be collected as TD, the set-up of measurement systems, the appointment of a Sub-System TD Responsible (SSTR), and the breakdown and distribution of the TD items to the teams. Unfortunately, the first investment can be burdensome. For example, a trial of 150 initial hours for a unit with three teams was barely enough to identify preliminarily the initial TD list. It also did not leave time for the company to set up measurement systems and accurately estimate and prioritize the TD items, although updating the TD backlog becomes lightweight in the following iterations.
For tools to track TD, we found that many participants use backlogs, implemented in project management tools such as Jira and Hansoft, and static analyzers. The results also suggest that these approaches require less management effort, and they seem to give slightly more awareness of the TD in the system. However, it seems that, for most of the respondents, the awareness of the amount of TD present in the system is affected only slightly, if at all, by the tool in use. This means that TD tools are not only used by the teams to be aware of the TD, but also for communication, monitoring, and management purposes. The usefulness of these tools is shown by the fact that the participants using backlogs and static analyzers spent less than the average time (18–19% compared to 25.9%) on TD management. However, the tools seem not to help raise the awareness of the respondents: The mean awareness remains between “somewhat disagree” and “somewhat agree.” Many qualitative answers, both from the survey and from the case study, also report the fact that many TD items cannot be automatically revealed because they are too context-specific and they cannot be represented by generic patterns. This leads to the conclusion that better and more specific tools for managing TD need to be developed.
In summary, managing TD requires a few investments that are not well known by the practitioners and are difficult to motivate with a precise cost/benefits ratio. Consequently, without an investment in processes and tools to track TD, it is difficult to make TD visible, as well as to advocate for refactoring “invisible” TD. This represents a vicious cycle: companies suffer the negative effects of TD and try to contain it, but at the same time they do not find enough motivations to invest in a more systematic management process. By looking at the motivations for starting to track TD, the results show that organizations do so when they experience the interest of TD: slow feature development, quality issues, and performance degradation. However, at such a point, the interest associated with TD is already high and, as explained in other recent papers by the authors of this manuscript [6,12], it is hard to refactor, as the cost has also increased and has become too expensive. In conclusion, the only way out of the vicious cycle seems to be, for the practitioners, to proactively start tracking TD. Using backlogs and static analyzers helps reduce the management overhead and increase (even if slightly) the awareness of TD. New tools need to be developed, in two main directions: allowing the developers to communicate the urgency of refactoring TD to the management, and better (semi-)automatic tools to identify and track TD to increase the awareness of the respondents.
4.2. Related work
There are two survey-based studies regarding the familiarity and tool usage related to TD. In [21], the authors concluded that 50% of respondents said that no tools were used and only 16% said that tools gave enough details. Their study also shows that 27% of the respondents do not identify TD. Furthermore, Holvitie et al. [22] show that over 20% of the respondents (in Finland) indicated poor or no TD knowledge. However, in these studies, we cannot find an estimate of the effort spent on TD management, and there is no explanation of how a TD tracking process can be started or implemented. As a comparison with these studies, the results from our survey show that familiarity with TD and its tracking seems to be higher among the respondents who answered our survey. This may be related to the different size, culture, or domain of the organizations, but given that our study is more recent, we could speculate that the familiarity with TD is growing. In our results, only 8.4% of our respondents were not familiar with TD, and 27% of the respondents used tools. Both findings are higher than in the other surveys.
There are a few articles about industrial practices concerning Technical Debt, for example [8,23], and [24], but they are single case studies, and, in two cases, they were performed in small companies. Also, such work does not focus on the current state of practice of Technical Debt tracking, an estimation of the TD management effort, the motivations for starting to track TD, or the maturity evolution of tracking. This makes it difficult to compare the results with our survey, but we will take the topics one by one and discuss similarities and differences. As for the cost of tracking TD, [25] reports detailed results from a single case study. Some results are in line with the broad results reported here including, for example, that the effort might vary greatly, reaching even 70% of the development time, and that starting the TD tracking is more expensive in the beginning but becomes more lightweight when the process is repeated. In [26], the TD management process of several companies is analyzed with reported results similar to our cases, for example, the limited use of measurements and lack of a systematic process. However, in contrast with our work, the study does not focus on TD tracking; it reports a broad snapshot of current practices and does not take the change management perspective into account. For example, we report information such as the quantified cost of managing TD, the reasons why organizations start tracking TD, and the preparation activities and costs necessary to track TD. We present a maturity model, SAMTTD, that, taking change management aspects into account, allows for the transfer of knowledge to practice. This is visible in the additional four levels added in our model. We can consider the fourth step in our model as an especially important addition to our work because we found evidence of a systematic process using TD-specific documentation not reported in [26]. Also, none of the cited studies reports quantitative answers from as many as 226 practitioners, which also show trends and statistical results reported here.
There are a few studies regarding Technical Debt tracking and tools in the literature. As for tools, most of the recent findings report tools created by researchers (e.g. [27–29]). The experience reports are usually related to the evaluation of the tool in a specific context and, therefore, cannot be considered as state-of-practice (at least, not yet). This is understandable as new tools are being developed while this manuscript is being written, and the attention to TD by software organizations is quite recent. As for tracking, three initiatives have been reported in the literature [28,30,31]. The first one, [28], presents a tool called DebtFlag, which allows tracking TD and its propagation. However, the evaluation of such a tool in practice has yet to be reported. The second one, [30], reports the evaluation of a tool (AnaConDebt) to assess and track TD. A first study has been done in an industrial environment, but more studies are needed to understand whether the tool is usable in practice. Finally, the last paper, [31], reports a new method to analyze the TD reported in code comments. Although some of the features of the semi-automatic approach seem interesting, it is not clear how many TD items are covered by comments and whether this approach can be used in practice (the paper does not report a practical use of the method with an evaluation from the practitioners). For example, if we look at the survey conducted in this paper, currently only around 1% of the participants (three) state that they track TD using comments.
4.3. Limitations and threats to validity
Here we report the main threats to validity regarding this study, according to [11]: construct validity, internal validity, external validity, and reliability.
Construct validity is concerned with the investigation device and the validity of the data with respect to the RQs that are investigated. In a survey, this is usually one of the main threats to the validity of the results, as participants might interpret definitions and other terms differently from each other. Although this phenomenon is unavoidable, we took a few approaches to mitigate the consequences. As for the misunderstandings related to the interpretation of what TD is, we have reported, before the questions, short definitions of the issues and management activities that are associated with TD according to the most up-to-date literature. In other words, we did not ask questions on “Technical Debt” directly but, instead, on more concrete issues that are associated with it. In our experience, this should have reduced the possibility that the respondents would consider TD as something else, for example, bugs or missing features (something that might happen in practice, according to our experience). We also provided, in the last part of the survey, a definition operationalized from the various existing formal definitions. We asked a question about whether the practitioners were familiar with TD according to the definition, and they mostly agreed. Although this does not ensure that the practitioners had answered with full knowledge of what Technical Debt is, we believe that the two mitigation strategies together contributed to reducing the threats to construct validity.
As for the results concerning testing hypotheses statistically, it is important to notice that, in most cases, we could not reject the null hypotheses that the results would depend on the background of the respondents (roles, company, etc.). This means that we could not find enough evidence in this dataset to support the rejection of the null hypotheses, but the reader should be warned that we also did not prove the opposite hypotheses. In summary, we cannot claim that the background played a role in the results.
Finally, it is important to report the threats to external validity. We investigated mostly large companies involved in the development of embedded systems and from the Scandinavian area. This entails three possible threats.
• It is possible that, in other domains (e.g., web development), the percentage of the companies in the maturity steps would differ. To mitigate this threat, we have included a company developing “pure” optimization software. In this case, we did not find a statistical difference with respect to the other companies. However, more research is needed to understand if there is a difference.
• Companies in other countries, with different contexts and cultural backgrounds, might answer the survey differently or have different ways of managing Technical Debt. However, all the companies investigated in this study employ developers from all over the world and have distributed development. It is therefore likely that the background of the participants in the survey would actually be more heterogeneous than the organizations themselves, which are only Scandinavian.
• Small companies might behave very differently with respect to Technical Debt management.
Therefore, the reader must be aware that there are some limitations to the extent to which we can generalize from these results.
There are also threats to the reliability of the results, that is, the results might be biased depending on an interpretation given by the authors, method, or source of evidence (e.g., if we asked only developers but not managers), as reported below.
• There is a threat in the quantities estimated by the respondents with respect to Q1. We do not know what the given estimations are based on since most of the participants do not explicitly track TD and their time spent on it. However, as the demographic data show, many participants can count several years (more than 10) of software development experience. Estimations are based on experience, and they are referenced to the practitioners’ last projects, which limits a possible retrospective bias. Practitioners are used to estimating the amount of work that has been done or that is upcoming, which mitigates the threat that the estimated effort would be very distant from the real one.
• As for the authors’ interpretation, we have made sure that, especially for the qualitative data analysis, we have applied observer triangulation: Two or more authors have analyzed the interviews and either separately coded the statements or checked the other authors’ codes. Although this does not remove the threat completely, it is the main strategy used when qualitative data analysis is involved in the study.
• Relying only on quantitative data might miss important details that are necessary to understand the results or might show correlations that are not related to any real causality. For example, we could not find reasons from the quantitative background data that would explain the variance in the amount of time that the participants are employing to manage TD. However, we could combine the quantitative results with qualitative answers coming from some of the organizations participating in the survey, which helped explain the factors related to their maturity by analyzing the interviews.
• Finally, there is a threat to the reliability of the results, as the percentage of developers participating in the survey was larger than that of other roles. This means that the results might be skewed by the developers’ biases. However, to mitigate this threat, we performed a chi-square test to understand if the distribution of the answers would depend on the roles of the respondents. The test did not support such a hypothesis, meaning that there was not a statistically significant difference between different responding roles (different roles gave similar answers). By having such roles participating in the survey, we could apply a mitigation strategy denoted as source triangulation.
5. Conclusion
According to 226 respondents in 15 software organizations, practitioners estimate spending, on average, a substantial amount of time trying to manage TD (25%), although such an amount is affected by some variance. Software companies in Scandinavia are more familiar with the TD metaphor than reported in previous studies, and they track TD more. The awareness of TD in the system seems to be somewhat known by the developers, independent of which approach is used. Tools such as backlogs (the most popular approach) and static analyzers help reduce the management overhead of approximately