

Contents lists available at ScienceDirect

Science of Computer Programming

www.elsevier.com/locate/scico

Technical Debt tracking: Current state of practice

A survey and multiple case study in 15 large organizations

Antonio Martini ^a,b,∗^, Terese Besker ^c^, Jan Bosch ^c^

^a^ CA Technologies Strategic Research Team, Barcelona, Spain

^b^ University of Oslo, Programming and Software Engineering, Oslo, Norway

^c^ Computer Science and Engineering, Chalmers University of Technology, Göteborg, Sweden

Article info

Article history:

Received 5 November 2017

Received in revised form 20 March 2018

Accepted 25 March 2018

Available online 29 March 2018

Keywords:

Technical Debt

Change management

Software process improvement

Survey

Multiple case study

Abstract

Large software companies need to support continuous and fast delivery of customer value in both the short and long term. However, this can be hindered if both the evolution and maintenance of existing systems are hampered by Technical Debt. Although a lot of theoretical work on Technical Debt has been produced recently, its practical management lacks empirical studies. In this paper, we investigate the state of practice in several companies to understand what the cost of managing TD is, what tools are used to track TD, and how a tracking process is introduced in practice. We combined two phases: a survey involving 226 respondents from 15 organizations and an in-depth multiple case study in three organizations including 13 interviews and 79 Technical Debt issues. We selected the organizations where Technical Debt was better tracked in order to distill best practices. We found that the development time dedicated to managing Technical Debt is substantial (an average of 25% of the overall development), but mostly not systematic: only a few participants (26%) use a tool, and only 7.2% methodically track Technical Debt. We found that the most used and effective tools are currently backlogs and static analyzers. By studying the approaches in the companies participating in the case study, we report how companies start tracking Technical Debt and what the initial benefits and challenges are. Finally, we propose a Strategic Adoption Model for the introduction of tracking Technical Debt in software organizations.

© 2018 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Introduction

Large software companies need to support continuous and fast delivery of customer value in both the short and long term. However, this can be hindered if both the evolution and maintenance of the systems are hampered by Technical Debt.

Technical Debt (TD) has been studied recently in the software engineering literature [1–4]. TD is composed of a debt, which is a sub-optimal technical solution that leads to short-term benefits as well as to the future payment of interest, which is the extra cost due to the presence of TD (for example, slow feature development or low quality) [5]. The principal is regarded as the cost of refactoring TD. Although accumulating Technical Debt might prove useful in some cases, in others, the interest might largely surpass the short-term gain, for example, by causing development crises in the long term [6].

∗ Corresponding author.

E-mail addresses: antonio.martini@ifi.uio.no (A. Martini), besker@chalmers.se (T. Besker), jan.bosch@chalmers.se (J. Bosch).

https://doi.org/10.1016/j.scico.2018.03.007

0167-6423/© 2018 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).


In this paper, we therefore aim at addressing the following RQs:

RQ1: How much of the software development time is estimated to be employed in managing TD?

It is also important to understand how a TD tracking process is introduced and implemented in large software companies:

RQ2: To what extent are software practitioners familiar with the term Technical Debt?

RQ3: To what extent are software practitioners aware of the TD present in their system?

RQ4: To what extent do software organizations track TD?

RQ5: Is there a difference between individual and collective management of TD?

RQ6: Does the background of the respondents influence the way in which TD is managed?

RQ7: What tools are used to track TD?

RQ8: How do software organizations introduce a TD tracking process?

RQ9: What are the initial benefits and challenges when large organizations start tracking TD?

To shed light on these questions, we have conducted a survey in 15 organizations with 226 participants, and we have carried out a multiple case study in three companies that have started tracking TD. In this context, we have interviewed 13 practitioners responsible for tracking TD and analyzed 79 TD items from a pool of 597 improvements. Our findings include the following contributions:

1. The cost of managing TD in large software organizations is substantial, and it is estimated to be, on average, 25% of the whole development time.

2. We list the tools that are currently used to track TD, and we provide a first assessment of which ones create less management overhead.

3. We report the state of practice related to the introduction of a TD management process in 15 Scandinavian organizations.

4. We report the lessons learned from three companies that have started tracking Technical Debt: their starting process, the perceived benefits, and the challenges.

5. We propose a Strategic Adoption Model for Tracking Technical Debt (SAMTTD), aimed at helping companies assess their Technical Debt management process and make decisions on its improvement. The model also defines the next research challenges to be addressed in theory and to be evaluated in practice.

This paper adds new and more in-depth results to the findings reported in a previous paper [10]. In particular, we address new research questions (RQ2, RQ3, RQ5, RQ6, RQ7), and we add new insights on the relationship between RQ4 and RQ7 (that is, we study how the practitioners' perception of tracking Technical Debt is related to their usage of tools).

The remainder of the paper reports our methodology in section 2 and the results in section 3; we then discuss the results in section 4 and conclude in section 5.

2. Methodology

For the execution of this study, we aimed at combining different sources of data (source triangulation) and different methodologies (methodology triangulation) to obtain reliable results [11]. To fulfill these triangulation strategies, we surveyed 226 participants. The different sources included 15 large organizations and different roles, that is, developers, architects, and managers. To complement this quantitative investigation, we followed up with a qualitative, in-depth multiple case study at three of the companies involved in the survey that have started tracking TD. Here, we conducted interviews with 13 employees, and we analyzed documents including 79 TD issues out of a pool of 597 improvements present at the companies.

2.1. Survey

In this study, we have involved 15 software organizations belonging to eight distinct large software companies. We consider a large software company an organization with more than 250 employees. As shown in the descriptive statistics in Table 2, 91.6% of the respondents reported working for an organization bigger than 250 employees. The remaining 8.6% were consultants from small/medium organizations working on the same systems and projects developed by the large organizations participating in the survey. The latter can, therefore, be considered as working in the same context as the other 91.6% of the participants.

Table 1

Kinds of Technical Debt recognized in [3,9,10].

| Survey entries | Source and literature term |
| --- | --- |
| Lack or low quality of testing | Test Debt [3] |
| Low code quality | Source Code Debt [3] |
| Lack or low quality of requirement | Requirement Debt [3] |
| Lack or low quality of documentation | Documentation Debt [3] |
| Dependency violations | Architecture Debt [3,12] |
| Complex architectural design | Architecture Debt [3,12] |
| Too many different patterns and policies | Architecture Debt [3,12] |
| Dependencies on external resources/software | Architecture Debt [3,12] |
| Lack of reusability in design | Architecture Debt [3,12] |
| Uneasy/Tensed social interactions between different stakeholders | Social Debt [3,13] |
| Lack of adequate environment and infrastructure during development | Infrastructure Debt [3] |

Seven out of eight companies developed embedded software, while another one developed software for optimization (company D). The companies are anonymized and named A–H, and the sub-organizations are called B1, B2, F1–F4, and G1–G4.

2.1.1. Survey data collection

In the first part of the survey, we asked about the participants' background information:

• Software development experience: “<2 years,” “2–5 years,” “5–10 years,” “>10 years”

• Role: “Product Manager,” “Project Manager,” “Software Architect,” “Developer,” “Tester,” “Expert,” “Other (Specify)”

• Gender

• Education

• Team size

• Organization size

• Size of their current project in MLOC (Millions of Lines of Code)

In the second part of the survey, we asked for and analyzed the data related to the effort caused by several Technical Debt challenges. To make sure that the respondents did not misinterpret the question, the challenges were listed as reported in current literature and not as generic “Technical Debt.” Table 1 reports the different kinds of TD together with their scientific names and the related academic source. This assured that a better construct validity of our survey was achieved, as we reduced the subjectivity of the respondents interpreting “Technical Debt.”

It is important to notice that the details and the results from the questions in the second part of the survey are not included in this paper because the data has been used to cover a different scope and to answer different questions related to Technical Debt in another work [14]. Therefore, the only questions overlapping between the papers are the ones related to the background of the respondents.

In the third part of the survey, we asked the following questions, some of which can be mapped directly to the RQs. Some of the following questions are instead statements. In those cases, we asked the participants for their agreement with the proposition.

Q1. “How much of the overall development effort is usually spent on TD management activities?”

Q2. “How familiar are you with the term ‘Technical Debt’?”

Q3. “I am aware of how much Technical Debt we have in our system.”

Q4. “All team members are aware of the level of Technical Debt in our system.”

Q5. “I track (using tools, documentation, etc.) Technical Debt in our system.”

Q6. “All team members participate in tracking Technical Debt in our system.”

Q7. “I have access to the output of the tracking of the Technical Debt in our system.”

Q8. “All team members have access to the output of Technical Debt in our system.”

Q9. “If you track Technical Debt in your project, what kind of tool(s) do you use?”

The formulation of Q1 was slightly different, as we did not mention “TD,” but we referred to the challenges mentioned in the second part of the survey (see Table 1). However, we use the formulation in Q1 in the rest of the paper for the sake of readability.

After question Q2, the survey included the following definition of Technical Debt:

“[...] up a technical context that can make a future change more costly or impossible. Technical debt is a contingent liability whose impact is limited to internal system qualities, primarily maintainability and evolvability.”

In our definition, we omitted the second part of the Dagstuhl definition. However, by enumerating the different kinds of TD in the first part of the survey (excluding external qualities from the questionnaire), we can be sure that the second part of the Dagstuhl definition was also covered, although not explicitly mentioned.

This assures that we provided the participants with a good means to understand what Technical Debt meant when we asked about its management. However, we cannot guarantee that the practitioners read and understood the definition.

For question Q1, since we wanted to quantify the amount of effort related to TD faced by the companies, we provided a scale including the following options: “<10%,” “10–20%” . . . “80–90%,” and “I don’t know.” This question was aimed directly at answering RQ1.

For Q2, we provided the answers “Not at all familiar,” “Slightly familiar,” “Moderately familiar,” “Very Familiar,” and “Extremely Familiar.” The answers were mapped on a 5-grade Likert scale, respectively 0–4. This question aimed directly at answering RQ2.

For Q5–Q8, we asked the respondents to report their agreement on a 6-grade Likert scale: “strongly disagree,” “disagree,” “somewhat disagree,” and the symmetric scale for agreement. These statements were aimed at answering RQ3, RQ4, and RQ5. In particular, we wanted to understand if tracking Technical Debt was an individual activity (by asking the same questions for the individual and about the whole team) and if there was a discrepancy between the awareness of the practitioners and their tracking process.

As for question Q9, we asked the participants to report the tools used in a qualitative way (text-box). The input was then post-processed and compiled in the resulting word cloud. This question was used, together with the previous ones, to answer RQ7.

2.1.2. Survey data analysis

First, we analyzed the answers from Q1 to understand the magnitude of the estimated effort spent by the respondents on managing TD. We transformed the answers from categorical to numerical: for example, we parsed “<10%” to 5, “10–20%” to 15, and so on. After the calculations, we can reapply the tolerance interval of +5/−5, and the various means and so forth would not change. When calculating the means, we did not consider the “I don’t know” answers. However, only a small portion of the answers was of this kind (11.5%).
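As an illustration only, the categorical-to-numerical transformation described above could be sketched in Python as follows. This is not the script used in the study: the helper name `bucket_midpoint` is hypothetical, the sample answers are invented, and we assume that “<10%” is read as the interval 0–10.

```python
from statistics import mean, median

DONT_KNOW = "I don't know"  # excluded from the averages, as described above


def bucket_midpoint(answer):
    """Map an effort bucket such as "<10%" or "10–20%" to its midpoint.

    Hypothetical helper: returns None for "I don't know".
    """
    if answer == DONT_KNOW:
        return None
    text = answer.replace("%", "").replace("–", "-").strip()
    if text.startswith("<"):
        # Assumption: "<10%" is read as the interval 0-10, so its midpoint is 5.
        return float(text[1:]) / 2
    low, high = text.split("-")
    return (float(low) + float(high)) / 2


# Invented answers, for illustration only (not survey data).
answers = ["<10%", "10–20%", "20–30%", "I don't know", "30–40%"]
midpoints = [bucket_midpoint(a) for a in answers]
usable = [m for m in midpoints if m is not None]

print("mean effort (%):", mean(usable))
print("median effort (%):", median(usable))
```

Because every bucket is replaced by its midpoint, each transformed value carries the ±5 tolerance mentioned in the text.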

To avoid the bias introduced by different roles answering the questionnaire, we ran a cross-tabulation chi-square test of independence to understand whether the role of the participants affected the answers.

The second step was to apply frequency analysis on questions Q5–Q8. To do so, we transformed the categorical data to a Likert scale (1–6), where “strongly disagree” was mapped to 1 and “strongly agree” to 6. As for Q5, we also reported the grouped answers in three main intervals, “No tracking” {1–2}, “Somewhat tracking” {3–4}, and “Tracking” {5–6}. We used these aggregated intervals only for the last results related to the adoption model SAMTTD.
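The mapping and grouping just described can be sketched as follows. This is a minimal Python illustration under our own assumptions: the names `LIKERT` and `tracking_group` are hypothetical, and the responses are invented rather than taken from the survey.

```python
from collections import Counter

# 6-grade agreement scale, "strongly disagree" = 1 ... "strongly agree" = 6.
LIKERT = {
    "strongly disagree": 1,
    "disagree": 2,
    "somewhat disagree": 3,
    "somewhat agree": 4,
    "agree": 5,
    "strongly agree": 6,
}


def tracking_group(score):
    """Aggregate a 1-6 score for Q5 into the three intervals used in the text."""
    if score <= 2:
        return "No tracking"
    if score <= 4:
        return "Somewhat tracking"
    return "Tracking"


# Invented Q5 responses, for illustration only.
responses = [
    "strongly disagree", "disagree", "somewhat disagree",
    "somewhat agree", "agree", "strongly agree", "disagree",
]
scores = [LIKERT[r] for r in responses]
frequencies = Counter(tracking_group(s) for s in scores)

for group, count in frequencies.items():
    print(group, count)
```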

For some of the results, we used a standard boxplot. The boxplot is a comprehensive way to visualize various descriptive statistics at a glance. We used this method when we aimed at showing the difference in the distribution of the data with respect to two specific variables. For example, Fig. 4 shows the comparison, with respect to different companies, of the distribution of the management effort: we can compare the medians (the black lines in the middle), but we can also see different percentiles (where most of the answers were concentrated) and outliers.

In other cases, we compared the different variables using statistical tests. For example, it seemed interesting to compare how much the respondents were aware of TD with respect to how much they were tracking it. To do so, we performed a number of tests for linear correlation using the tool R. Most of the numerical variables did not have a strong linear correlation with each other, except the answers for Q5, Q6, Q7, and Q8. This is not surprising because, if TD is not tracked by an individual, it is probably not tracked by the team, and the output will not be visible to the individual or to the others either. The Pearson tests for linear correlation gave results from 0.72 up to 0.89 with p-values far below 0.05. This can be considered a good test for the reliability of the answers. Since these variables all strongly correlate, in the remainder of the paper, when studying different variables, we will use only the “tracking” variable without considering whether the output was available or not.
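For readers unfamiliar with the Pearson coefficient, the following minimal Python sketch shows how it is computed for two paired lists of Likert scores. It only illustrates the formula: the study itself used R, the function name `pearson_r` is ours, and the Q5/Q6 scores are invented.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson linear correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Invented paired answers of six respondents to Q5 (individual tracking)
# and Q6 (team tracking), on the 1-6 scale.
q5 = [1, 2, 2, 3, 5, 6]
q6 = [1, 1, 2, 3, 4, 6]

print("r =", round(pearson_r(q5, q6), 2))
```

A value close to 1 indicates that respondents who report a high individual tracking level also tend to report a high team tracking level. The associated p-value is not computed in this sketch.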

We also wanted to understand whether the results depended on a specific variable. For example, we tested whether developers answered differently from architects or managers. Thus, to answer RQ6, we ran several chi-squared tests of independence between the background variables of the participants and their answers related to questions Q5–Q8. For example, we wanted to know if the familiarity, awareness, tracking, and so on of the respondents would depend on their background, such as their affiliation with a company, their education, and so on.
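The statistic behind such a test of independence can be sketched as follows. This is a hedged Python illustration, not the analysis performed in the study: the function name `chi_square_statistic` is hypothetical, the contingency table is invented, and the sketch assumes that every row and column total is positive. It computes only the statistic and the degrees of freedom; the p-value would be read from the chi-square distribution, for example with a statistical package.

```python
def chi_square_statistic(table):
    """Return (statistic, degrees of freedom) for a contingency table.

    `table` is a list of rows of observed counts, e.g. one row per role
    and one column per answer category.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Invented counts: rows = two roles, columns = "not tracking" / "tracking".
observed = [
    [10, 20],
    [20, 10],
]
statistic, dof = chi_square_statistic(observed)
print("chi-square =", round(statistic, 3), "dof =", dof)
```

A large statistic relative to the degrees of freedom yields a small p-value, which leads to rejecting the null hypothesis that the two variables are independent.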


We finally analyzed the qualitative answers from Q9 to understand better the results answering RQ7. We selected the answers in which the respondents reported that tools were explicitly used (61/226, 27% of the respondents), and we compared the respective levels of awareness, tracking, and familiarity. This was done to understand better what the respondents meant by “tracking.” We also created a word-cloud representation of the qualitative answers for Q9. This, we found, could represent quite well which tools were the most used and in what way. To do so, we processed the qualitative data, removing terms that would appear in the word cloud but would not make sense from the tool point of view, for example, “code” and “Technical Debt.” Finally, from the coding of the qualitative answers, we could also identify the frequencies of the tools used. To do so, we manually coded the 61 answers in the following seven categories:

• Comments: These are usually “TODO” comments, left by the developers in the code or other artifacts. These are useful for the developers to know that something is left to do, but they do not imply a systematic monitoring of the TD reported in the comments.

• Documentation: From the qualitative answers, this represents a text or spreadsheet where issues are listed and explained in a semi-systematic format. Another example could be a wiki. However, such documentation is different from a backlog as it is more difficult to monitor, and it does not use a specific technology to manage and perform operations on the backlog.

• Issues: The same ticket system used for bug fixing, where the issues related to Technical Debt are usually down-prioritized.

• Backlog: This is either a dedicated backlog for TD issues or the usual feature backlog where TD items are mixed with features. This practice usually involves a technology such as project management tools.

• Static analyzer: These are tools such as SonarQube, SonarGraph, Klocwork, and so on used to analyze the source code in search of Technical Debt. In a few cases, respondents report that they built their own metrics tools. These tools usually check (language-specific) rules or patterns that can warn the developers of the presence of TD. These tools are used as trackers by the developers, with the limitation that they cover only part of the TD.

• Lint: These are also static analyzers, but they are used more to find potential bugs and security issues than technical debt.

• Test coverage: Some of the respondents measure test coverage, and they consider a low test coverage as presence of test debt.

2.2. Multiple case study

To understand better to what extent companies tracked TD (RQ4) and how the tracking process was introduced (RQ8–9), we conducted a multiple case study, investigating some of the companies involved in the survey. We interviewed 13 employees from cases B1 (project manager, system architect, and two developers), F1 (three software architects responsible for TD management in three different teams), and F4 (two system architects, two project managers, and two developers). In particular, to understand what was considered “good tracking,” we had the opportunity to interview the participants, belonging to company F1, who answered “strongly agree” (the highest level of tracking) to question Q5. This gave us an idea of what was considered as current best practices for tracking TD. To support the interviews, we also analyzed 79 out of 597 TD backlog items used for tracking improvements (and thus including TD items) in companies B1, F1, and F4.

2.2.1. Interviews

2.2.1.1. Data collection

The interview questions were designed to cover taxonomies we found in the pre-study concerning the reason for initiation, the activities within the TD management process, and the process implementation. All interviews were audio-recorded, and the results of the interviews were organized by different questions and activities for later analysis.

We formulated the interview questions in three sections.

• The first section contains questions about the profile of the interviewees and their companies.

• The second section focused the questions on the initialization of the process for managing TD. “What was the main reason for implementing a TD management process?” “Who decided that the process should be implemented?” “What negative effects did you experience in your system due to TD?” (RQ8).

• In the third section, we asked about the outcome of the implemented process (RQ4) and how the companies experienced the implementation of the process in terms of the most obvious benefits and challenges (RQ9).

2.2.1.2. Data analysis

The data analysis used an inductive approach based on open coding [17]. We were looking for activities related to the introduction of a TD management process in the company. For this purpose, we followed the points in [18], which is a well-known study on change management in software engineering. The data were coded using a Qualitative Data Analysis (QDA) software tool called Atlas.ti. Such a tool supports keeping track of the links between taxonomies, codes, and quotations. Based on the taxonomies, we developed a coding scheme that contains a corresponding set of codes and sub-codes. Fig. 1 shows an example of our code hierarchy and how the codes were mapped to the taxonomy. The graph is part of the overall data collection model (not completely displayed here for space limitations).

Fig. 1. The coding process.

Fig. 2. Number of participants per organization.

As an example of how the coding was conducted, we present a quotation from one of the interviewees which was mapped to the Motivation sub-code. “We realized that for each and every release it took much time correcting or fixing problems with additional patches and it took more and more time adding new features on top of the system.”

2.2.2. Document analysis

To gain more evidence on how the companies were tracking TD (RQ4), we investigated the existing documentation. Also, we had access to the TD backlogs of the studied teams: 26 items in the organization B1, 451 items in F1, and 20 items in F4. We analyzed the TD items' fields, values, and how they were ranked. We did not analyze all items in company F1, as the 451 items also included improvements that were not TD. We randomly selected 30 items that corresponded to the definition of TD; we analyzed them, and then we tested our assumptions by randomly looking at other items in the backlog. We used the backlogs in the interviews (see previous section) to ask follow-up questions of the participants. Also, we analyzed the documentation that was created by the organizations to explain TD to the users of the tracking process.

3. Results

3.1. Demographics and background of the respondents

In total, we obtained 226 complete answers. The total respondents were 259, which gives us a completion rate of 87%. We aimed at having a similar number of respondents from each organization (Fig. 2). The participants were almost all experienced practitioners, since 156 respondents (69%) had more than 10 years of experience, while only 8 (3.5%) had less than two years of experience (the remaining 62, 27.5%, had between two and 10 years of experience). Several roles participated in the survey: 37 managers (16%), 52 software architects (23%), 105 developers (46%), seven testers (2.65%), 14 experts (5.75%), and nine system engineers (4%) completed the survey.

As shown in Table 2, we can infer the following characteristics of the studied sample:

• Experience: Most of the respondents had more than two years of experience, while 69% of them had more than 10 years of experience. The estimations can, therefore, be considered reasonably reliable, as they are made by expert practitioners used to estimating their work (more discussion in the threats to validity section).

• Education: Most of the respondents had a bachelor's or master's degree. The level of education is therefore quite high. However, the sample does not include many practitioners involved in research projects.

• Team size: Although many of the teams are small (1–10 members), the sample includes a substantial number of respondents working in large teams as well.

• Organization size: As mentioned in the analysis made in section 2.1, the organization of the respondents is large. This was chosen by design. We wanted to restrict our results to large organizations. This imposes a limitation on our study: we cannot generalize these results to small organizations.

• Age of the current system: The distribution of the different systems is quite even, as the sample covers almost equally all the different phases of the system. This raises the degree of generalizability of our results, as it assures that our data cover both “young” and “old” systems.

Table 2

Background data related to the respondents, with the percentage, the number of respondents, and the relative distribution.

3.2. Estimation of management cost of TD (RQ1)

First, we report the answers to Q1 from the survey. In Fig. 3, we show the distribution of the respondents with respect to the different levels of estimated effort that were reported. By picking the middle values, as explained in the methodology section (e.g., 10–20% was transformed into 15), we calculated that the average cost of managing the TD was estimated by 215 respondents to be 25.9%, with a median of 25%, of the whole development time.

From the results, we can see how most of the respondents answered between 0 and 40%, while half of them are between 10 and 30%. However, some respondents report spending more than 40% of their time managing TD.

Looking at the comparison of medians (bold lines) and percentiles among the companies (boxplot in Fig. 4), we cannot see a big difference in how the respondents answered, apart from the slight difference for E, F1, and F3. This means that the amount of time spent managing TD is largely independent of the organization.

A chi-square test of independence, aggregating the intervals over 50% in the same category (the lack of values would have invalidated the chi-square test), yielded a p-value of 0.144, so we could not reject the null hypothesis that the answers are independent of the role of the respondents. This means that the answers did not vary significantly across the roles, contrary to what one might expect, considering different views and experiences of different roles in the organizations.

Fig. 3. Distribution of respondents for Q1: “How much of the overall development effort is usually spent on TD management activities?”

Fig. 4. Comparison of companies with respect to Q1: “How much of the overall development effort is usually spent on TD management activities?”

Fig. 5. Distribution of respondents according to their answers to Q2: “How familiar are you with the term ‘Technical Debt’?”

3.3. Familiarity with the term “Technical Debt” (RQ2)

The respondents seem to be, in total, moderately familiar with the term Technical Debt (Fig. 5). The mean is 2.26, while the median is 2. From the graph, we can see that there are more respondents who are very familiar with the term than respondents in the other categories.

From the comparison among the companies, we can see how they are mostly on the same level: F4 is above all the rest, while the organizations B2 and G4 are not very familiar with the TD concept. However, since we did not have access to the practitioners working in these two organizations, we cannot tell what the cause of this lack of familiarity was. We omit the test of independence, as the results are clearly visible in Fig. 6.

3.4. Awareness of Technical Debt present in the system (RQ3 and RQ5)

When assessing the level of awareness of the TD present in their system, the respondents, on average, somewhat agree that they are aware of how much TD they have in their system (mean = 3.69, median = 4). Almost half of them (45%) somewhat agree, while only 21% feel more confident (they agree or strongly agree) and the remaining 32% disagree or somewhat disagree. Only 3% of the respondents were not aware of TD.

On the other hand, the practitioners seemed less convinced that the whole team would be aware of how much TD is present in the system. Here, the mean is 2.8, while the median is 3, both close to a mild disagreement. The comparison of the answers is reported in Fig. 7. The chi-square test of independence confirmed that the two distributions differ significantly (p-value < 2.2e–16).

Fig. 6. Level of familiarity with the term Technical Debt for each organization (answering Q2: “How familiar are you with the term ‘Technical Debt’?”).

Fig. 7. Comparison of answers for Q3: “I am aware of how much Technical Debt we have in our system” (Individual Awareness) and Q4: “All team members are aware of the level of Technical Debt in our system” (Team Awareness).

Fig. 8. Distribution of answers with respect to Q3: “I am aware of how much Technical Debt we have in our system.” 1–6 correspond to “strongly disagree” to “strongly agree.”

As for the different companies, they are quite aligned with each other in terms of awareness. Once again, B2 seems to have a somewhat lower level of awareness. The results suggest that belonging to one or the other organization would not have an impact on the level of awareness of the employees (Fig. 8).

3.5. Tracking Technical Debt (RQ4)

In this section, we report the results from Q5: “I track (using tools, documentation, etc.) Technical Debt in our system.” The average tracking level, reported by 219 respondents, is 2.3 with a median of 2. On the team level, it seemed to be just slightly worse, as shown in Fig. 9 and discussed below.

Based on the results of a chi-square test of independence between the role and the tracking level, we could not reject the null hypothesis that the tracking level is independent of the role of the respondents (p-value 0.63). In Fig. 10, we show the comparison among different companies. We can see how the different companies answered similarly, apart from company F4 and partly company D. However, the test for independence did not show any significant relationship between the variable company and the answer given in the survey with respect to Q5 (tracking TD).

In addition, there is very little difference between tracking on an individual (Q5) or team level (Q6). Only some of the individuals track TD more than the rest of their team. This is strongly confirmed by a Wilcoxon test, which rejected the null hypothesis (p-value = 2.008e–05) that the difference in the two paired distributions is given by chance. In other words, the same participant answered very similarly when asked Q5 and Q6, and this is not because of randomness, which means that if someone in the team tracks TD, it is very probable that the whole team is involved in the tracking.

Finally, as observed in the methodology section, the results from Q7 and Q8 (related to who in the team has access to the outcome of TD tracking) very strongly correlated with the answers to Q5, so we do not report the exact results here. In other words, this means that the respondents who track TD also have access to its output (e.g., backlogs, dashboards, etc.).

Fig. 9. Distribution of answers related to Q5: “I track (using tools, documentation, etc.) Technical Debt in our system” and Q6: “All team members participate in tracking Technical Debt in our system.”

Fig. 10. Distribution of answers for Q5: “I track (using tools, documentation, etc.) Technical Debt in our system.”

3.6. Influence of the background of respondents on the management of TD (RQ6)

We have partly answered RQ6 (“Does the background of the respondents influence the way in which TD is managed?”) in the previous sections, especially with respect to the variables roles and organizations. However, we had several other variables in the background section, and we investigated whether any of those variables would help in understanding what causes a more or less mature TD tracking. To answer this question, we ran several statistical chi-squared tests of independence between the background variables (education, team size, etc.) and the variables of interest (familiarity, awareness, and tracking of TD). However, none of the statistical tests yielded a significant answer. Technically, we could not reject the null hypothesis of independence between the answers and the background of the respondents in any of the tests. Since the results would include several combinations of p-values that would not add any meaning to the manuscript, we decided to omit such a table.

In conclusion, the management of TD depends on some factors that have not been captured by the surveyed background variables. However, in the next sections, we provide some answers that could not be found in the quantitative data but seem to be related to the historical and social context where the participants work. More information is given in sections 3.8 and 3.9.

3.7. Tools used to track Technical Debt (RQ7)

In this section, we analyzed whether the respondents who used some tools to track TD were also more aware of TD or tracked it more than the others. To do so, we considered only the answers from the 61 participants (27%) who answered the qualitative question Q9 (specifying what tools they used). We also report the boxplot for the questions Q5 and Q3: we compared the answers of the participants who used a tool with the ones who did not. Fig. 11 illustrates the results we found: “Awareness” is the awareness of the respondents who did not use a tool, while “Awareness_Tool” is the one for the ones using a tool (same for “Tracking”). It seems that, indeed, if the participants used a tool to track TD, then they would report a high perception of tracking TD. A chi-squared test of independence confirms a strong difference in the distribution of the answers (p-value < 2.2e–16), strongly confirming this claim. However, more surprisingly, their perception of the level of awareness of how much TD is present in the system would only slightly change. This is confirmed by a chi-squared test of independence (p-value of 0.59), which did not show any difference in the distribution of the answers between the participants using a tool or not. Very similar results were found at the team tracking level, so we do not report them in the boxplot.

Given the large difference in tracking between the respondents who claimed to use a tool and the ones who did not, we can safely claim that the respondents tracking TD also use a tool. This result confirms that we captured most of the respondents' answers related to the tool that they used. The respondents who did not input an answer for the tool also most probably do not use any tool, since they have in general a much lower level of tracking. Therefore, we can further validate the result that only 26% of the participants used a tool to track TD. This is also important for the reliability of our results related to the SAMTTD model explained in the next sections.


Fig. 11. Distribution of answers for Awareness and Tracking and comparison if a tool is used or not.

Fig. 12. Word cloud of the tools used by the participants to track TD.

Fig. 13. Number of participants using a specific kind of tool.

From the qualitative data, we could also report what tools were used in practice. After removing some of the words that would just create noise (such as “Technical Debt,” see methodology section for more details), we obtained the following word cloud, which shows the distribution of tools used among the respondents (Fig. 12).

By codifying the qualitative answers in comments, documentation, issues, backlog, static analyzer, lint and test coverage, we can also analyze the frequencies. We can see how the tool that was mostly used is a backlog (dedicated to TD or the same used for feature development), followed by documentation, static analyzers and issue trackers (Fig. 13).
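The coding of free-text answers into tool categories and the subsequent frequency count can be sketched as follows. This is only an illustration: the keyword lists, the function names, and the sample answers are hypothetical, and the coding in the study was performed by the authors rather than by keyword matching.

```python
from collections import Counter

# Hypothetical keyword-to-category map; the study's actual coding scheme
# is not described at this level of detail.
CATEGORY_KEYWORDS = {
    "backlog": ["backlog", "jira", "hansoft"],
    "documentation": ["word", "wiki", "document"],
    "static analyzer": ["sonarqube", "static analy"],
    "issues": ["bug tracker", "issue"],
    "comments": ["comment", "todo"],
    "lint": ["lint"],
    "test coverage": ["coverage"],
}


def code_answer(answer):
    """Return the set of tool categories mentioned in one free-text answer."""
    text = answer.lower()
    return {
        category
        for category, keywords in CATEGORY_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    }


def category_frequencies(answers):
    """Count how many answers mention each category (at most once per answer)."""
    counts = Counter()
    for answer in answers:
        counts.update(code_answer(answer))
    return counts


# Hypothetical free-text answers, not taken from the survey data.
answers = [
    "We keep a TD backlog in Jira",
    "SonarQube and lint warnings",
    "A Word document maintained by the architect",
    "TODO comments in the code",
]
for category, count in category_frequencies(answers).most_common():
    print(category, count)
```

Counting each category at most once per answer keeps the frequencies interpretable as a number of participants, which is the quantity shown in Fig. 13.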

We then analyzed the distributions of the respondents for Awareness and Tracking levels (Fig. 14) with respect to the different kinds of tools. On the other hand, by analyzing the kind of used tool with respect to the mean amount of effort spent on management activities (Fig. 15), we can see a quite clear difference. Although this difference could not be statistically tested (the chi-square tests did not report significant difference, but this could be due to the small sample), backlog and static analyzers are the ones that seem to create less overhead.

In conclusion, the following considerations on the tools can be made:

• Comments in the code help awareness, but they are not considered tracking, and they are used by just 1% of the respondents. This is probably because they are not used in a document that can be monitored by the team outside the code.


Fig. 14. Distributions of levels of TD tracking (“_tr”) and awareness (“_aw”) reported to the user of each kind of tool.

Fig. 15. Mean of management effort for each kind of tool.

• Documentation increases TD awareness, but it is not considered as a high level of tracking, and it has the highest overhead. The main tools used here were Microsoft Excel or Word. We can infer that this practice is not recommendable in comparison with the other ones.

• Using a bug system for tracking TD is not considered as contributing to a better level of awareness or tracking compared to the other techniques, and it has a slightly higher overhead. We would infer that this is also not the best way of tracking TD.

• Backlogs, static analyzers and “lint” programs all increase the tracking level, but we cannot see a big difference (although static code analyzers seem to contribute better to the participants’ awareness). They are also the ones with the least overhead. They therefore seem to be considered the best practices at the moment to track TD.

• Backlogs are the most used tool among the participants. In particular, the most used backlog tools are Jira, Hansoft, and Excel.

• Test coverage does not seem to contribute too much to the awareness and tracking level, although it does not involve much overhead. This might be because test coverage is related to only a small part of TD.

3.8. Why and how do companies start tracking TD? (RQ3)

First, we report why the companies decided to start tracking TD, that is, their motivation. Then, we found that the preparation activity was critical to start tracking TD, and we, therefore, report the main steps involved in this practice.

3.8.1. Motivation for start of TD tracking

The main reasons behind the start of tracking TD were related to experiencing the interest of TD, that is, there were too many bugs to fix, decreased feature development, and performance issues:

“Because we realized that for each and every release it took much time correcting or fixing problem with additional patches and it took more and more time adding new features on top of the system. [. . .] The system became more and more inefficient.” These statements confirm our previous results [6], as one of the architects also mentioned: “After some time the TD was increasing and we had a crisis situation.”


In other words, the main motivation was related to the negative impact experienced by the practitioners, that is, the perception of the interest associated to the TD.

3.8.2. Preparation of the tracking process

From the cases investigated, it was clear that adopting a TD tracking process requires some initial activities and time to implement the process. From B1, we understood that they “Have done this for 1.5 years more or less, switching from reactive to more proactive. It’s a better information about the status of the system.” The preparation includes the following aspects. Although we used [11] to code these results, we prefer to report them in a way that is more readable in the context of Technical Debt management:

• Initiative: In all three cases, the tracking process started from an individual initiative: a manager, a system architect, an experienced developer, and so forth. In other words, tracking TD requires a champion in the sub-organization who is aware of TD and is willing to promote the adoption of the practices.

• Budget: Tracking TD needs both an initial effort and a continuous effort. Company B1 started with 150 hours, in the beginning, for a development unit (i.e., a sub-organization responsible for a sub-system, which includes a few teams). However, this was “ok just to start the backlog, but not to go in depth investigation.” The continuous time allocated to tracking TD varied across our cases: it ranged from 10% (company F) to 30% (company A). The cases also show how the continuous allocation of resources to manage TD could be dynamic, varying according to newly identified items, as suggested for Architectural TD in [19].

• Management involvement: Although the initiative can start from anyone in the organization, tracking TD requires an initial and a continuous investment (budget). This entails the need of involving a manager who understands the importance of TD and who can grant a budget for this activity.

• Benefits: As the previous point entails, there is a need for the management to understand the benefits of tracking TD given the initial and continuous budget allocation. Such benefits need to be communicated and continuously evaluated to justify such investment.

• Measurement setup: According to company B1, an amount of time is needed to set up measurements (e.g., complexity) and TD identification (static code analyzers). In other companies, such as F1, we found that a developer set up a specific analysis tool to measure complexity and bug density: this activity was supported by a team dedicated to the measurements in the organization.

• Explanation and alignment: The Champion for the TD tracking activity needs to communicate well to the teams what TD is and what needs to be reported (to avoid overhead). The interviewees mentioned that they conducted a first workshop for explaining TD and its tracking, and they also produced some documentation. It is also important to have a validation workshop in which the teams bring up some TD issues to align their understanding with the main TD concepts such as Principal and Interest.

• Appointing of a Sub-System TD Responsible (SSTR): TD tracking needs someone responsible across the organization who can take the initiative to support the tracking process. In all the studied cases, the people responsible for collecting and maintaining a list of TD issues were chosen as experienced developers on a given sub-system. The sub-system TD responsible, however, needs to be supported by the knowledge of the teams when tracking the issues because different practitioners have better and more detailed views of different parts of the system.

• Breaking down and distributing TD items: The SSTR needs to allocate the TD items to the teams according to their competences and their responsibilities with respect to the system. Architecture items were explicitly appointed to an experienced developer to be analyzed and estimated.

• Communication of TD to management: Once the first TD backlog was prepared, it was communicated to a manager connected to the evaluated (sub-)system. This was supposed to show management the risk associated with such a system due to TD.

In summary, quite a few activities are necessary to set up a TD tracking process; this requires the organizations to take the initial decision of allocating some budget to TD tracking.

3.9. What are the benefits and challenges of tracking TD? (RQ4)

3.9.1. Benefits

When we evaluated the tracking process together with the teams, they mentioned several benefits of tracking TD. The backlog gave them a long-term perspective, not only the short-term one given by the feature backlog. The respondents did not think that the TD backlog was hard to maintain. This is supported by the lower management overhead reported in the survey with respect to the other practices.

One of the architects in organization F4 mentioned that, after an important architectural TD item was refactored, “The evidence was visible in the next release with positive impact when adding new features on top of the one we fixed. Easier to add and no side effect, cleaner architecture.” According to the project manager interviewed in company B1, the initiative was overall successful, but it needed to be continuously supported to be really effective. “Yes it was worth it, but it is important to follow it up now and to make sure that parts of the list are done [refactored].”


3.9.2. Challenges

Although the respondents mentioned several benefits, some issues with the current approaches were also reported. The most important one was the acceptance, from the managers, of the need for refactoring. Even with the list updated, the information about the risk and benefits of performing a refactoring was not always clear to the managers. This meant that, especially for large TD items, it was difficult to receive the needed budget for TD repayment.

One of the major problems in starting to track TD was that the first step needed a substantial amount of effort to collect all the existing items. Although this would be only a one-time effort, in some teams the managers would not concede the necessary budget. A challenge mentioned by all the participants was that the refactoring became more difficult to be prioritized and completely repaid when several items and several teams were involved. It required “double” the effort to prioritize the item with different managers (who could disagree on the necessity of refactoring), and the coordination of the refactoring was considered quite risky and a dangerous overhead. For example, TD issues involving interfaces were more time-consuming to estimate and prioritize, because they required more discussions involving more stakeholders from more teams.

Another challenge in the prioritization activity was the difficulty of prioritizing among TD items, especially where an explicit risk/impact value was not calculated. The participants reported that it was generally difficult to show an actual gain from the cost/benefit analysis to the managers, even with a field explicitly represented in the backlog. In general, the intuitive values used for the risk/interest (but usually not including a systematic calculation) were working only sometimes, and more explanations and indicators were required by the managers to accept a costly refactoring.
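To make the idea of an explicit risk/impact value concrete, the sketch below shows one possible way of recording principal and interest on a TD backlog item and deriving a simple cost/benefit indicator. It is an assumption-laden illustration, not the calculation used by the studied companies: the field names, the planning horizon, and the ranking by payback ratio are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TDItem:
    """Hypothetical TD backlog entry with explicit principal and interest."""
    name: str
    principal_hours: float   # estimated cost of refactoring now
    interest_hours: float    # estimated extra effort per release if not refactored
    releases_ahead: int = 4  # planning horizon (assumed value)

    def net_gain(self):
        """Interest avoided over the horizon minus the principal."""
        return self.interest_hours * self.releases_ahead - self.principal_hours

    def payback_ratio(self):
        """Interest avoided over the horizon per hour of principal."""
        return self.interest_hours * self.releases_ahead / self.principal_hours


def prioritize(items):
    """Order items by payback ratio, highest first."""
    return sorted(items, key=lambda item: item.payback_ratio(), reverse=True)


# Hypothetical items; the numbers are invented for illustration only.
backlog = [
    TDItem("Split shared interface", principal_hours=100, interest_hours=50),
    TDItem("Rename legacy module", principal_hours=40, interest_hours=5),
    TDItem("Remove duplicated parser", principal_hours=10, interest_hours=10),
]
for item in prioritize(backlog):
    print(f"{item.name}: ratio={item.payback_ratio():.1f}, gain={item.net_gain():.0f}h")
```

An indicator of this kind only restates the estimates entered by the teams; as the respondents noted, managers may still require further explanation before accepting a costly refactoring.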

The respondents mentioned the difficulty of coordinating the different teams in using a standardized process for tracking TD. In some cases, it was difficult to “make them care” about reporting TD, while for other teams the TD list was created with enthusiasm.

Finally, the participants mentioned that in some cases the TD backlog itself did not make the TD more convincing for the management to be refactored, but it served for the teams to remember to take care of TD, which would otherwise remain invisible and overlooked.

3.10. Strategic Adoption strategy

As a final result from the combination of the various analyses performed so far, we aggregated the results and combined them with the roadmap related to the current literature on TD. This led to the Strategic Adoption Model for Tracking Technical Debt (SAMTTD, Fig. 16). The first four steps in the model represent the results from the survey on the current state of practice in the companies.

We used the results from Q4 to create the first step: If the respondents were not familiar with the TD concept, they could not be on a higher level. Then, we defined three more levels of TD tracking maturity. To discern between the different levels, we mapped practices that we found used or not and that correlated with different levels of tracking (e.g., the usage of a tool). We additionally used the results from the interviews where it was clear what different practices were introduced to track TD.

• Unaware: There is no awareness of what Technical Debt is and therefore how to manage it. According to our survey data, only 8.4% of the participants are in this stage. This datum is related to the respondents that answered “Not familiar at all” with the term Technical Debt, as visible in Fig. 5.

• No tracking: In this stage, the software engineers are aware of the TD metaphor, and there is a general understanding of the negative effects brought by having TD in the system, but there is no initiative to track TD, which remains invisible. Around 65.6% of the respondents report being on this level, by (strongly) disagreeing about tracking TD. The percentage was calculated by counting the total answers minus the answers from Q4, counted previously as the unaware respondents, and the ones who use tools, counted in the next levels (26%). Therefore, this yielded 100 − 8.4 − 26 = 65.6.

• Ad-hoc: In this stage, the software engineers are aware of what TD is, and some of the individuals have started tracking TD on their own. This makes the TD management process ad-hoc, since, without a dedicated budget, such individuals use what is available, in terms of tools and processes, for other activities. For example, according to the qualitative answers related to Q3, the sprint or product backlog, a common issue tracker or a simple Excel spreadsheet can be used for tracking TD. Static analysis tools might be in use but are limited to the individual usage. According to the survey, approximately 26% of the respondents are at least on this stage (61 participants, 26%, were using tools, see section 3.6). However, from these ones, we need to take away around 7% that we place on the next level (see the next point). In total, we therefore report around 19% of respondents on this level.


Fig. 16. The Strategic Adoption Model for Tracking Technical Debt: the main milestones and the state of practice (% of respondents per category).

• Systematic tracking: The company in this level has acknowledged the importance of tracking TD also on a management level (see Preparation section). Therefore, there is a budget generically associated with the management of TD. This amount usually ranges between 10% and 30%. According to the document analysis of the TD items from the case study, a specific backlog and documentation related to TD is necessary, with TD-specific values useful to analyze the principal and the risk/interest. The TD is understood by the participants, who have been instructed by a person responsible for the process (see Preparation). There is an iterative process in place to monitor TD (identify, estimate, prioritize, and repay it), and such process is subjected to continuous improvement. 7.2% of the respondents are on this stage, actively tracking TD. This is the maximum level achieved by the companies, as confirmed by the interviewees. This amount can be obtained when taking into consideration the respondents who answered “Agree” or “Strongly Agree” to Q5 (see Fig. 9).
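The arithmetic behind the four observed levels can be reproduced with a short sketch. The three inputs are the percentages reported above (8.4% unaware, 26% tool users, 7.2% systematic trackers); the function name and the dictionary layout are assumptions made for this illustration.

```python
def samttd_distribution(unaware_pct, tool_users_pct, systematic_pct):
    """Derive the share of respondents on each observed SAMTTD level.

    No tracking = everyone who is neither unaware nor a tool user.
    Ad-hoc      = tool users who are not tracking systematically.
    """
    no_tracking = 100.0 - unaware_pct - tool_users_pct
    ad_hoc = tool_users_pct - systematic_pct
    return {
        "Unaware": round(unaware_pct, 1),
        "No tracking": round(no_tracking, 1),
        "Ad-hoc": round(ad_hoc, 1),
        "Systematic tracking": round(systematic_pct, 1),
    }


# Percentages as reported in the text for the surveyed respondents.
levels = samttd_distribution(unaware_pct=8.4, tool_users_pct=26.0, systematic_pct=7.2)
for level, share in levels.items():
    print(f"{level}: {share}%")
```

By construction the four shares sum to 100%, and the Ad-hoc share of 26 − 7.2 = 18.8 corresponds to the "around 19%" quoted for that level.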

We do not have evidence that companies have better processes and tools in place. However, based on current literature on TD [3] and related work on change management [18], we hypothesize future maturity steps that can be reached by the companies when the results of research would be put in place. We identify the following three steps:

• Measured: In this stage, identification tools for TD are in place, for example, the use of the tool SonarQube for source code TD (such as McCabe complexity) or, for example, dependency checkers on the architecture level (as reported in company F1). The measurement of the interest is also in place, for example, there are indicators that show the amount of interest paid or predicted if the refactoring is not conducted. Such tools are not employed in practice yet and should be integrated to provide overall indicators to help the stakeholders to estimate and prioritize TD. The authors of this paper are actively working on introducing such tools and indicators, as explained in our recent work [20].

• Institutionalized: According to change management [18], a process is mature when it is spread and standardized across the whole organization. This would allow an aligned prioritization of TD across the system. This would also allow the practitioners to plan the allocation of resources according to the quality of the (sub-)systems in order to plan for the life-cycle of the product. As an example, the reader can consider a team who needs to build a feature on a sub-system developed by other teams: knowing how much TD is present in such sub-system would allow the team to estimate whether refactoring is needed or the lead time for the features to reach the customer.

• Fully automated: In this stage, the decisions on the refactoring are completely data-driven, making use of statistics collected on historical data or by benchmarking the system against a collection of reference systems. For this purpose, however, the previous steps are necessary.

4. Discussion

The combination of data from 226 participants in 15 large software organizations with the in-depth case study provided an overall picture of the current state of practice with respect to TD tracking. In this section, we discuss the contributions in this manuscript with respect to practitioners and researchers, we compare our results with existing literature, and we report limitations and threats to validity related to our study.

4.1. Current state of practice of tracking TD and implications for practitioners and researchers

The results related to RQ1 tell us that software companies spend, on average, around 25% of their development time on TD management activities. The box plots (Fig. 4 and Fig. 10) show some consistency in the companies: The medians range


almost one out of five do it in an ad-hoc way (19%), that is, by using tools that are not made for TD tracking and therefore are not effective. Finally, only 7% of the participants track TD in a more dedicated way.

An interesting observation is that the results are not significantly affected by the background and the role of the respondents. This datum increases the reliability of the results: Independent of the organization and the background of the participants, we found very similar results across the respondents, which can be considered also more general. In other words, the means and the variance across different practitioners are similar in different organizations.

However, this also led us to consider the following: Different roles with different priorities and views (e.g., managers and developers) agreed on the estimated amount of effort done to keep TD at bay, as well as on the fact that such effort is not systematic (TD is mostly not tracked). Then, an unanswered question is: If TD is so painful, why do organizations not track TD more systematically? One possible answer is that employees do not know how to track TD effectively. This is supported by the fact that most of those who track TD do not use proper tools or documentation, while the few who systematically track TD still do so manually and rarely use basic measurements. For this reason, we found it important to propose the SAMTTD model, to help practitioners understand what it means to track TD and what is necessary to implement a tracking process in practice.

Another answer to the current lack of TD tracking, despite the management effort, might be found in the results related to RQ8 and RQ9 concerning the necessity of a Preparation phase and its cost, which is critical for the introduction of a TD tracking process in the companies. At the outset, the initiative needs to be conducted by one or more champions in the organization. An initial budget should be allocated to allow the first activities related to the TD inventory, and this entails a need for a commitment by management, which is achieved by communicating how a systematic TD management process would bring benefits to the organization. Unfortunately, this is one of the challenges reported by the practitioners, who claim that there is a lack of good instruments and publicly available results to advocate for the need of systematic TD management. Other activities include the communication and alignment of what should be collected as TD, the set-up of measurement systems, the appointment of a Sub-System TD Responsible (SSTR), and the breakdown and distribution of the TD items to the teams. Unfortunately, the first investment can be burdensome. For example, a trial of 150 initial hours for a unit with three teams was barely enough to identify preliminarily the initial TD list. It also did not leave time for the company to set up measurement systems and accurately estimate and prioritize the TD items, although updating the TD backlog becomes lightweight in the following iterations.

For tools to track TD, we found that many participants use backlogs, implemented in project management tools such as Jira and Hansoft, and static analyzers. The results also suggest that these approaches require less management effort, and they seem to give slightly more awareness of the TD in the system. However, it seems that, for most of the respondents, the awareness of the amount of TD present in the system is not affected by the tool in use, if not slightly. This means that TD tools are not only used by the teams to be aware of the TD, but also for communication, monitoring, and management purposes. The usefulness of these tools is shown by the fact that the participants using backlogs and static analyzers spent less than the average time (18–19% compared to 25.9%) on TD management. However, the tools seem not to help raise the awareness of the respondents: The mean awareness remains between “somewhat disagree” and “somewhat agree.” Many qualitative answers, both from the survey and from the case study, also report the fact that many TD items cannot be automatically revealed because they are too context-specific and they cannot be represented by generic patterns. This leads to the conclusion that better and more specific tools for managing TD need to be developed.

In summary, managing TD requires a few investments that are not well known by the practitioners and are difficult to be motivated by a precise cost/benefits ratio. Consequently, without an investment in processes and tools to track TD, it is difficult to make TD visible, as well as to advocate for refactoring “invisible” TD. This represents a vicious cycle: companies suffer the negative effects of TD and try to contain it, but at the same time they do not find enough motivations to invest in a more systematic management process. By looking at the motivations for starting to track TD, the results show that organizations do so when they experience the interest of TD: slow feature development, quality issues, and performance degradation. However, at such a point, the interest associated with TD is already high and, as explained in other recent papers [6,12] from the authors of this manuscript, it is hard to refactor, as the cost has also increased and has become too expensive. In conclusion, the only way out of the vicious cycle seems to be, for the practitioners, to proactively start tracking TD. Using backlogs and static analyzers helps reduce the management overhead and increase (even if slightly) the awareness of TD. New tools need to be developed, in two main directions: allowing the developers to communicate the urgency of refactoring TD to the management, and better (semi-)automatic tools to identify and track TD to increase the awareness of the respondents.


4.2. Related work

There are two survey-based studies regarding the familiarity and tool usage related to TD. In [21], the authors concluded that 50% of respondents said that no tools were used and only 16% said that tools gave enough details. Their study also shows that 27% of the respondents do not identify TD. Furthermore, Holvitie et al. [22] show that over 20% of the respondents (in Finland) indicated poor or no TD knowledge. However, in these studies, we cannot find an estimate of the effort spent on TD management, and there is no explanation of how a TD tracking process can be started or implemented. As a comparison with these studies, the results from our survey show that familiarity with TD and its tracking seems to be higher among the respondents who answered our survey. This may be related to the different size, culture, or domain of the organizations, but given that our study is more recent, we could speculate that the familiarity with TD is growing. In our results, only 8.4% of our respondents were not familiar with TD, and 27% of the respondents used tools. Both findings are higher than in the other surveys.

There are a few articles about industrial practices concerning Technical Debt, for example [8,23], and [24], but they are single case studies, and, in two cases, they were performed in small companies. Also, such work does not focus on the current state of practice of Technical Debt tracking, an estimation of the TD management effort, the motivations for starting to track TD, or the maturity evolution of tracking. This makes it difficult to compare the results with our survey, but we will take the topics one by one and discuss similarities and differences. As for the cost of tracking TD, [25] reports detailed results from a single case study. Some results are in line with the broad results reported here including, for example, that the effort might vary greatly, reaching even 70% of the development time, and starting the TD tracking is more expensive in the beginning but it becomes more lightweight when the process is repeated. In [26], the TD management process of several companies is analyzed with reported results similar to our cases, for example, the limited use of measurements and lack of a systematic process. However, in contrast with our work, the study does not focus on TD tracking; it reports a broad snapshot of current practices and does not take change management perspective into account. For example, we report information such as the quantified cost of managing TD, the reasons why organizations start tracking TD, and the preparation activities and costs necessary to track TD. We present a maturity model, SAMTTD, that, taking change management aspects into account, allows for the transfer of knowledge to practice. This is visible in the additional four levels added in our model. We can consider the fourth step in our model as an especially important addition to our work because we found evidence of a systematic process using TD-specific documentation not reported in [26]. Also, none of the cited studies reports quantitative answers from as many as 226 practitioners, which also show trends and statistical results reported here.

There are a few studies regarding Technical Debt tracking and tools in the literature. As for tools, most of the recent findings report tools created by researchers (e.g. [27–29]). The experience reports are usually related to the evaluation of the tool in a specific context and, therefore, cannot be considered as state-of-practice (at least, not yet). This is understandable as new tools are being developed while this manuscript is being written, and the attention to TD by software organizations is quite recent. As for tracking, three initiatives have been reported in the literature [28,30,31]. The first one, [28], presents a tool called DebtFlag, which allows tracking TD and its propagation. However, the evaluation of such a tool in practice has yet to be reported. The second one, [30], reports the evaluation of a tool (AnaConDebt) to assess and track TD. A first study has been done in an industrial environment, but more studies are needed to understand whether the tool is usable in practice. Finally, the last paper, [31], reports a new method to analyze the TD reported in code comments. Although some of the features of the semi-automatic approach seem interesting, it is not clear how many TD items are covered by comments and whether this approach can be used in practice (the paper does not report a practical use of the method with an evaluation from the practitioners). For example, if we look at the survey conducted in this paper, currently only around 1% of the participants (three) state that they track TD using comments.

4.3. Limitations and threats to validity

Here we report the main threats to validity regarding this study, according to [11]: construct validity, internal validity, external validity, and reliability.

Construct validity is concerned with the investigation device and the validity of the data with respect to the RQs that are investigated. In a survey, this is usually one of the main threats to the validity of the results, as participants might interpret definitions and other terms differently from each other. Although this phenomenon is unavoidable, we took a few approaches to mitigate the consequences. As for the misunderstandings related to the interpretation of what TD is, we have reported, before the questions, short definitions of the issues and management activities that are associated with TD according to the most up-to-date literature. In other words, we did not ask questions on “Technical Debt” directly but, instead, on more concrete issues that are associated with it. In our experience, this should have reduced the possibility that the respondents would consider TD as something else, for example, bugs or missing features (something that might happen in practice, according to our experience). We also provided, in the last part of the survey, a definition operationalized from the various existing formal definitions. We asked a question about whether the practitioners were familiar with TD according to the definition, and they mostly agreed. Although this does not ensure that the practitioners had answered with full knowledge of what Technical Debt is, we believe that the two mitigation strategies together contributed to reducing the threats to construct validity.


As for the results concerning testing hypotheses statistically, it is important to notice that, in most cases, we could not reject the null hypotheses that the results would depend on the background of the respondents (roles, company, etc.). This means that we could not find enough evidence in this dataset to support the rejection of the null hypotheses, but the reader should be warned that we also did not prove the opposite hypotheses. In summary, we cannot claim that the background played a role in the results.

Finally, it is important to report the threats to external validity. We investigated mostly large companies involved in the development of embedded systems and from the Scandinavian area. This entails three possible threats.

• It is possible that, in other domains (e.g., web development), the percentage of the companies in the maturity steps would differ. To mitigate this threat, we have included a company developing “pure” optimization software. In this case, we did not find a statistical difference with respect to the other companies. However, more research is needed to understand if there is a difference.

• Companies in other countries, with different contexts and cultural backgrounds, might answer the survey differently or have different ways of managing Technical Debt. However, all the companies investigated in this study employ developers from all over the world and have distributed development. It is therefore likely that the background of the participants in the survey would actually be more heterogeneous than the organizations themselves, which are only Scandinavian.

• Small companies might behave very differently with respect to Technical Debt management.

Therefore, the reader must be aware that there are some limitations to the extent to which we can generalize from these results.

There are also threats to the reliability of the results, that is, the results might be biased depending on an interpretation given by the authors, method, or source of evidence (e.g., if we asked only developers but not managers), as reported below.

• There is a threat in the quantities estimated by the respondents with respect to Q1. We do not know what the given estimations are based on since most of the participants do not explicitly track TD and their time spent on it. However, as the demographic data show, many participants can count several years (more than 10) of software development experience. Estimations are based on experience, and they are referenced to the practitioners’ last projects, which limits a possible retrospective bias. Practitioners are used to estimating the amount of work that has been done or that is upcoming, which mitigates the threat that the estimated effort would be very distant from the real one.

• As for the authors’ interpretation, we have made sure that, especially for the qualitative data analysis, we have applied observer triangulation: Two or more authors have analyzed the interviews and either separately coded the statements or checked the other authors’ codes. Although this does not remove the threat completely, it is the main strategy used when qualitative data analysis is involved in the study.

• Relying only on quantitative data might miss important details that are necessary to understand the results or might show correlations that are not related to any real causality. For example, we could not find reasons from the quantitative background data that would explain the variance in the amount of time that the participants are employing to manage TD. However, we could combine the quantitative results with qualitative answers coming from some of the organizations participating in the survey, which helped explain the factors related to their maturity by analyzing the interviews.

• Finally, there is a threat to the reliability of the results, as the percentage of developers participating in the survey was larger than other roles. This means that the results might be skewed by the developers’ biases. However, to mitigate this threat, we performed a chi-square test to understand if the distribution of the answers would depend on the roles of the respondents. The test did not support such a hypothesis, meaning that there was not a statistically significant difference between different responding roles (different roles gave similar answers). By having such roles participating in the survey, we could apply a mitigation strategy denoted as source triangulation.

5. Conclusion

According to 226 respondents in 15 software organizations, practitioners estimate spending, on average, a substantial amount of time trying to manage TD (25%), although such an amount is affected by some variance. Software companies in Scandinavia are more familiar with the TD metaphor with respect to previous studies, and they track TD more. The awareness of TD in the system seems to be somewhat known by the developers, independent of which approach is used. Tools such as backlogs (the most popular approach) and static analyzers help reduce the management overhead of approximately