Saving behavior of Norwegian households: An empirical study of gender and marital status
Ida Wetlesen
Master of Philosophy in Economics
Department of Economics
University of Oslo
May 11, 2018
Copyright © Ida Wetlesen, 2018.
Saving behavior of Norwegian households: An empirical study of gender and marital status http://www.duo.uio.no/
Summary
This thesis investigates the impact of gender and marital status on the saving behavior of Norwegian households. In official statistics, household saving is represented by the total savings of the entire household sector, which does not account for inequalities between different groups (Halvorsen, 2011). A general perception is that women are more conservative and less risk tolerant than men in investment decisions, a perception confirmed by a large body of empirical research. Married individuals are expected to hold riskier portfolios, as marriage provides risk sharing opportunities and resource pooling. Using a longitudinal dataset constructed by Statistics Norway, I study how household saving behavior differs between men and women, controlling for marital status and other confounding factors. To see how these differences unfold in the household’s tolerance towards risk, I examine the share of risky assets held in their portfolios.
The purpose of this thesis is to shed new light on questions such as: Do men save more than women? Are women more risk averse than men? Do couples take more risk than single individuals? The empirical method used for this research is an Ordinary Least Squares (OLS) regression model. This shows that various sources of heterogeneity such as gender, income and family characteristics affect households’ saving behavior. By controlling for observed characteristics, I find that there are no differences in the level of saving between men and women, but women are more risk averse than men, and both married couples and cohabitants have a lower share of risky assets in their portfolio, i.e. are more risk averse than single individuals.
Furthermore, I investigate whether there are gender differences in the marginal propensity to save (MPS) rather than in the level of saving. By controlling for unobservable characteristics, I find that women have a higher marginal propensity to save than men, i.e. they save a higher proportion of an increase in income.
Preface
I would like to thank my supervisor Elin Halvorsen for invaluable guidance and support. I am extremely grateful for the help she has provided me with, and the time and patience she has devoted to this project. In addition to sharing valuable input and constructive feedback, she has shown a genuine commitment to my work. I believe the combination of my dedication along with Elin’s engagement and inspiration has contributed to making this an interesting thesis. Hopefully, it also provides a valuable contribution.
I would also like to thank Statistics Norway for providing me with office space and Norwegian registry data on income and wealth, which have been essential for the work of this thesis. I am also grateful to my fellow students at Statistics Norway, who have provided me with great company and encouragement.
Finally, I would like to thank Arild, who took the time to proofread. However, all remaining errors are my own.
Ida Wetlesen, May 2018
Contents
1 Introduction
2 Theoretical review
2.1 Keynes consumption theory
2.2 The life cycle- and permanent income hypothesis
2.2.1 The optimization problem
2.2.2 The functional form of utility
2.2.3 Household formation
2.3 Expectations and previous literature
3 Data
3.1 The dependent variable
3.1.1 Definition of savings
3.1.2 Weaknesses of the estimation
3.2 Explanatory variables
3.3 Descriptive statistics
4 Empirical method
4.1 Ordinary least squares (OLS)
4.2 Fixed effects model
5 Analysis and results
5.1 Specifications and results of the OLS model
5.1.1 Gender differences
5.2 Specifications and results of the FE-model
5.3 Discussion and main findings
6 Concluding remarks
Appendix
List of Figures
1 Distribution of wealth
2 Predictive margins of age- and time coefficients with 95% CIs
List of Tables
1 Savings in absolute value and inverse hyperbolic transformation
2 The distribution of gender
3 Constellations of marital status
4 Educational level
5 Descriptive statistics from the data set
6 Descriptive statistics of used variables
7 Descriptive statistics of gender differences (absolute values)
8 Descriptive statistics of how men and women save
9 Overall determinants of saving
10 Saving by gender and marital status
11 Risk share
12 Saving by asset class (single households)
13 Gender specific coefficients
14 Fixed effects regression coefficients
15 Data description
1 Introduction
The existence of gender differences in behavior has been found for a wide range of issues in economic contexts, such as saving behavior, wealth accumulation and preferences towards consumption. In general, women have lower earnings, hold lower levels of wealth and have smaller retirement benefits than men, in addition to having longer life expectancies¹. Ho et al. (1994) find that women should hold riskier portfolios than men because they live longer, assuming identical preferences. However, empirical evidence indicates that women appear to be more risk averse investors than men. Even though women have become more interested in and better informed about investments, the measures of long-term financial security are still higher for men than for women (Hira and Loibl, 2008).
Individual risk attitudes are central to financial decision making because they affect a household’s portfolio choices, which are crucial in achieving long-term financial goals like saving. Another potential source of a behavioral gap is marital status: marriage provides risk sharing opportunities and resource pooling, so married individuals are expected to hold riskier portfolios (Bertocchi et al., 2008). However, previous research is not consistent on whether couples are more or less risk tolerant than singles.
The purpose of this thesis is to shed new light on questions that are central to studying the saving behavior of Norwegian households. Despite the fact that aggregate savings in Norway have increased after the financial crisis in 2008, there is currently little empirical research that examines saving behavior while controlling for confounding factors. Previous work has mainly been descriptive or based on information from banks, insurance companies or budget surveys.
I employ a longitudinal dataset constructed by Statistics Norway that is based on Norwegian income and wealth registry data. The use of registry data in analyzing saving behavior is relatively new in research contexts, making it possible to study savings from a microeconomic perspective and identify some of the heterogeneity within the population which disappears in aggregated data. Age, income, gender, education and family characteristics are all aspects that affect individuals’ propensity to save, in addition to time effects that document saving in relation to business cycles and other time-specific events.
¹ Women are expected to live 3.6 years longer than men (Statistics Norway, 2018).
The thesis asks three primary research questions:
1. Do men save more than women?
2. Are women more risk averse than men?
3. Do couples take more risk than singles?
Analyzing these questions thoroughly requires a definition of saving that reflects households’ active saving choices. The appropriate definition is therefore active savings, represented by the change in the household’s financial wealth. This excludes what is called passive savings, which refers to returns on existing assets such as shares or interest revenues (Fagereng and Halvorsen, 2017).
The data used in the analysis follow the structure of panel data; more specifically, they form an unbalanced panel². To see how different levels of saving relate to household characteristics, I use an Ordinary Least Squares (OLS) regression model. To control for unobserved characteristics, I expand the analysis by using a fixed effects transformation. Main findings and additional results are then discussed in the light of previous research and the theoretical framework of the life cycle theory.
The thesis is structured into six main sections, and proceeds in the following way: Section 2 presents the theoretical framework applied in this analysis, more specifically the life cycle theory. Section 3 introduces the data, the estimation procedure of savings and definitions of essential variables. The empirical method is presented in section 4, which reviews how saving is estimated using OLS and fixed effects regression. Section 5 presents the analysis, the associated results and a discussion of main findings³. The final section provides concluding remarks.
² This additional terminology refers to missing data for at least one time period for at least one individual.
2 Theoretical review
To obtain accurate and realistic results from the analysis, it is important to have a good understanding of the underlying theory. This section presents relevant theory of consumption and saving behavior that will be applied later on when the results are obtained.
The section begins by reviewing some of Keynes’ general theory, presenting some of his contributions and his view on individual saving motives. This is followed by consumption theory related to intertemporal choices, presented by Modigliani and Brumberg (1954) and Friedman (1957).
2.1 Keynes consumption theory
The foundation of modern consumption theory was established by John M. Keynes in 1936.
His basic model of consumption was that current consumption expenditures are mainly determined by current disposable income. He called the relationship between aggregate consumption and current disposable income the "propensity to consume", creating two measures of the sensitivity of consumption to income: the marginal propensity to consume (MPC) and the average propensity to consume (APC). The MPC measures the share of an additional unit of disposable income that individuals spend on consumption rather than save, while the APC is the ratio of consumption to income. Both are generally believed to be between zero and one (Parker, 2010).
According to Keynes, aggregate consumption, $C_t$, depends on disposable income, $Y_t$, and follows the linear form:
$$C_t = a + bY_t \tag{1}$$
The coefficient $a$ is a constant term, and $b$ represents the MPC. The APC is represented by $C_t/Y_t = a/Y_t + b$, and how this varies as income changes depends on $a$.
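To illustrate equation (1), the following minimal Python sketch evaluates the MPC and APC at a few income levels. The parameter values for a and b and the income levels are assumptions chosen for illustration only; they are not estimates from this thesis.

```python
def consumption(income, a, b):
    """Keynesian consumption function C = a + b*Y, as in equation (1)."""
    return a + b * income


def apc(income, a, b):
    """Average propensity to consume: C/Y = a/Y + b."""
    return consumption(income, a, b) / income


def mpc(income, a, b, step=1.0):
    """Marginal propensity to consume, approximated by a finite difference."""
    return (consumption(income + step, a, b) - consumption(income, a, b)) / step


# Assumed, purely illustrative parameter values (not taken from the thesis).
a, b = 50.0, 0.8
for y in (400.0, 800.0, 1600.0):
    print(f"Y={y:7.1f}  C={consumption(y, a, b):7.1f}  "
          f"APC={apc(y, a, b):.4f}  MPC={mpc(y, a, b):.4f}")
```

With a positive constant term, the APC falls towards b as income rises while the MPC stays equal to b, which is the sense in which the behavior of the APC depends on a.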
The propensity to consume is, according to Keynes, affected by objective and subjective factors. He classifies the objective factors as events that affect disposable income beyond the individual’s control, such as fiscal policy or expectations about future income. The subjective factors are related to how individuals save, involving individual-specific needs and psychological tendencies to change consumption. Since saving is the residual of disposable income after consumption, a change in consumption also implies a change in saving. Hence, Keynes states the following reasons why households choose to save: a buffer against unexpected events (precautionary savings motive); consumption smoothing within their lifetime (life-cycle motive); interest rates and return on savings (patience); and the desire to leave an inheritance for the next generation (bequests) (Berg and Aarestad, 2014).
However, Keynes’ description of psychological tendencies made it difficult to construct coherent models that were consistent with a theoretical framework. He also received criticism for not establishing a formal link between the individual and aggregate levels, making his predictions empirically inconsistent. In cross sections, saving rates seemed to change systematically with the level of income, and it was observed that saving was higher for individuals experiencing income increases and lower for individuals experiencing income decreases. These observations could not be explained by Keynes’ theory, which led to the formulation of new theories, namely the life cycle and permanent income hypotheses (Attanasio and Weber, 2010).
2.2 The life cycle- and permanent income hypothesis
This section presents the theoretical framework used to analyze the saving behavior of Norwegian households. The life cycle model presented by Franco Modigliani and Richard Brumberg (1954) and the permanent income model introduced by Milton Friedman (1957) are theoretically consistent in that intertemporal consumption and saving choices are set within a coherent optimization problem.
These two theories are similar in the sense that both assume that individuals attempt to maximize their personal well-being by balancing a lifetime stream of income with a lifetime pattern of consumption, and that individuals are forward-looking, caring about both the present and the future. They differ merely with regard to the horizon considered: the life cycle model assumes that the consumer has a finite life, while the permanent income hypothesis assumes the lifetime to be infinite. Absent uncertainty, both models predict the same outcome: given a concave utility function, individuals hold the marginal value of wealth constant over time, which leads them to smooth consumption over time (Helliesen, 2010).
I choose to work with a simple version of the life cycle model, which establishes a conceptual framework where individuals maximize utility given a set of intertemporal trading opportunities. The model emphasizes how individuals’ disposable income varies at different stages of life, assuming they want to smooth consumption within their lifetime. Decisions depend on the total amount of resources, on preferences over present and future consumption and on relative prices (Attanasio and Weber, 2010).
Under the assumption that households are rational and forward looking in their decision making, I use the framework of the life cycle model to study household saving behavior. This simple version is flexible enough to be brought in a serious way to the data, allowing me to derive specific implications on the primary research questions.
2.2.1 The optimization problem
The version of the model I consider follows the approach of Helliesen (2010) and Romer (2012), and is one in which an individual maximizes expected utility over a finite interval subject to a set of constraints. The optimization problem shows that consumption depends on lifetime resources, where the consumption path is conditional on factors like the interest rate and individual preferences. More formally, lifetime utility and the associated constraints are presented as follows:
$$\max \sum_{t=0}^{T-t} \beta^t u(C_t) \tag{2}$$

subject to

$$\sum_{t=0}^{T-t} \frac{1}{(1+r)^t} C_t \le W_0 + \sum_{t=0}^{T-t} \frac{1}{(1+r)^t} Y_t \tag{3}$$

$$W_T \ge 0 \tag{4}$$
Here, period utility $u(\cdot)$ is increasing and strictly concave, $u' > 0$ and $u'' < 0$, and $C_t$ is consumption in period $t$. The discount factor is $\beta = 1/(1+\delta)$, where $\delta$ is the consumer’s subjective time preference rate, so that utility $t$ periods ahead is weighted by $\beta^t = 1/(1+\delta)^t$. $Y_t$ and $W_t$ denote the consumer’s income and wealth in period $t$. The first constraint is the intertemporal budget constraint, which implies that the present value of the individual’s lifetime consumption must be less than or equal to the present value of lifetime resources. Equation (4) gives the limit for total net wealth in period $T$: the consumer has to die without any debt, that is, has to repay all debt with probability one.
The Lagrangian for the problem can be expressed as
$$L = \sum_{t=0}^{T-t} \beta^t u(C_t) + \lambda \left( W_0 + \sum_{t=0}^{T-t} \frac{1}{(1+r)^t} Y_t - \sum_{t=0}^{T-t} \frac{1}{(1+r)^t} C_t \right) \tag{5}$$
where $\lambda$ denotes the marginal utility of lifetime resources. The Lagrangian is solved by taking the derivatives with respect to the choice variables and setting them equal to zero. The two first order conditions of the problem are presented by the following equations:
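$$\frac{\partial L}{\partial C_t} = u'(C_t) - \lambda = 0 \tag{6}$$

$$\frac{\partial L}{\partial C_{t+1}} = \beta(1+r)u'(C_{t+1}) - \lambda = 0 \tag{7}$$

Solving for $\lambda$ and equating yields the Euler equation, presented by

$$u'(C_t) = \beta(1+r)u'(C_{t+1}) \tag{8}$$

or

$$u'(C_{t+1}) = \frac{1+\delta}{1+r} u'(C_t) \tag{9}$$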
The Euler equation states that the marginal utility of consuming one more unit today must equal the marginal benefit of postponing that consumption by one period: the unit plus interest, valued at next period’s marginal utility and discounted by the rate of time preference.
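The Euler condition can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the thesis analysis: it assumes a two-period horizon, logarithmic utility (one concave utility function satisfying u' > 0 and u'' < 0), and hypothetical values for wealth, r and δ. It finds the optimal consumption split by grid search and then compares the two sides of equation (9).

```python
import math


def solve_two_period(wealth, r, delta, grid_size=20_000):
    """Grid search for C0 maximizing ln(C0) + beta*ln(C1)
    subject to the budget constraint C0 + C1/(1+r) = wealth."""
    beta = 1.0 / (1.0 + delta)
    best_c0, best_value = None, -math.inf
    for i in range(1, grid_size):
        c0 = wealth * i / grid_size
        c1 = (wealth - c0) * (1.0 + r)
        value = math.log(c0) + beta * math.log(c1)
        if value > best_value:
            best_c0, best_value = c0, value
    return best_c0, (wealth - best_c0) * (1.0 + r)


def marginal_utility(c):
    """u'(C) for the assumed log utility u(C) = ln(C)."""
    return 1.0 / c


# Hypothetical parameter values chosen for illustration only.
r, delta = 0.04, 0.02
c0, c1 = solve_two_period(wealth=100.0, r=r, delta=delta)

lhs = marginal_utility(c1)                              # u'(C_{t+1})
rhs = (1.0 + delta) / (1.0 + r) * marginal_utility(c0)  # (1+delta)/(1+r) * u'(C_t)
print(f"C0={c0:.3f}  C1={c1:.3f}  lhs={lhs:.6f}  rhs={rhs:.6f}")
```

Up to the resolution of the grid, the two sides coincide at the optimum. Because the assumed interest rate exceeds the assumed time preference rate, consumption is higher in the second period than in the first.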
In the simple case where δ = r = 0, optimal consumption becomes the same in every period and proportional to lifetime income:
$$C_t^* = \frac{1}{T}\left( W_0 + \sum_{s=0}^{T} Y_s \right) \tag{10}$$
Saving is by definition the part of income that is not consumed, i.e. $S = Y - C$. In the simple life cycle model with full certainty, saving thus becomes the difference between current income and optimal consumption, where the latter is a constant fraction of lifetime resources:
$$S_t = Y_t - C_t^* = Y_t - \left[ \frac{1}{T}\left( W_0 + \sum_{s=0}^{T} Y_s \right) \right] \tag{11}$$
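A small numerical example may help to fix ideas. The Python sketch below applies the smoothing rule in equations (10) and (11) to a hypothetical hump-shaped income path. The income figures are invented for illustration, and the sketch assumes that T is the number of periods in the income list.

```python
def smoothed_consumption(initial_wealth, incomes):
    """Optimal consumption with delta = r = 0: lifetime resources
    spread evenly over all periods, as in equation (10)."""
    periods = len(incomes)  # assumption: T equals the number of periods listed
    return (initial_wealth + sum(incomes)) / periods


def saving_path(initial_wealth, incomes):
    """Saving in each period: current income minus smoothed consumption,
    as in equation (11)."""
    c_star = smoothed_consumption(initial_wealth, incomes)
    return [y - c_star for y in incomes]


# Hypothetical income path: low when young, peaking mid-life, zero at the end.
incomes = [20.0, 40.0, 60.0, 40.0, 0.0]
savings = saving_path(0.0, incomes)

print("consumption per period:", smoothed_consumption(0.0, incomes))
print("saving per period:     ", savings)
```

In this example the individual borrows in the first period, saves during the high-income years and dissaves in the final period, and saving sums to zero over the lifetime so that terminal wealth is zero.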
2.2.2 The functional form of utility
Modern versions of the life cycle model depart from the assumption of full certainty and assume that the income process is stochastic. In addition, they frequently introduce a functional form of the utility function that implies risk aversion. Under the frequently applied CRRA (constant relative risk aversion) specification, utility takes the following form:
$$u(C) = \frac{C^{1-\frac{1}{\sigma}}}{1-\frac{1}{\sigma}} \tag{12}$$
where 1/σ is the parameter of relative risk aversion and σ is the intertemporal elasticity of substitution. Risk aversion causes the household to assign a greater utility loss to losing an amount of consumption than utility gain from gaining the same amount of consumption.
Constant relative risk aversion implies that households have the same attitude towards losing a certain fraction of their consumption at all levels of consumption. Hence, the Euler equation becomes:
$$C_t = \beta^{-\sigma}(1+r)^{-\sigma}C_{t+1} \tag{13}$$
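The two properties of CRRA utility described above can be illustrated numerically. The following Python sketch is a minimal illustration in which the value of σ and the consumption levels are assumptions. It shows that a loss of a given amount reduces utility by more than an equal gain raises it, and that the coefficient of relative risk aversion, approximated by finite differences, equals 1/σ at different consumption levels.

```python
def crra_utility(c, sigma):
    """CRRA utility u(C) = C**(1 - 1/sigma) / (1 - 1/sigma), as in
    equation (12). Requires sigma != 1."""
    exponent = 1.0 - 1.0 / sigma
    return c ** exponent / exponent


def relative_risk_aversion(c, sigma):
    """-C * u''(C) / u'(C), approximated with central finite differences."""
    h = 1e-3 * c  # step size proportional to the consumption level
    u_plus = crra_utility(c + h, sigma)
    u_mid = crra_utility(c, sigma)
    u_minus = crra_utility(c - h, sigma)
    first = (u_plus - u_minus) / (2.0 * h)
    second = (u_plus - 2.0 * u_mid + u_minus) / h ** 2
    return -c * second / first


sigma = 0.5  # assumed value, so relative risk aversion 1/sigma equals 2

# Utility loss from losing one unit versus utility gain from gaining one unit.
loss = crra_utility(10.0, sigma) - crra_utility(9.0, sigma)
gain = crra_utility(11.0, sigma) - crra_utility(10.0, sigma)
print(f"utility loss = {loss:.5f}, utility gain = {gain:.5f}")

for c in (10.0, 100.0):
    print(f"C = {c:6.1f}  relative risk aversion = {relative_risk_aversion(c, sigma):.4f}")
```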
Any utility function that has a positive first and a negative second derivative induces risk aversion. However, the CRRA utility function has another property that causes households to have a positive precautionary saving motive in the life cycle model. That is, the utility function also has a positive third derivative:
$$u'''(C) = \left(\frac{1}{\sigma}+1\right)\frac{1}{\sigma} C^{-\left(\frac{1}{\sigma}+2\right)} > 0 \tag{14}$$

With a CRRA utility function, uncertain income, and positive interest and discount rates, it becomes impossible to solve for optimal consumption analytically. However, for expositional purposes I generalize the consumption and saving functions as follows:

$$C_t^* = f(\delta, r, T, \sigma) \cdot (W_0 + E\{Y_s\}) \tag{15}$$

$$S_t = Y_t - f(\delta, r, T, \sigma) \cdot (W_0 + E\{Y_s\}) \tag{16}$$

where $Y_s$ denotes a stochastic income process. The implied effects of these variables on consumption and saving can be derived from the equations above. Equation (10) shows that, for given lifetime resources, a longer lifespan implies lower consumption in all periods and therefore higher saving out of labor income. If women are rational and forward-looking individuals, this would mean that women, who live longer, should save more than men. The effect of time preference depends, as seen from the Euler equation, on whether the time preference rate is higher or lower than the interest rate. A patient consumer ($\delta < r$) will save relatively more early in life than an impatient consumer, and will thus have an upward-sloping consumption profile over the life cycle. An impatient consumer ($\delta > r$) will consume more in the earlier stages of life, i.e. take on more debt, and will thus save more through down-payments of debt in the future.
The risk aversion parameter is the inverse of the intertemporal substitution elasticity. In other words, it says something about the consumer’s willingness to shift consumption over time. Equation (13) shows that it mainly affects how much the consumer responds to changes in the interest rate. However, the degree of risk aversion also affects the fraction of savings invested in risky assets. It is straightforward that an individual with high risk aversion will invest relatively less in risky assets.
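To make the role of σ concrete, equation (13) can be rearranged to give consumption growth, C_{t+1}/C_t = (β(1+r))^σ. The Python sketch below evaluates this expression for assumed values of r, δ and σ; none of these values are taken from the data.

```python
def consumption_growth(r, delta, sigma):
    """Gross consumption growth C_{t+1}/C_t implied by equation (13),
    with beta = 1/(1 + delta)."""
    beta = 1.0 / (1.0 + delta)
    return (beta * (1.0 + r)) ** sigma


# Assumed, illustrative parameter values.
delta = 0.02
for sigma in (0.25, 0.5, 2.0):
    low = consumption_growth(0.02, delta, sigma)
    high = consumption_growth(0.05, delta, sigma)
    print(f"sigma={sigma:4.2f}  growth at r=2%: {low:.4f}  growth at r=5%: {high:.4f}")
```

When the interest rate equals the time preference rate, consumption is flat regardless of σ. When the interest rate rises above it, consumption growth increases, and the increase is larger the higher the elasticity of intertemporal substitution, i.e. the lower the risk aversion parameter 1/σ.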
2.2.3 Household formation
Standard consumption theory assumes a representative individual and that household size or composition does not affect consumption and saving decisions. However, it is possible to include such variables in the consumption function:
$$u(C) = \frac{\left(Ce^{-f(z)}\right)^{1-\frac{1}{\sigma}}}{1-\frac{1}{\sigma}} \tag{17}$$
The functional form of the utility function is still the specification of CRRA, where f(z) is a function of the household size or potential taste shifters related to household composition.
It has been perceived as a puzzle that observed consumption is much more hump shaped over the life cycle than the standard theory predicts. Browning and Ejrnæs (2009) show that family structure and children explain most of this hump shape.
2.3 Expectations and previous literature
This section discusses the empirical evidence of the theory presented in the previous section.
The aim is to present evidence on individual saving behavior, which might be relevant in judging the validity of the life cycle model. In addition, this will give some indication of what to expect from the results of the empirical analysis. Further, the section ends by discussing some of the previous research.
Attanasio and Weber (2010) discuss how the contributions that originated the life cycle theory emerged from cross-sectional studies and from observations of how saving rates vary with income within a cross section. The fact that these empirical regularities still hold is important for the empirical validity of the life cycle model. Some of these observations are obtained if one looks at the saving rates by current income level of groups that differ by the level of “permanent” income, such as households headed by individuals with different educational levels. Analogously, considering individuals whose income has increased and individuals whose income has decreased, the saving rate of the former is higher than that of the latter.
However, it is important to address some of the limitations of the life cycle model. The model offers a coherent explanation for the findings mentioned above, but when bringing the life cycle model to more complicated data the simplest version is not sufficient. Most empirical research finds that empirical support for the model is limited, more specifically because consumption is more sensitive to changes in disposable income than the model predicts. A general explanation is that individuals engage in precautionary and buffer-stock saving and face liquidity constraints (Berg and Aarestad, 2014). Hence, uncertainty causes individuals to save more to protect themselves against unpredictable fluctuations in income.
In addition to being essential in the theory of saving behavior, uncertainty also complicates the modeling considerably. As this would require an approach that is more empirically comprehensive, I choose to exclude uncertainty from the life cycle model. For simplicity, the model serves as an underlying theoretical framework for the purpose of this analysis.
Further, one of the main ideas behind the use of the life cycle model is that consumers have concave utility functions and, therefore, prefer to smooth their consumption paths. This is reflected in how savings vary throughout the household’s lifetime. One general feature is that the life cycle profile of income is hump shaped, which means it increases during the first part of life before it reaches a peak a few years before retirement (Attanasio and Weber, 2010). The implication for saving is that individuals will borrow at a young age when disposable income is lower than permanent income, and then save when disposable income exceeds permanent income. Hence, this is a general feature that is expected to be observed within the analysis. The theory also states that individuals with a high time preference rate save more by repaying debt than patient consumers. If women are believed to be more impatient than men, it is expected that women will save more by paying down debt than men do.
As the life cycle model considers a representative individual regardless of gender and marital status, expectations about differences in how men and women save are based on more general perceptions and findings from previous research. One problem with this approach is the small amount of empirical research in the literature, especially in the Norwegian context. Some studies from other countries prove applicable to the investigation of household saving behavior, but they do not distinguish between the behavior of men and women. The studies that do examine such gender differences mostly concern saving decisions or the allocation of assets in retirement saving plans, making them unsuitable as an empirical foundation for this analysis. The greatest disparities between Norway and other countries are the generous welfare and public pension systems, which reduce households’ need for private retirement saving (Berg and Aarestad, 2014). This social safety net may affect how households save, making saving behavior in Norway difficult to compare with that in other countries. Another problem with the empirical literature is the variation in the methods used to study individual differences. Some evidence is based only on descriptive statistics or surveys, which may yield misleading results, while other evidence is more robust because it controls for confounding factors. The rest of this section presents relevant previous research that studies the impact of gender and marital status on economic behavior.
Research by Fisher (2010) shows that the financial saving behavior of men and women differs with regard to short-term saving. Controlling for income, preferences, risk tolerance and other socioeconomic factors, she finds that low risk tolerance reduces the likelihood that women save in the short term and save regularly. For men, each additional year of education makes them more likely to save in the short term and to save regularly.
For the Norwegian case, without controlling for confounding factors, Halvorsen (2011) presents descriptive statistics that indicate that women, on average, have lower savings than men, and that a larger share of women save through bank deposits while men use shares and other securities. She also reports that single individuals save more than couples with children but less than couples without children. The highest saving rates are found among registered partners, who also report the highest average income.
Further, Hallingstad and Johansen (2017) use survey data to investigate differences in financial knowledge between men and women, and to what extent these affect how individuals choose to save. The results confirm that gender differences in financial knowledge exist in the Norwegian population, and that men on average have a higher perception of financial decision-making. By expanding their analysis, they find that saving behavior is affected by factors such as income, age and the willingness to take risk, rather than the level of financial understanding.
Another survey that confirms gender differences in household saving behavior is the annual survey on saving agreements in mutual funds and combination funds conducted by The Norwegian Fund & Asset Management Association (VFF). A saving agreement implies regular saving in mutual funds through fixed deductions from a bank account, usually monthly. The survey shows that the difference between men and women increased from 2006 to 2016, with men having both higher saving rates per month and more saving agreements than women (VFF, 2017).
In contrast to the limited research on men’s and women’s saving behavior, there is extensive empirical evidence that suggests that women are more risk averse than men. Jianakoplos and Bernasek (1998) examine household holdings of risky assets to determine whether there are gender differences in financial risk taking for U.S. households. They find that single women exhibit relatively more risk aversion in financial decision making than single men, controlling for economic and demographic factors that may influence the portfolio allocation. Olsen and Cox (2001) investigate risk differences between male and female investors, concluding that women’s greater risk aversion affects their investment choices. Charness and Gneezy (2007) propose a novel approach, assembling data from 10 sets of experiments conducted by different researchers in different countries. They find a very consistent result that women invest less, and that they appear to be more financially risk averse.
Other contributions have investigated gender and marital status jointly, with less consistent results. Bertocchi et al. (2008) study the impact of gender and marital status on financial decisions for Italian households. By controlling for several individual characteristics, they find that male and married households are more likely to invest in risky assets than female and single ones. Similarly, in later work, Bertocchi et al. (2011) show that married individuals have a higher propensity to invest in risky assets than single ones, and that this marital status gap is stronger for women. However, Yao and Hanna (2005) investigate the effects of gender and marital status on financial risk tolerance, using a cumulative logit model that controls for demographic and economic characteristics. They find that single males have the highest risk tolerance, followed by married males, unmarried women and married women. This result is supported by Hartog et al. (2002), who document that married individuals are less risk tolerant than singles.
Other studies have focused on gender, marriage and asset accumulation, like the study of Schmidt and Sevak (2006). They describe how household wealth in the United States varies by gender and family structure, documenting a significant wealth gap where households headed by married couples have more than twice the mean wealth of households headed by single females. Controlling for characteristics such as position in the life cycle, education and family earnings reduces but does not eliminate the estimated gender gap.
3 Data
This section introduces the data used in the analysis, and presents the data source and how the data are computed and treated, following the description of Fagereng and Halvorsen (2017). Further, the estimation procedure of savings and definitions of the relevant variables are introduced.
Data source
The data used for this analysis is a longitudinal dataset (also called panel data) constructed by Statistics Norway. The construction is based on administrative records, specifically Norwegian income and wealth registry data.
The data set contains detailed information about household demographics, income and wealth as reported in the individual tax records. All income components (including governmental transfers) and almost all financial assets are reported to tax authorities directly from employers, banks and other financial institutions, making the data very accurate and comprehensive.
In addition, there is information about educational level added from other registers.
Savings are then derived as the first difference in wealth from one year to another. Since information about wealth is taken from tax registers, the stock of wealth is measured as of December 31st. Financial assets consist of bank accounts, shares (listed and non-listed), bonds, mutual funds, money market funds, cash value of life insurance, contributions to private pension accounts and other financial assets. Real assets consist of summer houses, cars, boats and other vehicles, and production capital (typically business vehicles and machinery).
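The first-difference construction can be sketched in a few lines of Python. This is an illustration only: the wealth figures are invented, and the treatment of gaps in the panel (no saving is computed for a year whose preceding year is unobserved) is an assumption rather than a description of the actual data processing. The sketch also does not separate active from passive savings.

```python
def first_difference_savings(wealth_by_year):
    """Saving in year t as wealth(t) - wealth(t-1), where wealth is the
    stock measured at the end of each year. Years without an observation
    for the preceding year are skipped (assumed handling of gaps)."""
    savings = {}
    for year in sorted(wealth_by_year):
        previous = year - 1
        if previous in wealth_by_year:
            savings[year] = wealth_by_year[year] - wealth_by_year[previous]
    return savings


# Hypothetical end-of-year wealth for one household, with 2002 unobserved.
wealth = {2000: 100_000.0, 2001: 130_000.0, 2003: 90_000.0, 2004: 120_000.0}
print(first_difference_savings(wealth))
```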
Further, all monetary values are deflated with the consumer price index with base year 2000. In the following, all amounts stated in kroner are given in 2000-kroner (see footnote 4).
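As a minimal sketch of the deflation step, the snippet below converts a nominal amount into 2000-kroner. The function name and the index table are illustrative assumptions and not official Statistics Norway figures; the 2018 level of 140 is chosen only to match the ratio of 1.40 stated in the footnote on 2000-kroner.

```python
# Hypothetical CPI levels with base year 2000 = 100 (illustrative only).
CPI = {2000: 100.0, 2018: 140.0}


def to_2000_kroner(nominal: float, year: int, cpi: dict = CPI) -> float:
    """Deflate a nominal amount observed in `year` to 2000-kroner."""
    return nominal * cpi[2000] / cpi[year]


# 140,000 nominal kroner in 2018 expressed in 2000-kroner.
real_value = to_2000_kroner(140_000.0, 2018)
print(real_value)
```

Amounts observed in the base year are left unchanged by construction, since the ratio of the two index levels is one.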
Individual pension savings and cash value of life insurance are excluded from the data. State pension plans and occupational pension schemes are not subject to the wealth tax, and are therefore not reported in the tax registry. Even though it is possible to set aside savings in tax-exempt individual pension savings (IPS), only 1 percent of Norwegian households save in IPS. This is because the benefits are small and the tax treatment has varied over time. The same applies to the cash value of life insurance, which is another asset category that is insignificant in the Norwegian context.
4 As of January 2018, one Norwegian 2000-krone corresponds to 1.40 kroner.
Data processing
The dataset covers a time span of 21 years, from 1993 to 2014. The number of observations included in the complete dataset is 10,277,928, but this number is reduced once all calculations and exclusions have been made. Since it varies within the analysis, information about the number of observations is updated along the way. All types of households are included in the analysis, except for students and pensioners. As working adults are the households of interest, the age of the household head is restricted to between 24 and 60.
The complete data set represents a 20% random sample of the total population, and tracks individuals as long as they can be observed. Information from administrative records is combined with family identifiers from the population register in order to aggregate all income and wealth information at the family level. The spousal information for the years in which individuals are either married or cohabiting is also merged. Together, this allows for observing the financial benefits individuals experience as a result of living in the same household. All monetary variables used in the analysis are therefore measured at the family level.
3.1 The dependent variable
Most people can relate to the financial term savings, but the perception and the association of the term may vary across individuals and households. The simplest and most common way is to refer to savings as money that is put aside for future use rather than immediate spending. However, when investigating the term more closely, it captures several aspects that are important for the research questions, making it preferable as the dependent variable for this analysis.
Within the literature, savings is defined as the part of disposable income that is not consumed. More formally, savings in period t can be defined as follows:
$$S_t = Y_t - C_t \qquad (18)$$
The equation represents savings, $S_t$, as the residual of disposable income, $Y_t$, and consumption, $C_t$. What the individual chooses to save in period t is added to existing wealth.
3.1.1 Definition of savings
There are two ways of defining saving: saving equal to income minus consumption (flow definition) and saving equal to the first difference in wealth. Both definitions yield the same result, but the approaches are quite different. The flow definition of saving reflects individual decisions about consumption and saving more directly, but it depends heavily on the definition of income. An alternative is to measure savings as the change in wealth, which is the definition of savings used in this analysis.
The type of savings to be analyzed is the active savings of each household. From the observed variables in the registry data, savings can be divided into three groups: financial savings, savings in real assets and savings abroad. Financial savings consist of bank deposits, bonds, mutual funds and shares. The measure of bank deposits includes cash holdings, while the bond variable also includes money market funds. Both classify as relatively “safe” assets. As saving in real assets is mainly represented by changes in tax value, and because saving abroad only makes up a small proportion of total saving, they are excluded from the measurement. Thus, active savings are represented by the change in financial wealth.
When this is related to the data, active savings are the difference between wealth at the end of the current period and wealth at the end of the previous period, adjusted for returns on existing wealth:
$$S_t = \Delta w = w_t - (1 + r_{t-1}) w_{t-1} \qquad (19)$$
where wealth in the previous period is represented by $w_{t-1}$, and $r_{t-1}$ is the associated return. $S_t$ represents savings in the current period.
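To make equation (19) concrete, the following is a minimal sketch that computes active savings from a sequence of year-end wealth observations. The function name and all numbers are hypothetical, and a single return per year is assumed for simplicity, whereas the actual adjustment is done per asset class.

```python
from typing import List, Sequence


def active_savings(wealth: Sequence[float], returns: Sequence[float]) -> List[float]:
    """Apply S_t = w_t - (1 + r_{t-1}) * w_{t-1} for t = 1, ..., T-1."""
    return [
        wealth[t] - (1.0 + returns[t - 1]) * wealth[t - 1]
        for t in range(1, len(wealth))
    ]


# Hypothetical household: year-end financial wealth and the return
# associated with the wealth held at the end of each previous year.
wealth = [100_000.0, 110_000.0, 104_000.0]
returns = [0.05, -0.10]

savings = active_savings(wealth, returns)
print(savings)
```

In the second year of this example wealth falls, yet active savings are positive because the drop is smaller than the loss implied by the negative return. This is the distinction between the raw change in wealth and active saving that the return adjustment is meant to capture.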
Using the change in wealth as a measure of savings requires a correct measurement of all wealth items in the portfolio. One problem is the inaccuracy that capital gains or losses add to the measurement, which makes it difficult to know the fraction of income households consume at different points in time. If a household is actively saving out of its income, a sudden drop in stock market value can still make the overall change in net worth negative. To overcome this problem, financial assets are adjusted for returns. Gains and losses from shares are calculated using the Oslo Stock Exchange index (OSE), returns from mutual funds are calculated using a combination of the OSE and the MSCI World index, and the Treasury bill rate is used for bonds (see footnote 5). Further, all monetary variables are adjusted with the consumer price index.
The dataset contains housing values represented by assessed or tax value as opposed to market value, and it records the change in mortgage but not the corresponding purchase or selling price. This creates noise in the data because housing value and debt do not correspond. Also, as housing wealth is represented by tax values until 2010 in the tax records, and imputed values after 2010, housing is excluded from the estimation of active savings. To mitigate potential measurement errors in household assets, in years where a household has an increase in total debt that exceeds its annual income, it is assumed that this represents a new mortgage acquired in connection with a house purchase. Thus, a dummy for new mortgage is included in the analysis.
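The rule for flagging a new mortgage can be sketched as follows. The function name and the example amounts are hypothetical; only the rule itself (a yearly increase in total debt that exceeds annual income) is taken from the text.

```python
def new_mortgage_dummy(debt_prev: float, debt_now: float, income: float) -> int:
    """Return 1 if the yearly increase in total debt exceeds annual income."""
    return int(debt_now - debt_prev > income)


# Debt rises by 2,000,000 with an income of 600,000: flagged as new mortgage.
flag_purchase = new_mortgage_dummy(500_000.0, 2_500_000.0, 600_000.0)

# Debt rises by only 200,000: not flagged.
flag_small_loan = new_mortgage_dummy(500_000.0, 700_000.0, 600_000.0)

print(flag_purchase, flag_small_loan)
```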
5 See Fagereng and Halvorsen (2017) for a more detailed description of the imputation method.
However, for the savings measure in absolute values, the occurrence of new mortgages results in some observations with large negative values: new loans and mortgages are measured in full, while the corresponding real assets, e.g. houses, summer houses, cars and boats, are not. This means that the distribution of saving is negatively skewed (i.e. has a long left tail) and contains some extreme observations. The most common way to control for skewness when working with data in absolute values is to apply a logarithmic transformation. Transformations using the natural log are in some cases problematic because they are not defined for negative values and zero observations (Friedline et al., 2015). An alternative is to transform values using the inverse hyperbolic sine (IHS) transformation. This method, suggested by Burbidge et al. (1988), is approximately linear around zero and down-weights larger observations in the same way as a logarithmic transformation. For any real number x, the inverse hyperbolic sine satisfies
$$f(x) = \log\left(x + \sqrt{x^2 + 1}\right) \qquad (20)$$
where x represents the variable of interest and f(x) represents the transformed version of this variable. For values that are not close to zero, the transformation can be interpreted in the same way as a natural logarithmic transformation: a unit change in an independent variable expressed in levels corresponds approximately to a percentage change in saving, while a coefficient with IHS on both the dependent and the independent variable is interpreted as an elasticity.
Estimated savings before and after the transformation are presented in table 1. In addition to retaining zero and negative values, the transformation adjusts for skewness and provides a more transparent picture of the variable.
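The properties of the transformation in equation (20) can be checked with a short sketch. It relies on `math.asinh` from the Python standard library, which computes the same function; the helper name `ihs` and the sample values are illustrative assumptions.

```python
import math


def ihs(x: float) -> float:
    """Inverse hyperbolic sine, equal to log(x + sqrt(x^2 + 1)).

    math.asinh is used because evaluating the explicit formula directly
    loses precision for large negative x.
    """
    return math.asinh(x)


# Zero and negative values are retained, unlike with a plain logarithm.
values = [-1_000_000.0, -1.0, 0.0, 1.0, 1_000_000.0]
transformed = [ihs(v) for v in values]

# For large positive x the transform is close to log(2x).
gap_to_log = ihs(1_000_000.0) - math.log(2 * 1_000_000.0)

# Applying it to the rounded extremes of savings reported in table 1.
ihs_min = ihs(-8.23e9)
ihs_max = ihs(1.77e10)
print(transformed, gap_to_log, ihs_min, ihs_max)
```

The transformed extremes come out close to the minimum and maximum of hyperbolic savings reported in table 1, with small deviations that are expected because the inputs are rounded.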
Table 1: Savings in absolute value and inverse hyperbolic transformation
                    Number of observations  Mean       Std. Dev.  Min        Max
Savings             9,463,288               -47989.42  1.01e+07   -8.23e+09  1.77e+10
Hyperbolic savings  9,463,288               1.189214   12.14868   -23.52442  24.29169
3.1.2 Weaknesses of the estimation
There are several weaknesses in the empirical estimation of savings that must be accounted for and addressed. I consider it important to clarify these factors before the results from the analysis are obtained and discussed.
Registry data on income and wealth are not collected for research-specific purposes. When they are used in the measurement of saving, this presents some challenges regarding the data’s content and structure. Even though very little of the collected data is self-reported, it is still collected for taxation purposes. This means that the real assets excluded from the measurement cause some misrepresentation of total saving.
Further, it is natural to assume that households’ saving decisions are made continuously within a year in addition to following a long-term perspective. The number of observations that emerge in the analysis is relatively low, and they are measured annually; hence, the estimation does not capture saving behavior within a year. This means that saving can be affected by short-term changes prior to the observation date that are not captured in the results (Berg and Aarestad, 2014).
3.2 Explanatory variables
Below I discuss the explanatory variables included in my analysis. I present some basic information about the variables, as well as a more detailed discussion of the variables that require more clarification. Some conceptual difficulties and changes to the variables are also presented along the way.
Disposable income
To control for differences in income and how these affect households’ potential to save, disposable income is included as an explanatory variable. This variable is calculated as the sum of labor income, transfers and net capital income, all measured after tax.
In its simplest form, disposable income is either consumed or saved. Obviously, individuals with high income can both consume and save more than low-income individuals. Even so, when comparing households’ saving behavior, it can be more interesting to study the actual fraction of disposable income that is saved, or the fraction of an income increase that an individual saves rather than consumes.
Life cycle theory implies that individuals save when disposable income exceeds permanent income. This is seen from equation (16) in section 2, where saving is the residual of disposable and permanent income.
In the data set, the income variable is stated in its absolute value and is, in the same way as wealth and saving, an unevenly distributed variable. Because of the long tail in the distribution, skewness pushes the estimated mean income far above the median income. By transforming the variable in the same way as savings, i.e. with the hyperbolic transformation presented in section 3.1.1, the coefficient can be interpreted as an elasticity. This means that a percentage change in income gives a percentage change in savings, which applies approximately at all levels of income and savings that are not close to zero. Hence, the coefficient becomes easy to interpret and understand.
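The elasticity reading rests on the fact that the inverse hyperbolic sine behaves like a shifted logarithm for large arguments. The following is a short sketch of that step, using the notation f(x) from equation (20); it holds only for large positive values of savings S and income Y, and the interpretation is looser near zero, where the transformation is approximately linear.

```latex
f(x) = \log\left(x + \sqrt{x^2 + 1}\right) \approx \log(2x) = \log 2 + \log x
\quad \text{for large } x > 0,
\qquad \text{so that} \qquad
\frac{\partial f(S)}{\partial f(Y)} \approx \frac{\partial \log S}{\partial \log Y}.
```

The constant log 2 drops out when differentiating, which is why the coefficient on transformed income approximates the usual log-log elasticity.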
Wealth
Wealth is a measure of financial resources, usually a measure of net worth; that is, how much an individual has in savings, investments, real estate and cash (Investing Answers, 2018). It is determined by taking the total market value of all financial and real assets that an individual owns and subtracting all debt.
The data description presented in the appendix contains several variables that can be included in different measures of wealth. By generating new variables that express these measures, I obtain a measure of gross wealth that is included in my analysis. Because of the long right tail in its distribution, gross wealth is transformed with the hyperbolic transformation from equation (20), as discussed in section 3.1.1.
The term real wealth includes the household contents, car/boat, cabin, family residence at assessed value and other property of value belonging to the household. The term financial wealth includes bank deposits, mutual funds, bonds, securities, listed and unlisted shares and other assets. Mutual funds and listed and unlisted shares are all adjusted for tax refunds, and the unlisted shares are also VPS-registered. I compute a variable of total gross wealth by adding up the measures of real and financial wealth. Figure 1 shows the distribution of gross wealth, financial wealth and real wealth within the sample.
Figure 1: Distribution of wealth
Within the life cycle theory, savings increase with age, indicating higher levels of wealth at later stages of life. This means that age and wealth are somewhat correlated within the analysis. This is also illustrated in figure 1, which shows an evident hump-shaped age profile.
Debt
In the dataset this variable is calculated as overall debt, which means that it does not distinguish between credit debt and mortgages. It follows the hyperbolic transformation presented in equation (20).
Lottery, inheritance and gifts
Any lottery winnings, inheritance and gifts that are available in the data are also included, to identify large transfers between individuals, both between and within families. These are also transformed by the hyperbolic function from equation (20). As these are collected from the self-reported fields in the tax return, they probably contain some underreporting.
Dummy variables
Gender:
This dummy is included to capture differences between men and women in their financial choices. Gender differences are also reflected in preferences, as the empirical evidence suggests that women tend to be more risk averse than men. Table 2 shows the distribution of gender within the sample:
Table 2: The distribution of gender
Gender  Number of observations  Percent
Male    5,238,114               50.97
Female  5,039,206               49.03
Total   10,277,350              100
Marital status:
Another way of studying different stages of life is to see how savings are distributed across marital status. The analysis distinguishes between single, married and cohabitant individuals, showing how savings differ between couples and single households. In addition to having risk sharing opportunities, couples are more flexible with respect to labor supply and adjustments in household income. The distribution of marital status is presented in table 3.
Table 3: Constellations of marital status
Marital status  Number of observations  Percent
Single          1,932,493               18.80
Married         5,313,173               51.70
Cohabitant      3,031,684               29.50
Total           10,277,350              100
Risky assets:
In order to generate a variable reflecting the household’s tolerance towards risk, I have singled out risky assets from the portfolio and generated a dummy representing risk taking. Among the variables included in the dataset, I single out the assets that are not clearly safe. This makes mutual funds and unlisted and listed shares the variables included in the dummy. The interpretation of the variable is that investment in such assets indicates a higher willingness to take risk, whereas no investment indicates a more risk averse household.
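A minimal sketch of such a dummy is given below. The function and argument names are hypothetical, and the rule that any strictly positive holding counts as risk taking is an assumption made for illustration, since the exact threshold is not stated.

```python
def risky_assets_dummy(mutual_funds: float, listed_shares: float,
                       unlisted_shares: float) -> int:
    """Return 1 if the household holds any of the three risky asset types.

    Assumption: a strictly positive holding counts as risk taking.
    """
    return int(mutual_funds > 0 or listed_shares > 0 or unlisted_shares > 0)


# A household with only bank deposits versus one holding mutual funds.
only_safe = risky_assets_dummy(0.0, 0.0, 0.0)
holds_funds = risky_assets_dummy(15_000.0, 0.0, 0.0)
print(only_safe, holds_funds)
```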
Age effects:
Age and position in the life cycle are important determinants of how individuals choose to save. According to the theoretical framework, the relationship between age and savings is generally non-linear, making it appropriate to include a dummy for each age. This allows me to see how active savings relate to the life cycle theory presented in section 2, and may help to explain the observed saving pattern. The base age of the analysis is 24.
Time effects:
By including year dummies, saving can be documented in relation to business cycles and other year-specific events. This helps to explain results that might otherwise seem anomalous. Because savings is a measure of the change in wealth from one year to the next, the base year will be 1995.
Number of children living at home:
The number of people in a household varies between 1 and 16 in this dataset. This suggests that households’ saving behavior varies within the sample, and that the effect of family size on savings is non-linear. As argued in section 2.2.3, whether a household provides for children will probably affect its economic situation. Since children living at home are likely to cost more than children who belong to the household but live elsewhere, I only include dummy variables representing the former. The dummies represent 1, 2 and 3 or more children within one household.
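The three child dummies can be sketched as below, with households without children at home as the implied reference group. The function and key names are hypothetical.

```python
def children_dummies(n_children: int) -> dict:
    """Dummies for 1, 2 and 3 or more children living at home.

    Households with no children at home get zeros on all three dummies
    and therefore act as the reference category.
    """
    return {
        "one_child": int(n_children == 1),
        "two_children": int(n_children == 2),
        "three_or_more": int(n_children >= 3),
    }


no_children = children_dummies(0)
large_family = children_dummies(5)
print(no_children, large_family)
```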
Education:
To see the relationship between education and income, I create dummy variables using education codes available in the registry data. The Norwegian education classification is a six-digit coding system that classifies education activities by level and subject. The first digit of the system represents the education level, and it is on the basis of this digit that the dummy variables are constructed. The classification is represented in the following table:
Table 4: Educational level
Division of educational level  Level  Education
-                              0      No education or preschool education
Primary education              1      Elementary school
Primary education              2      Middle school
Secondary education            3      High school, uncompleted
Secondary education            4      High school, completed
Secondary education            5      Postgraduate education
Tertiary education             6      College degree, lower level
Tertiary education             7      College degree, higher level
Tertiary education             8      Doctoral degree
-                              9      Unspecified
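The step from a six-digit education code to a level and a tertiary dummy can be sketched as follows. The function names and the example codes are hypothetical, and the grouping of levels 6 to 8 as tertiary education is an assumption based on how table 4 appears to be laid out.

```python
def education_level(code: str) -> int:
    """Return the education level, i.e. the first digit of a six-digit code."""
    if len(code) != 6 or not code.isdigit():
        raise ValueError("expected a six-digit education code")
    return int(code[0])


def tertiary_dummy(code: str) -> int:
    """Return 1 if the level is 6, 7 or 8 (assumed tertiary group)."""
    return int(education_level(code) in (6, 7, 8))


# Hypothetical codes; only the first digit matters for the level.
level = education_level("754101")
is_tertiary = tertiary_dummy("754101")
is_tertiary_secondary = tertiary_dummy("401101")
print(level, is_tertiary, is_tertiary_secondary)
```

Codes are handled as strings so that a leading zero, which denotes level 0, is not lost.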
3.3 Descriptive statistics
This section presents the descriptive statistics from the dataset. As mentioned briefly in the introduction, previous literature has mainly been descriptive. Because such statistics do not control for confounding factors, they do not provide accurate results: they do not take into account how differences in income, education or household composition affect individuals’ saving behavior. To accentuate the gender differences within the sample, I have also included two tables that present differences in essential variables as well as differences in how men and women choose to save.
The heterogeneity and level of detail in registry data make it challenging to work with. All monetary values are stated in their absolute value, which illustrates their skewness and the fact that they are not normally distributed. This is illustrated in the following table:
Table 5: Descriptive statistics from the data set
Variable                             Number of observations  Mean       Std. Dev.  Min        Max
Age                                  10,277,350              41.36283   10.35303   24         60
Year                                 10,277,350              2003.817   6.335089   1993       2014
Gender                               10,277,350              0.490322   0.4999063  0          1
Couples                              10,277,350              0.8119658  0.3907395  0          1
Married                              10,277,350              0.5169789  0.4997117  0          1
Cohabitation                         10,277,350              0.2949869  0.4560369  0          1
Number of children (living at home)  10,272,561              1.033661   1.136673   0          14
Education                            9,878,307               4.111415   1.684115   0          8
Savings                              9,463,288               -47989.42  1.01e+07   -8.23e+09  1.77e+10
Income                               10,255,287              469585.1   1292571    -3.14e+08  2.14e+09
Gross wealth                         10,253,693              1035013    1.70e+07   -3015049   1.98e+10
Overall debt                         10,253,693              991629.4   1614101    0          3.37e+08
Risky assets                         10,277,350              0.4461609  0.4970929  0          1
Lottery winnings                     10,253,693              793.1782   90331.38   0          1.28e+08
Inheritance and gifts                10,253,693              10588.2    931924.2   -429521    1.74e+09
The computing process from section 3.1.1, where all monetary values have been transformed hyperbolically, provides a more transparent picture of all variables. This is illustrated in table 6, which presents the variables used in the analysis:
Table 6: Descriptive statistics of used variables
Variable                             Number of observations  Mean       Std. Dev.  Min        Max
Age                                  9,463,288               41.36283   10.35303   24         60
Year                                 9,463,288               2003.817   6.335089   1993       2014
Gender                               9,463,288               0.490322   0.4999063  0          1
Couples                              9,463,288               0.8119658  0.3907395  0          1
Married                              9,463,288               0.5169789  0.4997117  0          1
Cohabitation                         9,463,288               0.2949869  0.4560369  0          1
Number of children (living at home)  9,463,288               1.033661   1.136673   0          14
Education                            9,463,288               4.111415   1.684115   0          8
Savings                              9,463,288               1.189214   12.14868   -23.52442  24.29169
Income                               9,463,288               13.43241   1.579401   -20.25904  22.17951
Gross wealth                         9,463,288               13.29345   2.325968   -15.61227  24.40223
Overall debt                         9,463,288               12.7741    3.985036   0          21.110474
Risky assets                         9,463,288               0.4461609  0.4970929  0          1
Lottery winnings                     9,463,288               0.0395379  0.6823597  0          19.35792
Inheritance and gifts                9,463,288               0.3892018  2.191519   -13.66357  21.96955
Although the estimation of savings has reduced the number of observations in the analysis, it still provides a decent presentation of representative households for the purpose of the thesis.
The next table presents important and meaningful differences in men’s and women’s background characteristics. One disadvantage of data collected at the household level is that it is not possible to tell much about the financial well-being of individuals within couples, which makes gender differences most evident in households of single individuals. The following table presents the descriptive statistics of all men and women in the data set, as well as of the single ones.
Table 7: Descriptive statistics of gender differences (absolute values)
                                     All (mean)            Singles (mean)
Variables                            Men        Women      Men        Women
Age                                  41.3329    41.39392   37.46912   39.99625
Number of children (living at home)  0.9074392  1.164852   0.079084   0.4557442
Tertiary education                   0.2765235  0.3314512  0.2484007  0.3505137
Savings                              -51239.39  -44626.32  -19409.73  -5800.33
Income                               460043     479479.9   214191.6   204854.8
Gross wealth                         992656.4   1078933    410487.6   356362.1
Overall debt                         976887.6   1006915    412965.8   372973.8
Risky assets                         0.437222   0.4554527  0.2609014  0.2372562
The table shows that women have more children living at home than men, and that there are more single mothers than single fathers. Women are more highly educated than men, but single women are less likely to hold risky assets than single men. It is worth noting that the latter only applies to single women and not to the entire sample of women in the dataset. By getting married, a woman becomes entitled to a share of aggregated household labor income. Compared to a single woman, this may decrease the overall risk in her asset position and increase the share of risky financial assets in the portfolio. Savings are negative across all households due to new loans and mortgages, and more negative for single men than for single women. This may be due to single men taking up more loans, i.e. having higher levels of debt than single women. Further, single men have higher levels of income, gross wealth and overall debt than single women.
The last table presents information on households’ financial savings, which include mutual funds, bank deposits, bonds and down-payment of debt. The table shows the average amount placed in these assets, as well as the percentage share of households that hold and save in each particular asset. The descriptive statistics of how men and women save are presented in the following table:
Table 8: Descriptive statistics of how men and women save
                                         All (mean)          Singles (mean)
Variables                                Men       Women     Men       Women
Mutual funds                             23950.25  25969.69  12236.99  8478.126
  Percentage share with this asset       31.32     32.98     17.29     17.82
  Percentage share saving in this asset  40.50     41.95     32.02     32.04
Bank deposits                            190660.1  207055.6  121538.2  100935.9
  Percentage share with this asset       97.64     98.39     93.24     95.13
  Percentage share saving in this asset  99.03     99.38     97.10     98.07
Bonds                                    6170.159  7069.439  3522.537  3689.568
  Percentage share with this asset       7.12      7.58      3.61      5.37
  Percentage share saving in this asset  15.79     16.12     18.29     19.41
Overall debt                             976887.6  1006915   412965.8  372973.8
  Percentage share with this asset       92.17     92.64     81.12     82.02
  Percentage share saving in this asset  100       100       100       100
Without controlling for any characteristics, the table indicates that single men have larger amounts placed in mutual funds and bank deposits than single women, while single women place more than single men in bonds and have lower levels of debt. However, although the differences are relatively small, the statistics indicate that a larger share of single women than of single men save in mutual funds, bank deposits and bonds. The 100 percent share saving by repaying debt is due to inflationary gains.
4 Empirical method
The estimation of saving follows the empirical methodology of ordinary least squares (OLS) and fixed effects regression. This section presents their differences and a set of assumptions under which the models provide appropriate estimators of the regression coefficients. The assumptions follow Stock and Watson (2015) and the approach of Berg and Aarestad (2014). Further, the OLS assumptions are stated for cross-sectional data, while the fixed effects regression extends these assumptions, exploiting the panel dimension of the data. The hyperbolic transformation presented in section 3.1.1 makes these models applicable to this research.
4.1 Ordinary least squares (OLS)
Ordinary least squares (OLS) is a method used in regression analysis where the purpose is to explain the variation in a dependent variable with the variation in one or more independent explanatory variables. This is done by estimating the following equation:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + \varepsilon_i, \quad i = 1, \dots, n \qquad (21)$$
where $Y_i$ is the i-th observation on the dependent variable; $X_{1i}, X_{2i}, \dots, X_{ki}$ are the i-th observations on each of the k regressors; and $\varepsilon_i$ is the error term.
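As an illustration of equation (21) in its simplest case, the sketch below estimates a regression with a single regressor (k = 1) using the closed-form OLS solution. The data and function name are hypothetical, and the analysis itself uses many regressors and a statistical package rather than hand-written code.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def ols_simple(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Closed-form OLS for Y_i = beta0 + beta1 * X_i + e_i."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta1 = s_xy / s_xx
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1


# Hypothetical observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

beta0, beta1 = ols_simple(x, y)

# With an intercept in the model, the OLS residuals sum to zero.
residuals: List[float] = [yi - (beta0 + beta1 * xi) for xi, yi in zip(x, y)]
print(beta0, beta1, sum(residuals))
```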
The method is used to estimate the parameters of a linear regression model, where the estimators minimize the sum of squared errors, that is, the squared differences between the observed values of the dependent variable and the values predicted by the model. The OLS regression performs the estimation by treating the dataset as multiple independent observations, showing the relationship between different levels of saving and the explanatory variables. For the method to give meaningful results, the following assumptions must be satisfied.
Assumption 1: The regression function is linear in parameters. This assumption entails that the process I estimate is a linear relationship, which means that a one-unit increase in the independent variable changes the expected value of the dependent variable by the amount $\beta$. When estimating a non-linear relationship between a dependent and an independent variable, the relationship can be made linear by transforming the variables, for example by taking logs. This was shown by equation (20) in section 3.1.1. A typical example of this relationship is the constant elasticity model:
$$\ln Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 \ln X_{2i} + \dots + \beta_k X_{ki} + \varepsilon_i \qquad (22)$$
Further, using many dummies, for example one for each age, relaxes the assumption of a linear relationship compared to using age as a quantitative variable in the regression (i.e. assuming that saving is linear in age). This relationship is illustrated later in figure 2a.
Assumption 2: $(X_i, Y_i)$, $i = 1, \dots, n$, are independent and identically distributed (i.i.d.) across observations. This means that the sample must be randomly selected and representative of the underlying population. This is important for external validity, as the sample chosen for the analysis must be selected in a way that makes it representative, so that the results can be related to the population in the best possible way.
Assumption 3: No perfect multicollinearity. This implies that none of the explanatory variables is a perfect linear function of other explanatory variables.
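A common way to violate assumption 3 is to include a dummy for every category together with an intercept. The sketch below uses a hypothetical four-observation design matrix to show that X'X then becomes singular, and that dropping one category (as is done with the base age and base year) removes the problem. The helper names are illustrative.

```python
from typing import List

Matrix = List[List[int]]


def gram(rows: Matrix) -> Matrix:
    """Compute X'X for a design matrix given as a list of rows."""
    k = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]


def det(m: Matrix) -> int:
    """Determinant by Laplace expansion; adequate for small matrices."""
    if len(m) == 1:
        return m[0][0]
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )


# Columns: intercept, male dummy, female dummy (hypothetical observations).
full = [[1, 1, 0], [1, 0, 1], [1, 0, 1], [1, 1, 0]]

# The intercept equals male + female in every row: perfect multicollinearity.
det_full = det(gram(full))

# Dropping the male dummy makes men the base category.
reduced = [[r[0], r[2]] for r in full]
det_reduced = det(gram(reduced))
print(det_full, det_reduced)
```

A zero determinant means that X'X cannot be inverted, so the OLS coefficients are not uniquely determined. With one category omitted the determinant is non-zero and the estimator is well defined.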
Assumption 4: The conditional mean of the error term is zero. This assumption is a statement about the “other factors” contained in the error term and asserts that these other factors are unrelated to the explanatory variables, i.e. that the mean of the distribution of these other factors is zero for any values of the explanatory variables. Formally this is written as:
$$E(\varepsilon_i \mid X_{1i}, X_{2i}, \dots, X_{ki}) = 0 \qquad (23)$$
This assumption is the most important to consider in practice. Violating it introduces a systematic error into the model, making the estimates biased and inconsistent.
Assumption 5: Spherical errors; there is homoscedasticity and no serial correlation in the error terms. Homoscedasticity means that the error terms in the regression should have the same variance. Error variance is a measure of the model’s uncertainty, where homoscedasticity implies that the uncertainty is identical across all observations.
$$\mathrm{Var}(\varepsilon_i \mid X_1, \dots, X_k) = \sigma^2 \qquad (24)$$
For no serial correlation, the error terms of different observations should not be correlated with each other:
$$E(\varepsilon_i \varepsilon_j \mid X_1, \dots, X_k) = 0, \quad i \neq j \qquad (25)$$
Assumption 6: Normality; the error terms are normally distributed conditional upon the explanatory variables and follow:
$$\varepsilon \sim N(0, \sigma^2) \qquad (26)$$
If the first four assumptions are satisfied, then the OLS estimators are unbiased estimators of the population parameters. The Gauss-Markov theorem states that if assumptions 1-4 hold and the error terms are homoscedastic, then the OLS estimator is the best linear unbiased estimator.
In order for the results to be unbiased and consistent, dependence or other systematic errors cannot be present in the model’s error terms. This is generally checked by examining the error terms more closely.
Using an approximately similar dataset, Berg and Aarestad (2014) use the xtserial test in Stata to examine the presence of serial correlation. They find that the second part of assumption 5 is violated, i.e. that the error terms are correlated, and use a clustering method on the panel indicator. This implies that the variance and the standard errors in the model, which originally are computed by assuming completely independent observations, now