Using microdata as a basis for long term projections of hospital care spending: the added value of more detailed information
Health Economics Review volume 15, Article number: 25 (2025)
Abstract
Background
Component-based projections are commonly used to predict future growth in healthcare spending. The current study aimed to compare pure component-based projections to projections using microlevel data to investigate their added value.
Methods
The microdata was used to find disease-specific time trends in the number of patients that use hospital care and in annual per patient hospital spending (APHS). Total expenditure projections were then based on APHS and hospital use per disease category combined with demographic projections. As comparator, we used projections with a composite growth term derived from total spending time trends. Furthermore, extensive uncertainty analyses were performed.
Results
Time trends were present both in hospital care usage and in annual per patient hospital spending (APHS) for most disease groups. What is known as the “residual growth” category in many projections of healthcare spending can be split into these two time trends, offering more insight into their sources. The advantage of explicit modeling as done in this paper is that trends in usage and per patient spending can be separated. The use of microdata allowed further refinement of component-based models for projections in healthcare spending and a more elaborate analysis of uncertainty surrounding these projections.
Conclusions
We found time trends in both hospital care usage and APHS in most disease groups. Incorporating these trends into cost projections for various disease groups results in more conservative estimates of future hospital spending compared to merely using demographic projections of per capita costs and adjusting them for observed historical growth. The use of microdata for component-based modelling has benefits but also downsides. An advantage of using microlevel data is that individuals can be followed over multiple years; a downside is the vast amount of computing power and time needed to perform these extensive analyses. Our results could help policy makers adjust hospital (staffing) capacity based not purely on demographic changes but also on observed trends in the use of specific types of hospital care, per disease.
Introduction
The world population will age at an increasing pace [1]. This holds especially for the western world: for example, the share of elderly will reach 29% in the USA [2] and 30% in the European Union in 2100 [3]. This demographic shift in the population could have consequences for healthcare utilization and healthcare expenditures, although the direction and extent of the change are subject to debate (e.g. [4,5,6,7,8]).
To adequately address future challenges, high-quality projections of future health cost growth are required. The European Commission uses different scenarios for future healthcare expenditure growth. One scenario estimate is that, due to higher life expectancy, healthcare expenditures will increase in member states by 1.1 percentage points of GDP between 2016 and 2070, implying growth of 20%. The underlying assumption is that the number of years that people live in good health will not change but life expectancy will increase. Another scenario assumes healthy aging, i.e., the number of years in good health increases with life expectancy. In that scenario healthcare expenditures increase by 0.3 percentage points of GDP, or 5% growth over the same period [9]. The difference between the two scenarios highlights the importance of assumptions on future epidemiological trends. Both projections assume constant healthcare use given disease. However, next to demographic and epidemiological trends, other developments influence future healthcare spending, including technological change, changes in healthcare systems and medical practice.
Projections of future healthcare spending are possible at different levels of aggregation, depending on the research or policy questions to be answered. In an overview of projection models used in the OECD, Astolfi et al. [10] distinguish models at three levels: micro-level models taking the individual as the unit of analysis; component-based models where the unit of analysis is a group, e.g., the population is split into age-sex categories; and macro-level models with the whole sector as the unit of analysis.
The aggregation level of the models determines how detailed the process of generating healthcare expenditure can be modelled, and which insights from the model can be gained. The main purpose of macro-level models is to forecast total healthcare expenditures in relation to historical trends. Data should be available for a longer period so that the trend can be estimated. The projections are either purely based on historical healthcare expenditure data [11,12,13] or take estimated future trends of other macro-level variables into account, such as GDP, population size and changes in the demand for healthcare services [14,15,16].
Component-based models offer insights on group-level effects in the population, such as the relative increase of the elderly population due to aging of the society. Models in this category often take the national demographic forecasts concerning population sizes by age and sex category as input and keep all other variable values constant (e.g. healthcare usage rates) to demonstrate the effects of demographic trends on healthcare expenditure [17,18,19,20]. More sophisticated models, e.g. [4, 21,22,23,24] include additional trends, such as mortality or morbidity rates per age and sex group to account for epidemiological trends too.
Micro-level models simulate the entire process from individual health behavior and health risks, such as smoking, through developing a disease up to total healthcare demand and expenditures [25,26,27,28]. These models can be used to evaluate the expected effect of changes in lifestyle or risk factors on healthcare spending and analyze the potential effect of policy interventions. Micro-level models require individual microdata on a wide range of variables to construct a realistic representation of life courses related to health. These models can be extensive and are usually developed for policy evaluation rather than projection of future spending.
The current study aims to demonstrate how individual microdata can be applied to enrich a component-based model to improve projections of total healthcare spending, without the development of a complete micro-level model. Special attention is paid to trends in costs by disease group and usage of care over time as aspects that can be included in projections based on microdata. Projections accounting for trends in costs and in usage of care were compared to the alternative model that uses a correction factor based on unexplained historical growth.
Most studies on healthcare spending projections pay little attention to the uncertainty surrounding their projections. This uncertainty originates from multiple sources. One main source of uncertainty is that the future is uncertain, that is, exogenous future trends have high uncertainty. For example, future migration patterns are very uncertain [29]. A second source of uncertainty arises from the selection of models applied to estimate components of current spending, that is, structural uncertainty. A third source of uncertainty is parameter uncertainty reflecting that the models are estimated from a sample and hence have uncertain coefficients. In a component-based model, several uncertain elements are multiplied and added. Estimating the full compounded uncertainty is another aim of the current study.
The current study uses a three-step approach, combining population forecasts of Statistics Netherlands with new projections of hospital care use and detailed trends on annual per patient hospital spending (APHS) estimates. Attention was paid to the role of uncertainty and time trends in APHS and care usage. This research is focused on hospital care as the largest healthcare sector in terms of expenditure in the Netherlands [30].
Methods
A component-based model enriched with microdata
For the estimations of future hospital care spending we looked at the main cost drivers of care: the number of people making use of care and the costs of the care that is delivered. To determine the number of people making use of hospital care we looked at the total number of people in the population and the share and trends in hospital usage per disease (group). When looking at the costs of the care delivered we looked at the amount of care that is delivered in a year for a certain disease per patient, and the trends in these costs.
Our hospital spending projections were thus based on three main elements (Fig. 1): a population forecast, a model of hospital use over time and a model for per patient annual hospital spending over time. These elements were combined in a cost-of-illness projection and were compared to a common alternative projection method. All analyses were performed with 2019 prices, and spending data from years before 2019 were converted to 2019 price levels using the GDP deflator to correct for the effect of inflation. Future expenditures were also projected in 2019 prices.
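As a minimal sketch of the price conversion described above, the following Python snippet rescales nominal spending to 2019 price levels. The deflator index values, the spending amounts and the function name `to_2019_prices` are hypothetical and serve only as illustration; the analyses in this study were performed in R.

```python
def to_2019_prices(nominal_spending, deflator_index, base_year=2019):
    """Rescale nominal spending per year to base-year price levels.

    Both arguments are dicts keyed by calendar year. The deflator index
    may have any reference level; only ratios between years matter.
    """
    base = deflator_index[base_year]
    return {
        year: amount * base / deflator_index[year]
        for year, amount in nominal_spending.items()
    }


# Hypothetical deflator index values and spending amounts (not study data).
deflator = {2012: 92.0, 2015: 95.0, 2019: 100.0}
nominal = {2012: 1000.0, 2015: 1100.0, 2019: 1200.0}

real_2019 = to_2019_prices(nominal, deflator)
for year in sorted(real_2019):
    print(year, round(real_2019[year], 2))
```

Spending in the base year is left unchanged, while earlier years are scaled up by the ratio of the 2019 index to that year's index.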
Traditional component-model based projections; the comparator
To validate and compare our projections with existing literature we applied a traditional component-based model with an unexplained growth factor as a comparator. Equation A.1 in Appendix A describes how hospital spending projections were calculated in previous component-based model analyses for the Dutch Cost of Illness studies. This method has been used in other studies [31, 32] and is based on group-level data. In essence, the Dutch Cost of Illness projection combines population projections with per person, as opposed to per patient, annual spending, both stratified by age and sex. A correction factor for residual growth is then applied. The correction factor is calculated as the difference between time trends as suggested by initial projections and actual time trends, based on historical data. This is again done at the group and disease level. Usually, the trend is calculated by comparing the most recent to the oldest historical data available. Equation 2.1b represents this approach, called the comparator approach in the rest of the paper. Projections with the comparator approach were calculated with the same data as this study's main projections.
Data and data preprocessing
A rich microlevel dataset, containing individual level hospital claims data over the years 2012–2019, was provided by the Dutch Healthcare Authority (NZa). The dataset covers the whole Dutch population and contains information on diagnosis, age, sex and hospital spending in euros. Claims on intensive care days and certain expensive medicines that are not linked to a diagnosis were ignored. We also excluded claims with missing patient identification numbers. Furthermore, we included only patients with an address in the Netherlands so that our analyses are consistent with the population forecasts of Statistics Netherlands. A small proportion of patients had an unknown sex and were consequently excluded from the analyses. For detailed statistics of the claims included in and excluded from the analyses, see Appendix B.
The final dataset, covering about 80% of total hospital spending [33], was categorized by disease and disease group. The grouping was based on the Dutch Cost of Illness studies [30, 34], which distinguishes 17 overarching disease groups with about 130 diseases defined by ICD10 codes, and a rest group to contain all remaining hospital spending (Appendix C). Based on specific Medical Specialty Diagnosis Codes, as available in the data, ICD10 codes and consequently disease (groups) were assigned to each claim. Every year, a disease (group) was assigned to a patient when the patient had a minimum of one claim in that year for the given disease (group). This way, disease groups were assigned to claims and to individuals.
Analyses were performed for all (17) disease groups and the rest group, and for all eight diseases within the group of cardiovascular diseases as an illustrative detailed example. Projections over the 17 disease groups and the rest group were then added to present total projected hospital spending (excluding intensive care and part of expensive medicines). All spending outcomes were presented in 2019 pricing. For all analyses in this study, the statistical software R version 4.2.2 and RStudio 2022 were used. For data preparation, the packages data.table, tidyverse, fst and plyr were used; for analyzing we used stats, splines and gamlss; while scales, patchwork and ggplot were used for plotting [35,36,37,38,39,40,41,42].
The population forecast by Statistics Netherlands (CBS) was used to project future demography [43]. This forecast consists of projections until 2070 by age and sex category for ages 0 to 99+. For our current study, projections until 2050 were used. As only active military personnel (40 thousand people) are exempt from the mandatory health insurance in the Netherlands, we assume the population forecast is equal to the insured population forecast.
Total hospital spending projections
First, both methods of projection were formalized. The projection using microdata calculates total hospital spending of disease (group) \(k\) in year \(y\) by aggregating the estimated spending in disease (group) \(k\) over all sex and age categories (Fig. 1):
\[ {DC}_{y,k} = \sum_{s}\sum_{a=0}^{99+} {n}_{y,a,s} \cdot {p}_{y,k,a,s} \cdot {APHS}_{y,k,a,s} \qquad (2.1\text{a}) \]
where \({DC}_{y,k}\) are the total costs in year \(y\) for disease (group) \(k\). The index \(s\) stands for sex (male or female) and \(a\) for a person's age, ranging from zero to 99+, with ages above 99 aggregated into the 99+ group. Furthermore, \({n}_{y,a,s}\) denotes the estimated number of persons in the population, \({p}_{y,k,a,s}\) denotes the proportion of these persons visiting a hospital for disease \(k\) in year \(y\), and \({APHS}_{y,k,a,s}\) denotes the estimated annual per patient hospital spending in year \(y\) for disease (group) \(k\) in the age-sex category \(a\),\(s\).
The comparator formulation, with 2019 as the base year, can be formalized as follows:
\[ {DC}_{y,k} = {r}_{k}^{y-2019} \sum_{s}\sum_{a=0}^{99+} {n}_{y,a,s} \cdot {p}_{2019,k,a,s} \cdot {APHS}_{2019,k,a,s} \qquad (2.1\text{b}) \]
where \({r}_{k}^{y-2019}\) is the correction factor for residual growth. Further details on the calculation and interpretation of \({r}_{k}\) are described in Appendix A. Note that \(p\) and \(APHS\) are set to their estimated values for the base year (in our analysis 2019). While \(p\cdot APHS\) is often combined into a single estimate of spending per person, rather than per patient, in this study we split the two explicitly for comparability to 2.1a.
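The two projection rules described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the study's R implementation: the function names, the two age-sex cells and all numbers are hypothetical, and \(p\) is expressed as a proportion between 0 and 1.

```python
def project_microdata(n, p, aphs):
    """Microdata-based rule: sum n * p * APHS over all (age, sex) cells."""
    return sum(n[cell] * p[cell] * aphs[cell] for cell in n)


def project_comparator(n_future, p_base, aphs_base, r, years_ahead):
    """Comparator rule: future population with base-year p and APHS,
    scaled by the residual growth factor r raised to the years ahead."""
    demographic = sum(
        n_future[cell] * p_base[cell] * aphs_base[cell] for cell in n_future
    )
    return (r ** years_ahead) * demographic


# Hypothetical inputs for one disease group and two (age, sex) cells.
n_2050 = {(60, "m"): 1000, (60, "f"): 1100}
p_2019 = {(60, "m"): 0.10, (60, "f"): 0.08}
aphs_2019 = {(60, "m"): 2000.0, (60, "f"): 1800.0}
p_2050 = {(60, "m"): 0.07, (60, "f"): 0.06}        # after a downward usage trend
aphs_2050 = {(60, "m"): 1950.0, (60, "f"): 1750.0}  # after a small APHS trend

demographic_only = project_microdata(n_2050, p_2019, aphs_2019)
with_trends = project_microdata(n_2050, p_2050, aphs_2050)
comparator = project_comparator(n_2050, p_2019, aphs_2019, r=1.01, years_ahead=31)
print(demographic_only, with_trends, comparator)
```

Setting `r = 1` in the comparator reproduces the purely demographic projection, which makes the role of the residual growth factor easy to isolate.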
The next subsections explain how each component of these projections was estimated from the microdata, including information on the uncertainty of these estimates.
The distribution of hospital costs of disease-groups
Uncertainty analysis requires the distribution of hospital spending of age-sex-diagnosis groups (Appendix D). It should reflect both the uncertainty in the average per patient spending (\(APHS\)) and in the number of patients using hospital care (\(p\), see section below). A specific subgroup of age and sex within a disease group is used for illustration. For readability, subscripts \(a\), \(s\) and \(k\) were omitted; the results of this paragraph apply to the population corresponding to those subscripts. The total costs of a group \(k\) consisting of \(N\) persons, where each person \(i\) has a random annual per patient hospital spending \(APH{S}_{i}\), is then defined by
\[ {AHS}_{total,N} = \sum_{i=1}^{N} APH{S}_{i} \]
To estimate the distribution of \({AHS}_{total,N}\), total spending for disease \(k\), it is important to note that both \(N\) and \({APHS}_{i}\) are stochastic. The number of care users \(N\) was assumed to be binomially distributed, given a demographic group of size \(n\), and assuming each person has an identical probability \(p\) of using hospital care in a certain year. Furthermore, each user \(i\) has random annual hospital spending \({APHS}_{i}\). The claims data showed that the distribution of \({APHS}_{i}\) at the individual level does not follow a smooth shape, since it shows peaks at multiples of the underlying DRGs. It is also strongly right-skewed.
The random variable of total costs \({AHS}_{total,N}\) can be split into a mixture of sums of fixed length \(j\). This is done using the law of total probability. Each term in this mixture is a sum \({\sum }_{i=1}^{j}APH{S}_{i}\), weighted by the probability that exactly \(j\) persons use hospital care. Sums of a fixed number of variables are well approximated by a normal distribution, by the central limit theorem. Given the expected expenditure \(\mu\) of an individual and its variance \({\sigma }^{2}\), the sum \({\sum }_{i=1}^{j}APH{S}_{i}\) is approximated by a normal distribution with mean \(E({\sum }_{i=1}^{j}APH{S}_{i})=j\mu\) and a standard deviation \(\sqrt{j}\sigma\). Following the steps in Appendix D shows that \({AHS}_{total,N}\) is estimated by
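The expression itself is not reproduced in this text. A reconstruction that follows from the steps just described (binomial weights for the number of users \(j\), and a normal approximation for each fixed-length sum) is sketched below. It is presumably the expression referred to as Eq. (2.2.2); treating the \(j=0\) term as a point mass at zero is an assumption and may differ from Appendix D.

\[ \text{P}\left({AHS}_{total,N}\le z\right) \approx {(1-p)}^{n}\,\mathbb{1}[z\ge 0] + \sum_{j=1}^{n} \binom{n}{j} {p}^{j} {(1-p)}^{n-j}\, \Phi\left(\frac{z-j\mu}{\sqrt{j}\,\sigma}\right) \]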
Here \(\text{P}\left({AHS}_{total,N}\le z\right)\) is the probability that the annual hospital care spending of group \(k\) is smaller than or equal to \(z\), and \(\Phi\) is the cumulative distribution function of the standard normal distribution. Note that \(n\) is the total number of persons in a certain age-sex group, while \(p\) is the probability for each of them to be using hospital care for disease \(k\).
In summary, the parameters \(\mu\), \(\sigma\) and \(p\) fully characterize the approximated distribution of total annual spending for disease-group \(k\) for a demographic subset using Eq. (2.2.2). These parameters were estimated from the available microdata for each calendar year, by age and sex group.
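To make the role of \(\mu\), \(\sigma\) and \(p\) concrete, the approximated distribution can be evaluated numerically. The Python sketch below weights a normal approximation for each possible number of users \(j\) by its binomial probability, as described above. The function names and the example parameter values are hypothetical, and the handling of \(j=0\) (total spending is exactly zero) is an assumption.

```python
import math


def normal_cdf(x):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def total_spending_cdf(z, n, p, mu, sigma):
    """Approximate P(total spending <= z) for a demographic group of n persons.

    n: group size; p: probability of using hospital care;
    mu, sigma: mean and standard deviation of per patient spending.
    """
    total = 0.0
    for j in range(n + 1):
        # Binomial probability that exactly j persons use hospital care.
        weight = math.comb(n, j) * p**j * (1.0 - p) ** (n - j)
        if j == 0:
            # Assumed handling: with no users, total spending is exactly zero.
            term = 1.0 if z >= 0 else 0.0
        else:
            # Normal approximation for the sum of j individual spendings.
            term = normal_cdf((z - j * mu) / (math.sqrt(j) * sigma))
        total += weight * term
    return total


# Hypothetical group: 200 persons, 10% users, mean spending 2000, sd 3000.
for z in (20_000.0, 40_000.0, 60_000.0):
    print(z, round(total_spending_cdf(z, n=200, p=0.1, mu=2000.0, sigma=3000.0), 3))
```

The loop over all \(j\) is practical for a single age-sex cell. For large groups one would likely truncate the sum to values of \(j\) with non-negligible binomial weight.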
The projected number of patients in hospital care
Logistic regression was used to model the probability of hospital care use for each disease (group) \(k\). For simplicity the subscript \(k\) is omitted, but all analyses were performed stratified by disease (group).
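The model equation (2.3) is not reproduced in this text. A form consistent with the description that follows is sketched here as an assumption, not as the authors' exact specification. In particular, \({D}_{2015}\) is a hypothetical name for the dummy that equals 1 in calendar year 2015 and 0 otherwise, and any intercept is taken to be absorbed in the spline term.

\[ \operatorname{logit}\left({p}_{y,a,s}\right) = b{s}_{1}(a, s) + \tau\, y + \omega\, {D}_{2015} \]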
The function \(b{s}_{1}(a, s)\) was modelled as a B-spline. A separate B-spline was fitted for each sex with knots at ages 5, 20, 35, 50 and 70 years. Furthermore, \(y\) is the continuous variable for time trend effect (calendar year, where 2012 is 0), with coefficient \(\tau\), while a dummy was used to account for the effect of administrative changes of claim registration in year 2015, with coefficient \(\omega\). For the population size per year by age and sex, \({n}_{y,a,s}\), the total number of insurees in the dataset was used to fit the model.
This model predicts the individual probability of use of hospital care for disease \(k\) by age and sex, for year \(y\). Separate models were estimated for each disease and for each disease group. The age knots of the B-spline were selected so as to represent relative variations in care utilization with age for a wide spectrum of diseases. Only when these knots very clearly did not fit, that is, when models did not converge, was an alternative location of knots applied (e.g. pregnancies and perinatal conditions). By assuming that the trend \(\tau\) is constant in the future, hospital usage can be projected towards the future by disease (group) for each age/sex category.
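The extrapolation step can be illustrated with a short Python sketch. It assumes a fitted model on the logit scale with a linear calendar-year term and a 2015 dummy, as described above. The function name, the value `baseline_logit` (standing in for the fitted spline value of one age-sex cell) and the coefficient values are hypothetical.

```python
import math


def usage_probability(baseline_logit, tau, year, omega=0.0,
                      base_year=2012, dummy_year=2015):
    """Probability of hospital use for one age-sex cell in a given year.

    baseline_logit: assumed fitted spline value for the cell (logit scale).
    tau: coefficient of the linear time trend, with base_year coded as 0.
    omega: coefficient of the dummy for the administrative change in 2015.
    """
    y = year - base_year
    dummy = 1.0 if year == dummy_year else 0.0
    eta = baseline_logit + tau * y + omega * dummy
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical cell with a slightly negative usage trend.
for year in (2019, 2030, 2050):
    print(year, round(usage_probability(-2.0, tau=-0.02, year=year), 4))
```

Because the trend is linear on the logit scale, projected probabilities stay between 0 and 1 however far the trend is extrapolated.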
Estimation of annual per patient hospital spending
Annual per patient hospital spending had a right skewed distribution and hence a generalized linear model with a gamma distribution was used, where both the scale \({\theta }_{a,s,y}\) and shape \({\kappa }_{a,s,y}\) parameters were estimated as functions of age, sex and calendar year.
Similar to the logistic model of equation 2.3, splines were applied for both the scale and the shape parameter of the gamma distribution to model nonlinear effects of age and sex on costs. Effects of calendar year were assumed to be linear and included an additional dummy to represent 2015 exception effects, due to administrative changes.
As much as possible, this same model was applied to all diseases and disease groups. However, for coronary heart disease, heart failure, stroke, peripheral arterial vascular disease and arrhythmias, the standard knots were not applicable because of the low number of observations below the age of 20 years. For these diseases, only knots at ages 35, 50 and 70 years were applied in Eqs. 2.4.1b and c. Furthermore, the disease groups ‘pregnancy, childbirth and puerperium’ and ‘conditions originating in the perinatal period’ only occur in certain age and sex groups. For these two disease groups, the splines were replaced by dummy variables: instead of the continuous variable of age \(a\), age categories \({a}_{cat}\) were applied. For ‘pregnancy, childbirth and puerperium’ these were 10-year age categories until 50 years and a group 50+. For ‘conditions originating in the perinatal period’ these were the categories 0, 1, 2, 3, 4, 5–9 years, and 10+ years.
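After fitting the models for each disease \(k\), the estimated mean and standard deviation of per patient hospital spending were computed for each age and sex using the fitted Gamma distribution’s shape and scale parameters by
\[ {\mu }_{a,s,y} = {\kappa }_{a,s,y}\,{\theta }_{a,s,y}, \qquad {\sigma }_{a,s,y} = \sqrt{{\kappa }_{a,s,y}}\;{\theta }_{a,s,y} \]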
These values could then be substituted into formula (2.2.2).
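For a Gamma distribution with shape \(\kappa\) and scale \(\theta\), the mean is \(\kappa \theta\) and the standard deviation is \(\sqrt{\kappa }\,\theta\). The Python helper below, with a hypothetical name and hypothetical parameter values, shows this conversion for a single age-sex-year cell.

```python
import math


def gamma_mean_sd(shape, scale):
    """Mean and standard deviation of a Gamma(shape, scale) distribution."""
    mean = shape * scale
    sd = math.sqrt(shape) * scale
    return mean, sd


# Hypothetical fitted parameters for one age-sex-year cell.
mu, sigma = gamma_mean_sd(shape=4.0, scale=500.0)
print(mu, sigma)
```

Note that software packages differ in how they parameterize the Gamma distribution (for example mean and dispersion instead of shape and scale), so fitted parameters may need converting before this step.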
Validation
To validate our results, a sensitivity analysis was performed with trends estimated based on the period 2012–2017. The resulting projections for the years 2018–2019 could then be compared to actual observed hospital care use, per patient hospital spending and total hospital spending.
Additionally, model fit was compared to input data for each model separately, and our results were cross-validated against other published projections of Dutch hospital spending.
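The hold-out validation described above can be sketched as follows. The snippet fits a trend on 2012–2017, projects 2018–2019 and compares the projections with the observed values. It uses a simple log-linear least-squares fit as a stand-in for the study's models, and both the data series and the function names are hypothetical.

```python
import math


def fit_log_linear(years, values):
    """Least-squares fit of log(value) on year; returns (intercept, slope)."""
    n = len(years)
    x_mean = sum(years) / n
    logs = [math.log(v) for v in values]
    y_mean = sum(logs) / n
    sxx = sum((x - x_mean) ** 2 for x in years)
    sxy = sum((x - x_mean) * (l - y_mean) for x, l in zip(years, logs))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


def predict(intercept, slope, year):
    """Back-transform the fitted log-linear trend to the original scale."""
    return math.exp(intercept + slope * year)


def mean_abs_pct_error(actual, predicted):
    """Mean absolute percentage error, expressed as a fraction."""
    return sum(abs(p - a) / a for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical series with a 2% annual decline; years coded with 2012 as 0.
years = list(range(8))                        # 0..7 stands for 2012..2019
observed = [100.0 * 0.98**y for y in years]

intercept, slope = fit_log_linear(years[:6], observed[:6])   # fit on 2012-2017
holdout_pred = [predict(intercept, slope, y) for y in years[6:]]
error = mean_abs_pct_error(observed[6:], holdout_pred)
print(error)
```

With real data the hold-out error would of course not vanish; its size relative to competing methods is what indicates which projection approach performs better.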
Results
The Dutch (insured) population has grown from 16.8 million in 2012 to 17.8 million in 2019 (Table 1), while the mean age increased by 1.36 years. The share of people that utilized hospital care for one or more diseases decreased from 41.7% in 2012 to 39.8% in 2019. Hospital care expenditure per patient, expressed in 2019 prices, fluctuated over time.
Data from the year 2015 show a remarkable drop in the percentage of patients using hospital care and in hospital expenditure. This drop is due to an administrative change in the registration of claims.
Annual per patient hospital spending
Figure 2A shows the average annual per patient hospital spending (APHS) by age and sex for the disease group ‘circulatory system’, while Fig. 2B shows how the standard deviation of APHS varies with age and sex. The model predicts the mean APHS by age and sex well, but clearly underestimates the standard deviation. Similar results were found for most disease groups. Experimenting with different model specifications did not improve the fit of the standard deviation.
Table 2 summarizes the results of the time trends in the model (model 2.4.1) for the disease group cardiovascular diseases and the underlying diseases in that group. In Appendix E the time coefficients for all disease groups can be found. For cardiovascular diseases all time trends were close to one except for the ‘other vascular disorders’, reflecting relatively small trend effects in the observational data over the period 2012–2019.
Figure 3 shows the effect of accounting for time trends in APHS on spending projections. In these figures, the proportion of patients visiting a hospital is assumed to be constant at the level of 2019. The projection without a time trend in APHS, therefore, corresponds to a simple demographic projection.
The disease group as a whole, ‘diseases of the circulatory system’, shows a small effect of the time trend, reflected by the coefficient of 0.999 in Table 2. Of note, even coefficients close to 1 will result in relatively large differences in projections when applied over a sufficiently long time horizon.
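The compounding effect can be checked with a one-line calculation. The Python sketch below treats the reported coefficient as a multiplicative annual factor, which is an assumption about how Table 2 expresses it. The function name and the second example value (0.99) are hypothetical.

```python
def cumulative_effect(annual_factor, start_year=2019, end_year=2050):
    """Cumulative multiplier after applying an annual factor every year."""
    return annual_factor ** (end_year - start_year)


for factor in (0.999, 0.99):
    print(factor, round(cumulative_effect(factor), 3))
```

Under this assumption, a factor of 0.999 lowers per patient spending by about 3% over the 31 years from 2019 to 2050, whereas a factor of 0.99 would lower it by more than a quarter.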
Number of individuals using hospital care per disease
Figure 4 shows the number of people visiting a hospital in a certain year for a specific disease by age and sex in the case of diseases of the circulatory system. Table 3 summarizes the results of the time trends in the model (equation 2.3) for the disease group cardiovascular diseases and the underlying diseases in that group. In Appendix E the time coefficients for all disease groups can be found. For cardiovascular diseases all time trends were negative. Some other disease groups, such as neoplasms, showed an increasing time trend.
With the inclusion of hospital usage trends, a projected number of 11.9 million people will visit the hospital in 2050 for a specific disease (Appendix F). This includes double counting when a person visits a hospital for two diseases. Ignoring the effect of the time trend (artificially freezing the proportion of people visiting a hospital on the level of 2019 and only applying demographic projection) would result in a projected 14.0 million people visiting for a specific disease per year in 2050. That is, accounting for trends in hospital use will lead to more conservative projections on average.
To illustrate matters further, more details are provided for cardiovascular diseases (Fig. 5). Ignoring hospital use time trends, the number of people visiting the hospital for cardiovascular diseases is projected to grow from 1.05 million in 2019 to 1.44 million in 2050. When time trends are added to the projection, the total number of hospital patients is expected to decrease to 0.73 million in 2050. The disease group cardiovascular diseases includes eight underlying diseases in total. All eight diseases show lower projected numbers of patients after the inclusion of time trends compared to ignoring time trends. For other disease groups (e.g. neoplasms) the projections for the underlying diseases showed a larger variation in time trends (see Appendix G).
Cardiovascular hospital use projections. Projections of the absolute number of individuals using hospital care as projected with and without accounting for time-trends in the percentage of individuals using hospital care for the overarching disease group cardiovascular diseases (H7) and the underlying diseases
Note: The numbers of people in the different diseases do not necessarily add up to the total number of people in the disease group, as people in the disease group can visit the hospital for multiple underlying diseases and hence be included in several subgroups.
Hospital care spending projection
Combining the results on APHS and hospital care usage, projected hospital spending by age and sex over time was calculated. The aggregated results over age and sex are presented. Figure 6 illustrates hospital spending projections for the group of cardiovascular diseases and the underlying diseases. The red line shows spending projections when only demographic change is considered, which would reflect the most common and simple component-based model method. Only taking demography into account, the expected hospital spending on cardiovascular diseases will increase over time due to an increasing and aging population. The green line shows the projected spending according to the method applied in this paper, accounting for a time trend in both the annual per patient hospital spending and the number of persons visiting a hospital. The blue line illustrates projected spending using the comparator method, adjusting the demographic projections with a residual growth factor. This residual growth factor is an estimate for the combined effects of time trends in APHS and in the number of patients visiting a hospital, based on comparing two reference years, 2012 and 2019.
For most diseases within the group of circulatory diseases, the projections of the comparator and of the novel method applied in the current study were in the same direction and were closer to each other than to the demographic projection. However, for stroke and “other disorders of the heart” this was not the case. The advantage of the method used in the current study is that it offers more insight than the comparator method into the source of deviation from the demographic projections. For example, the demographic projections showed an increase in the spending on coronary heart diseases in the coming decades, whereas a decrease was projected once time trends were accounted for. The current study splits this decrease into the effect of a time trend in the number of patients visiting the hospital and a time trend in the APHS. As previous figures showed, the decrease in the spending on coronary diseases was mostly due to the projected decrease in the number of patients visiting the hospital. Estimating time trends using regression models based on all available annual data, rather than comparing two reference years, further influenced the spending predictions over time.
Combining all 18 disease groups, Fig. 7 illustrates total hospital spending projections. When only accounting for demographic changes an increase of 5.3 billion euros was projected over the years 2019–2050 (red line). Including time trends, according to the method presented in the current study, (green line) resulted in a more moderate projected growth of 1.1 billion euros. The comparator method (blue line) showed a higher increase of up to 8.2 billion euros. This major difference in results was due to a difference in projected spending for a couple of large disease groups, such as neoplasm. Results for all disease groups can be found in Appendix H.
Uncertainty
Figure 8 illustrates the uncertainty of the spending projection for specific age and sex groups of patients visiting a hospital with cardiovascular diseases, considering the uncertainty in the number of patients visiting the hospital and in the APHS. The grey areas represent the 95% prediction interval around the projected spending (black lines). Note that the limits and scale of the y axis differ per row in the figure.
The width of the prediction interval increases for smaller groups of patients. Uncertainty is also larger for projections farther into the future. For example, the width of the interval increases from 25 to 38% of the mean projected spending between 2019 and 2050 in the case of 0-year-old male patients. In the case of 60-year-old male patients, the width increases from 6 to 11% between 2019 and 2050. Overall, parameter uncertainty was largest for 0-year-old male patients. Compared to the differences between projection methods or model specifications, the effect of parameter uncertainty was small.
Figure 2B showed that the model on APHS underestimated the standard deviation. This suggests that the calculated uncertainty at population level is an underestimation as well.
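Why intervals widen for smaller groups can be illustrated with the standard mean and variance of a random sum: with \(N\) binomial \((n, p)\) and independent spendings with mean \(\mu\) and standard deviation \(\sigma\), total spending has mean \(np\mu\) and variance \(np{\sigma }^{2}+np(1-p){\mu }^{2}\). The Python sketch below uses these moments with a normal approximation. It covers only the stochastic variation in the number of users and in spending, not the parameter uncertainty analysed in this study, and all names and numbers are hypothetical.

```python
import math


def relative_interval_width(n, p, mu, sigma, z=1.96):
    """Width of an approximate 95% interval as a fraction of mean total spending."""
    mean = n * p * mu
    # Variance of a random sum: E[N] * sigma^2 + Var(N) * mu^2.
    variance = n * p * sigma**2 + n * p * (1.0 - p) * mu**2
    return 2.0 * z * math.sqrt(variance) / mean


small_group = relative_interval_width(n=100, p=0.1, mu=2000.0, sigma=3000.0)
large_group = relative_interval_width(n=10_000, p=0.1, mu=2000.0, sigma=3000.0)
print(round(small_group, 3), round(large_group, 3))
```

In this simplified setting the relative width shrinks with the square root of the group size, so a group that is 100 times larger has an interval that is 10 times narrower relative to its mean.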
Discussion
Main findings
Time trends were present both in hospital care usage and in annual per patient hospital spending (APHS) for most disease groups. What is known as the “residual growth” category in many projections of healthcare spending can be split into these two time trends, offering more insight into their sources. As with all projection methods, it remains fundamentally uncertain whether observed trends over the period 2012–2019 will continue in the future, as the COVID-19 pandemic has illustrated, for example. The advantage of explicit modeling as done in this paper is that trends in usage and per patient spending can be separated. The latter also includes the “technological growth” part that is usually seen as most of the rest-factor.
The use of microdata allowed further refinement of component-based models for projections in healthcare spending and a more elaborate analysis of uncertainty surrounding these projections. These projections illustrate how microdata can enrich cost of illness predictions. Yet, the analysis of uncertainty was limited to parameter uncertainty, while structural uncertainty and uncertainty in demographic projections will contribute most to the overall uncertainty.
The projections predict growth in overall hospital spending over time in the Netherlands and in many underlying disease groups. However, our projections showed a more conservative estimate of total hospital spending growth than the comparator method or the pure demographic projection.
Uncertainty
Multiple sources of uncertainty are encountered. First, the uncertainty of the true distribution causes uncertainty in the observed costs. The decision on what kind of model to use should be based on the distribution of the underlying data and the fit of the model. Using different models can lead to variation in results. In this research, we applied the same basic model as much as possible for all disease groups; this could increase uncertainty but improves comprehensibility.
The current projections are uncertain, being based on only eight years of observations to estimate time trends. Having a limited period to fit a trend can be challenging, especially when policy changes may have affected the response variable. This relates to two different sources of uncertainty. The first concerns the possibility that the eight years of data do not contain enough information to accurately fit the time trend. The second is that, despite a good fit of the trend, costs can be significantly influenced by policy changes. Parameter uncertainty, as captured by the approximation methods, offers only partial insight into this uncertainty. Using models other than the current GLM with Gamma distribution and logarithmic link function and comparing the results would give more insight into the uncertainty inherent in the chosen model, that is, the structural uncertainty, while working with uncertainty ranges around demographic projections would help quantify the uncertainty introduced by these projections. Even though the relatively short time period introduces uncertainty into these projections, we do see added benefits of presenting long-term projections. Medical staffing adjustment policies often take a long time to have an effect, due to the long training time of medical personnel, often more than a decade. Because these adjustments take so long to take effect, it is important to know trends in usage over a longer time period. To address the influence of uncertainty from the time trend estimates, we performed a sensitivity analysis, using only data over the period 2012–2017 to estimate projections for 2018–2019 and then comparing these to actual spending in these years (see Appendix I). Results were satisfactory and showed that the novel method outperformed traditional component-based projections.
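The logic of such a hold-out check can be sketched as follows. This is a minimal illustration under stated assumptions: an ordinary least squares fit on log spending stands in for the Gamma GLM with log link, the spending series is hypothetical, and the function names are not taken from the study.

```python
import math


def fit_log_linear_trend(years, values):
    """OLS fit of log(value) = intercept + slope * year (stand-in for a log-link GLM)."""
    n = len(years)
    logs = [math.log(v) for v in values]
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def backtest(series, last_fit_year):
    """Fit on years up to last_fit_year; return the relative error per held-out year."""
    fit_years = [y for y in sorted(series) if y <= last_fit_year]
    holdout_years = [y for y in sorted(series) if y > last_fit_year]
    intercept, slope = fit_log_linear_trend(fit_years, [series[y] for y in fit_years])
    errors = {}
    for year in holdout_years:
        predicted = math.exp(intercept + slope * year)
        errors[year] = abs(predicted - series[year]) / series[year]
    return errors


# Hypothetical spending series growing by exactly 2% per year
spending = {year: 100.0 * 1.02 ** (year - 2012) for year in range(2012, 2020)}
errors = backtest(spending, last_fit_year=2017)
```

With real data the held-out errors would be non-zero, and comparing them across projection methods gives a simple indication of which method tracks observed spending more closely.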
Use of microdata
The use of microdata for component-based modelling has benefits but also downsides. A positive side of using microlevel data is that individuals could be followed over multiple years, which would allow adding further covariates such as comorbidity or disease history to the models. Determining comorbidities requires microdata that span multiple years, multiple healthcare sectors, and/or medical history. Individual-level data was also indispensable to model uncertainty in APHS.
One of the downsides of working with microdata on population level healthcare spending projections was the vast amount of computing power and time needed to perform these extensive analyses. However, efficient coding approaches could help to increase feasibility. Another downside of the use of microdata is data availability. Due to privacy regulations, it is difficult to get access to individual level microdata. This could influence the replication of research.
Demographic cost projections often do not align with the true growth, requiring correction with an unexplained historical growth component. The current study replaced this correction factor by two explicit models, one for disease-specific prices and one for disease-specific volumes. Our results showed that these disease-specific trends overall implied more conservative projections of hospital spending growth than the comparator method. Hospital use and per patient hospital spending were declining over time for half of the disease groups in our dataset.
Limitations and strengths
The models used in this paper only concerned hospital spending and no other healthcare sector. This was a deliberate choice, since it is the only healthcare sector where a full microdata dataset is available with clear diagnosis codes of all insured in the Netherlands. For other Dutch healthcare sectors, the diagnosis information is not as rich, and it would be harder to follow individuals over a longer period. Further investigation is therefore needed into how and to what extent this methodology can be used in other healthcare sectors. For settings outside the Netherlands, with wider availability of diagnosis information, the approach could be followed in other sectors as well.
Given our data concerning hospital spending, trends in care use referred to visits to the hospital for a certain disease each year, which is different from disease prevalence. Substitution of care, where the location of providing care changes, could influence hospital usage. For instance, when GPs take over tasks from hospitals, hospital usage would decrease, while disease prevalence could stay the same or even rise. When interested in disease prevalence, the whole (health) care system should be considered. This requires data with sufficient diagnostic information in all healthcare sectors.
While our models estimated the mean APHS very close to the real data (Fig. 2A), they showed consistent underestimation of the standard deviation (Fig. 2B). We used a Gamma distribution to model APHS, which is a common approach for modeling right-skewed data. Nevertheless, the tail of the fitted Gamma distribution was lighter than the tail in the actual data. Consequently, the model tends to underestimate the standard deviation. However, even with a standard deviation twice as high, the resulting prediction intervals would still be very narrow, thanks to the very large sample sizes in our dataset. Uncertainty was quantified per disease group on the level of age and sex. The current implementation did not allow for computation of uncertainty at an aggregated level, for example the uncertainty of total costs added over the whole age range for a single disease group. Obtaining such estimates would require applying convolution to the cost distributions by sex and age to find the cost distribution for the whole group. This is possible numerically but was computationally too expensive and hence left for future research. The analyses would take several days and were not expected to offer much further insight, given the small size of the parameter uncertainty compared to the effect of structural uncertainty. It seems more important to focus on further sensitivity analyses using different models for APHS and hospital usage, as well as investigating the influence of various demographic scenarios [44].
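The aggregation step that was left for future research can be sketched numerically as follows. The sketch assumes that the cost distributions of the cells are independent and discretised on a shared cost grid; both are assumptions made for illustration, and the distributions, names and grid unit are hypothetical.

```python
def convolve(pmf_a, pmf_b):
    """Distribution of the sum of two independent costs on a shared grid.

    pmf_a[i] is the probability that cell A costs i grid units.
    """
    total = [0.0] * (len(pmf_a) + len(pmf_b) - 1)
    for i, prob_a in enumerate(pmf_a):
        for j, prob_b in enumerate(pmf_b):
            total[i + j] += prob_a * prob_b
    return total


def quantile(pmf, level):
    """Smallest grid index whose cumulative probability reaches `level`."""
    cumulative = 0.0
    for index, prob in enumerate(pmf):
        cumulative += prob
        if cumulative >= level:
            return index
    return len(pmf) - 1


# Hypothetical cost distributions for two age-sex cells (index = cost in grid units)
cell_young = [0.2, 0.5, 0.3]
cell_old = [0.1, 0.4, 0.4, 0.1]

combined = convolve(cell_young, cell_old)
interval = (quantile(combined, 0.025), quantile(combined, 0.975))
```

The nested loop makes the cost of each pairwise convolution grow with the product of the two grid sizes. Repeating this over all age and sex cells on a fine grid is one plausible reason why such an aggregation becomes computationally expensive.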
Due to evolving healthcare technology the life expectancy of patients with certain diseases is increasing and some diseases are transforming into chronic diseases. This is the case, for example, for certain neoplasms (cancers). As most neoplasms in a chronic phase are still treated in hospital settings this would mean that the total number of patients keeps growing while incidence could be decreasing. For the long term this could influence APHS, and we would expect to see rising usage with decreasing APHS, as often most costs for neoplasms occur during the first years. When such time trends are not well aligned, projections based on historical data could result in large over- or underestimations of spending. This indicates that using linear time trends is always surrounded by large uncertainty, not sufficiently captured in estimates of parameter uncertainty.
Furthermore, projections for each disease (group) should preferably be checked by experts with sufficient knowledge on the disease (group) and its care. For this current methodological paper, no such expert checks were performed, and our results should hence be interpreted with care. Specifically, we aimed at uniformity in the models applied and only used alternative models for a better fit on APHS when the Gamma models did not converge. Using expert panels, a deliberate choice of the best model could be made for each disease group separately. Especially the disease groups where the comparator method and the new method present diverging results (e.g. symptoms and injuries) should be considered with caution and analyzed by healthcare experts to interpret the source of deviation.
In total eight years of data were available for the current study. For some disease groups with a large variation in total spending per year this may be insufficient to estimate time trends. However, the choice of period to estimate a time trend should balance length of follow-up versus stability of the healthcare system. For the Netherlands, data before 2012 is less useful since the DRG system changed considerably pre-2012. Such caveats will introduce uncertainty in the estimated time trend. For the same reason, a dummy variable was applied for 2015 to counter inconsistencies caused by known changes in administration in that year.
The 8-year period used in this analysis is relatively short. Most long-term projections of healthcare spending use longer periods to assess macroeconomic trends (e.g. OECD [45], RIVM [32], Dieleman et al. [14]), but most studies using short-term micro-level data, e.g. microsimulation models (e.g. Gaudette et al. [25], Thiebaut et al. [28]), render projections only for the short term. Our method demonstrates that in combination with long-term historic trends, short-term micro-level data can be incorporated into long-term projections. While additional micro-level data would enhance robustness of individual disease trends, the addition of epidemiological trends appears to enrich current methodologies to estimate long-term health expenditures, as is also reflected in our comparison to real world data over the period 2017–2019 (Appendix I).
A strong point of our approach is the large dataset, which covers almost all Dutch inhabitants. In the Netherlands, health insurance is mandatory, and the size of the dataset implied a relatively small parameter uncertainty. Previous studies on projections ignored uncertainty, while our study highlights the importance of model choice and method of projection for the resulting outcomes.
Policy implications
Policy makers require accurate cost projections to adequately plan and assess policies. Our study aimed to show how the correction factor of unexplained growth could be split into changes in the number of patients and changes in annual per patient hospital spending. What is still not captured in these models are the effects of novel diagnoses and treatments. Partly, such effects should be reflected in the time trends of APHS, but some are hard to estimate from historic data. Our results demonstrate the importance of additional research into the unexplained historical growth component and the value of more explicit models to capture these effects.
The Netherlands has a history of agreements to contain total hospital costs, with the latest agreement aiming to cap real volume growth at 0% in 2026 [46]. Based on our projections, this would be a realistic target, conditional on continuation of current policies, for quite a few disease groups and for total health care spending. However, this may result in unrealistically low hospital usage when currently observed trends in the reduction of use of care per patient are projected into the future. To account for this, careful expert-based reflection on projections per disease category would be needed. Furthermore, novel drugs and treatments could pressure hospital budgets and put additional strain on dwindling hospital margins.
After expert-based validation, our results could support policy makers in planning for increasing or decreasing hospital capacity for the coming years. Expert-based validation could be conducted with field experts for each of the 137 diseases and conditions separately and hence would require many clinicians and epidemiologists. This was outside the scope of the current methodological study. More detailed projections allow policy makers to adjust hospital (staffing) capacity not purely on demographic changes but also based on observed trends in the use of specific types of hospital care. The Netherlands employs explicit nursing and medical specialist training and education planning to ensure adequate staffing in the future. Given the long duration of medical education (6 years in the Netherlands) and additional medical specialization (approximately 6 more years for most medical specializations), long-term projections are valuable. Long-term hospital usage projections, combined with macro-level trends, would be valuable for adequate staffing; mere epidemiological projections may insufficiently capture long-term macroeconomic trends (e.g. productivity, technological innovation), while projections based on historical growth may introduce historical inefficiencies to expenditure growth. Our approach could suggest more conservative staffing planning than currently proposed for some specialties (e.g. cardiovascular care) and an increase for other groups (e.g. neoplasm care) [47].
Our results also indicate disease groups that will be the large cost drivers for the coming decades, such as neoplasms. From a policy perspective, it would make sense to start preventive measures to reduce the number of patients or the costs per patient, depending on what drives spending growth in these disease groups.
The approach can be readily employed in other countries with DRG-based hospital reimbursement systems, which includes most OECD members [48]. International comparison does depend on country-specific factors; for example, the Netherlands has relatively low hospital visits, with many chronic conditions being treated in a primary care setting. This could result in underrepresentation of chronic conditions and overrepresentation of acute conditions in total spending trends; on the other hand, if other countries would move towards the Dutch trend of outpatient treatment of chronic conditions, this could limit overall hospital growth. In that sense, international comparison of (effects of) epidemiological trends would be valuable.
In sum, this paper showed how the use of microdata can enrich the insights that component-based models provide regarding healthcare spending projections. We have shown that the projected developments in spending can be split into changes due to developments in hospital usage and changes due to developments in the mean annual hospital spending per patient. Interestingly, for quite a few disease groups this results in more conservative projections of future hospital spending than would be obtained by simply taking a demographic projection of costs per capita and correcting it for observed historical growth.
Data availability
Data supporting this study cannot be made publicly available due to privacy and ethical concerns.
References
Report on ageing and health. World Health Organisation. 2015, p. 246.
2023 National Population Projections Tables: Main Series. Projections for the United States: 2023 to 2100. Census Bureau United States of America. 2023.
Population on 1st January by age sex and type of projection (EU). 2020 - 2100. Eurostat. 2020, Eurostat Editor.
The “red herring” after 20 years: ageing and health care expenditures. Breyer, F. and Lorenz, N. 2020, Springer.
Costa-Font J, Vilaplana-Prieto C. ‘More than one red herring’? Heterogeneous effects of ageing on health care utilisation. Health Econ. 2020;29:8–29.
Aging and health care costs in Oxford Research Encyclopedias: Economics and Finance. Karlsson, M., Iversen, T. and Øien, H. 2018, Hamilton Editor Oxford University Press.
Proximity to death and health care expenditure increase revisited: A 15-year panel analysis of elderly persons. von Wyl, V. 2019, Health Economics Review, pp. 1–16.
Zweifel P, Felder S, Meiers M. Ageing of population and health care expenditure: a red herring? Health Econ. 1999;8:485–96.
The 2018 Aging Report: Economic & Budgetary Projections for the 28 EU Member States (2016–2070). Directorate-General for Economic and Financial Aff. 2018, European Economy Institutional Papers.
Astolfi R, Lorenzoni L, Oderkirk J. Informing policy makers about future health spending: A comparative analysis of forecasting methods in OECD countries. Health Policy. 2012;107(1):1–10. https://doi.org/10.1016/j.healthpol.2012.05.001.
Steiner, C.A.R., Barrett, M. and Weiss, A. HCUP Projections: Cost of Inpatient Discharges 2003 to 2013. HCUP Projections Report. s.l. : U.S. Agency for Healthcare Research and Quality, 2013.
Weiss, A.J., Barrett, M.L. and Andrews, R.M. Trends and Projections in Inpatient Hospital Costs and Utilization 2003–2013: Statistical Brief #175. Agency for Healthcare Research and Quality (US). Rockville (MD) : Healthcare Cost and Utilization Project (HCUP) Statistical Briefs, 2014.
Weiss, A.J., Barret, M. and Andrews, R.M. Trends and Projections in U.S. Hospital Costs by Patient Age 2003–2013: Statistical Brief #176. Rockville (MD) : Agency for Healthcare Research and Quality (US), 2014.
Dieleman JL, et al. National spending on health by source for 184 countries between 2013 and 2040. Lancet. 2016;387:2521–35.
Jevdjevic M, et al. Forecasting future dental health expenditures: Development of a framework using data from 32 OECD countries. Community Dent Oral Epidemiol. 2021;49:256–66.
Using morbidity and income data to forecast the variation of growth and employment in the oral healthcare sector. Ostwald, D.A. and Klingenberger, D. 2016, Health Economics Review, p. 6.
Conway A, et al. The implications of regional and national demographic projections for future GMS costs in Ireland through to 2026. BMC Health Serv Res. 2014;14:477.
Harris A, Sharma A. Estimating the future health and aged care expenditure in Australia with changes in morbidity. PLoS One. 2018;13:e0201697.
Kalbarczyk M, Mackiewicz-Łyziak J. Physical Activity and Healthcare Costs: Projections for Poland in the Context of an Ageing Population. Appl Health Econ Health Policy. 2019;17:523–32.
Wong, Z.S.Y., Hoshino, E, Ikegami, N. A Cost Projection of Scheduled Physician Home-Visit Services in Japan: 2014 to 2064. J Aging Soc Policy. 2020:1–16.
Blanco-Moreno Á, Urbanos-Garrido RM, Thuissard-Vasallo IJ. Public healthcare expenditure in Spain: measuring the impact of driving factors. Health Policy. 2013;111:34–42.
Forecasting lifetime and aggregate long-term care spending: accounting for changing disability patterns. De Meijer, C.A.M., et al. 2012, Med Care, pp. 722–9.
Population ageing and healthcare expenditure projections: new evidence from a time to death approach. Geue, A., et al. 2013, European Journal of Health Economics, pp. 1–12.
Counting the time lived the time left or illness? Age proximity to death morbidity and prescribing expenditures. Moore, P.V., Bennet, K. and Normand, C. 2017, Social Science and Medicine, pp. 1–14.
Health and Health Care of Medicare Beneficiaries in 2030. Gaudette, É., et al. 2015, Forum Health Econ Policy, pp. 75–96.
Assessing the future medical cost burden for the European health systems under alternative exposure-to-risks scenarios. Goryakin, Y., et al. 2020, PLoS ONE.
Developing a dynamic microsimulation model of the Australian health system: A means to explore impacts of obesity over the next 50 years. Lymer, S. and Brown, L. 2012, Epidemiology Research International.
Ageing chronic conditions and the evolution of future drugs expenditure: a five-year micro-simulation from 2004 to 2029. Thiebaut, S.P., Barnay, T. and Ventelou, B. 2013, Applied Economics, pp. 1663–1672.
Assessing uncertain migration futures: A typology of the unknown. Bijak, J. and Czaika, M. 2020, Changes.
De Weerdt, A.C., et al. Kosten van ziekten. vzinfo.nl. [Online] 09 05, 2022. https://www.vzinfo.nl/kosten-van-ziekten.
Vonk, R.A.A., et al. Health care expenditures foresight 2015–2060: Quantitative preliminary study at the request of the Scientific Council for Government Policy (WRR). Part 1: future projections. Bilthoven, the Netherlands: National Institute for Public Health and the Environment (RIVM), 2020.
National Institute for Public Health and the Environment (RIVM). Methodologie Trendscenario VTV-2018. Versie 2. Bilthoven, the Netherlands: s.n., 2018.
Zorginstituut Nederland. Kosten Medisch specialistische zorg. Zorgcijfersdatabank. [Online] 06 30, 2023. https://www.zorgcijfersdatabank.nl/databank?infotype=zvw&label=00-totaal&tabel=B_kost&geg=jjverdiepnew22&item=206.
Cost of illness: an international comparison: Australia, Canada, France, Germany and the Netherlands. Heijink, R., et al. 2008, Health Policy, pp. 49–61.
R: A language and environment for statistical computing. R Core Team. Vienna, Austria: s.n., 2022, R Foundation for Statistical Computing.
RStudio: Integrated Development Environment for R. RStudio Team. Boston, MA : s.n., 2022, RStudio, PBC.
data.table: Extension of `data.frame`. R package version 1.14.6. Dowle, M. and Srinivasan, A. 2022.
The Split-Apply-Combine Strategy for Data Analysis. Wickham, H. 2011, Journal of Statistical Software, pp. 1–29.
scales: Scale Functions for Visualization. R package version 1.21. Wickham, H. and Seidel, D. 2022.
patchwork: The Composer of Plots. R package version 1.1.2. Pedersen, T. 2022.
fst: Lightning Fast Serialization of Data Frames. R package version. Klik, M. 2022.
Generalized additive models. Rigby, R.A. and Stasinopoulos, D.M. 2005, Appl. Statist, pp. 507–554.
Kernprognose 2021–2070: Bevolkingsgroei trekt weer aan. Stoeldraijer, L., et al. 2021, Statistics Netherlands (CBS).
Baker, J., et al. Forecasting Uncertainty. Cohort Change Ratios and their Applications. 2017;83–105.
OECD. Fiscal Sustainability of Health Systems. How to Finance More Resilient Health Systems When Money Is Tight? Paris: OECD Publishing, 2024.
Ministry of Health, Welfare and Sport. Kamerbrief Aanbieding Integraal Zorgakkoord: samen werken aan gezonde zorg. 3434315–1034974-Z. 2022.
Capaciteitsorgaan. Capaciteitsplan 2020–2023. Deelrapport 1. Utrecht, the Netherlands : s.n., 2019. https://capaciteitsorgaan.nl/app/uploads/2019/03/Capaciteitsplan-MS-2020-2023-Deelrapport-1.pdf
Innovative providers’ payment models for promoting value-based health systems: Start small, prove value, and scale up. OECD Health Working Papers. Lindner, N. and Lorenzoni, L. Paris : OECD Publishing, 2023, Vol. No. 154.
Acknowledgements
We thank the Dutch Health Authority for providing and giving access to their data for this research, with additional gratitude to Gertjan Verhoeven. We would also like to thank our advisory board for giving us insight and direction during this research. Finally, we would like to thank our scientific review committee for their sharp and constructive comments.
Author information
Contributions
PPK, SG and KK did the data-analysis. All authors contributed to the study conception and design and interpretation of results. PPK, SG, KK, TF drafted the manuscript and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Klein, P.P.F., Gouwens, S., Katona, K. et al. Using microdata as a basis for long term projections of hospital care spending: the added value of more detailed information. Health Econ Rev 15, 25 (2025). https://doi.org/10.1186/s13561-025-00607-w
DOI: https://doi.org/10.1186/s13561-025-00607-w