Matching 5.0 Alpha (Limited release): Enhanced modeling of child-to-provider distances

CUSP uses 3Si’s proprietary matching algorithm to match children served by providers and funding programs to child population data from the American Community Survey (ACS). Where child-level data with home addresses is available from state data sources, the CUSP model uses those records. Where child-level data on children served by providers or funding programs is not available, served children are matched with synthetic children in the population data who have the applicable age, income, and employment characteristics for the relevant providers and/or programs. The matching algorithm combines spatial analysis techniques (which take into account child-to-provider distances observed in available child-level data) with provider- and program-specific eligibility criteria to create a child-level table that has one unique record for every served child.
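
The eligibility step described above can be illustrated with a minimal sketch. It is not 3Si's proprietary algorithm: the field names, the eligibility rule, and the assumption that each synthetic child is matched at most once are all hypothetical, and the spatial component is left out for brevity.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SyntheticChild:
    child_id: int
    age: int
    income_pct_fpl: float      # hypothetical field: income as % of poverty level
    parents_employed: bool


@dataclass(frozen=True)
class Program:
    name: str
    min_age: int
    max_age: int
    max_income_pct_fpl: float
    requires_employment: bool
    served_count: int          # number of served children to match


def is_eligible(child, program):
    """Hypothetical age-income-employment eligibility check."""
    return (
        program.min_age <= child.age <= program.max_age
        and child.income_pct_fpl <= program.max_income_pct_fpl
        and (child.parents_employed or not program.requires_employment)
    )


def match_program(program, population, already_matched, rng):
    """Pick eligible, not-yet-matched synthetic children, one record per served child."""
    pool = [
        c for c in population
        if c.child_id not in already_matched and is_eligible(c, program)
    ]
    chosen = rng.sample(pool, min(program.served_count, len(pool)))
    already_matched.update(c.child_id for c in chosen)
    return [(program.name, c.child_id) for c in chosen]


# Made-up population and program, for illustration only.
rng = random.Random(0)
population = [
    SyntheticChild(i, age=i % 6, income_pct_fpl=50.0 * (i % 8),
                   parents_employed=(i % 3 != 0))
    for i in range(200)
]
program = Program("example_subsidy", 0, 4, 200.0, True, 25)
matched_ids = set()
records = match_program(program, population, matched_ids, rng)
```

Sampling without replacement from the eligible pool is what keeps each record unique in this sketch. A production model would also weight the choice by child-to-provider distance.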

In Q2 2023, 3Si completed a limited alpha release of a major upgrade to its CUSP matching algorithm. A key enhancement in Matching 5.0 Alpha is probabilistic modeling of child-to-provider distances and geolocations for children for whom address data is not available. With this feature, CUSP generates controlled spatial patterns that closely resemble real-world geographic distributions of children around providers. Prior versions used optimized placement techniques that factored in geographic constraints but not variations across funding programs. More importantly, where prior versions of the matching algorithm required manual adjustments to sampling rates based on geographic constraints, the probabilistic approach in Matching 5.0 Alpha requires no manual adjustments.


  • Matching 5.0 Alpha uses a probabilistic approach to model child locations based on empirically observed distributions of child-to-provider distances in available child-level data from state and other program data sources. Specifically, the model assigns geolocations to synthetic served children based on probabilities calculated from an optimal probability distribution, selected based on known statistical patterns observed in the data and the desired spatial patterns that the model seeks to achieve.
  • The parameters of the selected probability distribution determine the shape and spread of the distribution of child locations, allowing flexibility and control in modeling child distances in different geographic regions (e.g., rural vs. urban areas) based on their characteristics and the child-level data available for them.
  • The resulting assignment of children to child care providers reflects the child-to-provider distances seen in actual child-level data, across geographies and funding programs, more closely than assignments produced by prior versions.
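
The approach in the bullets above can be sketched as follows. The source does not name the probability distribution used, so this example assumes a log-normal distribution purely for illustration. The observed distances, the provider coordinates, and the function names are hypothetical.

```python
import math
import random
import statistics

MILES_PER_DEGREE_LAT = 69.0  # rough approximation, adequate for a sketch


def fit_lognormal(distances_miles):
    """Estimate log-normal parameters (mu, sigma) from observed distances."""
    logs = [math.log(d) for d in distances_miles if d > 0]
    return statistics.fmean(logs), statistics.pstdev(logs)


def sample_child_location(provider_lat, provider_lon, mu, sigma, rng):
    """Draw a distance and a uniform bearing, then offset from the provider."""
    distance = rng.lognormvariate(mu, sigma)
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    dlat = distance * math.cos(bearing) / MILES_PER_DEGREE_LAT
    # Degrees of longitude shrink with latitude, so scale by cos(latitude).
    miles_per_degree_lon = MILES_PER_DEGREE_LAT * math.cos(math.radians(provider_lat))
    dlon = distance * math.sin(bearing) / miles_per_degree_lon
    return provider_lat + dlat, provider_lon + dlon, distance


# Made-up child-to-provider distances (miles) standing in for observed data.
observed = [0.8, 1.5, 2.2, 3.1, 4.7, 6.0]
mu, sigma = fit_lognormal(observed)

rng = random.Random(42)  # fixed seed so the draws can be repeated
provider = (39.74, -104.99)  # hypothetical provider coordinates
locations = [
    sample_child_location(provider[0], provider[1], mu, sigma, rng)
    for _ in range(5)
]
```

Fitting separate `(mu, sigma)` pairs for different regions, such as rural and urban areas, is one way the distribution parameters could control the shape and spread of child locations.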


  • More comprehensive estimation of child-to-provider distances for all served children in the database.
  • Probabilistic modeling of served children’s geolocations results in spatial patterns that reflect real-world distributions of child-to-provider distances.
  • Higher replicability of spatial patterns of children’s addresses allows greater consistency across data releases. The matching algorithm can use the same probability distribution with identical characteristics to generate similar spatial patterns of child-to-provider distances over multiple data releases.
  • The inherent flexibility of probabilistic spatial modeling allows adjustments to the parameters of the probability distribution (for example, degree of clustering, dispersion, and spatial density) to refine spatial patterns of the geolocations of children served by different funding programs.
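
The replicability and flexibility points above can be illustrated with a short sketch. It assumes a log-normal distance model and a seeded random number generator, neither of which is confirmed by the source. The program names and parameter values are hypothetical.

```python
import random
import statistics

# Hypothetical (mu, sigma) parameters of a log-normal distance model per program.
PROGRAM_PARAMS = {
    "program_a": (0.5, 0.4),  # tighter clustering around providers
    "program_b": (0.5, 0.9),  # wider dispersion
}


def draw_distances(program, n, seed):
    """Draw n child-to-provider distances for a program from a seeded generator."""
    mu, sigma = PROGRAM_PARAMS[program]
    rng = random.Random(seed)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


# Same distribution, same parameters, same seed: identical draws across releases.
release_1 = draw_distances("program_a", 1000, seed=2023)
release_2 = draw_distances("program_a", 1000, seed=2023)
same_across_releases = release_1 == release_2

# Changing only the dispersion parameter changes the spread of distances.
spread_a = statistics.pstdev(draw_distances("program_a", 1000, seed=2023))
spread_b = statistics.pstdev(draw_distances("program_b", 1000, seed=2023))
```

In this sketch the dispersion parameter `sigma` is the only knob that differs between programs. Clustering and spatial density would need additional parameters that the source does not specify.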