Revisiting random walk based sampling in networks: evasion of burn-in period and frequent regenerations

Avrachenkov, Konstantin; Borkar, Vivek S.; Kadavankandy, Arun; Sreedharan, Jithin K. (2018) Revisiting random walk based sampling in networks: evasion of burn-in period and frequent regenerations. Computational Social Networks, 5 (1). ISSN 2197-4314


Official URL: http://doi.org/10.1186/s40649-018-0051-0

Abstract

Background: In the framework of network sampling, random walk (RW) based estimation techniques provide many pragmatic solutions while exploring as little of the unknown network as possible. Despite several theoretical advances in this area, RW based sampling techniques usually make the strong assumption that the samples are in the stationary regime, and hence are forced to discard the samples collected during the burn-in period.

Methods: This work proposes two sampling schemes without a burn-in time constraint to estimate the average of an arbitrary function defined on the network nodes, for example, the average age of users in a social network. The central idea of the algorithms lies in exploiting the regeneration of RWs at revisits to an aggregated super-node or to a set of nodes, and in strategies that increase the frequency of such regenerations, either by contracting the graph or by enlarging the hitting set. Our first algorithm, which is based on reinforcement learning (RL), uses stochastic approximation to derive an estimator. This method can be seen as intermediate between purely stochastic Markov chain Monte Carlo iterations and deterministic relative value iterations. The second algorithm, which we call the Ratio with Tours (RT)-estimator, is a modified form of respondent-driven sampling (RDS) that accommodates the idea of regeneration.

Results: We study the methods via simulations on real networks. We observe that the trajectories of the RL-estimator are much more stable than those of standard random walk based estimation procedures, and its error performance is comparable to that of RDS, which has a smaller asymptotic variance than many other estimators. Simulation studies also show that the mean squared error of the RT-estimator decays much faster with time than that of RDS.
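The ratio idea behind RDS and the role of regeneration can be illustrated with a short Python sketch. It assumes an undirected, connected graph stored as an adjacency dict, and uses the node label as a hypothetical stand-in for the node attribute (e.g. user age). `rds_estimate` is the standard RDS ratio estimator: a simple RW visits nodes in proportion to their degree, so each visit is reweighted by 1/degree. `tour_ratio_estimate` is a hypothetical, simplified tour-based ratio estimator showing how i.i.d. tours that start in a seed set and end on the first return to it remove the need for a burn-in period. It is not the paper's exact RT-estimator formula, and the RL-estimator is not sketched here.

```python
import random


def random_walk(adj, start, steps, rng):
    """Simple random walk of `steps` nodes (steps >= 1): at each step move to a
    uniformly chosen neighbour of the current node."""
    path = [start]
    v = start
    for _ in range(steps - 1):
        v = rng.choice(adj[v])
        path.append(v)
    return path


def rds_estimate(walk, f, degree):
    """RDS ratio estimate of the node average of f from an RW trajectory.
    The simple RW's stationary probability is proportional to degree, so
    sum(f/d) / sum(1/d) corrects for that bias."""
    num = sum(f(v) / degree(v) for v in walk)
    den = sum(1.0 / degree(v) for v in walk)
    return num / den


def tour_ratio_estimate(adj, f, seeds, num_tours, rng):
    """Hypothetical simplified tour-based ratio estimator (not the paper's RT formula).

    Each tour starts at a seed node drawn with probability proportional to its
    degree (the stationary law restricted to the seed set) and stops just before
    the walk first re-enters the seed set. Tours are i.i.d. regeneration cycles,
    so every sample is used and no burn-in is discarded."""
    seed_list = list(seeds)
    seed_set = set(seed_list)
    weights = [len(adj[s]) for s in seed_list]
    num = den = 0.0
    for _ in range(num_tours):
        v = rng.choices(seed_list, weights=weights)[0]
        while True:
            d = len(adj[v])
            num += f(v) / d      # accumulate f/d over the tour
            den += 1.0 / d       # accumulate 1/d over the tour
            v = rng.choice(adj[v])
            if v in seed_set:    # regeneration: the tour ends on return to the seeds
                break
    return num / den


if __name__ == "__main__":
    # Toy undirected graph; the true node average of f(v) = v is 2.0.
    adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2, 4], 4: [3]}
    rng = random.Random(42)
    walk = random_walk(adj, start=0, steps=100_000, rng=rng)
    print(rds_estimate(walk, lambda v: v, lambda v: len(adj[v])))
    print(tour_ratio_estimate(adj, lambda v: v, seeds={0}, num_tours=20_000, rng=rng))
```

Both printed values should be close to 2.0. Making the seed set larger shortens the tours, which loosely mirrors the abstract's point that enlarging the hitting set makes regenerations more frequent.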

Item Type: Article
Source: Copyright of this article belongs to Springer Nature.
ID Code: 135151
Deposited On: 19 Jan 2023 10:00
Last Modified: 19 Jan 2023 10:00
