论文标题
采样算法,从调查采样到蒙特卡洛方法:教程和文献综述
Sampling Algorithms, from Survey Sampling to Monte Carlo Methods: Tutorial and Literature Review
论文作者
论文摘要
本文是有关采样算法的教程和文献综述。我们在统计数据中有两种主要的抽样类型。第一种是调查抽样,该抽样从集合或人群中绘制样本。第二种类型是从概率分布中取样,我们的概率密度或质量函数。在本文中,我们涵盖了两种样本。首先,我们回顾一些要求的一些必需的背景,均方根误差,差异,偏差,最大似然估计,伯努利,二项式和高几何分布,Horvitz-Thompson估计器和Markov属性。然后,我们解释了简单的随机抽样,自举,分层抽样和群集抽样的理论。我们还简要介绍了多阶段抽样,网络采样和雪球采样。之后,我们从分布中切换为采样。我们解释了累积分布函数,蒙特卡洛近似,简单的蒙特卡洛方法和马尔可夫链蒙特卡洛(MCMC)方法的采样。对于简单的蒙特卡洛方法,其迭代是独立的,我们涵盖了重要性抽样和拒绝采样。对于MCMC方法,我们介绍大都市算法,大都市 - 危机算法,吉布斯采样和切片采样。然后,我们解释了蒙特卡洛方法和更有效的蒙特卡洛方法的随机行走行为,包括哈密顿(或混合)蒙特卡洛,阿德勒的过度递延,并有序过分递延。最后,我们总结了彼此相比采样方法的特征,优缺点。本文对于统计,机器学习,增强学习和计算物理学的不同领域可能很有用。
This paper is a tutorial and literature review on sampling algorithms. We have two main types of sampling in statistics. The first type is survey sampling which draws samples from a set or population. The second type is sampling from probability distribution where we have a probability density or mass function. In this paper, we cover both types of sampling. First, we review some required background on mean squared error, variance, bias, maximum likelihood estimation, Bernoulli, Binomial, and Hypergeometric distributions, the Horvitz-Thompson estimator, and the Markov property. Then, we explain the theory of simple random sampling, bootstrapping, stratified sampling, and cluster sampling. We also briefly introduce multistage sampling, network sampling, and snowball sampling. Afterwards, we switch to sampling from distribution. We explain sampling from cumulative distribution function, Monte Carlo approximation, simple Monte Carlo methods, and Markov Chain Monte Carlo (MCMC) methods. For simple Monte Carlo methods, whose iterations are independent, we cover importance sampling and rejection sampling. For MCMC methods, we cover Metropolis algorithm, Metropolis-Hastings algorithm, Gibbs sampling, and slice sampling. Then, we explain the random walk behaviour of Monte Carlo methods and more efficient Monte Carlo methods, including Hamiltonian (or hybrid) Monte Carlo, Adler's overrelaxation, and ordered overrelaxation. Finally, we summarize the characteristics, pros, and cons of sampling methods compared to each other. This paper can be useful for different fields of statistics, machine learning, reinforcement learning, and computational physics.