## Predicting Long Term Impact of Scientific Publications

Stegehuis, C. (2014)

Abstract: | This thesis first studies the distribution of the total number of citations that articles receive, comparing a discretized lognormal distribution with a negative binomial distribution. The discretized lognormal distribution fitted the citation counts of Physics articles published in 1984 well, whereas the negative binomial distribution did not. The tail of the citation distribution was also studied. It appeared to follow a Pareto distribution, with a tail index independent of both the Impact Factor and the number of citations in the first year after publication. We then propose a model to predict the quantiles of the distribution of the number of additional citations that scientific publications receive after the first year. We study three variants of the model: one uses only the Impact Factor as a covariate, one uses only the number of recent citations, and one uses both covariates. Quantile regression is used to fit the coefficients of the model. The variant that uses both covariates fits the quantiles better than the other two. A well-known estimator for the quantiles of a Pareto distribution is then used to describe the coefficients of the quantile regression estimator for the high quantiles, and confidence intervals for both estimators are given. The proposed model predicted the quantiles of the distribution of the number of additional citations after the first year well. However, it did not predict the quantiles correctly for groups of articles from the same country or university. We present a simple example that gives a possible explanation for this phenomenon. |
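
The abstract mentions a Pareto tail and an estimator for high quantiles of a Pareto distribution without naming either estimator. As an illustration only, the sketch below assumes the Hill estimator for the tail index and the Weissman estimator for a high quantile; both choices, the function names `hill_estimator` and `weissman_quantile`, the cut-off `k`, and the synthetic data are assumptions made here and are not taken from the thesis. Real citation counts are integers and may be zero, so the continuous synthetic sample only shows the mechanics.

```python
import math
import random


def hill_estimator(sample, k):
    """Hill estimate of gamma = 1/alpha from the k largest observations."""
    xs = sorted(sample, reverse=True)
    if not 0 < k < len(xs):
        raise ValueError("k must satisfy 0 < k < len(sample)")
    threshold = xs[k]  # the (k+1)-th largest observation
    if threshold <= 0:
        raise ValueError("the threshold must be positive to take logarithms")
    # Mean log-excess of the k largest values over the threshold.
    return sum(math.log(x / threshold) for x in xs[:k]) / k


def weissman_quantile(sample, k, p):
    """Estimate the level exceeded with (small) probability p."""
    xs = sorted(sample, reverse=True)
    n = len(xs)
    gamma = hill_estimator(sample, k)
    # Extrapolate from the threshold using the fitted Pareto tail.
    return xs[k] * (k / (n * p)) ** gamma


# Synthetic Pareto sample with tail index alpha = 2 (hypothetical data).
rng = random.Random(0)
alpha = 2.0
data = [rng.paretovariate(alpha) for _ in range(20000)]

gamma_hat = hill_estimator(data, k=2000)
alpha_hat = 1.0 / gamma_hat
q99 = weissman_quantile(data, k=2000, p=0.01)

print(f"estimated tail index: {alpha_hat:.2f}")
print(f"estimated 0.99 quantile: {q99:.2f}")
```

For an exact Pareto sample with tail index 2, the true 0.99 quantile is 0.01^(-1/2) = 10, which gives a reference point for the printed estimate. In practice the choice of `k` trades bias against variance and would need to be examined, for example by plotting the estimate against `k`.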

Item Type: | Essay (Master) |

Faculty: | EEMCS: Electrical Engineering, Mathematics and Computer Science |

Subject: | 31 mathematics |

Programme: | Applied Mathematics MSc (60348) |

