Paper Title

An Empirical Study of Software Exceptions in the Field using Search Logs

Authors

Foyzul Hassan, Chetan Bansal, Nachiappan Nagappan, Thomas Zimmermann, Ahmed Hassan Awadallah

Abstract

Software engineers spend a substantial amount of time using Web search to accomplish software engineering tasks. Such search tasks include finding code snippets, looking up API documentation, and seeking help with debugging. When debugging a bug or crash, a common practice among software engineers is to search the internet for information about the associated error or exception traces. In this paper, we analyze query logs from a leading commercial general-purpose search engine (GPSE), such as Google, Yahoo! or Bing, to carry out a large-scale study of software exceptions. To the best of our knowledge, this is the first large-scale study to analyze how Web search is used to find information about exceptions. We analyzed about 1 million exception-related search queries from a random sample of 5 billion web search queries. To extract exceptions from unstructured query text, we built a novel, high-performance machine learning model with an F1-score of 0.82. Using this model, we extracted exceptions from raw queries and performed popularity, effort, success, query characteristic, and web domain analyses. We also performed programming-language-specific analysis to give a better view of exception search behavior. These techniques can help improve existing methods, documentation, and tools for exception analysis and prediction. Further, similar techniques can be applied to APIs, frameworks, etc.
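
To make the extraction task and the reported metric concrete, here is a minimal sketch in Python. It is not the authors' machine learning model: it assumes a naive regular-expression baseline that flags tokens ending in "Exception" or "Error", and it computes an F1-score from hypothetical counts. The names EXCEPTION_PATTERN, extract_exceptions, and f1_score are illustrative and do not come from the paper.

```python
import re
from typing import List

# Hypothetical baseline pattern (an assumption, not the paper's model):
# an identifier starting with an uppercase letter and ending in
# "Exception" or "Error", optionally package-qualified
# (e.g. java.lang.NullPointerException).
EXCEPTION_PATTERN = re.compile(
    r"\b(?:[A-Za-z_]\w*\.)*[A-Z]\w*(?:Exception|Error)\b"
)


def extract_exceptions(query: str) -> List[str]:
    """Return exception-like tokens found in a raw search query."""
    return EXCEPTION_PATTERN.findall(query)


def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Compute F1, the harmonic mean of precision and recall."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)
```

For instance, extract_exceptions("java.lang.NullPointerException in android") would pick out the qualified exception name, while a query such as "how to fix error 404" yields nothing. A rule this simple misses exceptions written in lowercase or with typos, which is one plausible reason a learned model is preferable for unstructured query text.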
