The Use of Google Trends in Health Care Research: A Systematic Review
Google Trends is a novel, freely accessible tool that allows users to interact with Internet search data, which may provide deep insights into population behavior and health-related phenomena. However, there is limited knowledge about its potential uses and limitations. We therefore systematically reviewed health care literature using Google Trends to classify articles by topic and study aim; evaluate the methodology and validation of the tool; and address limitations for its use in research.
Methods and Findings
PRISMA guidelines were followed. Two independent reviewers systematically identified studies utilizing Google Trends for health care research from MEDLINE and PubMed. Seventy studies met our inclusion criteria. Google Trends publications increased seven-fold from 2009 to 2013. Studies were classified into four topic domains: infectious disease (27% of articles), mental health and substance use (24%), other non-communicable diseases (16%), and general population behavior (33%). By use, 27% of articles utilized Google Trends for causal inference, 39% for description, and 34% for surveillance. Among surveillance studies, 92% were validated against a reference standard data source, and 80% of studies using correlation had a correlation statistic ≥0.70. Overall, 67% of articles provided a rationale for their search input. However, only 7% of articles were reproducible based on complete documentation of search strategy. We present a checklist to facilitate appropriate methodological documentation for future studies. A limitation of the study is the challenge of classifying heterogeneous studies utilizing a novel data source.
Google Trends is being used to study health phenomena in a variety of topic domains in myriad ways. However, poor documentation of methods precludes the reproducibility of the findings. Such documentation would enable other researchers to determine the consistency of results provided by Google Trends for a well-specified query over time. Furthermore, greater transparency can improve its reliability as a research tool.
New tools are emerging to facilitate health care research in the Big Data era. One form of Big Data is that which accumulates in the course of Internet search activities. Internet search data may provide valuable insights into patterns of disease and population behavior. In fact, the Institute of Medicine recognizes that the application of Internet data in health care research holds promise and may “complement and extend the data foundations that presently exist”. An early and well-known example of utilizing Internet data in health has been the surveillance of influenza outbreaks with accuracy comparable to that of traditional methodologies.
One tool that allows users to interact with Internet search data is Google Trends, a free, publicly accessible online portal of Google Inc. Google Trends analyzes a portion of the three billion daily Google Search searches and provides data on geospatial and temporal patterns in search volumes for user-specified terms. Google Trends has been used in many research publications, but the range of applications and methods employed has not been reviewed. Furthermore, no guidance or agreed standards exist for the appropriate use of this tool. A critical appraisal of the existing literature would increase awareness of its potential uses in health care research and facilitate a better understanding of its strengths and weaknesses as a research tool.
Accordingly, we performed a systematic review of the health care literature using Google Trends. To characterize how researchers are using Google Trends, we classified studies by topic domain and study aim. We conducted a subanalysis of surveillance studies to further detail their methods and approach to validation. We also assessed the reproducibility of methods and created a checklist for investigators to improve the quality of studies using Google Trends. Finally, we address general limitations in using Google Trends for health care research.
Overview of Google Trends
Google Trends provides access to Internet search patterns by analyzing a portion of all web queries on the Google Search website and other affiliated Google sites. A description of the user interface is shown in Figure S1. Users are able to download the output of their searches to conduct further analyses.
The portal determines the proportion of searches for a user-specified term among all searches performed on Google Search. It then provides a relative search volume (RSV), which is the query share of a particular term for a given location and time period, normalized by the highest query share of that term over the time series. The user can specify the geographic area to study, whether a city, country, or the world; data are available for all countries worldwide. Furthermore, the user can choose a time period to study, ranging from January 2004 to present, divided by months or days. The user is also able to compare the RSV of up to five different search terms or the RSV of a particular search term between geographic areas and between time periods. In addition, the user can choose from 25 specific topic categories to restrict the search, each with multiple subcategories for >300 choices in total, such as “Health → Mental Health → Depression”.
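The normalization described above can be illustrated with a minimal sketch. Google does not release raw search counts, so the inputs below are hypothetical, as is the function name `relative_search_volume`. Scaling the peak to 100 is an assumption about how the portal presents its output, not a formula taken from this review.

```python
from typing import List, Sequence


def relative_search_volume(
    term_counts: Sequence[float],
    total_counts: Sequence[float],
    scale: float = 100.0,  # assumption: the peak of the series is shown as 100
) -> List[float]:
    """Illustrative RSV: per-period query share divided by the peak query share.

    term_counts[i]  -- hypothetical number of searches for the term in period i
    total_counts[i] -- hypothetical number of all searches in period i
    """
    if len(term_counts) != len(total_counts):
        raise ValueError("term_counts and total_counts must have equal length")

    # Query share: the term's proportion of all searches in each period.
    shares = [t / n if n else 0.0 for t, n in zip(term_counts, total_counts)]

    # Normalize by the highest query share observed over the time series.
    peak = max(shares, default=0.0)
    if peak == 0.0:
        return [0.0 for _ in shares]
    return [scale * (s / peak) for s in shares]


if __name__ == "__main__":
    # Three hypothetical months of data for one term in one location.
    rsv = relative_search_volume([50, 120, 90], [1000, 2000, 1000])
    print([round(v, 1) for v in rsv])  # [55.6, 66.7, 100.0]
```

In this sketch the second month has the most raw searches for the term but not the highest RSV, because RSV tracks the share of all searches. Each series is also scaled by its own peak, so RSV values describe relative change within a query, not absolute search volume.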
With respect to search input, multiple terms can be searched in combination using “+” signs, and terms can be excluded using “-” signs. Quotation marks can be used to specify exact search phrases.
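As a rough sketch of how such a search input might be assembled and recorded verbatim, the hypothetical helper below (`build_query` is not part of any Google tool) joins terms with “+”, wraps exact phrases in quotation marks, and prefixes excluded terms with “-”. The exact spacing around the operators is an assumption.

```python
from typing import Iterable


def build_query(
    terms: Iterable[str],
    phrases: Iterable[str] = (),
    excluded: Iterable[str] = (),
) -> str:
    """Compose a search string from the operators described in the text.

    terms    -- terms combined with "+" signs
    phrases  -- exact phrases, wrapped in quotation marks and also combined with "+"
    excluded -- terms to exclude, each prefixed with a "-" sign
    """
    parts = [t.strip() for t in terms]
    parts += [f'"{p.strip()}"' for p in phrases]
    if not parts:
        raise ValueError("at least one term or phrase is required")

    query = " + ".join(parts)
    for term in excluded:
        query += f" -{term.strip()}"
    return query


if __name__ == "__main__":
    # Hypothetical example terms, chosen only to show the syntax.
    print(build_query(["flu", "influenza"], phrases=["flu symptoms"], excluded=["shot"]))
    # flu + influenza + "flu symptoms" -shot
```

Storing the exact string returned by such a helper, together with the location, time period, and category used, is one way to document a search strategy completely.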
The review was conducted in accordance with PRISMA guidelines. We included all studies that used Google Trends to answer research questions within the domain of health care. After an initial review, we included letters because they contained substantial original content. We also included studies using Google Insights for Search, a similar tool to Google Trends that was merged into Google Trends in 2012 (hereafter we will refer to studies using Google Insights for Search as using Google Trends for ease of reading).
We excluded studies that primarily focused on Google Flu Trends, a separate tool to specifically track seasonal variation in influenza trends. This tool is distinct from Google Trends and is therefore beyond the scope of this review. We also excluded articles that had no substantial use of Google Trends.
We identified relevant studies by searching Ovid MEDLINE (from inception to January 3, 2014) using a comprehensive search strategy. The list of subheadings (MeSH) and text words used in the search strategy for MEDLINE can be found in Appendix S1. We only included studies of humans written in the English language, and identified 1249 potential articles for inclusion. Since PubMed contains articles from life science journals in addition to articles indexed in MEDLINE, we conducted a search of PubMed (from inception to January 3, 2014) using a similar search strategy, but excluding the articles already identified from MEDLINE. This search identified an additional 871 potential articles, for a total of 2120 potential articles.
Two reviewers (S.V.N. and K.M.) independently reviewed the titles and abstracts of retrieved publications, and 92 articles met our inclusion criteria for full text review. We then excluded 25 studies that did not utilize Google Trends or that met at least one of our exclusion criteria (see Figure 1). We also included 3 articles found from the review of references.
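The selection counts reported above can be checked with simple arithmetic. The sketch below uses only the numbers stated in the text; the class and field names are hypothetical labels introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SelectionFlow:
    """Article counts at each stage of study selection, as reported in the text."""

    medline_records: int           # potential articles from Ovid MEDLINE
    pubmed_additional: int         # additional potential articles from PubMed
    full_text_reviewed: int        # articles passing title and abstract screening
    excluded_after_full_text: int  # articles excluded at full text review
    added_from_references: int     # articles found from the review of references

    @property
    def records_screened(self) -> int:
        return self.medline_records + self.pubmed_additional

    @property
    def studies_included(self) -> int:
        return (
            self.full_text_reviewed
            - self.excluded_after_full_text
            + self.added_from_references
        )


flow = SelectionFlow(
    medline_records=1249,
    pubmed_additional=871,
    full_text_reviewed=92,
    excluded_after_full_text=25,
    added_from_references=3,
)

if __name__ == "__main__":
    print(flow.records_screened)  # 2120
    print(flow.studies_included)  # 70
```

The result of 70 included studies matches the total reported in the findings.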