PROPOSAL: Effect of Branding on Perceived Search Term Relevance: A Comparison of Google, Yahoo, and MSN APIs
I. Introduction
II. Prior Work
III. Process and Method
IV. User Audience
V. Project Goals
VI. Resources
VII. Constraints
VIII. IRB Review & Human Subjects Protection
IX. Evaluation & Success Metrics
X. Sponsorship
XI. Learning Experiences
XII. Project Planning
XIII. Gantt Chart
XIV. References
Introduction
Over the past 10 years, there has been significant research into information retrieval behavior in an effort to understand how information consumers interact with websites based upon (among other things):
- personal preferences[1],
- perceived affordances[2],
- collocation of multiple partners[3],
- motivations[4],
- captions[5], and
- branding[6].
Branding, in particular, is a promising research track that looks at how information consumers choose preferred search engines based upon personal, salient factors like logo, font, and “look-and-feel” affordances that search engines use to distinguish themselves from one another.
The present work for this capstone project is important because only a handful of websites (e.g., Google, Yahoo, and MSN-Live) control more than 90% of the multi-billion dollar search market, with Google alone controlling more than 70% in the last quarter of 2008.[7] Compounding the “centralizing” force these websites exert is a sense of “authority” that arises from the arbitrary perceptions of consumers and may be undeserved.
With this in mind, our project aims to add to the literature by devising a search engine system that generates anonymous search engine results and then has consumers re-evaluate and rank-order those results pre- and post-test. This general idea has been explored before, but not using re-ratings to validate results.[8] For purposes of this project, health-related inquiries will be made following a specific task scenario.[9] Using a health topic for search is particularly relevant because information consumers scrutinize online health websites more than non-health websites.[10],[11]
Two main research questions for this project are:
- Which search website has the most perceived “relevant” searches according to re-ranking by information consumers?
- Which website benefited the most from branding bias?
Prior Work
Several studies have previously addressed the issue of branding and search satisfaction.
Peer-Reviewed/Published: A system has been devised in which the first search engine results pages (SERPs) from Google, Yahoo, and MSN-Live were compared to one another for four different types of searches: travel, health, real estate, and music.[12] In order to measure search branding effects, mock-up interfaces were constructed to imply that results were from one particular search site. The researchers found that, given the same set of search results, users still preferred Google over the other websites by 25%.
Unpublished: Bing Liu (Computer Science Department at the University of Illinois at Chicago) found that 25 students preferred Google (against Yahoo and MSN-Live) for both informational and navigational queries, but the study did not take into account branding effects or familiarity.[13] A website has been built which takes results from Google, Yahoo, and MSN-Live, shows them side-by-side, and keeps track of whether each page was “good,” “bad,” or “neutral.”[14]
Some new work has begun to incorporate eye-tracking data as a means of reporting on search behavior and intentionality, which may complement “thinking aloud” and user pre/post interviews.[15],[16] The regions of web pages that users view differ by search term subject.[17] For example, users tend to look at the upper left-hand side of informational sites (implying the value of top search result positions), while scanning the whole left-hand side of transactional sites (using a more distributed approach).[18]
Process and Method
The purpose of this project is to report on whether there is a branding effect when users are asked to perform a health-related search.[19] A set of about 20-30 unique student users will be recruited for a 45-minute session. A “moderator” will conduct each session, answer questions as they arise, administer a brief questionnaire, and make notes on user performance while searching. The “search setting” will be an office space with no distractions and some type of broadband internet access.[20]
Process: Students will be read a short scenario (e.g., “flu shot”) which will require them to perform a specific search task (e.g., “find information about adverse vaccine reactions”). Users will input search queries into the branded websites (Google, Yahoo, and MSN-Live). The first 10 dynamically-produced results from each website will be displayed, and users will be instructed to rank-order the results in the order they feel best represents search relevance and to rate each one (e.g., “Relevant,” “Not Relevant”). Each search result will be given a unique identifier. Next, the pool of up to 30 search results will be randomized and duplicates will be combined. Finally, users will input the same query into a generic interface that will be made to incorporate several prominent search affordances (e.g., a blank screen and search bar with no competing text or images). The first 10 results from the pool of combined results will be displayed, and users will comment on the relevance of the search results (e.g., “Relevant,” “Not Relevant”).
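The pooling step in the process above can be sketched as follows. This is a minimal illustration, not the project's implementation: the `Result` record, the `normalize_url` rule for deciding when two results are duplicates, and the function names are all assumptions made for the example, and the proposal does not specify how duplicates will actually be detected.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    result_id: str  # unique identifier assigned to each search result
    engine: str     # "google", "yahoo", or "msn-live"
    rank: int       # position (1-10) on the branded results page
    url: str


def normalize_url(url: str) -> str:
    """Assumed duplicate key: lower-case, no scheme, no 'www.', no trailing slash."""
    key = url.strip().lower()
    for prefix in ("http://", "https://"):
        if key.startswith(prefix):
            key = key[len(prefix):]
    if key.startswith("www."):
        key = key[4:]
    return key.rstrip("/")


def build_unbranded_page(results, page_size=10, seed=None):
    """Combine duplicates, shuffle the pool, and return the first page_size entries.

    Each entry is (url_key, [Result, ...]) so that a rating given in the
    unbranded interface can be traced back to every engine that returned it.
    """
    pool = {}
    for r in results:
        pool.setdefault(normalize_url(r.url), []).append(r)
    entries = list(pool.items())
    random.Random(seed).shuffle(entries)  # seed only makes a session reproducible
    return entries[:page_size]
```

Keeping the originating `Result` records attached to each pooled entry is a design choice for the sketch: it lets the later comparison credit a single unbranded rating to each engine that produced the page.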
Method (Tentative): A comparison measure will be made between initial relevancy ratings of branded search results and those of the unbranded search website. This method can be broadly described as a type of “manual judgment of results” which is described in the literature.[21]
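Because the method is still tentative, the following is only one possible form the comparison measure could take. It assumes binary ratings (“Relevant” / “Not Relevant”) and that each result a participant rated under a brand was also rated in the unbranded interface. The function name and the `branding_gain` label are hypothetical.

```python
from collections import defaultdict


def branding_summary(ratings):
    """Summarize relevance ratings per engine.

    ratings: iterable of (engine, branded_relevant, unbranded_relevant)
    tuples, one per result rated by a participant in both conditions.
    """
    counts = defaultdict(lambda: [0, 0, 0])  # [n, branded hits, unbranded hits]
    for engine, branded, unbranded in ratings:
        c = counts[engine]
        c[0] += 1
        c[1] += int(branded)
        c[2] += int(unbranded)

    summary = {}
    for engine, (n, b, u) in counts.items():
        summary[engine] = {
            "branded_rate": b / n,         # share rated relevant under the brand
            "unbranded_rate": u / n,       # share rated relevant with no brand shown
            "branding_gain": (b - u) / n,  # positive if the brand raised ratings
        }
    return summary
```

Under these assumptions, `unbranded_rate` speaks to the first research question (which website has the most results perceived as relevant) and `branding_gain` to the second (which website benefited most from branding bias).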
User Audience
According to the Pew Internet & American Life Project, more than 80% of internet users in 2006 looked for health-related information at some time, with 60% of the searches related to a “specific disease or medical problem.”[22] Doctors preferred Google over other search brands by more than 50%, even though many (65%) believed that too many results required sifting through extraneous and erroneous material.[23],[24]
This project has a wide audience, especially where information seeking exists for experts and non-experts (e.g., education information by teachers, finance information by accountants, travel searches by travel agents, etc.).
Project Goals
Because this is a research-oriented project, goals are ordered such that dependencies are completed in sequence. Timeframes are measured from acceptance of this proposal. One aim of this project is to ensure that further research agendas can be instantiated in the long term.
Short Term Goals (0-1 months): Work with sponsor on experimental design. Begin to assess the current research in relation to the project and develop a working plan for implementing new extensions to existing coding libraries, methods, and publicly-available APIs. Begin researching Google, Yahoo, and MSN-Live APIs and read white papers. Create a “technical deficiency” to inform the intermediate stage.
Intermediate Term Goals (1-3 months): Begin setting up interview instruments. Advertise the project. Gather data.
Intermediate Term Goals (3-5 months): Analyze data and iterate on project steps (re-evaluate project planning). Draw conclusions and summarize findings. Begin project finalization and promotion. Formulate publication plan.
End of Project Life Goals (5+ months): Package information for future reference. Finalize a transition document for future research. Write documentation for website and programming methods (Java/C++).
Project goals will support completion of our two central research questions:
- Which search website has the most perceived “relevant” searches according to re-ranking by information consumers?
- Which website benefited the most from branding bias?
Resources
This project requires a small set of manageable resources.
Time: Spring Quarter, 2009.
Money: No money has been allocated for this project, but there is a possibility that $300 could be granted from private donors. This money could compensate users for their participation in the study.
Technology: All software to be used in this project has already been purchased and loaded on the principal project laptop:
Microsoft Office Suite (Excel, Access, Project, Word)
Design: Adobe Dreamweaver CS3, Acrobat, Flash, Axure RP Pro 5
XML/Programming: Altova XMLSpy, Visual Studio IDE
Windows Vista SP2 (32-bit)
Team: Al Youngblood (Day MSIM Candidate)
Stakeholders: Users with a valid UWNETID, Google, Yahoo, and Microsoft
Constraints
The following are potential constraints that accompany this project.
Core Competencies Required: Particular APIs will require the following programming skills:
- MSN-Live API (v.1.1b): Uses a SOAP (Simple Object Access Protocol) model specification and uses an XML wrapper to pass information. http://www.w3.org/TR/soap
- MSN-Live API (v.2): Extends the supported protocols and formats to include JSON (JavaScript Object Notation) and XML (Extensible Markup Language) in addition to SOAP. http://msdn.microsoft.com/en-us/library/dd251062.aspx
- Yahoo Search API: Uses REST (Representational State Transfer) with JSON or XML formats (http://developer.yahoo.com/common/json.html). In an effort to increase developer presence, Yahoo has released BOSS (Build your Own Search Service) to simplify modifications to Yahoo Search (http://developer.yahoo.com/search/boss).
- Google Search API: Uses REST (Representational State Transfer) through an AJAX (Asynchronous JavaScript and XML) interface. http://code.google.com/apis/ajaxsearch
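To give a sense of the REST-plus-JSON skills listed above, here is a minimal sketch of building a query URL and mapping a JSON response onto a common record. The endpoint `https://api.example.com/search`, the parameter names (`q`, `count`, `key`, `format`), and the response fields (`results`, `url`, `title`) are all placeholders invented for illustration. Each of the three real APIs defines its own endpoint, parameters, and response layout, which would need to be taken from its documentation.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint; not the address of any real search API.
BASE_URL = "https://api.example.com/search"


def build_request_url(query, count=10, api_key="DEMO_KEY"):
    """Encode the query as URL parameters (parameter names are assumed)."""
    params = {"q": query, "count": count, "key": api_key, "format": "json"}
    return BASE_URL + "?" + urlencode(params)


def parse_results(payload, engine):
    """Map a JSON payload onto one record shape shared by all engines.

    The "results", "url", and "title" field names are assumptions; a real
    adapter would need one mapping per API.
    """
    data = json.loads(payload)
    return [
        {"engine": engine, "rank": i, "url": item["url"], "title": item["title"]}
        for i, item in enumerate(data["results"], start=1)
    ]
```

A thin adapter of this kind per engine would let the rest of the system work with one record format regardless of whether the source API speaks SOAP, XML, or JSON.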
Terms of Service by Search APIs: These terms will bind this project in perpetuity:
- Google “terms of service” were changed in Dec 2008 to make non-attribution to Google more difficult, and a search API key is recommended for this project. http://code.google.com/apis/ajaxsearch/terms.html
- Yahoo is governed by the BOSS policy which is flexible: http://info.yahoo.com/legal/us/yahoo/search/bosstos/bosstos-2317.html
- MSN-Live API is governed by a non-extensive use policy: http://tou.live.com
Ownership of Devised System: If any part of the system is patentable, then UW guidelines on patent ownership may apply.
Re-use of Code: This project intends to use “off the shelf” components for running and executing code. Such components are more stable in the long term and are easier to debug.
IRB Review & Human Subjects Protection
The chief investigator in this project has completed a UW-required course on Protection of Human Subjects (Track: Non-biomedical, Staff-researcher in behavioral or social sciences), which was administered by the Collaborative Institutional Training Initiative (CITI) in the summer of 2008.[25],[26]
Based on a cursory examination of the UW Information School Policy[27] this research may be “exempted” from IRB review, and it carries “minimal risk” to participants because it is of “Type 2” in nature, is “not damaging to financial standing, employability, or reputation,” and will not involve harm greater than what is “ordinarily encountered in daily life”:
Type 2: Research involving the use of educational tests (cognitive, diagnostic, aptitude, achievement), survey procedures, interview procedures or observation of public behavior, unless: (i) information obtained is recorded in such a manner that human subjects can be identified, directly or through identifiers linked to the subjects; and (ii) any disclosure of the human subjects' responses outside the research could reasonably place the subjects at risk of criminal or civil liability or be damaging to the subjects' financial standing, employability, or reputation.
[…]
Minimal risk means that the probability and magnitude of harm or discomfort anticipated in the research are not greater in and of themselves than those ordinarily encountered in daily life or during the performance of routine physical or psychological examinations or tests.
There are six main questions which the UW Information School IRB would like addressed in consideration for application of IRB clearance:
- Have the risks to participants been minimized?
- Do the benefits to society outweigh the risks to the participants?
- Does the sampling technique place an undue burden on the participants?
- Are adequate procedures in place to ensure privacy and confidentiality?
- Has informed consent been sought and appropriately documented? Or are the proposed consent waivers appropriate?
- Are safeguards in place to protect vulnerable populations?
Generalization: Benefits to society outweigh the inconvenience of performing routine search queries. Learning about branding and search engine type will allow information consumers to make better choices about which search engine is best for them, and avoid trusting websites arbitrarily.
Unique identifiers: A Catalyst form will be produced that captures unique UWNETIDs. None of these identifiers will be shared outside of the study. Risks in the study have been minimized to ensure that participant identities are protected and access to information is password-protected.
Informed Consent: A Catalyst form will feature a section where users will need to acknowledge receipt of informed consent. The purpose of the study will be revealed to participants in advance.
No Vulnerable Populations: No vulnerable populations are used in this project.
Next Steps: This project will need further refinement to determine whether IRB clearance is needed, or whether a clearance statement will suffice. This project will defer to the sponsor for further instructions.
Evaluation & Success Metrics
The validation step should establish if project goals are met, whether the goals and purposes of IMT595 have been met, and whether the research questions have been answered. Two considerations govern these metrics.
- Internal Validity[28]: Internal validity refers to how well the results support inferences about causal relations. That is, does the independent variable really produce some change in the dependent variable? In our project, if we can establish that branding affected perceived search term relevance, then we can say that the relation has some internal validity.
- A working search engine: The design, implementation, and documentation of a “brand-less” search engine that pulls results from Google, Yahoo, and MSN-Live, randomizes them, and passes the first 20 results to the user.
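One way the internal-validity check above could be made concrete is an exact sign test on paired ratings. This is offered only as a sketch under assumptions, since the proposal leaves the experimental design open: it assumes binary ratings and that each result is rated once under a brand and once unbranded. The function name is hypothetical. Only discordant pairs (rated differently in the two conditions) carry information about a branding effect.

```python
from math import comb


def sign_test_p_value(branded_only, unbranded_only):
    """Two-sided exact sign test on discordant rating pairs.

    branded_only:   pairs rated "Relevant" under the brand but not unbranded
    unbranded_only: pairs rated "Relevant" unbranded but not under the brand

    If branding had no effect, each discordant pair would be equally likely
    to fall in either group, so the smaller count follows Binomial(n, 0.5).
    """
    n = branded_only + unbranded_only
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence either way
    k = min(branded_only, unbranded_only)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

A small p-value would suggest that ratings shifted systematically between the branded and unbranded conditions. It would not by itself rule out other explanations such as order effects, which the experimental design would need to address.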
Goals for the Capstone experience: The following goals underlie the IMT595 capstone course[29]. All should be met by this project:
- Student-defined Information Problem: This problem is to be defined through iterations and, upon approval, will owe its inception partly to the student and partly to faculty.
- Student-defined Method(s) of Investigation: A proposed experimental design will be laid out for the project before March 1, 2009.
- Synthesis of Human-Centered, Technical and Management Strands: This project is not simply an engineering project, but involves elements of user-centered design, information architecture (Information retrieval), and project management to run a study.
- Make a Difference: This will be a new study which may be publishable.
- Passion: Search term branding incorporates my training in sociology and computer science.
Sponsorship
---------- Forwarded message ----------
From: xxx xxxxx <xxxxx@u.washington.edu>
Date: Thu, Feb 5, 2009 at 10:08 PM
Subject: RE: I want to work on a capstone with you
To: Al Youngblood <jalisco@u.washington.edu>
Cc: xxx xxxxx <xxxxx@u.washington.edu>
Al hi,
[omitted for privacy reasons]
Cheers
xxxxx
Learning Experiences
- Learn Google, Yahoo, and MSN-Live APIs. This will entail understanding programming frameworks, procedures for calling particular queries, and back-end storage and retrieval.
- Learn about perception and cognition in performance of web searches, and how branding affects user beliefs about search result relevance.
- Learn about IRB procedures and handling of human subjects protections.
- Learn about current information retrieval theory and how search engines are made with different audiences in mind. Understand the subtle “give and takes” of search engine branding, and how branding is a salient factor in other online discourses, such as advertising and multimedia artifacts.
- Get an opportunity to conduct a research study from scratch. Learn about using methodologies in a research setting.
Project Planning
This project is scheduled for the spring quarter of 2009. Preliminary research will need to be conducted during the winter quarter of 2009. A more extensive Gantt chart is found in the following section.
Phase One (February-March): Finalization of the experimental design. Evaluation of technical necessity. Finalization of IRB. Working set of DB and APIs. Finalization of design and construction of search engine.
Phase Two (April-May): Data collection.
Phase Three (May-June): Data Analysis and organization. Summary of findings, and suggestions for future work. Presentations.
Gantt Chart
[See attached PDF]
References
[1] Park, S., Harada, A., and Igarashi, H. (2006). “Influences of personal preference on product usability.” In Proceedings of CHI '06 Extended Abstracts, pp. 87-92.
[2] Wildemuth, B. and Carter, A. (2002). “Perceived Affordances of Web Search Engines: A Comparative Analysis.” University of North Carolina, Chapel Hill, School of Information and Library Science, Technical Report 2002-02.
[3] Morris, M.R. (2008) “A Survey of Collaborative Web Search Practices.” In Proceedings of CHI 2008, pp.1657-1660.
[4] Zerba, A. (2000). “Perceived motives for clicking on multimedia features on news web sites: an exploratory study” Thesis, University of Florida, School of Mass Communication.
[5] Clarke, C. L., Agichtein, E., Dumais, S., and White, R. W. (2007). “The influence of caption features on clickthrough patterns in web search.” In Proceedings SIGIR '07. ACM, New York, NY, pp.135-142.
[6] Jansen, B. J., Zhang, M., and Zhang, Y. (2007). “The effect of brand awareness on the evaluation of search engine results.” In CHI '07 Extended Abstracts, pp. 2471-2476.
[7] http://www.seoconsultants.com/search-engines
[8] http://www.cs.uic.edu/~liub/searchEval/SearchEngineEvaluation.htm
[9] Borlund, Pia (2003). "The IIR evaluation model: a framework for evaluation of interactive information retrieval systems" Information Research, 8(3), Paper no. 152 [Available at: http://informationr.net/ir/8-3/paper152.html]
[10] Luo, W. and Najdawi, M. (2004). “Trust-building measures: a review of consumer health portals.” Communications of the ACM 47, 1 (Jan), pp.108-113.
[11] Sillence, E., Briggs, P., Harris, P., and Fishwick, L. (2006). “A framework for understanding trust factors in web-based health advice.” Int. J. Hum.-Comput. Stud. 64(8), pp.697-713.
[12] Jansen, B. J., Zhang, M., and Zhang, Y. (2007). “The effect of brand awareness on the evaluation of search engine results.” In CHI '07 Extended Abstracts, pp. 2471-2476.
[13] http://www.cs.uic.edu/~liub/searchEval/SearchEngineEvaluation.htm
[14] http://www.searchrater.com
[15] Buscher, G., Cutrell, E., and Morris, M.R. (2009) “What Do You See When You're Surfing? Using Eye Tracking to Predict Salient Regions of Web Pages.” Proceedings of CHI 2009, in press.
[16] Guan, Z. and Cutrell, E. (2007). “An eye tracking study of the effect of target rank on web search” In Proceedings of CHI '07. pp.417-420.
[17] Cutrell, E. and Guan, Z. (2007). “What are you looking for?: an eye-tracking study of information usage in web search.” In Proceedings CHI '07. pp.407-416.
[18] Gisbergen, M., Van der Most, J., and Aelen, P. (2005). “Visual attention to Online Search Engine Results.” Technical White Paper.
[19] This section summarizes possible methods discussed with the capstone advisor. A specific description of experimental design will be determined before March 1, 2009. See Gantt Chart for more details.
[20] This might be in the LUTE Lab at the Sieg Hall (Dept. of Technical Communication).
[21] Thomas, P. and Hawking, D. (2006). “Evaluation by comparing result sets in context.” In Proceedings of CIKM '06, pp.94-101.
[22] http://www.pewinternet.org/pdfs/PIP_Online_Health_2006.pdf
[23] http://www.imediaconnection.com/content/15028.asp
[24] http://www.searchmedica.com
[25] http://www.alyoungblood.com/files/research-certificate.jpg
[26] He is currently conducting funded research with Dr. Sharon Oviatt and has some familiarity with UW IRB and Western IRB (WIRB) clearance procedures and protocols for a full-panel review.
[27] http://www.ischool.washington.edu/msim/docs/human%20subjects%20research%20policies.pdf
[28] McGrath, J.E. (1995) “Methodology matters: Doing research in the behavioral and social sciences.” In Baecker, Grudin, Buxton, & Greenberg (eds.). Readings in human-computer interaction: Toward the year 2000 (2nd). San Francisco: Morgan Kaufmann, pp. 152-169.
[29] http://www.ischool.washington.edu/msim/capstone/msim_timeline.aspx