Personalizing ChatGPT's Behavior and Tone — Stephanie K Tong

Client Problem Statement

Model behavior, meaning how ChatGPT responds to user requests, directly affects user satisfaction. There are two primary ways to fine-tune that behavior:

Designing the model to produce optimal responses for all the ways people use ChatGPT.
Allowing users to explicitly specify how they want responses to be phrased and formatted.

This case study focuses on work conducted for OpenAI’s Model Behavior and Personalization teams, responsible for model design and user-driven response personalization.

Objectives + Structure

Project Objectives

Evaluate usability and appeal of two personalization concepts for the "Custom Instructions" feature.
Understand competitive positioning by comparing user perception of ChatGPT’s tone of voice with Anthropic Claude’s.
Determine the tone of voice users prefer for the "Jobs To Be Done" (JTBD) where tone matters most.

Research phases

“Custom Instructions” concept testing
Tone of voice audit comparing ChatGPT and Claude, focusing on user preferences and ideal tonal characteristics.

High-Level Timeline + Research Tools

Who I worked with

Design

Product Designer: prototypes for concept testing
Content Designer: prototype design, behavioral “traits” to test

Model Behavior

Head of Model Behavior
Model Designer
Program Manager / Product Ops

Data Science

Identifying data sources to understand top JTBD / use cases

Informed

Product Managers, AI Researchers, Global Affairs

Research toolkit

UserTesting: Large-scale participant recruitment and task completion.

Figma: Prototype review and product design collaboration.

Google Sheets: Quantitative coding and analysis.

ChatGPT: Data visualization and summarization.

Custom Instructions

Problem statement

The "Custom Instructions" feature lets ChatGPT users tailor responses to their preferences. However, it was buried in settings, and its “blank page” nature could be intimidating for less experienced users. Additionally, the public perceives AI chatbots as potentially biased, raising questions about tone neutrality and inclusivity.

The team aimed to improve user experience by refining ChatGPT’s personalization tools to cater to diverse audiences, from casual users to power users, across the political spectrum.

The current version of ChatGPT’s “Custom Instructions” feature

Research Questions

Which of the two Custom Instructions concepts is more intuitive for users to navigate and understand?
What preset traits should be included to appeal to different demographic groups?

Methodology

Participants: 16 users, heavy and occasional ChatGPT users, representing a mix of left- and right-leaning political ideologies.

Activities: In unmoderated UserTesting sessions, participants tested two lightweight prototypes and evaluated a list of predefined traits, responding to prompts such as:
- What do you use ChatGPT for? In the past, have you had issues with the responses you received from ChatGPT?
- In this prototype, you see a list of “traits.” Which traits stand out to you and why?
- Based on how you use ChatGPT, are there any “traits” missing?

Analysis: Data was analyzed through qualitative coding and sentiment analysis to identify user preferences for traits.
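
To illustrate the kind of tally this analysis produces, here is a minimal sketch in Python. The study itself used Google Sheets for coding, so this is only an equivalent illustration, not the actual workflow. The row layout, the function names, and every sentiment value are hypothetical placeholders; the trait names are borrowed from the TechCrunch headline quoted later in this case study.

```python
from collections import Counter, defaultdict

# Hypothetical coded observations: (trait, ideology, sentiment).
# These rows are illustrative placeholders, NOT study data.
coded_rows = [
    ("chatty", "left", "positive"),
    ("chatty", "right", "negative"),
    ("chatty", "right", "positive"),
    ("Gen Z", "left", "negative"),
    ("Gen Z", "right", "positive"),
]


def tally_sentiment(rows):
    """Count sentiment codes for each (trait, ideology) pair."""
    table = defaultdict(Counter)
    for trait, ideology, sentiment in rows:
        table[(trait, ideology)][sentiment] += 1
    return table


def net_sentiment(counts):
    """Positive mentions minus negative mentions for one cell."""
    return counts["positive"] - counts["negative"]


table = tally_sentiment(coded_rows)
for (trait, ideology), counts in sorted(table.items()):
    print(f"{trait:8s} {ideology:6s} net={net_sentiment(counts):+d}")
```

The same structure extends to a third key, such as ChatGPT use case, by adding it to the tuple used for grouping.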

Key Findings + Deliverables

Trait Analysis

For each “trait,” positive or negative sentiment broken down by user political ideology and ChatGPT use case.

Design Strengths and Recommendations

Presented design recommendations that combined the strengths of both concepts.

Business Outcomes

The final design was shipped as an experiment in December 2024 and launched to the general public in early January 2025, receiving widespread media coverage.
- During the experiment, the team saw relatively high engagement for the “traits” buttons.
- Pre-determined traits allow OpenAI to leverage the strengths of specific models available to the public.
Findings from this research were also cross-socialized to our Government Affairs + Model Behavior teams, and incorporated into discussion guides for external listening sessions with political organizations.

Press coverage from TechCrunch (1/17/25): “ChatGPT’s newest feature lets users assign it traits like ‘chatty’ and ‘Gen Z’”

Competitive Tone of Voice Audit

Problem statement

Many AI chatbots are available to the public, and user preferences often depend on the individual or their specific Job to Be Done (JTBD).

While the team had hypotheses about why some users might prefer competing AI models, such as Anthropic Claude, over ChatGPT, they wanted real user feedback to understand how ChatGPT’s “tone of voice” influences its suitability for different use cases.

Research Questions

How do users describe the tone of voice for ChatGPT and Claude? When and why do they choose one over the other?
What tone of voice resonates most with users for key JTBDs?

Methodology

Literature review

Conducted a literature review of 50+ sources to analyze tone descriptors.
Triangulated findings with Product Operations’ “Social Listening” program, which tracks online chatter.

Unmoderated comparative study

Compared ChatGPT and Claude outputs using predefined prompts, with feedback from 40 participants across 5 JTBDs.

Participants were recruited through UserTesting and screened for:

User type: Power User or Occasional ChatGPT User
JTBD: How they use ChatGPT

Activities

1. A default prompt was presented. Participants were asked to describe what the ideal response to it would look like.
2. The prompt was entered into ChatGPT and Claude (order randomized), and participants were encouraged to continue with follow-up prompts. Participants then reviewed the output and verbally described the tone of voice of each model’s response.
3. Participants also described tone of voice by multi-selecting from a predefined list of descriptors identified through the literature review.
4. Participants were asked to share which response they preferred and why.
5. Steps 1-4 were repeated for a second prompt.
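
The randomized model order and the multi-select descriptor step above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the study's actual tooling: it uses counterbalanced randomization (shuffle participants, then alternate which model comes first), which is one common way to randomize order and may differ from what was done here. The participant IDs, the function names, and the descriptor words are all hypothetical.

```python
import random
from collections import Counter

MODELS = ("ChatGPT", "Claude")


def assign_orders(participant_ids, seed=0):
    """Shuffle participants, then alternate which model is shown first,
    so each order is used about equally often (counterbalancing)."""
    rng = random.Random(seed)  # fixed seed keeps the assignment reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    orders = {}
    for i, pid in enumerate(shuffled):
        orders[pid] = MODELS if i % 2 == 0 else MODELS[::-1]
    return orders


def tally_descriptors(selections):
    """Count multi-select descriptor picks per model.

    `selections` is an iterable of (model, [descriptors]) pairs."""
    counts = {model: Counter() for model in MODELS}
    for model, descriptors in selections:
        counts[model].update(descriptors)
    return counts


# 40 hypothetical participant IDs, matching the study's sample size.
orders = assign_orders([f"P{n:02d}" for n in range(1, 41)])
first_shown = Counter(order[0] for order in orders.values())
print(first_shown)

# Hypothetical descriptor picks, for illustration only.
sample = [
    ("ChatGPT", ["formal", "concise"]),
    ("Claude", ["warm"]),
    ("ChatGPT", ["formal"]),
]
print(tally_descriptors(sample)["ChatGPT"].most_common(1))
```

Alternating after a shuffle guarantees an even split of first-shown models, which plain per-participant coin flips would not.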

Key Findings + Deliverables

Preference factors outside of tone

Are there other factors that played into users’ preferences between ChatGPT and Claude?

ChatGPT and Claude’s Tone of Voice

What types of words do people use to describe ChatGPT and Claude’s tone of voice? How does tone of voice change based on JTBD?

Artifacts

*An overview of how tone of voice differs across ChatGPT and Claude by JTBD*

*Example slide output in final report - this image is populated with dummy data.*

Business Outcomes

Identified overarching themes across JTBDs and provided recommendations to refine tone presets for each. The final report was widely shared across OpenAI.
Contributed directly to the Model Behavior team’s model refinement released in February 2025 and to subsequent launch cycles.
Developed a scalable, repeatable framework to measure model improvements through user feedback in future iteration cycles.

Reflection

What worked well

Planning Personalization & Model Behavior work in similar timeframes provided mutually beneficial insights across the two teams and let designers connect the dots with work happening on other teams. Engaging cross-functional teams during study design enhanced alignment between UX, Product, and Global Affairs goals.
During the “Custom Instructions” study, sharing interim insights through Slack accelerated decision-making.
Presented “Tone of Voice” findings in OpenAI UXR’s first-ever standalone readout. Since the study was commissioned by Model Behavior, the readout was a good opportunity for Product Managers to connect findings to product design and strategy.

What could be improved

Greater inclusion of quantitative methods could bolster user preference data and increase confidence within a heavily data-driven team.
Aligning UXR study prompts with the prompts used for model training would let AI Researchers more readily merge the qualitative findings with the other research and experiments they own.

Future research opportunities

Tone of voice exploration in multimodal inputs and outputs, specifically Voice.

Quantitative follow-up to explore tone of voice at scale internationally.