You Are What You Post

Online personality algorithms put astrological profiles to shame, but UCSC psychologists are raising questions about sharing personal data

“You can be perceived as compulsive,” it began. Not flattering, I thought, but it’s possible. “You are consistent; you enjoy familiar routines. You are motivated to seek out experiences that provide a strong feeling of self-expression.”

These observations didn’t come from an astrologer. In fact, they didn’t even come from a person. I was reading my “hyper-personal” profile, a description of my personality traits and values created solely from what I’ve posted online.

Imagine collecting your tweets and Facebook posts and reading them like a stranger would for the first time. What would your online words reveal? A few years ago, not much. But now, researchers and marketers have invented algorithms designed to decipher users’ personalities based on what they post. The technique can make these profiles for anyone who uses social media.

These tools are a lot like classic personality tests. But instead of drawing from answers on a questionnaire, the computer churns out profiles based on what users have written for the cybersphere to see. How we choose words reflects our thoughts, feelings, motivations, and behaviors, and how often we use certain categories of words reveals a lot about personality. The algorithms sift through a user’s online activity and sort the words into different bins. From those tallies, they infer how extroverted or neurotic the user might be and deduce whether excitement or obedience motivates him or her through life. These systems are primitive, but they systematically distill our humanity, and they don’t need to know where the planets were when we were born.
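To make the idea of word “bins” concrete, here is a minimal sketch in Python. It assumes a tiny, made-up dictionary of word categories; the category names and word lists are hypothetical and are not the lexicon used by the algorithm in the study, which the researchers’ tool keeps far more elaborate. The sketch simply reports what share of a user’s words lands in each bin.

```python
import re
from collections import Counter

# Hypothetical word categories, invented for illustration only.
CATEGORIES = {
    "social": {"friend", "friends", "party", "we", "together"},
    "negative_emotion": {"sad", "angry", "worried", "hate", "afraid"},
    "achievement": {"win", "goal", "success", "finish", "work"},
}

def bin_words(posts):
    """Return the share of all words in `posts` that falls into each category."""
    # Lowercase every post and split it into simple word tokens.
    words = [w for post in posts for w in re.findall(r"[a-z']+", post.lower())]
    counts = Counter()
    for word in words:
        for name, vocab in CATEGORIES.items():
            if word in vocab:
                counts[name] += 1
    total = len(words) or 1  # avoid dividing by zero for an empty feed
    return {name: counts[name] / total for name in CATEGORIES}

rates = bin_words(["Party with friends tonight!", "So worried about work"])
print(rates)
```

For those two posts (eight words in all), two words fall in the “social” bin, one in “negative_emotion” and one in “achievement,” so the rates come out to 0.25, 0.125 and 0.125. A real system would use much larger, research-validated word lists.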

However, these algorithms are hidden from nearly all of us. To understand how people react to this unfamiliar technology, psychology graduate student Jeffrey Warshaw and his colleagues at UCSC, IBM and Google recently tested how volunteers responded to seeing and sharing their hyper-personal profiles. The researchers published their results ahead of the 2015 ACM Conference on Human Factors in Computing Systems (CHI 2015), held in April in Seoul, South Korea.

For the study, Warshaw recruited 18 volunteers from a Bay Area business. He gave them access to an iPad app that generated their personality profile from their Facebook posts or tweets. Then he interviewed each person to explore this question: If you give people full power over their profiles, how will they choose to use or share them, and why?

“Our role,” says Warshaw, “was to make [a hyper-personal profile] more understandable, then to see where people would want to share it.”

He found that most volunteers felt apprehensive about the technology, but many shared their profiles anyway. In general, people felt they had little control over their own data.

A New Kind of Data

One-quarter of the world’s population uses social media. In early 2015, Twitter had 236 million active users. Facebook had 1.44 billion. The world generated 98,000 tweets and 695,000 Facebook posts every minute in 2012, and the numbers keep rising. Social media is like a “firehose” spewing an unfathomable amount of data, says Warshaw.

This torrent is so relentless that social media has become a source of “big data”: large, complex and continuous sets of information, like the DNA sequence of a species or every purchase ever made at Wal-Mart. Until recently, computers lacked the processing power to analyze anything at the scale of social media. The firehose of tweets and posts simply overwhelms an ordinary computer. It’s like an ant trying to fathom a skyscraper; the difference in scale is too large.

But with faster computers and specialized machine-learning software, our understanding of the world can transcend such barriers of scale. Some insights from social media are not what we might have guessed. For instance, one study compared what people “like” on Facebook with their IQ scores. The strongest social media indicator of a person’s intelligence, the results showed, is whether he or she likes curly fries.

The algorithm in Warshaw’s study drew from two well-known models in psychology, called “Big-5 Personality” and “Schwartz’s Basic Human Values.” The Big-5 traits are openness, conscientiousness, extroversion, agreeableness, and neuroticism. Schwartz’s Values “describe the beliefs and motivations that guide a person throughout their life,” as the team’s paper explains. The study drew on five broad values from this model: self-transcendence, openness to change, conservation, hedonism, and self-enhancement.

The algorithm links specific categories of words to different dimensions of personality and values, based on findings from previous research. The system scores each person for each trait, compares their score to the rest of the population, and assigns a percentile rank between 0 and 100. The language used in these two models is clear to psychologists, but ambiguous to the rest of us. To make profiles easier to interpret, the UCSC team presented a paragraph summary of each participant’s most defining traits.
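The scoring step can be sketched in a few lines of Python. This is a simplified illustration, not the study’s actual model: it assumes a trait score is a weighted sum of word-category rates (the weights below are hypothetical), and it defines the percentile rank as the percentage of a comparison population that scores strictly below the person, which always falls between 0 and 100.

```python
from bisect import bisect_left

# Hypothetical weights linking word categories to one trait (extroversion).
EXTROVERSION_WEIGHTS = {"social": 2.0, "negative_emotion": -1.0}

def trait_score(category_rates, weights):
    """Weighted sum of category rates; a category with no data counts as 0."""
    return sum(w * category_rates.get(cat, 0.0) for cat, w in weights.items())

def percentile_rank(score, population_scores):
    """Percent of the population scoring strictly below `score` (0 to 100)."""
    ranked = sorted(population_scores)
    below = bisect_left(ranked, score)  # number of scores less than `score`
    return 100.0 * below / len(ranked)

me = {"social": 0.25, "negative_emotion": 0.125}
score = trait_score(me, EXTROVERSION_WEIGHTS)
rank = percentile_rank(score, [0.1, 0.2, 0.3, 0.4, 0.5])
print(score, rank)
```

Here the made-up user scores 0.375 on the trait, higher than three of the five people in the toy population, so the sketch reports a percentile rank of 60. Ranking against a population, rather than reporting raw scores, is what lets a profile say someone is “more extroverted than most.”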

To companies, information like this equals dollar signs. It’s no secret that they track consumer behavior. For example, Amazon recommends products based on previous online purchases—not just from Amazon, but from any retailer the company has data about.

There has been a shift, however. Until now, researchers and companies have focused on user behavior. Now they want to understand us more deeply by probing our personalities.

“It might affect the ways things are sold to you, not just what’s sold to you,” says UCSC psychologist Steve Whittaker, co-author of the study.

Employers are also intrigued by this technology. What traits or tendencies might they glean outside of an interview?

Uncanny Accuracy

When I used the same algorithm to generate my own profile, I fed it posts from my old Facebook account. I was impressed by the algorithm’s accuracy, especially since it saw only the silly things I had written in high school. Before seeing my own profile, I felt this technology could be as esoteric as an astrology reading. But it was surprisingly straightforward.

The study’s 18 volunteers also acknowledged the system’s accuracy. When Warshaw asked them if the algorithm did a good job of capturing their personalities, all but one participant agreed it did. Some people used only professional accounts. Others rarely posted on social media. Still, their profile results were uncannily accurate.

“I don’t know how it would derive that from the limited number of tweets that I made,” one participant said. “I guess I’m a little shocked that it works so well.”

Then, the team randomly presented several hypothetical scenarios to each participant. In each scenario, the volunteers could choose whether to share their profiles. The incentive to share ranged from getting an online shopping discount to being matched with professional mentors.

More than half of the participants shared their profiles in each situation. Thirteen out of 14 people shared them for the reward of recommendations about local events. Fewer volunteers—just 10 out of 17—shared their profiles in a mock job application.

The team also gave participants the choice to post their computer-generated profiles on their real social media accounts. Just over half of them decided to do this.

The next step was to understand why participants decided to share or not to share. The volunteers perceived several risks. For instance, they feared being pre-judged by employers. What if an employer decided not to interview them based on their profile?

“People don’t feel good about that at all,” Warshaw says.

The realities of data sharing and privacy unnerved the group. Companies already take our data without asking, Warshaw says. “This technology is out there, and some versions aren’t requiring user consent,” he notes.

For example, a study by High-Tech Bridge, an information security company, revealed that both Facebook and Google+ “click” on links found in users’ private messages. Soon after, two Facebook users filed a class action lawsuit against Facebook. Campbell et al. v. Facebook alleged that the company was reading private messages, not to search for scams or spam, but to collect valuable data about users. Any link found in a private message would be counted as a “like,” information useful for developing better advertisements. The case is ongoing.

Demanding Privacy

Not all reactions from participants in the study were negative. People saw value in the event recommendation scenario, since people with similar personalities might give better tips about upcoming happenings. Subjects also liked the idea of adding their profile to résumés. “Someone like me has a lot of between-the-lines interpersonal skills that are hard to build into a résumé,” said one volunteer.

Also, people simply enjoyed reading the profiles. “What was really striking about this research was just how captivated and intrigued people were with this information about them,” says Warshaw. They felt like they were learning about themselves—or at least how they appear to others, he says.

Although participants saw the technology’s benefits, they still felt uncomfortable about sharing. But if most people didn’t want to share their profiles, why did more than half of them share in every scenario? Whittaker sees the paradox: “One sad thing about the study is that it shows people don’t seem to believe they have a lot of control.” Participants feared that in real life, their information would be used regardless of their consent. So in their minds, there was no use in trying to keep it private.

Aside from privacy, people also worried about not sharing. “Non-sharing is interpreted as hiding terrible information, pressuring non-sharers to share against their wishes,” the team writes. This phenomenon is known as the “unraveling effect.” “If they know you decline, that’s more of a red flag to them,” said one participant.

Despite these attitudes, Warshaw and Whittaker hope social media users eventually feel empowered enough to demand privacy and consent. An underlying purpose of the study was to “draw people’s attention and shock people,” says Whittaker. “I think these systems will continue to be deployed, but if papers like ours have an impact on people’s consciousness, it will lead them to be more careful.”

This trend is already taking root. In 2007, just 20 percent of Facebook profiles were private. Now, 70 percent are private. “I think companies see a role for these kinds of analytics in employment situations and market research,” says Warshaw. “Right now, companies assume they can get data without consent. But now that people are going more private, eventually that won’t be viable.”

The researchers appreciate the strangeness of using computers to condense our characters into neat packages of words. “Granting the algorithm this level of ‘humanity’ simultaneously reduces our humanity by supplanting people as the sole judges of character,” the team writes. “This result raises the ethical question: Should an algorithm judge character?”

Indeed, should it? “People aren’t perfect, and systems aren’t perfect,” Warshaw says. “We’re still at the point where it might be better for a person to be wrong than for an algorithm to be wrong.”