Healing Social Platforms with Martin Saveski

Adhitya Venkatraman
13 min read · Mar 8, 2023

Social science has been revolutionized by advanced computational methods that have enabled analysis of data and phenomena that were previously impossible to study. Stanford has long been a hotbed for research in this area; this quarter, Professor Diyi Yang introduced a course on computational social science, CS224. In this blog post, I’ll be recapping a particularly exciting class discussion we had on February 28th.

Last week, we had the pleasure of hearing from Martin Saveski, an incoming assistant professor at the University of Washington and expert in computational social science. Before joining Stanford as a postdoctoral scholar, he received his Ph.D. from MIT. Martin’s work broadly focuses on leveraging data science to create healthier social platforms.

Over the course of his lecture, Martin provided an overview of this space in academia writ large, offered a window into his own research agenda, and dove deep into an exciting example of his work. I’ll address each of these themes, along with some of the insights that arose from questions posed by students. I’ll conclude with a summary that emphasizes the key takeaways from the discussion.

Martin’s Research Agenda and the State of Computational Social Science

Overview and Themes

We began by discussing Martin’s broader research agenda, which focuses on understanding and improving the ways that social networks bring us together. As Martin noted, much of our social interaction now occurs on digital platforms. We don’t read newspapers; we scroll through Twitter or Facebook. We increasingly use instant messaging services like WhatsApp or Discord instead of calling. We turn to job networks like LinkedIn and Indeed instead of periodicals. These behaviors are not only new but also incredibly important to understand, as they shape the ways we understand and work with one another. While social media platforms claim to foster communities and connect individuals, a large body of research has documented many dangerous aspects of these platforms as well.

To draw on just a few examples that we have studied in CS224 itself, Kramer, Guillory, and Hancock (2014) document evidence of emotional contagion on Facebook; that is, when people observe negative (or positive) posts on the platform, their own posts become more negative (or positive) as well. Meanwhile, Burke, Cheng, and de Gant (2020) demonstrate that people who spend more time on Facebook are more likely to experience social comparison; in particular, seeing happy posts from friends or seeing others receive many likes and comments correlated with greater unhappiness for users. Finally, Vosoughi, Roy, and Aral (2018) find that fake news is far more contagious on Twitter than true news: fake news reaches orders of magnitude more people and diffuses much faster than real news.

While researchers continue to document the effects of social media, the growing literature on potential harms seems clear: for various reasons related to human behavior and the companies themselves, social platforms may not bring out the best in us. Martin’s research engages these questions, but also asks where we go from here. What solutions, features, or dynamics can be exploited to change social platforms for the better? Martin’s work spans studying the social structure and behavior of social platform users, designing new features to address challenges, and developing the methodological tools necessary to understand massive online social structures. Thematically, his research agenda can be broken down into three major areas of study: political polarization, online conversations, and causal inference in social systems. His work on political polarization has led him to collaborate with media organizations and social media platforms to understand what causes people to avoid information or articles that challenge their beliefs. I’ll discuss one such paper later.

Online Conversations

While this blog post will highlight Martin’s work on political polarization, I did want to spend some time discussing other aspects of his research. Martin’s research on online conversations has studied the features of people who act as catalysts, sparking conversation with other individuals. Interestingly, this research found that the top 1% of catalysts account for 31% of catalyzed interactions, despite having similar network characteristics to the rest of the user base, meaning they are not necessarily highly-followed influencers. That study analyzed Facebook conversations and comments, clearly tying it to real social platforms and their design. Additionally, Martin has studied toxic conversations, finding that the conversational structure around a tweet is highly predictive of whether or not it receives a toxic reply. In both cases, we can see how this work could provide key insights for designing healthier social platforms. For example, if we can recognize who is sparking conversations online, we can better target policies, such as pre-bunking, to ensure that the interactions they catalyze are grounded in fact. Or, if we can identify conversations that are very likely to become highly toxic, we might be able to intervene or educate users about their language. We might also take the results of these two papers together to consider which social catalysts are most likely to generate toxic conversations.

The Development of Computational Social Science

Martin also offered some general thoughts on the factors that make this an exciting area for research. He noted that digital platforms make it possible to access seemingly endless amounts of data that are incredibly fine-grained and often at the individual level. Additionally, we can observe information from micro-interactions that we would never have been able to measure otherwise, such as the time spent viewing an advertisement, the amount of time spent scrolling, or the number of times someone visits a friend’s profile. Martin offered a great example, which I repeat below, of just how different these types of analyses are today compared to some of the first network and information studies. The left image is from a study by Leon Festinger, a social psychologist, who looked at how misinformation moved between people in his neighborhood. Festinger started a rumor, then talked with people to figure out how it spread. On the right, we have a modern example of a network graph, which depicts connections between IP addresses. Clearly, we have come a long way and developed far more advanced techniques than were available in previous decades.

Left: Festinger’s diagram of information networks based on the spread of rumors. Right: A network diagram of IP Addresses created by opte.org.

Moreover, social platforms are highly malleable and focused on influencing human behavior, making them good places to study the causal impact of certain design or mechanism changes. Companies like Facebook and Twitter are always refining their platforms, implementing minor tweaks to create a particular user experience. When researchers are able to study these changes or implement their own features, there are important insights to be gained. As platforms continue to evolve and new platforms arise, this setting offers a deep and rich area for further research.

Engaging Politically Diverse Audiences

Introduction and Prior Work

Now, let’s turn to Martin’s paper “Engaging Politically Diverse Audiences on Social Media,” which was published in 2022. I’ll walk through the key details of the paper and weave in discussion points raised by students during the lecture.

The paper is a collaboration with Doug Beeferman, David McClure, and Deb Roy. As mentioned above, this paper thematically fits into the area of making social networks healthier by reducing the degree of polarization observed online. In particular, this study examines how the language media outlets use might influence the kinds of users who interact with their content. Then, Martin and his coauthors create and test models that reframe language to make it less polarizing, offering journalists a valuable tool to reduce political polarization related to their content.

This paper ties into two streams of prior literature. First, while prior work has focused on how the language used in headlines and social media posts influences how viral content becomes, Martin and his coauthors take a novel approach by examining what kinds of language result in polarized engagement. The paper also intersects with another vein of research that studies language transformation and reframing. This literature has largely focused on adjusting language to make it less polarized, but it has mostly produced black-box models that can be difficult to understand and act upon. By contrast, Martin and his coauthors attempt to create a system that offers journalists clear directives for adjusting their language and highlights how certain changes impact the expected polarization of the language.

Measuring and Modeling Audience Diversity

The first part of the paper focuses on measuring the political diversity of engagement with tweets. Let’s review the data used. First, the tweets analyzed come from five major news media sources with various political leanings: the New York Times, CNN, the Wall Street Journal, Fox News, and Breitbart. The authors also collect tweets from Frontline, a PBS-sponsored documentary production company. From January 2017 to March 2020, they gather about 566,000 tweets and 104 million retweets. The next step was to measure the political leanings of users who retweet content from the news sources. Martin and his coauthors classify users as either right- or left-leaning based on the content they share. Building on prior work, they calculate alignment scores based on the kinds of sources shared by a given user over time. Finally, the authors calculate an entropy score to determine the political diversity of the retweets received by tweets from each of the media organizations.
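
To make the diversity measure concrete, here is a minimal Python sketch of how an entropy-based score over binary left/right retweeter alignments might be computed. The function name and data layout are my own illustration, not the authors’ code; in the paper, alignments come from continuous scores estimated from each user’s sharing history and the pipeline operates at a far larger scale.

```python
import numpy as np

def audience_diversity(retweeter_alignments):
    """Entropy-based diversity of a tweet's retweeting audience.

    `retweeter_alignments` is a hypothetical list with one label per
    retweeter, each either "left" or "right". Returns a score in [0, 1]:
    0 means every retweeter leans the same way, 1 means a perfectly
    balanced audience.
    """
    n = len(retweeter_alignments)
    if n == 0:
        return 0.0
    p_right = sum(1 for a in retweeter_alignments if a == "right") / n
    probs = np.array([p_right, 1 - p_right])
    probs = probs[probs > 0]                     # avoid log(0)
    entropy = -(probs * np.log2(probs)).sum()    # binary entropy, max = 1 bit
    return float(entropy)

# Example: a tweet retweeted mostly by left-leaning users scores low.
print(audience_diversity(["left"] * 90 + ["right"] * 10))  # ~0.47
print(audience_diversity(["left"] * 50 + ["right"] * 50))  # 1.0
```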

An important point that came up at this stage of the discussion was whether the entropy-based diversity score might be biased by who actually follows the account. That is, if a given media organization only has right-leaning followers, then it seems deterministic that the retweets will largely come from right-leaning users. More broadly, we might be concerned that even with different language, it could be difficult to engage diverse audiences if an account’s followers are not diverse to begin with. Martin noted that this was an important point that they had considered in designing the study. However, he argued that retweet networks allow content to reach users beyond an account’s followers, and that even if follower skew exists for large accounts, evaluating these accounts as-is allows one to measure the impact of language without introducing confounding factors. I think this point highlights the tradeoffs that have to be made when searching for causal relationships. It is critical to prioritize the core research question at hand in designing the study, and I think an interesting extension to this work could pick up the question of language reframing strategies across different kinds of accounts.

Moreover, some of the results in the paper help validate the measure of political alignment. Indeed, the authors find strong correlations between the fraction of users they marked as right-leaning and the Republican vote share in the 2016 and 2018 elections. Additionally, they conduct a survey via Amazon Mechanical Turk to evaluate whether respondents, who reported their political alignment, would retweet examples of tweets from the various sources. Again, they find a positive correlation between conservative responses and the fraction of right-leaning-classified users who retweeted that tweet. These checks lend helpful validity to the construction of the alignment and diversity measures.
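
As a toy illustration of this kind of validation check (with entirely made-up numbers, not the paper’s data), the core computation is simply a correlation between the classifier’s output and an external ground truth:

```python
import numpy as np

# Hypothetical validation data: for each geographic unit, the fraction of
# users the classifier marks as right-leaning and the observed Republican
# vote share in that unit (both columns are invented for illustration).
frac_right_classified = np.array([0.32, 0.45, 0.58, 0.61, 0.70])
republican_voteshare = np.array([0.35, 0.48, 0.55, 0.64, 0.72])

# A strong positive Pearson correlation is consistent with the alignment
# labels tracking real-world political behavior.
r = np.corrcoef(frac_right_classified, republican_voteshare)[0, 1]
print(f"Pearson r = {r:.2f}")
```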

Then, using the data on tweets and the political diversity of their audiences, the authors develop several models that predict the diversity of a new tweet’s audience, which they call “bridginess”. The authors try several approaches, including TF-IDF, recurrent neural networks, and word2vec, but ultimately settle on a fine-tuned BERT model. They chose BERT because it minimized mean absolute error and mean squared error relative to the other options. The model was then analyzed to understand which aspects of language were driving bridginess. Interestingly, definite articles had large attribution values, suggesting that specific language resulted in higher bridginess. This is an important aspect of the project because it makes the model more interpretable and permits the creation of clearer guidance for journalists.
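
For readers curious what a bridginess predictor of this kind might look like, below is a minimal, hypothetical sketch of fine-tuning BERT as a text regressor with the Hugging Face transformers library. This is not the authors’ implementation: the texts, labels, and hyperparameters are placeholders, and the paper’s actual training, evaluation, and attribution analysis are far more involved.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# num_labels=1 with problem_type="regression" gives a single continuous
# output trained with mean squared error.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=1, problem_type="regression"
)

# Toy training examples: tweet text paired with a made-up audience-diversity
# (bridginess) label, computed e.g. as in the entropy sketch above.
texts = ["Senate passes the infrastructure bill",
         "You won't believe what they did"]
labels = torch.tensor([[0.92], [0.31]])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# One illustrative gradient step; a real pipeline would loop over batches
# and epochs and track MAE/MSE on a held-out set for model selection.
model.train()
outputs = model(**batch, labels=labels)
outputs.loss.backward()
optimizer.step()

# At inference time, the single logit is the predicted bridginess score.
model.eval()
with torch.no_grad():
    preds = model(**batch).logits.squeeze(-1)
print(preds)
```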

A critical point Martin made at this juncture was that they adopted the BERT model because it offered a substantial improvement over the other options. Rather than jumping to the newest or fanciest model, Martin noted that it is always good to default to simple models and move to more complex ones only as needed. I think this is an important tip to keep in mind because it can be very tempting to immediately apply the latest methods we have learned about when there are perhaps other, better-suited methods. Moreover, this connects more broadly to the importance of keeping the research question at the top of one’s mind when running these analyses. Each step should be in service of crafting the best, most appropriate way to answer the question at hand.

Deploying and Evaluating Tools for Audience Diversity

Returning to the paper, the authors then developed a web application in collaboration with Frontline. Over a multi-year collaboration with several journalists, they iteratively built a tool that Frontline staff could use whenever they were getting ready to send a tweet. Two key features emerged from this iterative process, both focused on providing context for the bridginess scores. First, the tool highlighted key words influencing the score that were frequently retweeted by either left- or right-leaning users. Second, it offered similar tweets that scored high in bridginess to give journalists tangible examples they could reference. Below, I have included a screenshot of the platform.

Screenshot from Saveski et al. (2022) of the user interface for the web application developed by the authors and their partners

They then ran experiments to determine whether the tool actually changed bridginess. Using promoted tweets, the authors could control who viewed each tweet and ensure that equal numbers of conservative and liberal users saw it. The treatment tweets had been reframed to encourage more diverse engagement; the control tweets were reframed to encourage less diverse engagement. An important question raised by Professor Yang here was why the control was not simply the original text written by a journalist without the tool. Martin noted that such an experiment would also be interesting, but it would measure the degree of improvement over the journalists themselves. Instead, he and his coauthors were focused on validating whether their bridginess score translated to more diverse engagement in the real world. This sparked a broader discussion about the importance of nuance in understanding exactly what is being measured when performing causal inference. Martin noted that carefully considering each step of the experimental design is a critical part of doing good research. Rather than attempting to analyze many different topics, he mentioned it is far more valuable to focus on building extremely robust ways to understand even one question.

The authors ultimately found that almost equal numbers of right- and left-leaning people engaged with the treatment tweets optimized for bridginess, whereas left-leaning people were much more likely to engage with the control tweets. In particular, relative to control tweets, treatment tweets had a 20% smaller gap in engagement between left- and right-leaning users. These are very encouraging results for the effectiveness of the bridginess measure! However, a few further considerations should be taken into account before we generalize from these results. For example, it is somewhat unclear in which direction the engagement gap shifts between treatment and control. Are fewer left-leaning people engaging, more right-leaning people engaging, or neither? Martin mentioned that Twitter’s advertising confidentiality policies do not allow such an analysis to be conducted, but it would certainly be interesting to see how bridginess interacts with this dynamic. Moreover, it seems important to continue evaluating the robustness of these tools outside of the advertising context. Researchers cannot access the algorithms Twitter uses to decide which users see an ad, so it is difficult to know whether the users who saw these promoted tweets, although similar on the surface, are as likely to engage with tweets as the general public. Continuing to develop and deploy experiments that bring this technology into everyday interactions strikes me as important work that ought to be pursued.
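
To make the headline result concrete, here is a back-of-the-envelope sketch of the outcome metric with made-up engagement rates. The numbers are chosen only so the arithmetic reproduces a 20% smaller gap; they say nothing about the true direction of the shift, which, as noted above, the authors could not observe.

```python
# Illustrative only: hypothetical engagement rates per political group.
def engagement_gap(left_rate, right_rate):
    """Absolute difference in engagement rates between the two groups."""
    return abs(left_rate - right_rate)

control_gap = engagement_gap(left_rate=0.050, right_rate=0.030)    # 0.020
treatment_gap = engagement_gap(left_rate=0.046, right_rate=0.030)  # 0.016

# Relative reduction in the gap; which group's rate moved (and how) is
# exactly the question Twitter's ad policies prevented from being answered.
reduction = 1 - treatment_gap / control_gap
print(f"Gap reduced by {reduction:.0%}")  # 20% smaller gap, as in the paper
```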

Additional Thoughts

I want to close with a few additional thoughts about this paper and Martin’s work more broadly.

Incentives

One interesting point raised during the discussion was about incentives. I think this is a critical point for any work that has to do with reforming social platforms. News media organizations have a lot to gain from promoting divisive content. They might be able to build a rabid, loyal reader base and capitalize on people’s fears or biases. If a news source opts for bridginess, it might lose that advantage and instead have a smaller, albeit more diverse, audience. Martin’s response was that while such platforms might suffer a short-term loss in overall engagement, they may gain in the long term by appealing to a wider base of users.

While I think Martin’s point is valid for media organizations with the funding and existing readership to survive the decrease in viewership, and therefore in subscription and advertising revenue, many sources might not be able to survive this change. It may indeed be the profit-maximizing strategy for firms to use low-bridginess language if it means they can guard their readership base. And while smaller organizations might have fewer financial resources, legacy media organizations also seem to benefit from polarization. Below, I include a figure created by Bloomberg that highlights a rapid rise in the number of New York Times customers coinciding with the 2016 Presidential Election. This is only descriptive evidence, but it highlights that news organizations broadly might have a strong financial incentive to engage only specific audiences. I think an interesting avenue for policy research would be to document whether media organizations with a known bias (e.g., right-leaning) can leverage high-bridginess content to convert readers with the opposite bias (e.g., left-leaning) into subscribers or customers. Such evidence might be necessary to convince companies to pursue and invest in technology like this.

New York Times gained large customer base during 2016 Election (from Bloomberg)

Research and Downstream Use Cases

More broadly, I think this work offers a really creative and productive way to think about research. So often, research across all fields remains siloed in publications and relies on practitioners in industry or policy to implement. As techniques grow in power and reach, it seems more and more important for researchers to consider the implementation of their findings from the very start. Granted, this may be less the case in computer science and natural language processing, as technology companies increasingly hire such scholars, but in other fields this disconnect between academia and the real world does seem to exist. There are a few potentially negative aspects to this dynamic. For example, we might be concerned about the lag between when breakthroughs are made and when practitioners pick them up; considering in advance the ways that research can be converted into products or use cases might reduce this lag. We might also be concerned about unethical uses of technology developed by researchers; if we preempt potentially dangerous uses, we might be able to avert harms before they happen. Remaining cognizant of how research can be leveraged to create practical solutions is thus increasingly important. The aspect of Martin’s work that I find so compelling is that it centers the ultimate goal of impacting platforms for the better.

All in all, I really enjoyed learning more about Martin’s work and broader research agenda. His advice and suggestions for doing good research will no doubt be of much use in my own research and for my fellow students as well.
