Original Author: David, DeepTide TechFlow
On the afternoon of January 20th, X open-sourced a new version of its recommendation algorithm.
Musk's reply was quite interesting: "We know the algorithm is dumb and needs a major overhaul, but at least you can see we're struggling in real time to improve it. Other social platforms wouldn't dare to do it this way."

This statement carries two meanings. First, they admit the algorithm has issues; second, they use "transparency" as a selling point.
This is X's second time open-sourcing its algorithm. The 2023 code went three years without updates and had long since diverged from the production system. This time it has been completely rewritten, with the core model switching from traditional machine learning to a Grok-based transformer. According to the official statement, this change "completely eliminates manual feature engineering."
In the past, algorithms relied on engineers manually adjusting parameters, but now AI directly analyzes your interaction history to decide whether to recommend your content.
For content creators, this means the old "black magic" beliefs—such as "what time is best to post" or "which tags will increase followers"—may no longer work.
We also browsed through open-source GitHub repositories and, with the help of AI, found that there are indeed some hard-coded logic elements hidden in the code that are worth digging into.
Algorithm logic evolution: from manually defined to automatically determined by AI
First, let's clarify the differences between the new and old versions; otherwise the discussion that follows will be confusing.
The model Twitter open-sourced in 2023 was called Heavy Ranker, and it was essentially traditional machine learning. Engineers had to manually define hundreds of "features": whether the post had images, how many followers the poster had, how long ago it was posted, whether it contained links...
Then they assigned a weight to each feature, adjusted the weights back and forth, and watched which combination worked best.
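To make the old approach concrete, here is a minimal sketch of a Heavy-Ranker-style scorer. The feature names and weights below are hypothetical, chosen only to illustrate the "manually define features, manually tune weights" pattern; they are not the real 2023 values.

```python
import math

# Hypothetical hand-engineered features, Heavy-Ranker style.
# Neither the feature names nor the weights are real values.

def extract_features(post: dict) -> dict:
    """Features an engineer chose in advance, one by one."""
    return {
        "has_image": 1.0 if post.get("image") else 0.0,
        "author_followers_log": math.log10(1 + post["author_followers"]),
        "age_hours": float(post["age_hours"]),
        "has_link": 1.0 if post.get("link") else 0.0,
    }

HAND_TUNED_WEIGHTS = {
    "has_image": 0.8,
    "author_followers_log": 0.3,
    "age_hours": -0.05,  # older posts score lower
    "has_link": -0.4,    # outbound links are penalized
}

def heavy_ranker_score(post: dict) -> float:
    feats = extract_features(post)
    return sum(HAND_TUNED_WEIGHTS[k] * v for k, v in feats.items())
```

Every item in this pipeline, from the feature list to each weight, had to be designed and re-tuned by a human, which is exactly what the new version throws away.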
The newly open-sourced version is called Phoenix, and its architecture is completely different. Think of it as an algorithm that leans much more heavily on large AI models: its core is Grok's transformer, the same family of technology behind ChatGPT and Claude.
The official README document states it clearly: "We have eliminated every single hand-engineered feature."
All the traditional rules that relied on manual extraction of content features have been completely discarded.
Now, what criteria does this algorithm use to determine whether a piece of content is good or not?
The answer is your behavior sequence: what you have liked in the past, whom you have replied to, which posts you have viewed for more than two minutes, which types of accounts you have blocked. Phoenix feeds all of these behaviors into a transformer and lets the model learn the patterns and draw its own conclusions.

For example: the old algorithm is like a manually written scoring sheet, where each item is checked off and points are assigned accordingly;
the new algorithm is like an AI that has seen your entire browsing history and directly guesses what you will want to see next.
For creators, this means two things:
First, previous tips such as "best times to post" or "golden hashtags" have become far less useful, because the model no longer looks at these fixed features; it focuses on each user's individual preferences.
Second, whether your content keeps getting promoted depends on how the people who see it react. That reaction has been quantified into 15 behavioral predictions, which the next section covers in detail.
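A hedged sketch of what a Phoenix-style input could look like: instead of hand-picked features of a single post, the model consumes your ordered interaction history. The field names below are illustrative, not the real schema from the repository.

```python
from dataclasses import dataclass

# Hypothetical shape of a behavior-sequence event. The real schema is
# in the open-sourced repo; these fields are only illustrative.

@dataclass
class Interaction:
    action: str          # "like", "reply", "dwell", "block", ...
    author_id: int
    post_topic: str
    dwell_seconds: float

# The model's input is simply the ordered history of what you did,
# not hand-engineered features of any single post.
history = [
    Interaction("like", 42, "ai", 8.0),
    Interaction("dwell", 7, "crypto", 130.0),  # viewed > 2 minutes
    Interaction("block", 99, "politics", 1.5),
]

# A transformer would tokenize this sequence and predict the next
# engagement; here we just collect the topics it would attend over.
recent_topics = [e.post_topic for e in history if e.action != "block"]
```

The point of the switch is that patterns like "this user dwells on crypto posts" are learned from the sequence itself rather than encoded by an engineer.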
Algorithms Predicting Your 15 Reactions
After Phoenix obtains a post to be recommended, it will predict 15 possible actions that the current user might take upon seeing this content:
- Positive behaviors: liking, replying, reposting, quoting, clicking on the post, visiting the author's profile, watching more than half of a video, expanding images, sharing, dwelling on the content for a certain duration, and following the author.
- Negative behaviors: clicking "Not interested," blocking the author, muting the author, or reporting the post.
Each action corresponds to a predicted probability. For example, the model determines that you have a 60% probability of liking this post, a 5% probability of blocking the author, and so on.
Then the algorithm does a simple thing: multiply these probabilities by their respective weights, sum them up, and obtain a total score.

The formula looks like this:
Final Score = Σ ( weight × P(action) )
The weights of positive behaviors are positive numbers, and the weights of negative behaviors are negative numbers.
Posts with higher total scores appear at the top, while those with lower scores sink down.
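The weighted sum can be sketched in a few lines. The action names follow the behaviors listed above, but the probabilities and weights here are made up, since the new release does not publish either.

```python
# Minimal sketch of the weighted-sum ranking step.
# All numbers below are hypothetical, not X's real values.

predicted = {   # P(action) from the model, for one user-post pair
    "like": 0.60, "reply": 0.10, "repost": 0.05,
    "not_interested": 0.02, "block_author": 0.05, "report": 0.001,
}

weights = {     # positive actions weigh positive, negative ones negative
    "like": 1.0, "reply": 10.0, "repost": 2.0,
    "not_interested": -5.0, "block_author": -20.0, "report": -100.0,
}

final_score = sum(weights[a] * p for a, p in predicted.items())
# Candidate posts are then sorted by final_score, highest first.
```

Note how a small probability of a heavily weighted negative action (report) can erase the contribution of a very likely like.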
Setting the formula aside, put simply:
Nowadays, whether a piece of content is successful is no longer determined solely by how well it's written (although readability and helpfulness form the basis for dissemination). Instead, it's more about "what reaction this content will provoke from you." Algorithms don't care about the quality of the post itself; they only care about your behavior.
Following this line of thinking, in extreme cases, a vulgar post that provokes people to comment and express their opinions might receive a higher score than a high-quality post that receives no interaction. The underlying logic of this system might just be like that.
However, the newly open-sourced version of the algorithm does not disclose the specific numerical values of behavior weights, although the 2023 version did.
Old reference: 1 report = 738 likes
Next, we can take a look at the data from 2023. Although it's old, it can help you understand how much the algorithm values different behaviors differently.
On April 5, 2023, X did indeed release a set of weight data on GitHub.
Going straight to the numbers:

| Behavior | Weight |
| --- | --- |
| Like | 0.5 |
| Retweet | 1.0 |
| Reply | 27 |
| Reply that the author engages with | 75 |
| Watching at least half of a video | 0.005 |
| Negative feedback (block, mute, "not interested") | -74 |
| Report | -369 |

Data Source: Legacy Version GitHub twitter/the-algorithm-ml repository, click to view the original algorithm
A few numbers are worth a closer look.
First, likes are almost worthless. The weight is only 0.5, the lowest among all positive behaviors. In the eyes of the algorithm, the value of a like is roughly equivalent to zero.
Second, dialogue and interaction are the real currency. The weight of "you reply, and the author replies back" is 75, which is 150 times that of a like. The algorithm is most interested in back-and-forth conversations, not one-way likes.
Third, the cost of negative feedback is very high. One Block or Mute (-74) requires 148 likes to offset. One Report (-369) requires 738 likes. Moreover, these negative points accumulate into your account's reputation score and affect the distribution of all your future posts.
Fourth, the video completion rate has an absurdly low weight. It's only 0.005, which is almost negligible. This is in sharp contrast to Douyin and TikTok, where watch-through rate is considered a core metric.
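The offset arithmetic in the third point can be checked directly from the published 2023 weights (like = 0.5, block/mute = -74, report = -369):

```python
# How many likes it takes to cancel one negative action,
# using the 2023 weights published on GitHub.

LIKE = 0.5

def likes_to_offset(weight: float) -> float:
    """Likes needed to cancel out one action with the given weight."""
    return abs(weight) / LIKE

# One block/mute (-74) erases what 148 likes earned; one report (-369), 738.
```

The asymmetry is the takeaway: a single angry reader undoes hundreds of satisfied ones.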
The official also stated in the same document: "The exact weights in the file can be adjusted at any time... Since then, we have periodically adjusted the weights to optimize for platform metrics."
Weights can be adjusted at any time, and they have indeed been adjusted before.
The new version hasn't disclosed specific numerical values, but the logic framework written in the README is the same: positive points are added, negative points are subtracted, and a weighted sum is calculated.
The exact numbers may have changed, but the scale relationships are likely still valid. Replying to others' comments is more useful than receiving 100 likes. Making people want to block you is worse than having no interaction at all.
After knowing all this, what can we, as creators, do?
After going through the new and old algorithm code, here are several actionable conclusions:
1. Reply to your commenters. In the weighting table, "author replies to a commenter" is the highest-scoring item (+75), which is 150 times higher than a one-sided user like. It's not that you should go asking for comments, but if someone does comment, you should reply. Even a simple reply like "Thank you" will be recorded by the algorithm.
2. Don't provoke negative feedback. One negative action, such as a block, takes 148 likes to offset. Controversial content does generate interaction, but if the interaction takes the form of "this person is so annoying, block," your account's reputation score keeps declining and drags down the distribution of all your future posts. Controversial traffic is a double-edged sword, and it cuts you first.
3. Put external links in the comment section. The algorithm does not want to send users off the platform, so a main post containing links gets downranked; Musk has said this publicly himself. If you want to drive traffic, write the content in the main post and put the link in the first comment.
4. Don't flood the feed. The new code includes an Author Diversity Scorer that de-weights consecutive posts from the same author. The design intent is a more varied feed; the side effect is that posting ten messages in a row works worse than carefully crafting one.
5. There's no "best time to post" anymore. The old algorithm included a hand-engineered "posting time" feature; the new version simply removed it. Phoenix only looks at user behavior sequences and no longer considers when a post was made, so tips like "posting at 3 PM on Tuesdays works best" matter less and less.
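A hedged sketch of what an author-diversity pass could look like: each additional consecutive appearance of the same author shrinks that post's score. The decay factor and function shape are hypothetical; the real Author Diversity Scorer lives in the open-sourced Rust code.

```python
# Hypothetical author-diversity adjustment: the nth repeat of an author
# in the ranked list has its score multiplied by decay**n.

def apply_author_diversity(ranked, decay=0.5):
    """ranked: list of (author_id, score) pairs, sorted by score descending."""
    seen_count = {}
    adjusted = []
    for author, score in ranked:
        n = seen_count.get(author, 0)
        adjusted.append((author, score * decay ** n))
        seen_count[author] = n + 1
    return adjusted
```

Under this toy model, an author's second post in the same feed keeps only half its score, the third a quarter, and so on, which is why a burst of ten posts competes mostly against itself.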
The above is what can be read at the code level.
There are also some additional scoring rules from X's public documentation that are not included in the open-sourced repository: the blue verification badge provides a boost, all-caps text is penalized, and sensitive content triggers an 80% reduction in reach rate. Since these rules are not open-sourced, I won't elaborate further.
In summary, the open-sourced content this time is quite substantial.
Complete system architecture, candidate content retrieval logic, ranking and scoring process, and implementations of various filters. The code is primarily written in Rust and Python, with a clear structure. The README is more detailed than that of many commercial projects.
But a few key things weren't released.
1. The weight parameters have not been made public. The code only mentions "adding points for positive behaviors and deducting points for negative behaviors," but it doesn't specify how many points a like is worth or how many points are deducted for a block. The 2023 version at least provided the actual numbers, while this time only a formula framework is given.
2. The model weights are not publicly available. Phoenix uses the Grok transformer, but the model's parameters themselves are not disclosed. You can see how the model is called, but you cannot see how the calculations are performed internally within the model.
3. The training data is not publicly available. The training data used for the model, how user behavior is sampled, and how positive and negative samples are constructed are all not mentioned.
To give an analogy, this open source is like telling you, "We calculate the total score using a weighted sum," but not telling you what the weights are; it's like telling you, "We use a transformer to predict the probability of actions," but not telling you what the internal structure of the transformer looks like.
In a horizontal comparison, neither TikTok nor Instagram has ever disclosed such information. This time, X (formerly Twitter) has indeed made more information open-source than other major platforms. However, it's still some distance away from achieving "full transparency."
This is not to say that open source has no value. For creators and researchers, being able to see the code is always better than not being able to.
