The basic methodology and stats for Olympic tweets analysis

This is a summary of the methodology and the basic stats we used in our article, "Tokyo Olympics: A symbol of the divide in Japanese society".
This summary assumes that you are familiar with R and basic statistics. You can read the analysis itself in the article to be published. Here, I will note the basic code and procedure.

The Twitter API available to the public has many limitations (tweets can only be retrieved for the past week, at most 18,000 tweets can be retrieved at one time, etc.), but I tried to collect tweet data within those limits.
If you are an academic, you can access the full archive with Twitter's approval, so if you are interested, please give it a try.

Obtaining tweet data about the Olympics
I only did basic aggregation and organization.
First, I used the "rtweet" package for R to get the data from the Twitter API.

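library(rtweet)  # provides search_tweets()
OtweetJ <- search_tweets("Olympics OR Olympics since:2021-05-15_23:00:01_JST until:2021-05-16_00:00:00_JST", n = 20000, include_rts = TRUE, lang = "ja")  # lang = "ja" restricts results to Japanese tweets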

Since we can only retrieve 18,000 tweets at a time, we repeated the process several times, one time window per request. Note that because this is not the full Twitter archive, the data probably fluctuates in real time due to tweet deletions and other changes.
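The repeated retrieval can be sketched as follows. This is only an illustration, not the exact script used for the article: the helper name `build_hourly_queries` is hypothetical, the one-hour window size is inferred from the single query shown above, and the keyword string is a placeholder. The helper only builds the query strings, so it runs without API credentials.

```r
# Build one search query per hourly window (string handling only, no API call).
# Each window mirrors the query shown above: "since" is the window start plus
# one second, "until" is the window end.
build_hourly_queries <- function(start, end, keywords = "Olympics",
                                 tz = "Asia/Tokyo") {
  starts <- seq(as.POSIXct(start, tz = tz),
                as.POSIXct(end, tz = tz) - 3600,
                by = "hour")
  fmt <- "%Y-%m-%d_%H:%M:%S_JST"
  paste0(keywords,
         " since:", format(starts + 1, fmt, tz = tz),
         " until:", format(starts + 3600, fmt, tz = tz))
}

queries <- build_hourly_queries("2021-05-09 00:00:00", "2021-05-16 00:00:00")
length(queries)  # one query per hour of the collection period

# With valid credentials, each query could then be sent to the API, e.g.:
# results <- lapply(queries, function(q)
#   rtweet::search_tweets(q, n = 18000, include_rts = TRUE, lang = "ja"))
# The per-window results would still need to be combined and de-duplicated.
```

Keeping the query construction separate from the API call makes it easy to check the windows before spending any of the rate limit.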
In this case, the data was collected for one week. The summary is below.
Data collection period: From May 9, 2021 00:00:00 to May 16, 2021 00:00:00 (7 days, 168 hours)
Total number of tweets: 2,524,416
Number of retweeted tweets: 2,000,580
Number of original tweets that were retweeted: 113,015
Ratio of retweeted tweets to total tweets: 79.24%
Number of IDs that said something during the period: 420,643
Number of IDs that retweeted at least once during the period: 276,845

The rest of the process is ordinary R-based aggregation, so I'll skip the details. For detailed results, please refer to the data sheet to be released at the same time as my article.
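For readers who want a starting point, the kind of aggregation behind the summary figures might look like the sketch below. It uses a tiny made-up data frame, not the real data. The column names `user_id`, `is_retweet` and `retweet_status_id` are assumed to match what rtweet returned at the time, and `summarise_tweets` is a hypothetical helper.

```r
# Toy stand-in for the data frame returned by search_tweets().
# Column names are an assumption based on rtweet's output format.
tweets <- data.frame(
  user_id           = c("u1", "u1", "u2", "u3", "u3", "u4"),
  is_retweet        = c(FALSE, TRUE, TRUE, TRUE, FALSE, TRUE),
  retweet_status_id = c(NA, "t9", "t9", "t8", NA, "t9"),
  stringsAsFactors  = FALSE
)

# Compute the same kinds of totals as listed in the summary.
summarise_tweets <- function(df) {
  rts <- df[df$is_retweet, ]
  list(
    total_tweets   = nrow(df),                             # all tweets
    retweets       = nrow(rts),                            # retweeted tweets
    rt_sources     = length(unique(rts$retweet_status_id)),# distinct originals
    rt_ratio_pct   = round(100 * nrow(rts) / nrow(df), 2), # share of retweets
    ids_total      = length(unique(df$user_id)),           # IDs that tweeted
    ids_retweeting = length(unique(rts$user_id))           # IDs that retweeted
  )
}

stats <- summarise_tweets(tweets)
str(stats)
```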

●Basic Stats
Data: From May 9, 2021 00:00:00 until May 16, 2021 00:00:00 (7 days, 168 hours)

Number of tweets: 2,524,416
Retweeted tweets: 2,000,580
Number of original tweets that were retweeted: 113,015 (expanded to 2,000,580 retweets)
Rate of retweeted tweets: 79.24%

Total number of IDs: 420,643
Number of IDs that retweeted at least once: 276,845

Number of tweets per person: 6 (0.86 per person per day)
Number of retweets per person: 7.23 (1.03 per person per day)
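The per-person figures follow from the totals above. The sketch below assumes that tweets per person are divided by the total number of IDs and retweets per person by the number of IDs that retweeted at least once. The text does not spell this out, but these denominators reproduce the reported values.

```r
# Totals taken from the Basic Stats above.
total_tweets   <- 2524416
total_retweets <- 2000580
ids_total      <- 420643
ids_retweeting <- 276845
days           <- 7

# Assumed denominators: all IDs for tweets, retweeting IDs for retweets.
tweets_per_id   <- total_tweets / ids_total
retweets_per_id <- total_retweets / ids_retweeting

per_person <- data.frame(
  measure        = c("tweets", "retweets"),
  per_id         = round(c(tweets_per_id, retweets_per_id), 2),
  per_id_per_day = round(c(tweets_per_id, retweets_per_id) / days, 2)
)
print(per_person)
```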

The 517 most retweeted tweets accounted for 30% of all tweets
The 1,924 most retweeted tweets accounted for 50% of all tweets

Tweets from the 69 most retweeted IDs accounted for 30% of all tweets
Tweets from the 281 most retweeted IDs accounted for 50% of all tweets
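Coverage figures of this kind can be found by ranking the retweet counts and accumulating them until the target share is reached. The sketch below uses made-up counts, and `n_to_cover` is a hypothetical helper name, so it illustrates the idea rather than the exact procedure used.

```r
# Hypothetical retweet counts per original tweet (toy numbers, not the real data).
rt_counts    <- c(40, 25, 10, 5, 5, 3, 2)
total_tweets <- 120  # all tweets in the toy data set, retweets included

# Smallest number of top-ranked tweets whose retweets reach `share` of all tweets.
# Returns NA if the target share is never reached.
n_to_cover <- function(counts, total, share) {
  cum_share <- cumsum(sort(counts, decreasing = TRUE)) / total
  which(cum_share >= share)[1]
}

n_to_cover(rt_counts, total_tweets, 0.30)
n_to_cover(rt_counts, total_tweets, 0.50)
```

The same function works per ID if the counts are first summed by the ID of the original author.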

- Supporters and opponents
We focused on the 1,924 most retweeted tweets, which covered 50% of all tweets.

Category            Tweets       %
Support Olympics    178,554      14.15%
Oppose Olympics     1,056,103    83.68%
Others              27,459       2.18%
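The percentages in this table can be reproduced from the counts, assuming the denominator is the sum of the three categories (the tweets covered by the 1,924 most retweeted tweets) rather than all 2,524,416 tweets.

```r
# Counts from the table above.
opinion <- data.frame(
  category = c("Support", "Oppose", "Others"),
  tweets   = c(178554, 1056103, 27459)
)

# Share of each category within the categorized tweets (assumed denominator).
opinion$pct <- round(100 * opinion$tweets / sum(opinion$tweets), 2)
print(opinion)
```

Because each share is rounded separately, the percentages need not add up to exactly 100.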

*We observed fear of COVID-19, worry about children and family, worry about the state of medical care, criticism of the government's response, and complaints about unfair and contradictory government decisions, among others.
*Only 15 tweets, retweeted 6,529 times in total, were about bright topics related to the Olympics. Most of them were about the activities of Japanese Olympic athletes.

[Figure: Opinion breakdown of the 2,524,416 tweets]

- Occupations
We focused on the 1,924 most retweeted tweets, which covered 50% of all tweets. We categorized the IDs that posted them by profession.


Category               Tweets     %
Cultural Industries    510,197    40.42%
Other Commentators     472,834    37.46%
Journalism             162,505    12.87%
Politics&Gov           69,983     5.54%
Healthcare             41,531     3.29%
Sports                 5,166      0.41%


Other Commentators: ordinary people, including influencers and bloggers
Cultural Industries: cultural figures, writers, professors, freelance journalists, magazines, entertainers, etc.
Journalism: news organizations and their staff
Politics&Gov: politicians and parties
Healthcare: doctors, nurses and other medical staff
Sports: athletes and sports-related organizations

- Time and tweets
In Japan, there are three daily peaks of tweets. The first is from 8am to 10am, the second is from noon to 2pm, and the last is from 10pm to midnight. The peaks of tweets about the Olympics are almost the same, except that the last one starts earlier than in the usual pattern. Retweeting usually peaks within 3 hours after the original tweet is posted, and this case is almost the same.
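An hourly profile like the one described here can be obtained by converting each timestamp to Japan time and counting tweets per hour of day. The sketch below uses a handful of made-up timestamps and assumes they are stored in UTC, as rtweet's `created_at` column was at the time.

```r
# Toy timestamps (UTC), standing in for the created_at column of the tweet data.
created_at <- as.POSIXct(
  c("2021-05-09 23:15:00", "2021-05-09 23:40:00", "2021-05-10 03:05:00",
    "2021-05-10 13:30:00", "2021-05-10 13:59:59", "2021-05-10 14:00:00"),
  tz = "UTC"
)

# Hour of day in Japan time (JST = UTC+9), kept as a factor so that
# hours without any tweets still appear with a count of zero.
hour_jst <- factor(format(created_at, "%H", tz = "Asia/Tokyo"),
                   levels = sprintf("%02d", 0:23))

tweets_per_hour <- table(hour_jst)
pct_per_hour    <- round(100 * prop.table(tweets_per_hour), 1)

# Show only the hours that contain tweets.
tweets_per_hour[tweets_per_hour > 0]
```

Plotting `pct_per_hour` as a bar chart makes the daily peaks easy to compare against the usual pattern.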




[Figure: Number of most retweeted tweets in 24 hours]
[Figure: RTs % in 168 hours (7 days)]
[Figure: RTs % in 24 hours]


- Ranking
The 10 most retweeted tweets accounted for 2.82% of all tweets

1 18,765 times
2 15,211 times
3 6,594 times
4 4,802 times
5 4,742 times
6 4,481 times
7 4,436 times
8 4,138 times
9 4,068 times
10 4,062 times

The 10 most retweeted IDs accounted for 10.86% of all tweets
1 50,391 times
2 43,801 times
3 28,992 times
4 24,817 times
5 24,710 times
6 22,581 times
7 20,972 times
8 19,686 times
9 19,408 times
10 18,765 times
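The two shares quoted for these rankings can be checked by summing the listed counts and dividing by the total number of tweets. `share_pct` is a hypothetical helper name used only for this check.

```r
total_tweets <- 2524416

# Retweet counts copied from the two rankings above.
top10_tweets <- c(18765, 15211, 6594, 4802, 4742, 4481, 4436, 4138, 4068, 4062)
top10_ids    <- c(50391, 43801, 28992, 24817, 24710, 22581, 20972, 19686, 19408, 18765)

# Share of all tweets, in percent, rounded to two decimals.
share_pct <- function(counts, total) round(100 * sum(counts) / total, 2)

share_pct(top10_tweets, total_tweets)  # 2.82
share_pct(top10_ids, total_tweets)     # 10.86
```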

The 10 IDs that retweeted the most generated 0.39% (9,863 tweets) of all tweets
1 1,995 times
2 1,807 times
3 1,166 times
4 922 times
5 823 times
6 810 times
7 796 times
8 786 times
9 758 times
10 750 times

*An extremely large number of retweets raises suspicion of bot activity. However, the account creation dates, follower counts and profiles of these IDs look normal. Additionally, as far as we observed, there is no tangible evidence of hidden organized activity.
*We observed several organized activities, such as hashtag campaigns, online marches and signature campaigns, on both sides.