[David Strom's Web Informant] Researching the Twitter data feed

David Strom david at strom.com
Mon Feb 5 10:10:08 EST 2018


Web Informant, February 5, 2018: Researching the Twitter data feed

A new book by UCLA professor Zachary Steinert-Threlkeld called Twitter as
Data
<https://www.cambridge.org/core/elements/twitter-as-data/27B3DE20C22E12E162BFB173C5EB2592>
is
available online free for a limited time, and I recommend you download a
copy now. While written mainly for academic social scientists and other
researchers, it has great utility in other situations.

Zachary has been analyzing Twitter data streams for several years, and has
basically taught himself enough Python and R programming to be dangerous.
The book assumes you are a novice programmer and provides the code samples
you need to get started with your own analysis.

Why Twitter? Mainly because *it is so transparent*. Anyone can figure out
who follows whom, easily drill down to see who those followers are, and find
out how often they actually use Twitter themselves. Most Twitter users have
open accounts by default and want people to engage with them in public.
Contrast that with Facebook, where the situation is the exact opposite and
the data is much harder to access.

To make matters easier, *Twitter data comes packaged in three different
APIs: streaming, search, and REST*. The streaming API provides data in
near-real-time and is the best way to get data on what is currently
trending in different parts of the world. The downside is that you could be
picking a particularly dull moment in time when nothing much is happening.
The streaming API is also limited to just one percent of all tweets: you can
filter and focus on a particular collection, such as all tweets from one
country, but you still only get one percent. That works out to about five
million tweets daily.
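Here is a rough sketch of what collecting that streaming sample can look
like in Python with tweepy (again my example, not the book's). The "auth"
object is the OAuth handler from the earlier sketch, and the track keywords
are arbitrary placeholders for whatever filter you care about.

# Hedged sketch: save tweets from the filtered streaming API to a file,
# one JSON object per line.
import tweepy

class SaveToDisk(tweepy.StreamListener):
    """Append each incoming tweet's raw JSON to a local file."""
    def on_data(self, raw_data):
        with open("tweets.jsonl", "a") as f:
            f.write(raw_data.strip() + "\n")
        return True

    def on_error(self, status_code):
        # Returning False disconnects, e.g. on rate-limit errors.
        return False

stream = tweepy.Stream(auth, SaveToDisk())
# Filter the live stream down to tweets matching these placeholder keywords.
stream.filter(track=["basketball", "NBA"])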

Many researchers run multiple queries so they can collect more data, and
several have published interesting data sets that are available to the
public <http://www.docnow.io/catalog/>. And there is this map that shows
patterns of communication across the globe over an entire day
<http://www.necsi.edu/research/networks/globalsync/>.

The REST API has limits on how often you can collect data and how far back
in time you can go, but it isn't limited to the real-time feed.
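As a hedged sketch of working within those limits, tweepy can page through
REST results and sleep whenever it hits a rate-limit window. The search
query below is a placeholder, and note that the standard search endpoint
only reaches back roughly a week.

# Sketch: pull older tweets through the REST search API, respecting rate limits.
import tweepy

api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

# Page through search results 100 tweets at a time, up to 1,000 total.
for tweet in tweepy.Cursor(api.search, q="Cameroon basketball",
                           lang="en", count=100).items(1000):
    print(tweet.id, tweet.created_at, tweet.text)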

Interesting things happen when you go deep into the data. When Zachary first
started his Twitter analysis, he found, for example, a large body of
basketball-related tweets from Cameroon, and upon further analysis linked
them to a popular basketball player (Joel Embiid) who was from that country
and had a lot of hometown fans across the ocean. He also found that lots of
tweets from the Philippines written in Tagalog were being miscataloged as an
unknown language. When countries censor Twitter
<https://www.buzzfeed.com/craigsilverman/country-withheld-twitter-accounts>,
that shows up in the real-time feed too. Now that he is an experienced
Twitter researcher, he focuses his study on the smaller Twitterati: studying
celebrities or those with massive Twitter audiences isn't really very
useful. The smaller collections are more focused, and it is easier to spot
trends in them.

So take a look at Zachary's book and see what insights you can gain into
your particular markets and customers. It won't cost you much money and
could pay off in terms of valuable information.

Comments always welcome here: http://blog.strom.com/wp/?p=6362