Posted on Feb 20, 2013 in Featured Maps, R Spatial, Slideshow, Visualisation | 8 comments

Mapped: Twitter Languages in New York

[Image: NY_twitter_sml, map of the top ten Twitter languages in New York plotted together]

Following the interest in our Twitter Tongues map for London, Ed Manley and I have teamed up with Trendsmap creator John Barratt to offer this snapshot of New York City’s Twitter languages. We have visualised the geography of about 8.5 million geo-located tweets collected between Jan 2010 and Feb 2013. Each tweet is marked by a slightly transparent dot coloured according to the language it was written in. Language was detected using Google’s translation tools.

The map above (click for the interactive version, courtesy of Oliver O’Brien) plots the top ten languages together; the one below takes each of the top 24 in turn (excluding English), ordered by popularity. English (in grey above) is by far the most popular, with Spanish (in blue above) taking the top spot amongst the other language groups. Portuguese and Japanese take third and fourth respectively.

Midtown Manhattan and JFK International Airport have, perhaps unsurprisingly, the most linguistically diverse tweets, whilst specific languages shine through in places such as Brighton Beach (Russian), the Bronx (Spanish) and towards Newark (Portuguese). You can also spot international clusters on Liberty Island and Ellis Island and, if you look carefully, the tracks of the ferry boats between them. Ed has written up some more in-depth analysis of the data here.
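The colouring described above (one slightly transparent dot per tweet, coloured by language) could be sketched in R with ggplot2 roughly as follows. This is not the code used for the published map: the toy data, the language codes ("en", "es", "pt", "ja"), the object names and every colour other than grey for English and blue for Spanish are assumptions made for illustration.

```r
library(ggplot2)

# Toy stand-in for the real data: random points around New York (made-up values).
set.seed(42)
langs <- c("en", "es", "pt", "ja")  # assumed language codes
twit_lang <- data.frame(
  long = runif(400, min = -74.05, max = -73.85),
  lat  = runif(400, min = 40.60, max = 40.85),
  lang = sample(langs, 400, replace = TRUE, prob = c(0.7, 0.15, 0.1, 0.05))
)

# English in grey and Spanish in blue follow the post; the other colours are arbitrary.
lang_cols <- c(en = "grey60", es = "blue", pt = "green", ja = "red")

top_map <- ggplot(twit_lang, aes(x = long, y = lat, colour = lang)) +
  # Low alpha lets dense areas build up brightness while sparse areas stay faint.
  geom_point(alpha = 0.1, size = 1.2) +
  scale_colour_manual(values = lang_cols) +
  # Draw the legend keys fully opaque so the colours remain readable.
  guides(colour = guide_legend(override.aes = list(alpha = 1))) +
  # Approximate map projection so the city is not stretched.
  coord_quickmap() +
  # Black, axis-free background (an assumed look, similar to the published maps).
  theme(
    panel.background = element_rect(fill = "black"),
    panel.grid = element_blank(),
    axis.text = element_blank(),
    axis.ticks = element_blank(),
    axis.title = element_blank()
  )

print(top_map)
```

With millions of points the low alpha value does most of the work: overlapping dots add up, so busy areas glow while isolated tweets remain barely visible.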

[Image: twitter_NY_lang_facet_sml, one small map per language for the top 24 languages excluding English]

Making the Maps

For those interested, the maps above were produced using the R software platform with the ggplot2 package. Both coped surprisingly well with plotting 8.5 million points (it took about 15 minutes on my two-year-old iMac) and the results are really great. Here is the code I used to produce the black and white map above:

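library(ggplot2)

# Two input data frames here. "lang_freqs" has the total frequency of each language and is ordered highest to lowest (this is used for the facet ordering). "twit_lang" is a data frame with each tweet's location (lat, long) and its language (lang), so it has 8.5 million rows.

# Create a new column lang1 in twit_lang, which is used to order the faceting. The right-hand side of this assignment is missing from the post, so factor() with these levels is an assumed reconstruction.
labs <- as.character(lang_freqs$lang)
twit_lang$lang1 <- factor(twit_lang$lang, levels = labs)

# Base plot. The definition of p is also missing from the post; an empty ggplot() is assumed.
p <- ggplot()

# One small, mostly transparent white dot per tweet.
p1 <- geom_point(data = twit_lang, aes(x = long, y = lat), colour = "white", alpha = 0.1, size = 1.2)

# "quiet" is not defined in the post; this black, axis-free theme is an assumed stand-in.
quiet <- theme(panel.background = element_rect(fill = "black"), panel.grid = element_blank(), axis.text = element_blank(), axis.ticks = element_blank(), axis.title = element_blank())

# opts(), theme_text() and theme_rect() are no longer part of ggplot2; theme(), element_text() and element_rect() replace them.
p + p1 + quiet + facet_wrap(~lang1, ncol = 4) + theme(strip.text.x = element_text(size = 8), strip.background = element_rect(colour = "white", fill = "white"))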
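The code above takes "lang_freqs" as given. A minimal sketch of how such a table might be built from "twit_lang" in base R is shown here; the toy data, the "freq" column name and the "en" code for English are assumptions for illustration, not details taken from the original analysis.

```r
# Toy stand-in for the real 8.5-million-row data frame (values are made up).
twit_lang <- data.frame(
  long = c(-73.98, -73.97, -73.96, -73.78, -73.99, -74.17, -73.96),
  lat  = c(40.75, 40.76, 40.58, 40.64, 40.85, 40.73, 40.58),
  lang = c("en", "es", "ru", "es", "es", "pt", "ru"),
  stringsAsFactors = FALSE
)

# Count tweets per language and sort from most to least frequent.
counts <- sort(table(twit_lang$lang), decreasing = TRUE)
lang_freqs <- data.frame(
  lang = names(counts),
  freq = as.integer(counts),  # "freq" is a hypothetical column name
  stringsAsFactors = FALSE
)

# Drop English (assumed to be coded "en") and keep at most the top 24 others.
lang_freqs <- head(lang_freqs[lang_freqs$lang != "en", ], 24)

# Keep only tweets in those languages, then order the factor levels by
# frequency so that facet_wrap() lays out the panels most popular first.
twit_lang <- twit_lang[twit_lang$lang %in% lang_freqs$lang, ]
twit_lang$lang1 <- factor(twit_lang$lang, levels = lang_freqs$lang)

print(lang_freqs)
print(levels(twit_lang$lang1))
```

Setting the factor levels explicitly matters because facet_wrap() orders its panels by factor level; left to its default, R would sort the languages alphabetically.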


8 Comments

  1. I wonder if the language detection differentiates between Hebrew and Yiddish. I know there are some Hasidic Twitter users who tweet at least partially in Yiddish.

    • Hebrew is a Semitic language, while Yiddish is a Germanic one. Language detection algorithms shouldn’t have any problems with that.

  2. I’d like to know your denominator — that is, to see the distribution of all tweets. This would help to know if some neighborhoods are black in your map because they are not multilingual or because they just don’t tweet very much at all.

    • Every tweet that we have collected gets a dot on this map. Areas with mostly English tweets (which are therefore not multilingual) will appear grey. Black areas have no tweets. HTH James

  3. Great work, congratulations.

    I’m wondering if you plan to share the dataset that was used for creating this map, even if only by request (I don’t even know if, according to Twitter’s terms of service, you’re allowed to keep a copy of the data).

    The only large public Twitter dataset (actually, a script for slowly downloading the messages) that I know of is the one used in the TREC microblog task, but only a small number of messages from that dataset contain geospatial coordinates.

    There’s a great dataset of georeferenced photos from Flickr, called the COntent-based Photo Image Retrieval (CoPhIR) dataset, and trying out Twitter data on tasks such as finding relations between gender or language usage (e.g., opinions) and other geospatial properties is indeed very interesting.

  4. Very interesting work. I wonder if transliterated languages are detected by Google’s translation tools. I know, for instance, that many South Asian language users (Hindi, Bengali, Urdu…) prefer to use the Latin alphabet when using technology.

  5. How hard would it be to do this for other cities and places? It could be incredibly interesting and even valuable in measuring diversity.

