Welcome to the WikiWorld!
The world created by Wiki users

Introduction

I am sure you have already heard of Wikipedia, right? Every high-school student has used it to cook up the best presentation for their history class! With more than 6 million English articles, Wikipedia can be seen as the world's largest and most-read reference work in history. It might be easy to get lost in this mass of information. However, the platform offers a quick and easy way to navigate between the millions of articles through hyperlinks connecting different articles. For example, these blue links are super useful to go from Christopher Columbus’ Wikipedia page to that of the Americas, but that is not all! Indeed, these hyperlinks can give real insight into the cultural connections that exist between diverse concepts, such as historical figures and geographical locations. As a game, Wikispeedia provides even more information in that direction!


Wikispeedia who?

Wikispeedia is a game developed by the EPFL Data Science Lab and based on Wikipedia articles. The rules are simple: you are given two Wikipedia articles and, starting from the first one, your goal is to reach the second exclusively by following hyperlinks in the articles you encounter.
The article paths chosen by the players are recorded, and together they make up the Wikispeedia dataset.

Interested in playing a game of Wikispeedia? Click the yellow link!


Why is it useful?

The Wikispeedia dataset provides the link paths followed by players attempting to connect two different concepts. But why is it useful, and how can we use it?
In fact, these connection paths between articles reveal patterns in the players’ notions of these concepts. For example, in people's minds Christopher Columbus is known for discovering the Americas, not for music or the arts. Thus, players will most likely click on the link pointing to the Wikipedia article on the Americas.
The interesting part is that from these connection paths we can build a connectivity graph. This graph allows us to construct a representation of Wikispeedia users’ reality, described through the centrality of countries, people and other concepts in this graph!
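As a rough illustration of how such a connectivity graph could be assembled, here is a minimal Python sketch. It assumes each game is already available as a list of article titles in click order; the function name `build_click_graph` and the example paths are hypothetical and are not taken from our pipeline.

```python
from collections import Counter

def build_click_graph(paths):
    """Count how often players moved from one article to the next.

    Returns a Counter mapping (source, target) pairs to click counts,
    which can be read as a weighted directed graph.
    """
    edge_counts = Counter()
    for path in paths:
        # Each consecutive pair of articles in a path is one directed click.
        for source, target in zip(path, path[1:]):
            edge_counts[(source, target)] += 1
    return edge_counts

# Hypothetical paths, for illustration only.
paths = [
    ["Christopher_Columbus", "Americas", "United_States"],
    ["Christopher_Columbus", "Americas", "Mexico"],
    ["Spain", "Christopher_Columbus", "Americas"],
]
graph = build_click_graph(paths)
```

In this toy example, the edge from Christopher_Columbus to Americas receives the largest weight because all three players clicked it.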
In the end, a new and reduced vision of the world can be built using data from the players’ games. This website is dedicated to constructing this summarized vision of the world: WikiWorld! We also examine how it differs from our real world: does this representation of Wikispeedia users' reality correspond to our own?


Ready to build the WikiWorld?

Countries

Which countries will constitute the world shaped by Wiki users?

When we think about a whole new world, we usually start by thinking about land, continents and countries. So a good starting point is to determine what the WikiCountries are. Of the existing countries that make up our planet Earth, which ones are the most important according to Wiki users? By using the connectivity graph mentioned earlier, we can compute a centrality score for each country. The countries with the highest centrality are kept as the constituent countries of our WikiWorld.
Are you curious to know if your favorite country made it into the WikiWorld? Take a look at the map displaying the centrality scores below!
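To make the idea of a centrality score concrete, here is a minimal Python sketch. This datastory does not specify which centrality measure is used, so the sketch swaps in a simple stand-in: the share of all clicks that arrive at an article (weighted in-degree). The function names, the edge counts and the cut-off `k` are hypothetical.

```python
from collections import defaultdict

def in_strength_centrality(edge_counts):
    """Share of all clicks that arrive at each article (a simple stand-in
    for the centrality measure, which the text does not specify)."""
    total = sum(edge_counts.values())
    scores = defaultdict(float)
    for (_, target), count in edge_counts.items():
        scores[target] += count / total
    return dict(scores)

def top_countries(scores, countries, k):
    """Keep the k countries with the highest score."""
    scored = [c for c in countries if c in scores]
    return sorted(scored, key=lambda c: scores[c], reverse=True)[:k]

# Hypothetical click counts, for illustration only.
edge_counts = {
    ("Christopher_Columbus", "Americas"): 3,
    ("Americas", "United_States"): 2,
    ("Americas", "Mexico"): 1,
    ("Spain", "United_States"): 2,
}
scores = in_strength_centrality(edge_counts)
wiki_countries = top_countries(scores, {"United_States", "Mexico", "Spain"}, k=2)
```

A measure such as PageRank or betweenness would also take into account where the clicks come from; the in-degree share is used here only because it is the easiest to read.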



As you can see on the map, the United States displays the highest centrality score, closely followed by the United Kingdom. This makes sense: both are populated by English speakers and the subset of Wikipedia we used comes from the English Wikipedia, so its English-speaking users are likely to know more about these countries. Other countries with a high centrality are Norway, Germany and Mexico.


As all countries belong to a continent, we can also look at the centrality of entire continents, using the centrality of their countries. The bar plot below gives the score for each continent. Here, the centrality score of a continent is the mean of the scores of its countries. Unsurprisingly, North America has the highest centrality. It is followed by Europe, but at quite a distance…
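The continent score described above (the mean of the scores of its countries) can be sketched in a few lines of Python. The function name and the placeholder countries, continents and scores are hypothetical and only illustrate the averaging step.

```python
from collections import defaultdict
from statistics import mean

def continent_centrality(country_scores, country_to_continent):
    """Average the centrality scores of the countries in each continent."""
    grouped = defaultdict(list)
    for country, score in country_scores.items():
        grouped[country_to_continent[country]].append(score)
    return {continent: mean(values) for continent, values in grouped.items()}

# Placeholder values, for illustration only (not results from the dataset).
country_scores = {"Country_A": 4.0, "Country_B": 2.0, "Country_C": 1.5}
country_to_continent = {
    "Country_A": "Continent_X",
    "Country_B": "Continent_X",
    "Country_C": "Continent_Y",
}
continent_scores = continent_centrality(country_scores, country_to_continent)
```

Note that an unweighted mean gives every country the same weight, regardless of its size or population.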




Now that we have the centrality information for both countries and continents, it would be nice to see how these centralities relate to the population size of each country. Indeed, one might naively think that the more populated a country is, the more influence it has on the world, and the higher its centrality will be. But is this really the case?
The plot below gathers all this information to give us better insight. Each country is represented by a box whose size is proportional to the size of its population. The color of the box depends on the centrality of the country and follows the same color scale as the one used previously for the map.
You can play with this plot by clicking on the boxes, to see which country is associated with which continent, its population size and centrality. Have fun :)



Well… It seems that our first intuition, that the centrality of a country depends on the size of its population, is not entirely true. Indeed, China has the largest population, with more than 1.3 billion citizens, and yet its centrality score of 2.02 is pretty average. On the other hand, the WikiCountries selected previously, such as the US, the UK and Norway, showed the highest centralities but clearly don’t have the biggest populations.
Thus, it seems that the secret to a high centrality lies elsewhere… Maybe the centrality of a country is determined more by its role on the socioeconomic scene. For example, the United Kingdom has a population of only 60 million citizens, and yet has the second highest centrality! But if you look at the economy of the U.K., you will notice that it is the fifth largest exporter and importer, as well as one of the most globalized countries.
Note that the data were collected before Brexit.

Fortunately (or unfortunately?), as we just saw, globalization holds a huge place in our world, and countries interact with each other through people, companies, and governments worldwide. As a result, the WikiCountries must also be connected to one another in some way. By counting the number of times Wikispeedia players link two countries, one can easily measure the strength of their connection.
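One way to sketch this count in Python is shown below. It rests on an assumption the text does not settle: two countries are treated as "linked" whenever both appear in the same game path. The function name and the example paths are hypothetical.

```python
from collections import Counter
from itertools import combinations

def country_pair_counts(paths, countries):
    """Count, for each pair of countries, the number of paths visiting both."""
    counts = Counter()
    for path in paths:
        # Countries visited in this game, each counted once and sorted so
        # that (A, B) and (B, A) map to the same pair.
        visited = sorted(set(path) & countries)
        for pair in combinations(visited, 2):
            counts[pair] += 1
    return counts

# Hypothetical paths, for illustration only.
countries = {"France", "Germany", "United_States", "Mexico"}
paths = [
    ["France", "Europe", "Germany"],
    ["Germany", "Beer", "France", "United_States"],
    ["Mexico", "Americas"],
]
pair_counts = country_pair_counts(paths, countries)
```

Counting only direct clicks from one country article to another would be a stricter alternative; co-occurrence within a path is used here because it also captures indirect associations.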
So let’s get a sneak peek at these connections for the 15 most central countries of the WikiWorld! In the following figure, each node is a country and the node size corresponds to its overall importance. With the United States as the central hub in this graph, it seems that in our WikiWorld, the saying would rather be "all roads lead to Washington"...




Amongst the 15 most represented countries of the WikiWorld, we recognize some that hold a predominant position on the international scene. The Big Five, i.e. the permanent members of the United Nations Security Council (the United States, the United Kingdom, Russia, France and China), are all present in this graph.
Furthermore, from an economic point of view, 12 of these countries are among the 15 countries with the largest GDP, according to the most recent data available from the World Bank as of December 2022.
In addition, 6 of the 7 countries of the G7 are among the 15 most central countries in the WikiWorld. The Group of Seven (G7) is an intergovernmental political forum whose members are among the world's largest advanced economies and maintain close political, economic, diplomatic, and military relations with one another.
The fact that the predominant countries on the international scene are also leading countries in the WikiWorld underlines the ability of Wikipedia to capture the political and economic organization of our world.

To sum up, it seems that, as in the real world, the importance of countries in the WikiWorld depends neither on their size nor on their population, but rather on their political and economic place in our current world. Also, don’t forget that the articles used for our study are extracted from the English Wikipedia, so the centralities of the countries are biased toward English-speaking countries.

Let’s zoom in!

What words are associated with each WikiCountry?

We now have our WikiCountries, but so far they seem quite impersonal, don’t you think? Let's dive a little deeper into these countries and see what defines their identity within the WikiWorld. In other words, how can we describe the WikiCountries?
To do so, one can derive a semantic distance (introduced here) that represents how close two concepts are in terms of meaning. So let's see which words users think of when they look for the most central countries in the game.
Click on the flags of the countries to have a look at the associated concepts! The larger the word, the more closely it is related to the corresponding country. Browse the Wikipedia articles to find the relation between the words and their country!


United States

United Kingdom

Norway

Germany

Mexico

Switzerland



If you have looked carefully at the words associated with the different countries, you may have made some observations. Although almost all of these words can be rationally explained, do you really think that they reflect the vision you have of these countries? For most of them, you probably would not have picked the same words. In fact, maybe you did not even know why some words were associated with some countries. So does Wikipedia really capture the identity of the countries? Or do you simply have a false perception of them?

People

Who are the 100 most important people according to Wiki users?

Now that the countries of the WikiWorld are established, who will be living in this world? It is time to populate the WikiWorld! Here, we selected the 100 most important people according to Wiki users as a representative sample of the proud inhabitants of the WikiWorld.

What are their occupations, ages, genders? We collected information on these 100 people in order to give you an overview of the population of the WikiWorld. You can also take a sneak peek at the three most famous WikiPeople according to Wiki users!


Age

How old are they?

Continents

Where do they come from?

Countries

What is their nationality?

Gender

What is their gender?

Language

What do they speak?

Occupation

What are their professions?

Religions

What are their religions?

TOP 3

Who are the three most famous?



Let’s take a step back from the WikiWorld and take a critical look at the results we just obtained. While the geopolitical organization of the WikiWorld seems to reflect that of our real world, this is far from being the case for its inhabitants. There is a clear lack of parity and of diversity in general. Compared with the world population, for example, the most represented countries and continents in terms of inhabitants in the WikiWorld (i.e. Europe, the United States and the United Kingdom) are far from being the most populated ones in the real world. On the other hand, China and India, the two largest countries in terms of population, are not represented. Similar comments can be made on the other characteristics we observed: in our real world, women and men are present in roughly equal numbers, the population is rather young, English is not the native language of the majority, and occupations are more diverse. All these observations lead to the same conclusion: the 100 most famous people on Wikipedia, who were taken as representatives of our WikiWorld, are far from being a reliable representation of the world population.

Such an observation could have been anticipated, since the nationality and language of the most famous people, for instance, are highly biased by the fact that we are working on the English version of Wikipedia. Still, this lack of diversity raises questions. What does it say about our current society? Why do the most famous people not reflect the true population of the world? Maybe this is why they are famous, because they are out of the norm… Or maybe some kind of bias is at play here too. Either way, this raises the question of how this lack of diversity among the most famous people can influence the population that sees them as role models. This debate is outside the scope of this datastory, but it underlines the importance of such results and data.

Conclusion



By enabling us to observe cognitive and semantic links between the players’ notions of two concepts, the Wikispeedia dataset allows us to construct a representation of Wikispeedia users’ reality. The latter, called the WikiWorld in this datastory, exhibits both striking similarities and fundamental differences with our real world. As far as the countries and the global geopolitical organization are concerned, the portrayal is fairly faithful to the real geopolitical organization of the world. The most influential countries are also the most central in the Wikispeedia graph, and one can clearly extract useful insights into our real world from the Wikispeedia users’ games. But this is in fact the only aspect faithfully reflected by the representation of the Wikispeedia users’ reality we constructed. Indeed, the images of the countries given by their semantically close concepts are quite surprising and do not always capture the most important information on these countries. Similarly, when we consider the different people described in the Wikipedia subset we used (and in particular the 100 most famous ones), the observations are quite different from the real world population. From their nationality to their age, gender or religion, there is an obvious lack of diversity, and the people described in Wikipedia are far from conveying a correct representation of the diversity of our current world.

These differences raise many questions: What can this wrong (or at least unfaithful) representation of the world tell us about our current society? Since Wikipedia is nowadays the world’s largest and most-read reference work in history, what consequences can this incorrect picture have on the real world? If our current world is supposed to shape Wikipedia, doesn't a world falsely described in Wikipedia also influence the development of our real world? Let’s illustrate with an example. The younger generations often use Wikipedia at school as a way to learn and develop their knowledge of the world. But if the Wikipedia articles don’t reflect parity, and if famous women of history are too often forgotten, the younger generation might also grow up unaware of these brilliant women. For example, Ada Lovelace is a pioneer of computer science and yet too many university students in STEM don’t even know her name…

Thus, it is fundamental that Wikipedia keeps growing toward improving knowledge about the diversity of our world, population and history, and educates the youngest generations in this direction (and also the older ones ;)).

As a last word, one should always keep in mind that the analyzed data come from the English Wikipedia pages, which undoubtedly biases our results and contributes to the lack of diversity we observe. It would thus be necessary to perform the same analysis with Wikipedia pages in other languages to confirm or refute our results.

Thank you very much for reading our datastory! We hope you enjoyed it as much as we enjoyed creating the WikiWorld. If you are curious about the programming pipeline that led us here, you are very welcome to take a look at our GitHub repository.

Our Amazing Team



Flore Barde

Master Life Sciences

Manon Béchaz

Master Computational Sciences

Zoé Jeandupeux

Master Physics

Killian Rigaux

Master Physics

This website and datastory were produced as part of the EPFL course Applied Data Analysis, given by Prof. Robert West. We would like to thank him, as well as all his teaching assistants, and in particular Akhil Arora, for their help and supervision during this project and course.