Wikipedia:Importing a database

Wikipedia content should be sourced to reliable, independent sources, and each article should include some indication, drawn from such sources, that its subject is notable. Databases can be one example of such a source. It can therefore be tempting to go through the entirety of a database, or a substantial part of it, creating an article corresponding to every single entry in the database. However, this can result in a number of issues:

  • Not every entry in the database (and possibly not any entry) will actually be notable. For example, a database of athletes may contain both notable and non-notable athletes. Creating an article for each entry in a database may therefore create a large problem for other editors, who will then have to check every single article to see which are and are not notable. The time taken to do this will be at least an order of magnitude more than the time taken to create the original articles, resulting in a net-negative impact on Wikipedia as a project. This is disruptive editing (see WP:DISRUPTIVE).
  • The database may include systematic errors or inaccuracies which are then transferred into Wikipedia. Even if it does not, the terminology of the external database may not be a like-for-like match for that used on Wikipedia. For example, the GNIS database lists many places as "populated places" that are not, and have never been, populated settlements in the sense that we use the term on Wikipedia.
  • Certain intellectual property rights can reside in the content, formatting, selection, etc. of a database, and these can be infringed by importing it into Wikipedia. Additionally, some databases require their users to agree to contractual terms that are breached by essentially making Wikipedia a substitute for that database. This can result in potential liability for Wikipedia or for the editor who does the import.
  • The articles will essentially be single-source database entries. Wikipedia is not a database.

What is "importing"?

Importing means copying the content of a database into Wikipedia, regardless of whether it is done by hand or using a bot, and regardless of whether the content is copied word-for-word, rewritten as prose, or reformatted.

What is a "substantial part"?

Creating Wikipedia articles corresponding to individual entries rarely creates a problem so long as they are not actually copied. It is the systematic creation of many articles (typically hundreds or more), such that a large part of the information in an external database has been transferred to Wikipedia, that is problematic. This is particularly the case for commercial databases whose revenue will be affected when people no longer need to pay for their content.

What to do?

  • Seek consensus first – Before starting, ask a relevant WikiProject or WP:RSN what they think about creating articles based on the database, so that they can vet the source and suggest additional or alternative sources.
  • Read the small print – Does the database have terms of use that say you shouldn't use it to create a substitute for it? Then exercise caution when taking data from it.
  • Exercise judgement – Do not simply create an article for every entry in an external database. Every article you create should be able to pass Wikipedia's inclusion criteria, and this should not simply be presumed from the subject's being listed in the database. Do not engage in bot-like editing.
  • Use a variety of reliable sources – Do not simply use the same single source for hundreds of articles. This can result in these articles having the same point of failure if a problem arises with that source or if Wikipedia's inclusion criteria change. If the subject of your article is truly notable then more than one reliable source will have given it significant coverage – find those sources and use them. Do not simply expect other editors to find them for you.
  • Write encyclopaedia articles, not database entries – An encyclopaedia article will tell you something about the subject (e.g., its history) beyond simple statistical data and pro forma text with a few of the words changed.
  • Future-proof your articles – Large numbers of articles that only barely pass Wikipedia's inclusion criteria as they presently stand may all be deleted at a future date if those criteria become stricter, as they have tended to do over the history of the project.