File:Zipf-euro-3 Spanish (Don Quixote) and Portuguese (Dom Casmurro).svg

This is a file from the Wikimedia Commons

Original file (SVG file, nominally 512 × 504 pixels, file size: 1.34 MB)

Summary

Description
English: Zipf's law plot (word frequency as a function of frequency rank) for texts in two languages, Spanish and Portuguese.
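As a rough illustration of what is being plotted, the following Python sketch turns a list of word tokens into (rank, word, relative frequency) triples. It is not the author's tooling: the helper name `rank_frequency` is hypothetical, "frequency" is assumed to mean a word's count divided by the total number of tokens, and words with equal counts are ordered alphabetically, which is an arbitrary tie-break.

```python
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, word, relative frequency) triples, most frequent first.

    Hypothetical helper: ranks start at 1, relative frequency is
    count / total tokens, and ties in count are broken alphabetically.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    # Sort by descending count, then alphabetically for equal counts.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, count / total)
            for rank, (word, count) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    # Opening words of the Spanish sample quoted on this page.
    sample = ("en vn lugar de la mancha de cuyo nombre no quiero "
              "acordarme no ha mucho").split()
    for rank, word, freq in rank_frequency(sample)[:3]:
        print(rank, word, round(freq, 4))
```

Plotting the relative frequency against the rank on log-log axes gives a curve of the kind shown in this file; where Zipf's law holds well, it is roughly a straight line with slope near -1.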

The languages, texts and the frequency files are:

  • Spanish. Text of Miguel de Cervantes's novel Don Quixote, Part I (1605), in the original spelling of the early 1600s, including variable use of 'v', 'u', and 'b' for the same sounds (e.g. 'vn' for modern 'un'). Mapped to lowercase, excluding foreign-language insertions and poems. Sample: en vn lugar de la mancha de cuyo nombre no quiero acordarme no ha mucho [...] pariente suyo fuera de que. File span/qvi/one.1/gud.wfr (original 177061 words, truncated/filtered to 35027 words, N = 5452 distinct words).
  • Portuguese. Text of the novel Dom Casmurro by Machado de Assis (1899). The spelling was updated to Brazilian usage as of about 2000, including the umlaut on 'u' after 'q', the accent in '-éia' endings, the differential accents in 'tem'/'têm', etc. Mapped to lowercase, with numerals excluded. Whole text. Sample: uma noite destas vindo da cidade para o engenho novo encontrei no trem [...] josé dias gostaram do moço o agregado disse~lhe que vira uma vez. File port/csm/tot.1/gud.wfr (original 64602 words, truncated/filtered to 35027 words, N = 6267 distinct words).
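The lowercasing and numeral exclusion mentioned above could be approximated as follows. This is a hypothetical sketch (`normalize` is an illustrative name, not something from the published files): it treats only tokens made entirely of digits as numerals, and it does not attempt the exclusion of poems and foreign-language passages, which presumably relied on the annotations in the original texts.

```python
def normalize(tokens, drop_numerals=True):
    """Lowercase tokens and optionally drop purely numeric ones.

    Hypothetical helper that only approximates the preprocessing described
    on this page; a "numeral" is assumed to be a token of digits only.
    """
    result = []
    for token in tokens:
        if drop_numerals and token.isdigit():
            continue  # skip tokens such as chapter numbers or years
        result.append(token.lower())  # str.lower keeps accents: "José" -> "josé"
    return result
```

For example, `normalize(["José", "Dias", "1899", "Uma"])` returns `["josé", "dias", "uma"]`.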
The word frequency files */*/*/gud.wfr are available at the UNICAMP website. The original annotated full texts, before truncation/filtering, are in the companion files */*/org/main.src. The truncated/filtered texts (one word per line, without punctuation) are in */*/*/gud.tlw.
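A file in the one-word-per-line gud.tlw format can be loaded, and its total and distinct word counts obtained, with a few lines of Python. This is a sketch under stated assumptions: `load_words` is a hypothetical name, the files are assumed to be UTF-8 encoded, blank lines are ignored, and the optional `max_words` cut assumes truncation keeps the first words in text order, which this page does not specify.

```python
from pathlib import Path


def load_words(path, max_words=None):
    """Read a one-word-per-line text file (like the gud.tlw files).

    Assumptions: UTF-8 encoding, blank lines ignored, and if `max_words`
    is given only the first `max_words` words are kept.
    Returns the word list and the number of distinct words.
    """
    words = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            word = line.strip()
            if not word:
                continue  # ignore empty lines
            if max_words is not None and len(words) >= max_words:
                break  # requested sample size already reached
            words.append(word)
    return words, len(set(words))
```

Called on a local copy of one of the gud.tlw files, `words, n_distinct = load_words(path)` would give the token list and its number of distinct words, which should be comparable to the N values quoted above if the assumptions hold.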
Date 9 May 2023
Source Own work
Author Jorge Stolfi

Licensing

I, the copyright holder of this work, hereby publish it under the following license:
This file is licensed under the Creative Commons Attribution-Share Alike 4.0 International license.
You are free:
  • to share – to copy, distribute and transmit the work
  • to remix – to adapt the work
Under the following conditions:
  • attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
  • share alike – If you remix, transform, or build upon the material, you must distribute your contributions under the same or compatible license as the original.

Captions

Zipf's law plot for Spanish (Don Quixote) and Portuguese (Dom Casmurro)

Items portrayed in this file


9 May 2023

image/svg+xml

File history


Date/Time | Dimensions | User | Comment
19:50, 15 May 2023 (current) | 512 × 504 (1.34 MB) | Jorge Stolfi | Rebuilt the file with small changes in dataset, colors
22:32, 9 May 2023 | 512 × 504 (1.34 MB) | Jorge Stolfi | Reduced Spanish text sample size from ~70'000 to ~35'000 words
14:45, 9 May 2023 | 512 × 504 (1.15 MB) | Jorge Stolfi | Uploaded own work with UploadWizard
