Talk:Word embedding

Untitled

Re. "Most new word embedding techniques rely on a neural network architecture instead of more traditional n-gram models and unsupervised learning"--Can someone rewrite this to indicate whether it is "rely on (a neural network architecture) instead of (more traditional n-gram models and unsupervised learning)" or "rely on (a neural network architecture (instead of more traditional n-gram models) and unsupervised learning)"? Philgoetz (talk) 21:56, 21 February 2018 (UTC)[reply]


I am considering adding a section introducing basic approaches to word embedding. --Linzhuoli (talk) 15:29, 20 May 2016 (UTC)

I think this article would benefit from greater description of the different word embedding tools that exist and how they differ from each other. Perhaps it could be restructured such that word2vec has a subsection, GloVe has a subsection, etc. Akozlowski (talk) 22:20, 9 June 2016 (UTC)

I am thinking of adding more content to the thought vector section and adding some images to illustrate word embedding. Chinoyhardik (talk) 20:14, 5 October 2017 (UTC)


Wiki Education Foundation-supported course assignment

This article is or was the subject of a Wiki Education Foundation-supported course assignment. Further details are available on the course page. Student editor(s): Linzhuoli. Peer reviewers: Akozlowski.

Above undated message substituted from Template:Dashboard.wikiedu.org assignment by PrimeBOT (talk) 05:04, 18 January 2022 (UTC)

Intro written at too high a level

The intro is written at too high a level and is not suitable for a general encyclopedia; if you are using the word "vector", you are not writing correctly for a general audience. Thank you. — Preceding unsigned comment added by 194.144.243.122 (talk) 12:51, 26 June 2019 (UTC)

Very unclear

This is not clear to someone with a basic understanding of neural networks and of vectors. It is likely to be comprehensible only to people who already understand the concept. There are no links to more basic concepts that would lead up to it. Consider explaining:

  • You refer to a multidimensional space. What are the dimensions of this space? What does it mean to reduce the dimensions?
  • In some sense this method characterises words as being likely to be associated. How does this compare to something like a Markov chain?
  • Having established that some words are associated, how does this assist with natural language processing?

Scottwh (talk) 20:57, 2 July 2020 (UTC)
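
For what it is worth, here is a minimal, entirely made-up sketch (not taken from the article or from any particular library) of what the "dimensions" are and what reducing them means: every word is mapped to a vector of d real numbers, the d coordinates are the dimensions of the space, cosine similarity between two vectors is the usual measure of "association", and dimensionality reduction projects the vectors into a smaller space. The five-word vocabulary and the random vectors below are placeholders, not trained embeddings.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 5-word vocabulary, with d = 8 coordinates ("dimensions") per word.
vocab = ["king", "queen", "apple", "orange", "car"]
d = 8
embeddings = {w: rng.normal(size=d) for w in vocab}  # placeholders for trained vectors

def cosine(u, v):
    # Cosine similarity: the usual measure of how "associated" two word vectors are.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(embeddings["king"], embeddings["queen"]))

# "Reducing the dimensions": centre the vectors and project them onto their top
# two principal components, so each word keeps only 2 coordinates (e.g. for plotting).
X = np.stack([embeddings[w] for w in vocab])
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
X2 = Xc @ Vt[:2].T
print(dict(zip(vocab, [tuple(row.round(2)) for row in X2])))

In a real system the vectors come from training on a corpus rather than from a random number generator, but the geometry (cosine similarity as association, projection as dimensionality reduction) is the same.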

Vandalism by MrOllie

Unfortunately, my comment on the talk page of MrOllie was instantly reverted, so I am posting it here again:

I do not like your revert of https://en.m.wikipedia.org/w/index.php?title=Special:MobileDiff/1044853979&type=revision


It drives away domain experts who edit Wikipedia anonymously, as the edit would certainly not have been reverted had it not been made anonymously; it was an improvement to Wikipedia. I think reverting should be done with caution, and the article should always be improved rather than made worse just to demonstrate personal power.


Unfortunately, I feel that I cannot disclose my name, as the enforcement of the friendly space policy seems more difficult in practice than in theory. 89.204.137.241 (talk)

Ethical Implications Section Encourages Bias

Embeddings are statistical properties, or mathematical representations, of words or sentences in the data set. As such, they are as "biased" as the data set, no more, no less.

The Ethical Implications section implies that there is some external criterion by which to evaluate the validity of these statistical properties. Any such judgment implies knowledge of a moral framework superior to the one used by those writing in the data set. This gets into the realm of morals, subjective values, and religion, which does not belong here.

The only valid ethical criticism can be that the data set used for training represents the writings of only a narrow group of writers.

Hence, the Ethical Implications section supports enabling someone's biased, egocentric judgment of others' writing in the data used for learning embeddings, and it does not belong here. This is not an article for expressing moral superiority. This is statistics. DynamicBoldC (talk) 20:08, 1 April 2023 (UTC)
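
Purely to make the technical point concrete (and without taking a side on whether the section belongs in the article), here is a toy, entirely made-up sketch of a classical count-based embedding: word-word co-occurrence counts from a four-sentence corpus are factorised with SVD to obtain two-dimensional word vectors. The corpus, the rank, and the word pairs queried below are arbitrary choices for illustration; whatever regularities or skews the sentences happen to contain are exactly what the resulting similarities encode.

import numpy as np
from itertools import combinations

# Tiny made-up corpus; the point is only that the vectors are a function of it.
corpus = [
    "the nurse said she was tired".split(),
    "the engineer said he was tired".split(),
    "the nurse helped the patient".split(),
    "the engineer built the bridge".split(),
]

vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Symmetric word-word co-occurrence counts within each sentence.
C = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for a, b in combinations(sent, 2):
        C[idx[a], idx[b]] += 1
        C[idx[b], idx[a]] += 1

# Count-based embeddings: low-rank factorisation (SVD) of the log co-occurrence matrix.
U, S, _ = np.linalg.svd(np.log1p(C))
E = U[:, :2] * S[:2]  # 2-dimensional word vectors

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

# Change the sentences above and these similarities change with them.
print(cosine(E[idx["nurse"]], E[idx["she"]]))
print(cosine(E[idx["engineer"]], E[idx["he"]]))

Neural methods such as word2vec arrive at their vectors by a different route, but they too are fit to co-occurrence statistics of the training corpus.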

Wiki Education assignment: Linguistics in the Digital Age

This article was the subject of a Wiki Education Foundation-supported course assignment, between 21 August 2023 and 11 December 2023. Further details are available on the course page. Student editor(s): Konman987, Afonseca12 (article contribs).

— Assignment last updated by Fedfed2 (talk) 00:53, 9 December 2023 (UTC)