
The computation of structure features is based on an article’s wiki markup. For a description of the wiki markup syntax and the general article layout, refer to the respective MediaWiki help page.6 This section describes the structure features listed in Table 4.3.

6MediaWiki help page “Help:Wiki markup”:

Table B.1: Words and phrases used to compute the two closed-class word set features Peacock word rate and Weasel word rate.

Feature | Words / phrases

Peacock word rate | acclaimed, amazing, astonishing, authoritative, beautiful, best, brilliant, canonical, celebrated, charismatic, classic, cutting-edge, defining, definitive, eminent, enigma, exciting, extraordinary, fabulous, famous, infamous, fantastic, fully, genius, global, great, greatest, iconic, immensely, impactful, incendiary, indisputable, influential, innovative, inspired, intriguing, leader, leading, legendary, major, masterly, mature, memorable, notable, outstanding, pioneer, popular, prestigious, really good, remarkable, renowned, respected, seminal, significant, skillful, solution, single-handedly, staunch, talented, the most, top, transcendent, undoubtedly, unique, visionary, virtually, virtuoso, well-known, well-established, world-class, worst

Weasel word rate | about, adequate, and/or, appropriate, approximately, are a number, as applicable, as circumstances dictate, as much as possible, as needed, as required, as soon as possible, at your earliest convenience, basically, clearly, completely, critics say, depending on, exceedingly, excellent, experts declare, extremely, fairly, few, frequently, good, huge, if appropriate, if required, if warranted, is a number, in a timely manner, in general, in most cases, in our opinion, in some cases, in most instances, indicated, interestingly, it is believed, it is often reported, it is our understanding, it is widely thought, it may, it was proven, largely, major, make an effort to, many, many are of the opinion, many people think, maybe, more or less, most feel, mostly, normally, often, on occasion, perhaps, primary, quite, relatively, relevant, remarkably, research has shown, roughly, science says, significantly, several, should be, some people say, sometimes, striving for, substantially, suitable, surprisingly, tentatively, tiny, try, typically, usually, valid, various, vast, very, we intend to, when necessary, when possible
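As an illustration of how a closed-class word set feature could be computed from the lists in Table B.1, the following minimal Python sketch counts whole-word and whole-phrase matches and divides by the total word count. The exact definition of the rate is not given in this section, so the normalization by word count, the tokenization, and the function name `word_set_rate` are assumptions; the two lists are short excerpts of Table B.1.

```python
import re

# Short excerpts of the Table B.1 lists, for illustration only.
PEACOCK = ["acclaimed", "legendary", "really good", "world-class"]
WEASEL = ["approximately", "critics say", "it is believed", "very"]


def word_set_rate(text, phrases):
    """Matches of listed words/phrases divided by the word count (assumed definition)."""
    # Assumed tokenization: keep apostrophes, slashes, and hyphens inside words
    # so that entries such as "and/or" or "world-class" stay intact.
    words = re.findall(r"[\w'/-]+", text.lower())
    if not words:
        return 0.0
    normalized = " ".join(words)
    hits = 0
    for phrase in phrases:
        # Match only at word boundaries, so "very" does not match inside "every".
        pattern = r"(?<![\w-])" + re.escape(phrase) + r"(?![\w-])"
        hits += len(re.findall(pattern, normalized))
    return hits / len(words)
```

Multi-word phrases count as a single match here; whether the original feature counts them per phrase or per word is not stated in this section.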

• File count. Number of files including images and other media files, identified by file links: [[file:...]].

• Category count. Number of Wikipedia categories an article belongs to, identified by category links: [[category:...]].

• Heading count. Total number of headings, including section, subsection, and subsubsection headings.

• Image count. Number of images, identified by image links: [[image:...]].

• Images per section. Ratio between the image count and the section count.

• Infobox count. Number of infoboxes. Infoboxes are fixed-format tables used to summarize relevant information in a unified and structured manner (typically in the top right-hand corner of an article).

• Lead length. Number of words in the lead section. (Dalip et al. [46] use the character count instead of word count.) A lead section is defined as the text before the first heading.

• Lead rate. Percentage of words in the lead section.

• List ratio. Percentage of words in lists. A list can be either an itemization, an enumeration, or a definition; identified by lines starting with an asterisk (*), a number sign (#), or a semicolon (;), respectively.

• Reference count. Number of references and citations used in an article, identified by the tags: <ref>...</ref>.

• Reference sections count. Number of reference sections, identified by the section heading. We use the headings listed in Table B.2.

• References per section. Ratio between the reference count and the section count.

• Reference per text length. Ratio between the reference count and the word count. (Dalip et al. [46] use the character count instead of the word count.)

• Section count, subsection count, subsubsection count. Number of sections, subsections, and subsubsections.

• Section length, subsection length, subsubsection length. Average section, subsection, and subsubsection length in words. (Dalip et al. [46] use the character count to compute the mean section size.)

• Section length deviation. Standard deviation of section length.

• Section nesting, subsection nesting. Average number of subsections per section and average number of subsubsections per subsection.

• Shortest section length, shortest subsection length, shortest subsubsection length. Number of words in the shortest section, subsection, and subsubsection.

• Longest section length, longest subsection length, longest subsubsection length. Number of words in the longest section, subsection, and subsubsection.

• Table count. Number of tables, identified by: {|...|}.

• Template count. Number of templates that are used in an article.

• Trivia sections count. Number of trivia sections, identified by the section heading. We use the following headings: “facts”, “miscellanea”, “other facts”, “other information”, and “trivia”.
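Several of the features above reduce to pattern matching on the raw wiki markup. The following Python sketch shows one possible way to compute a handful of them. It is not the implementation used for the experiments: the regular expressions, the helper names `link_count` and `structure_features`, and the decision to count words on the unprocessed markup are simplifying assumptions. It relies on the MediaWiki convention that a section heading is written as == Heading ==, with one additional equals sign per nesting level.

```python
import re

# Assumed patterns; real wiki markup has more edge cases than these cover.
HEADING = re.compile(r"^(={2,})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)
REF = re.compile(r"<ref(?:\s[^>]*)?(?<!/)>.*?</ref>", re.IGNORECASE | re.DOTALL)
WORD = re.compile(r"\w+")


def link_count(markup, prefix):
    """Count links of the form [[prefix:...]], ignoring case."""
    pattern = r"\[\[\s*" + re.escape(prefix) + r"\s*:"
    return len(re.findall(pattern, markup, re.IGNORECASE))


def structure_features(markup):
    """Compute a few structure features from raw wiki markup (hypothetical helper)."""
    levels = [len(marks) for marks, _title in HEADING.findall(markup)]
    first_heading = HEADING.search(markup)
    # The lead section is the text before the first heading.
    lead = markup[:first_heading.start()] if first_heading else markup

    # Simplification: words are counted on the markup itself, tags included.
    total_words = len(WORD.findall(markup))
    list_words = sum(
        len(WORD.findall(line))
        for line in markup.splitlines()
        if line.startswith(("*", "#", ";"))
    )
    sections = levels.count(2)  # "== Heading ==" marks a top-level section
    images = link_count(markup, "image")
    refs = len(REF.findall(markup))  # self-closing <ref ... /> tags are skipped

    return {
        "file_count": link_count(markup, "file"),
        "category_count": link_count(markup, "category"),
        "image_count": images,
        "heading_count": len(levels),
        "section_count": sections,
        "subsection_count": levels.count(3),
        "subsubsection_count": levels.count(4),
        "lead_length": len(WORD.findall(lead)),
        "list_ratio": list_words / total_words if total_words else 0.0,
        "reference_count": refs,
        "table_count": markup.count("{|"),
        "images_per_section": images / sections if sections else 0.0,
        "references_per_section": refs / sections if sections else 0.0,
    }
```

The ratios are returned as fractions rather than percentages, and an article without sections yields 0.0 for the per-section ratios. Both choices are assumptions of this sketch.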

Table B.2: Common headings of reference sections in English Wikipedia articles, used to compute the Reference sections count feature.

“references”, “notes”, “footnotes”, “sources”, “citations”, “bibliography”, “works cited”, “external references”, “reference notes”, “references cited”, “bibliographical references”, “cited references”, “notes, references”, “sources, references, external links”, “sources, references, external links, quotations”, “notes & references”, “references & notes”, “external links & references”, “references & external links”, “references & footnotes”, “footnotes & references”, “citations & notes”, “notes & sources”, “sources & notes”, “notes & citations”, “footnotes & citations”, “citations & footnotes”, “reference & notes”, “footnotes & sources”, “note & references”, “notes & reference”, “sources & footnotes”, “notes & external links”, “references & further reading”, “sources & references”, “references & sources”, “references & links”, “links & references”, “references & bibliography”, “references & resources”, “bibliography & references”, “external articles & references”, “references & citations”, “citations & references”, “references & external link”, “external link & references”, “further reading & references”, “notes, sources & references”, “sources, references & external links”, “references/notes”, “notes/references”, “notes/further reading”, “references/links”, “external links/references”, “references/external links”, “references/sources”, “external links / references”, “references / sources”, “references / external links”
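The Reference sections count and Trivia sections count features both amount to comparing section headings against a fixed list. A minimal Python sketch of this lookup is given below. The heading pattern, the normalization (lowercasing and collapsing whitespace), the function name `count_sections_by_heading`, and the choice to consider headings at every nesting level are assumptions; the reference heading set is only an excerpt of Table B.2.

```python
import re

# Excerpt of Table B.2; the full feature would use every heading in the table.
REFERENCE_HEADINGS = {
    "references", "notes", "footnotes", "sources", "citations",
    "bibliography", "works cited", "notes & references", "references/notes",
}
# Trivia headings as listed for the Trivia sections count feature.
TRIVIA_HEADINGS = {"facts", "miscellanea", "other facts", "other information", "trivia"}

# Assumed heading pattern: "== Title ==", "=== Title ===", and so on.
HEADING = re.compile(r"^(={2,})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def count_sections_by_heading(markup, known_headings):
    """Count headings whose normalized title appears in known_headings."""
    count = 0
    for _marks, title in HEADING.findall(markup):
        # Lowercase and collapse runs of whitespace before the lookup.
        normalized = " ".join(title.lower().split())
        if normalized in known_headings:
            count += 1
    return count
```

For example, `count_sections_by_heading(markup, TRIVIA_HEADINGS)` would give the trivia sections count of an article whose markup is held in the string `markup`.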