Linguistic fundamentals for natural language processing : 100 essentials from morphology and syntax
| Main Author: | |
|---|---|
| Format: | eBook |
| Language: | English |
| Published: | San Rafael, Calif. (1537 Fourth Street, San Rafael, CA 94901 USA) : Morgan & Claypool, [2013] |
| Series: | Synthesis digital library of engineering and computer science. Synthesis lectures on human language technologies ; # 20. |
Table of Contents:
- 1. Introduction/motivation
- #0 Knowing about linguistic structure is important for feature design and error analysis in NLP
- #1 Morphosyntax is the difference between a sentence and a bag of words
- #2 The morphosyntax of a language is the constraints that it places on how words can be combined both in form and in the resulting meaning
- #3 Languages use morphology and syntax to indicate who did what to whom, and make use of a range of strategies to do so
- #4 Languages can be classified 'genetically', areally, or typologically
- #5 There are approximately 7,000 known living languages distributed across language families
- #6 Incorporating information about linguistic structure and variation can make for more cross-linguistically portable NLP systems
- 2. Morphology: introduction
- #7 Morphemes are the smallest meaningful units of language, usually consisting of a sequence of phones paired with concrete meaning
- #8 The phones making up a morpheme don't have to be contiguous
- #9 The form of a morpheme doesn't have to consist of phones
- #10 The form of a morpheme can be null
- #11 Root morphemes convey core lexical meaning
- #12 Derivational affixes can change lexical meaning
- #13 Root+derivational affix combinations can have idiosyncratic meanings
- #14 Inflectional affixes add syntactically or semantically relevant features
- #15 Morphemes can be ambiguous and/or underspecified in their meaning
- #16 The notion 'word' can be contentious in many languages
- #17 Constraints on order operate differently between words than they do between morphemes
- #18 The distinction between words and morphemes is blurred by processes of language change
- #19 A clitic is a linguistic element which is syntactically independent but phonologically dependent
- #20 Languages vary in how many morphemes they have per word (on average and maximally)
- #21 Languages vary in whether they are primarily prefixing or suffixing in their morphology
- #22 Languages vary in how easy it is to find the boundaries between morphemes within a word
- 3. Morphophonology
- #23 The morphophonology of a language describes the way in which surface forms are related to underlying, abstract sequences of morphemes
- #24 The form of a morpheme (root or affix) can be sensitive to its phonological context
- #25 The form of a morpheme (root or affix) can be sensitive to its morphological context
- #26 Suppletive forms replace a stem+affix combination with a wholly different word
- #27 Alphabetic and syllabic writing systems tend to reflect some but not all phonological processes
- 4. Morphosyntax
- #28 The morphosyntax of a language describes how the morphemes in a word affect its combinatoric potential
- #29 Morphological features associated with verbs and adjectives (and sometimes nouns) can include information about tense, aspect and mood
- #30 Morphological features associated with nouns can contribute information about person, number and gender
- #31 Morphological features associated with nouns can contribute information about case
- #32 Negation can be marked morphologically
- #33 Evidentiality can be marked morphologically
- #34 Definiteness can be marked morphologically
- #35 Honorifics can be marked morphologically
- #36 Possessives can be marked morphologically
- #37 Yet more grammatical notions can be marked morphologically
- #38 When an inflectional category is marked on multiple elements of sentence or phrase, it is usually considered to belong to one element and to express agreement on the others
- #39 Verbs commonly agree in person/number/gender with one or more arguments
- #40 Determiners and adjectives commonly agree with nouns in number, gender and case
- #41 Agreement can be with a feature that is not overtly marked on the controller
- #42 Languages vary in which kinds of information they mark morphologically
- #43 Languages vary in how many distinctions they draw within each morphologically marked category
- 5. Syntax: introduction
- #44 Syntax places constraints on possible sentences
- #45 Syntax provides scaffolding for semantic composition
- #46 Constraints ruling out some strings as ungrammatical usually also constrain the range of possible semantic interpretations of other strings
- 6. Parts of speech
- #47 Parts of speech can be defined distributionally (in terms of morphology and syntax)
- #48 Parts of speech can also be defined functionally (but not metaphysically)
- #49 There is no one universal set of parts of speech, even among the major categories
- #50 Part of speech extends to phrasal constituents
- 7. Heads, arguments and adjuncts
- #51 Words within sentences form intermediate groupings called constituents
- #52 A syntactic head determines the internal structure and external distribution of the constituent it projects
- #53 Syntactic dependents can be classified as arguments and adjuncts
- #54 The number of semantic arguments provided for by a head is a fundamental lexical property
- #55 In many (perhaps all) languages, (some) arguments can be left unexpressed
- #56 Words from different parts of speech can serve as heads selecting arguments
- #57 Adjuncts are not required by heads and generally can iterate
- #58 Adjuncts are syntactically dependents but semantically introduce predicates which take the syntactic head as an argument
- #59 Obligatoriness can be used as a test to distinguish arguments from adjuncts
- #60 Entailment can be used as a test to distinguish arguments from adjuncts
- #61 Adjuncts can be single words, phrases, or clauses
- #62 Adjuncts can modify nominal constituents
- #63 Adjuncts can modify verbal constituents
- #64 Adjuncts can modify other types of constituents
- #65 Adjuncts express a wide range of meanings
- #66 The potential to be a modifier is inherent to the syntax of a constituent
- #67 Just about anything can be an argument, for some head
- 8. Argument types and grammatical functions
- #68 There is no agreed upon universal set of semantic roles, even for one language; nonetheless, arguments can be roughly categorized semantically
- #69 Arguments can also be categorized syntactically, though again there may not be universal syntactic argument types
- #70 A subject is the distinguished argument of a predicate and may be the only one to display certain grammatical properties
- #71 Arguments can generally be arranged in order of obliqueness
- #72 Clauses, finite or non-finite, open or closed, can also be arguments
- #73 Syntactic and semantic arguments aren't the same, though they often stand in regular relations to each other
- #74 For many applications, it is not the surface (syntactic) relations, but the deep (semantic) dependencies that matter
- #75 Lexical items map semantic roles to grammatical functions
- #76 Syntactic phenomena are sensitive to grammatical functions
- #77 Identifying the grammatical function of a constituent can help us understand its semantic role with respect to the head
- #78 Some languages identify grammatical functions primarily through word order
- #79 Some languages identify grammatical functions through agreement
- #80 Some languages identify grammatical functions through case marking
- #81 Marking of dependencies on heads is more common cross-linguistically than marking on dependents
- #82 Some morphosyntactic phenomena rearrange the lexical mapping
- 9. Mismatches between syntactic position and semantic roles
- #83 There are a variety of syntactic phenomena which obscure the relationship between syntactic and semantic arguments
- #84 Passive is a grammatical process which demotes the subject to oblique status, making room for the next most prominent argument to appear as the subject
- #85 Related constructions include anti-passives, impersonal passives, and middles
- #86 English dative shift also affects the mapping between syntactic and semantic arguments
- #87 Morphological causatives add an argument and change the expression of at least one other
- #88 Many (all?) languages have semantically empty words which serve as syntactic glue
- #89 Expletives are constituents that can fill syntactic argument positions that don't have any associated semantic role
- #90 Raising verbs provide a syntactic argument position with no (local) semantic role, and relate it to a syntactic argument position of another predicate
- #91 Control verbs provide a syntactic and semantic argument which is related to a syntactic argument position of another predicate
- #92 In complex predicate constructions the arguments of a clause are licensed by multiple predicates working together
- #93 Coordinated structures can lead to one-to-many and many-to-one dependency relations
- #94 Long-distance dependencies separate arguments/adjuncts from their associated heads
- #95 Some languages allow adnominal adjuncts to be separated from their head nouns
- #96 Many (all?) languages can drop arguments, but permissible argument drop varies by word class and by language
- #97 The referent of a dropped argument can be definite or indefinite, depending on the lexical item or construction licensing the argument drop
- 10. Resources
- #98 Morphological analyzers map surface strings (words in standard orthography) to regularized strings of morphemes or morphological features
- #99 'Deep' syntactic parsers map surface strings (sentences) to semantic structures, including semantic dependencies
- #100 Typological databases summarize properties of languages at a high level
- Summary
- A. Grams used in IGT
- Bibliography
- Author's biography
- General index
- Index of languages.