edu.stanford.nlp.international.arabic.pipeline
Class DefaultLexicalMapper
java.lang.Object
  edu.stanford.nlp.international.arabic.pipeline.DefaultLexicalMapper
- All Implemented Interfaces:
- Mapper, java.io.Serializable
public class DefaultLexicalMapper
extends java.lang.Object
implements Mapper, java.io.Serializable
Applies a default set of lexical transformations that have been empirically validated
in various Arabic tasks. This class automatically detects the input encoding and applies
the appropriate set of transformations.
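The page does not document how the input encoding is detected. The following is a minimal sketch of one plausible heuristic. It assumes the input is either Arabic script in Unicode or Buckwalter transliteration in ASCII, and it reports Arabic script if any code point falls in the Unicode Arabic block (U+0600 to U+06FF). EncodingGuess and its Encoding enum are hypothetical names, not CoreNLP API. Latin-script text would also land in the second bucket.

```java
// Hypothetical helper illustrating encoding detection; not part of edu.stanford.nlp.
final class EncodingGuess {

    enum Encoding { ARABIC_SCRIPT, BUCKWALTER_OR_OTHER }

    private EncodingGuess() {}

    // Returns ARABIC_SCRIPT if any code point lies in the Unicode Arabic block
    // (U+0600..U+06FF); otherwise assumes an ASCII transliteration such as Buckwalter.
    static Encoding guess(String token) {
        boolean hasArabic = token.codePoints()
            .anyMatch(cp -> Character.UnicodeBlock.of(cp) == Character.UnicodeBlock.ARABIC);
        return hasArabic ? Encoding.ARABIC_SCRIPT : Encoding.BUCKWALTER_OR_OTHER;
    }

    public static void main(String[] args) {
        System.out.println(guess("\u0643\u062a\u0627\u0628")); // ARABIC_SCRIPT ("book" in Arabic script)
        System.out.println(guess("ktAb"));                     // BUCKWALTER_OR_OTHER (same word in Buckwalter)
    }
}
```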
- Author:
- Spence Green
- See Also:
- Serialized Form
Method Summary
Modifier and Type | Method and Description
boolean | canChangeEncoding(java.lang.String parent, java.lang.String element): Indicates whether element can be converted to another encoding.
static void | main(java.lang.String[] args)
java.lang.String | map(java.lang.String parent, java.lang.String element): Maps from one string representation to another.
void | setup(java.io.File path, java.lang.String... options): Perform initialization prior to the first call to map.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
latinPunc
public final java.util.regex.Pattern latinPunc
arabicPunc
public final java.util.regex.Pattern arabicPunc
arabicDigit
public final java.util.regex.Pattern arabicDigit
segmentationMarker
public final java.util.regex.Pattern segmentationMarker
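The four public fields are precompiled java.util.regex.Pattern objects, but their expressions are not shown on this page. The sketch below uses hypothetical expressions to illustrate how such fields can classify a token before a transformation is chosen. segmentationMarker is omitted because nothing here indicates what it matches. TokenPatterns and classify are illustrative names, not part of DefaultLexicalMapper.

```java
import java.util.regex.Pattern;

// Hypothetical token classifier. The constant names echo DefaultLexicalMapper's
// public Pattern fields, but the regular expressions are illustrative assumptions.
final class TokenPatterns {

    // ASCII punctuation (assumed definition; \p{Punct} is ASCII-only by default).
    static final Pattern LATIN_PUNC = Pattern.compile("\\p{Punct}+");
    // A few Arabic-script punctuation marks: comma U+060C, semicolon U+061B, question mark U+061F.
    static final Pattern ARABIC_PUNC = Pattern.compile("[\\u060C\\u061B\\u061F]+");
    // Arabic-Indic digits U+0660..U+0669.
    static final Pattern ARABIC_DIGIT = Pattern.compile("[\\u0660-\\u0669]+");

    private TokenPatterns() {}

    // Returns the name of the first pattern that matches the whole token.
    static String classify(String token) {
        if (LATIN_PUNC.matcher(token).matches()) return "latinPunc";
        if (ARABIC_PUNC.matcher(token).matches()) return "arabicPunc";
        if (ARABIC_DIGIT.matcher(token).matches()) return "arabicDigit";
        return "other";
    }

    public static void main(String[] args) {
        for (String t : new String[] {".", "\u061F", "\u0661\u0662", "ktAb"}) {
            System.out.println(t + " -> " + classify(t));
        }
    }
}
```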
Constructor Detail
DefaultLexicalMapper
public DefaultLexicalMapper()
Method Detail
map
public java.lang.String map(java.lang.String parent,
java.lang.String element)
- Description copied from interface: Mapper
- Maps from one string representation to another.
- Specified by: map in interface Mapper
- Parameters:
- parent - element's context (e.g., the parent node in a parse tree)
- element - The string to be transformed.
- Returns:
- The transformed string
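The map contract leaves the transformation itself to each implementation. As a minimal sketch that assumes nothing about DefaultLexicalMapper's actual rules, the code below mirrors the three Mapper methods in a hypothetical MapperSketch interface and implements a single lexical transformation: rewriting Arabic-Indic digits (U+0660 to U+0669) as ASCII digits. The parent argument is ignored here; a real mapper could use it to make the mapping context-sensitive.

```java
import java.io.File;

// Hypothetical interface mirroring the three Mapper methods documented on this page.
interface MapperSketch {
    void setup(File path, String... options);
    String map(String parent, String element);
    boolean canChangeEncoding(String parent, String element);
}

// Toy mapper that rewrites Arabic-Indic digits as ASCII digits.
// One example of a lexical transformation, not DefaultLexicalMapper's rule set.
final class DigitNormalizingMapper implements MapperSketch {

    @Override
    public void setup(File path, String... options) {
        // This toy mapper needs no on-disk data and accepts no options.
    }

    @Override
    public String map(String parent, String element) {
        StringBuilder sb = new StringBuilder(element.length());
        for (int i = 0; i < element.length(); i++) {
            char c = element.charAt(i);
            if (c >= '\u0660' && c <= '\u0669') {
                // Offset from Arabic-Indic zero gives the digit value 0..9.
                sb.append((char) ('0' + (c - '\u0660')));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    @Override
    public boolean canChangeEncoding(String parent, String element) {
        return true; // this sketch never blocks a conversion
    }

    public static void main(String[] args) {
        MapperSketch mapper = new DigitNormalizingMapper();
        mapper.setup(null); // called once, before the first map()
        System.out.println(mapper.map("CD", "\u0661\u0669\u0668\u0664")); // 1984
    }
}
```

Per the interface contract, setup is called once before the first call to map.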
setup
public void setup(java.io.File path,
java.lang.String... options)
- Description copied from interface: Mapper
- Perform initialization prior to the first call to map.
- Specified by: setup in interface Mapper
- Parameters:
- path - A file containing data on disk used during mapping
- options - Variable-length array of option strings. The option format may vary for the particular class instance.
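Because the option format is implementation-specific, the following is only a sketch of one way to honor the setup contract: parse the options once, reject map calls that arrive before setup, and consult the stored options on every call. The class name, the convention that path may be null, and the stripTatweel option (which removes the Arabic tatweel character U+0640) are all hypothetical.

```java
import java.io.File;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical mapper showing one way to honor the setup() contract.
final class ConfigurableMapper {
    private final Set<String> flags = new HashSet<>();
    private boolean initialized = false;

    // In this sketch, path may be null when no on-disk data is needed.
    public void setup(File path, String... options) {
        if (path != null && !path.exists()) {
            throw new IllegalArgumentException("Missing data file: " + path);
        }
        flags.addAll(Arrays.asList(options));
        initialized = true;
    }

    public String map(String parent, String element) {
        if (!initialized) {
            throw new IllegalStateException("setup() must be called before map()");
        }
        // "stripTatweel" is a hypothetical option name: it deletes every U+0640 tatweel.
        return flags.contains("stripTatweel") ? element.replace("\u0640", "") : element;
    }

    public static void main(String[] args) {
        ConfigurableMapper mapper = new ConfigurableMapper();
        mapper.setup(null, "stripTatweel");
        System.out.println(mapper.map("NN", "\u0643\u0640\u062a\u0627\u0628"));
    }
}
```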
canChangeEncoding
public boolean canChangeEncoding(java.lang.String parent,
java.lang.String element)
- Description copied from interface: Mapper
- Indicates whether element can be converted to another encoding. In the ATB, for example, if a punctuation character is labeled with the "PUNC" POS tag, then that character should not be converted from Buckwalter to UTF-8, because Buckwalter transliteration reuses ASCII symbols such as > and | for Arabic letters.
- Specified by: canChangeEncoding in interface Mapper
- Parameters:
- parent - element's context (e.g., the parent node in a parse tree)
- element - The string whose encoding may be changed.
- Returns:
- true if the encoding of element can be changed; false otherwise.
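The PUNC rule described for canChangeEncoding can be expressed as a small policy check. This sketch assumes parent holds the POS tag of the word. PuncAwareEncodingPolicy and convertIfAllowed are hypothetical names. The toy converter stands in for a real Buckwalter-to-UTF-8 converter by handling only the symbol >, which Buckwalter uses for alef with hamza above (U+0623).

```java
import java.util.function.UnaryOperator;

// Hypothetical sketch of the canChangeEncoding() rule: tokens tagged "PUNC"
// keep their original form, so ASCII punctuation is never turned into a letter.
final class PuncAwareEncodingPolicy {

    private static final String PUNC_TAG = "PUNC";

    private PuncAwareEncodingPolicy() {}

    // Assumes parent carries the POS tag of the word (element is unused here).
    static boolean canChangeEncoding(String parent, String element) {
        return !PUNC_TAG.equals(parent);
    }

    // Applies the converter only when the policy allows it.
    static String convertIfAllowed(String parent, String element, UnaryOperator<String> converter) {
        return canChangeEncoding(parent, element) ? converter.apply(element) : element;
    }

    public static void main(String[] args) {
        // Toy stand-in for a Buckwalter-to-UTF-8 converter: maps only ">" to U+0623.
        UnaryOperator<String> toyConverter = s -> s.replace(">", "\u0623");
        System.out.println(convertIfAllowed("PUNC", ">", toyConverter));   // punctuation left as ">"
        System.out.println(convertIfAllowed("NOUN", ">mr", toyConverter)); // ">" becomes U+0623
    }
}
```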
main
public static void main(java.lang.String[] args)
Stanford NLP Group