Vasile Rus
Department of Computer Science and Engineering
Southern Methodist University, Dallas, TX 75275-0122
vasile@engr.smu.edu

Abstract
This paper presents a few extensions to the logic form representation and a method for transforming WordNet glosses into logic forms using a set of high-precision rules combined with a set of high-recall heuristics. An almost 3% increase in POS tagging accuracy is achieved over state-of-the-art results, at the expense of user intervention on only 7.52% of words. We apply a nearest neighbor solution to parser switching that leads to a 6.43% increase in exact sentence accuracy for glosses. Logic forms are derived with an accuracy of 89.46%.
Our work on parsing WordNet glosses resembles efforts to extract lexical information from machine readable dictionaries (MRD), such as LDOCE (Longman Dictionary of Contemporary English) or Webster’s 2nd International Dictionary (W2). Different methods of parsing the definitions have been used: pattern matching [2], specially constructed definition parsers [14], or broad-coverage parsers [13][8]. All those efforts were limited to extracting genus terms and unlabeled or labeled relations, or to building taxonomies [8]. We parse WordNet glosses to generate logic representations that enable reasoning mechanisms.
1. Introduction
It is well understood and agreed that world knowledge is necessary for many common sense reasoning problems. Consider question 471 from TREC-QA:
2. Logic Form Transformation for WordNet glosses
The glosses are first POS tagged using a voting scheme and syntactically parsed using an in-house implementation of Michael Collins’s statistical parser [3].
The next step is to transform the glosses into a more abstract logical representation. The logic form is an intermediate step between the syntactic parse and the deep semantic form, which we do not address here.
The LF codification acknowledges syntax-based relationships such as: (1) syntactic subjects, (2) syntactic objects, (3) prepositional attachments, (4) complex nominals, and (5) adjectival/adverbial adjuncts.
Our approach is to derive the LF directly from the output of the syntactic parser. The parser resolves the structural and syntactic ambiguities. This way, we avoid the very hard problems of the logic representation of natural language. We closely follow the successful representation used by Hobbs in TACITUS [7] and extended by Harabagiu, Miller and Moldovan in [5] with semantic and syntactic features. Hobbs explains that for many linguistic applications it is acceptable to relax ontological scruples, intricate syntactic explanations, and the desire for efficient deductions in favor of a simpler notation closer to English.
For the logic representation of WordNet glosses we ignore: plurals and sets, verb tenses, auxiliary verbs, quantifiers
The answer to this question is “Hitler committed suicide in 1945”. To justify that this is a plausible answer, one needs extra knowledge. From the WordNet gloss of suicide:n#1 (the first sense of suicide) we have killing yourself, and from the first sense of the verb kill:v#1 we have cause to die, which together would provide a justification. However, to obtain a justification automatically, the initial question, the answer and the WordNet glosses need to be transformed into a computational representation, and automated inference procedures have to be defined. The computational representation that we chose for WordNet glosses is the logic form [11].
This paper presents a few extensions to the logic form representation described in [11]. Our approach is to first parse the glosses syntactically; logic form transformations are then derived from grammar rules obtained from the parse trees. Since automatic Logic Form Transformation (LFT) for open text requires an exceedingly large number of rules, we developed a procedure to overcome this challenge.
Table 2. Examples of postmodifier predicates. Glosses shown: (a structure taller than its diameter); workstation (... a desktop ... conventionally considered to be more powerful ...); achromatic lens (a compound lens system that forms an image free from chromatic aberration); (a semiconductor device capable of amplification).
and modal operators and negation. This decision is based on our desire to provide a manageable and consistent logic representation, which would otherwise be infeasible.
3. LF Definitions
Predicates
A predicate is generated for every noun, verb, adjective or adverb encountered in any gloss. The name of the predicate is a concatenation of the morpheme’s base form, the part of speech and the WordNet semantic sense (not included here), thus capturing the full lexical and semantic disambiguation. For example, the LF of the gloss of student, pupil, educatee (a learner who is enrolled in an educational institution) will contain the predicates learner:n, enroll:v and educational
lens: (a compound lens system that forms an image free from chromatic aberration).
From a semantic point of view, free is a modifier of image. Trouble comes when one tries to provide a representation for the preposition from. What phrase should stand as its prepositional head? One might say it should be the modifier free. But free does not have an argument of its own since, being
Table 3. Examples of relative adverb predicates. Glosses shown: air dock (a large building at an airport where aircraft can be stored and maintained); arc light (... produces light when electric current ... electrodes), whose LF contains when(,) & electric ...
Figure 1. LF Transformer: architecture overview. Modules: Preprocessing (definition extraction, tokenization and cleaning, definition expansion), POS tagging, Parsing, Logic Form Transformer.
a modifier, it borrows the argument from its modifiee image. We could consider image as the prepositional head of from, but this would be wrong, as there is no such relation as prepositional head-prepositional object between image and chromatic
from and treat it as a relational predicate, somewhat similar to a preposition, we may represent the above example like this: image:n() & free
aberration:n(). This solution maintains the simplicity and consistency of the notation and is the closest one to the semantic interpretation.
Possessive Pronouns
Possessive pronouns introduce a relationship between the head they stand by and the referent of the pronoun. In the gloss of tower, the pronoun its introduces a possession relation between structure and diameter. To remedy the lack of specificity of the original notation in this case, we propose to represent the relation using the predicate pos. Thus, for the previous example, we obtain structure:n() & pos(,) & diameter:n(). All possessive pronouns are represented using this predicate.
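The pos predicate can be made concrete with a small data structure for logic forms. The sketch below is a minimal illustration under our own assumptions: the `Predicate` class and the `render_lf` and `possessive_lf` helpers are hypothetical, and the variable names x1 and x2 as well as the argument order of pos are guesses, since the argument slots are not spelled out above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    """One conjunct of a logic form, e.g. structure:n(x1)."""
    name: str               # base form plus ":pos" tag; relational predicates carry no tag
    args: tuple[str, ...]   # argument variables

    def render(self) -> str:
        return f"{self.name}({', '.join(self.args)})"


def render_lf(preds: list[Predicate]) -> str:
    """Conjoin predicates with ' & ', following the paper's notation."""
    return " & ".join(p.render() for p in preds)


def possessive_lf(referent: str, head: str) -> str:
    """LF for '<referent> ... its <head>', e.g. 'a structure taller than its diameter'.

    Hypothetical: x1/x2 and the pos(referent, head) argument order are assumptions.
    """
    return render_lf([
        Predicate(f"{referent}:n", ("x1",)),
        Predicate("pos", ("x1", "x2")),
        Predicate(f"{head}:n", ("x2",)),
    ])


print(possessive_lf("structure", "diameter"))
```

Because every possessive pronoun maps to the same two-place predicate, a single helper of this kind would cover its, his, their and so on.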
an airport where aircraft can be stored and maintained).
The relation between the two sentences is embedded in the adverbial phrase where, and thus the corresponding logic form representation would be: large:a() & building:n() & where(,) & aircraft:n() & and(,,) & store:v(,,) & maintain:v(,,). This logic representation is the same as the logic form representation of the imaginary sentence aircraft can be stored and maintained in a large building at an airport. One needs only the extra knowledge that in is equivalent to where to obtain the same representation for the exemplified gloss and for the imaginary sentence. This similarity illustrates the abstraction power of logic forms: structurally different natural language representations are mapped into the same logic form representation.
4. Logic Form Transformer
Our approach for deriving the LF of glosses relies on structural relations provided by the syntactic parser. Several preprocessing steps are necessary before parsing.
The system comprises the following modules (see Figure 1): preprocessing, POS tagging, parsing, rule selection and logic form transformation (LFT).
Relative Adverbs
Relative adverbs such as where, when, how and why, when introducing a relative clause, should be represented as a predicate with two arguments, referring to the argument of the relative clause or phrase and to the argument of the main clause or phrase, respectively.
We illustrate such a case using the definition of air dock, which is (a large building at
4.1. Preprocessing
The preprocessing extracts definitions from glosses, discards comments from definitions, and applies several text manipulations, such as: it replaces as in with in, also as with as, and only when with when (since these phrases only complicate the parsing process and the LFTs), and eliminates words such as usually, especially … off. Those manipulations form the so-called cleaning phase.

Table: parsing results with voting (Original, 2 votes, 4 votes). Recall (Original): 91.29. Exact Sentence row: 65.67, 67.45, 63.85.
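A minimal sketch of the cleaning phase follows, assuming simple whole-word regular-expression rewrites. Only the rewrites and deleted words named in this section are included; the helper names `extract_definition` and `clean`, and the convention that text after `;` counts as a comment, are assumptions for illustration.

```python
import re

# Phrase rewrites listed in Section 4.1.
REWRITES = [("as in", "in"), ("also as", "as"), ("only when", "when")]
# Words the section says are eliminated; the full list is not given in the text.
DROPPED_WORDS = ["usually", "especially"]


def extract_definition(gloss: str) -> str:
    """Strip the outer parentheses and keep the text before the first ';'.

    Assumption: anything after ';' (e.g. quoted WordNet examples) is a comment.
    """
    text = gloss.strip()
    if text.startswith("(") and text.endswith(")"):
        text = text[1:-1]
    return text.split(";", 1)[0].strip()


def clean(definition: str) -> str:
    """Apply the rewrites and deletions as whole-word matches, then normalize spaces."""
    text = definition
    for old, new in REWRITES:
        text = re.sub(rf"\b{re.escape(old)}\b", new, text)
    for word in DROPPED_WORDS:
        text = re.sub(rf"\b{word}\b\s*", "", text)
    return re.sub(r"\s+", " ", text).strip()


print(clean(extract_definition('(a tool usually used only when cutting; "an example")')))
```

Rewriting before parsing keeps the grammar smaller: the parser never sees only when, so no grammar rule or LFT has to account for it.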
4.2. POS Tagging
The POS tagging module tags glosses using Brill’s rule-based tagger [1], the MXPOST statistical tagger [12], and WordNet syntactic category information. We use a voting scheme that is based on the output of the two taggers. Voting to improve POS tagging accuracy was also used by [10]. If the two taggers agree, we take that tag to be correct. For words that receive different tags, one more attempt is made to decide the tag automatically: if Brill’s tag and the word’s WordNet syntactic category are similar, then we pick this tag. If this fails, the tag is selected manually.
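The voting scheme can be sketched as a small decision function. This is an illustration, not the module itself: the tag-to-category table `COARSE` and the reading of "similar" as "WordNet lists exactly one category for the word, and it matches Brill’s tag" are assumptions. That reading is consistent with the example in Section 7, where badminton (only a noun in WordNet) is resolved automatically while light (listed as noun, verb, adjective and adverb) goes to the user.

```python
from typing import Optional

# Assumed mapping from Penn Treebank tags to WordNet's coarse categories
# (n = noun, v = verb, a = adjective, r = adverb).
COARSE = {
    "NN": "n", "NNS": "n", "NNP": "n", "NNPS": "n",
    "VB": "v", "VBD": "v", "VBG": "v", "VBN": "v", "VBP": "v", "VBZ": "v",
    "JJ": "a", "JJR": "a", "JJS": "a",
    "RB": "r", "RBR": "r", "RBS": "r",
}


def vote(brill_tag: str, mxpost_tag: str, wordnet_cats: set[str]) -> Optional[str]:
    """Return the chosen tag, or None if the word must be tagged manually."""
    if brill_tag == mxpost_tag:
        return brill_tag      # the two taggers agree
    # Assumption: "similar" means WordNet lists the word under exactly one
    # category, and it matches the coarse category of Brill's tag.
    if len(wordnet_cats) == 1 and COARSE.get(brill_tag) in wordnet_cats:
        return brill_tag
    return None               # defer to the user
```

With illustrative MXPOST tags, `vote("NN", "JJ", {"n", "v", "a", "r"})` returns `None` (the light case) and `vote("NN", "JJ", {"n"})` returns `"NN"` (the badminton case).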
Using this approach, we successfully tagged 91.57% of the words in 1,000 glosses from the noun.artifact hierarchy with an accuracy of 98.50%. WordNet coarse syntactic categories add 0.91% agreement with almost 100% accuracy. The user needs to check 7.52% of the tags and, assuming the user does a perfect job, an overall accuracy of 98.93% is achieved. Compared with the 96% accuracy measured for the taggers employed on glosses, this is an improvement of almost 3%.
4.3. Syntactic Parsing
The parsing module is an in-house implementation of a parser that follows the statistical parsing principles described in [3]. The parser is based on the probabilities between head words in parse trees. Throughout this section we use the following performance measures for parsing: P, precision, which is the number of correct constituents retrieved divided by the number of all constituents retrieved; R, recall, which is the number of correct constituents retrieved divided by the number of all correct constituents; and F-measure, which is the harmonic mean of P and R, F = 2PR/(P + R).
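The three measures can be computed over sets of labeled constituents. This sketch assumes that a constituent is identified by its label and token span, as in PARSEVAL-style scoring; the matching criterion is not stated in the text, and the `exact_sentence` helper reflects our reading of the exact sentence accuracy reported in the results. As a check on the F formula, 2 · 92.47 · 91.29 / (92.47 + 91.29) ≈ 91.88, close to the F-measure of 91.87 listed next to that precision and recall in the parsing results.

```python
# A constituent is assumed to be identified by (label, start, end) token offsets.
Constituent = tuple[str, int, int]


def prf(gold: set[Constituent], predicted: set[Constituent]) -> tuple[float, float, float]:
    """Precision, recall and F-measure (harmonic mean of P and R) over constituents."""
    correct = len(gold & predicted)
    p = correct / len(predicted) if predicted else 0.0
    r = correct / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def exact_sentence(gold: set[Constituent], predicted: set[Constituent]) -> bool:
    """A sentence counts as exact only if every constituent matches."""
    return gold == predicted


gold = {("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5)}
pred = {("S", 0, 5), ("NP", 0, 2), ("VP", 3, 5)}
print(prf(gold, pred), exact_sentence(gold, pred))
```

One misplaced VP boundary costs a third of P and R here but makes the whole sentence inexact, which is why exact sentence accuracy is much lower than F-measure.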
Table: parsing results per case (Original, Random, Best PR, Upper Limit). Original parser: Precision 92.47, Recall 91.29, F-measure 91.87, Exact Sentence 65.67.
(LFT) to generate its corresponding logic form. Table 10 illustrates the two types of rules that we use: intra-phrase and inter-phrase rules. In order to evaluate the amount of work to be done, we estimate how many LFTs are needed. Table 7 shows the number of distinct grammar rules extracted from the parse trees of the glosses in the entire WordNet.

Table 7. Size of grammar for WordNet glosses per part of speech (noun, verb, adjectives, adverbs; values shown: 5,392 and 9,826).

The total of nearly 10,000 rules (see Table 7) is by far too large to implement LFTs for all of them (the best case would be total overlap among the four parts of speech, with each grammar rule mapping into a single LFT). To overcome this problem we developed a two-step procedure:
(1) apply transformations at the POS level and on parse trees to reduce the number of candidate grammar rules;
(2) select the most frequent rules and design high-precision LF transformation rules for them.
Before we detail each step, we offer a rationale for our scheme. For WordNet glosses, although the total number of grammar rules is large, a small number of rules covers a large percentage of all occurrences. This might be explained by the relative structural uniformity of glosses: genus and differentia. This uniformity is more pronounced for noun and verb glosses. Table 8 shows, for 10,000 randomly selected noun glosses, the most common phrases (identified by their nonterminal: S, NP, VP, etc.), the number of unique grammar rules having that nonterminal as their left-hand side, and the percentage of occurrences covered by the ten most frequent rules among all occurrences of rules with the same left-hand side. From the table we observe that the ten most frequent rules cover more than 90% of all occurrences for most phrases.
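The coverage statistic in Table 8 (the share of occurrences covered by the ten most frequent rules with a given left-hand side) is straightforward to compute from rule counts. A minimal sketch, assuming rules are stored as `(lhs, rhs)` tuples in a `collections.Counter`; the function name `top_k_coverage` is hypothetical.

```python
from collections import Counter, defaultdict

# A grammar rule is (lhs, rhs), e.g. ("PP", ("IN", "NP")).
Rule = tuple[str, tuple[str, ...]]


def top_k_coverage(rule_counts: Counter, k: int = 10) -> dict[str, float]:
    """Share of all occurrences, per left-hand side, covered by its k most frequent rules."""
    by_lhs: dict[str, list[int]] = defaultdict(list)
    for (lhs, _rhs), n in rule_counts.items():
        by_lhs[lhs].append(n)
    return {
        lhs: sum(sorted(counts, reverse=True)[:k]) / sum(counts)
        for lhs, counts in by_lhs.items()
    }


counts = Counter({
    ("PP", ("IN", "NP")): 90, ("PP", ("TO", "NP")): 10,
    ("NP", ("DT", "NN")): 5, ("NP", ("NN",)): 5,
})
print(top_k_coverage(counts, k=1))
```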
5.1. Transformation at POS level and parse trees
Base NPs and VPs have a coverage of around 70%, which calls for improvement. Our solution to boost the coverage above 90% consists of performing a set of tag and parse tree transformations to reduce the number of candidate rules. Two basic techniques are used: (1) tag reduction and (2) transformations of parse trees.

Phrase    Unique rules
baseNP    857
NP        244
VP        450
PP        40
S         35

Table 9. Examples of rules having prenominal modifiers belonging to different syntactic categories; plurals, determiners and proper noun tags are treated similarly:
NP -> DT JJ NN|NNS|NNP|NNPS
NP -> DT VBN NN|NNS|NNP|NNPS
reduced form: NP -> JJ NN
Tag reduction is possible because of simplifications in the notation: (1) determiners are eliminated, (2) plurals are ignored, and thus we can replace NNS with NN, (3) proper nouns are treated identically to common nouns, and consequently NNP is changed into NN, and (4) everything in a prenominal position plays the function of a modifier. Examples of rule reduction due to tag reduction are illustrated in Table 9. For verbs we ignore tenses: VBG, VBP, VBZ, VBN and VB are all mapped into VB. Keeping the passive information is important for syntactic role detection, and thus we add a new tag, VP-PASS, to indicate that the head of the VP is passive. Modals and auxiliaries are eliminated and negations are ignored.
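Tag reduction can be sketched as a rewrite of each grammar rule’s right-hand side. The noun, verb, determiner and modal mappings follow the text; treating every non-noun prenominal tag inside an NP as JJ is our assumption about how point (4) is applied, and the helper `reduce_rule` is hypothetical. Auxiliaries, negation and the VP-PASS tag need lexical or tree context and are left out.

```python
# Tag mappings stated in the text: plurals and proper nouns become NN,
# and the listed verb forms become VB.
TAG_MAP = {"NNS": "NN", "NNP": "NN", "NNPS": "NN",
           "VBG": "VB", "VBP": "VB", "VBZ": "VB", "VBN": "VB"}
# Determiners and modals are dropped. Auxiliaries (e.g. "is") would need the
# word itself, not just the tag, so this sketch leaves them alone.
DROPPED = {"DT", "MD"}


def reduce_rule(lhs: str, rhs: tuple[str, ...]) -> tuple[str, tuple[str, ...]]:
    tags = [TAG_MAP.get(t, t) for t in rhs if t not in DROPPED]
    # Assumption: inside an NP ending in a noun, any non-noun prenominal
    # tag (e.g. a VBN participle) is treated as an adjectival modifier, JJ.
    if lhs == "NP" and tags and tags[-1] == "NN":
        tags = [t if t == "NN" else "JJ" for t in tags[:-1]] + ["NN"]
    return lhs, tuple(tags)


print(reduce_rule("NP", ("DT", "VBN", "NNP")))
```

With these mappings, NP -> DT JJ NNS and NP -> DT VBN NNP both collapse to NP -> JJ NN, which is the kind of merging that lets one LFT serve several original rules.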
The second technique consists of rearranging the parse trees so that more complex structures are reduced to simpler ones: for example, more complex base NPs are rearranged into simpler ones (see Figure 2).
Figure 2. Transforming a coordinated NP into a non-base NP and two primitive base NPs: the flat base NP a/DT ruler/NN or/CC institution/NN becomes NP -> NP CC NP, with the base NPs (a/DT ruler/NN) and (institution/NN).
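The rearrangement in Figure 2 can be sketched on trees written as nested tuples, with `(label, [children])` for phrases and `(tag, word)` for leaves. This representation and the helper `split_coordinated_np` are assumptions for illustration; the sketch only handles a flat NP with a single inner conjunction.

```python
def is_leaf(node) -> bool:
    """Leaves are (tag, word) pairs; phrases are (label, [children])."""
    return isinstance(node[1], str)


def split_coordinated_np(np):
    """Rewrite a flat NP 'X CC Y' as NP -> NP CC NP with two base NPs."""
    label, children = np
    if label != "NP" or is_leaf(np) or not all(is_leaf(c) for c in children):
        return np                      # only flat (base) NPs are rewritten
    cc = [i for i, (tag, _word) in enumerate(children) if tag == "CC"]
    if len(cc) != 1 or cc[0] in (0, len(children) - 1):
        return np                      # sketch: exactly one inner conjunction
    i = cc[0]
    return ("NP", [("NP", children[:i]), children[i], ("NP", children[i + 1:])])


flat = ("NP", [("DT", "a"), ("NN", "ruler"), ("CC", "or"), ("NN", "institution")])
print(split_coordinated_np(flat))
```

After the split, each side is a primitive base NP covered by frequent rules such as NP -> DT NN and NP -> NN, so no separate rule for the coordinated form is needed.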
Table 10. Examples of LFTs. Intra-phrase rules: NP -> DT NN, e.g. (NP (a/DT monastery/NN)) from abbey:n#3; NP -> DT JJ NN, e.g. (NP (a/DT short/JJ sleep/NN)). Inter-phrase rules: NP -> NP VP (abbey:n#3); PP -> IN NP; verb/VP-PASS by/PP, e.g. (VP (ruled/VBN by/PP)), transformed into verb(e, x, , x) & noun(x).
As a consequence of those transformations, the coverage of the ten most frequent rules for base NPs and for verb phrases rises above 90%.
6. Results for LF Transformation
To validate our method we experimented with it on the previously described set of 1,140 definitions from the noun.artifact hierarchy. The logic forms for those definitions were obtained manually and serve as our reference data.
The initial set of LFTs is formed by taking the most frequent rules for each grammar phrase detected in a corpus of 10,000 noun glosses randomly selected from the noun data file of WordNet 1.6. We tag and parse them. From the resulting parse trees we extracted all grammar rules with their number of occurrences and sorted them by frequency. Then we selected the ten most frequent rules, or stopped earlier at the point where a rule’s frequency falls below 1% of the occurrences of all grammar rules with the same left-hand side tag. A set of about 70 frequently used rules was obtained, and their corresponding LFTs have been implemented. We applied the LFTs to the test set of 1000 glosses and compared the output to the manually built logic forms. The measure that we use is exact LF accuracy: the number of glosses correctly transformed into logic form over the number of all glosses attempted. An alternative measure would be LF predicate accuracy: the number of correctly generated predicates over the number of all predicates, where a predicate is correctly generated if all its arguments are correctly assigned. This secondary measure is not suitable when one wants to further use the glosses as axioms, as it does not tell the number of correctly generated axioms (see [11]).
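The difference between the two measures shows up on a toy example. This sketch assumes that each logic form is a set of predicate strings whose variables are already aligned with the reference; real output would first need a consistent variable renaming, which is omitted here. The function names are hypothetical.

```python
# Each logic form is assumed to be a set of predicate strings whose variables
# have already been aligned with the reference (a real comparison would first
# need to rename variables consistently).
LogicForm = set[str]


def exact_lf_accuracy(gold: list[LogicForm], predicted: list[LogicForm]) -> float:
    """Glosses whose whole LF is correct, over all glosses attempted."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def lf_predicate_accuracy(gold: list[LogicForm], predicted: list[LogicForm]) -> float:
    """Correct predicates (all arguments right) over all reference predicates."""
    correct = sum(len(g & p) for g, p in zip(gold, predicted))
    return correct / sum(len(g) for g in gold)


gold = [{"a(x1)", "b(x1)"}, {"c(x1)", "d(x1, x2)"}]
pred = [{"a(x1)", "b(x1)"}, {"c(x1)", "d(x1, x3)"}]
print(exact_lf_accuracy(gold, pred), lf_predicate_accuracy(gold, pred))
```

A single wrong argument makes the second gloss unusable as an axiom, so exact accuracy drops to 0.5 while predicate accuracy still reports 0.75.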
For our test data, which are correctly parsed, an exact LF accuracy of 83.59% was obtained using the LFTs derived from the selected rules. The heuristics boost the LF accuracy to 89.46%. We repeated the same experiments using the output of nearest neighbor parser switching. The accuracy dropped to 61.53% without heuristics and to 66.31% with them. To decrease the gap between the two settings, we plan to work further on improving parser accuracy. We plan to use a parser combination technique to detect possible errors and ask for the user’s intervention when necessary.
5.2. Rule selection
In this step we derive LFTs for the ten most frequent grammar rules of each grammar phrase. Rules that have a coverage of less than 1% are not picked, even if they are among the top ten. The selected rules may be mapped into one or more logic form transformations, as illustrated in Table 10.
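The selection criterion (at most ten rules per left-hand side, none below a 1% share) can be sketched as follows. The `(lhs, rhs)` rule encoding and the function `select_rules` are hypothetical, and ties are broken by insertion order.

```python
from collections import Counter, defaultdict


def select_rules(rule_counts: Counter, k: int = 10, min_share: float = 0.01) -> list:
    """Pick up to k most frequent rules per left-hand side, skipping rules that
    account for less than min_share of that left-hand side's occurrences."""
    by_lhs = defaultdict(list)
    for rule, n in rule_counts.items():
        by_lhs[rule[0]].append((n, rule))
    selected = []
    for items in by_lhs.values():
        total = sum(n for n, _ in items)
        items.sort(key=lambda item: item[0], reverse=True)
        for n, rule in items[:k]:
            if n / total < min_share:
                break          # sorted, so every later rule is below the threshold too
            selected.append(rule)
    return selected


counts = Counter({("PP", ("IN", "NP")): 190, ("PP", ("TO", "NP")): 9, ("PP", ("IN", "S")): 1})
print(select_rules(counts))
```

Here PP -> IN S accounts for 0.5% of PP occurrences, so it is dropped even though it is within the top ten.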
5.3. Heuristics
The advantage of the proposed approach is that the implemented rules are highly accurate: whenever an argument is assigned, one can tell with high precision that it is the right one. A missing argument indicates that the case was not covered by the implemented rules. In other words, this approach prefers high-precision rules over high-recall heuristics. For uncovered cases we have two choices: either the user manually intervenes and fills in the argument, or a set of heuristics is designed to do the job. We opted to design a set of heuristics to solve uncovered cases. The heuristics ensure that every argument slot is filled, but they cannot ensure that it is always filled with the right argument.
A heuristic was designed for each type of argument. Subject: the previous phrase head argument or, if the verb is in the passive, the prepositional object argument of the following by preposition. Direct object: the first following phrase head argument (or the second for ditransitives), or the surface subject if the verb is in the passive. Indirect object: for ditransitives, the second following noun phrase head argument (or the first). Prepositional head: the previous phrase head argument. Prepositional argument: the following phrase head argument. Adjectives/adverbs: the following noun/verb phrase head argument, or the previous noun/verb phrase argument if there is none following. Default: generate a new argument that does not exist yet.
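Two of these heuristics, for the subject and the direct object, are sketched below on a flat sequence of phrases, each carrying its head argument. The `Phrase` class, the `PP-by` label and the fresh-variable generator are assumptions for illustration; ditransitives and the remaining argument types are omitted.

```python
from dataclasses import dataclass
from itertools import count
from typing import Iterator, Optional


@dataclass
class Phrase:
    label: str   # e.g. "NP", "VP", "PP-by"
    arg: str     # head argument; for a PP, the argument of its prepositional object


def verb_subject_object(phrases: list[Phrase], verb_idx: int, passive: bool,
                        fresh: Iterator[str]) -> tuple[str, str]:
    """Subject and direct-object heuristics for the verb heading phrases[verb_idx]."""
    before, after = phrases[:verb_idx], phrases[verb_idx + 1:]
    previous: Optional[str] = before[-1].arg if before else None
    if passive:
        # Subject: object of the following "by" PP; object: the surface subject.
        subj = next((p.arg for p in after if p.label == "PP-by"), None)
        obj = previous
    else:
        # Subject: previous phrase head; object: first following phrase head.
        subj = previous
        obj = after[0].arg if after else None
    # Default heuristic: invent a new argument for any slot still empty.
    return subj or next(fresh), obj or next(fresh)


fresh = (f"x{n}" for n in count(3))
phrases = [Phrase("NP", "x1"), Phrase("VP-PASS", "e1"), Phrase("PP-by", "x2")]
print(verb_subject_object(phrases, 1, True, fresh))
```

On the badminton example, with racket as x1 and players as x2, the passive verb used receives subject x2 and object x1.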
7. An example
To illustrate the derivation process step by step, we consider the gloss of badminton racket, badminton racquet, battledore:
(a light long-handled racket used by badminton players).
The preprocessing phase outputs the definition a light long-handled racket used by badminton players and tokenizes it. There are no comments to be dropped and no cleaning is necessary.
Then Brill’s tagger and MXPOST are run, and their outputs are:
Brill’s: a/DT light/NN long-handled/JJ racket/NN used/VBN by/IN badminton/NN players/NNS
MXPOST: a JJ long-handled NN used IN badminton NNS
The two taggers disagree on light and on badminton. The latter is corrected automatically using WordNet lexical information, while the former is resolved by the user’s intervention.
The definition is expanded into Badminton-racket/NN is/VBZ a/DT light/JJ long-handled/JJ racket/NN used/VBN by/IN badminton/NN players/NNS ./. and parsed. The parse tree is processed bottom-up, from the leaves to the top. At each level the grammar rule is extracted (the parent nonterminal and the list of its children’s tags), and then the corresponding LFT or heuristic is triggered. When the top is reached, the logic form is printed: light() & long-handled() & racket() & use() & nn() & badminton() & player().
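The bottom-up extraction step amounts to a post-order walk of the parse tree. The sketch below encodes trees as nested tuples, `(label, [children])` for phrases and `(tag, word)` for leaves, which is an assumption; the tree itself is our reading of Figure 3 after tag reduction. In the system each extracted rule would then trigger its LFT or a heuristic, which is not modeled here.

```python
def extract_rules(node, out: list) -> list:
    """Post-order walk: emit (parent, child labels) for every phrase, leaves first."""
    label, children = node
    if isinstance(children, str):      # leaf: (tag, word)
        return out
    for child in children:
        extract_rules(child, out)      # children before parent: bottom-up
    out.append((label, tuple(child[0] for child in children)))
    return out


# Our reading of the Figure 3 tree after tag reduction (an assumption).
tree = ("S", [
    ("NP", [("JJ", "light"), ("JJ", "long-handled"), ("NN", "racket")]),
    ("VP-PASS", [
        ("VB", "used"),
        ("PP-by", [("IN", "by"), ("NP", [("NN", "badminton"), ("NN", "players")])]),
    ]),
])

for rule in extract_rules(tree, []):
    print(rule)
```

Visiting children first guarantees that, when a rule such as S -> NP VP-PASS fires, the head arguments of its NP and VP-PASS children have already been created.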
Figure 3. The parse tree for badminton racket, badminton racquet, battledore and the LFTs triggered at each level. The tree covers a light long-handled racket used by badminton players, with the rules S -> NP VP-PASS, NP -> JJ JJ NN, VP-PASS -> VB PP-by, PP-by -> IN NP and NP -> NN NN.
8. Conclusions
We presented a procedure to transform WordNet glosses into logic forms. Our procedure combines a set of high-precision rules with a set of high-recall heuristics. The notation used is first-order logic and contains syntactic information as positional arguments. Improvements in POS tagging and syntactic parsing are reported. An LF accuracy of 89.46% on 1000 WordNet glosses was obtained.
References
[1] E. Brill. A simple rule-based part of speech tagger. In Proceedings of the Third Conference on Applied Natural Language Processing, pages 152–155, 1992.
[2] M. Chodorow, R. Byrd, and G. Heidorn. Extracting semantic hierarchies from a large on-line dictionary. In Proceedings of the 23rd Annual Meeting of the Association for Computational Linguistics, pages 299–304, 1985.
[3] M. Collins. Three generative, lexicalised models for statistical parsing. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics, Madrid, Spain, 1997.
[4] D. Davidson. The logical form of action sentences. In N. Rescher, editor, The Logic of Decision and Action, pages 81–95. University of Pittsburgh Press, 1967.
[5] S. M. Harabagiu, G. A. Miller, and D. I. Moldovan. WordNet 2 - A Morphologically and Semantically Enhanced Resource. In Proceedings of SIGLEX-99, pages 1–8, University of Maryland, June 1999.
[6] J. Henderson and E. Brill. Bagging and Boosting a Treebank Parser. In Proceedings of NAACL 2000, Seattle, WA, 2000.
[7] J. R. Hobbs. Overview of the TACITUS project. Computational Linguistics, 12(3), 1986.
[8] ISI. http://www.isi.edu/natural-language/dpp/. 1998.
[9] M. Marcus, B. Santorini, and M. A. Marcinkiewicz. Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, 19(2):313–330, 1993.
[10] R. Mihalcea and D. I. Moldovan. eXtended WordNet: progress report. In NAACL 2001 Workshop on WordNet and Other Lexical Resources, Pittsburgh, PA, 2001.
[11] D. I. Moldovan and V. Rus. Logic Form transformation of WordNet and its Applicability to Question Answering. In Proceedings of ACL 2001, Toulouse, France, 6-11 July 2001. Association for Computational Linguistics. To appear.
[12] A. Ratnaparkhi. A maximum entropy part-of-speech tagger. In Proceedings of the Empirical Methods in Natural Language Processing Conference, University of Pennsylvania, May 17-18, 1996.
[13] S. D. Richardson, W. B. Dolan, and L. Vanderwende. MindNet: acquiring and structuring semantic information from text. In Proceedings of COLING’98, 1998.
[14] Y. Wilks, B. Slator, and L. Guthrie. Electric Words: Dictionaries, Computers and Meanings. The MIT Press, 1996.