
High precision logic form transformation


Vasile Rus

Department of Computer Science and Engineering

Southern Methodist University
Dallas, TX 75275-0122
vasile@engr.smu.edu

Abstract

This paper presents a few extensions to the logic form representation and a method for transforming WordNet glosses into logic forms using a set of high-precision rules combined with a set of high-recall heuristics. An almost 3% increase in POS tagging accuracy is achieved over state-of-the-art results at the expense of user intervention on only 7.52% of words. We apply a nearest neighbor solution to parser switching that leads to a 6.43% increase in exact sentence accuracy for glosses. Logic forms are derived with an accuracy of 89.46%.

Our work on parsing WordNet glosses resembles efforts to extract lexical information from machine readable dictionaries (MRD), such as LDOCE (Longman Dictionary of Contemporary English) or Webster's 2nd International Dictionary (W2). Different methods of parsing the definitions were used: pattern matching [2], specially constructed definition parsers [14], or broad-coverage parsers [13][8]. All those efforts were limited to extracting genus terms and unlabeled or labeled relations, or to building taxonomies [8]. We parse WordNet glosses to generate logic representations that enable reasoning mechanisms.

1. Introduction

It is well understood and agreed that world knowledge is necessary for many common sense reasoning problems. Consider question 471 from TREC-QA:

2. Logic Form Transformation for WordNet glosses

The glosses are first POS tagged using a voting scheme and syntactically parsed using an in-house implementation of Michael Collins's statistical parser [3].

The next step is to transform the glosses into a more abstract logical representation. The logic form is an intermediary step between the syntactic parse and the deep semantic form, which we do not address here.

The LF codification acknowledges syntax-based relationships such as: (1) syntactic subjects, (2) syntactic objects, (3) prepositional attachments, (4) complex nominals, and (5) adjectival/adverbial adjuncts.

Our approach is to derive the LF directly from the output of the syntactic parser. The parser resolves the structural and syntactic ambiguities. This way, we avoid the very hard problems of logic representation of natural language. We follow closely the successful representation used by Hobbs in TACITUS [7] and extended by Harabagiu, Miller and Moldovan in [5] with semantic and syntactic features. Hobbs explains that for many linguistic applications it is acceptable to relax ontological scruples, intricate syntactic explanations, and the desire for efficient deductions in favor of a simpler notation closer to English.

For the logic representation of WordNet glosses we ignore: plurals and sets, verb tenses, auxiliary verbs, quantifiers

The answer to this question is “Hitler committed suicide in 1945”. To justify that this is a plausible answer one needs extra knowledge. From the WordNet gloss of suicide:n#1 (first sense of suicide) we have killing yourself, and from the first sense of the verb kill:v#1 we have cause to die, which would provide a justification. However, to obtain a justification automatically, the initial question, the answer and the WordNet glosses need to be transformed into a computational representation, and automated inference procedures have to be defined. The computational representation that we chose to represent WordNet glosses is the logic form [11].

This paper presents a few extensions to the logic form representation described in [11]. The approach used is to first parse the glosses syntactically, after which logic form transformations are derived from grammar rules obtained from parse trees. Since the automatic Logic Form Transformation (LFT) for open text requires an exceedingly large number of rules, we developed a procedure to overcome this challenge.

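[Table 2. Examples of postmodifier predicates. Columns: Synset, Gloss, LF. Recoverable entries:
tower: (a structure taller than its diameter)
workstation: gloss fragments "desktop" and "conventionally considered to be more"; LF fragments: µcomputer:n(...), powerful(...)
achromatic lens: (a compound lens system that forms an image free from chromatic aberration); LF fragments: nn(...) & compound ... from(...) & aberration:n()
(a semiconductor device capable of amplification); LF fragments: device:n() & capable ...]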

and modal operators and negation. This decision is based on our desire to provide a manageable and consistent logic representation that otherwise would be infeasible.

3. LF Definitions

Predicates

A predicate is generated for every noun, verb, adjective or adverb encountered in any gloss. The name of the predicate is a concatenation of the morpheme's base form, the part-of-speech and the WordNet semantic sense (not included here), thus capturing the full lexical and semantic disambiguation. For example, the LF of the gloss of student, pupil, educatee, (a learner who is enrolled in an educational institution), will contain the predicates learner:n, enroll:v and educational

lens: (a compound lens system that forms an image free from chromatic aberration).

From a semantic point of view free is a modifier of image. Trouble comes when one tries to provide a representation for the preposition from. What phrase should stand as its prepositional head? One might say it should be the modifier free. But free does not have an argument of its own since, being

[Table 3. Examples of relative adverb predicates. Columns: Synset, Gloss, LF. Recoverable entries:
(a large building at an airport where aircraft can be stored and maintained)
arc light: (produces light when electric current ... electrodes); LF fragments: when(,) & electric ...]

[Figure 1. LF Transformer: architecture overview. Pipeline: Preprocessing (definition extraction, tokenization and cleaning, definition expansion) -> POS tagging -> Parsing -> Logic Form Transformer.]

a modifier, it borrows the argument from its modifiee image. We could consider image as the prepositional head of from, but this would be wrong, as there is no such relation as prepositional head - prepositional object between image and chromatic

from and treat it as a relational predicate, somehow similar to a preposition, we may represent the above example like this: image:n() & free

aberration:n(). This solution maintains the simplicity and consistency of the notation and is the closest one to the semantic interpretation.

Possessive Pronouns

Possessive pronouns introduce a relationship between the head they stand by and the referent of the pronoun. In the gloss of tower the pronoun its introduces a possession relation between structure and diameter. To remedy the lack of specificity of the original notation regarding this case we propose to represent the relation using the predicate pos. Thus, for the previous example, we obtain structure:n() & pos(,) & diameter:n(). All possessive pronouns are represented using this predicate.

an airport where aircraft can be stored and maintained).

The relation between the two sentences is embedded in the adverbial phrase where, and thus the corresponding logic form representation would be large:a() & building:n() & where(,) & aircraft:n() & and(,,) & store:v(,,) & maintain:v(,,). This logic representation is the same as the logic form representation of the imaginary sentence aircraft can be stored and maintained in a large building at an airport. One needs only the extra knowledge of equating in with where to obtain the same representation for the exemplified gloss and for the imaginary sentence. This similarity illustrates the abstraction power of logic forms: structurally different natural language representations are mapped into the same logic form representation.

4. Logic Form Transformer

Our approach for deriving the LF of glosses relies on structural relations provided by the syntactic parser. Several preprocessing steps are necessary before parsing.

The system comprises the following modules (see Figure 1): preprocessing, POS tagging, parsing, rule selection and logic form transformation (LFT).

Relative Adverbs

Relative adverbs of the kind where, when, how, why, when introducing a relative clause, should be represented as a predicate with two arguments referring to the argument of the relative clause or phrase and to the argument of the main clause or phrase, respectively.

We illustrate such a case using the definition of airdock, which is (a large building at

4.1. Preprocessing

The preprocessing extracts definitions from glosses, discards comments from definitions, and applies several text manipulations such as: replaces “as in” with “in”, “also as” with “as”, “only when” with “when” (since they only complicate the parsing process and LFTs), eliminates words such as usually, especially

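[Table (caption not recoverable). Labels: Votes, Original, 2, 4, Recall, Exact Sentence. Surviving values: 87.73, 94.39, 96.70; Recall: 91.29, 91.65, 94.90, 94.18, 90.96, 91.26; Exact Sentence: 65.67, 67.45, 63.85.]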

off. Those manipulations form the so-called cleaning phase.
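The cleaning phase can be sketched as a handful of string rewrites. The following Python fragment is only an illustration under stated assumptions: the function name `clean_definition` and the two lists are hypothetical, and the list of eliminated words is cut short in the text above, so `DROPPED_WORDS` holds only the two words that survive there.

```python
import re

# Phrase rewrites named in the text: (pattern, replacement).
REPLACEMENTS = [
    (r"\bonly when\b", "when"),
    (r"\balso as\b", "as"),
    (r"\bas in\b", "in"),
]

# Words to eliminate. Assumption: only the two words visible in the text
# are listed; the original list is longer.
DROPPED_WORDS = {"usually", "especially"}


def clean_definition(definition: str) -> str:
    """Apply the phrase rewrites, then drop the eliminated words."""
    text = definition
    for pattern, replacement in REPLACEMENTS:
        text = re.sub(pattern, replacement, text)
    # A dropped word takes an attached comma with it.
    kept = [tok for tok in text.split()
            if tok.strip(",").lower() not in DROPPED_WORDS]
    return " ".join(kept)


print(clean_definition("a tool usually used only when cutting"))
```

Under these assumptions the example prints `a tool used when cutting`.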

4.2. POS Tagging

The POS tagging module tags glosses using Brill's rule-based tagger [1], the MXPOST statistical tagger [12], and WordNet syntactic category information. We use a voting scheme that is based on the output of the two taggers. Voting to improve POS tagging accuracy was also used by [10]. If the two taggers agree, we accept that tag as correct. For words that have different tags, one more attempt is made to decide the tag automatically: if Brill's tag and the word's WordNet syntactic category are similar, then we pick this tag. If this fails, then the tag is selected manually.
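A minimal sketch of this voting scheme follows. Two points are assumptions rather than details from the paper: the mapping from Penn Treebank tags to WordNet categories (`coarse_category`), and the reading of "similar" as "the word has exactly one WordNet syntactic category and it matches Brill's tag". The second assumption is chosen because it agrees with the worked example in Section 7, where badminton is resolved automatically and light is left to the user.

```python
from typing import Optional, Set


def coarse_category(penn_tag: str) -> Optional[str]:
    """Map a Penn Treebank tag to a WordNet category (assumed mapping)."""
    for prefix, category in (("NN", "n"), ("VB", "v"), ("JJ", "a"), ("RB", "r")):
        if penn_tag.startswith(prefix):
            return category
    return None


def vote(brill_tag: str, mxpost_tag: str,
         wordnet_categories: Set[str]) -> Optional[str]:
    """Return the agreed tag, or None when the user has to decide."""
    if brill_tag == mxpost_tag:
        return brill_tag
    # Assumption: WordNet breaks the tie only for words with one category.
    if wordnet_categories == {coarse_category(brill_tag)}:
        return brill_tag
    return None


# The MXPOST tags below are hypothetical inputs for illustration.
print(vote("NN", "NN", {"n"}))                 # taggers agree
print(vote("NN", "JJ", {"n"}))                 # WordNet confirms Brill's tag
print(vote("NN", "JJ", {"n", "v", "a", "r"}))  # left for manual tagging
```

The three calls print `NN`, `NN` and `None`.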

Using this approach, we successfully tagged 91.57% of the words in 1,000 glosses from the noun.artifact hierarchy with an accuracy of 98.50%. WordNet coarse syntactic categories add 0.91% agreement with almost 100% accuracy. The user needs to check 7.52% of the tags and, assuming she does a perfect job, an overall accuracy of 98.93% is achieved. Compared with the measured accuracy of 96% for the employed taggers on glosses, we have an improvement in accuracy of almost 3%.

4.3. Syntactic Parsing

The parsing module is an in-house implementation of a parser which follows the statistical parsing principles described in [3]. The parser is based on the probabilities between head-words in parse trees. Throughout this section we use these performance measures for parsing: P, precision, which is the number of correct constituents retrieved divided by the number of all constituents retrieved; R, recall, or the number of correct constituents retrieved divided by the number of all correct constituents; F-measure, which is
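The measures can be illustrated with a short Python sketch. Constituents are represented here as (label, start, end) triples, which is an assumed encoding. The definition of the F-measure is cut off in the text above; the sketch takes it to be the harmonic mean 2PR/(P+R), which is an assumption, although it reproduces the reported F-measure of 91.87 from precision 92.47 and recall 91.29 to within rounding.

```python
from typing import Set, Tuple

Constituent = Tuple[str, int, int]  # (label, start, end): assumed encoding


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def parse_scores(gold: Set[Constituent],
                 predicted: Set[Constituent]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) for one parse."""
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall, f_measure(precision, recall)


gold = {("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)}
predicted = gold | {("NP", 3, 5)}  # one spurious constituent
p, r, f = parse_scores(gold, predicted)
print(p, r)
```

In the example four of the five predicted constituents are correct and all four gold constituents are found, so the script prints `0.8 1.0`.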

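[Table (caption not recoverable). Labels: Case, Original, Upper Limit, Random, Best PR, Precision, Recall, F-measure, Exact Sentence. Surviving values: 89.59, 92.87; Recall: 91.29, 88.53, 95.49, 92.17; Exact Sentence: 65.67, 79.03; Precision: 92.47, 93.46; 91.29, 91.33; F-measure: 91.87, 63.94, 93.65, 63.42.]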

(LFT) to generate its corresponding logic form. Table 10 illustrates two types of rules that we use: intra-phrase and inter-phrase rules. In order to evaluate the amount of work to be done we estimate how many LFTs are needed. Table 7 shows the number of distinct grammar rules extracted from parse trees of the glosses in the entire WordNet.

[Table 7. Size of grammar for WordNet glosses per part of speech. Columns: noun, verb, adjectives, adverbs. Surviving values: 5,392 and 9,826.]

The total number of nearly 10,000 rules (see Table 7) is by far too large to possibly implement LFTs for all of them (the best case would be when there is total overlap among the four parts of speech and each grammar rule would map into a single LFT). To overcome this problem we developed a procedure in two steps:

(1) apply transformations at the POS level and on parse trees to reduce the number of candidate grammar rules

(2) select the most frequent rules and design high-precision LF transformation rules for them.

Before we detail each step we offer a rationale for our scheme. In the case of WordNet glosses, although the total number of grammar rules is large, a small number of rules cover a large percentage of all occurrences. This might be explained by the relative structural uniformity of glosses: genus and differentia. This relative uniformity is more specific to noun and verb glosses. Table 8 shows the distribution of the most common phrases (identified by their nonterminal: S, NP, VP, etc.) for 10,000 randomly selected noun glosses, the number of unique grammar rules having that nonterminal as their left hand side, and the percentage of occurrences covered by the top ten most frequent rules out of all occurrences of rules with the same nonterminal on the left hand side. From the table we observe that the top ten most frequent rules cover more than 90% of all occurrences for most phrases.

5.1. Transformation at POS level and parse trees

Base NPs and VPs have a coverage of around 70%, which calls for improvements. Our solution to boost the coverage above 90% consists of performing a set of tag and parse tree

Phrase    Unique rules
baseNP    857
NP        244
VP        450
PP        40
S         35

[Table 9. Examples of rules having prenominal modifiers belonging to different syntactic categories; NPs and plurals, determiners and proper noun tags are treated similarly. Recoverable rule fragments: NP DT JJ NN NNS NNP NNPS; NP DT VBN NN NNS NNP NNPS; NP JJ NN.]

transformations to reduce the number of candidate rules. Two basic techniques are used: (1) tag reduction and (2) transformations of parse trees.

Tag reduction is allowed due to simplifications in notation: (1) determiners are eliminated, (2) plurals are ignored and thus we can replace NNS with NN, (3) proper nouns are treated identically to common nouns and in consequence NNP is changed into NN, and (4) everything in a prenominal position plays the function of a modifier. Examples of rule reduction due to tag reduction are illustrated in Table 9. For verbs we ignore tenses: VBG, VBP, VBZ, VBN, VB are all mapped into VB. Keeping the passive information is important for syntactic role detection and thus we add a new tag VP-PASS to indicate that the head of the VP is passive. Modals and auxiliaries are eliminated and negations are ignored.
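The tag-level part of this reduction can be sketched in a few lines of Python. The function name `reduce_tags` and the input format (a list of word/tag pairs) are assumptions made for illustration. The sketch covers only the mappings stated above, plus dropping the Penn Treebank modal tag MD; auxiliary removal, negation handling, the prenominal-modifier rule and the phrase-level VP-PASS tag are not modeled.

```python
from typing import List, Tuple

# Verb tags listed in the text, all collapsed to VB.
VERB_TAGS = {"VBG", "VBP", "VBZ", "VBN", "VB"}
# Plural and proper nouns are treated as common singular nouns.
NOUN_MAP = {"NNS": "NN", "NNP": "NN"}
# Determiners (DT) and modals (MD) are eliminated.
DROPPED_TAGS = {"DT", "MD"}


def reduce_tags(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Apply the tag reductions to a POS-tagged definition."""
    reduced = []
    for word, tag in tagged:
        if tag in DROPPED_TAGS:
            continue
        if tag in VERB_TAGS:
            tag = "VB"
        else:
            tag = NOUN_MAP.get(tag, tag)
        reduced.append((word, tag))
    return reduced


gloss = [("a", "DT"), ("light", "JJ"), ("long-handled", "JJ"),
         ("racket", "NN"), ("used", "VBN"), ("by", "IN"),
         ("badminton", "NN"), ("players", "NNS")]
print(" ".join(tag for _, tag in reduce_tags(gloss)))
```

For the badminton racket definition the reduced tag sequence is `JJ JJ NN VB IN NN NN`.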

The second technique consists of rearranging the parse trees so that more complex structures are reduced to simpler ones: more complex base NPs are rearranged into simpler ones (see Figure 2).

[Figure 2. Transforming a coordinated NP into a non-base NP and two primitive base NPs. Example phrase: a ruler or institution. Node labels in the two trees: NP, DT, NN, CC.]

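[Table 10. Examples of LFTs. Columns: Transformation Rule (LFT), Synset. Recoverable fragments: NP, DT NN, with example (NP (a/DT monastery/NN)) from abbey:n#3; NP, DT JJ NN, with example (NP (a/DT short/JJ sleep/NN)); PP, IN NP; NP, NP VP, from abbey:n#3; verb/VP-PASS by/PP, with LF fragment verb(e,x, ... ,x) & noun(x) and example (VP (ruled/VBN by/PP)) from abbey:n#3.]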

As a consequence of those transformations, the coverage of the top ten most frequent rules for base NPs and for verb phrases jumps over 90%.

6. Results for LF Transformation

To validate our method we tested it on the previously described set of 1,140 definitions from the noun.artifact hierarchy. The logic forms for those definitions were manually obtained as our reference data.

The initial set of LFTs is formed by taking the most frequent rules for each grammar phrase detected in a corpus of 10,000 noun glosses randomly selected from the noun data file of WordNet 1.6. We tag and parse them. From the parse trees obtained we extracted all grammar rules and their number of occurrences and sorted them according to their frequency. Then, we selected the top ten most frequent rules, or up to the point where the rule's frequency is less than 1% of the occurrences of all grammar rules with the same left hand side tag. A set of about 70 most frequently used rules was obtained and their corresponding LFTs have been implemented.

We applied the LFTs to the test set of 1000 glosses and compared the output to the manually built logic forms. The measure that we use is exact LF accuracy: the number of correctly generated glosses in logic form over the number of all glosses attempted. An alternative measure would be LF predicate accuracy: the number of correctly generated predicates over the number of all predicates. A predicate is correctly generated if all its arguments are correctly assigned. This secondary measure is not suitable when one wants to further consider the glosses as axioms, as it does not tell the number of correctly generated axioms (see [11]).
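The difference between the two measures can be made concrete with a small sketch. It assumes that each logic form is given as a list of predicate strings with canonically named variables, so that two predicates are equal exactly when the name and all arguments match, and that "all predicates" means the predicates of the reference logic forms. The predicate strings in the example are illustrative and are not taken from the paper's data.

```python
from typing import List, Sequence


def exact_lf_accuracy(generated: Sequence[List[str]],
                      reference: Sequence[List[str]]) -> float:
    """Share of glosses whose whole logic form matches the reference."""
    correct = sum(1 for g, r in zip(generated, reference) if set(g) == set(r))
    return correct / len(reference)


def predicate_accuracy(generated: Sequence[List[str]],
                       reference: Sequence[List[str]]) -> float:
    """Share of reference predicates generated with all arguments correct."""
    correct = sum(len(set(g) & set(r)) for g, r in zip(generated, reference))
    total = sum(len(set(r)) for r in reference)
    return correct / total


# Hypothetical logic forms with made-up argument names.
reference = [["racket:n(x1)", "light:a(x1)"], ["abbey:n(x1)"]]
generated = [["racket:n(x1)", "light:a(x2)"], ["abbey:n(x1)"]]

print(exact_lf_accuracy(generated, reference))
print(round(predicate_accuracy(generated, reference), 3))
```

One wrong argument in the first gloss costs the whole gloss under the exact measure (0.5) but only one predicate out of three under the predicate measure (0.667), which is why the exact measure is the stricter choice when glosses are to be used as axioms.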

For our test data, which are correctly parsed, an exact LF accuracy of 83.59% has been obtained using LFTs derived from the selected rules. The heuristics boost the LF accuracy to 89.46%. We repeated the same experiments using the output from nearest neighbor parser switching. The accuracy dropped to 61.53% without heuristics and to 66.31% with them. In order to decrease the gap between the two measures we plan to work more on improving parser accuracy. We plan to use a parser combination technique to detect possible errors and ask for the user's intervention when necessary.

5.2. Rule selection

In this step we derive LFTs for the top ten most frequent grammar rules for each grammar phrase. Rules that have a coverage of less than 1% are not picked even if they are in the top ten. The selected rules might be mapped into one or more logic form transformations, as illustrated in Table 10.
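The selection criterion can be sketched as follows. The function name `select_rules`, the rule encoding (a left hand side label paired with a tuple of child tags) and the counts in the example are hypothetical; only the two thresholds, top ten and 1% coverage, come from the text.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Rule = Tuple[str, Tuple[str, ...]]  # (left hand side, children tags)


def select_rules(rule_counts: Counter, top_n: int = 10,
                 min_share: float = 0.01) -> Dict[str, List[Tuple[str, ...]]]:
    """Keep, per left hand side, the top_n rules covering at least min_share."""
    by_lhs = defaultdict(list)
    for (lhs, rhs), count in rule_counts.items():
        by_lhs[lhs].append((count, rhs))

    selected = {}
    for lhs, entries in by_lhs.items():
        total = sum(count for count, _ in entries)
        # Most frequent first; ties broken by the right hand side itself.
        entries.sort(key=lambda entry: (-entry[0], entry[1]))
        selected[lhs] = [rhs for count, rhs in entries[:top_n]
                         if count / total >= min_share]
    return selected


# Made-up occurrence counts for illustration.
counts = Counter({
    ("NP", ("DT", "NN")): 600,
    ("NP", ("JJ", "NN")): 395,
    ("NP", ("NN", "CC", "NN")): 5,   # 0.5% of NP occurrences: dropped
    ("PP", ("IN", "NP")): 50,
})
print(select_rules(counts)["NP"])
```

With these counts the NP entry keeps `('DT', 'NN')` and `('JJ', 'NN')` and drops the coordinated rule, which falls below the 1% threshold.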

5.3. Heuristics

The advantage of the proposed approach is that the implemented rules are highly accurate, and whenever an argument is assigned one can tell with high precision that it is the right one. A missing argument indicates that the case was not covered by the implemented rules. In other words, this approach prefers high-precision rules over high-recall heuristics. For uncovered cases we have two choices: either the user may manually intervene and fill the argument, or a set of heuristics can be designed to do the job. We opted to design a set of heuristics to solve uncovered cases. The heuristics ensure that every argument slot will be filled, but they cannot ensure that it will be the right one all the time.

A heuristic was designed for each type of argument:
subject - previous phrase head argument or, if the verb is in passive, the prepositional object argument of the following by preposition;
direct object - first following phrase head argument (or second for ditransitives) or surface subject if the verb is in passive;
indirect object - second following noun phrase head argument (or first) for ditransitives;
prepositional head - previous phrase head argument;
prepositional argument - following phrase head argument;
adjective/adverbs - following noun/verb phrase head argument or previous noun/verb phrase argument if there is none following;
default - generate a new argument that does not exist.

7. An example

To illustrate step by step the derivation process we consider the gloss of badminton racket, badminton racquet, battledore:

(a light long-handled racket used by badminton players).

The preprocessing phase outputs the definition a light long-handled racket used by badminton players and tokenizes it. There are no comments to be dropped or any cleaning necessary.

Then Brill's tagger and MXPOST are run, and the outputs are:

Brill's: a/DT light/NN long-handled/JJ racket/NN used/VBN by/IN badminton/NN players/NNS

MXPOST: a JJ long-handled NN used IN badminton NNS

The two taggers disagree on light and on badminton. The latter is automatically corrected using WordNet lexical information, while the former is resolved by the user's intervention.

The definition is expanded into: NN Badminton-racket VBZ is DT a JJ light JJ long-handled NN racket VBN used IN by NN badminton NNS players . . and parsed. The parse tree is processed in a bottom-up fashion: from the leaves up to the top. At each level the grammar rule is extracted (parent nonterminal - list of children tags) and then the corresponding LFT or heuristic is triggered. When the top is reached the logic form is printed: light() & long-handled() & racket() & use() & nn() & badminton() & player().
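The bottom-up rule extraction can be sketched with a plain recursive traversal. The tree encoding (nested tuples, with (tag, word) pairs at the leaves) and the function name `extract_rules` are assumptions, and the tree below is one reading of the node labels that survive for Figure 3, after tag reduction; it is not a verified copy of the figure. In a full system each extracted rule would be looked up among the implemented LFTs, falling back to a heuristic when none matches.

```python
from typing import List, Tuple

Rule = Tuple[str, Tuple[str, ...]]


def extract_rules(tree) -> List[Rule]:
    """Return (parent, children labels) rules in bottom-up (post-order) order."""
    label, *children = tree
    # A preterminal has the form (tag, word) and yields no rule of its own.
    if len(children) == 1 and isinstance(children[0], str):
        return []
    rules: List[Rule] = []
    for child in children:
        rules.extend(extract_rules(child))
    rules.append((label, tuple(child[0] for child in children)))
    return rules


# Assumed parse tree for the badminton racket definition.
tree = ("S",
        ("NP", ("JJ", "light"), ("JJ", "long-handled"), ("NN", "racket")),
        ("VP-PASS",
         ("VB", "used"),
         ("PP-by",
          ("IN", "by"),
          ("NP", ("NN", "badminton"), ("NN", "players")))))

for parent, kids in extract_rules(tree):
    print(parent, "->", " ".join(kids))
```

The rules come out with every child phrase before its parent: the two NP rules first, then PP-by, VP-PASS and finally S, which matches the leaves-to-top processing order described above.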

[Figure 3. The parse tree for badminton racket, badminton racquet, battledore and the triggered LFTs at each level. Recoverable node labels: S, NP, VP-PASS (VP), PP-by (PP); rule fragments: S NP VP-PASS; NP JJ JJ NN; VP VB PP-by; PP IN NP; NP NN NN. Leaves: a/DT light/JJ long-handled/JJ racket/NN used/VBN by/IN badminton/NN players/NNS.]

8. Conclusions

We presented here a procedure to transform WordNet glosses into logic forms. Our procedure combines a set of high-precision rules with a set of high-recall heuristics. The notation used is first order logic and contains syntactic information as positional arguments. Improvements in POS tagging and syntactic parsing are reported. An LF accuracy of 89.46% on 1000 WordNet glosses was obtained.

References

[1] E. Brill. A simple rule-based part of speech tagger. In Proceedings of the Third Conference on Applied Natural Language Processing, pages 152–155, 1992.

[2] M. Chodorow, R. Byrd, and G. Heidorn. Extracting semantic hierarchies from a large on-line dictionary. In Proceedings of the 23rd Annual Meeting of the Association for Computational Linguistics, pages 299–304, 1985.

[3] M. Collins. Three generative, lexicalised models for statistical parsing. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics, Madrid, Spain, 1997.

[4] D. Davidson. The logical form of action sentences. In N. Rescher, editor, The Logic of Decision and Action, pages 81–95. University of Pittsburgh Press, 1967.

[5] S. M. Harabagiu, G. A. Miller, and D. I. Moldovan. WordNet 2 - a Morphologically and Semantically Enhanced Resource. In Proceedings of SIGLEX-99, pages 1–8, University of Maryland, June 1999.

[6] J. Henderson and E. Brill. Bagging and Boosting a Treebank Parser. In Proceedings of NAACL 2000, Seattle, WA, 2000.

[7] J. R. Hobbs. Overview of the TACITUS project. Computational Linguistics, 12(3), 1986.

[8] ISI. http://www.isi.edu/natural-language/dpp/. 1998.

[9] M. Marcus, B. Santorini, and M. A. Marcinkiewicz. Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, 19(2):313–330, 1993.

[10] R. Mihalcea and D. I. Moldovan. eXtended Wordnet: progress report. In NAACL 2001 - Workshop on WordNet and Other Lexical Resources, Pittsburgh, PA, 2001.

[11] D. I. Moldovan and V. Rus. Logic Form transformation of WordNet and its Applicability to Question Answering. In Proceedings of ACL 2001, Toulouse, France, 6-11 July 2001. Association for Computational Linguistics. To appear.

[12] A. Ratnaparkhi. A maximum entropy part-of-speech tagger. In Proceedings of the Empirical Methods in Natural Language Processing Conference, University of Pennsylvania, May 17-18 1996.

[13] S. D. Richardson, W. B. Dolan, and L. Vanderwende. MindNet: acquiring and structuring semantic information from text. In Proceedings of COLING '98, 1998.

[14] Y. Wilks, B. Slator, and L. Guthrie. Electric Words - Dictionaries, Computers and Meanings. The MIT Press, 1996.
