“Stop Words”: The Words That Google, Yahoo, MSN, and Other International Search Engines Ignore
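Korean search engines present an integrated results page, so plain web-document results carry almost no weight there. On international search engines, by contrast, web-document results are the standard form of search.
Google (http://www.google.com), Yahoo (http://www.yahoo.com), and most other search engines ignore extremely common words. They crawl well over several million pages a day and store a vast amount of information, so they do this both to save disk space and to process that volume of data faster.
The words that search engines exclude in this way are called “stop words.”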
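For example, take the sentence: The way to the school is long and hard when walking in the rain.
The search engine would recognize and store it as: * way to * school is long and hard when walking in * rain. (Only "the" is masked here to keep the example simple; "way", "to", "is", "and", "when", and "in" also appear in the list below.)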
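To make the masking above concrete, here is a minimal Python sketch. It is only an illustration of the idea, not how any real search engine is implemented; the `mask_stop_words` helper and the one-word `STOP_WORDS` set are assumptions made for this example.

```python
import re

# Assumption: a tiny stop-word set that reproduces the example sentence.
# The full list in this post is much longer.
STOP_WORDS = {"the"}


def mask_stop_words(sentence, stop_words=STOP_WORDS):
    """Replace every stop word with '*', leaving other words and punctuation alone."""
    def replace(match):
        word = match.group(0)
        return "*" if word.lower() in stop_words else word

    # Match runs of letters and apostrophes so "don't" stays one token.
    return re.sub(r"[A-Za-z'’]+", replace, sentence)


print(mask_stop_words("The way to the school is long and hard when walking in the rain."))
# * way to * school is long and hard when walking in * rain.
```

Passing a larger set as `stop_words` masks more of the sentence, which is closer to what the full list would do.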
Stop words of this kind make up roughly 70% of the text on most pages, which means the search engine ends up crawling and indexing only about 30% of the content.
So check how high the proportion of stop words is in the content of your English-language website.
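One rough way to run that check is to count how many words of a text are stop words. The sketch below assumes a small subset of the list and a hypothetical `stop_word_ratio` helper; paste in the full list from this post for a real measurement.

```python
import re

# Assumption: a small subset of the stop-word list, for illustration only.
STOP_WORDS = {"the", "way", "to", "is", "and", "when", "in"}


def stop_word_ratio(text, stop_words=STOP_WORDS):
    """Return the share of words in `text` that are stop words (0.0 for empty text)."""
    words = re.findall(r"[a-z'’]+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in stop_words)
    return hits / len(words)


sentence = "The way to the school is long and hard when walking in the rain."
print(f"{stop_word_ratio(sentence):.0%}")
# 64%  (9 of the 14 words are in the subset above)
```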
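A page with a high proportion of stop words leads the search engine to judge it as not particularly important. Keep stop words to a minimum: review the title, then the meta tags, then the body content, in that order, and reduce the share of stop words in each. Search engines prefer words and phrases that others do not use over the ones everyone commonly uses.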
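The title, meta tag, and body review described above can be sketched with Python's standard `html.parser` module. This is a simplified illustration under stated assumptions: `PageSections`, `section_ratios`, and the sample HTML are hypothetical names and data made up for this example, the stop-word set is a small subset, and text inside script or style tags in the body is not filtered out.

```python
import re
from html.parser import HTMLParser

# Assumption: a small subset of the stop-word list, for illustration only.
STOP_WORDS = {"the", "way", "to", "is", "and", "when", "in"}


def stop_word_ratio(text, stop_words=STOP_WORDS):
    """Return the share of words in `text` that are stop words (0.0 for empty text)."""
    words = re.findall(r"[a-z'’]+", text.lower())
    if not words:
        return 0.0
    return sum(1 for word in words if word in stop_words) / len(words)


class PageSections(HTMLParser):
    """Collect text from <title>, meta description/keywords, and <body>."""

    def __init__(self):
        super().__init__()
        self.sections = {"title": [], "meta": [], "body": []}
        self._current = None  # "title", "body", or None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            if (attrs.get("name") or "").lower() in ("description", "keywords"):
                self.sections["meta"].append(attrs.get("content") or "")
        elif tag in ("title", "body"):
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current:
            self.sections[self._current].append(data)


def section_ratios(html):
    """Return the stop-word ratio for title, meta, and body, in that order."""
    parser = PageSections()
    parser.feed(html)
    parser.close()
    return {name: stop_word_ratio(" ".join(parts))
            for name, parts in parser.sections.items()}


sample = (
    "<html><head><title>The Way to the School</title>"
    '<meta name="description" content="Walking in the rain">'
    "</head><body><p>The road is long and hard.</p></body></html>"
)
for name, ratio in section_ratios(sample).items():
    print(f"{name}: {ratio:.0%}")
```

For the sample page this prints 80% for the title and 50% each for the meta description and the body, so the title would be the first place to trim.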
The English words below are the ones that search engines ignore.
Stop Words List
a able about above abroad according accordingly across actually adj after afterwards again against ago ahead ain’t all allow allows almost alone along alongside already also although always am amid amidst among amongst an and another any anybody anyhow anyone anything anyway anyways anywhere apart appear appreciate appropriate are aren’t around as a’s aside ask asking associated at available away awfully b back backward backwards be became because become becomes becoming been before beforehand begin behind being believe below beside besides best better between beyond both brief but by c came can cannot cant can’t caption cause causes certain certainly changes clearly c’mon co co. com come comes concerning consequently consider considering contain containing contains corresponding could couldn’t course c’s currently d dare daren’t definitely described despite did didn’t different directly do does doesn’t doing done don’t down downwards during e each edu eg eight eighty either else elsewhere end ending enough entirely especially et etc even ever evermore every everybody everyone everything everywhere ex exactly example except f fairly far farther few fewer fifth first five followed following follows for forever former formerly forth forward found four from further furthermore g get gets getting given gives go goes going gone got gotten greetings h had hadn’t half happens hardly has hasn’t have haven’t having he he’d he’ll hello help hence her here hereafter hereby herein here’s hereupon hers herself he’s hi him himself his hither hopefully how howbeit however hundred i i’d ie if ignored i’ll i’m immediate in inasmuch inc inc.
indeed indicate indicated indicates inner inside insofar instead into inward is isn’t it it’d it’ll its it’s itself i’ve j just k keep keeps kept know known knows l last lately later latter latterly least less lest let let’s like liked likely likewise little look looking looks low lower ltd m made mainly make makes many may maybe mayn’t me mean meantime meanwhile merely might mightn’t mine minus miss more moreover most mostly mr mrs much must mustn’t my myself n name namely nd near nearly necessary need needn’t needs neither never neverf neverless nevertheless new next nine ninety no nobody non none nonetheless noone no-one nor normally not nothing notwithstanding novel now nowhere o obviously of off often oh ok okay old on once one ones one’s only onto opposite or other others otherwise ought oughtn’t our ours ourselves out outside over overall own p particular particularly past per perhaps placed please plus possible presumably probably provided provides q que quite qv r rather rd re really reasonably recent recently regarding regardless regards relatively respectively right round s said same saw say saying says second secondly see seeing seem seemed seeming seems seen self selves sensible sent serious seriously seven several shall shan’t she she’d she’ll she’s should shouldn’t since six so some somebody someday somehow someone something sometime sometimes somewhat somewhere soon sorry specified specify specifying still sub such sup sure t take taken taking tell tends th than thank thanks thanx that that’ll thats that’s that’ve the their theirs them themselves then thence there thereafter thereby there’d therefore therein there’ll there’re theres there’s thereupon there’ve these they they’d they’ll they’re they’ve thing things think third thirty this thorough thoroughly those though three through throughout thru thus till to together too took toward towards tried tries truly try trying t’s twice two u un under underneath undoing unfortunately unless unlike
unlikely until unto up upon upwards us use used useful uses using usually v value various versus very via viz vs w want wants was wasn’t way we we’d welcome well we’ll went were we’re weren’t we’ve what whatever what’ll what’s what’ve when whence whenever where whereafter whereas whereby wherein where’s whereupon wherever whether which whichever while whilst whither who who’d whoever whole who’ll whom whomever who’s whose why will willing wish with within without wonder won’t would wouldn’t x y yes yet you you’d you’ll your you’re yours yourself yourselves you’ve z zero
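2 thoughts on “Search Engine Optimization: Words Ignored by Google, Yahoo, and Other Search Engines”
Hello, I run a blog as a work-from-home side job ^^ I once spent three hours writing a post I had put real care into (it was promotional, admittedly;), and when I ran it through the search engines I found it buried far down in the results. Since then I have been very interested in ranking higher in search, which is how I ended up on your blog. There is a lot of genuinely useful information here, and I think it will help me a great deal. I usually just lurk on sites, but blog marketing has had such a big impact on my life that I wanted to leave a note this time. Thank you for the great information, and I wish this blog continued growth ^^
I am glad my blog seems to have been a good influence. I hope your work goes from strength to strength, and I will stop by your blog as well.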