Thursday, May 26, 2022

Web Scraper: Amazon data scraping examples + common scraping failures

First, make sure the Web Scraper extension is installed in your browser.

Next, straight to the example code.

01
Scraping product Q&A
There are two approaches: scrape every page, or scrape only specified pages.

a. Have Web Scraper click through the pagination and scrape every page. The drawback is that some competing products have hundreds of Q&A pages, and there is no need to scrape that many.
Code:
{"_id":"amz-qa","startUrl":["https://www.amazon.com/ask/questions/asin/B09FT58QQP"],"selectors":[{"id":"contents","parentSelectors":["_root","nextpage"],"type":"SelectorElement","selector":".a-section > div.a-spacing-base > div > div.a-col-right","multiple":true,"delay":null},{"id":"question","parentSelectors":["contents"],"type":"SelectorText","selector":".a-spacing-small div.a-col-right","multiple":false,"delay":0,"regex":""},{"id":"answer","parentSelectors":["contents"],"type":"SelectorText","selector":".a-col-right > span:nth-of-type(1)","multiple":false,"delay":0,"regex":""},{"id":"buyer","parentSelectors":["contents"],"type":"SelectorText","selector":"span.a-profile-name","multiple":false,"delay":0,"regex":""},{"id":"date","parentSelectors":["contents"],"type":"SelectorText","selector":"span.a-color-tertiary","multiple":false,"delay":0,"regex":""},{"id":"nextpage","parentSelectors":["_root","nextpage"],"type":"SelectorLink","selector":".a-last a","multiple":true,"delay":0}]}
Import the code as shown in the figure.
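
To see why this sitemap keeps paging, it can help to list which selectors run under which parent. Below is a minimal Python sketch using a trimmed copy of the sitemap above, with only `id` and `parentSelectors` kept. The reading that listing `nextpage` as its own parent is what makes the "next page" link get followed again on every new page is an assumption about how Web Scraper treats link selectors.

```python
import json
from collections import defaultdict

# Trimmed copy of the "amz-qa" sitemap: only ids and parents are kept.
SITEMAP = (
    '{"_id":"amz-qa","startUrl":["https://www.amazon.com/ask/questions/asin/B09FT58QQP"],'
    '"selectors":['
    '{"id":"contents","parentSelectors":["_root","nextpage"]},'
    '{"id":"question","parentSelectors":["contents"]},'
    '{"id":"answer","parentSelectors":["contents"]},'
    '{"id":"buyer","parentSelectors":["contents"]},'
    '{"id":"date","parentSelectors":["contents"]},'
    '{"id":"nextpage","parentSelectors":["_root","nextpage"]}]}'
)

# Map each parent to the selectors that run underneath it.
children = defaultdict(list)
for sel in json.loads(SITEMAP)["selectors"]:
    for parent in sel["parentSelectors"]:
        children[parent].append(sel["id"])

for parent, ids in children.items():
    print(parent, "->", ", ".join(ids))
# _root -> contents, nextpage
# nextpage -> contents, nextpage
# contents -> question, answer, buyer, date
```

The `nextpage -> contents, nextpage` line is the part that differs from the fixed-page variant, where `contents` hangs off `_root` only.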

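To scrape a different competing product, change the ASIN. Edit the start URL and replace the ASIN at the end with the other product's ASIN.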
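
The ASIN swap can also be done on the sitemap text before importing it. Below is a minimal Python sketch, assuming the start URL keeps the `/ask/questions/asin/<ASIN>` form used here and that the ASIN is 10 uppercase letters or digits. The function name `with_asin` and the replacement ASIN `B000000000` are hypothetical placeholders.

```python
import json
import re

# Trimmed copy of the sitemap; selectors omitted for brevity.
SITEMAP = (
    '{"_id":"amz-qa","startUrl":'
    '["https://www.amazon.com/ask/questions/asin/B09FT58QQP"],"selectors":[]}'
)

def with_asin(sitemap_json: str, new_asin: str) -> str:
    """Return the sitemap JSON with the ASIN in every start URL replaced."""
    sitemap = json.loads(sitemap_json)
    sitemap["startUrl"] = [
        # Replace only the 10-character segment that follows "/asin/".
        re.sub(r"(/asin/)[A-Z0-9]{10}", lambda m: m.group(1) + new_asin, url)
        for url in sitemap["startUrl"]
    ]
    # Compact separators keep the single-line style of the exported sitemap.
    return json.dumps(sitemap, separators=(",", ":"))

new_sitemap = with_asin(SITEMAP, "B000000000")  # placeholder ASIN
print(new_sitemap)
```

The output can then be pasted into the import box in place of the original code.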

 
b. Code for scraping specified pages:

After importing the code, to choose which pages to scrape, open the edit view and change the numbers at the end of the URL.

To scrape 8 pages, change it to [1-8], and so on.
Code:
{"_id":"amz-qa2","startUrl":["https://www.amazon.com/ask/questions/asin/B09FT58QQP/[1-2]"],"selectors":[{"delay":null,"id":"contents","multiple":true,"parentSelectors":["_root"],"selector":".a-section > div.a-spacing-base > div > div.a-col-right","type":"SelectorElement"},{"delay":0,"id":"question","multiple":false,"parentSelectors":["contents"],"regex":"","selector":".a-spacing-small div.a-col-right","type":"SelectorText"},{"delay":0,"id":"answer","multiple":false,"parentSelectors":["contents"],"regex":"","selector":".a-col-right > span:nth-of-type(1)","type":"SelectorText"},{"delay":0,"id":"buyer","multiple":false,"parentSelectors":["contents"],"regex":"","selector":"span.a-profile-name","type":"SelectorText"},{"delay":0,"id":"date","multiple":false,"parentSelectors":["contents"],"regex":"","selector":"span.a-color-tertiary","type":"SelectorText"}]}
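
To preview which pages a `[1-8]` style range covers, the range can be expanded by hand. Below is a minimal Python sketch. The helper name `expand_range` is hypothetical, and it handles only a single plain `[start-end]` range like the one used here, not any other range syntax Web Scraper may accept.

```python
import re
from typing import List

def expand_range(url: str) -> List[str]:
    """Expand one [start-end] range in a URL into explicit page URLs."""
    match = re.search(r"\[(\d+)-(\d+)\]", url)
    if match is None:
        return [url]  # no range present, nothing to expand
    start, end = int(match.group(1)), int(match.group(2))
    return [
        url[:match.start()] + str(page) + url[match.end():]
        for page in range(start, end + 1)  # end page is included
    ]

urls = expand_range("https://www.amazon.com/ask/questions/asin/B09FT58QQP/[1-8]")
print(len(urls))  # 8
print(urls[0])    # https://www.amazon.com/ask/questions/asin/B09FT58QQP/1
print(urls[-1])   # https://www.amazon.com/ask/questions/asin/B09FT58QQP/8
```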

Other data is scraped the same way: just import the code.
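
Since every example comes down to pasting a long one-line JSON string, a quick check before importing can catch copy errors. Below is a minimal Python sketch. The function name `check_sitemap` is hypothetical, and the required keys (`_id`, `startUrl`, `selectors`) are taken from the sitemaps in this post rather than from an official schema.

```python
import json

def check_sitemap(text: str) -> list:
    """Return a list of problems found in a sitemap string (empty if none)."""
    try:
        sitemap = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err.msg} at position {err.pos}"]
    if not isinstance(sitemap, dict):
        return ["top level is not a JSON object"]
    problems = []
    for key in ("_id", "startUrl", "selectors"):
        if key not in sitemap:
            problems.append(f"missing key: {key}")
    selectors = sitemap.get("selectors", [])
    # Every parent must be "_root" or the id of a selector in the sitemap.
    known = {"_root"} | {sel.get("id") for sel in selectors}
    for sel in selectors:
        for parent in sel.get("parentSelectors", []):
            if parent not in known:
                problems.append(
                    f"selector {sel.get('id')!r} has unknown parent {parent!r}"
                )
    return problems

good = (
    '{"_id":"amz-qa2","startUrl":'
    '["https://www.amazon.com/ask/questions/asin/B09FT58QQP/[1-2]"],'
    '"selectors":[{"id":"contents","parentSelectors":["_root"],'
    '"type":"SelectorElement","multiple":true,"delay":null}]}'
)
cut_off = good[:60]  # simulates a string that was copied only partway

print(check_sitemap(good))     # []
print(check_sitemap(cut_off))  # a single "not valid JSON" message
```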

02
Scraping competitor reviews

Code:
{"_id":"review","startUrl":["https://www.amazon.com/Insulated-Lunch-Bag-Women-Men/dp/B07WVWSVL3/ref=cm_cr_arp_d_viewopt_sr?ie=UTF8&reviewerType=all_reviews&pageNumber=1&filterByStar=all_stars"],"selectors":[{"clickElementSelector":".a-last a","clickElementUniquenessType":"uniqueCSSSelector","clickType":"clickMore","delay":3000,"discardInitialElements":"do-not-discard","id":"info","multiple":true,"parentSelectors":["_root"],"selector":".a-row div.celwidget","type":"SelectorElementClick"},{"delay":0,"id":"name","multiple":false,"parentSelectors":["info"],"regex":"","selector":"span.a-profile-name","type":"SelectorText"},{"delay":0,"id":"score","multiple":false,"parentSelectors":["info"],"regex":"","selector":"> div:nth-of-type(2)","type":"SelectorText"},{"delay":0,"id":"status","multiple":false,"parentSelectors":["info"],"regex":"","selector":"div.a-spacing-mini.review-data","type":"SelectorText"},{"delay":0,"id":"time","multiple":false,"parentSele.............

Reposted from: http://fashion.shaoqun.com/a/1005212.html

