Aug 20, 2017
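When writing crawlers, the one thing that ultimately cannot be automated is specifying the XPath of the elements to extract. Crawling a specific site almost always comes down to finding the XPath for the next-page link and for the data elements. If XPath generation could be handed over to operations colleagues who don't write code, it would free up a great deal of programmer time.
After all, XPath is a DSL of sorts, and it is still fairly hard for people who don't program. PMs fluent in SQL are plentiful, but an operations colleague who can write XPath is very hard to find. Everyone has their own specialty, and the problems operations staff deal with are quite different from the ones we programmers face. In my years of experience, teaching them YAML already feels like the limit...
So can there be a graphical tool that generates XPath? The answer is obviously yes: Chrome has a built-in XPath generator, as shown in the figure below:
The XPath generated in this figure is: //*[@id="fc_B_pic"]/ul[1]/li[1]/a[1]
However, Chrome's XPath generation has several drawbacks: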
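- Chrome only walks upward looking for an element with an id, whereas in practice reaching an element with a class is often enough to guarantee the XPath is correct.
- Chrome tries to make the generated XPath match exactly one element, so when you want an XPath that selects multiple elements, Chrome can't help and you still have to rewrite it yourself.
- Once the XPath is generated, there is no convenient graphical way to verify it.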
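The first two drawbacks can be worked around with a small generator of your own. The following is only a minimal sketch under stated assumptions, not what Chrome does internally: `buildXPath` and `indexAmongSameTag` are hypothetical names, the element is assumed to be a plain HTML element (so `className` is a string), and the `multiple` option is a crude heuristic that simply drops every positional index.

```javascript
// 1-based position of `el` among preceding siblings with the same tag
// (XPath positions start at 1).
function indexAmongSameTag(el) {
  let index = 1;
  for (let sib = el.previousElementSibling; sib; sib = sib.previousElementSibling) {
    if (sib.tagName === el.tagName) index += 1;
  }
  return index;
}

// Walk up from `el` and stop at the nearest node that has an id OR a class.
// With { multiple: true } the positional indexes are omitted, so the
// expression can match sibling elements as well (assumption: crude heuristic).
function buildXPath(el, { multiple = false } = {}) {
  const steps = [];
  for (let node = el; node; node = node.parentElement) {
    const tag = node.tagName.toLowerCase();
    if (node.id) {
      steps.unshift(`//*[@id="${node.id}"]`);
      return steps.join('/');
    }
    if (node.className) {
      // Exact match on the whole class attribute, kept simple on purpose.
      steps.unshift(`//${tag}[@class="${node.className}"]`);
      return steps.join('/');
    }
    steps.unshift(multiple ? tag : `${tag}[${indexAmongSameTag(node)}]`);
  }
  // No id or class found on the way up: fall back to an absolute path.
  return '/' + steps.join('/');
}
```

As for verification, the Chrome DevTools console offers `$x("...")`, which returns the elements an XPath matches; that is not graphical, but it gives a quick check of the result.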
To be continued
Jun 18, 2017
Chrome sets focus to the omnibox when you create a new tab. Although there is an API to replace the new tab page, you can't simply steal focus from the omnibox on that page. There are two workarounds.
if you are creating a new tab …
Jun 17, 2017
https://stackoverflow.com/questions/22193369/run-chrome-extensions-using-command-prompt
Where Chrome/Firefox extension source code is stored (macOS paths):
- Chrome
~/Library/Application Support/Google/Chrome/Default/Extensions/<EXTENSION_ID>/
- Firefox
~/Library/Application Support/Firefox/Profiles/<PROFILE_ID>/extensions/<EXTENSION_ID>/
Jun 12, 2017
basic usage
chrome.runtime.getBackgroundPage(function(window) {}) // retrieve the background page's window object
chrome.runtime.getURL(path) // get the absolute url for given file in extension package
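A slightly fuller sketch of the two calls above. The wrapper names `resolveExtensionUrl` and `withBackgroundPage` are hypothetical; the `api` parameter defaults to the global `chrome` object that exists inside an extension and is there only so the functions can be exercised outside the browser.

```javascript
// Returns the absolute chrome-extension:// URL for a file shipped in the package.
function resolveExtensionUrl(path, api = globalThis.chrome) {
  return api.runtime.getURL(path);
}

// Hands the background page's window object to `callback`.
function withBackgroundPage(callback, api = globalThis.chrome) {
  api.runtime.getBackgroundPage(function (backgroundWindow) {
    callback(backgroundWindow);
  });
}
```

From a popup, for example, `withBackgroundPage(function (bg) { /* read globals defined by the background script */ })` would give access to whatever the background page exposes on its window.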
Jun 12, 2017
Basic Concepts
Chrome provides three storage areas: sync, local, and managed. The sync area is synced with the cloud; the managed area is read-only.
All of your extension's scripts, content scripts included, share the same storage; the data does not live in the localStorage of the page's domain.
Usage
chrome.storage.local.get('key …
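As a rough illustration of the callback style, here is a minimal sketch of writing and reading one key. The helper names `saveSetting` and `loadSetting` and the `'theme'` key are hypothetical; the storage area is passed in explicitly, and inside an extension it would be `chrome.storage.local` (with the "storage" permission declared in the manifest).

```javascript
// Stores a single key/value pair in the given storage area.
function saveSetting(area, key, value, done) {
  area.set({ [key]: value }, done);
}

// Reads a single key; `done` receives `fallback` when the key is not stored.
function loadSetting(area, key, fallback, done) {
  area.get(key, function (items) {
    const found = Object.prototype.hasOwnProperty.call(items, key);
    done(found ? items[key] : fallback);
  });
}

// Inside an extension (assumption: "storage" permission is declared):
// saveSetting(chrome.storage.local, 'theme', 'dark', function () {
//   loadSetting(chrome.storage.local, 'theme', 'light', console.log);
// });
```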
Jun 12, 2017
permissions
Declare the cookies permission and the domains whose cookies you would like to access. Note that permissions is an array, not an object.
"permissions": [
"cookies",
"*://*.example.com/"
]
type
cookie
just a simple object with {name, value, domain...}
CookieStore
normal mode and incognito mode use different cookie stores.
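To see which stores currently exist, `chrome.cookies.getAllCookieStores` can be used. This is a minimal sketch: `listCookieStores` is a hypothetical helper, and the cookies API object is passed in (inside an extension it would be `chrome.cookies`).

```javascript
// Summarises every cookie store as { id, tabCount }.
// Each CookieStore carries an `id` and the `tabIds` of the tabs that share it.
function listCookieStores(cookiesApi, done) {
  cookiesApi.getAllCookieStores(function (stores) {
    done(stores.map(function (store) {
      return { id: store.id, tabCount: store.tabIds.length };
    }));
  });
}

// Inside an extension:
// listCookieStores(chrome.cookies, function (summary) { console.log(summary); });
```

The `id` of a store is what the `storeId` field of the other cookie calls expects.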
read
get: chrome.cookies.get({url: URL, name: COOKIE_NAME, storeId: COOKIE_STORE_ID}, function …
Jun 09, 2017
A Chrome extension can inject scripts into a page; such a script is called a content script.
https://developer.chrome.com/extensions/getstarted
https://developer.chrome.com/extensions/content_scripts
https://developer.chrome.com/extensions/messaging
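A content script usually talks to the rest of the extension through message passing. The sketch below shows both ends of a one-off request; the function names and the `ping`/`pong` message types are hypothetical, and `runtime` stands for `chrome.runtime`, passed in so the two halves can be read (and tried) side by side.

```javascript
// Background side: answer "ping" messages coming from content scripts.
function registerPingHandler(runtime) {
  runtime.onMessage.addListener(function (message, sender, sendResponse) {
    if (message && message.type === 'ping') {
      sendResponse({ type: 'pong' });
    }
  });
}

// Content-script side: send a "ping" and hand the reply to `onReply`.
function pingBackground(runtime, onReply) {
  runtime.sendMessage({ type: 'ping' }, onReply);
}

// In the background script:  registerPingHandler(chrome.runtime);
// In the content script:     pingBackground(chrome.runtime, console.log);
```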
The grayed-out icon problem
Add browser_action.default_icon in your manifest.json file
{
...
"browser_action": {
"default_icon": "icons/icon-32.png"
},
...
}
May 29, 2017
Learning GreaseMonkey/TamperMonkey
Header directives
- @name: script name
- @namespace: namespace
- @version: version
- @author: author
- @description
- @homepage
- @icon
- @updateURL
- @downloadURL
- @include
- @exclude
- @resource: key url
- @require: include scripts
- @connect: domains reachable cross-origin: self, current domain, localhost, or *
- @run-at: when to run the script: document-start/document-body/document-end/document-idle/context-menu
- @grant: whitelist GM_* functions …
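Put together, a few of these directives form the metadata block at the top of a userscript. The script below is a made-up minimal example: the name, namespace, URL pattern, and the `describePage` helper are all hypothetical, and `@grant none` is used because no GM_* function is called.

```javascript
// ==UserScript==
// @name         Example script (hypothetical)
// @namespace    https://example.com/userscripts
// @version      0.1.0
// @description  Logs the page title once the document has loaded
// @include      https://example.com/*
// @run-at       document-end
// @grant        none
// ==/UserScript==

// Builds the message that the script logs for a given document.
function describePage(doc) {
  return 'userscript running on: ' + doc.title;
}

// `document` only exists in the browser; the guard keeps the file loadable elsewhere.
if (typeof document !== 'undefined') {
  console.log(describePage(document));
}
```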