{"id":15,"date":"2011-07-16T15:37:20","date_gmt":"2011-07-16T19:37:20","guid":{"rendered":"https:\/\/michaelnielsen.org\/ddi\/?p=15"},"modified":"2011-07-16T15:37:20","modified_gmt":"2011-07-16T19:37:20","slug":"benchmarking-a-simple-crawler-working-notes","status":"publish","type":"post","link":"https:\/\/michaelnielsen.org\/ddi\/benchmarking-a-simple-crawler-working-notes\/","title":{"rendered":"Benchmarking a simple crawler (working notes)"},"content":{"rendered":"<p>In this post I describe a simple, single-machine web crawler that I&#8217;ve written, and do some simple profiling and benchmarking.  In the next post I intend to benchmark it against two popular open source crawlers, the <a href=\"http:\/\/scrapy.org\/\">scrapy<\/a> and <a href=\"http:\/\/nutch.apache.org\/\">Nutch<\/a> crawlers.<\/p>\n<p>I&#8217;m doing this as part of an attempt to answer a big, broad question: if you were trying to build a web-scale crawler, does it make most sense to start from scratch (which gives you a lot of flexibility), or would it make more sense to start from an existing project, like Nutch?<\/p>\n<p>Of course, there are many aspects to answering this question, but obviously one important aspect is speed: how fast can we download pages?  I&#8217;m especially interested in understanding where the bottlenecks are in my code.  Is it the fact that I&#8217;ve used Python?  Is it download speed over the network?  Is it access to the database server?  Is it parsing content?  Are we CPU-bound, network-bound, or disk-bound?  The answers to these questions will help inform decisions about whether to work on improving the crawler, or perhaps to work starting from an existing crawler.<\/p>\n<p>The code for my test crawler is at <a href=\"https:\/\/github.com\/mnielsen\/test_crawler\/tree\/initial_profile\">GitHub<\/a>. The crawler uses as a set of seed urls a listing of some of the top blogs from Technorati.  Only urls from within the corresponding domains are crawled.  
I won&#8217;t explicitly show the code for getting the seed urls, but it&#8217;s in the same GitHub repository (<a href=\"https:\/\/github.com\/mnielsen\/test_crawler\/tree\/initial_profile\">link<\/a>). Here&#8217;s the code for the crawler: <\/p>\n<pre>\r\n\"\"\"crawler.py crawls the web.  It uses a domain whitelist generated\r\nfrom Technorati's list of top blogs. \r\n\r\n\r\nUSAGE\r\n\r\npython crawler.py &\r\n\r\n\r\nThe crawler ingests input from external sources that aren't under\r\ncentralized control, and so needs to deal with many potential errors.\r\nBy design, there are two broad classes of error, which we'll call\r\nanticipated errors and unanticipated errors.\r\n\r\nAnticipated errors are things like a page failing to download, or\r\ntiming out, or a robots.txt file disallowing crawling of a page.  When\r\nanticipated errors arise, the crawler writes the error to info.log,\r\nand continues in an error-appropriate manner.\r\n\r\nUnanticipated errors are, not surprisingly, errors which haven't been\r\nanticipated and designed for.  Rather than the crawler falling over,\r\nwe log the error and continue.  At the same time, we also keep track\r\nof how many unanticipated errors have occurred in close succession.\r\nIf many unanticipated errors occur rapidly in succession it usually\r\nindicates that some key piece of infrastructure has failed --- maybe\r\nthe network connection is down, or something like that.  
In that case\r\nwe shut down the crawler entirely.\"\"\"\r\n\r\nimport cPickle\r\nimport json\r\nimport logging\r\nimport logging.handlers\r\nimport os\r\nfrom Queue import Queue\r\nimport re\r\nimport robotparser\r\nfrom StringIO import StringIO\r\nimport sys\r\nimport threading\r\nimport time\r\nimport traceback\r\nimport urllib\r\nimport urllib2\r\nfrom urlparse import urlparse\r\n\r\n# Third party libraries\r\nfrom lxml import etree\r\nimport MySQLdb\r\nfrom redis import Redis\r\n\r\n# Configuration parameters\r\n\r\n# NUM_THREADS is the number of crawler threads: setting this parameter\r\n# requires some experimentation.  A rule of thumb is that the speed of\r\n# the crawler should scale roughly proportionally to NUM_THREADS, up\r\n# to a point at which performance starts to saturate.  That's the\r\n# point at which to stop.\r\nNUM_THREADS = 15\r\n\r\n# MAX_LENGTH is the largest number of bytes to download from any\r\n# given url.\r\nMAX_LENGTH = 100000\r\n\r\n# NUM_PAGES is the number of pages to crawl before halting the crawl.\r\n# Note that the exact number of pages crawled will be slightly higher,\r\n# since each crawler thread finishes its current downloads before\r\n# exiting.  At the moment, NUM_PAGES is set quite modestly.  
To\r\n# increase it dramatically --- say up to 10 million --- would require\r\n# moving away from Redis to maintain the URL queue.\r\nNUM_PAGES = 5000\r\n\r\n# Global variables to keep track of the number of unanticipated\r\n# errors, and a configuration parameter --- the maximum number of\r\n# close unanticipated errors in a row that we'll tolerate before\r\n# shutting down.\r\ncount_of_close_unanticipated_errors = 0\r\ntime_of_last_unanticipated_error = time.time()\r\nMAX_CLOSE_UNANTICIPATED_ERRORS = 5\r\n\r\n# Counter to keep track of the number of pages crawled.\r\nr = Redis()\r\nr.set(\"count\",0)\r\n\r\n# total_length: the total length of all downloaded files.\r\n# Interpreting it depends on the page encodings: for UTF-8 pages\r\n# (the common case in English) it is roughly the number of bytes\r\n# downloaded.\r\ntotal_length = 0\r\n\r\ndef main():\r\n    create_logs()\r\n    establish_url_queues() # seeds the crawl queues if they're empty\r\n    get_domain_whitelist()\r\n    establish_mysql_database()\r\n    start_time = time.time()\r\n    crawlers = []\r\n    for j in range(NUM_THREADS):\r\n        crawler = Crawler()\r\n        crawler.setName(\"thread-%s\" % j)\r\n        print \"Launching crawler %s\" % (j,)\r\n        crawler.start()\r\n        crawlers.append(crawler)\r\n    # Wait for every crawler thread, not just the last one launched.\r\n    for crawler in crawlers:\r\n        crawler.join()\r\n    end_time = time.time()\r\n    elapsed_time = end_time-start_time\r\n    r = Redis()\r\n    num_pages = int(r.get(\"count\"))\r\n    print \"%s pages downloaded in %s seconds.\" % (num_pages,elapsed_time)\r\n    print \"That's %s pages per second.\" % (num_pages\/elapsed_time)\r\n    print \"\\nTotal length of download is %s.\" % total_length\r\n    print \"Assuming UTF-8 encoding (as for most English pages) that's the # of bytes downloaded.\"\r\n    print \"Bytes per second: %s\" % (total_length\/elapsed_time)\r\n\r\ndef create_logs():\r\n    \"\"\"Set up two logs: (1) info_logger logs routine events, including\r\n    both pages which have been crawled, and also anticipated errors;\r\n    and (2) critical_logger records unanticipated errors.\"\"\"\r\n    global info_logger\r\n    global critical_logger\r\n    info_logger = 
logging.getLogger('InfoLogger')\r\n    info_logger.setLevel(logging.INFO)\r\n    info_handler = logging.handlers.RotatingFileHandler(\r\n        'info.log', maxBytes=1000000, backupCount=5)\r\n    info_logger.addHandler(info_handler)\r\n\r\n    critical_logger = logging.getLogger('CriticalLogger')\r\n    critical_logger.setLevel(logging.CRITICAL)\r\n    critical_handler = logging.handlers.RotatingFileHandler(\r\n        'critical.log',maxBytes=100000,backupCount=5)\r\n    critical_logger.addHandler(critical_handler)\r\n\r\ndef establish_url_queues():\r\n    \"\"\"Checks whether the Redis database has been set up.  If not,\r\n    set it up.\"\"\"\r\n    r = Redis()\r\n    # Strictly, we should check that the lists for all threads are empty.\r\n    # But this works in practice.\r\n    if r.llen(\"thread-0\") == 0:\r\n        get_seed_urls()\r\n\r\ndef get_seed_urls():\r\n    \"\"\"Puts the seed urls generated by get_seed_urls_and_domains.py\r\n    into the crawl queues.\"\"\"\r\n    f = open(\"seed_urls.json\")\r\n    urls = json.load(f)\r\n    f.close()\r\n    append_urls(urls)\r\n\r\ndef append_urls(urls):\r\n    \"\"\"Appends the contents of urls to the crawl queues.  These are\r\n    implemented as Redis lists, with names \"thread-0\", \"thread-1\", and\r\n    so on, corresponding to the different threads.\"\"\"\r\n    r = Redis()\r\n    for url in urls:\r\n        thread = hash(domain(url)) % NUM_THREADS\r\n        r.rpush(\"thread-%s\" % str(thread),url)\r\n\r\ndef domain(url):\r\n    \"\"\"A convenience method to return the domain associated with a url.\"\"\"\r\n    return urlparse(url).netloc\r\n\r\ndef get_domain_whitelist():\r\n    \"\"\"Loads the domain whitelist into a global set.\"\"\"\r\n    global domain_whitelist\r\n    f = open(\"domain_whitelist.json\")\r\n    domain_whitelist = set(json.load(f))\r\n    f.close()\r\n\r\ndef establish_mysql_database():\r\n    \"\"\"Checks whether the tables in the MySQL database \"crawl\" \r\n    have been set up.  If not, then set them up.  
Note that this\r\n    routine assumes that the \"crawl\" database has already been\r\n    created. \"\"\"\r\n    conn = MySQLdb.connect(\"localhost\",\"root\",\"\",\"crawl\")\r\n    cur = conn.cursor()\r\n    if int(cur.execute(\"show tables\")) == 0:\r\n        create_tables(conn)\r\n    conn.close()\r\n\r\ndef create_tables(conn):\r\n    \"\"\"Creates the MySQL tables and indices for the crawl.  We create\r\n    two tables: (1) robot_parser, which is used to store the (parsed)\r\n    robots.txt file for each domain; and (2) pages, which stores urls\r\n    and the corresponding title and content text.\"\"\"\r\n    cur = conn.cursor()\r\n    cur.execute(\r\n        'create table robot_parser (domain text, robot_file_parser text)') \r\n    cur.execute('create table pages (url text, title text, content mediumtext)')\r\n    cur.execute('create index domain_idx on robot_parser(domain(255))')\r\n    cur.execute('create index url_pages_idx on pages(url(255))')\r\n\r\nclass Crawler(threading.Thread):\r\n        \r\n    def run(self):\r\n        global total_length\r\n        r = Redis()\r\n        conn = MySQLdb.connect(\"localhost\",\"root\",\"\",\"crawl\")\r\n        parser = etree.HTMLParser()\r\n        while int(r.get(\"count\")) < NUM_PAGES:\r\n            try:\r\n                urls = self.pop_urls()\r\n                new_urls = []\r\n            except:\r\n                self.error_handler()\r\n                continue\r\n            for url in urls:\r\n                try:\r\n                    if not self.is_crawling_allowed(conn,url):\r\n                        continue\r\n                    try:\r\n                        request = urllib2.urlopen(url,timeout=5)\r\n                    except urllib2.URLError:\r\n                        info_logger.info(\"%s: could not open\" % url)\r\n                        continue\r\n                    headers = request.headers\r\n                    length = get_length(url,headers)\r\n                    try:\r\n     
                   content = request.read(length)\r\n                    except urllib2.URLError:\r\n                        info_logger.info(\"%s: could not download\" % url)\r\n                        continue\r\n                    try:\r\n                        tree = etree.parse(StringIO(content),parser)\r\n                    except:\r\n                        info_logger.info(\"%s: lxml could not parse\" % url)\r\n                        continue\r\n                    self.add_to_index(conn,url,content,tree)\r\n                    r.incr(\"count\")\r\n                    total_length += len(content)\r\n                    info_logger.info(\"Crawled %s\" % url)\r\n                    for url in tree.xpath(\"\/\/a\/@href\"):\r\n                        if not(domain(url) in domain_whitelist):\r\n                            pass\r\n                        else:\r\n                            new_urls.append(url)\r\n                except:\r\n                    self.error_handler()\r\n                    continue\r\n            try:\r\n                append_urls(new_urls)\r\n                conn.commit()\r\n            except:\r\n                self.error_handler()\r\n                continue\r\n\r\n    def pop_urls(self):\r\n        \"\"\"Returns 10 urls from the current thread's url queue.\"\"\"\r\n        urls = []\r\n        r = Redis()\r\n        for j in range(10):\r\n            urls.append(r.lpop(self.name))\r\n        return urls\r\n\r\n    def error_handler(self):\r\n        \"\"\"Logs unanticipated errors to critical_logger. 
Also checks\r\n        whether the error occurred within 3 seconds of another\r\n        unanticipated error, and if that happens too many times in a\r\n        row, shuts down the crawler.\"\"\"\r\n        global time_of_last_unanticipated_error\r\n        global count_of_close_unanticipated_errors\r\n\r\n        critical_logger.critical(\r\n            \"Unanticipated error at %s:\\n\\n%s\" % (time.asctime(),\r\n                                                 traceback.format_exc()))\r\n        # Check whether the error is close (within 3 seconds) to the\r\n        # last unanticipated error\r\n        if (time.time() < time_of_last_unanticipated_error + 3.0):\r\n            critical_logger.critical(\r\n                \"\\nThis error occurred close to another.\\n\")\r\n            # Not threadsafe, but shouldn't cause major problems\r\n            count_of_close_unanticipated_errors += 1\r\n        else:\r\n            count_of_close_unanticipated_errors = 0\r\n        # Shut down if we have too many close unanticipated errors\r\n        if (count_of_close_unanticipated_errors >=\r\n            MAX_CLOSE_UNANTICIPATED_ERRORS):\r\n            critical_logger.critical(\r\n                \"\\nExit: too many close unanticipated errors.\")\r\n            sys.exit(1)\r\n        time_of_last_unanticipated_error = time.time()\r\n\r\n    def is_crawling_allowed(self,conn,url):\r\n        \"\"\"Checks that url can be crawled.  At present, this merely\r\n        checks that robots.txt allows crawling, and that url signifies\r\n        a http request.  
It could easily be extended to do other\r\n        checks --- on the language of the page, for instance.\"\"\"\r\n        return (self.is_url_a_http_request(url) and\r\n                self.does_robots_txt_allow_crawling(conn,url))\r\n\r\n    def is_url_a_http_request(self,url):\r\n        \"\"\"Checks that url is for a http request, and not (say) ftp.\"\"\"\r\n        u = urlparse(url)\r\n        if u.scheme == \"http\":\r\n            return True\r\n        else:\r\n            info_logger.info(\"%s: not a http request\" % url)\r\n            return False\r\n\r\n    def does_robots_txt_allow_crawling(self,conn,url):\r\n        \"\"\"Check that robots.txt allows url to be crawled.\"\"\"\r\n        cur = conn.cursor()\r\n        # Use a parameterized query, so quoting in the domain can't\r\n        # break the SQL.\r\n        cur.execute(\r\n            \"select robot_file_parser from robot_parser where domain=%s\",\r\n            (domain(url),))\r\n        rfp_db = cur.fetchone()\r\n        if rfp_db: # we've crawled this domain before\r\n            rfp = cPickle.loads(str(rfp_db[0]))\r\n        else: # we've never crawled this domain\r\n            rfp = robotparser.RobotFileParser()\r\n            try:\r\n                rfp.set_url(\"http:\/\/\"+domain(url)+\"\/robots.txt\")\r\n                rfp.read()\r\n            except:\r\n                info_logger.info(\"%s: couldn't read robots.txt\" % url)\r\n                return False\r\n            rfp_pickle = cPickle.dumps(rfp)\r\n            cur = conn.cursor()\r\n            cur.execute(\r\n                \"insert into robot_parser(domain,robot_file_parser) values(%s,%s)\", (domain(url),rfp_pickle))\r\n            conn.commit()\r\n        if rfp.can_fetch(\"*\",url):\r\n            return True\r\n        else:\r\n            info_logger.info(\"%s: robots.txt disallows fetching\" % url)\r\n            return False\r\n\r\n    def add_to_index(self,conn,url,content,tree):\r\n        title = get_title(tree)\r\n        cur = conn.cursor()\r\n        try:\r\n            cur.execute(\"insert into pages 
(url,title,content) values(%s,%s,%s)\", \r\n                        (url,title,content))\r\n        except UnicodeEncodeError:\r\n            info_logger.info(\"%s: Couldn't store title %s\" % (url,title))\r\n        conn.commit()\r\n\r\ndef get_length(url,headers):\r\n    \"\"\"Attempts to find the length of a page, based on the\r\n    \"content-length\" header.  If the information is not available,\r\n    then sets the length to MAX_LENGTH. Otherwise, sets it to the\r\n    minimum of the \"content-length\" header and MAX_LENGTH.\"\"\"\r\n    try:\r\n        length = int(headers[\"content-length\"])\r\n    except (KeyError,ValueError):\r\n        info_logger.info(\r\n            \"%s: could not retrieve content-length\" % url)\r\n        length = MAX_LENGTH\r\n    if length > MAX_LENGTH:\r\n        info_logger.info(\r\n            \"%s: had length %s, truncating to %s\" %\r\n            (url,length,MAX_LENGTH))\r\n        length = MAX_LENGTH\r\n    return length\r\n\r\ndef get_title(tree):\r\n    \"\"\"Takes an lxml etree for a HTML document, and returns the text from\r\n    the first (if any) occurrence of the title tag.\"\"\"\r\n    x = tree.xpath(\"\/\/title\")\r\n    if len(x) > 0 and x[0].text:\r\n        return x[0].text\r\n    else:\r\n        return \"\"\r\n\r\n\r\nif __name__ == \"__main__\":\r\n    main()\r\n<\/pre>\n<p>(Feedback on the code is welcome.  I&#8217;m not an experienced Python programmer, and I know I could learn a lot from people with more experience.  Aspects of this code that I suspect could be substantially improved with advice from the right person include the way I do error-handling and logging.)<\/p>\n<p>I used this code on a <a href=\"http:\/\/www.slicehost.com\/\">Slicehost 512 slice<\/a> &#8211; a very lightweight Xen virtual private server, definitely not heavy machinery!  Download speed maxed out at around 15 crawler threads, and so that&#8217;s the number of threads I used.  
The bottleneck appeared to be CPU (which was routinely in the range 50-80 percent), although, as we&#8217;ll see below, we were also using a substantial fraction of network capacity.  Neither memory nor disk speed seemed to be an issue.<\/p>\n<p>The crawler downloaded 5043 pages in 229 seconds, which is 22 pages per second.  Put another way, that&#8217;s about 2 million pages per day.  The total length of the downloaded pages was 386 million characters &#8211; around 370 megabytes, assuming UTF-8 encoding, which gives a sustained download rate of about 1.7 megabytes per second.  Using <tt>wget<\/tt> alone I&#8217;m ordinarily able to get several times that (3-6 megabytes per second), and sometimes up to peak speeds over 10 MB\/sec.  This suggests that the bottleneck is not network speed, although, as we&#8217;ll see below, a substantial fraction of the program&#8217;s time is spent downloading.<\/p>\n<p>I profiled the crawler using <a href=\"http:\/\/code.google.com\/p\/yappi\/\">yappi<\/a>, a multi-threaded Python profiler.  (The standard Python profiler only copes with a single thread.)  Here are some things I learnt (note that all numbers are approximate): <\/p>\n<ul>\n<li> Networking &#8211; both downloading data and making connections &#8211;   consumes 40 percent of the time.\n<li> A surprising fraction of that time (6-7 percent) seems to be   spent on just opening the connection, using <tt>urllib2.urlopen<\/tt>.   I&#8217;m not sure what&#8217;s taking the time &#8211; DNS lookup maybe?\n<li> Another surprising aspect of networking is that dealing with   <tt>urllib2<\/tt> errors takes 5 percent of the time.\n<li> Redis consumes about 20 percent of the time.  
Well over half of that   is spent in <tt>client.py._execute_command<\/tt>.\n<li> The parser for <tt>robots.txt<\/tt> is surprisingly CPU-intensive,   consuming about 5 percent of the time.\n<li> <tt>urlparse<\/tt> consumes about 4 percent of the time.\n<li> For reasons I don&#8217;t understand, <tt>lxml<\/tt> and <tt>etree<\/tt>   don&#8217;t show up in the profiling results at all.\n<li> <tt>append_urls<\/tt> consumes only 3-4 percent of the time.\n<li> MySQL and the logging were both about 1\/4 of a percent. I must   admit to being suspicious about the first of these, when compared to   the Redis results.  It&#8217;s true that the program makes far more calls   to Redis than MySQL &#8211; half a million or so calls to Redis, versus   about 16,000 to MySQL &#8211; but this still seems wrong.  Other   possibilities that deserve further consideration: (1) I&#8217;ve got Redis   poorly configured; or (2) Redis lists are slower than I thought.\n<\/ul>\n<p>Many actions suggest themselves.  Most significantly, I&#8217;ll probably eliminate Redis, and store the queue of urls to be crawled (the url frontier) using a combination of MySQL (for persistence) and in-memory Python queues (for speed).  This is the right thing to do regardless of performance.  The reason is that Redis stores the url frontier in memory, and that severely limits the size of the url frontier, unless you spend a lot of money on memory.  The obvious thing to do is to move to a disk-based solution to store the url frontier.<\/p>\n<p>With some alterations it should be easy to make this crawler scalable, i.e., to make it run on a cluster, rather than a single machine.  In particular, because the crawler is based on whitelisting of domains, all that&#8217;s needed is to start the crawl off by allocating different parts of the whitelist to different machines.  
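<\/p>\n<p>The crawler already uses this idea at the thread level: <tt>append_urls<\/tt> hashes each url&#8217;s domain to choose a thread queue, so all urls from one domain are handled by the same thread.  The same scheme extends directly to machines.  Here&#8217;s a minimal stand-alone sketch; the name <tt>assign_shard<\/tt> is my own, and I use md5 rather than Python&#8217;s built-in <tt>hash<\/tt> because md5 is stable across processes and machines:<\/p>

```python
import hashlib

def assign_shard(domain, num_shards):
    """Map a domain to a shard index in range(num_shards).  A shard can
    be a thread queue on one machine, or a whole machine in a cluster.
    Hashing the domain, rather than the full url, guarantees that every
    url from a given domain lands on the same shard."""
    digest = hashlib.md5(domain.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards
```

<p>One caveat: if a few domains are much larger than the rest, hash-based assignment can leave the shards unevenly loaded, so the whitelist may need occasional rebalancing.<\/p>\n<p>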
Those sites can then be crawled in parallel, with no necessity for inter-machine communication.<\/p>\n<p>Many natural follow-on questions suggest themselves.  Are any of these top blogs serving spam, or otherwise compromised in some way?  What fraction ban crawling by unknown crawlers (like this one)?  What fraction of pages from the crawl are duplicates, or near-duplicates?  Do any of the webpages contain honeypots, i.e., url patterns designed to trap or otherwise mislead crawlers?  I plan to investigate these questions in future posts.<\/p>\n<p>  <em>Interested in more?  Please <a href=\"http:\/\/www.michaelnielsen.org\/ddi\/feed\/\">subscribe to this blog<\/a>, or <a href=\"http:\/\/twitter.com\/#!\/michael_nielsen\">follow me on Twitter<\/a>.  My new book about open science, <a href=\"http:\/\/www.amazon.com\/Reinventing-Discovery-New-Networked-Science\/dp\/product-description\/0691148902\">Reinventing Discovery<\/a>, will be published in October, and can be <a href=\"http:\/\/www.amazon.com\/Reinventing-Discovery-New-Networked-Science\/dp\/product-description\/0691148902\">pre-ordered at Amazon<\/a>.<\/em> <\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this post I describe a simple, single-machine web crawler that I&#8217;ve written, and do some simple profiling and benchmarking. In the next post I intend to benchmark it against two popular open source crawlers, the scrapy and Nutch crawlers. 
I&#8217;m doing this as part of an attempt to answer a big, broad question: if&hellip; <a class=\"more-link\" href=\"https:\/\/michaelnielsen.org\/ddi\/benchmarking-a-simple-crawler-working-notes\/\">Continue reading <span class=\"screen-reader-text\">Benchmarking a simple crawler (working notes)<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-15","post","type-post","status-publish","format-standard","hentry","category-uncategorized","entry"],"_links":{"self":[{"href":"https:\/\/michaelnielsen.org\/ddi\/wp-json\/wp\/v2\/posts\/15","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/michaelnielsen.org\/ddi\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/michaelnielsen.org\/ddi\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/michaelnielsen.org\/ddi\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/michaelnielsen.org\/ddi\/wp-json\/wp\/v2\/comments?post=15"}],"version-history":[{"count":0,"href":"https:\/\/michaelnielsen.org\/ddi\/wp-json\/wp\/v2\/posts\/15\/revisions"}],"wp:attachment":[{"href":"https:\/\/michaelnielsen.org\/ddi\/wp-json\/wp\/v2\/media?parent=15"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/michaelnielsen.org\/ddi\/wp-json\/wp\/v2\/categories?post=15"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/michaelnielsen.org\/ddi\/wp-json\/wp\/v2\/tags?post=15"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}