{ "metadata": { "name": "planesweepcoverage.ipynb" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "

Plane-Sweep Coverage analysis

\n", "\n", "

Michael Schatz (mschatz@cshl.edu)

\n", "
\n", "\n", "In this notebook, we will consider the problem of computing the coverage across a sequence given a set of alignments. The alignments could come from reads aligned to a genome, or the layout of reads within a newly assembled contig. It could also be used to track other non-read features, such as the gene regions or other annotations.\n", " \n", "For example, given these five read alignments, the goal is to compute the coverage vector below it:\n", "\n", " pos: 1 10 20 30 40 50 60 70 80 90\n", " r1: [==========================]\n", " r2: [================]\n", " r3: [================================]\n", " r4: [=====================]\n", " r5: [===================]\n", " \n", " cov: 11111222222233333444444333332222333333332222221111111\n", "\n", "And here is an example with real data: blue and red lines represent PacBio reads, with the coverage profile above them in black, and green marking the position of maximum coverage:\n", "\n", "![Sample coverage plot](data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAA4kAAAEACAYAAAAeDjNsAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAIABJREFUeJzt3X9s1Pd9x/HXOSuwDAs1oXayUZOYuLYPQmwX24mmxBcWqNeMwogmo8nZNKgSIVqnBSIasa1OlIRBSBNghTpT02nKMrepZoVUoWAzHYFWnE3p2mAbVBcyqBRMnGTEzpqUZZ/9Ye7L+fy93/e9+37vng/pm5jv3X3v8z1/7+vv6/v+fL5fnzHGCAAAAAAASSX5bgAAAAAAwD0IiQAAAAAACyERAAAAAGAhJAIAAAAALIREAAAAAICFkAgAAAAAsDgaEj/55BPV19drxYoVkqTx8XGtXLlSFRUVWrVqlSYmJqzn7t69W1VVVfL7/Tp27JiTzQIAAAAAxOBoSNy1a5f8fr98Pp8kad++faqoqNCvfvUrzZs3T9/5znckSZcuXdLevXt1+PBh7du3Tx0dHU42CwAAAAAQg2Mh8Te/+Y1ef/11ffnLX5YxRpLU39+vdevWaebMmVq7dq1CoZAkKRQKqbW1VRUVFWppaZExRuPj4041DQAAAAAQg2Mh8etf/7qeeeYZlZRce4uBgQHV1NRIkmpqatTf3y9pMiTW1tZaz6uurrYeAwAAAADkjiMh8Uc/+pHKyspUX19vVRElTfk5kXAXVQAAAABA7vyeEwv96U9/qv379+v111/XRx99pA8++EAPPvigGhsbNTw8rPr6eg0PD6uxsVGS1NzcrL6+Puv1p0+fth6LdNttt+nXv/61E00GAAAAANdbsGCBRkZGHH0Pn0mlvJeGI0eOaOfOnXrttde0Y8cOXbhwQTt27NDmzZt16623avPmzRodHVVLS4sOHTqks2fPauPGjTp58uT0xvp8KVUjgVzq7OxUZ2dnvpsBTMO2Cbdi24SbsX3CrXKRiRypJEYLdx1dv3692tvbVV1drYaGBm3fvl2SVF5ervXr12vp0qWaMWOGurq6ctEsAAAAAEAUx0NiS0uLWlpaJEmlpaV69dVXbZ/3yCOP6JFHHnG6OQAAAACAOBy9TyJQTAKBQL6bANhi24RbsW3Czdg+UcwcH5OYTYxJBAAAAFDMcpGJqCQCAAAAACyERAAAAACAhZAIAAAAALAQEgEAAAAAFkIiAAAAAMBCSAQAAAAAWAiJAAAAAAALIREAAAAAYCEkAgAAAAAshEQAAAAAgIWQCAAAAACwEBIBAAAAABZCIgAAAADAQkgEAAAAAFgIiUAR8D3uk+9xX76bAQAAAA8gJAJFxOcjKAIAACA+QiJQ4AiGAAAASIVjIfHChQu69957tXDhQgUCAb388suSpPHxca1cuVIVFRVatWqVJiYmrNfs3r1bVVVV8vv9OnbsmFNNA4qaz+ebMgEAAACRHAuJn/rUp/Tcc89pcHBQP/zhD/W3f/u3Gh8f1759+1RRUaFf/epXmjdvnr7zne9Iki5duqS9e/fq8OHD2rdvnzo6OpxqGlBwkg1+xhhrinwtAAAAEOZYSLzppptUV1cnSZo7d64WLlyogYEB9ff3a926dZo5c6bWrl2rUCgkSQqFQmptbVVFRYVaWlpkjNH4+LhTzQM8LTIMhv8fGf5SCYzRzyc0AsA17BcBFKOcjEkcGRnR4OCgmpqaNDAwoJqaGklSTU2N+vv7JU2GxNraWus11dXV1mNAMUnlYCQyIIbZVQvjiXx+ZGgEADfLZD9lF/zslse+EECxcjwkjo+Pq62tTc8995xmz56d9IGrxM4ZxSe6OhjvOeHvUrzvVCrft+jXUFkEUEji7dOS2fcCQDH5PScXfuXKFT3wwAN68MEHtXLlSklSY2OjhoeHVV9fr+HhYTU2NkqSmpub1dfXZ7329OnT1mOROjs7rZ8DgYACgYCTq4Ai5fP50gpY2WCMmXYgE13hSyYgZtqGyPfL5+cBoHBkY1+STm+LsPD+1e5xu31v9LLYDwLIh2AwqGAwmNP39BmH9njGGP31X/+15s6dq29961vW/B07dujChQvasWOHNm/erFtvvVWbN2/W6OioWlpadOjQIZ09e1YbN27UyZMnpzaWHTQcZhfM8v2+kcEw3e+A7/Gry/hmeusU6335TgKZK4bvUXgdM11XuwCX6EJcae0zbdqZr78PABAtF383HKsk/uQnP9FLL72kxYsXq76+XpK0bds2rV+/Xu3t7aqurlZDQ4O2b98uSSovL9f69eu1dOlSzZgxQ11dXU41DbAVHcbszjBHytbZ8OjxhNESnd3OlegdUqLuWRxEAYnZfY8K4bsTvT+NnJfuutrtM8P7Rrt9dKrLT/Q6u3UphN8VANhxrJLohGI424r8SVQti9VtKd1tMtuhM+57ZVhJlJI7gx/ZTZXvKjCVXXCy+3c0L36Xktm/JbufSBQq3dADJFKsNrBfBJAtnq4kIveo6KQnmQpd9Flju1tQ2Em2G1S+q4TJSBSK43WTRWFLZfstpu0h3ti2sOjxxbH2NfE+t0Sff7xuk8k8P5nXpCtWmEylKpiNbqzpsvtsY50MCD9u9zoA03FiJb8IiQUg3hlbLjwSW7Jnn2N1O0q0PLsuoskGLLdK5bYakjfCL6bK5CA20WuiA08hHjBnenGp6OdH7kuSHSMXr+t89PNj7bfineDK1gmhWAErURf8ZJaVC/H+NsQashDv9wlg+v6G70r+EBI9Jp1qIQfq8bt5ZUuqQbNYOPF5F9r4LTdI9w9zKgEhVmiJF0q8+vvN5n43mbAXa16s8Bi97Gh23eyzPe7PbjnRlTivs/tdFOJ6AtkUa9wxcouQ6ALx/lDYhZtUviipdFkqBrn8o1zsn3W6kv0d2R04F9Jnno31inWgH4vdH+Zk3j+dfVLk+0T+P5oXf79OtDMby0ymJ0OsCmM22xGPV37HiST7neFvM3CNkyehkDpCYh7FOpvuROWvmLu4cLY2f1LpYhWvuhTru+GWs42x3tfJbS/RspMZMxtvGbG6zUU/lqlkP5tY3SDz/b0u5J4ahbxubkFVEfmSzvc7mSEFqezTE/1dR34REnMsmYPcZOenKl6XpVhtcrNEB+aRvLJOhciui1WikyF222qi70Aqy0+mvXZidVuOfCzW61LZBpMZU5bMe0f3QIi1rES/i+jn5qO6F6+raq7bUgwHNYW4Tm6VblWRYDmd3f4gG59TMr267PYLqexbsy2Zv4HxeppEr2e87TPZv9Wx9t9sw+5ESMyj6C9FMl2BsvmesSo1bu7elezOx23txlTJHGRHb5vJiHdWPtltJ9ZJhnjPSXQwkkwwi7U/SBRaU9nus3HQlM7vJVuy+Vk42RYgHamc3EpmnxXrNXb7xXiv84Lo734yJ9cyfR+7z9OuB0aszzvb4dXuPcKPxfs9J3vsmahHUPTfF7ueH3Zt9eo2VwwIiTnixgATbyfhVPfUZHfayR7IF8ofuGKRbvBL5z1SWU4yZ0+TWV4ybY11ptuJrj/JPicdxfZdo+slciXR/jFRVStRgIwXIpw4aI+3/0zlexXrJFcy+2e745lUv9OxjktSqRbG6t2RTk+TePOSLUKkKpkAHr095aJdcAYhMQe8VE6324GlKplKjN3j0a/N5Owo3M3NvyunKvrxThS5qSLnVdEHf9k82GUfg1xJFByit8VY22SqJ1DT7fKaqlg9liJ7MyUbxlLtRWG3jEzXM9OTkomOjWI9ls99USonJ91YIEHyCIlZlM++59kUr6tcovWIFTCTPciK110w2fYC2ZSN7YttNLfSPcEVjQMc5EOiak2yf0fT7U6ebFCM9f1IdII3Xi+MZEJXZKCM95p4y4jVznxJ1JZ01tcN0qkYwz0IiVnmpS9vMpLtmmH3xyLdz6LQPkMA+ZFJRcSLB2QoHLFOdDjVTT+Z944Uq8tkLr43mR5jRC/HDWIF2ETh2iu83PZiRkjMULGdHbE76OJgCoDbZFJNZJ8GN8j39peoy2tYOj2AcE2649sBpxESM1BsO8R4XWCK5TMA4D3JjkMqtpN+QCx2J1ncNK4agPMIiRkqth1jsa0vgOJQbCf9gFTw/QCKDyExRZxpBgDvsLuKIvtxIDlcvAkoXoTEFHAmDQC8p1Au/gDkklevqAkgOwiJcTD2DgC8i/01kBm+Q0DxIiTGwNkzAAAAAMWIkBiF/vcAAAAAillJvhsQ6Y033lBtba2qqqq0Z88ex98vfAGDyEmaDIcERAAAAADFyFUh8ZFHHlFXV5f6+vr07W9/W2NjY468j10gjJwAAAAAoFi5prvp5cuXJUn33HOPJGn58uUKhUK6//77pzzP/srl6VzO3MRZnneQaQEAAABkk2sqiQMDA6qpqbH+7ff7dfz48Tiv8EVMk5Evtcnn2gkAAAAA8sU1lcTk2YeowopWya+N1yuhyJHOyf9xE3EAAOA1xT4cLBgMKhgM5vQ9fcYln/rly5cVCAT085//XJL01a9+Va2trVO6m04e4No3N5UKnE9m2vN9EcuN9xiyxx1bXnHwPX614v5NPnQAAAAv8/l8jgdn13Q3nTNnjqTJK5y+9dZb6u3tVXNzc55bBQAAAADFxVXdTZ9//nk9/PDDunLlijo6OjR37txpz4kdmpNI0w51tSv0cYRUUgEAAIDi4ZrupsnIRWnV50u/u2khdVN1IvhGfx7e2fK8j+6mAAAAhSEXmchVlUQ3mPy8o8KMMi9C5rra6MaASigEAAAA3I+QmKTIgJNJYHRjeLNjd3EfAAAAAIWPkJiG6IpYqqExHL68EhgBAAAAFA9CYhZcC43R6TGDZUYESddV9Og3CgAAABQsQqKDpmSpDHOeq8Jilq4SG6+SSg4FAAAA8oOQmCtXU4+RMg6MYeGQ5ZrwmKK47XZilUieAAAAQEKExHyIDIyxeDP3AQAAAPA4QqJbTQmShtCYDRnfx4RKJAAAAAofIdEjrPs3Zmk8oJ1YYwTtuoUW5ZVZr34MZEUAAAAUMkKi10QklCk/ORge4/HqeMiMJFplUiQAAAA8jJBYKNINJlkMl3bVxeIMkVHrTGgEAACAhxASi11kgHEwz7nqFh4AAAAAYiIkwhK74GVXIbwqjdxXbOMZi2ttAQAA4HWERGQmnCwdHBPp+QqkT3Q5BQAAgGcQEpEdxsS8Nmq+LqrjKql8BgRKAAAA5FFJvhuAImAMuQcAAADwCCqJyJ2rSTGcF7NRYCzI8Y1OFF47ry6aez0CAAAgAUIi8ib5oJJal9XwGEavXlG1IIMvAAAAPIOQCG+YcquO5IMfgQsAAABIjSNjEh999FHV1taqoaFBX/va1/Tb3/7Wemz37t2qqqqS3+/XsWPHrPnDw8NqaGhQZWWltm7d6kSzUCiMuTZlshj5PFdpDLc51Sn69fLleAIAAIBnOBISly9frsHBQZ04cUIffvihXn75ZUnSpUuXtHfvXh0+fFj79u1TR0eH9ZpNmzZpy5YtGhgY0JEjR3TixAknmoYCE5kX082NdiHKrRMAAADgNEdC4rJly1RSUqKSkhJ94Qtf0JEjRyRJoVBIra2tqqioUEtLi4wxmpiYkCSdOXNGbW1tuvHGG7V69WqFQiEnmoZicDUtphMYw3Es+t9enwAAAIBkOX4LjH/6p3/SihUrJEn9/f2qra21HquurlYoFNLIyIjKysqs+X6/X8ePH3e6aSgC2a40eh0hEgAAAImkfeGaZcuW6eLFi9PmP/3001YofOKJJ1RaWqq/+Iu/kCQZmyN0n814JbvnhXV2dlo/BwIBBQKBFFuOYndt87r6Q5Jj5rwcFsMtd2QduJ8GAACAY4LBoILBYE7fM+2Q2NvbG/fxf/7nf9bBgwd1+PBha15zc7P6+vqsf58+fVqNjY0qLS3V6OioNX9oaEh33nmn7XIjQyKQFcYom/duLGQ+pdeNFwAAAOmJLow9/vjjjr+nI91Nf/zjH+uZZ57R/v37NWvWLGt+U1OTDh48qPPnzysYDKqkpESlpaWSpJqaGnV3d2tsbEw9PT1qbm52omlAXFaX1AzGNUpybVfObIxtJEgDAAAUNp+J17czTVVVVfrd736nG264QZJ01113ae/evZKkXbt2ac+ePZoxY4a6urp09913S5qsHra3t+v999/XmjVrtG3btumN9fnidkUFHJXm/Rnd0E3V1zn5f9OZpwbwvQUAAMiKXGQiR0KiUwiJcK2oAOnWkKhOb3x/+JoDAADYy0UmSntMIoAI0V/UFHJhbrqk5j+oAgAAwBsIiYADpmbG6ABpH9icrDg6enXTNLhtrCYAAACuISQCuRaZIIv0KjAJw2r0w/Q/BQAAyBlCIpBPxiTdEzSz6psvC8uIzy1VSgAAAGSGkAjkmTGFUVBMJYASKAEAANyLkAi4wGRvyuymxcgg5rYxiQAAAHAvQiLgJvHG3qWY76ZW9pzvbuqoq+vO0EQAAADnERIBj0j1ejduqCRGh9KM3z/Wy0mPAAAAWUNIBDwoqUzkC//PSFMCY+4DVTbCadx2J7l4siQAAEBihESgyKQT2DINlm7p5prJkE8CJgAAKBaERAAp8erFb9wSVAEAANyOkAgUqnDpy4FMFxm4vBIaHRsPWUgolwIAAEkl+W4AAGdNOe53MAT4ZIp2AgAAKCRUEoFik0ZQnPKKFCpq0dW7QgpUXqmgAgAApIpKIoDUZFiNDNffAAAA4E5UEgGkbGpOdG7so5ulUxVlyB8AAPACQiKArLANQJncckK+guqeCgAA4BWERADOiUqORso4OHoVgRcAAHgFIRFAbl0NjuHIlMkN7u34ZKww6YZgFm6LkS/pgGzXbrqqAgCAXHH0wjXPPvusSkpK9N5771nzdu/eraqqKvn9fh07dsyaPzw8rIaGBlVWVmrr1q1ONguAixhjP2Vt+RE3q8iHdG6lYfeofBlOAAAASXIsJF64cEG9vb2aP3++Ne/SpUvau3evDh8+rH379qmjo8N6bNOmTdqyZYsGBgZ05MgRnThxwqmmAfACY3ISHPMdIiM5ej9H8iIAAEiSYyFx48aN2rFjx5R5oVBIra2tqqioUEtLi4wxmpiYkCSdOXNGbW1tuvHGG7V69WqFQiGnmgbAq2KVHZ1IkuG3dEF4zIa0qpIAAKAoORISX331Vc2bN0+LFy+eMr+/v1+1tbXWv6urqxUKhTQyMqKysjJrvt/v1/Hjx51oGoACZ4yswMg4vmuoPgIAgGSlfeGaZcuW6eLFi9PmP/XUU9q2bZsOHTpkzTPhC1XYHLH5bI4+7J4X1tnZaf0cCAQUCARSaDWAomMix/pdRegBAAAeEQwGFQwGc/qePhMvkaXh1KlT+pM/+RNdf/31kqTf/OY3+qM/+iOFQiH19/err69Pu3btkiTV1dXp6NGjKi0tVWVlpc6ePStp8oI3s2bN0oYNG6Y21ueLGyAB2PM9fvUKm9/k+5MyX/L3a3Rr19RsXOWVXS8AAO6Qi0yU9e6mixYt0ujoqM6dO6dz585p3rx5OnnypMrLy9XU1KSDBw/q/PnzCgaDKikpUWlpqSSppqZG3d3dGhsbU09Pj5qbm7PdNABIXcQFdDIRfQVTu8ecmqatkkvDLAAAcAfH75MY2Z20vLxc69ev19KlSzVjxgx1dXVZj+3cuVPt7e167LHHtGbNGi1ZssTppgFASiKDopfH6aVTWXR6falUAgDgHlnvbuokupsC6aG7aY7ESVLhYGZXxYu+R2L0PDfKpBoZa93YvQMAkFguMpHjlUQAKBrxdthZqMQZ+WwDpZvEC7dubC8AAJiOkAgAOXAtP9qNEcxOd06fTN6DWDLvH6uamssuvFQtAQCIjZAIAC5gTPbH/bm9yyoAAHAnQiIAuMRkdSs8dtEGvTUBAEAOEBIBwCuMiQqPV/+VQgkyelyj26XafdZL6wYAgFsREgHA60xk9TH5fqv5Hr/oBNt1YgAiAAApISQCQKEJh6LCy4AAACAHCIkAUKBsC2gERwAAkAAhEQCKSeS4xmIJjKleNpbuqQCAIkdIBIBileF9N9xwX8ZMxLzITcQqkRcBAMWIkAgAxSydFBQnWEYHL7ddTdXLoRYAgFwhJAIAUpNhBTLXQc1NIRUAAC8gJAIAUmdMxmMa7aqOToi13KTez8k8S19WAIBLERIBAGmZzDhRQSdPvTk9WS10+LNyJHQTbAGgKBASAQBZMz1DXLuaagY9VNNri8vGQ6bCLuB5dV0AAN5DSAQA5MS1AHn1hzxUHblwDQAAiRESAQB5YVd1TEuGt/GYfOfCC4+OVB7jfEz0RAWAwkFIBAB4mzH2cSjF3Be+76Mbu3UWYogFALhXSb4bAACAI4xJu7rltlDmxuAKAChcjoXE733ve6qtrdXChQu1ZcsWa/7u3btVVVUlv9+vY8eOWfOHh4fV0NCgyspKbd261almAQCKjDHXpin/SDJBGvlyPgEAkE+OdDc9deqUXnjhBe3fv19VVVV65513JEmXLl3S3r17dfjwYZ07d04dHR06efKkJGnTpk3asmWL7rvvPq1cuVInTpzQkiVLnGgeAACSoi6mk0E2K/RKH+MNAaC4OBISDxw4oHXr1qmqqkqS9JnPfEaSFAqF1NraqoqKClVUVMgYo4mJCc2ePVtnzpxRW1ubJGn16tUKhUKERABAzoTv+2iklANjIVT/Cj3oAgCS50h300OHDunUqVNasmSJvvzlL2toaEiS1N/fr9raWut51dXVCoVCGhkZUVlZmTXf7/fr+PHjTjQNAIDEro5nLKYKWtzur74MJwCAp6RdSVy2bJkuXrw4bf5TTz2ljz76SO+9956OHj2qvr4+feUrX9F//Md/yNj8tfXZ/PGwe15YZ2en9XMgEFAgEEir/QAAJCX6b1IWbrmRa3mvdBZKUCymswYAXCMYDCoYDOb0PX0mXiJL06OPPqpAIKD7779fkvSHf/iHOnv2rHp7e9XX16ddu3ZJkurq6nT06FGVlpaqsrJSZ8+elSQ9++yzmjVrljZs2DC1sT5f3AAJwJ7v8ckDNPNNvj+AI5IMQZG32ch7cEuBXbj1UvuzhmMQAC6Qi0zkSHfTu+66SwcOHJAxRqFQSAsWLNCsWbPU1NSkgwcP6vz58woGgyopKVFpaakkqaamRt3d3RobG1NPT4+am5udaBoAANmX4hVTw9w+DjDc4TTeY4kmAID3OHLhmpUrV+rQoUPy+/2qqanRt771LUlSeXm51q9fr6VLl2rGjBnq6uqyXrNz5061t7frscce05o1a7hoDQDAm9LsnhoZqCKrdG4MWkVZRQSAIuJId1On0N0USA/dTQEX8flsu50mCokFFcz4Ww4AactFJnKkkggAAGIwxop9hZT7UpLNC9kQOAEg6wiJAADkSfjejJKKNzDaSKmLrY+cCADZRkgEAMAFpgYdM/2nLIZIN11dNToQWvdmTG0h05EcASBthEQAALwgi/drtBYRYxxkLjn2vm64NyNBFYBHERIBAPCiWAHEDeEoQ6le0dUtVVEAKBSERAAACkm86lUKAdKNt96IJZO2EjABYDpCIgAAxcJY11VVKldXJUgBQHEhJAIAUKRiXSwn2Yqjmy6Ak3OMNwRQwAiJAABgKpPePRx9Vo2yCIKjU2M/CZ8AXICQCAAAppmSVTIIjEiRA9mT3AkgVYREAAAQX8wrqWZh0cVQdYxCgAbgdoREAACQlsnsGDmWMbPlEZ4AwB0IiQAAICuuFRzTG9MIAHAHQiIAAMg6YzK/tkshdkVNtlrKOEIA+URIBAAAjkgu6CSXJumKCgC5Q0gEAAD5FU6TaRQO81FtdDqwUkUEkG+ERAAA4ArTwpFLe5tmI5hSGQXgZoREAADgTia1K6f6ZGTky2kASzcwxn1dNsIx5UgAGSAkAgAA17N6pLqsuujaimA6XXdduioAcq/EiYUODQ3pz/7sz1RXV6cVK1ZoeHjYemz37t2qqqqS3+/XsWPHrPnDw8NqaGhQZWWltm7d6kSzAACAxxkzdcok2Rj5XDkBQL45EhKfeOIJ/dVf/ZX+8z//U3/5l3+pJ554QpJ06dIl7d27V4cPH9a+ffvU0dFhvWbTpk3asmWLBgYGdOTIEZ04ccKJpgEAgEJjJcbU5T8STp/yxeebOgEoXo50N50zZ47effdd/d///Z/effddffrTn5YkhUIhtba2qqKiQhUVFTLGaGJiQrNnz9aZM2fU1tYmSVq9erVCoZCWLFniRPMAAEABmsyJJuMxffmu5rm2CyuAouFISHzmmWfU1NSkb3zjG7r55putqmB/f79qa2ut51VXVysUCmn+/PkqKyuz5vv9fv3rv/6rNmzY4ETzAABAAYtVVEylOua1oOZIsE1mkQxkBApS2iFx2bJlunjx4rT5Tz31lP7lX/5FX/3qV/Xwww/r29/+ttauXasf/OAHMjY7Ep/NHtvueWGdnZ3Wz4FAQIFAIK32AwCA4pKNW2wkCmP5CpdOvm++K6tAsQsGgwoGgzl9T5+Jl8jSdNNNN+ncuXP6/d//fU1MTOi2227TxYsX9dprr6mvr0+7du2SJNXV1eno0aMqLS1VZWWlzp49K0l69tlnNWvWrGmVRJ/PFzdAArDne3zyD7z5Jt8fAEjIF/s2GsmGxHwHq2yGxrjrwnEZkHO5yESOXLjm3nvv1f79+yVJr776qpYtWyZJampq0sGDB3X+/HkFg0GVlJSotLRUklRTU6Pu7m6NjY2pp6dHzc3NTjQNAAAgviwefEVfjMaLF7+J+16+6Re8iTUB8A5HKomDg4N68sknNTQ0pEWLFunv/u7vVFNTI0natWuX9uzZoxkzZqirq0t33323pMnbZrS3t+v999/XmjVrtG3btumNpZIIpIVKIgBkSYK0E11JjPXvRFJ9vtdwOAekLxeZyJGQ6BRCIpAeQiIA5JgvuyEx391XE0k1zHI4B6QvF5nIkaubAgAAFLXwAZy7s13WpBxi7Z5OcgRcg5AIAADgkGu5x0T8NwlphEvPd01NYZ3Jk4CzCIkAAABuY66Fykwu+hKvwuflUJnLC+EQSFGMCIkAAAAuNhm8w0oeAAANf0lEQVRSwt1Xc5OO3D4GMlu8HJQBJxESAQAAvCJeWcuhXOeTkVHse0cWipjBuJDyMmVRJImQCAAAUABsj/+zHHDsglS+wmMxBFcgXwiJAAAAhSpO5ShuvMowXOaqu6qbu8VmEmDdvF4oDoREAAAATBUZLjPIK3ZBqVgCULGsJwoTIREAAAAxhS+cYyRHxud5vcuoY2GQ8YPII0IiAAAAkmMTXDK9Tce15SR3u45cVeiSDa+ZhlwqjnAjQiIAAAAyMj07hm/Z4cz7ha+46iQnlx8ZLPMRgIFECIkAAABwhH2Pyeze8zGVSp7rQxhdTOEShEQAAADk3tVAZMWiHOQ3t1TtuH0H3I6QCAAAgPxL+nYdJuuBMhfdV6PZvl++C51UMnEVIREAAACeEr7iqqT8ByugABESAQAA4FmpFL+yNAwSKHiERAAAABSFqYEyuxfQCQuPNXT9RXLsJPlZJDOekp6r3kZIBAAAQPEyRlGX0JmUhfBYzBenCX98hEVvKkn3ha+88ooWLlyo6667TidPnpzy2O7du1VVVSW/369jx45Z84eHh9XQ0KDKykpt3brVmn/lyhWtW7dO8+fPVyAQ0MWLF9NtFgAAAJA5Y65N2V60fAU/wdvSDom33367enp6dM8990yZf+nSJe3du1eHDx/Wvn371NHRYT22adMmbdmyRQMDAzpy5IhOnDghSerp6dHly5c1PDys1tZWPfnkk+k2CwAAAMiuyMAYb8qQFyqPyUREeF/a3U1ramps54dCIbW2tqqiokIVFRUyxmhiYkKzZ8/WmTNn1NbWJklavXq1QqGQlixZolAopPb2dl1//fV66KGH9IUvfCHdZgEAAAD5YXVdlTK5VYfXg1Zk+xP12qU7qjulXUmMpb+/X7W1tda/q6urFQqFNDIyorKyMmu+3+/X8ePHrdf4/X5J0g033KDR0VF9/PHH2W4aAAAAkDMO9lgtGD5fchNyK24lcdmyZbbjA59++mmtWLHC9jXG5lvgs/nNGmOs+caYKa+zWwYAAADgVbZXVo1GGIJLxA2Jvb29KS+wublZfX191r9Pnz6txsZGlZaWanR01Jo/NDSk5uZm6zVDQ0Oqrq7We++9p/Lycs2cOdN2+Z2dndbPgUBAgUAg5TYCAAAAbjMZJNPvporCFAwGFQwGc/qeWbkFRmTlr6mpSY8++qjOnz+vs2fPqqSkRKWlpZImxzF2d3frvvvuU09Pj55//nlJkyHxpZde0vLly/XCCy/ozjvvjPlekSERAAAAKDTxOtXR9bL4RBfGHn/8ccffM+2Q2NPTo46ODo2Njen+++9XfX29Dhw4oPLycq1fv15Lly7VjBkz1NXVZb1m586dam9v12OPPaY1a9ZoyZIlkqQ///M/149//GPV1taqsrJS3d3dma8ZAAAAUGAYlYVc8BkPDQD0+XyMVwQAAABQtHKRibJ+dVMAAAAAgHcREgEAAAAAFkIiAAAAAMBCSAQAAAAAWAiJAAAAAAALIREAAAAAYCEkAgAAAAAshEQAAAAAgIWQCAAAAACwEBIBAAAAABZCIgAAAADAQkgEAAAAAFgIiQAAAAAACyERAAAAAGAhJAIAAAAALIREAAAAAICFkAgAAAAAsBASAQAAAAAWQiIAAAAAwJJ2SHzllVe0cOFCXXfddfrZz35mze/t7dWSJUu0ePFirVq1Sv39/dZjw8PDamhoUGVlpbZu3WrNv3LlitatW6f58+crEAjo4sWL6TYLAAAAAJCBtEPi7bffrp6eHt1zzz3y+XzW/M985jP60Y9+pF/+8pfauHGjNm/ebD22adMmbdmyRQMDAzpy5IhOnDghSerp6dHly5c1PDys1tZWPfnkkxmsEpAfwWAw300AbLFtwq3YNuFmbJ8oZmmHxJqaGn3uc5+bNr+urk433XSTJOnuu+/WqVOn9Mknn0iSzpw5o7a2Nt14441avXq1QqGQJCkUCqm9vV3XX3+9HnroIWs+4CX8MYFbsW3Crdg24WZsnyhmjo5J/Ld/+zfddddduu666zQyMqKysjLrMb/fr+PHj0uS+vv75ff7JUk33HCDRkdH9fHHHzvZNAAAAACAjd+L9+CyZctsxwc+/fTTWrFiRdwFv/nmm/r7v/979fb2SpKMMVMeN8ZY3VSNMVMej34uAAAAACBHTIYCgYD52c9+NmXehQsXzOc+9znz05/+dMr8W2+91fp5586d5h//8R+NMcZs3LjR/Pu//7sxxph3333XfP7zn7d9rwULFhhJTExMTExMTExMTExMRTktWLAg0wiXUNxKYrJMROXvv//7v3X//fdr+/btuuuuu6Y8r6amRt3d3brvvvvU09Oj559/XpLU3Nysl156ScuXL9cLL7ygO++80/Z9RkZGstFcAAAAAEAMPmPS69vZ09Ojjo4OjY2Nac6cOaqvr9eBAwf05JNP6h/+4R9UVVVlPbe3t1dz587V0NCQ2tvb9f7772vNmjXatm2bpMlbYDz88MPq6+tTZWWluru7rYvfAAAAAAByJ+2QCAAAAAAoPI5e3TSb3njjDdXW1qqqqkp79uzJd3NQoG655RYtXrxY9fX1ampqkiSNj49r5cqVqqio0KpVqzQxMWE9f/fu3aqqqpLf79exY8es+cPDw2poaFBlZaW2bt1qzb9y5YrWrVun+fPnKxAI2F4YCghbu3atysvLdfvtt1vzcrU9vvLKK6qurlZ1dbV++MMfOrym8Bq7bbOzs1Pz5s1TfX291bsojG0TuXLhwgXde++9WrhwoQKBgF5++WVJ7DvhDrG2T1fuPx0f9ZgldXV15siRI+att94y1dXV5p133sl3k1CAbrnlFvPuu+9Ombd9+3bzla98xXz00Udmw4YN5plnnjHGGDM6Omqqq6vNf/3Xf5lgMGjq6+ut1/zpn/6p6e7uNmNjY+aP//iPzcDAgDHGmO9///vmgQceMB9++KHZtm2b2bBhQ+5WDp7zxhtvmJMnT5pFixZZ83KxPX7yySemsrLSvPnmm+YXv/hFTgbIw1vsts3Ozk7z7LPPTnsu2yZy6e233zY///nPjTHGvPPOO+bWW281H3zwAftOuEKs7dON+09PVBIvX74sSbrnnns0f/58LV++XKFQKM+tQqEyUT2w+/v7tW7dOs2cOVNr1661tr1QKKTW1lZVVFSopaVFxhjrzOSZM2fU1tamG2+8UatXr57ymvb2dl1//fV66KGH2I4R1913361Pf/rTU+blYnscHBzUokWLtGjRIi1evFh+v1+Dg4M5XHO4nd22KU3ff0psm8itm266SXV1dZKkuXPnauHChRoYGGDfCVeItX1K7tt/eiIkDgwMqKamxvq33+/X8ePH89giFCqfz6elS5dq1apV2r9/v6Sp219NTY36+/slTX4Ja2trrddWV1crFAppZGREZWVl1vzI7bW/v19+v1+SdMMNN2h0dFQff/xxTtYNhcHp7fGjjz5SKBSy5ke/Bohnz549uvPOO7V9+3aNj49LmtzO2DaRDyMjIxocHFRTUxP7TrhOePtsbm6W5L79pydCIpArP/nJT/SLX/xC27Zt08aNG3Xx4kXbMzux+Hy+afOMMdZ8Y8yU5aWybEBKbZvJ5vZotywg0vr163Xu3DkdPHhQv/71r9XV1SXJfrti24TTxsfH1dbWpueee06zZ89m3wlXidw+/+AP/sCV+09PhMTGxkadPn3a+vfg4GDMeykCmbj55pslSbW1tfrSl76k1157TY2NjRoeHpY0OUi4sbFR0uT9PYeGhqzXnj59Wo2Njbrttts0OjpqzR8aGrLOEkW+5r333lN5eblmzpyZk3VDYXB6e5w1a9a0ZUW+BoilrKxMPp9Pc+bM0YYNG9TT0yOJbRO5d+XKFT3wwAN68MEHtXLlSknsO+EedtunG/efngiJc+bMkTR5hdO33npLvb29fOmQdf/zP/9jlfffeecdHTx4UK2trWpubtaLL76o3/72t3rxxRetExRNTU06ePCgzp8/r2AwqJKSEpWWlkqa7MrS3d2tsbEx9fT0TPnivvTSS/rwww/1wgsvcLIDKcvF9uj3+3Xq1Cm9+eab+uUvf6nBwUEtXLgwPysMz3j77bclSf/7v/+rl19+WV/84hclsW0it4wxWrdunRYtWqSvfe1r1nz2nXCDWNunK/efSV2KxwWCwaCpqakxCxYsMLt27cp3c1CAzp49a+644w5zxx13mKVLl5rvfve7xhhjPvjgA/OlL33JfPaznzUrV6404+Pj1muef/55s2DBAlNbW2veeOMNa/7g4KCpr683t9xyi/nGN75hzf/d735n/uZv/sZ89rOfNS0tLebtt9/O3QrCc9asWWNuvvlmM2PGDDNv3jzz4osv5mx7/P73v2+qqqpMVVWV+cEPfpCbFYZnhLfNT33qU2bevHnmu9/9rnnwwQfN7bffbj7/+c+br3/961OuFM22iVw5evSo8fl85o477jB1dXWmrq7OHDhwgH0nXMFu+3z99ddduf/0GcOgKAAAAADAJE90NwUAAAAA5AYhEQAAAABgISQCAAAAACyERAAAAACAhZAIAAAAALAQEgEAAAAAFkIiAAAAAMBCSAQAAAAAWP4fyzwBccqyU+QAAAAASUVORK5CYII=)\n", "\n", "\n", "##
The main result is with a smart algorithm and smart data structure
we can accelerate the basic brute force approaches by **over 300 fold!**
##\n", "\n", "Note the algorithms and data structures presented are also generally useful for many other genomic analyses working with intervals. \n", "\n", "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

Table of Contents

\n", "\n", "** Part I: Brute Force Analysis **\n", "\n", "1. [Extract read placements from the Celera Assembler](#extract)\n", "2. [Load the read positions](#load)\n", "3. [Plot the read layouts](#plotlayouts)\n", "4. [Brute-Force coverage analysis](#bruteforce)\n", "5. [Compute the max coverage](#maxcoverge)\n", "6. [Plot the read depth histogram](#plothistogram)\n", "7. [Delta encode the coverage profile](#deltaencode)\n", "8. [Plot the coverage profile](#plotcoverage)\n", "\n", "** Part II: Plane Sweep Analysis **\n", "\n", "1. [Plane-sweep coverage](#planesweepcoverage)\n", "2. [Checking for errors](#checkerrors)\n", "3. [Refine the plane-sweep update rule](#fixplanesweep)\n", "4. [Heap-based plane-sweep algorithm](#heap)\n", "5. [Max-coverage read ids](#maxreadids) \n", "6. [Plot the max depth position](#plotmaxdepth) \n", "7. [Summary](#summary)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

Part I: Brute force analyiss

\n", "\n", "1. Extract the PacBio read placements from the Celera Assembler\n", "---------------------------------------------------------------\n", "\n", "To get started, we will first load in the layout of PacBio reads within a contig assembled using the Celera Assemler. In this example, we will be using the read placements from our recent assembly of the [SKBR3 Breast Cancer Cell Line](http://schatzlab.cshl.edu/data/skbr3/). The information of how reads are placed is stored within the \"tigStore\", which is an on-disk binary database. Because it is a binary database, you cannot read it directly, but instead should use the 'tigStore' command to query or update the database.\n", "\n", "These commands will display the unitig layouts:\n", "\n", " $ cd /seq/schatz/mnattest/skbr3/local_assembly/assembly_HLA_Feb17/CA/r1\n", " \n", " $ /sonas-hs/schatz/hpc/home/gurtowsk/sources/wgs.svn/Linux-amd64/bin/tigStore -g r1.gkpStore -t r1.tigStore/ 1 -d layout -U | head -20\n", " unitig 0\n", " len 0\n", " cns\n", " qlt\n", " data.unitig_coverage_stat 1.000000\n", " data.unitig_microhet_prob 1.000000\n", " data.unitig_status X\n", " data.unitig_suggest_repeat F\n", " data.unitig_suggest_unique F\n", " data.unitig_force_repeat F\n", " data.unitig_force_unique F\n", " data.contig_status U\n", " data.num_frags 2884\n", " data.num_unitigs 0\n", " FRG type R ident 18318 container 0 parent 16010 hang 2962 1054 position 19674 0\n", " FRG type R ident 2248 container 18318 parent 18318 hang 217 -4192 position 15480 218\n", " FRG type R ident 8536 container 2248 parent 2248 hang 609 -4597 position 827 10881\n", " FRG type R ident 10602 container 2248 parent 2248 hang 848 -420 position 15057 1067\n", " FRG type R ident 16010 container 0 parent 18318 hang 1054 2962 position 22636 1089\n", " FRG type R ident 4611 container 18318 parent 18318 hang 1155 -3014 position 1157 16657\n", " \n", "This shows the first few reads (fragments, coded as FRG) within unitig 0. Note here we are looking at the \"version 1\" unitigs, meaning the raw output of the unitigger based on the overlapper before consensus has run. Consequently the unitig length is listed as 0 bp, although there are many reads (2884) reads in this unitig. Also note the unitig consensus and quality strings are also both empty for this reads.\n", "\n", "\n", "** Finding the longest reads in the assembly** \n", "\n", "For the coverage analysis, the most important values are the last two numbers for the FRG records which encodes the position of the read in the contig. These positions are based on the pair-wise overlap of each read with its immediate neighbors, so the positions may change a little during consensus generation but will show the overall trends. For example, the first read in this unitig is read 18318, and spans from position 19674 to position 0. Because the start position is greater than the end, we know this read has been reverse complemented and really starts at position 0.\n", "\n", "For PacBio assemblies, we are often interested in how well the longest reads were assembled. This command will determine which unitig has the longest read in it by annotating each FRG record with the unitig id and the read length as fields 1 and 2:\n", "\n", " $ /sonas-hs/schatz/hpc/home/gurtowsk/sources/wgs.svn/Linux-amd64/bin/tigStore -g r1.gkpStore -t r1.tigStore/ 1 -d layout -U \\\n", " | awk '{if (/^unitig/){tig=$2}else if(/^FRG/){l=$15-$14;if(l<0){l*=-1;} print tig,l,$0}}' \\\n", " | sort -nrk2 | head -10 | column -t\n", "\n", " 3 37839 FRG type R ident 5708 container 0 parent 10683 hang 122 20199 position 416262 454101\n", " 3 36519 FRG type R ident 5022 container 0 parent 167 hang 1199 9553 position 402020 365501\n", " 4 35962 FRG type R ident 3652 container 0 parent 10438 hang 199 13564 position 15094 51056\n", " 0 34550 FRG type R ident 11392 container 0 parent 11328 hang 1010 18989 position 182086 216636\n", " 15 34302 FRG type R ident 2622 container 0 parent 5938 hang 6855 7174 position 948470 914168\n", " 15 33695 FRG type R ident 16040 container 0 parent 16226 hang 5741 9003 position 539963 573658\n", " 15 33671 FRG type R ident 5938 container 0 parent 11492 hang 3720 16069 position 941296 907625\n", " 17 33504 FRG type R ident 2341 container 0 parent 9030 hang 192 15632 position 382387 415891\n", " 3 33148 FRG type R ident 8132 container 0 parent 7619 hang 1171 14849 position 162425 129277\n", " 3 32963 FRG type R ident 5377 container 0 parent 14108 hang 1184 17010 position 102322 69359\n", " \n", "This means that unitig 3 (column 1) has the longest read at 38,839bp which is read 5708. It also has the second longest read with 36,519bp, which is read 5022.\n", "\n", "\n", "** Report the layout of a selected unitig **\n", "\n", "This command will report the read layout for just unitig 3, and just save the readid, start position, and end position to a file:\n", "\n", " $ /sonas-hs/schatz/hpc/home/gurtowsk/sources/wgs.svn/Linux-amd64/bin/tigStore -g r1.gkpStore -t r1.tigStore/ 1 -d layout -u 3 \\\n", " | grep '^FRG' | awk '{print $5,$14,$15}' > ~/readid.start.stop.txt\n", " \n", "With ``head`` and ``tail`` we can check on the output format\n", "\n", " $ head -3 ~/readid.start.stop.txt\n", " 1 0 19814\n", " 2 799 19947\n", " 3 1844 13454\n", " \n", " $ tail -3 ~/readid.start.stop.txt\n", " 1871 973590 965902\n", " 1872 966703 973521\n", " 1873 973632 966946\n", "\n", "Now we can load this file and examine the coverage across the contig" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "2. Load the read position file\n", "------------------------------\n", "\n", "The expected file format is: ``readid \\t startpos \\t endpos``\n", "\n", "If ``startpos > endpos``, the read has been reverse complemented\n", " \n", "It would also be very easy to update the code to process BED, GTF, GFF or even SAM files.\n", "\n", "Here is the example readid.start.stop.txt file from the [breast cancer assembly](http://schatzlab.cshl.edu/teaching/exercises/coverage/readid.start.stop.txt)." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import matplotlib.pyplot as plt\n", "from collections import deque\n", "import heapq\n", "import time\n", "\n", "\n", "## Limit the number of reads to load, -1 for unlimited\n", "MAXREADS = -1\n", "\n", "## Limit the number of reads to plot\n", "MAX_READS_LAYOUT = 500\n", "\n", "## Path to reads file\n", "READFILE = \"/Users/mschatz/build/planesweep/readid.start.stop.txt\"\n", "print \"Loading reads from \" + READFILE\n", "f = open(READFILE)\n", "\n", "## This will have a list of tuples containing (readid, start, end, rc)\n", "## rc is 0 if the read was originally oriented forward, 1 for reverse\n", "\n", "reads = []\n", "totallen = 0\n", "readidx = 0\n", "\n", "for line in f:\n", " line = line.rstrip()\n", " fields = line.split()\n", " readid = fields[0]\n", " start = int(fields[1])\n", " end = int(fields[2])\n", " rc = 0\n", "\n", " if (start > end):\n", " start = int(fields[2])\n", " end = int(fields[1])\n", " rc = 1\n", "\n", " readinfo = (readid, start, end, rc)\n", " reads.append(readinfo)\n", "\n", " if (end > totallen):\n", " totallen = end\n", "\n", " readidx += 1\n", " if (readidx == MAXREADS):\n", " break\n", "\n", "print \"Loaded layout information for %d reads\" % (len(reads))\n", "\n", "for i in xrange(3):\n", " print \" %d: %s [%d - %d] %d\" % (i, reads[i][0], reads[i][1], reads[i][2], reads[i][3])\n", "\n", "print \" ...\"\n", "\n", "for i in xrange(len(reads)-3, len(reads)):\n", " print \" %d: %s [%d - %d] %d\" % (i, reads[i][0], reads[i][1], reads[i][2], reads[i][3])\n", "\n", "print \"\\n\\n\"\n" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Loading reads from /Users/mschatz/build/planesweep/readid.start.stop.txt\n", "Loaded layout information for 1873 reads\n", " 0: read1 [0 - 19814] 0\n", " 1: read2 [799 - 19947] 0\n", " 2: read3 [1844 - 13454] 0\n", " ...\n", " 1870: read1871 [965902 - 973590] 1\n", " 1871: read1872 [966703 - 973521] 0\n", " 1872: read1873 [966946 - 973632] 1\n", "\n", "\n", "\n" ] } ], "prompt_number": 3 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "3. Plot the read layouts" ] }, { "cell_type": "code", "collapsed": false, "input": [ "## Note to keep it tractable, we only plot the first MAX_READ_LAYOUT reads\n", "\n", "plt.figure(figsize=(15,4), dpi=100)\n", "\n", "print \"Plotting layout\"\n", "\n", "## draw the layout of reads\n", "for i in xrange(min(MAX_READS_LAYOUT, len(reads))):\n", " r = reads[i]\n", " readid = r[0]\n", " start = r[1]\n", " end = r[2]\n", " rc = r[3]\n", " color = \"blue\"\n", "\n", " if (rc == 1): \n", " color = \"red\"\n", "\n", " plt.plot ([start,end], [-2*i, -2*i], lw=4, color=color)\n", "\n", "plt.draw()\n", "plt.show()\n" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Plotting layout\n" ] }, { "metadata": {}, "output_type": "display_data", "png": "iVBORw0KGgoAAAANSUhEUgAAA4kAAAEACAYAAAAeDjNsAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3X9sVfX9x/HX6b5DdDZEUNAEWy12bS8oP6QFs0ArAezI\nEINZIEt1mWwSgyMqIWjIsmpU4pSNQgZBo1uMM6hLGjUZ49dSfizhtozNSVtIOnC4RAqIUXCibPt8\n/yj32Nvee3vv7T3nfM65z0fSRE5v7z2XHK48eX/OOY4xxggAAAAAAEklQe8AAAAAAMAeRCIAAAAA\nwEUkAgAAAABcRCIAAAAAwEUkAgAAAABcRCIAAAAAwGVVJO7bt081NTWqrKzUpk2bgt4dAAAAACg6\njk33SZw6dapaWlpUXl6uu+66SwcOHNC1114b9G4BAAAAQNGwZpL46aefSpJmz56t8vJyzZ8/X/F4\nPOC9AgAAAIDiYk0kdnR0qLq62v11LBbTwYMHA9wjAAAAACg+1kQiAAAAACB4/xf0DiTU1tZq9erV\n7q87OzvV2NiY9BjHuUXSP3zeMwAAAACww4QJE9TT0+Ppa1gTiaNGjZLUd4XTsrIy7dq1Sz//+c8H\nPOofklJfZ8fIyfq1HJlBj3fSPK+NcnmvhXvR8Pz+BKW5uVnNzc1B7wYwCMcmbMWxCZtxfMJWjuN9\nC1gTiZK0YcMGLV++XJcuXdLKlStTXtk0basU8PcqkAjLUphiFgAAAED4WBWJ9fX16u7uzu+Hc5l0\n5diAtoVZIPvjQTcznAQAAADsY1Uk+qUvTgYUir3DQ4REQ0ND0LsApMSxCVtxbMJmHJ8oZo4x4Znn\nOI4jX3bXsfN8xUIsg7XlvUhMEgEAAIBc+dFERTlJHNLA33RLpow2BR4AAACAaCISszCoGdNEY5iv\nmDqQJxfvYXQIAAAAWK8k6B0II1oHAAAAQFQxScxT/1DM5lYlttxWI8zTTQAAAADeY5JYAMZkP10k\n0gAAAADYjEliIQ0oRaPMU0abg9GRYVktAAAAUISIRI+5oZXjalMrlqcO3AWqEQAAAIg8lpv6JZc1\nqf1YNW10nOxOwAQAAAAQWkwSfWaM8rrvol2xmP5bDBsBAACAcCMSg3C5pAb21FBDOiuWoA4lh13M\nFL7EJgAAABAMItEiyWF0+RchvfDNcDkOoQgAAAAEgUi0XDbLU0MxYcxHod8W1QkAAAAMiQvXhAFx\nAwAAAMAnTBJDoq8TB8RiRAeInhnulVmJdQAAABQBIjHE8r0HYzpBnuMY2SWzAAAAQMgQiVGQacJV\ngPsapgq4QgflcJ6PwAQAAAAKh0iMuqGWSA4zIm0LNE+noZffKqtOAQAAEGVEYrHrXzx29V5efInW\noV6CigQAAECIEYlwpW2bLLsryvdtHMi2CSoAAABQKEQihpaiHo2yW6lalDE18DeGySIAAABChPsk\nIm+5tE8xTRkBAACAMGOSiOGJ2DmNAAAAQLEjElEwKSeLKcKx2KaKxfVuAQAAEHZEIryVKEemjAAA\nAEAoEInwhTHKOxSNHGunj1ldmMcRF68BAABAaBCJ8M8wpoqhv0pqNpeCTSAoAQAAECAiEb5L1UC5\nNJQNMk02Qx+0AAAAKGpEIqyQcXiWobn8XIbaP/4IQQAAAEQVkQj7GWPFqNG3IPXhrbKiFQAAAOkQ\niQiHpPsxZq6oxJQv6IvdFGLaGPR7AAAAQPEhEhE+ScEY3G54hTAEAABAkIhEhJolK1FTIvYAAAAQ\nRkQiQm/Q+XVZRmMYLj6TTWhyfiEAAAAKiUhE9OR4P8YwTfxShq39rUvJAgAAhEiJF0+6evVq1dTU\naNq0aXrkkUf0xRdfuN/buHGjKisrFYvFdODAAXd7d3e3pk2bpoqKCq1du9aL3UKRMSb5K+3jLKgs\nIyerLwAAAMBrnkTi/Pnz1dnZqUOHDunzzz/X66+/Lkk6ffq0Nm/erD179mjLli1auXKl+zOrVq3S\nmjVr1NHRob179+rQoUNe7BqK2FCxGKTsEnF4XwAAAEA2PInEefPmqaSkRCUlJbrrrru0d+9eSVI8\nHldjY6PKyspUX18vY4wuXLggSTp27JiWLFmiMWPGaPHixYrH417sGpBVLDK5AwAAQLHyJBL7e+ml\nl7Rw4UJJUnt7u2pqatzvVVVVKR6Pq6enR2PHjnW3x2IxHTx40OtdQ7HLcrTo/YzPm6+BPJsqDlzX\nm+oLAAAAoZH3hWvmzZunU6dODdr+7LPPulH41FNPqbS0VN///vclSSbFXxadFPcvSPW4hObmZve/\nGxoa1NDQkOOeAwMY42ZTuttpODKhmywWZH8JPAAAgEC1tbWpra3N19d0TKYiG4bf/va3eumll7Rn\nzx6NHDlSkvTuu+9q9+7damlpkSRNmTJF+/fvV2lpqSoqKnT8+HFJ0vr16zVy5EitWLEieWcdJ2NA\nAgVzuRYTcRjGSPQMfwYBAAAC40cTebLc9I9//KOef/55vfPOO24gSlJdXZ127NihkydPqq2tTSUl\nJSotLZUkVVdXa9u2bTp79qxaW1s1Y8YML3YNyM7lZZK5/PkLfoGpP18AAACINk8miZWVlfrqq680\nevRoSdIdd9yhzZs3S5JaWlq0adMmjRgxQlu3btWsWbMkSV1dXWpqatInn3yipUuXat26dYN3lkki\nguAwSUxwlFs4AwAAoLD8aCLPlpt6gUiENdKdvJj4dr+Jm01xWahJIH8MAQAAguFHE+V94RqgqA38\ngzlENNqiYMGa69NQlQAAAKHh+S0wgKJABAEAACAimCQChdI/FLOctIXpQjDDmkL2n7QS1AAAAFYj\nEgEPJHVQhrby+3zF4URpwYI2i7dMRwIAAASHSAS8NsT5i2GaJgIAACD6iETAb0nLUu294M1wp5zE\nLwAAQDgRiYAlbLpVRiFkej8EJAAAgL2IRCBIxuR+O4kIGDKI+3+bExQBAAB8RSQCATMmu1Wntk/f\nPJuEOg6hCAAA4CPukwhYgAYCAACALZgkArbIdqRoqVwmnVE7/xIAACBKiETAJulGiiGORwAAAIQL\nkQiEgTEyyq0VQzGtY50tAACAdYhEIEQSTRWWC92EIlQBAACQhEgEQigspy8OGao5vgcGjwAAAN4j\nEoGQGjKYLlekDRPFhGFPFtP9OPUIAABQMEQiUARsWvbpSbRm+fZoSQAAgKERiUCRCWKy2D9S8w1W\nmyaiAAAAUUYkAkXGpqliLnLdb6ISAAAgP0QiEFXupVCH/1RRCa7hXOyHpaoAAKBYEIlAxPXFTeEu\nhxrWSWRUQhcAAMBrRCJQLHK5yeIQwhRcYY1aAACAoBCJQLHJdt1kljEZlgjz7PYbUcKaWgAAICIR\nQDr9giEpHdLE0sDpoq3x6MUU1Nb3CgAAkA8iEUBujMlqqubIWBlPNu4TAACATYhEADlLWpWYQ3OF\n6VzGoRCbAAAgqohEAMOT4602EnEVpWDMGuf8AQCAECASARTEwP4p0B03rJWIXCaKAAAgaohEAJ5w\n78/YXwR7qv9ElEEhAACIAiIRgG+MUU7LUsO2JHWo6SkRCQAAwoBIBOAvk1im2Sfqy1IBAADChkgE\nECj3ujdZxGKYz/8L21QUAAAULyIRgBVyicVMiDEAAIDhIRIBWOXr8/ZM3he6sek2G0nTzyzfT6r9\n5nxGAADgFyIRgLUGhVEe0RjGJaop93m4b4PKBAAAWSrx8snXr1+vkpISnTt3zt22ceNGVVZWKhaL\n6cCBA+727u5uTZs2TRUVFVq7dq2XuwUgrIYZOo5MoNPFxOun+wIAALCBZ5H44YcfateuXSovL3e3\nnT59Wps3b9aePXu0ZcsWrVy50v3eqlWrtGbNGnV0dGjv3r06dOiQV7sGIMyMyfw1DENF3HC/sn19\nAACAIHkWiY899ph+8YtfJG2Lx+NqbGxUWVmZ6uvrZYzRhQsXJEnHjh3TkiVLNGbMGC1evFjxeNyr\nXQMQYcbIDcZCrLD0NhtTf3nCcXL/AgAARcmTSHz77bc1fvx43XbbbUnb29vbVVNT4/66qqpK8Xhc\nPT09Gjt2rLs9Fovp4MGDXuwagGJTgCmj/5no/xQTAAAgIe8L18ybN0+nTp0atP2ZZ57RunXrtHPn\nTnebSdw8O8Vf0pwU/1qd6nEAMFxJV05NKJKJWV6h2O+3ho9lAACKR96RuGvXrpTbjxw5ohMnTmjy\n5MmSpH/961+6/fbbFY/HNWPGDO3evdt97NGjR1VbW6vS0lL19va627u6ujRz5syUz9/c3Oz+d0ND\ngxoaGvJ9CwCQVD+DLqZaHP0IAAAs1tbWpra2Nl9f0zEej+1uvvlm/eUvf9Ho0aPV29ur+vp67dy5\nU8ePH9djjz2mw4cPS5IWLFig+++/X3PnztU999yjDRs2aPr06ck76zhMGQH4jlhkkggAgC38aCLP\n75PYfznpuHHj9NBDD2nOnDkaMWKEtm7d6n7vhRdeUFNTk5544gktXbp0UCACQFASn8NDxaKt92Tk\nnEQAAJALzyeJhcQkEUDQMoWirZGYFp+nAACETiQmiQAQJdlOFQdKTPNsCUlHhgvTAACAlIhEAMhD\nyqjKs/9sWA6aLnqJRwAAig+RCACFku+Y0WJ+vhWCFAAAOxCJAFBoqWony9iyZTmql2yYnAIAgPSI\nRADwwdfdmF1ApgqpIAIyXdAVQ8wCAFCsiEQACFguq1T7R5sfoZbra2SaEhKWAACEA5EIAJYwRnlf\n/EayfxnnUPvH+Y8AANiBSAQAm7hjxTx/PKTTOtsDFwCAYkIkAoCFUk+6TE7xSHgBAIB8EIkAECLJ\n8Ri9W24AAIDgEYkAEHaXy9FIQ04abViO6veEk/MPAQDIDZEIAFHSv4jS9GCYlqEW8uqqAAAgO0Qi\nABQpG6aKAADAPkQiAESUMcV3uiJLSwEAGD4iEQAibFA0RT0aU1Ux5QgAQE5Kgt4BAICPCCYAADAE\nJokAUGS+7sQiXI8KAACGRCQCAHLm51VEucAOAAD+IhIBoJjluvz08uTRyPEtFIf9OsNoTFbnAgCK\nEZEIAMieBZdMLdRkkXsqAgCQGpEIAMhNv1AcGGx+hFeur8FyVQAAcsPVTQEAnrEh0BwZ9wsAAAyN\nSSIAIHfG9CVX8A04SLbTTc43BAAgNSIRAOCrRMR5NdnL9LxJ3wsgcAlTAEAYEIkAgPwNqJ5BDZRD\niAW1NJVlqAAAJCMSAQDeMcbKJan9+Rqnlv9eDIlRKAAUBSIRAOCpQV3hQSgxDRzMhosGAQDCiUgE\nAPjrcjUOZ2lqVi8TsUgihAEAfiESAQB2yHYpo5N9/IU1rKIWuACAcCESAQDhkioms2yqooovzh8E\nAOSJSAQAhF7KHsqiB8M6acxGDgPXgqBJASA6iEQAQDTlcGVVI8eKYMw06bRh/wAAxYFIBABEVq5X\nVi2q5agAAKRBJAIAisfAakzRhI6MlbFoy7QTABB9RCIAoGi5zZgmFm3F+X8AAC8RiQAAGNOXhCli\nMTFVtCkauSgNAMBLJV498W9+8xvV1NRo4sSJWrNmjbt948aNqqysVCwW04EDB9zt3d3dmjZtmioq\nKrR27VqvdgsAgPQy1JCRE9ovAABy4RhT+H8fPHLkiH7yk5/o1VdfVWVlpc6cOaPrrrtOp0+f1uzZ\ns7Vz506dOHFCjz76qA4fPixJWrBggX74wx9q7ty5WrRokTZs2KDp06cn76zjyIPdBQBgkMS0LpvI\nsmnK6AX+1wsA9vCjiTxZbrp9+3YtW7ZMlZWVkqTrrrtOkhSPx9XY2KiysjKVlZXJGKMLFy7o6quv\n1rFjx7RkyRJJ0uLFixWPxwdFIgAAfsl0vuKgx0ZgWhf10AUAZM+T5aY7d+7UkSNHNH36dP34xz9W\nV1eXJKm9vV01NTXu46qqqhSPx9XT06OxY8e622OxmA4ePOjFrgEAkBtjchqlBb+4NL+vVG85x7cO\nAIiIvCeJ8+bN06lTpwZtf+aZZ3Tx4kWdO3dO+/fv1+7du/Xwww/rT3/6U8qxqJPi7HuWlAIArNP/\n/00ZrhwThaliwd8C/18HgFDJOxJ37dqV9nv79+9XQ0ODrrzySi1cuFDLly/XxYsXNWPGDO3evdt9\n3NGjR1VbW6vS0lL19va627u6ujRz5syUz93c3Oz+d0NDgxoaGvJ9CwAA5GfQ/RYjEIZeisrvD7EL\nIABtbW1qa2vz9TU9OSfxjjvu0Pbt27VgwQK1t7drwoQJGjlypOrq6rR69WqdPHlSx48fV0lJiUpL\nSyVJ1dXV2rZtm+bOnavW1lZt2LAh5XP3j0QAAMIkqPP+IjHdBIAiNXAw9uSTT3r+mp5E4qJFi7Rz\n507FYjFVV1frl7/8pSRp3LhxeuihhzRnzhyNGDFCW7dudX/mhRdeUFNTk5544gktXbqUi9YAAMIj\ny6WoUl+w+R2LXrwe4QkA0eXJLTC8wi0wAAChMiAYHRk3EoOIxSBEKib5OwgAC4T2FhgAAEApzl1M\n8RCLI6pgEUtcAUCoEIkAAPjEGKW9cmj/ILMlHAu2HwOX4BKNAGA1T+6TCAAA0jAmp0ZKdR9DAAC8\nRCQCABCEHGMRAAC/sNwUAIAgJUrRjhWm/nDSX7SHcAaA4BGJAABYIDmOTHRuQJ9G2vMdc33bVCUA\nFByRCACAjS7Hj5tA0W7GJDmdg+nQiQBQaEQiAABhMOh2GtGtxpyvqprq4ZQjAOSNC9cAABBGw4gg\nm66Wmrh6K1dxBQB7MEkEACCsUoViDhNGv6PMlvs/AgAyIxIBAIiSdBPGIeKxf8B5FY/ZPG/aq57m\nvATVgiBlySuAkCISAQAoBv2DZYh+CnrilyoUMwVm0PsLAFFDJAIAUGT6etFkjEVbzg8kAAHAf0Qi\nAABFKmk1pKUt5nWsEqEAMBiRCAAABixHLaJw4rxBABiESAQAAMlM8lLUdNM2W5ak5sN9T9n2MDEJ\noIgQiQAAYBBjshsoFs1yzYG/GUQjgAgjEgEAQEpuB2XZgWGcLOYduV4tySU+AViASAQAAAVVNNNF\nAIgoIhEAAGSWZrplpLRTxjBOFa3gQV8znASQKyIRAADkL+mqqGkeEqHJIvELoBgQiQAAoCDSncOY\nKazCFpCF2F9CE4DtSoLeAQAAEDGsbwSAUGOSCAAACi7bW2j0x4Tta3Q2gCAxSQQAAJ4whtgBgDBi\nkggAADyVORTTjxyNHKaLABAAIhEAAAQrUZEpWjFsF7YZCtELIAyIRAAAYAW3FVN0YdTjimW5AGxC\nJAIAAKuku5UGAMAfRCIAALBTFpdIDWI5qhdTzf5vk6kigKARiQAAwF4ZzlcMSiHCNOrLZwGEG5EI\nAACsl+l8xQRHJjQXusm4n4V4C4wjAQwDkQgAAEKjf/ukC0Y/p3RhiVIAyAWRCAAAQqkvGIc+b9FL\n1i4bzeO3hOEjgAQiEQAAhJuF5y0CQJgRiQAAIBK+nixm+XhLq9La6SSAolHixZN2dXXpe9/7nqZM\nmaKFCxequ7vb/d7GjRtVWVmpWCymAwcOuNu7u7s1bdo0VVRUaO3atV7sFgAAQBJHxrovAAiaJ5H4\n1FNP6f7779ff/vY3/eAHP9BTTz0lSTp9+rQ2b96sPXv2aMuWLVq5cqX7M6tWrdKaNWvU0dGhvXv3\n6tChQ17sGgAAiDhjvv5C9hwn+QtA8fJkuemoUaP08ccf63//+58+/vhjXXPNNZKkeDyuxsZGlZWV\nqaysTMYYXbhwQVdffbWOHTumJUuWSJIWL16seDyu6dOne7F7AACgSGRzNdRBPxPwMlSmiQCC5kkk\nPv/886qrq9Pjjz+uG264wZ0Ktre3q6amxn1cVVWV4vG4ysvLNXbsWHd7LBbT7373O61YscKL3QMA\nAEVo0GRxiHsu2sb3eM3m5RjXApGUdyTOmzdPp06dGrT9mWee0auvvqqf/vSnWr58uX7961/rgQce\n0JtvvimT4oPESfHPeqkel9Dc3Oz+d0NDgxoaGvLafwAAUOSMSUpB25dYZgrXoKefALzT1tamtrY2\nX1/TMZmKLE/XX3+9Tpw4oSuvvFIXLlzQLbfcolOnTundd9/V7t271dLSIkmaMmWK9u/fr9LSUlVU\nVOj48eOSpPXr12vkyJGDJomO42QMSAAAgGFznLRBNlSM2TiBHK6M75m/lwG+86OJPLlwzZ133ql3\n3nlHkvT2229r3rx5kqS6ujrt2LFDJ0+eVFtbm0pKSlRaWipJqq6u1rZt23T27Fm1trZqxowZXuwa\nAABAZoQPgCLnySSxs7NTTz/9tLq6ujRp0iT97Gc/U3V1tSSppaVFmzZt0ogRI7R161bNmjVLUt9t\nM5qamvTJJ59o6dKlWrdu3eCdZZIIAAB8kG7pabaTxMTjojBZ5K9egF38aCJPItErRCIAAAjUECcu\npotEv88ZLGSc8lcvwC5+NJEnVzcFAACIpCH+YmaUuSPDOFks1AV9iE0gPIhEAACAAjJG2d0+ItNz\nRGi5akL/2CQYAbsRiQAAAIWWqCDuTAEghIhEAAAAj3w9MRteNNp+H8QoTTwBEIkAAAC+yXqZ5RBN\n6EeU2R6mALxDJAIAANjGJK6KmvrCMakCrtDhmOvz5RSVqR7KiYqANYhEAAAAixXiQjh+GHakOnQi\nYAsiEQAAwHbuhXByq8XQLRn1enepUCArRCIAAEBY5BmLqRTlxWby+G2jK1GMiEQAAICwSVEuiS0F\n6MfwTSDzVJShDGSBSAQAAIiQpH4sQOv5HVLFEqiAzYhEAACAqMq0VjKHFjNyfIvFoKZ7aeM0Ss3K\n2llkiUgEAAAoQhl7IU0Y2TblY7ko4A0iEQAAAMlyuEAOoQZED5EIAACA1IzpS0C7BoiesCl2bZvY\novgQiQAAAMgsx1hMRE6q8CKAssT5gwgQkQgAAIDspAoXmg+IHCIRAAAAeevfjYW4RyOA4BGJAAAA\nKAg3GHOIxcSS1LAtQw3rfgPZIBIBAABQWCYRUCm/mfPIsf+5jX5FWbYXshnuBW+MHM4/hHWIRAAA\nAPjL5B6K/Tkynseil8/fPywdGXfySivCFkQiAAAA/OfeizHY3QAwGJEIAACAwHw9PcswRssQkjbd\n37A/zlVEmBGJAAAAsFryMkwTiuljunglHhEGRCIAAABCxRgNOxSDjDUuVgPbEYkAAAAInwKe0+jH\nhXAGv6hFE0WCFQMQiQAAAAitXPrGpi6zisPlVZGsJOgdAAAAAADYg0kiAAAAisKgC+AkMGEEkhCJ\nAAAAKGp98Zg4x5FiBIhEAAAAIMEMLxZtvW9jVgrYx5zeGG5EIgAAADBQ/8rJIxiNHN+C0e8rs4Y6\nhJEVIhEAAADIJN1YzJKlqTZGGxdMDTciEQAAAMhHqgLKoht9vyejj2wMVuSOSAQAAAAKxO3GPDow\nKoHl51JbeCPv+yS+9dZbmjhxor7xjW/o8OHDSd/buHGjKisrFYvFdODAAXd7d3e3pk2bpoqKCq1d\nu9bdfunSJS1btkzl5eVqaGjQqVOn8t0tAAAAIHjGyJjcl1saOVZ/oTjkHYm33nqrWltbNXv27KTt\np0+f1ubNm7Vnzx5t2bJFK1eudL+3atUqrVmzRh0dHdq7d68OHTokSWptbdWnn36q7u5uNTY26umn\nn853t4DAtLW1Bb0LQEocm7AVxyZsVsjjMxGL2UZj8CmY/isb/R/nOOm/YK+8I7G6ulrf/va3B22P\nx+NqbGxUWVmZ6uvrZYzRhQsXJEnHjh3TkiVLNGbMGC1evFjxeNz9maamJl111VV68MEH3e1AmPCX\nHdiKYxO24tiEzTw9PnMpxgjLFJAEZbDyjsR02tvbVVNT4/66qqpK8XhcPT09Gjt2rLs9Fovp4MGD\n7s/EYjFJ0ujRo9Xb26svv/yy0LsGAAAA2KVfMBZ5M8IiGS9cM2/evJTnBz777LNauHBhyp8xKY5u\nJ0X+G2Pc7caYpJ9L9RwAAABA5BlT0JvaA/nIGIm7du3K+QlnzJih3bt3u78+evSoamtrVVpaqt7e\nXnd7V1eXZsyY4f5MV1eXqqqqdO7cOY0bN05XXHHFoOeeMGFCyuAEbPHkk08GvQtAShybsBXHJmzG\n8WkPEuBrEyZM8Pw1CnILjP6Tv7q6Oq1evVonT57U8ePHVVJSotLSUkl95zFu27ZNc+fOVWtrqzZs\n2CCpLxJfe+01zZ8/Xy+++KJmzpyZ8nV6enoKsbsAAAAAgDQck+faztbWVq1cuVJnz57VqFGjNHXq\nVG3fvl2S1NLSok2bNmnEiBHaunWrZs2aJalvetjU1KRPPvlES5cu1bp16yT13QJj+fLl2r17tyoq\nKrRt2zZdf/31BXqLAAAAAIBs5R2JAAAAAIDoKfjVTb2yb98+1dTUqLKyUps2bQp6dxBRN910k267\n7TZNnTpVdXV1kqTz589r0aJFKisr0z333OPe0kWSNm7cqMrKSsViMR04cMDd3t3drWnTpqmiokJr\n1651t1+6dEnLli1TeXm5GhoaUl4YCkh44IEHNG7cON16663uNr+Ox7feektVVVWqqqrS73//e4/f\nKcIm1bHZ3Nys8ePHa+rUqUmriySOTfjnww8/1J133qmJEyeqoaFBr7/+uiQ+O2GHdMenlZ+fJiSm\nTJli9u7daz744ANTVVVlzpw5E/QuIYJuuukm8/HHHydte+6558zDDz9sLl68aFasWGGef/55Y4wx\nvb29pqqqyvzzn/80bW1tZurUqe7PfPe73zXbtm0zZ8+eNd/5zndMR0eHMcaYN954w9x7773m888/\nN+vWrTMrVqzw780hdPbt22cOHz5sJk2a5G7z43j873//ayoqKsz7779v3nvvPTNhwgQf3zXCINWx\n2dzcbNavXz/osRyb8NNHH31k/vrXvxpjjDlz5oy5+eabzWeffcZnJ6yQ7vi08fMzFJPETz/9VJI0\ne/ZslZeXa/78+YrH4wHvFaLKDFiB3d7ermXLlumKK67QAw884B578XhcjY2NKisrU319vYwx7r9M\nHjt2TEuWLNGYMWO0ePHipJ9pamrSVVddpQcffJDjGBnNmjVL11xzTdI2P47Hzs5OTZo0SZMmTdJt\nt92mWCyhTzP2AAAD70lEQVSmzs5OH985bJfq2JRS38KKYxN+uv766zVlyhRJ0rXXXquJEyeqo6OD\nz05YId3xKdn3+RmKSOzo6FB1dbX761gspoMHDwa4R4gqx3E0Z84c3XPPPXrnnXckJR9/1dXVam9v\nl9T3h7Cmpsb92aqqKsXjcfX09Gjs2LHu9v7Ha3t7u2KxmCRp9OjR6u3t1ZdffunLe0M0eH08Xrx4\nUfF43N0+8GeATDZt2qSZM2fqueee0/nz5yX1HWccmwhCT0+POjs7VVdXx2cnrJM4PhO3BLTt8zMU\nkQj45c9//rPee+89rVu3To899phOnTqV8l920kl1H09jjLvdGJP0fLk8NyDldswU8njkHrUYykMP\nPaQTJ05ox44d+sc//qGtW7dKSn1ccWzCa+fPn9eSJUv0q1/9SldffTWfnbBK/+PzW9/6lpWfn6GI\nxNraWh09etT9dWdnZ9p7KQLDccMNN0iSampqdPfdd+vdd99VbW2turu7JfWdJFxbWyup7/6eXV1d\n7s8ePXpUtbW1uuWWW9Tb2+tu7+rqcv+VqP/PnDt3TuPGjdMVV1zhy3tDNHh9PI4cOXLQc/X/GSCd\nsWPHynEcjRo1SitWrFBra6skjk3479KlS7r33nt13333adGiRZL47IQ9Uh2fNn5+hiISR40aJanv\nCqcffPCBdu3axR86FNy///1vd7x/5swZ7dixQ42NjZoxY4ZeeeUVffHFF3rllVfcf6Coq6vTjh07\ndPLkSbW1tamkpESlpaWS+paybNu2TWfPnlVra2vSH9zXXntNn3/+uV588UX+sQM58+N4jMViOnLk\niN5//339/e9/V2dnpyZOnBjMG0ZofPTRR5Kk//znP3r99de1YMECSRyb8JcxRsuWLdOkSZP0yCOP\nuNv57IQN0h2fVn5+ZnUpHgu0tbWZ6upqM2HCBNPS0hL07iCCjh8/biZPnmwmT55s5syZY15++WVj\njDGfffaZufvuu82NN95oFi1aZM6fP+/+zIYNG8yECRNMTU2N2bdvn7u9s7PTTJ061dx0003m8ccf\nd7d/9dVX5kc/+pG58cYbTX19vfnoo4/8e4MInaVLl5obbrjBjBgxwowfP9688sorvh2Pb7zxhqms\nrDSVlZXmzTff9OcNIzQSx+Y3v/lNM378ePPyyy+b++67z9x6663m9ttvN48++mjSlaI5NuGX/fv3\nG8dxzOTJk82UKVPMlClTzPbt2/nshBVSHZ9/+MMfrPz8dIzhpCgAAAAAQJ9QLDcFAAAAAPiDSAQA\nAAAAuIhEAAAAAICLSAQAAAAAuIhEAAAAAICLSAQAAAAAuIhEAAAAAICLSAQAAAAAuP4f3nlAymFS\n1JEAAAAASUVORK5CYII=\n", "text": [ "" ] } ], "prompt_number": 22 }, { "cell_type": "markdown", "metadata": {}, "source": [ "4. Brute force the coverage profile\n", "-----------------------------------\n", "\n", "There are several techniques for computing the coverage profile. One basic approach would be to separately compute how many reads span each position in the genome. This would take ``O(G * N)`` time where ``G`` is the length of the genome, and ``N`` is the number of reads. This is easy to see since it requires ``O(1)`` time to detemine if a read spans a given position:\n", "\n", " if ((startpos <= querypos) and (querypos <= endpos)):\n", " print \"This read spans the query position!\"\n", "\n", "Here we do something slightly smarter which is to initialize the coverage array with all zeros, and then process each read to increment the coverage array at each position in the read. This requires ``O(G + N*L)`` where ``L`` is the average length of the read. Since ``L << G``, this can be a huge advantage.\n" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print \"Brute force computing coverage over %d bp\" % (totallen)\n", "\n", "starttime = time.time()\n", "brutecov = [0] * totallen\n", "\n", "for r in reads:\n", " # print \" -- [%d, %d]\" % (r[1], r[2])\n", "\n", " for i in xrange(r[1], r[2]):\n", " brutecov[i] += 1\n", "\n", "brutetime = (time.time() - starttime) * 1000.0\n", "\n", "print \" Brute force complete in %0.02f ms\" % (brutetime)\n", "print brutecov[0:10]" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Brute force computing coverage over 973898 bp\n", " Brute force complete in 4435.00 ms" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]\n" ] } ], "prompt_number": 9 }, { "cell_type": "markdown", "metadata": {}, "source": [ "** Notice that it took 4435 ms for this to complete **" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "5. Compute the max coverage\n", "---------------------------\n", "\n", "With the coverage profile, it is trivial to compute the max coverage or other simple queries" ] }, { "cell_type": "code", "collapsed": false, "input": [ "maxcov = 0\n", "lowcov = 0\n", "LOW_COV_THRESHOLD = 10\n", "\n", "for c in brutecov:\n", " if c > maxcov:\n", " maxcov = c\n", " if c <= LOW_COV_THRESHOLD:\n", " lowcov += 1\n", "\n", "print \"max cov: %d, there are %d low coverage bases (<= %d depth)\" % (maxcov, lowcov, LOW_COV_THRESHOLD)\n", "\n", "print \"\\n\\n\"\n" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "max cov: 49, there are 83209 low coverage bases (<= 10 depth)\n", "\n", "\n", "\n" ] } ], "prompt_number": 11 }, { "cell_type": "markdown", "metadata": {}, "source": [ "6. Plot the read depth histogram\n", "--------------------------------" ] }, { "cell_type": "code", "collapsed": false, "input": [ " plt.figure()\n", " plt.hist(brutecov)\n", " plt.show()\n" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "display_data", "png": "iVBORw0KGgoAAAANSUhEUgAAAYsAAAEACAYAAABCl1qQAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAGtdJREFUeJzt3X9sleX9//HX6RQIsxp+hNpP2gJtmtOelrYH155mTHsk\n+5hCAmWQWJbAH9CFBTGoc0YWllgTIyNqQNxESWiyjGARNhJYhp04j4gLpzUiP04PfCgisG9KpbLZ\nU6SjwPv7B/OeCPSi5ccpnOcjuRP6Pue6r+u+6Llfvc997nP7zMwEAEAf0pI9AADA4EdYAACcCAsA\ngBNhAQBwIiwAAE6EBQDAqc+wOH78uB5++GEVFRUpHA5r/fr1kqT6+nplZWUpGAwqGAxq27ZtXptV\nq1YpPz9fgUBAO3fu9OrxeFwTJ05Ubm6uli5d6tV7e3tVV1ensWPHKhwO68SJE95jGzdulN/vl9/v\n16ZNm27YRgMA+sn60N7ebrt37zYzs5MnT9r48eOtq6vL6uvr7ZVXXrns+R0dHeb3++3o0aMWiUQs\nGAx6j02ZMsUaGxuts7PTJk2aZC0tLWZmtmHDBps1a5adPn3ali1bZosWLTIzs/Pnz1tubq7t27fP\n9uzZY3l5eX0NFQBwE/V5ZHH//ferrKxMkjR69GgVFRWppaXlm5C57PnRaFTV1dXKyclRVVWVzEzd\n3d2SpIMHD6q2tlajRo3SzJkzFY1GvTZz5szR8OHDtWDBAq8ei8VUXFys4uJilZSUKBAIKBaL3biU\nBABcs2s+Z9HW1qZYLKZQKCRJeu2111RZWanly5crkUhIkpqbm1VYWOi18fv9ikajamtr05gxY7x6\nIBDQrl27vDaBQECSNHLkSHV0dKinp0fRaNSrf7cNAODWuqawSCQSqq2t1YoVK/T9739fCxcu1JEj\nR9TU1KTDhw/rzTfflHTlow2fz3dZzcy8upld0u5K6+hrXQCAW8D1PtXZs2ftf//3f23FihVXfPzT\nTz+1H/7wh2ZmtmXLFlu8eLH3WGlpqXV1dZmZ2fjx4736yy+/bL/97W/NzOwXv/iF/elPfzIzsy+/\n/NIeeOABMzPbu3evTZ8+3Wszbdo0279//2X95+XlmSQWFhYWln4s/T0P3OeRhZmprq5OxcXFevLJ\nJ716e3u7JOncuXNav369pk6dKkmqqKhQU1OTjh07pkgkorS0NKWnp0uSCgoK1NjYqM7OTm3evNl7\nOysUCmndunU6ffq01qxZo8rKSkkX33bav3+/9u3bp7179yoWi6moqOiyMR4+fNg7Okn15bnnnkv6\nGAbLwlwwF8xF38vhw4f72v1f5q6+Hvzoo4+0bt06lZSUKBgMSpJefPFFvfXWW/r00081ZMgQPfTQ\nQ1q4cKEkKSMjQwsXLtTkyZM1ZMgQ7+0pSXr55Zc1Z84c/epXv9Ls2bP1gx/8QJL0k5/8RO+8844K\nCwuVm5urxsZGSdL3vvc9LVu2TLNmzZIk/eY3v+nXhgEAbpw+w+JHP/qRLly4cFl9ypQpV23zxBNP\n6IknnrisHggE9Mknn1xWv/vuu9XQ0HDFdT366KN69NFH+xoiAOAW4AruO0g4HE72EAYN5uK/mIv/\nYi4GzmdmluxBXA+fz6fbfBMA4Jbr776TIwsAgBNhAQBwIiwAAE6EBQDAibAAADgRFgAAJ8ICAOBE\nWAAAnAgLAIATYQEAcCIsAABOhAUAwImwAAA4ERYAACfCAgDg1Oed8gAM3L33jlQi8c+k9Z+ePkJd\nXaeS1j/uLNz8CLhJfD6fpGT+bvLawNVx8yMAwA1HWAAAnAgLAIATYQEAcCIsAABOhAUAwImwAAA4\nERYAACfCAgDgRFgAAJwICwCAE2EBAHAiLAAAToQFAMCJsAAAOBEWAAAnwgIA4NRnWBw/flwPP/yw\nioqKFA6HtX79eklSIpFQTU2NcnJyNGPGDHV3d3ttVq1apfz8fAUCAe3cudOrx+NxTZw4Ubm5uVq6\ndKlX7+3tVV1dncaOHatwOKwTJ054j23cuFF+v19+v1+bNm26YRsNAOgn60N7e7vt3r3bzMxOnjxp\n48ePt66uLlu+fLk9/vjj1tPTY4sWLbKXXnrJzMw6OjrM7/fb0aNHLRKJWDAY9NY1ZcoUa2xstM7O\nTps0aZK1tLSYmdmGDRts1qxZdvr0aVu2bJktWrTIzMzOnz9vubm5tm/fPtuzZ4/l5eVdcYyOTQCS\nRpJJlsSF1waurr+/H30eWdx///0qKyuTJI0ePVpFRUVqaWlRc3Oz6urqNHToUM2fP1/RaFSSFI1G\nVV1drZycHFVVVcnMvKOOgwcPqra2VqNGjdLMmTMvaTNnzhwNHz5cCxYs8OqxWEzFxcUqLi5WSUmJ\nAoGAYrHYTQlMAEDfrvmcRVtbm2KxmCoqKtTS0qKCggJJUkFBgZqbmyVd3PEXFhZ6bfx+v6LRqNra\n2jRmzBivHggEtGvXLklSc3OzAoGAJGnkyJHq6OhQT0+PotGoV/9uGwDArXXXtTwpkUiotrZWK1as\n0D333KOLRzDXxufzXVYzM69uZpesr691X2ldklRfX+/9OxwOKxwOX/P4ACAVRCIRRSKRAbd3hkVv\nb69mzZqluXPnqqamRpJUXl6ueDyuYDCoeDyu8vJySVIoFNL27du9tgcOHFB5ebnS09PV0dHh1Vtb\nWxUKhbw2ra2t8vv9OnXqlDIyMjRs2DCFQiFt3br1kjZz58694hi/HRYAgMt99w/p559/vl/t+3wb\nysxUV1en4uJiPfnkk149FAqpoaFBZ86cUUNDgyorKyVJFRUVampq0rFjxxSJRJSWlqb09HRJF9+u\namxsVGdnpzZv3nxJWKxbt06nT5/WmjVrvHUFAgHt379f+/bt0969exWLxVRUVNSvjQMA3CB9nf3+\n8MMPzefzWWlpqZWVlVlZWZlt27bNurq6bPr06ZadnW01NTWWSCS8NitXrrS8vDwrLCy0HTt2ePVY\nLGbBYNDGjRtnS5Ys8epnz561efPmWXZ2tlVVVVl7e7v32IYNGyw/P9/y8/Pt7bffviFn9IFbRXwa\nCoNYf38/fP9pdNvy+Xz9OocC3CoXz7El83eT1waurr/7Tq7gBgA4ERYAACfCAgDgRFgAAJwICwCA\nE2EBAHAiLAAAToQFAMCJsAAAOBEWAAAnwgIA4HRN97MAbkf33jtSicQ/kz0M4I7AFwnijjUYvsgv\n2f3z2sDV8EWCAIAbjrAAADgRFgAAJ8ICAOBEWAAAnAgLAIATYQEAcCIsAABOhAUAwImwAAA4ERYA\nACfCAgDgRFgAAJwICwCAE2EBAHAiLAAAToQFAMCJ26ripuG2psCdg9uq4qbhtqbJ75/XBq6G26oC\nAG44wgIA4ERYAACcnGExf/58ZWRkaMKECV6tvr5eWVlZCgaDCgaD2rZtm/fYqlWrlJ+fr0AgoJ07\nd3r1eDyuiRMnKjc3V0uXLvXqvb29qqur09ixYxUOh3XixAnvsY0bN8rv98vv92vTpk3XvbEAgAEy\nhx07dtgnn3xixcXFXq2+vt5eeeWVy57b0dFhfr/fjh49apFIxILBoPfYlClTrLGx0To7O23SpEnW\n0tJiZmYbNmywWbNm2enTp23ZsmW2aNEiMzM7f/685ebm2r59+2zPnj2Wl5d3xfFdwyYgSSSZZElc\n6B+4mv7+fjiPLB588EGNGDHiSiFzWS0ajaq6ulo5OTmqqqqSmam7u1uSdPDgQdXW1mrUqFGaOXOm\notGo12bOnDkaPny4FixY4NVjsZiKi4tVXFyskpISBQIBxWKxAUYiAOB6DPicxWuvvabKykotX75c\niURCktTc3KzCwkLvOX6/X9FoVG1tbRozZoxXDwQC2rVrl9cmEAhIkkaOHKmOjg719PQoGo169e+2\nAQDcWgMKi4ULF+rIkSNqamrS4cOH9eabb0q68tHGxc/aX8rMvLqZXdLuSuvoa10AgJtvQFdwf3OU\ncN9992nRokV67LHH9Mtf/lKhUEjbt2/3nnfgwAGVl5crPT1dHR0dXr21tVWhUEiSFAqF1NraKr/f\nr1OnTikjI0PDhg1TKBTS1q1bL2kzd+7cK46nvr7e+3c4HFY4HB7IZgHAHSsSiSgSiQy4/YDCor29\nXZmZmTp37pzWr1+vqVOnSpIqKir0zDPP6NixY/rss8+Ulpam9PR0SVJBQYEaGxv14x//WJs3b9bK\nlSslXQyLdevW6ZFHHtGaNWtUWVkp6eLbTvv379e+fftkZorFYioqKrrieL4dFgCAy333D+nnn3++\nfytwnQGfPXu2ZWZm2t13321ZWVm2du1amzt3rk2YMMEeeOABe+qpp+zLL7/0nr9y5UrLy8uzwsJC\n27Fjh1ePxWIWDAZt3LhxtmTJEq9+9uxZmzdvnmVnZ1tVVZW1t7d7j23YsMHy8/MtPz/f3n777Rty\nRh+3jgbBp4FSvX/gavr7+8F3Q+Gm4buhkt8/rw1cDd8NBQC44QgLAIATYQEAcCIsAABOhAUAwImw\nAAA4ERYAACfCAgDgRFgAAJwICwCAE2EBAHAiLAAAToQFAMCJsAAAOA3o5kcAbgd3Jf1WxOnpI9TV\ndSqpY8CNwf0scNNwP4tU7//iGHh9Dk7czwIAcMMRFgAAJ8ICAOBEWAAAnAgLAIATYQEAcOI6izvY\nvfeOVCLxz2QPA8AdgOss7mBc50D/XGeBq+E6CwDADUdYAACcCAsAgBNhAQBwIiwAAE6EBQDAibAA\nADgRFgAAJ8ICAOBEWAAAnAgLAIATYQEAcHKGxfz585WRkaEJEyZ4tUQioZqaGuXk5GjGjBnq7u72\nHlu1apXy8/MVCAS0c+dOrx6PxzVx4kTl5uZq6dKlXr23t1d1dXUaO3aswuGwTpw44T22ceNG+f1+\n+f1+bdq06bo3FgAwMM6wmDdvnt55551LaqtXr1ZOTo4OHTqkrKwsvfHGG5KkL774Qq+//rree+89\nrV69WosXL/baPP3003r22WfV0tKiDz74QB9//LEkafPmzfrqq68Uj8dVXV2tF154QZJ04cIFLVmy\nRH/84x+1ceNGLVmy5IZtNACgf5xh8eCDD2rEiBGX1Jqbm1VXV6ehQ4dq/vz5ikajkqRoNKrq6mrl\n5OSoqqpKZuYddRw8eFC1tbUaNWqUZs6ceUmbOXPmaPjw4VqwYIFXj8ViKi4uVnFxsUpKShQIBBSL\nxW7oxgMArs2Azlm0tLSooKBAklRQUKDm5mZJF3f8hYWF3vP8fr+i0aja2to0ZswYrx4IBLRr1y5J\nF4MnEAhIkkaOHKmOjg719PQoGo169e+2AQDcWgMKi/7cMOPiDXgub/9N3cwuWV9f677SugAAN9+A\nbqtaXl6ueDyuYDCoeDyu8vJySVIoFNL27du95x04cEDl5eVKT09XR0eHV29tbVUoFPLatLa2yu/3\n69SpU8rIyNCwYcMUCoW0devWS9rMnTv3iuOpr6/3/h0OhxUOhweyWQBwx4pEIopEIgNfgV2DI0eO\nWHFxsffz8uXL7fHHH7evv/7aHnvsMXvppZfMzOzEiRPm9/vt6NGj9v7771swGPTaTJkyxd566y07\nefKkTZo0yVpaWszMbMOGDTZz5kzr7u62ZcuW2aJFi8zM7Ny5c5abm2t79+61PXv2WG5u7hXHdo2b\nkJIkmWRJXOg/tfu/OAYMTv39v3E+e/bs2ZaZmWlDhgyxrKwsa2hosK6uLps+fbplZ2dbTU2NJRIJ\n7/krV660vLw8KywstB07dnj1WCxmwWDQxo0bZ0uWLPHqZ8+etXnz5ll2drZVVVVZe3u799iGDRss\nPz/f8vPz7e23374hG5xKkr+zoP/U7v/iGDA49ff/xvefRret/t50PJVcPMeTzLmh/9Tu/+IYeH0O\nTv3dd3IFNwDAibAAADgRFgAAJ8ICAOBEWAAAnAgLAIATYQEAcCIsAABOhAUAwImwAAA4ERYAACfC\nAgDgRFgAAJwICwCAE2EBAHAiLAAAToQFAMCJsAAAOBEWAAAnwgIA4ERYAACcCAsAgBNhAQBwIiwA\nAE6EBQDA6a5kD+BOdu+9I5VI/DPZwwCA6+YzM0v2IK6Hz+fTYN0En88nKZljo3/6T/ZrY/C+PlNd\nf/edvA0FAHAiLAAAToQFAMCJsAAAOBEWAAAnwgIA4ERYAACcCAsAgBNhAQBwuq6wGDdunEpKShQM\nBlVRUSFJSiQSqqmpUU5OjmbMmKHu7m7v+atWrVJ+fr4CgYB27tzp1ePxuCZOnKjc3FwtXbrUq/f2\n9qqurk5jx45VOBzWiRMnrme4AIABuq6w8Pl8ikQi2r17t5qbmyVJq1evVk5Ojg4dOqSsrCy98cYb\nkqQvvvhCr7/+ut577z2tXr1aixcv9tbz9NNP69lnn1VLS4s++OADffzxx5KkzZs366uvvlI8Hld1\ndbVeeOGF6xkuAGCArvttqO9+t0hzc7Pq6uo0dOhQzZ8/X9FoVJIUjUZVXV2tnJwcVVVVycy8o46D\nBw+qtrZWo0aN0syZMy9pM2fOHA0fPlwLFizw6gCAW+u6jywmT56sGTNmaMuWLZKklpYWFRQUSJIK\nCgq8I45oNKrCwkKvrd/vVzQaVVtbm8aMGePVA4GAdu3aJeli8AQCAUnSyJEj1dHRoX//+9/XM2QA\nwABc11eUf/TRR8rMzFQ8Hte0adNUUVHRr28xvPitrJcyM69uZpes72rrrq+v9/4dDocVDoeveQwA\nkAoikYgikciA219XWGRmZkqSCgsLNX36dG3dulXl5eWKx+MKBoOKx+MqLy+XJIVCIW3fvt1re+DA\nAZWXlys9PV0dHR1evbW1VaFQyGvT2toqv9+vU6dOKSMjQ0OHDr1sHN8OCwDA5b77h/Tzzz/fr/YD\nfhvq66+/ViKRkCSdPHlSTU1Nqq6uVigUUkNDg86cOaOGhgZVVlZKkioqKtTU1KRjx44pEokoLS1N\n6enpki6+XdXY2KjOzk5t3rz5krBYt26dTp8+rTVr1njrAgDcYjZAn332mZWWllppaalNnjzZ1q5d\na2ZmXV1dNn36dMvOzraamhpLJBJem5UrV1peXp4VFhbajh07vHosFrNgMGjjxo2zJUuWePWzZ8/a\nvHnzLDs726qqqqy9vf2ycVzHJtx0kkyyJC70T//J7N9Muus/40jOkp4+Itm7gUGrv/tO7pR3E3Gn\nPPpP7f4HwxgG7/4h2bhTHgDghiMsAABOhAUAwImwAAA4ERYAACfCAgDgRFgAAJwICwCAE2EBAHAi\nLAAAToQFAMCJsAAAOBEWAAAnwgIA4ERYAACcCAsAgBNhAQBwIiwAAE6EBQDAibAAADgRFgAAJ8IC\nAOBEWAAAnO5K9gAA4Oa5Sz6fL2m9p6ePUFfXqaT1fyPdsWHx3nvv6fPPP09a/2lpHLQByXdOkiWt\n90QieUF1o/nMLHkzeQP4fD5daRP+53/y9a9/lcrnuy8Jo5IuXGhST8//UzJ/USUf/dN/EvsfDGNI\nfv+DdRd7tX3n1dyxRxYXLkhnziyTlJ+U/u+7b8p/wgIAbn+8VwIAcCIsAABOhAUAwImwAAA4ERYA\nACfCAgDgRFgAAJzu2OssACD57pyvGxn0RxY7duxQYWGh8vPz9dprryV7OADQD9983UhylkTinzds\nSwZ9WDzxxBN68803tX37dv3ud79TZ2dnsoc0iEWSPYBBJJLsAQwikWQPYBCJJHsAt61BHRZfffWV\nJOmhhx7S2LFj9cgjjygajSZ5VINZJNkDGEQiyR7AIBJJ9gAGkUiyB3DbGtRh0dLSooKCAu/nQCCg\nXbt2JXFEAJCa7tgT3HfdlaZ77lmgtLR7ktJ/T8/HSekXAG6GQR0W5eXleuaZZ7yfY7GYqqurL3lO\nXl5eH582+L+bOLprdas/CfF8kvv/LvofHP1/9/ciGWMYLP3f6rlI7vZfbf+Yl5fXr/UM6rC4776L\n96LYsWOHcnJy9O677+q555675DltbW3JGBoApJRBHRaStHLlSv385z9Xb2+vFi9erNGjRyd7SACQ\ncm77O+UBAG6+Qf1pqL6k8sV68+fPV0ZGhiZMmODVEomEampqlJOToxkzZqi7uzuJI7x1jh8/rocf\nflhFRUUKh8Nav369pNScj56eHoVCIZWVlamyslIrVqyQlJpz8Y3z588rGAxq2rRpklJ3LsaNG6eS\nkhIFg0FVVFRI6v9c3LZhkcoX682bN0/vvPPOJbXVq1crJydHhw4dUlZWlt54440kje7Wuvvuu7Vi\nxQrFYjFt2rRJv/71r5VIJFJyPoYNG6b3339fn376qT744AOtXbtWhw4dSsm5+Marr76qQCDgneRN\n1bnw+XyKRCLavXu3mpubJfV/Lm7LsEj1i/UefPBBjRgx4pJac3Oz6urqNHToUM2fPz9l5uP+++9X\nWVmZJGn06NEqKipSS0tLys7H8OHDJUnd3d06d+6chg4dmrJz8Y9//EN/+ctf9LOf/UzfvNueqnMh\nSd8949Dfubgtw4KL9S737TkpKCjw/npIJW1tbYrFYqqoqEjZ+bhw4YJKS0uVkZGhxx9/XDk5OSk7\nF0899ZReeuklpaX9dzeXqnPh8/k0efJkzZgxQ1u2bJHU/7kY9J+GwrVJ9c8pJBIJ1dbWasWKFbrn\nnntSdj7S0tK0Z88eff7555o6daomTZqUknPx5z//WWPGjFEwGFQkEvHqqTgXkvTRRx8pMzNT8Xhc\n06ZNU0VFRb/n4rY8sigvL9eBAwe8n2OxmCorK5M4ouQrLy9XPB6XJMXjcZWXlyd5RLdOb2+vZs2a\npblz56qmpkZSas+HdPGE5tSpUxWNRlNyLv7+979ry5YtGj9+vH7605/qb3/7m+bOnZuScyFJmZmZ\nkqTCwkJNnz5dW7du7fdc3JZh8e2L9T7//HO9++67CoVCSR5VcoVCITU0NOjMmTNqaGhImfA0M9XV\n1am4uFhPPvmkV0/F+ejs7NS//vUvSdKXX36pv/71r6qpqUnJuXjxxRd1/PhxHTlyRI2NjZo8ebL+\n8Ic/pORcfP3110okEpKkkydPqqmpSdXV1f2fC7tNRSIRKygosLy8PHv11VeTPZxbavbs2ZaZmWlD\nhgyxrKwsa2hosK6uLps+fbplZ2dbTU2NJRKJZA/zlvjwww/N5/NZaWmplZWVWVlZmW3bti0l52Pv\n3r0WDAatpKTEHnnkEfv9739vZpaSc/FtkUjEpk2bZmapORefffaZlZaWWmlpqU2ePNnWrl1rZv2f\nCy7KAwA43ZZvQwEAbi3CAgDgRFgAAJwICwCAE2EBAHAiLAAAToQFAMCJsAAAOP1/D0Kr13psC4QA\nAAAASUVORK5CYII=\n", "text": [ "" ] } ], "prompt_number": 12 }, { "cell_type": "markdown", "metadata": {}, "source": [ "7. Delta encode the coverage profile\n", "-----------------------------------\n", "\n", "Imagine the coverage profile looks like this:\n", "\n", " ## ##### ###\n", " #### ############### #########\n", " ############ ######### ### ################### ########### \n", " ############################### ###################### #############\n", " \n", " cov: 11222234432222112222222221122210001112222333344444333333033444333322110\n", " \n", " pos: 00000000001111111111222222222233333333334444444444555555555566666666667\n", " 01234567890123456789012345678901234567890123456789012345678901234567890\n", " \n", " \n", "Notice that most consecutive positions have the same depth. This means we can recover exactly this plot if we just record the transitions\n", "\n", " # # # \n", " ## # # # # # # # \n", " # ## ## # # # # # # # # # # \n", " # # ## ## # # # # # # # # # # # # # # # \n", " \n", " cov: 1 2 34 32 1 2 1 2 10 1 2 3 4 3 3 4 3 2 1 0 \n", " \n", " pos: 0 0 00 01 1 1 2 2 33 3 3 4 4 5 55 5 6 6 6 7\n", " 0 2 67 90 4 6 5 7 01 4 7 1 5 0 67 9 2 6 8 0\n", "\n", "Here, instead of 70 depth values, we can record just 24 without loss of information!\n" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print \"Delta encoding coverage plot\"\n", "\n", "starttime = time.time()\n", "\n", "deltacov = []\n", "curcov = -1\n", "for i in xrange(0, len(brutecov)):\n", " if brutecov[i] != curcov:\n", " curcov = brutecov[i]\n", " delta = (i, curcov)\n", " deltacov.append(delta)\n", "\n", "## Finish up with the last position\n", "deltacov.append((totallen, 0))\n", "\n", "print \"Delta encoding required only %d steps, saving %0.02f%% of the space in %0.02f ms\" % (len(deltacov), (100.0*float(totallen-len(deltacov))/totallen), (time.time()-starttime) * 1000.0)\n", "\n", "for i in xrange(3):\n", " print \" %d: [%d,%d]\" % (i, deltacov[i][0], deltacov[i][1])\n", "\n", "print \" ...\"\n", "\n", "for i in xrange(len(deltacov)-3, len(deltacov)):\n", " print \" %d: [%d,%d]\" % (i, deltacov[i][0], deltacov[i][1])\n", "\n", "print \"\\n\\n\"\n" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Delta encoding coverage plot\n", "Delta encoding required only 3697 steps, saving 99.62% of the space in 151.32 ms" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", " 0: [0,1]\n", " 1: [799,2]\n", " 2: [1844,3]\n", " ...\n", " 3694: [973770,2]\n", " 3695: [973779,1]\n", " 3696: [973898,0]\n", "\n", "\n", "\n" ] } ], "prompt_number": 13 }, { "cell_type": "markdown", "metadata": {}, "source": [ "8. Plot the coverage profile\n", "----------------------------\n", "\n", "In theory you could plot the raw coverage profile, but python doesnt like to draw thousands and thousands of little lines. But it does fine for this representation." ] }, { "cell_type": "code", "collapsed": false, "input": [ " plt.figure(figsize=(15,4), dpi=100)\n", "\n", " print \"Plotting coverage profile\"\n", " \n", " ## expand the coverage profile by this amount so that it is easier to see\n", " YSCALE = 5\n", "\n", " ## draw the layout of reads\n", " for i in xrange(min(MAX_READS_LAYOUT, len(reads))):\n", " r = reads[i]\n", " readid = r[0]\n", " start = r[1]\n", " end = r[2]\n", " rc = r[3]\n", " color = \"blue\"\n", "\n", " if (rc == 1): \n", " color = \"red\"\n", "\n", " plt.plot ([start,end], [-2*i, -2*i], lw=4, color=color)\n", "\n", " ## draw the base of the coverage plot\n", " plt.plot([0, totallen], [0,0], color=\"black\")\n", "\n", " ## draw the coverage plot\n", " for i in xrange(len(deltacov)-1):\n", " x1 = deltacov[i][0]\n", " x2 = deltacov[i+1][0]\n", " y1 = YSCALE*deltacov[i][1]\n", " y2 = YSCALE*deltacov[i+1][1]\n", "\n", " ## draw the horizonal line\n", " plt.plot([x1, x2], [y1, y1], color=\"black\")\n", " \n", " ## and now the right vertical to the new coverage level\n", " plt.plot([x2, x2], [y1, y2], color=\"black\")\n", "\n", " plt.draw()\n", " plt.show()\n" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Plotting coverage profile\n" ] }, { "metadata": {}, "output_type": "display_data", "png": "iVBORw0KGgoAAAANSUhEUgAAA4wAAAEACAYAAAD4J/goAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3X9s1Pd9x/HX16uIk8WiCZlhW2ISg2v7oKntBTtdBHay\nQF21lIyoglbOpkGUCNHSpCRiHZ1EoyYoIawxrFASiU5bFrnJJCskCwFMdQSr42xE1jXGsHmQQaYA\nIWQEM2gZ/ewP5y7n8/fuvve97/fu+717PiQL/PX9+Nz3Pt/vfV/f9+f7OcsYYwQAAAAAQIqKYjcA\nAAAAABBMBEYAAAAAgC0CIwAAAADAFoERAAAAAGCLwAgAAAAAsEVgBAAAAADY8jUwXr16Vc3NzVq4\ncKEk6cKFC1q0aJFqamp03333aXR0NHHbTZs2qa6uTpFIRP39/X42CwAAAADggK+Bsbu7W5FIRJZl\nSZK2bt2qmpoa/cd//Iduvvlm/eQnP5EknTlzRlu2bNHevXu1detWrVq1ys9mAQAAAAAc8C0wvvfe\ne3rjjTf04IMPyhgjSRoYGNDy5ct1zTXXaNmyZYrFYpKkWCymzs5O1dTUqL29XcYYXbhwwa+mAQAA\nAAAc8C0wPvroo9qwYYMqKj59isHBQTU0NEiSGhoaNDAwIGksMDY2NiZuV19fn/gbAAAAAKA4fAmM\nr7/+uqqrq9Xc3JyoLkoa9/9s4sNYAQAAAADF8Rk/HvQXv/iFduzYoTfeeEOXL1/Wxx9/rAceeEBz\n5szR8PCwmpubNTw8rDlz5kiS2tra1NfXl7j/kSNHEn9LNnPmTP3nf/6nH00GAAAAgMCbMWOGRkZG\nCvZ8lsml7OfCvn379Oyzz+q1117TM888o5MnT+qZZ57RY489pttuu02PPfaYTp8+rfb2du3evVvH\njh3Td7/7XR06dGhiYy0rpyolUEjr1q3TunXrit0MYAL6JoKKvokgo38iqAqdiXypMKaKDy9dsWKF\nurq6VF9fr5aWFj399NOSpKlTp2rFihW65557NGnSJG3btq0QzQIAAAAAZOB7YGxvb1d7e7skqaqq\nSq+++qrt7b7zne/oO9/5jt/NAQAAAAA45Ov3MALlpKOjo9hNAGzRNxFU9E0EGf0TGOP7NYxe4hpG\nAAAAAOWs0JmICiMAAAAAwBaBEQAAAABgi8AIAAAAALBFYAQAAAAA2CIwAgAAAABsERgBAAAAALYI\njAAAAAAAWwRGAAAAAIAtAiMAAAAAwBaBEQAAAABgi8AIAAAAALBFYAQAAAAA2CIwAgAAAABsERgB\nAAAAALYIjAAAAAAAWwRGAAAAAIAtAiMAAAAAwBaBEQAAAABgi8BYRJZlFbsJAAAAAJAWgREASpBl\nWZyUAgAAeSMwFhkHdAC8xn4FAAB4xbfAePLkSd19992aNWuWOjo69NJLL0mSLly4oEWLFqmmpkb3\n3XefRkdHE/fZtGmT6urqFIlE1N/f71fTAAAAAAAOWMYY48cDnzp1SqdOnVJTU5POnj2r1tZW/fKX\nv9TWrVt18uRJPfvss1q9erVuvfVWPfbYYzpz5ozmzZun3bt36/jx43r00Ud16NCh8Y21LPnU3IJK\nd/a/FF4bgOKK71+MMSWzzwQAAJ8q9Oe7bxXGadOmqampSZJ00003adasWRocHNTAwICWL1+ua665\nRsuWLVMsFpMkxWIxdXZ2qqamRu3t7TLG6MKFC341r2hSw6IxhgM6jzAMDxhTivsUtm8AAIqjINcw\njoyMaGhoSK2trRocHFRDQ4MkqaGhQQMDA5LGAmNjY2PiPvX19Ym/lQIOdryRuh6Z2APIjG0EAADk\n4zN+P8GFCxe0ZMkS/ehHP9L111+f05lvu4OcdevWJf7f0dGhjo4OD1rpL7uqot1yho/lJnn9cUCM\ncme3DcSHpYZFcluT94Xx5ewjAQDlKBqNKhqNFu35fQ2MV65c0f33368HHnhAixYtkiTNmTNHw8PD\nam5u1vDwsObMmSNJamtrU19fX+K+R44cSfwtWXJgDBInBzLpgiPSs1uvHDQC6aVuG2ELjalSr8lk\n+wcAlJvUItkPfvCDgj6/b0NSjTFavny5Zs+erUceeSSxvK2tTdu3b9elS5e0fft23XnnnZKk1tZW\n7dq1SydOnFA0GlVFRYWqqqr8al7BxQ9w0h3opKs6lgK3rynT/ZIPIu0qEQDCiTAIFA5D1gE44dss\nqf39/Zo3b55uv/32xM5o/fr1uuuuu9TV1aW3335bLS0tevHFF3X99ddLkrq7u7V582ZNmjRJ27Zt\n09y5c8c3NqBnlrNVDuPtztb+Uq1Aun3fkkNh8u9xdo8Z1D4C5CNTv3a6XaT7W1CkVhKT/293UijI\nrwUIi1I97gBKXaGPd30LjH4IahjItMPN5eCm1Hbc6a6pyvW+yYEx9WAy23OGfR0Ckv1+JJftK921\ngamPXax9bGr7vNqnAkgvn89ooJyl+8wqaIAr8Oe175PelIvkDpPP8I4gDE3NdHDp9eOnPo/d3+w2\nikztSn0v+ABEqcvWx91ex1iI7cfugzZTe8N+TSbgVCEOQrmkA8hPuWw3BflajaDxc8y+3Y7d6c6+\nlIJNrqEz3XtiFySzhcVS5WWfLcfrVsL4elPfp/jv+b6WdCdmnN4218fOppS3WyBfYdx3wRt2s+nD\nOdaXd8oyMMZ50ZGyBZmwHAhl+uoPt+speUKaXCunqZPZJD9GOSrlnZ6fr80ucIVFLtuKm8dNt23n\nM1LCy/Vrtw/w67mAuGKfTPPqpBDCL11YzLTvpt98qpDrIgijA/1WVoHR640p3XARr4JNoXYImYZ+\nZbuNG9kOBL1an6W4Aef7/qerWHn5HG7bFebH90qmg8X4dpN8EiaXbSPX22dqm12oDOKojVxxsB58\nfr0/mfaNhTyplSx5m3XSBq9PgiMYnJzcy/a5Xs78/GzK93M1TMoiMCZvPIV6g71+Di+GoDkJCqm8\nDI2p68Tuvcj2tSO5PJeb+wVZPn3K7v3O9t76/aFTqJDotsrt5vnyDfOp0u2v8unXdqEz9XnSnQxL\n197U2ztdD34Mrc1XKZ5oCrtCBvl0k0IVYn+Vz0loL9pH3y8+u8/pUjyeKRY/T6qU+vZT8oHR7zfO\nrw3Y7mA+l4OwbGekUn/P1NGLcZazXNhVbeLLk/+e73Okk+7gyOn98+VXhSpTOA4iuwPFQpzUcvr3\nTO9Nvus63xEEXmG/FUy59L18H7+Q2182ds/v5eevk5FFQVdq22ymz327k3zJ93HbX0qd3eVV+RxX\nZTuZWsqhseQDYzZuO06xOkNq508XNDLJ9kGR7QPDyQe4m4NGPz+oCvF+eTEsJN198l03ToYyFroK\nn9y21Db4/VxeVwPzqdBmOqNcDOmq/m4OpsPwoZmpssPQruLIdsAcv42Xz2f3XKn/91ou+/pMJ/fy\n+dxFeNlVoov9+REU6U4EFUKprv+S/lqNbNUzY3L7+oViHDik+0BxcsCautzufrlIPdj2avio31Lf\nZ7vlubQ5247I7r3JFors+qhfwdxuWaaTCLlsI06ltt+P5/H7DGBqO+OvwWlFJGjbiRPphsrZvWd+\nV4aSH8vNunRSeU/dHtM9T/Lf3G6bmMjpfim+LNfHyPS73TI/9oHpnjvT/bLtw7zog+m26aD0az8+\nl4oln8+FdJ/fmY578hX0z7FMI3ac7tOdPm622wdx/bhV0oFRyn6mMFuISveGF6ITxNvm9IMs9W9O\nzl7aHfS6FbYz8bkcDMZlC4ipITRTsHeyrp302XRtz2VnlamvxR/Lb8UYyuHFDj3bSZl04TjoH7rJ\nsu2DMgVJPz80k9elm+fItVKayzZSagcLfoivo0wHvOn6mZP9RbrKYa77GL/2TW7DQT4Hrk5eg912\nFaSAlvoZG4Q25Suf9of9tfvBSUEjl+3ZzT4jbMfE2ZT9kNRMkneQQX/jkw9G3dzP7/sESepBR6YD\nwGwH/+kOdNy+H3ZtdPN3N/01lx1s6rqxO5OZrS2ZXkMhA2qunFSBM/Urr9oRBpleWy7ryAk/+oxf\n22+5S95/OKn0ZpJ6csLJ46b+vdDvl5/7t9Ttys3+N9NjF5vdCTikl+mkdT6PVyrrPdfj+1J53W4Q\nGOVsWGGm2/rJi+fz+sAs23OVm/j6dRriMt021/cqCCcynIbF1AO6dIpRaXTDScBOfj/DVFEMEzf7\nt3wOELL1Yd7bzJxs/15X/e2ew0lb8nl+P4eL5tIOt1KDeJDkcsxWzpyeHHfKru+Geb3nOsLEjTCv\nn1QlOSTV7RtkNzwmeacZ9DNa6YaFFUIYNop0w5My3c7uYDH5b05ft9/rJ90wWK+eN93Bg9OKotN1\nb3c/P8NArs/h5dntoO5H/BDE/YOXoST+eNlGJJSzdKMxvJAuADrZTt3uX5yEFj9OEtkN1XXzGPnc\npxj92sk+pBhDU9OtC6dtKca+MV3bcg2EQRl2mXzsnu/IoXTHfG5ea1DWj1dKMjBK7q8NSHd/PvCz\nC8M6ynYA4fTMdKbbZHtur26faadezEp4plCd6zDN1Oql16/L7YdAPs9Xbtz2ATf8OmDMduIo0/9L\n6YDBC36+/35vX3b7pdRlmfqKV/uxXIbqxp/XzbZRjOqsnWzrLWjbml3/KBa7Yxa7z9XUfmr32ViM\nAOT0s8NtuzIdRzH8eTyGpKIsFHJYbjGeu1CvzWmITn7NXldynNzG6/URlAORsAvCB64X72VqH0dm\nQay0evn+BfH1JStUoHJ6/aibxwziurU7GZBLe4v52nIZieTl53mm9iSPYLIbCu1k9FK+fS/1eYPY\n74ql5AKjF50F5S3MB4KF7L+5ric/P2iS//VLvtcDhbVPhUUQ+mKhhOVzKvngK+zrPBun1a9SWg+p\nB/jJ/3opaOsstV/nE8iD9NrcvhY/Ln3Jp11BWqdSePbX2ZTkkFS3naXUxhv7rdTWVdhfT6H6b77X\nv+R6v+ThMalDfLw6u5hp6FDY+wWKz6uwYDdsLF/5nk0v1+0jl4CU7/VVXvB7KHj8OVKHY/p57Wq6\ntqSuby/Deq7bSxC3j0zDpjPdLt1tch3y7Edhx4vjH7vhuvkqpVxRkoExH0E7MxEWYV9vQfhA90LY\n228neShMPh9smR7fyZAcuFfO6y+f6oPdgW7y4+W7z0o90E93TVOyXK9rK8X33s3nRamth3TrwK4v\npLtGzq82+T3qxMmlGcntyeVxiiHTdY5O7+/09tn2K5nWSbYTEF6sz+TnCMr7ExQlFRhLJcUDyK4Q\nO3X2KaUhU7UjqLJdY+TV67HbfjI9rl3lJvWxyuGAq5Rfm1PZ1kG6aly60SJOK19O2+b19p7PkNOw\nnZB2O9rAaVXSrbDtx0tJSQVGiZ14obG+UWiFviCdPh4u6Q5awhRiMlVn0t0+19flpqrutIIYhnUM\nfzkdrulHFS5bxSyffUE+l1eUMqf7Ka+f0y+l/n65UTKT3nDGobCSL0IuFaX2ekqR3dA8r6QeULBP\nCZ+gB8Vch3mlDrXKtRqYrR2p21Mu1yCxfcCtTNehe/nYqX06td867cf0d+fs1lPqcPegC0Mbi6Fk\nAqPEmwyUC6+39XSPxz4lnNINlQzKpCNuDkDTVfLcXiOZqc9n+1sQ1ifCL7U/+RnKkvus3fOm2yYJ\nis4V6oRdKRYswqAkAiMbNFA+CvVBwQdS+AXxs8GPNhXigDvdcwJuZJosx6+KnpPh03aT5qQGTOSG\nE0ylIfSBMUxlbgDBF8SQAeeCPPFKtkk98jlQTj3gBsImSNsq3PN7lloUR6AC41tvvaXGxkbV1dVp\n8+bNju/HTgaAF4IYMpC7IL9/TobD5TOxht3jpv4AQVWMIc921UO2E3eKMdoBhWGZAH2yNjc3q7u7\nW9OnT9eXvvQl9ff366abbkr8Pd2MWgF6CQBCLGxTnyP8/DxJUegZhQEA4/l1XFHo45XAVBjPnz8v\nSZo3b56mT5+uBQsWKBaLTbjdp59/JvFjWcr5BwBScVCNQvPz2ij6MwDAC4EJjIODg2poaEj8HolE\ndODAAZtbfnI2VlbOPwAAAAAA5z5T7Aa45S7+JX//jlctAQAg2LimCADCKxqNKhqNFu35A3MN4/nz\n59XR0aG3335bkvTtb39bnZ2d+spXvpK4jWVZSm6sJeN95TAYqwMAAAAAJijbaxgnT54saWym1Hff\nfVd79uxRW1ubo/vmPjg1ww/XPAIAAACApIANSX3uuef08MMP68qVK1q1atW4GVIT4mk6dabUPCqN\n4+uWAAAAAAApQENSnUgtv1rWWFCMD031M/iFZy0BAAAAKFWFHpIaqApjrozRhNlv8gmO6aqUVCAB\nAAAAlKNQB0ZJkjFjcS4p6+UyPNVJGKS6CAAAAKAcBWbSm7y5THV8PyMAAAAA2At/hTFJ6hDVXIeS\n+vI1HQAAAAAQUiUVGCWNpcakGVRzuaYxERYZgwoAAAAAJRgY85AIlmmKjORIAAAAAOWkdK5hTGaT\n7LwYamoxWhUAAABAGSndCmOes6emZfcQlB4BAAAAlKDSDYxxxqQdYprLpDhMhgMAAACg3JR+YNTE\n2VMTy70KgaljVak4AgAAACgBZREYJU2YPTUu+as0qDgCAAAAwKfKJzBKn1b+0sxe43nFkUojAAAA\ngBArr8AYlyE45vydjQAAAABQokrzazWcyqMCaMkkftLfiFAJAAAAILzKs8KYZCwzTqw4UkEEAAAA\nUO7KPjCmk8sEOJlwFSMAAACAsCIwJkvznY35VRuJjAAAAADCicCYIstEqrlLfiBmTQUAAAAQIuU9\n6Y2PvBrSCgAAAADFQoUxDWMmVhlzDYETbp+maknhEQAAAEAQUWHMwJiJYY7ZUwEAAACUCwKjE5QA\nAQAAAJQhhqQ6ZTODqptqI9c2AgAAAAgLXyqMjz/+uBobG9XS0qJHHnlEly5dSvxt06ZNqqurUyQS\nUX9/f2L58PCwWlpaVFtbq7Vr1/rRrLx5VWg0ssb9yMrwAwAAAABF4ktgXLBggYaGhnTw4EFdvHhR\nL730kiTpzJkz2rJli/bu3autW7dq1apVifusXr1aa9as0eDgoPbt26eDBw/60bT85ZkauQYSAAAA\nQFj4Ehjnz5+viooKVVRU6Etf+pL27dsnSYrFYurs7FRNTY3a29tljNHo6Kgk6ejRo1qyZImmTJmi\nxYsXKxaL+dE0T6RmRmtCzTC3HwAAAAAIIt8nvXnhhRe0cOFCSdLAwIAaGxsTf6uvr1csFtPIyIiq\nq6sTyyORiA4cOOB30/KSHBrzi4tUHAEAAAAEk+tJb+bPn69Tp05NWP7UU08lAuITTzyhqqoqff3r\nX5ckGZvhnJbNdXp2t4tbt25d4v8dHR3q6OjIseXes2S8D37MzAoAAACUvWg0qmg0WrTnt0ymdJaH\nv/u7v9MLL7ygvXv3qrKyUpL02muvqa+vT93d3ZKkpqYm7d+/X1VVVaqtrdWxY8ckSRs3blRlZaVW\nrlw5vrGWlTFMFoVleR8Yg/YaAQAAAARCoTORL0NS33zzTW3YsEE7duxIhEVJam1t1a5du3TixAlF\no1FVVFSoqqpKktTQ0KCenh6dPXtWvb29amtr86Np3kt5s/K9npFrGgEAAAAEhS8Vxrq6Ov3mN7/R\njTfeKEn64he/qC1btkiSuru7tXnzZk2aNEnbtm3T3LlzJUmHDx9WV1eXPvroIy1dulTr16+f2Ngg\nVhg19u0X8Qoj1UYAAAAAfil0JvJtSKofghoYE/wYnioRGgEAAABIKnwmcj3pDWwYIzdZMeswVOvT\nhwcAAACAQiEweswYJQIe1yMCAAAACDPfv4exLOVZCuQ7GgEAAAAEARVGv3wyPDXf0Dfu/pYYlwoA\nAACgYAiMBeJmeCoVRgAAAADFxJDUAON7GQEAAAAUExVGHyVPgDNuOZVDAAAAACFAYPRb/JrDcZci\nOq8aGr5TAwAAAECREBgDLhEukwIn2REAAABAIRAYC8QYyYoXC3MYkmrJTLx9/FeSIwAAAAAfERgL\nKJHvLOfDUu3CZWrVkdwIAAAAwA8ExiLJZ+Kb5PsyiyoAAAAAvxAYAyDf0GelZE8qjgAAAAC8wPcw\nhhxf0QEAAADAL1QYiyF5BpzkxbJyrjba3d6yXAZJSpMAAAAAkhAYiyU5nKUOKXUQ9rh2EQAAAIDf\nCIwBYFdwJBACAAAAKDauYQyI1NGgXJsIAAAAoNioMAaJMWN1RQ+yYnKFkksTAQAAALhBhTGISHgA\nAAAAAoAKY0jkN0SVAAoAAAAgdwTGoPpkeKonk+GkzsJKfgQAAADgAIEx4FJnUPVmMhwSIwAAAIDs\nfL2GcePGjaqoqNC5c+cSyzZt2qS6ujpFIhH19/cnlg8PD6ulpUW1tbVau3atn80qCblUGi2Z8be3\nrPE/AAAAAGDDt8B48uRJ7dmzR9OnT08sO3PmjLZs2aK9e/dq69atWrVqVeJvq1ev1po1azQ4OKh9\n+/bp4MGDfjUtdDINIY2HwUw/2W4LAAAAAHZ8C4zf/e539cwzz4xbFovF1NnZqZqaGrW3t8sYo9HR\nUUnS0aNHtWTJEk2ZMkWLFy9WLBbzq2nhZIx/Fx9SbQQAAABgw5fA+Oqrr+rmm2/W7bffPm75wMCA\nGhsbE7/X19crFotpZGRE1dXVieWRSEQHDhzwo2nh50NoHFdxJDMCAAAA+ITrSW/mz5+vU6dOTVj+\n5JNPav369dq9e3dimfkk5BibsGPZJBS728WtW7cu8f+Ojg51dHTk0OoS8ckMqp7MfwMAAAAgsKLR\nqKLRaNGe3zKZ0pkL77zzjv7kT/5E1113nSTpvffe0x/+4R8qFotpYGBAfX196u7uliQ1NTVp//79\nqqqqUm1trY4dOyZpbLKcyspKrVy5cnxjLStjmCxX+cyiancNI6sYAAAACKZCZyLPh6TOnj1bp0+f\n1vHjx3X8+HHdfPPNOnTokKZOnarW1lbt2rVLJ06cUDQaVUVFhaqqqiRJDQ0N6unp0dmzZ9Xb26u2\ntjavm1ay7PqLk8lwmPAGAAAAQCa+fw9j8pDTqVOnasWKFbrnnns0adIkbdu2LfG3Z599Vl1dXfre\n976npUuX6o477vC7aSUl3fc15hIKqSwCAAAASOb5kFQ/MSTVgZRrQr2sIrLqAQAAgOIqdCbyvcKI\n4jKyZMk4vrYxHjCTb8/QVQAAAKA8ERhLTfxsg02l0UloJCgCAAAAiCMwlqo0wTGnh0gJmJaVPkAy\nXBUAAAAoPQTGMkTlEAAAAIATnn+tBgLGGKp/AAAAAFyhwlgmjJFk87Ub+aBSCQAAAJQ2KozlJKnU\n6FXYM4brFwEAAIBSRWAsNx6mOy+qlAAAAACCi8BYhqgIAgAAAHCCwAgAAAAAsMWkN2UqdRIc15K/\n55HSJQAAAFBSCIzlzJhxgc/pRDh21y5aMpJFZgQAAABKCYGx3MUTnmXJyJIlw2Q2AAAAACRxDSPi\nXH7lBt/FCAAAAJQuKoxIy2kYTL6dZVOcZJgqAAAAEE4ERmTkyfDUXB+ChAkAAAAEAoERnzJGRvZV\nwjivh6ByvSQAAAAQXFzDiAn8LPAREAEAAIDwoMIIe59UG9PlO4IfAAAAUPoIjMgstdxopf6aXznS\n9v6W/VMDAAAAKCwCI3JizPhrHN1UGvkqDgAAACAcuIYReSH8AQAAAKWLCiNyNjZU1GSeTjXT/TNU\nJQmgAAAAQHD4VmH86U9/qsbGRs2aNUtr1qxJLN+0aZPq6uoUiUTU39+fWD48PKyWlhbV1tZq7dq1\nfjULXvLoIkNLJvETf1iuXwQAAACKz5cK4zvvvKPnn39eO3bsUF1dnT744ANJ0pkzZ7Rlyxbt3btX\nx48f16pVq3To0CFJ0urVq7VmzRrde++9WrRokQ4ePKg77rjDj+YhAKgkAgAAAMHnS4Vx586dWr58\nuerq6iRJv/d7vydJisVi6uzsVE1Njdrb22WM0ejoqCTp6NGjWrJkiaZMmaLFixcrFov50TR4zG0l\n0IyrK1p8TQcAAAAQQL4Ext27d+udd97RHXfcoQcffFCHDx+WJA0MDKixsTFxu/r6esViMY2MjKi6\nujqxPBKJ6MCBA340DX7waPxoIjhaKT8AAAAAisL1kNT58+fr1KlTE5Y/+eSTunz5ss6dO6f9+/er\nr69P3/rWt/Tzn/9cxiZYWDaBwO52cevWrUv8v6OjQx0dHa7aD4+lft+Gl3J9XC6ABAAAQImIRqOK\nRqNFe37LZEpnLj3++OPq6OjQV77yFUnSH/zBH+jYsWPas2eP+vr61N3dLUlqamrS/v37VVVVpdra\nWh07dkyStHHjRlVWVmrlypXjG2tZGcMkAuCTcJfvNYp5DVGljwAAAKBEFToT+TIk9Ytf/KJ27twp\nY4xisZhmzJihyspKtba2ateuXTpx4oSi0agqKipUVVUlSWpoaFBPT4/Onj2r3t5etbW1+dE0FJjd\ntYqZfuKSZ00FAAAAUBy+zJK6aNEi7d69W5FIRA0NDfqbv/kbSdLUqVO1YsUK3XPPPZo0aZK2bduW\nuM+zzz6rrq4ufe9739PSpUuZITWsUoamWjI5VQuZ/AYAAAAIDl+GpPqFIanhkXrZYd5BkPcdAAAA\nKHgm8qXCCNjNgZPXEFOLzAgAAAAUGoERvkkNjXlXGZPvTnoEAAAAfOfLpDdAnNe5LjE9Dpc6AgAA\nAL6jwgjfGSPZFRfdDFEdV6W0RKURAAAA8BEVRhRGHsGOmVMBAACA4qDCiMKxmQnHyMpaaUz+e/L/\njayJM+vYPScAAAAAVwiMKKx4gHN5ESLVRgAAAKBwCIwoDmNsr2vMJl01kiAJAAAAeI/AiKJJngyH\nwAcAAAAED5PeoLiM/fWJnjwu1y8CAAAAeaHCiECxZLypNtpdI0mABAAAAHJCYETRpX5Po6eVxmRp\ncig5EgAAALBHYEQwuJwEx64a6VvgBAAAAMoMgRGBkfo1jW6HphIiAQAAAG8QGBEoqaGRoAcAAAAU\nD4ERgeem0pj2+xrJnwAAAIBjBEYETiLU5TFZanLIpEoJAAAAuENgRHAlTYRD6AMAAAAKr6LYDQAy\nYQgpAAAGZpeNAAAOZElEQVQAUDxUGBEabmdNlahQAgAAAG4QGBF8Kd/R6CT8pYZL8+nY1omPDQAA\nAMAWgRElyXFFMfl7H8mOAAAAwDhcw4hQIMwBAAAAhedLYDx8+LC++tWvqqmpSQsXLtTw8HDib5s2\nbVJdXZ0ikYj6+/sTy4eHh9XS0qLa2lqtXbvWj2Yh7FJSo5GV8w8AAAAA53wJjE888YT+7M/+TP/6\nr/+qb37zm3riiSckSWfOnNGWLVu0d+9ebd26VatWrUrcZ/Xq1VqzZo0GBwe1b98+HTx40I+mIeyS\nQmPucTFzmdIiTwIAAADj+BIYJ0+erA8//FC//e1v9eGHH+qGG26QJMViMXV2dqqmpkbt7e0yxmh0\ndFSSdPToUS1ZskRTpkzR4sWLFYvF/GgaShAVRgAAAMAfvgTGDRs2qLu7WzfccIP+9m//Vs8884wk\naWBgQI2NjYnb1dfXKxaLaWRkRNXV1YnlkUhEBw4c8KNpKAHGjB+dmmuFMWOgtCxKjQAAAMAnXM+S\nOn/+fJ06dWrC8ieffFJ///d/r29/+9t6+OGH9eMf/1jLli3Tyy+/LGMzc4llc3Bud7u4devWJf7f\n0dGhjo4OV+1HaYlXD53MjprtNlQiAQAAEBTRaFTRaLRoz2+ZTOnMpWnTpun48eO69tprNTo6qpkz\nZ+rUqVN67bXX1NfXp+7ubklSU1OT9u/fr6qqKtXW1urYsWOSpI0bN6qyslIrV64c31jLyhgmUZ4s\nK7fAmI2RxbSsAAAACKRCZyJfhqTefffd2rFjhyTp1Vdf1fz58yVJra2t2rVrl06cOKFoNKqKigpV\nVVVJkhoaGtTT06OzZ8+qt7dXbW1tfjQNZYJrGgEAAID8+VJhHBoa0g9/+EMdPnxYs2fP1l//9V+r\noaFBktTd3a3Nmzdr0qRJ2rZtm+bOnStp7Ks4urq69NFHH2np0qVav379xMZSYUQ6VvoKYy7B0Pb+\ndDkAAAAERKEzkS+B0S8ERjjhx5w1dDsAAAAEQaEzketJb4AwcTP81IvrIQEAAIAwIzCi5BjjTZVx\nXMi0RJkRAAAAZYfAiJI07nsax+W+PEJf0uOQHQEAAFAOfJklFQgSu3CXzwypRtZYCo3/AAAAACWK\nCiPKXt5fr2FZ4yqXVB8BAABQKgiMKA/xFOdTQXDC9Y453ZmECQAAgGAiMKKsJLJZSqizZBxVGjNd\nA5l3pRIAAAAIGAIjypMxn0Y/rybFAQAAAEoMgRFI4WWlMDWAUoUEAABAmBAYAWNsq4xehLsJj8H1\nigAAAAgRAiOgT3KcTT7MZ4gq1UQAAACEHYERiEupNEqfhj43wZHhqAAAAAg7AiOQxBjJSpPr8gl8\nTKYDAACAMCIwAinSffXG2CJnwc82XHL9IgAAAEKGwAikk6ncmAUVRQAAAJQCAiOQiYnPmKq01zc6\nQYAEAABAGBEYgQIwsmyHuI6/EaESAAAAwVJR7AYAoWEMmQ4AAABlhQojkKNMk+IAAAAApYTACLiV\nZlIc19crMrEqAAAAAoYhqUA+SHQAAAAoYVQYgXylqTTmMotqOpbFdZMAAAAoHtcVxldeeUWzZs3S\n7/zO7+jQoUPj/rZp0ybV1dUpEomov78/sXx4eFgtLS2qra3V2rVrE8uvXLmi5cuXa/r06ero6NCp\nU6fcNgsoDmNsq42WTF4/AAAAQDG5Doyf//zn1dvbq3nz5o1bfubMGW3ZskV79+7V1q1btWrVqsTf\nVq9erTVr1mhwcFD79u3TwYMHJUm9vb06f/68hoeH1dnZqR/+8IdumwUUTTQaZYgqAikajRa7CYAt\n+iaCjP4JjHEdGBsaGvS5z31uwvJYLKbOzk7V1NSovb1dxhiNjo5Kko4ePaolS5ZoypQpWrx4sWKx\nWOI+XV1duu666/TQQw8llgNhEv9g8TozWtb4HyBXHPQgqOibCDL6JzDG80lvBgYG1NjYmPi9vr5e\nsVhMIyMjqq6uTiyPRCI6cOBA4j6RSESSdOONN+r06dP69a9/7XXTgIKh0AgAAIBSkHHSm/nz59te\nT/jUU09p4cKFtvcxdtdx2U0IYkxiuTFm3P3sHgMIHWNkRFUQAAAA4ZUxMO7ZsyfnB2xra1NfX1/i\n9yNHjmjOnDmqqqrS6dOnE8sPHz6stra2xH0OHz6s+vp6nTt3TlOnTtU111wz4bFnzJhhGz6BoPjB\nD37g+3OwCcCNQvRNwA36JoKM/okgmjFjRkGfz5Ov1UiuCLa2turxxx/XiRMndOzYMVVUVKiqqkrS\n2HWPPT09uvfee9Xb26vnnntO0lhgfPHFF7VgwQI9//zzuvPOO22fZ2RkxIvmAgAAAAAcsIzL8Z+9\nvb1atWqVzp49q8mTJ6u5uVk7d+6UJHV3d2vz5s2aNGmStm3bprlz50oaqyp2dXXpo48+0tKlS7V+\n/XpJY1+r8fDDD6uvr0+1tbXq6enRtGnTPHqJAAAAAAA3XAdGAAAAAEBp83yWVL+89dZbamxsVF1d\nnTZv3lzs5qBEnDx5UnfffbdmzZqljo4OvfTSS5KkCxcuaNGiRaqpqdF9992X+GoYSdq0aZPq6uoU\niUTU39+fWD48PKyWlhbV1tZq7dq1ieVXrlzR8uXLNX36dHV0dIybSOqVV15RfX296uvr9U//9E8F\neMUIm6tXr6q5uTkx0Rh9E0Fx8eJF/fmf/7k+97nPKRKJKBaL0T8RCC+88IL++I//WH/0R3+kRx55\nRBL7ThTPsmXLNHXqVH3+859PLCt2f3z//ffV3t6u6dOn68EHH9TVq1czvwgTEk1NTWbfvn3m3Xff\nNfX19eaDDz4odpNQAt5//33z9ttvG2OM+eCDD8xtt91mPv74Y/P000+bb33rW+by5ctm5cqVZsOG\nDcYYY06fPm3q6+vNf/3Xf5loNGqam5sTj/XlL3/Z9PT0mLNnz5q77rrLDA4OGmOM+dnPfmbuv/9+\nc/HiRbN+/XqzcuVKY4wxV69eNbW1teZXv/qV+eUvf2lmzJhR4FePMNi4caP55je/aRYuXGiMMfRN\nBMbq1avN97//fXPp0iVz5coV8z//8z/0TxTdhx9+aG699VYzOjpqrl69ar785S+bN998k76Jonnr\nrbfMoUOHzOzZsxPLit0fV6xYYZ5++mkzOjpq/vRP/9S88sorGV9DKCqM58+flyTNmzdP06dP14IF\nCxSLxYrcKpSCadOmqampSZJ00003adasWRocHNTAwICWL1+ua665RsuWLUv0t1gsps7OTtXU1Ki9\nvV3GmMRZoaNHj2rJkiWaMmWKFi9ePO4+XV1duu666/TQQw8llg8NDWn27NmaPXu2br/9dkUiEQ0N\nDRVhLSCo3nvvPb3xxht68MEHE5OL0TcRFH19ffqrv/orVVZW6jOf+YwmT55M/0TRXXvttTLG6Pz5\n87p06ZL+93//V5/97GfpmyiauXPn6oYbbhi3rNj9cWBgQA899JB+93d/V11dXVlzVSgC4+DgoBoa\nGhK/RyIRHThwoIgtQikaGRnR0NCQWltbx/W5hoYGDQwMSBrbKBsbGxP3qa+vVywW08jIiKqrqxPL\nk/vowMCAIpGIJOnGG2/U6dOndfnyZcViscTy1PsAkvToo49qw4YNqqj4dFdN30QQvPfee7p8+bJW\nrFihtrY2Pf3007p06RL9E0V37bXXauvWrbr11ls1bdo03XXXXWpra6NvIlCK2R8vXbqkM2fO6LOf\n/awkqbGxMWs/DUVgBPx24cIFLVmyRD/60Y90/fXXj/uqmGzsvhvUGJNYbowZ93iZHpvvGUXc66+/\nrurqajU3NzvuP6nom/DL5cuX9e///u+6//77FY1GNTQ0pJdffpn+iaL74IMPtGLFCh0+fFjvvvuu\n/uVf/kWvv/46fROBUsz+aFlWTs8vhSQwzpkzR0eOHEn8PjQ0lPa7GoFcXblyRffff78eeOABLVq0\nSNJYnxseHpY0dpHxnDlzJI19Z+jhw4cT9z1y5IjmzJmjmTNn6vTp04nlhw8fVltb24T7nDt3TlOn\nTlVlZeWEx0q+D/CLX/xCO3bs0G233aZvfOMb+vnPf64HHniAvolAmDlzpurr67Vw4UJde+21+sY3\nvqE333yT/omiGxgY0J133qmZM2dqypQp+vrXv679+/fTNxEoxeyPlZWVqq6u1kcffTThsdIJRWCc\nPHmypLGZUt99913t2bOHDRCeMMZo+fLlmj17dmImNWls49u+fbsuXbqk7du3J05QtLa2ateuXTpx\n4oSi0agqKipUVVUlaWxIQU9Pj86ePave3t5xG/KLL76oixcv6vnnn088ViQS0TvvvKNf/epX+rd/\n+zcNDQ1p1qxZBV4DCKqnnnpKJ0+e1PHjx9XT06N77rlH//AP/0DfRGDU1dUpFovpt7/9rf75n/9Z\n9957L/0TRTd37lwdPHhQ586d069//Wvt3LlTCxYsoG8iUIrdH9va2vT888/r4sWL+sd//MfshThn\n8/sUXzQaNQ0NDWbGjBmmu7u72M1Bidi/f7+xLMt84QtfME1NTaapqcns3LnTfPzxx+ZrX/uaueWW\nW8yiRYvMhQsXEvd57rnnzIwZM0xjY6N56623EsuHhoZMc3OzufXWW81f/uVfJpb/5je/MX/xF39h\nbrnlFtPe3m7ef//9xN9+9rOfmbq6OlNXV2defvnlwrxohE40Gk3MkkrfRFAcPXrUtLW1mS984Qtm\n9erVZnR0lP6JQPjpT39q5s2bZ+644w7z/e9/31y9epW+iaJZunSp+f3f/30zadIkc/PNN5vt27cX\nvT/+93//t5k3b5655ZZbzLJly8z//d//ZXwNljE5DmIFAAAAAJSFUAxJBQAAAAAUHoERAAAAAGCL\nwAgAAAAAsEVgBAAAAADYIjACAAAAAGwRGAEAAAAAtgiMAAAAAABbBEYAAAAAgK3/B7hYc9rR6HAV\nAAAAAElFTkSuQmCC\n", "text": [ "" ] } ], "prompt_number": 35 }, { "cell_type": "markdown", "metadata": {}, "source": [ "

Part II: Plane-sweep analysis

\n", "\n", "1. Plane-sweep coverage\n", "------------------------\n", "\n", "Notice the coverage only changes at the beginning or end of a read\n", "so just need to walk from read to read and keep track of depth along the way\n", "\n", "Returning to our example again:\n", "\n", " pos: 1 10 20 30 40 50 60 70 80 90\n", " r1: [==========================]\n", " r2: [================]\n", " r3: [================================]\n", " r4: [=====================]\n", " r5: [===================]\n", "\n", " cov: 11111222222233333444444333332222333333332222221111111\n", "\n", "We can imagine scanning across this from left to right to\n", "keep track of how many reads span each position. This is a widely\n", "used approach in computational geometry called a [plane-sweep](http://en.wikipedia.org/wiki/Sweep_line_algorithm)\n", "algorithm.\n", "\n", "The basic technique follows like this:\n", "\n", "- Assume layout is in sorted order by start position (or explicitly sort by start position)\n", "- walking from start position to start position\n", " - check to see if we past any read ends \n", " - coverage goes down by one when a read ends\n", " - coverage goes up by one when new read is encountered\n", "- use a list/deque/heap to track how many reads currently intersect the plane\n", " - the number of elements in the list corresponds to the current depth\n", "\n", "In this example, the plane sweeps like this:\n", "\n", " arrive at 1\n", " 1 (add 50): 50\n", " \n", " arrive at 10\n", " 10 (add 40): 40, 50 <- notice insert out of order \n", " \n", " arrive at 20\n", " 20 (add 80): 40, 50, 80\n", " \n", " arrive at 30\n", " 30 (add 70): 40, 50, 70, 80 <- out of order again\n", " \n", " arrive at 60\n", " 40 (sub 40): 50, 70, 80\n", " 50 (sub 50): 70, 80\n", " 60 (add 90): 70, 80, 90\n", " \n", " no more starts, flush the end points\n", " 70 (sub 70): 80, 90\n", " 80 (sub 80): 90\n", " 90 (sub 90): 0\n" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print \"Beginning list-based plane sweep over %d reads\" % (len(reads))\n", "\n", "starttime = time.time()\n", "\n", "## record the delta encoded depth using a plane sweep\n", "deltacovplane = []\n", "\n", "## use a list to record the end positions of the elements currently in plane\n", "planelist = []\n", "\n", "## BEGIN SWEEP \n", "for r in reads:\n", " startpos = r[1]\n", " endpos = r[2]\n", "\n", " ## clear out any positions from the plane that we have already moved past\n", " while (len(planelist) > 0):\n", "\n", " if (planelist[0] <= startpos):\n", " ## the coverage steps down, extract it from the front of the list \n", " oldend = planelist.pop(0) \n", " deltacovplane.append((oldend, len(planelist)))\n", " else:\n", " break\n", "\n", " ## Now insert the current endpos into the correct position into the list\n", " insertpos = -1\n", " for i in xrange(len(planelist)):\n", " if (endpos < planelist[i]):\n", " insertpos = i\n", " break\n", "\n", " if (insertpos > 0):\n", " planelist.insert(insertpos, endpos)\n", " else:\n", " planelist.append(endpos)\n", "\n", "\n", " ## Finally record that the coverage has increased\n", " deltacovplane.append((startpos, len(planelist)))\n", "\n", "\n", "## Flush any remaining end positions\n", "while (len(planelist) > 0):\n", " oldend = planelist.pop(0) \n", " deltacovplane.append((oldend, len(planelist)))\n", "\n", "\n", "## Report statistics\n", "planelisttime = (time.time() - starttime) * 1000.0\n", "print \"Plane sweep found %d steps, saving %0.02f%% of the space in %0.2f ms (%0.02f speedup)!\" % (len(deltacovplane), (100.0*float(totallen-len(deltacovplane))/totallen), planelisttime, brutetime/planelisttime)\n", "\n", "for i in xrange(3):\n", " print \" %d: [%d,%d]\" % (i, deltacovplane[i][0], deltacovplane[i][1])\n", "\n", "print \" ...\"\n", "\n", "for i in xrange(len(deltacovplane)-3, len(deltacovplane)):\n", " print \" %d: [%d,%d]\" % (i, deltacovplane[i][0], deltacovplane[i][1])\n", "\n", "print \"\\n\\n\"\n" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Beginning list-based plane sweep over 1873 reads\n", "Plane sweep found 3746 steps, saving 99.62% of the space in 48.90 ms (90.69 speedup)!\n", " 0: [0,1]\n", " 1: [799,2]\n", " 2: [1844,3]\n", " ..." ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", " 3743: [973770,2]\n", " 3744: [973779,1]\n", " 3745: [973898,0]\n", "\n", "\n", "\n" ] } ], "prompt_number": 34 }, { "cell_type": "markdown", "metadata": {}, "source": [ "2. Checking for errors\n", "-----------------------\n", "\n", "The above implementation is flawed because there may be multiple events (read starts and/or read ends) at the same position. This will leading to multiple entries at the same position, although there should only be one\n" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print \"Checking for duplicates:\"\n", "\n", "lastpos = -1\n", "for i in xrange(len(deltacovplane)):\n", " if deltacovplane[i][0] == lastpos:\n", " print \" Found duplicate: %d: [%d,%d] == %d: [%d,%d]\" % (i-1, deltacovplane[i-1][0], deltacovplane[i-1][1], i, deltacovplane[i][0], deltacovplane[i][1])\n", " lastpos = deltacovplane[i][0]\n", "\n", "print \"\\n\\n\"\n" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Checking for duplicates:\n", " Found duplicate: 45: [20763,38] == 46: [20763,39]\n", " Found duplicate: 119: [36471,22] == 120: [36471,21]\n", " Found duplicate: 158: [46334,37] == 159: [46334,36]\n", " Found duplicate: 438: [99894,27] == 439: [99894,28]\n", " Found duplicate: 459: [102773,30] == 460: [102773,31]\n", " Found duplicate: 527: [118454,16] == 528: [118454,15]\n", " Found duplicate: 529: [118455,14] == 530: [118455,13]\n", " Found duplicate: 634: [140597,33] == 635: [140597,32]\n", " Found duplicate: 650: [142846,29] == 651: [142846,30]\n", " Found duplicate: 680: [148015,37] == 681: [148015,38]\n", " Found duplicate: 734: [156928,39] == 735: [156928,40]\n", " Found duplicate: 736: [157049,39] == 737: [157049,38]\n", " Found duplicate: 776: [163850,35] == 777: [163850,34]\n", " Found duplicate: 893: [192138,22] == 894: [192138,23]\n", " Found duplicate: 925: [200780,26] == 926: [200780,27]\n", " Found duplicate: 953: [207176,24] == 954: [207176,23]\n", " Found duplicate: 956: [207608,21] == 957: [207608,20]\n", " Found duplicate: 1183: [263441,28] == 1184: [263441,29]\n", " Found duplicate: 1228: [272371,21] == 1229: [272371,20]\n", " Found duplicate: 1274: [284943,19] == 1275: [284943,18]\n", " Found duplicate: 1404: [321683,31] == 1405: [321683,32]\n", " Found duplicate: 1451: [332607,12] == 1452: [332607,11]\n", " Found duplicate: 1625: [379379,32] == 1626: [379379,33]\n", " Found duplicate: 1679: [388495,34] == 1680: [388495,33]\n", " Found duplicate: 1682: [388799,33] == 1683: [388799,34]\n", " Found duplicate: 1781: [406457,30] == 1782: [406457,31]\n", " Found duplicate: 1865: [421769,28] == 1866: [421769,27]\n", " Found duplicate: 1957: [442758,22] == 1958: [442758,23]\n", " Found duplicate: 1983: [448441,36] == 1984: [448441,37]\n", " Found duplicate: 2031: [455780,36] == 2032: [455780,37]\n", " Found duplicate: 2054: [459807,39] == 2055: [459807,38]\n", " Found duplicate: 2101: [472150,8] == 2102: [472150,7]\n", " Found duplicate: 2164: [493322,25] == 2165: [493322,26]\n", " Found duplicate: 2196: [499862,21] == 2197: [499862,20]\n", " Found duplicate: 2212: [504669,23] == 2213: [504669,24]\n", " Found duplicate: 2233: [511527,22] == 2234: [511527,21]\n", " Found duplicate: 2422: [598503,23] == 2423: [598503,24]\n", " Found duplicate: 2580: [637735,23] == 2581: [637735,24]\n", " Found duplicate: 2633: [648890,26] == 2634: [648890,27]\n", " Found duplicate: 2640: [650223,27] == 2641: [650223,26]\n", " Found duplicate: 2668: [656130,17] == 2669: [656130,16]\n", " Found duplicate: 2722: [674166,25] == 2723: [674166,26]\n", " Found duplicate: 2784: [688606,27] == 2785: [688606,26]\n", " Found duplicate: 2909: [715460,36] == 2910: [715460,37]\n", " Found duplicate: 3216: [826087,15] == 3217: [826087,16]\n", " Found duplicate: 3328: [852933,31] == 3329: [852933,30]\n", " Found duplicate: 3459: [890989,8] == 3460: [890989,7]\n", " Found duplicate: 3642: [950674,15] == 3643: [950674,16]\n", "\n", "\n", "\n" ] } ], "prompt_number": 17 }, { "cell_type": "markdown", "metadata": {}, "source": [ "3. Refine the plane-sweep update rule\n", "--------------------------------------\n", "\n", "Lets fix the implementation to check for multiple events at the same position by peeking ahead to check for multiple ends or multiple starts at the exact same position\n", " \n", "Strictly speaking this may give slightly different results than the brute force coverage profile since this will create an event if a read ends at exactly the same place another begins, whereas the brute force approach will link together those regions because they have the same coverage\n" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print \"Beginning correct list-based plane sweep over %d reads\" % (len(reads))\n", "\n", "starttime = time.time()\n", "\n", "## record the delta encoded depth using a plane sweep\n", "deltacovplane = []\n", "\n", "## use a list to record the end positions of the elements currently in plane\n", "planelist = []\n", "\n", "## BEGIN SWEEP (note change to index based so can peek ahead)\n", "for rr in xrange(len(reads)):\n", " r = reads[rr]\n", " startpos = r[1]\n", " endpos = r[2]\n", "\n", " ## clear out any positions from the plane that we have already moved past\n", " while (len(planelist) > 0):\n", "\n", " if (planelist[0] <= startpos):\n", " ## the coverage steps down, extract it from the front of the list \n", " oldend = planelist.pop(0) \n", "\n", " nextend = -1\n", " if (len(planelist) > 0):\n", " nextend = planelist[0]\n", "\n", " ## only record this transition if it is not the same as a start pos\n", " ## and only if not the same as the next end point\n", " if ((oldend != startpos) and (oldend != nextend)):\n", " deltacovplane.append((oldend, len(planelist)))\n", " else:\n", " break\n", "\n", " ## Now insert the current endpos into the correct position into the list\n", " insertpos = -1\n", " for i in xrange(len(planelist)):\n", " if (endpos < planelist[i]):\n", " insertpos = i\n", " break\n", "\n", " if (insertpos > 0):\n", " planelist.insert(insertpos, endpos)\n", " else:\n", " planelist.append(endpos)\n", "\n", "\n", " ## Finally record that the coverage has increased\n", " ## But make sure the current read does not start at the same position as the next\n", " if ((rr == len(reads)-1) or (startpos != reads[rr+1][1])):\n", " deltacovplane.append((startpos, len(planelist)))\n", "\n", " ## if it is at the same place, it will get reported in the next cycle\n", "\n", "## Flush any remaining end positions\n", "while (len(planelist) > 0):\n", " oldend = planelist.pop(0) \n", " deltacovplane.append((oldend, len(planelist)))\n", "\n", "\n", "## Report statistics\n", "planelisttime = (time.time() - starttime) * 1000.0\n", "print \"Plane sweep found %d steps, saving %0.02f%% of the space in %0.2f ms (%0.02f speedup)!\" % (len(deltacovplane), (100.0*float(totallen-len(deltacovplane))/totallen), planelisttime, brutetime/planelisttime)\n", "\n", "for i in xrange(3):\n", " print \" %d: [%d,%d]\" % (i, deltacovplane[i][0], deltacovplane[i][1])\n", "\n", "print \" ...\"\n", "\n", "for i in xrange(len(deltacovplane)-3, len(deltacovplane)):\n", " print \" %d: [%d,%d]\" % (i, deltacovplane[i][0], deltacovplane[i][1])\n", "\n", "print \"\\n\\n\"\n", "\n", "print \"Checking for duplicates in corrected version:\"\n", "\n", "lastpos = -1\n", "for i in xrange(len(deltacovplane)):\n", " if deltacovplane[i][0] == lastpos:\n", " print \" Found duplicate: %d: [%d,%d] == %d: [%d,%d]\" % (i-1, deltacovplane[i-1][0], deltacovplane[i-1][1], i, deltacovplane[i][0], deltacovplane[i][1])\n", " lastpos = deltacovplane[i][0]\n", "\n", "print \"\\n\\n\"\n" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Beginning correct list-based plane sweep over 1873 reads\n", "Plane sweep found 3698 steps, saving 99.62% of the space in 15.03 ms (295.17 speedup)!\n", " 0: [0,1]\n", " 1: [799,2]\n", " 2: [1844,3]\n", " ...\n", " 3695: [973770,2]\n", " 3696: [973779,1]\n", " 3697: [973898,0]\n", "\n", "\n", "\n", "Checking for duplicates in corrected version:\n", "\n", "\n", "\n" ] } ], "prompt_number": 18 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## This version is 295 times faster than the basic brute force approach to produce the same output!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "4. Heap-based plane-sweep algorithm\n", "------------------------------------\n", "\n", "Lists are bad for maintaining the plane because:\n", "\n", "- removing from the front requires shifting everything over: ``O(N)`` time\n", "- inserting into the middle (or end) also requires shifting and possibly resizing: ``O(N)``\n", "\n", "This can result in a worst case runtime of ``O(N^2)`` because for each read we may need to shift the list by all ``N`` positions. Fortunately, with real datasets the maximum length of the list is bounded by the maximum read depth (D), so will result in an ``O(N*D)`` runtime\n", "\n", "If the reads (intervals) were all the same length, we would always append to the end\n", "and remove from the very front. A good data structure for this is called a [deque](http://en.wikipedia.org/wiki/Double-ended_queue)\n", "and enables ``O(1)`` remove-from-front and ``O(1)`` append-to-end operations\n", " _________________________________\n", " appendleft -> | | | | | | | | <-- append\n", " popleft <- |____|____|____|____|____|____|___| --> pop\n", "\n", "\n", "Deques are implemented as a doublely-linked-list, and can be used from the [deque collection API](https://docs.python.org/2/library/collections.html#collections.deque)\n", "\n", " from collections import deque\n", " d = deque()\n", " d.append() <- adds to the end\n", " d.popleft() <- removes from the front\n", "\n", "this estalishes a \"queue\" for the end positions and accelerates to O(N) runtime!\n", "\n", "\n", "## Plane-sweep with a heap ##\n", "\n", "However, in general, the intervals will not be of uniform length so this will not\n", "be sufficient. The data structure we really want should support very fast (``O(1)``) ability\n", "to find the minimum item of the plane, \"fast\" ability to remove it, \n", "and \"fast\" ability to add a new item to the plane\n", "\n", "A good data structure for this is called a heap (aka min-heap, aka priority queue), and more specifically a [binary heap](http://en.wikipedia.org/wiki/Binary_heap)\n", "\n", "Binary heaps are complete binary trees such that the parent is smaller than both of the children. By construction, the height of the tree is completely balanced, except for the bottommost level which may be incomplete. This is made possible because the the relative ordering of the left and right children is arbitrary. (Note it is possible to maintain balanced binary trees in other ways, but wont support constant time look up of minimum values). In a heap it is trival to find the minimum value, since by construction it is always the top of the heap! \n", "\n", "\n", "![Binary Heap](http://upload.wikimedia.org/wikipedia/commons/6/69/Min-heap.png)\n", "\n", "\n", "\n", "\n", "\n", "** Removing an element **\n", "\n", "Removing the minimum is a bit more compplex and requires ``O(log n)`` steps. The first step is to shift a leaf value into the root position. If that value happens to be smaller than the children, then done. Otherwise, interatively swap it with the smaller of its two children. In the worst case, it will be reswapped down to the botton of the heap again in ``O(log n)`` steps. Note the deletions are always picked from the leaf nodes so the balance of tree will be guaranteed.\n", "\n", "Here is a nice tutorial on [removing an element](http://www.algolist.net/Data_structures/Binary_heap/Remove_minimum)\n", "\n", "\n", "** Inserting into a heap **\n", " \n", "Adding a new value into a heap is fast: the value is added as the first available leaf node and then repeatedly pushed up the heap if its value is smaller than its parents. In the worst case this could require ``O(log n)`` up-heap operations, but in average can be done in ``O(1)`` time. \n", "\n", "Note if the values to be inserted are in sorted order, the values are easily added in ``O(1) time`\n", "\n", "Here is a nice tutorial on [inserting an element](http://www.algolist.net/Data_structures/Binary_heap/Insertion)\n", "\n", "\n", "\n", "** Representing a heap **\n", "We can store a heap by embedding the tree inside of an array so that the children at node ``k`` are indexed at nodes ``2k+1`` and ``2k+2``. This is useful because then we dont have to incure the storage costs of the tree pointers. Up-heap or down-heap operations are a simple matter of swapping elements in the list, and finding the next free leaf is just the first free cell at the end of the list.\n", "\n", "For example, the above heap can be represented as\n", "\n", " idx 0 1 2 3 4 5 6 7 8\n", " val 1 2 3 17 19 36 7 25 100 \n", " \n", "The next free cell is 9, which is a child of 4 (``2*4+1 = 9``)\n", "\n", "In python, these techniques are implemented in the [heapq](https://docs.python.org/2/library/heapq.html) package. This restores us to ``O(N log N)`` worst case runtime. However, the time to maintain the heap is bounded by the size of it, so in practice the runtime will be much faster: ``O(N log D)`` where ``D`` is the maximum depth of coverage. For more genomics project, this is effectively a constant value around ``log(100) = 8``." ] }, { "cell_type": "code", "collapsed": false, "input": [ "print \"Beginning heap-based plane sweep over %d reads\" % (len(reads))\n", "\n", "starttime = time.time()\n", "\n", "## record the delta encoded depth using a plane sweep\n", "deltacovplane = []\n", "\n", "## use a list to record the end positions of the elements currently in plane\n", "planeheap = []\n", "\n", "## BEGIN SWEEP (note change to index based so can peek ahead)\n", "for rr in xrange(len(reads)):\n", " r = reads[rr]\n", " startpos = r[1]\n", " endpos = r[2]\n", "\n", " ## clear out any positions from the plane that we have already moved past\n", " while (len(planeheap) > 0):\n", "\n", " if (planeheap[0] <= startpos):\n", " ## the coverage steps down, extract it from the front of the list \n", " ## oldend = planelist.pop(0) \n", " oldend = heapq.heappop(planeheap)\n", "\n", " nextend = -1\n", " if (len(planeheap) > 0):\n", " nextend = planeheap[0]\n", "\n", " ## only record this transition if it is not the same as a start pos\n", " ## and only if not the same as the next end point\n", " if ((oldend != startpos) and (oldend != nextend)):\n", " deltacovplane.append((oldend, len(planeheap)))\n", " else:\n", " break\n", "\n", " ## Now insert the current endpos into the correct position into the list\n", "\n", " ##insertpos = -1\n", " ##for i in xrange(len(planelist)):\n", " ## if (endpos < planelist[i]):\n", " ## insertpos = i\n", " ## break\n", "\n", " ##if (insertpos > 0):\n", " ## planelist.insert(insertpos, endpos)\n", " ##else:\n", " ## planelist.append(endpos)\n", "\n", " heapq.heappush(planeheap, endpos)\n", "\n", "\n", " ## Finally record that the coverage has increased\n", " ## But make sure the current read does not start at the same position as the next\n", " if ((rr == len(reads)-1) or (startpos != reads[rr+1][1])):\n", " deltacovplane.append((startpos, len(planeheap)))\n", "\n", " ## if it is at the same place, it will get reported in the next cycle\n", "\n", "\n", "## Flush any remaining end positions\n", "while (len(planeheap) > 0):\n", " ##oldend = planelist.pop(0) \n", " oldend = heapq.heappop(planeheap)\n", " deltacovplane.append((oldend, len(planeheap)))\n", "\n", "\n", "\n", "## Report statistics\n", "planeheaptime = (time.time()-starttime) * 1000.0\n", "print \"Heap-Plane sweep found %d steps, saving %0.02f%% of the space in %0.02f ms (%0.02f speedup)!\" % (len(deltacovplane), (100.0*float(totallen-len(deltacovplane))/totallen), planeheaptime, brutetime/planeheaptime)\n", "\n", "for i in xrange(3):\n", " print \" %d: [%d,%d]\" % (i, deltacovplane[i][0], deltacovplane[i][1])\n", "\n", "print \" ...\"\n", "\n", "for i in xrange(len(deltacovplane)-3, len(deltacovplane)):\n", " print \" %d: [%d,%d]\" % (i, deltacovplane[i][0], deltacovplane[i][1])\n", "\n", "print \"\\n\\n\"\n", "\n", "print \"Checking for duplicates in heap-ified version:\"\n", "\n", "lastpos = -1\n", "for i in xrange(len(deltacovplane)):\n", " if deltacovplane[i][0] == lastpos:\n", " print \" Found duplicate: %d: [%d,%d] == %d: [%d,%d]\" % (i-1, deltacovplane[i-1][0], deltacovplane[i-1][1], i, deltacovplane[i][0], deltacovplane[i][1])\n", " lastpos = deltacovplane[i][0]\n", "\n", "print \"\\n\\n\"\n" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Beginning heap-based plane sweep over 1873 reads\n", "Heap-Plane sweep found 3698 steps, saving 99.62% of the space in 14.26 ms (311.08 speedup)!\n", " 0: [0,1]\n", " 1: [799,2]\n", " 2: [1844,3]\n", " ...\n", " 3695: [973770,2]\n", " 3696: [973779,1]\n", " 3697: [973898,0]\n", "\n", "\n", "\n", "Checking for duplicates in heap-ified version:\n", "\n", "\n", "\n" ] } ], "prompt_number": 19 }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "
The heap-based plane-sweep algorithm is 311 times faster than the brute force approach!
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "5. Max-coverage read ids\n", "-------------------------\n", "\n", "Now lets use this framework to figure out which reads span the position of max coverage as we are scanning the alignments. The technique is to update the heap to store the read end position as the primary key, but also associate it with the read id. Then whenever we see we have established a new maximum coverage level, we record the ids of the reads in the heap." ] }, { "cell_type": "code", "collapsed": false, "input": [ "print \"Beginning heap-based plane sweep over %d reads to search for max reads\" % (len(reads))\n", "\n", "starttime = time.time()\n", "\n", "## record the delta encoded depth using a plane sweep\n", "deltacovplane = []\n", "\n", "## use a list to record the end positions of the elements currently in plane\n", "planeheap = []\n", "\n", "## use a list to record the ids at max coverage\n", "maxcov = -1\n", "maxcovpos = -1\n", "maxcovreads = []\n", "\n", "## BEGIN SWEEP (note change to index based so can peek ahead)\n", "for rr in xrange(len(reads)):\n", " r = reads[rr]\n", " readid = r[0]\n", " startpos = r[1]\n", " endpos = r[2]\n", "\n", " ## clear out any positions from the plane that we have already moved past\n", " while (len(planeheap) > 0):\n", "\n", " if (planeheap[0][0] <= startpos):\n", " ## the coverage steps down, extract it from the front of the list \n", " oldend = heapq.heappop(planeheap)[0]\n", "\n", " nextend = -1\n", " if (len(planeheap) > 0):\n", " nextend = planeheap[0][0]\n", "\n", " ## only record this transition if it is not the same as a start pos\n", " ## and only if not the same as the next end point\n", " if ((oldend != startpos) and (oldend != nextend)):\n", " deltacovplane.append((oldend, len(planeheap)))\n", " else:\n", " break\n", "\n", " ## Now insert the current endpos into the correct position into the list\n", " heapq.heappush(planeheap, (endpos, readid))\n", "\n", "\n", " ## Finally record that the coverage has increased\n", " ## But make sure the current read does not start at the same position as the next\n", " if ((rr == len(reads)-1) or (startpos != reads[rr+1][1])):\n", " cov = len(planeheap)\n", " if (cov > maxcov):\n", " maxcov = cov\n", " maxcovpos = startpos\n", " maxcovreads = []\n", " for rr in planeheap:\n", " maxcovreads.append(rr[1])\n", "\n", " deltacovplane.append((startpos, len(planeheap)))\n", "\n", "\n", "## Flush any remaining end positions\n", "while (len(planeheap) > 0):\n", " oldend = heapq.heappop(planeheap)[0]\n", " deltacovplane.append((oldend, len(planeheap)))\n", "\n", "\n", "\n", "## Report statistics\n", "planeheaptime = (time.time()-starttime) * 1000.0\n", "print \"Heap-Plane sweep found %d steps, saving %0.02f%% of the space in %0.02f ms (%0.02f speedup)!\" % (len(deltacovplane), (100.0*float(totallen-len(deltacovplane))/totallen), planeheaptime, brutetime/planeheaptime)\n", "\n", "print \"The %d reads at the position of maximum depth (%d) are:\" % (maxcov, maxcovpos)\n", "print maxcovreads\n", "\n", "print \"\\n\\n\"\n", "\n", "for i in xrange(3):\n", " print \" %d: [%d,%d]\" % (i, deltacovplane[i][0], deltacovplane[i][1])\n", "\n", "print \" ...\"\n", "\n", "for i in xrange(len(deltacovplane)-3, len(deltacovplane)):\n", " print \" %d: [%d,%d]\" % (i, deltacovplane[i][0], deltacovplane[i][1])\n", "\n", "print \"\\n\\n\"\n", "\n", "print \"Checking for duplicates in heap-ified version:\"\n", "\n", "lastpos = -1\n", "for i in xrange(len(deltacovplane)):\n", " if deltacovplane[i][0] == lastpos:\n", " print \" Found duplicate: %d: [%d,%d] == %d: [%d,%d]\" % (i-1, deltacovplane[i-1][0], deltacovplane[i-1][1], i, deltacovplane[i][0], deltacovplane[i][1])\n", " lastpos = deltacovplane[i][0]\n", "\n", "print \"\\n\\n\"\n" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Beginning heap-based plane sweep over 1873 reads to search for max reads\n", "Heap-Plane sweep found 3698 steps, saving 99.62% of the space in 14.36 ms (308.74 speedup)!\n", "The 49 reads at the position of maximum depth (62132) are:\n", "['read118', 'read112', 'read102', 'read113', 'read107', 'read82', 'read129', 'read114', 'read104', 'read105', 'read111', 'read125', 'read124', 'read110', 'read108', 'read135', 'read119', 'read121', 'read126', 'read128', 'read138', 'read146', 'read93', 'read96', 'read137', 'read127', 'read106', 'read148', 'read136', 'read131', 'read140', 'read139', 'read130', 'read117', 'read120', 'read141', 'read122', 'read132', 'read133', 'read134', 'read142', 'read143', 'read144', 'read145', 'read149', 'read147', 'read150', 'read151', 'read152']\n", "\n", "\n", "\n", " 0: [0,1]\n", " 1: [799,2]\n", " 2: [1844,3]\n", " ...\n", " 3695: [973770,2]\n", " 3696: [973779,1]\n", " 3697: [973898,0]\n", "\n", "\n", "\n", "Checking for duplicates in heap-ified version:\n", "\n", "\n", "\n" ] } ], "prompt_number": 20 }, { "cell_type": "markdown", "metadata": {}, "source": [ "6. Plot the max depth position\n", "-------------------------------" ] }, { "cell_type": "code", "collapsed": false, "input": [ " f = plt.figure(figsize=(15,4), dpi=100)\n", " \n", " print \"Plotting max depth\\n\\n\"\n", "\n", " lastend = -1\n", " lastidx = -1\n", " \n", " ## draw the layout of reads\n", " for i in xrange(min(MAX_READS_LAYOUT, len(reads))):\n", " r = reads[i]\n", " readid = r[0]\n", " start = r[1]\n", " end = r[2]\n", " rc = r[3]\n", " color = \"blue\"\n", "\n", " if (rc == 1): \n", " color = \"red\"\n", "\n", " plt.plot ([start,end], [-2*i, -2*i], lw=4, color=color)\n", " lastend = end\n", " lastidx = i\n", "\n", " ## draw the base of the coverage plot\n", " plt.plot([0, lastend], [0,0], color=\"black\")\n", "\n", " ## draw the coverage profile\n", " for i in xrange(len(deltacov)-1):\n", " x1 = deltacov[i][0]\n", " x2 = deltacov[i+1][0]\n", " y1 = YSCALE * deltacov[i][1]\n", " y2 = YSCALE * deltacov[i+1][1]\n", "\n", " ## draw the horizonal line\n", " plt.plot([x1, x2], [y1, y1], color=\"black\")\n", " \n", " ## and now the right vertical to the new coverage level\n", " plt.plot([x2, x2], [y1, y2], color=\"black\")\n", " if (x2 >= lastend):\n", " break\n", "\n", " ## draw a vertical bar with the max coverage\n", " plt.plot([maxcovpos, maxcovpos], [YSCALE*maxcov, -(2*lastidx+20)], color=\"green\", lw=2)\n", " plt.draw()\n", " plt.show()\n" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Plotting max depth\n", "\n", "\n" ] }, { "metadata": {}, "output_type": "display_data", "png": "iVBORw0KGgoAAAANSUhEUgAAA4kAAAEACAYAAAAeDjNsAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3X9s1Pd9x/HXOSuwDAs1oXayUZOYuLYPQmwX24mmxBcW\nqNeMwogmo8nZNKgSIVqnBSIasa1OlIRBSBNghTpT02nKMrepZoVUoWAzHYFWnE3p2mAbVBcyqBRM\nnGTEzpqUZZ/9Ye7L+fy93/e9+37vng/pm5jv3X3v8z1/7+vv6/v+fL5fnzHGCAAAAAAASSX5bgAA\nAAAAwD0IiQAAAAAACyERAAAAAGAhJAIAAAAALIREAAAAAICFkAgAAAAAsDgaEj/55BPV19drxYoV\nkqTx8XGtXLlSFRUVWrVqlSYmJqzn7t69W1VVVfL7/Tp27JiTzQIAAAAAxOBoSNy1a5f8fr98Pp8k\nad++faqoqNCvfvUrzZs3T9/5znckSZcuXdLevXt1+PBh7du3Tx0dHU42CwAAAAAQg2Mh8Te/+Y1e\nf/11ffnLX5YxRpLU39+vdevWaebMmVq7dq1CoZAkKRQKqbW1VRUVFWppaZExRuPj4041DQAAAAAQ\ng2Mh8etf/7qeeeYZlZRce4uBgQHV1NRIkmpqatTf3y9pMiTW1tZaz6uurrYeAwAAAADkjiMh8Uc/\n+pHKyspUX19vVRElTfk5kXAXVQAAAABA7vyeEwv96U9/qv379+v111/XRx99pA8++EAPPvigGhsb\nNTw8rPr6eg0PD6uxsVGS1NzcrL6+Puv1p0+fth6LdNttt+nXv/61E00GAAAAANdbsGCBRkZGHH0P\nn0mlvJeGI0eOaOfOnXrttde0Y8cOXbhwQTt27NDmzZt16623avPmzRodHVVLS4sOHTqks2fPauPG\njTp58uT0xvp8KVUjgVzq7OxUZ2dnvpsBTMO2Cbdi24SbsX3CrXKRiRypJEYLdx1dv3692tvbVV1d\nrYaGBm3fvl2SVF5ervXr12vp0qWaMWOGurq6ctEsAAAAAEAUx0NiS0uLWlpaJEmlpaV69dVXbZ/3\nyCOP6JFHHnG6OQAAAACAOBy9TyJQTAKBQL6bANhi24RbsW3Czdg+UcwcH5OYTYxJBAAAAFDMcpGJ\nqCQCAAAAACyERAAAAACAhZAIAAAAALAQEgEAAAAAFkIiAAAAAMBCSAQAAAAAWAiJAAAAAAALIREA\nAAAAYCEkAgAAAAAshEQAAAAAgIWQCAAAAACwEBIBAAAAABZCIgAAAADAQkgEAAAAAFgIiUAR8D3u\nk+9xX76bAQAAAA8gJAJFxOcjKAIAACA+QiJQ4AiGAAAASIVjIfHChQu69957tXDhQgUCAb388suS\npPHxca1cuVIVFRVatWqVJiYmrNfs3r1bVVVV8vv9OnbsmFNNA4qaz+ebMgEAAACRHAuJn/rUp/Tc\nc89pcHBQP/zhD/W3f/u3Gh8f1759+1RRUaFf/epXmjdvnr7zne9Iki5duqS9e/fq8OHD2rdvnzo6\nOpxqGlBwkg1+xhhrinwtAAAAEOZYSLzppptUV1cnSZo7d64WLlyogYEB9ff3a926dZo5c6bWrl2r\nUCgkSQqFQmptbVVFRYVaWlpkjNH4+LhTzQM8LTIMhv8fGf5SCYzRzyc0AsA17BcBFKOcjEkcGRnR\n4OCgmpqaNDAwoJqaGklSTU2N+vv7JU2GxNraWus11dXV1mNAMUnlYCQyIIbZVQvjiXx+ZGgEADfL\nZD9lF/zslse+EECxcjwkjo+Pq62tTc8995xmz56d9IGrxM4ZxSe6OhjvOeHvUrzvVCrft+jXUFkE\nUEji7dOS2fcCQDH5PScXfuXKFT3wwAN68MEHtXLlSklSY2OjhoeHVV9fr+HhYTU2NkqSmpub1dfX\nZ7329OnT1mOROjs7rZ8DgYACgYCTq4Ai5fP50gpY2WCMmXYgE13hSyYgZtqGyPfL5+cBoHBkY1+S\nTm+LsPD+1e5xu31v9LLYDwLIh2AwqGAwmNP39BmH9njGGP31X/+15s6dq29961vW/B07dujChQva\nsWOHNm/erFtvvVWbN2/W6OioWlpadOjQIZ09e1YbN27UyZMnpzaWHTQcZhfM8v2+kcEw3e+A7/Gr\ny/hmeusU6335TgKZK4bvUXgdM11XuwCX6EJcae0zbdqZr78PABAtF383HKsk/uQnP9FLL72kxYsX\nq76+XpK0bds2rV+/Xu3t7aqurlZDQ4O2b98uSSovL9f69eu1dOlSzZgxQ11dXU41DbAVHcbszjBH\nytbZ8OjxhNESnd3OlegdUqLuWRxEAYnZfY8K4bsTvT+NnJfuutrtM8P7Rrt9dKrLT/Q6u3UphN8V\nANhxrJLohGI424r8SVQti9VtKd1tMtuhM+57ZVhJlJI7gx/ZTZXvKjCVXXCy+3c0L36Xktm/Jbuf\nSBQq3dADJFKsNrBfBJAtnq4kIveo6KQnmQpd9Flju1tQ2Em2G1S+q4TJSBSK43WTRWFLZfstpu0h\n3ti2sOjxxbH2NfE+t0Sff7xuk8k8P5nXpCtWmEylKpiNbqzpsvtsY50MCD9u9zoA03FiJb8IiQUg\n3hlbLjwSW7Jnn2N1O0q0PLsuoskGLLdK5bYakjfCL6bK5CA20WuiA08hHjBnenGp6OdH7kuSHSMX\nr+t89PNj7bfineDK1gmhWAErURf8ZJaVC/H+NsQashDv9wlg+v6G70r+EBI9Jp1qIQfq8bt5ZUuq\nQbNYOPF5F9r4LTdI9w9zKgEhVmiJF0q8+vvN5n43mbAXa16s8Bi97Gh23eyzPe7PbjnRlTivs/td\nFOJ6AtkUa9wxcouQ6ALx/lDYhZtUviipdFkqBrn8o1zsn3W6kv0d2R04F9Jnno31inWgH4vdH+Zk\n3j+dfVLk+0T+P5oXf79OtDMby0ymJ0OsCmM22xGPV37HiST7neFvM3CNkyehkDpCYh7FOpvuROWv\nmLu4cLY2f1LpYhWvuhTru+GWs42x3tfJbS/RspMZMxtvGbG6zUU/lqlkP5tY3SDz/b0u5J4ahbxu\nbkFVEfmSzvc7mSEFqezTE/1dR34REnMsmYPcZOenKl6XpVhtcrNEB+aRvLJOhciui1WikyF222qi\n70Aqy0+mvXZidVuOfCzW61LZBpMZU5bMe0f3QIi1rES/i+jn5qO6F6+raq7bUgwHNYW4Tm6VblWR\nYDmd3f4gG59TMr267PYLqexbsy2Zv4HxeppEr2e87TPZv9Wx9t9sw+5ESMyj6C9FMl2BsvmesSo1\nbu7elezOx23txlTJHGRHb5vJiHdWPtltJ9ZJhnjPSXQwkkwwi7U/SBRaU9nus3HQlM7vJVuy+Vk4\n2RYgHamc3EpmnxXrNXb7xXiv84Lo734yJ9cyfR+7z9OuB0aszzvb4dXuPcKPxfs9J3vsmahHUPTf\nF7ueH3Zt9eo2VwwIiTnixgATbyfhVPfUZHfayR7IF8ofuGKRbvBL5z1SWU4yZ0+TWV4ybY11ptuJ\nrj/JPicdxfZdo+slciXR/jFRVStRgIwXIpw4aI+3/0zlexXrJFcy+2e745lUv9OxjktSqRbG6t2R\nTk+TePOSLUKkKpkAHr095aJdcAYhMQe8VE6324GlKplKjN3j0a/N5Owo3M3NvyunKvrxThS5qSLn\nVdEHf9k82GUfg1xJFByit8VY22SqJ1DT7fKaqlg9liJ7MyUbxlLtRWG3jEzXM9OTkomOjWI9ls99\nUSonJ91YIEHyCIlZlM++59kUr6tcovWIFTCTPciK110w2fYC2ZSN7YttNLfSPcEVjQMc5EOiak2y\nf0fT7U6ebFCM9f1IdII3Xi+MZEJXZKCM95p4y4jVznxJ1JZ01tcN0qkYwz0IiVnmpS9vMpLtmmH3\nxyLdz6LQPkMA+ZFJRcSLB2QoHLFOdDjVTT+Z944Uq8tkLr43mR5jRC/HDWIF2ETh2iu83PZiRkjM\nULGdHbE76OJgCoDbZFJNZJ8GN8j39peoy2tYOj2AcE2649sBpxESM1BsO8R4XWCK5TMA4D3JjkMq\ntpN+QCx2J1ncNK4agPMIiRkqth1jsa0vgOJQbCf9gFTw/QCKDyExRZxpBgDvsLuKIvtxIDlcvAko\nXoTEFHAmDQC8p1Au/gDkklevqAkgOwiJcTD2DgC8i/01kBm+Q0DxIiTGwNkzAAAAAMWIkBiF/vcA\nAAAAillJvhsQ6Y033lBtba2qqqq0Z88ex98vfAGDyEmaDIcERAAAAADFyFUh8ZFHHlFXV5f6+vr0\n7W9/W2NjY468j10gjJwAAAAAoFi5prvp5cuXJUn33HOPJGn58uUKhUK6//77pzzP/srl6VzO3MRZ\nnneQaQEAAABkk2sqiQMDA6qpqbH+7ff7dfz48Tiv8EVMk5Evtcnn2gkAAAAA8sU1lcTk2YeowopW\nya+N1yuhyJHOyf9xE3EAAOA1xT4cLBgMKhgM5vQ9fcYln/rly5cVCAT085//XJL01a9+Va2trVO6\nm04e4No3N5UKnE9m2vN9EcuN9xiyxx1bXnHwPX614v5NPnQAAAAv8/l8jgdn13Q3nTNnjqTJK5y+\n9dZb6u3tVXNzc55bBQAAAADFxVXdTZ9//nk9/PDDunLlijo6OjR37txpz4kdmpNI0w51tSv0cYRU\nUgEAAIDi4ZrupsnIRWnV50u/u2khdVN1IvhGfx7e2fK8j+6mAAAAhSEXmchVlUQ3mPy8o8KMMi9C\n5rra6MaASigEAAAA3I+QmKTIgJNJYHRjeLNjd3EfAAAAAIWPkJiG6IpYqqExHL68EhgBAAAAFA9C\nYhZcC43R6TGDZUYESddV9Og3CgAAABQsQqKDpmSpDHOeq8Jilq4SG6+SSg4FAAAA8oOQmCtXU4+R\nMg6MYeGQ5ZrwmKK47XZilUieAAAAQEKExHyIDIyxeDP3AQAAAPA4QqJbTQmShtCYDRnfx4RKJAAA\nAAofIdEjrPs3Zmk8oJ1YYwTtuoUW5ZVZr34MZEUAAAAUMkKi10QklCk/ORge4/HqeMiMJFplUiQA\nAAA8jJBYKNINJlkMl3bVxeIMkVHrTGgEAACAhxASi11kgHEwz7nqFh4AAAAAYiIkwhK74GVXIbwq\njdxXbOMZi2ttAQAA4HWERGQmnCwdHBPp+QqkT3Q5BQAAgGcQEpEdxsS8Nmq+LqrjKql8BgRKAAAA\n5FFJvhuAImAMuQcAAADwCCqJyJ2rSTGcF7NRYCzI8Y1OFF47ry6aez0CAAAgAUIi8ib5oJJal9Xw\nGEavXlG1IIMvAAAAPIOQCG+YcquO5IMfgQsAAABIjSNjEh999FHV1taqoaFBX/va1/Tb3/7Wemz3\n7t2qqqqS3+/XsWPHrPnDw8NqaGhQZWWltm7d6kSzUCiMuTZlshj5PFdpDLc51Sn69fLleAIAAIBn\nOBISly9frsHBQZ04cUIffvihXn75ZUnSpUuXtHfvXh0+fFj79u1TR0eH9ZpNmzZpy5YtGhgY0JEj\nR3TixAknmoYCE5kX082NdiHKrRMAAADgNEdC4rJly1RSUqKSkhJ94Qtf0JEjRyRJoVBIra2tqqio\nUEtLi4wxmpiYkCSdOXNGbW1tuvHGG7V69WqFQiEnmoZicDUtphMYw3Es+t9enwAAAIBkOX4LjH/6\np3/SihUrJEn9/f2qra21HquurlYoFNLIyIjKysqs+X6/X8ePH3e6aSgC2a40eh0hEgAAAImkfeGa\nZcuW6eLFi9PmP/3001YofOKJJ1RaWqq/+Iu/kCQZmyN0n814JbvnhXV2dlo/BwIBBQKBFFuOYndt\n87r6Q5Jj5rwcFsMtd2QduJ8GAACAY4LBoILBYE7fM+2Q2NvbG/fxf/7nf9bBgwd1+PBha15zc7P6\n+vqsf58+fVqNjY0qLS3V6OioNX9oaEh33nmn7XIjQyKQFcYom/duLGQ+pdeNFwAAAOmJLow9/vjj\njr+nI91Nf/zjH+uZZ57R/v37NWvWLGt+U1OTDh48qPPnzysYDKqkpESlpaWSpJqaGnV3d2tsbEw9\nPT1qbm52omlAXFaX1AzGNUpybVfObIxtJEgDAAAUNp+J17czTVVVVfrd736nG264QZJ01113ae/e\nvZKkXbt2ac+ePZoxY4a6urp09913S5qsHra3t+v999/XmjVrtG3btumN9fnidkUFHJXm/Rnd0E3V\n1zn5f9OZpwbwvQUAAMiKXGQiR0KiUwiJcK2oAOnWkKhOb3x/+JoDAADYy0UmSntMIoAI0V/UFHJh\nbrqk5j+oAgAAwBsIiYADpmbG6ABpH9icrDg6enXTNLhtrCYAAACuISQCuRaZIIv0KjAJw2r0w/Q/\nBQAAyBlCIpBPxiTdEzSz6psvC8uIzy1VSgAAAGSGkAjkmTGFUVBMJYASKAEAANyLkAi4wGRvyuym\nxcgg5rYxiQAAAHAvQiLgJvHG3qWY76ZW9pzvbuqoq+vO0EQAAADnERIBj0j1ejduqCRGh9KM3z/W\ny0mPAAAAWUNIBDwoqUzkC//PSFMCY+4DVTbCadx2J7l4siQAAEBihESgyKQT2DINlm7p5prJkE8C\nJgAAKBaERAAp8erFb9wSVAEAANyOkAgUqnDpy4FMFxm4vBIaHRsPWUgolwIAAEkl+W4AAGdNOe53\nMAT4ZIp2AgAAKCRUEoFik0ZQnPKKFCpq0dW7QgpUXqmgAgAApIpKIoDUZFiNDNffAAAA4E5UEgGk\nbGpOdG7so5ulUxVlyB8AAPACQiKArLANQJncckK+guqeCgAA4BWERADOiUqORso4OHoVgRcAAHgF\nIRFAbl0NjuHIlMkN7u34ZKww6YZgFm6LkS/pgGzXbrqqAgCAXHH0wjXPPvusSkpK9N5771nzdu/e\nraqqKvn9fh07dsyaPzw8rIaGBlVWVmrr1q1ONguAixhjP2Vt+RE3q8iHdG6lYfeofBlOAAAASXIs\nJF64cEG9vb2aP3++Ne/SpUvau3evDh8+rH379qmjo8N6bNOmTdqyZYsGBgZ05MgRnThxwqmmAfAC\nY3ISHPMdIiM5ej9H8iIAAEiSYyFx48aN2rFjx5R5oVBIra2tqqioUEtLi4wxmpiYkCSdOXNGbW1t\nuvHGG7V69WqFQiGnmgbAq2KVHZ1IkuG3dEF4zIa0qpIAAKAoORISX331Vc2bN0+LFy+eMr+/v1+1\ntbXWv6urqxUKhTQyMqKysjJrvt/v1/Hjx51oGoACZ4yswMg4vmuoPgIAgGSlfeGaZcuW6eLFi9Pm\nP/XUU9q2bZsOHTpkzTPhC1XYHLH5bI4+7J4X1tnZaf0cCAQUCARSaDWAomMix/pdRegBAAAeEQwG\nFQwGc/qePhMvkaXh1KlT+pM/+RNdf/31kqTf/OY3+qM/+iOFQiH19/err69Pu3btkiTV1dXp6NGj\nKi0tVWVlpc6ePStp8oI3s2bN0oYNG6Y21ueLGyAB2PM9fvUKm9/k+5MyX/L3a3Rr19RsXOWVXS8A\nAO6Qi0yU9e6mixYt0ujoqM6dO6dz585p3rx5OnnypMrLy9XU1KSDBw/q/PnzCgaDKikpUWlpqSSp\npqZG3d3dGhsbU09Pj5qbm7PdNABIXcQFdDIRfQVTu8ecmqatkkvDLAAAcAfH75MY2Z20vLxc69ev\n19KlSzVjxgx1dXVZj+3cuVPt7e167LHHtGbNGi1ZssTppgFASiKDopfH6aVTWXR6falUAgDgHlnv\nbuokupsC6aG7aY7ESVLhYGZXxYu+R2L0PDfKpBoZa93YvQMAkFguMpHjlUQAKBrxdthZqMQZ+WwD\npZvEC7dubC8AAJiOkAgAOXAtP9qNEcxOd06fTN6DWDLvH6uamssuvFQtAQCIjZAIAC5gTPbH/bm9\nyyoAAHAnQiIAuMRkdSs8dtEGvTUBAEAOEBIBwCuMiQqPV/+VQgkyelyj26XafdZL6wYAgFsREgHA\n60xk9TH5fqv5Hr/oBNt1YgAiAAApISQCQKEJh6LCy4AAACAHCIkAUKBsC2gERwAAkAAhEQCKSeS4\nxmIJjKleNpbuqQCAIkdIBIBileF9N9xwX8ZMxLzITcQqkRcBAMWIkAgAxSydFBQnWEYHL7ddTdXL\noRYAgFwhJAIAUpNhBTLXQc1NIRUAAC8gJAIAUmdMxmMa7aqOToi13KTez8k8S19WAIBLERIBAGmZ\nzDhRQSdPvTk9WS10+LNyJHQTbAGgKBASAQBZMz1DXLuaagY9VNNri8vGQ6bCLuB5dV0AAN5DSAQA\n5MS1AHn1hzxUHblwDQAAiRESAQB5YVd1TEuGt/GYfOfCC4+OVB7jfEz0RAWAwkFIBAB4mzH2cSjF\n3Be+76Mbu3UWYogFALhXSb4bAACAI4xJu7rltlDmxuAKAChcjoXE733ve6qtrdXChQu1ZcsWa/7u\n3btVVVUlv9+vY8eOWfOHh4fV0NCgyspKbd261almAQCKjDHXpin/SDJBGvlyPgEAkE+OdDc9deqU\nXnjhBe3fv19VVVV65513JEmXLl3S3r17dfjwYZ07d04dHR06efKkJGnTpk3asmWL7rvvPq1cuVIn\nTpzQkiVLnGgeAACSoi6mk0E2K/RKH+MNAaC4OBISDxw4oHXr1qmqqkqS9JnPfEaSFAqF1NraqoqK\nClVUVMgYo4mJCc2ePVtnzpxRW1ubJGn16tUKhUKERABAzoTv+2iklANjIVT/Cj3oAgCS50h300OH\nDunUqVNasmSJvvzlL2toaEiS1N/fr9raWut51dXVCoVCGhkZUVlZmTXf7/fr+PHjTjQNAIDEro5n\nLKYKWtzur74MJwCAp6RdSVy2bJkuXrw4bf5TTz2ljz76SO+9956OHj2qvr4+feUrX9F//Md/yNj8\ntfXZ/PGwe15YZ2en9XMgEFAgEEir/QAAJCX6b1IWbrmRa3mvdBZKUCymswYAXCMYDCoYDOb0PX0m\nXiJL06OPPqpAIKD7779fkvSHf/iHOnv2rHp7e9XX16ddu3ZJkurq6nT06FGVlpaqsrJSZ8+elSQ9\n++yzmjVrljZs2DC1sT5f3AAJwJ7v8ckDNPNNvj+AI5IMQZG32ch7cEuBXbj1UvuzhmMQAC6Qi0zk\nSHfTu+66SwcOHJAxRqFQSAsWLNCsWbPU1NSkgwcP6vz58woGgyopKVFpaakkqaamRt3d3RobG1NP\nT4+am5udaBoAANmX4hVTw9w+DjDc4TTeY4kmAID3OHLhmpUrV+rQoUPy+/2qqanRt771LUlSeXm5\n1q9fr6VLl2rGjBnq6uqyXrNz5061t7frscce05o1a7hoDQDAm9LsnhoZqCKrdG4MWkVZRQSAIuJI\nd1On0N0USA/dTQEX8flsu50mCokFFcz4Ww4AactFJnKkkggAAGIwxop9hZT7UpLNC9kQOAEg6wiJ\nAADkSfjejJKKNzDaSKmLrY+cCADZRkgEAMAFpgYdM/2nLIZIN11dNToQWvdmTG0h05EcASBthEQA\nALwgi/drtBYRYxxkLjn2vm64NyNBFYBHERIBAPCiWAHEDeEoQ6le0dUtVVEAKBSERAAACkm86lUK\nAdKNt96IJZO2EjABYDpCIgAAxcJY11VVKldXJUgBQHEhJAIAUKRiXSwn2Yqjmy6Ak3OMNwRQwAiJ\nAABgKpPePRx9Vo2yCIKjU2M/CZ8AXICQCAAAppmSVTIIjEiRA9mT3AkgVYREAAAQX8wrqWZh0cVQ\ndYxCgAbgdoREAACQlsnsGDmWMbPlEZ4AwB0IiQAAICuuFRzTG9MIAHAHQiIAAMg6YzK/tkshdkVN\ntlrKOEIA+URIBAAAjkgu6CSXJumKCgC5Q0gEAAD5FU6TaRQO81FtdDqwUkUEkG+ERAAA4ArTwpFL\ne5tmI5hSGQXgZoREAADgTia1K6f6ZGTky2kASzcwxn1dNsIx5UgAGSAkAgAA17N6pLqsuujaimA6\nXXdduioAcq/EiYUODQ3pz/7sz1RXV6cVK1ZoeHjYemz37t2qqqqS3+/XsWPHrPnDw8NqaGhQZWWl\ntm7d6kSzAACAxxkzdcok2Rj5XDkBQL45EhKfeOIJ/dVf/ZX+8z//U3/5l3+pJ554QpJ06dIl7d27\nV4cPH9a+ffvU0dFhvWbTpk3asmWLBgYGdOTIEZ04ccKJpgEAgEJjJcbU5T8STp/yxeebOgEoXo50\nN50zZ47effdd/d///Z/effddffrTn5YkhUIhtba2qqKiQhUVFTLGaGJiQrNnz9aZM2fU1tYmSVq9\nerVCoZCWLFniRPMAAEABmsyJJuMxffmu5rm2CyuAouFISHzmmWfU1NSkb3zjG7r55putqmB/f79q\na2ut51VXVysUCmn+/PkqKyuz5vv9fv3rv/6rNmzY4ETzAABAAYtVVEylOua1oOZIsE1mkQxkBApS\n2iFx2bJlunjx4rT5Tz31lP7lX/5FX/3qV/Xwww/r29/+ttauXasf/OAHMjY7Ep/NHtvueWGdnZ3W\nz4FAQIFAIK32AwCA4pKNW2wkCmP5CpdOvm++K6tAsQsGgwoGgzl9T5+Jl8jSdNNNN+ncuXP6/d//\nfU1MTOi2227TxYsX9dprr6mvr0+7du2SJNXV1eno0aMqLS1VZWWlzp49K0l69tlnNWvWrGmVRJ/P\nFzdAArDne3zyD7z5Jt8fAEjIF/s2GsmGxHwHq2yGxrjrwnEZkHO5yESOXLjm3nvv1f79+yVJr776\nqpYtWyZJampq0sGDB3X+/HkFg0GVlJSotLRUklRTU6Pu7m6NjY2pp6dHzc3NTjQNAAAgviwefEVf\njMaLF7+J+16+6Re8iTUB8A5HKomDg4N68sknNTQ0pEWLFunv/u7vVFNTI0natWuX9uzZoxkzZqir\nq0t33323pMnbZrS3t+v999/XmjVrtG3btumNpZIIpIVKIgBkSYK0E11JjPXvRFJ9vtdwOAekLxeZ\nyJGQ6BRCIpAeQiIA5JgvuyEx391XE0k1zHI4B6QvF5nIkaubAgAAFLXwAZy7s13WpBxi7Z5OcgRc\ng5AIAADgkGu5x0T8NwlphEvPd01NYZ3Jk4CzCIkAAABuY66Fykwu+hKvwuflUJnLC+EQSFGMCIkA\nAAAuNhm8w0oeAAANf0lEQVRSwt1Xc5OO3D4GMlu8HJQBJxESAQAAvCJeWcuhXOeTkVHse0cWipjB\nuJDyMmVRJImQCAAAUABsj/+zHHDsglS+wmMxBFcgXwiJAAAAhSpO5ShuvMowXOaqu6qbu8VmEmDd\nvF4oDoREAAAATBUZLjPIK3ZBqVgCULGsJwoTIREAAAAxhS+cYyRHxud5vcuoY2GQ8YPII0IiAAAA\nkmMTXDK9Tce15SR3u45cVeiSDa+ZhlwqjnAjQiIAAAAyMj07hm/Z4cz7ha+46iQnlx8ZLPMRgIFE\nCIkAAABwhH2Pyeze8zGVSp7rQxhdTOEShEQAAADk3tVAZMWiHOQ3t1TtuH0H3I6QCAAAgPxL+nYd\nJuuBMhfdV6PZvl++C51UMnEVIREAAACeEr7iqqT8ByugABESAQAA4FmpFL+yNAwSKHiERAAAABSF\nqYEyuxfQCQuPNXT9RXLsJPlZJDOekp6r3kZIBAAAQPEyRlGX0JmUhfBYzBenCX98hEVvKkn3ha+8\n8ooWLlyo6667TidPnpzy2O7du1VVVSW/369jx45Z84eHh9XQ0KDKykpt3brVmn/lyhWtW7dO8+fP\nVyAQ0MWLF9NtFgAAAJA5Y65N2V60fAU/wdvSDom33367enp6dM8990yZf+nSJe3du1eHDx/Wvn37\n1NHRYT22adMmbdmyRQMDAzpy5IhOnDghSerp6dHly5c1PDys1tZWPfnkk+k2CwAAAMiuyMAYb8qQ\nFyqPyUREeF/a3U1ramps54dCIbW2tqqiokIVFRUyxmhiYkKzZ8/WmTNn1NbWJklavXq1QqGQlixZ\nolAopPb2dl1//fV66KGH9IUvfCHdZgEAAAD5YXVdlTK5VYfXg1Zk+xP12qU7qjulXUmMpb+/X7W1\ntda/q6urFQqFNDIyorKyMmu+3+/X8ePHrdf4/X5J0g033KDR0VF9/PHH2W4aAAAAkDMO9lgtGD5f\nchNyK24lcdmyZbbjA59++mmtWLHC9jXG5lvgs/nNGmOs+caYKa+zWwYAAADgVbZXVo1GGIJLxA2J\nvb29KS+wublZfX191r9Pnz6txsZGlZaWanR01Jo/NDSk5uZm6zVDQ0Oqrq7We++9p/Lycs2cOdN2\n+Z2dndbPgUBAgUAg5TYCAAAAbjMZJNPvporCFAwGFQwGc/qeWbkFRmTlr6mpSY8++qjOnz+vs2fP\nqqSkRKWlpZImxzF2d3frvvvuU09Pj55//nlJkyHxpZde0vLly/XCCy/ozjvvjPlekSERAAAAKDTx\nOtXR9bL4RBfGHn/8ccffM+2Q2NPTo46ODo2Njen+++9XfX29Dhw4oPLycq1fv15Lly7VjBkz1NXV\nZb1m586dam9v12OPPaY1a9ZoyZIlkqQ///M/149//GPV1taqsrJS3d3dma8ZAAAAUGAYlYVc8BkP\nDQD0+XyMVwQAAABQtHKRibJ+dVMAAAAAgHcREgEAAAAAFkIiAAAAAMBCSAQAAAAAWAiJAAAAAAAL\nIREAAAAAYCEkAgAAAAAshEQAAAAAgIWQCAAAAACwEBIBAAAAABZCIgAAAADAQkgEAAAAAFgIiQAA\nAAAACyERAAAAAGAhJAIAAAAALIREAAAAAICFkAgAAAAAsBASAQAAAAAWQiIAAAAAwJJ2SHzllVe0\ncOFCXXfddfrZz35mze/t7dWSJUu0ePFirVq1Sv39/dZjw8PDamhoUGVlpbZu3WrNv3LlitatW6f5\n8+crEAjo4sWL6TYLAAAAAJCBtEPi7bffrp6eHt1zzz3y+XzW/M985jP60Y9+pF/+8pfauHGjNm/e\nbD22adMmbdmyRQMDAzpy5IhOnDghSerp6dHly5c1PDys1tZWPfnkkxmsEpAfwWAw300AbLFtwq3Y\nNuFmbJ8oZmmHxJqaGn3uc5+bNr+urk433XSTJOnuu+/WqVOn9Mknn0iSzpw5o7a2Nt14441avXq1\nQqGQJCkUCqm9vV3XX3+9HnroIWs+4CX8MYFbsW3Crdg24WZsnyhmjo5J/Ld/+zfddddduu666zQy\nMqKysjLrMb/fr+PHj0uS+vv75ff7JUk33HCDRkdH9fHHHzvZNAAAAACAjd+L9+CyZctsxwc+/fTT\nWrFiRdwFv/nmm/r7v/979fb2SpKMMVMeN8ZY3VSNMVMej34uAAAAACBHTIYCgYD52c9+NmXehQsX\nzOc+9znz05/+dMr8W2+91fp5586d5h//8R+NMcZs3LjR/Pu//7sxxph3333XfP7zn7d9rwULFhhJ\nTExMTExMTExMTExMRTktWLAg0wiXUNxKYrJMROXvv//7v3X//fdr+/btuuuuu6Y8r6amRt3d3brv\nvvvU09Oj559/XpLU3Nysl156ScuXL9cLL7ygO++80/Z9RkZGstFcAAAAAEAMPmPS69vZ09Ojjo4O\njY2Nac6cOaqvr9eBAwf05JNP6h/+4R9UVVVlPbe3t1dz587V0NCQ2tvb9f7772vNmjXatm2bpMlb\nYDz88MPq6+tTZWWluru7rYvfAAAAAAByJ+2QCAAAAAAoPI5e3TSb3njjDdXW1qqqqkp79uzJd3NQ\noG655RYtXrxY9fX1ampqkiSNj49r5cqVqqio0KpVqzQxMWE9f/fu3aqqqpLf79exY8es+cPDw2po\naFBlZaW2bt1qzb9y5YrWrVun+fPnKxAI2F4YCghbu3atysvLdfvtt1vzcrU9vvLKK6qurlZ1dbV+\n+MMfOrym8Bq7bbOzs1Pz5s1TfX291bsojG0TuXLhwgXde++9WrhwoQKBgF5++WVJ7DvhDrG2T1fu\nPx0f9ZgldXV15siRI+att94y1dXV5p133sl3k1CAbrnlFvPuu+9Ombd9+3bzla98xXz00Udmw4YN\n5plnnjHGGDM6Omqqq6vNf/3Xf5lgMGjq6+ut1/zpn/6p6e7uNmNjY+aP//iPzcDAgDHGmO9///vm\ngQceMB9++KHZtm2b2bBhQ+5WDp7zxhtvmJMnT5pFixZZ83KxPX7yySemsrLSvPnmm+YXv/hFTgbI\nw1vsts3Ozk7z7LPPTnsu2yZy6e233zY///nPjTHGvPPOO+bWW281H3zwAftOuEKs7dON+09PVBIv\nX74sSbrnnns0f/58LV++XKFQKM+tQqEyUT2w+/v7tW7dOs2cOVNr1661tr1QKKTW1lZVVFSopaVF\nxhjrzOSZM2fU1tamG2+8UatXr57ymvb2dl1//fV66KGH2I4R1913361Pf/rTU+blYnscHBzUokWL\ntGjRIi1evFh+v1+Dg4M5XHO4nd22KU3ff0psm8itm266SXV1dZKkuXPnauHChRoYGGDfCVeItX1K\n7tt/eiIkDgwMqKamxvq33+/X8ePH89giFCqfz6elS5dq1apV2r9/v6Sp219NTY36+/slTX4Ja2tr\nrddWV1crFAppZGREZWVl1vzI7bW/v19+v1+SdMMNN2h0dFQff/xxTtYNhcHp7fGjjz5SKBSy5ke/\nBohnz549uvPOO7V9+3aNj49LmtzO2DaRDyMjIxocHFRTUxP7TrhOePtsbm6W5L79pydCIpArP/nJ\nT/SLX/xC27Zt08aNG3Xx4kXbMzux+Hy+afOMMdZ8Y8yU5aWybEBKbZvJ5vZotywg0vr163Xu3Dkd\nPHhQv/71r9XV1SXJfrti24TTxsfH1dbWpueee06zZ89m3wlXidw+/+AP/sCV+09PhMTGxkadPn3a\n+vfg4GDMeykCmbj55pslSbW1tfrSl76k1157TY2NjRoeHpY0OUi4sbFR0uT9PYeGhqzXnj59Wo2N\njbrttts0OjpqzR8aGrLOEkW+5r333lN5eblmzpyZk3VDYXB6e5w1a9a0ZUW+BoilrKxMPp9Pc+bM\n0YYNG9TT0yOJbRO5d+XKFT3wwAN68MEHtXLlSknsO+EedtunG/efngiJc+bMkTR5hdO33npLvb29\nfOmQdf/zP/9jlfffeecdHTx4UK2trWpubtaLL76o3/72t3rxxRetExRNTU06ePCgzp8/r2AwqJKS\nEpWWlkqa7MrS3d2tsbEx9fT0TPnivvTSS/rwww/1wgsvcLIDKcvF9uj3+3Xq1Cm9+eab+uUvf6nB\nwUEtXLgwPysMz3j77bclSf/7v/+rl19+WV/84hclsW0it4wxWrdunRYtWqSvfe1r1nz2nXCDWNun\nK/efSV2KxwWCwaCpqakxCxYsMLt27cp3c1CAzp49a+644w5zxx13mKVLl5rvfve7xhhjPvjgA/Ol\nL33JfPaznzUrV6404+Pj1muef/55s2DBAlNbW2veeOMNa/7g4KCpr683t9xyi/nGN75hzf/d735n\n/uZv/sZ89rOfNS0tLebtt9/O3QrCc9asWWNuvvlmM2PGDDNv3jzz4osv5mx7/P73v2+qqqpMVVWV\n+cEPfpCbFYZnhLfNT33qU2bevHnmu9/9rnnwwQfN7bffbj7/+c+br3/961OuFM22iVw5evSo8fl8\n5o477jB1dXWmrq7OHDhwgH0nXMFu+3z99ddduf/0GcOgKAAAAADAJE90NwUAAAAA5AYhEQAAAABg\nISQCAAAAACyERAAAAACAhZAIAAAAALAQEgEAAAAAFkIiAAAAAMBCSAQAAAAAWP4fyzwBccqyU+QA\nAAAASUVORK5CYII=\n", "text": [ "" ] } ], "prompt_number": 33 }, { "cell_type": "markdown", "metadata": {}, "source": [ "7. Summary\n", "-----------\n", "\n", "In this example, the heap-based plane sweep algorithm was 300 to 400 times\n", "faster than the basic brute force approach on my laptop! The advantage\n", "of the heap-based approach over the list based approach will become even \n", "more significant with deeper coverage as ``log(D)`` and ``D`` become more significant differences.\n", "\n", "In the next session we will explore the related question of how to index \n", "the alignments to be able to quickly determine which reads span a particular location. Plane-sweep algorithms are also widely used to solve many interesting problems in [computational geometry](http://en.wikipedia.org/wiki/Computational_geometry)\n" ] } ], "metadata": {} } ] }