Instructions

Hash join

An inner join is an operation that combines two data tables into one table, based on matching column values. The simplest way of implementing this operation is the nested loop join algorithm, but a more scalable alternative is the hash join algorithm.

The "hash join" algorithm consists of two steps:

<ol>
  <li><strong>Hash phase:</strong> Create a multimap from one of the two tables, mapping from each join column value to all the rows that contain it.
    <ul>
      <li>The multimap must support hash-based lookup, which scales better than a simple linear search; that is the whole point of this algorithm.</li>
      <li>Ideally, we should create the multimap for the smaller table, thus minimizing its creation time and memory size.</li>
    </ul>
  </li>
  <li><strong>Join phase:</strong> Scan the other table, and find matching rows by looking in the multimap created before.</li>
</ol>

In pseudocode, the algorithm could be expressed as follows:

<pre>
<strong>let</strong> <i>A</i> = the first input table (or ideally, the larger one)
<strong>let</strong> <i>B</i> = the second input table (or ideally, the smaller one)
<strong>let</strong> <i>j<sub>A</sub></i> = the join column ID of table <i>A</i>
<strong>let</strong> <i>j<sub>B</sub></i> = the join column ID of table <i>B</i>
<strong>let</strong> <i>M<sub>B</sub></i> = a multimap for mapping from single values to multiple rows of table <i>B</i> (starts out empty)
<strong>let</strong> <i>C</i> = the output table (starts out empty)

<strong>for each</strong> row <i>b</i> in table <i>B</i>:
   <strong>place</strong> <i>b</i> in multimap <i>M<sub>B</sub></i> under key <i>b(j<sub>B</sub>)</i>

<strong>for each</strong> row <i>a</i> in table <i>A</i>:
   <strong>for each</strong> row <i>b</i> in multimap <i>M<sub>B</sub></i> under key <i>a(j<sub>A</sub>)</i>:
      <strong>let</strong> <i>c</i> = the concatenation of row <i>a</i> and row <i>b</i>
      <strong>place</strong> row <i>c</i> in table <i>C</i>
</pre>
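The pseudocode maps fairly directly onto JavaScript. The following is a minimal sketch, not a reference solution: it assumes rows are plain arrays and join columns are given as indices, and it uses the built-in `Map` as the multimap (each key holds an array of rows). The name `hashJoinRows` and its parameters are illustrative only.

```javascript
// Illustrative sketch: rows are arrays, jA and jB are column indices.
function hashJoinRows(tableA, tableB, jA, jB) {
  // Hash phase: map each join value of B to all rows of B that contain it.
  const multimapB = new Map();
  for (const b of tableB) {
    const key = b[jB];
    if (!multimapB.has(key)) {
      multimapB.set(key, []);
    }
    multimapB.get(key).push(b);
  }

  // Join phase: scan A and find matching rows of B by hash lookup.
  const output = [];
  for (const a of tableA) {
    const matches = multimapB.get(a[jA]) || [];
    for (const b of matches) {
      output.push([...a, ...b]); // concatenation of row a and row b
    }
  }
  return output;
}

const joined = hashJoinRows(
  [[27, "Jonah"], [18, "Popeye"]],
  [["Jonah", "Whales"], ["Jonah", "Spiders"]],
  1, // join column of A
  0  // join column of B
);
console.log(joined);
// [[27, "Jonah", "Jonah", "Whales"], [27, "Jonah", "Jonah", "Spiders"]]
```

`Map` lookups compare primitive keys by value, so string and number join values work as expected; object-valued join columns would need a different key strategy.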

What to do:

Implement the "hash join" algorithm as a function and demonstrate that it passes the test case listed below. The function should accept two arrays of objects and return an array of combined objects.

Input

<table>
  <tr>
    <td style="padding: 4px; margin: 5px;">
      <table style="border:none; border-collapse:collapse;">
        <tr>
          <td style="border:none"><i>A =</i></td>
          <td style="border:none">
            <table>
              <tr> <th style="padding: 4px; margin: 5px;">Age</th> <th style="padding: 4px; margin: 5px;">Name</th> </tr>
              <tr> <td style="padding: 4px; margin: 5px;">27</td> <td style="padding: 4px; margin: 5px;">Jonah</td> </tr>
              <tr> <td style="padding: 4px; margin: 5px;">18</td> <td style="padding: 4px; margin: 5px;">Alan</td> </tr>
              <tr> <td style="padding: 4px; margin: 5px;">28</td> <td style="padding: 4px; margin: 5px;">Glory</td> </tr>
              <tr> <td style="padding: 4px; margin: 5px;">18</td> <td style="padding: 4px; margin: 5px;">Popeye</td> </tr>
              <tr> <td style="padding: 4px; margin: 5px;">28</td> <td style="padding: 4px; margin: 5px;">Alan</td> </tr>
            </table>
          </td>
          <td style="border:none; padding-left:1.5em;" rowspan="2"></td>
          <td style="border:none"><i>B =</i></td>
          <td style="border:none">
            <table>
              <tr> <th style="padding: 4px; margin: 5px;">Character</th> <th style="padding: 4px; margin: 5px;">Nemesis</th> </tr>
              <tr> <td style="padding: 4px; margin: 5px;">Jonah</td> <td style="padding: 4px; margin: 5px;">Whales</td> </tr>
              <tr> <td style="padding: 4px; margin: 5px;">Jonah</td> <td style="padding: 4px; margin: 5px;">Spiders</td> </tr>
              <tr> <td style="padding: 4px; margin: 5px;">Alan</td> <td style="padding: 4px; margin: 5px;">Ghosts</td> </tr>
              <tr> <td style="padding: 4px; margin: 5px;">Alan</td> <td style="padding: 4px; margin: 5px;">Zombies</td> </tr>
              <tr> <td style="padding: 4px; margin: 5px;">Glory</td> <td style="padding: 4px; margin: 5px;">Buffy</td> </tr>
            </table>
          </td>
        </tr>
        <tr>
          <td style="border:none"> <i>j<sub>A</sub> =</i> </td>
          <td style="border:none"> <i><code>Name</code> (i.e. column 1)</i> </td>
          <td style="border:none"> <i>j<sub>B</sub> =</i> </td>
          <td style="border:none"> <i><code>Character</code> (i.e. column 0)</i> </td>
        </tr>
      </table>
    </td>
  </tr>
</table>

Output

| A_age | A_name | B_character | B_nemesis |
| ----- | ------ | ----------- | --------- |
| 27    | Jonah  | Jonah       | Whales    |
| 27    | Jonah  | Jonah       | Spiders   |
| 18    | Alan   | Alan        | Ghosts    |
| 18    | Alan   | Alan        | Zombies   |
| 28    | Glory  | Glory       | Buffy     |
| 28    | Alan   | Alan        | Ghosts    |
| 28    | Alan   | Alan        | Zombies   |

The order of the rows in the output table is not significant.
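One possible shape for the requested function is sketched below. It is an assumption-laden example rather than the official solution: the join keys (`name` for A, `character` for B) are hard-coded because the two-argument signature leaves no room to pass them, the lowercase property names follow the test call in the acceptance criteria, and the `A_`/`B_` prefixes follow the output table above.

```javascript
// Sketch of hashJoin for this test case.
// Assumptions: A is joined on `name`, B on `character`; output keys carry
// the "A_" / "B_" prefixes shown in the expected output.
function hashJoin(tableA, tableB) {
  // Hash phase: group the rows of B by their `character` value.
  const byCharacter = new Map();
  for (const b of tableB) {
    const rows = byCharacter.get(b.character);
    if (rows) {
      rows.push(b);
    } else {
      byCharacter.set(b.character, [b]);
    }
  }

  // Join phase: for each row of A, emit one combined row per match in B.
  const result = [];
  for (const a of tableA) {
    for (const b of byCharacter.get(a.name) || []) {
      result.push({
        A_age: a.age,
        A_name: a.name,
        B_character: b.character,
        B_nemesis: b.nemesis,
      });
    }
  }
  return result;
}

console.log(
  hashJoin(
    [{ age: 18, name: "Popeye" }, { age: 28, name: "Glory" }],
    [{ character: "Glory", nemesis: "Buffy" }]
  )
);
// [{ A_age: 28, A_name: "Glory", B_character: "Glory", B_nemesis: "Buffy" }]
```

Rows without a partner on the other side, such as Popeye here, simply produce no output, which is what makes this an inner join.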

Acceptance Criteria:

Tests:

  • `hashJoin` should be a function.
  • `hashJoin([{ age: 27, name: "Jonah" }, { age: 18, name: "Alan" }, { age: 28, name: "Glory" }, { age: 18, name: "Popeye" }, { age: 28, name: "Alan" }], [{ character: "Jonah", nemesis: "Whales" }, { character: "Jonah", nemesis: "Spiders" }, { character: "Alan", nemesis: "Ghosts" }, { character: "Alan", nemesis: "Zombies" }, { character: "Glory", nemesis: "Buffy" }, { character: "Bob", nemesis: "foo" }])` should return `[{"A_age": 27, "A_name": "Jonah", "B_character": "Jonah", "B_nemesis": "Whales"}, {"A_age": 27, "A_name": "Jonah", "B_character": "Jonah", "B_nemesis": "Spiders"}, {"A_age": 18, "A_name": "Alan", "B_character": "Alan", "B_nemesis": "Ghosts"}, {"A_age": 18, "A_name": "Alan", "B_character": "Alan", "B_nemesis": "Zombies"}, {"A_age": 28, "A_name": "Glory", "B_character": "Glory", "B_nemesis": "Buffy"}, {"A_age": 28, "A_name": "Alan", "B_character": "Alan", "B_nemesis": "Ghosts"}, {"A_age": 28, "A_name": "Alan", "B_character": "Alan", "B_nemesis": "Zombies"}]`
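
Since the row order of a join result is not significant, it can be handy to compare a result against the expected rows without depending on order. The helper below is a hypothetical sketch for local experimentation; `sameRows` is not part of the challenge, it assumes flat rows with primitive values, and the actual test harness may well compare the arrays differently.

```javascript
// Hypothetical helper: true if both arrays hold the same rows,
// ignoring row order (and property order within each row).
function sameRows(actual, expected) {
  // Serialize each row as sorted [key, value] pairs to get a canonical string.
  const canonical = (row) =>
    JSON.stringify(Object.keys(row).sort().map((k) => [k, row[k]]));
  const sortedActual = actual.map(canonical).sort();
  const sortedExpected = expected.map(canonical).sort();
  return (
    sortedActual.length === sortedExpected.length &&
    sortedActual.every((row, i) => row === sortedExpected[i])
  );
}

console.log(
  sameRows(
    [{ A_age: 27, A_name: "Jonah" }, { A_age: 18, A_name: "Alan" }],
    [{ A_name: "Alan", A_age: 18 }, { A_age: 27, A_name: "Jonah" }]
  )
); // true
```

Because the canonical strings are sorted and compared position by position, duplicate rows are counted, so a result with a missing or extra copy of a row does not pass.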
