phases 1 and 2 updated

MaviMendes · Dec 23, 2021 · 3fd18fd
1 parent 3fc61c1
commit 3fd18fd
Showing 3 changed files with 1,291 additions and 240 deletions.
diff --git a/fase1.py b/fase1.py
diff --git a/projeto2_fase1.ipynb b/projeto2_fase1.ipynb
diff --git a/projeto2_fase2.ipynb b/projeto2_fase2.ipynb
@@ -0,0 +1,284 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "name": "projeto2-fase2.ipynb",
+      "provenance": [],
+      "collapsed_sections": []
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "source": [
+        "**Assignment 2 for Algorithms and Data Structures 2**<br>\n",
+        "Bruno Gabriel Justino dos Santos<br>\n",
+        "Maria Vitória Ribeiro Mendes<br>\n",
+        "2021\n",
+        "\n"
+      ],
+      "metadata": {
+        "id": "Q7p-NhEHJyXC"
+      }
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "1CKAuU-fzEOg",
+        "outputId": "799de52d-448d-4300-ce0e-eda3123f483a"
+      },
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "Mounted at /content/drive/\n"
+          ]
+        }
+      ],
+      "source": [
+        "from operator import itemgetter\n",
+        "import networkx as nx\n",
+        "from networkx.algorithms import community\n",
+        "\n",
+        "from google.colab import drive\n",
+        "drive.mount('/content/drive/', force_remount=True)\n",
+        "\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "\n",
+        "mention_file = \"/content/drive/MyDrive/projeto2/higgs-mention_network.edgelist\"\n",
+        "\n",
+        "reply_file = \"/content/drive/MyDrive/projeto2/higgs-reply_network.edgelist\"\n",
+        "\n",
+        "retweet_file = \"/content/drive/MyDrive/projeto2/higgs-retweet_network.edgelist\"\n",
+        "\n",
+        "activitie_file = \"/content/drive/MyDrive/projeto2/higgs-activity_time.txt\"\n",
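+        "# NOTE: nx.read_adjlist returns an undirected Graph, so the MultiDiGraph copies below hold every edge in both directions; for `source target weight` lines, nx.read_weighted_edgelist(path, create_using=nx.DiGraph) would keep edge direction\n",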
+        "#friend_follower = nx.DiGraph(nx.read_adjlist(friend_follower_file))\n",
+        "mention = nx.MultiDiGraph(nx.read_adjlist(mention_file))\n",
+        "reply = nx.MultiDiGraph(nx.read_adjlist(reply_file))\n",
+        "retweet = nx.MultiDiGraph(nx.read_adjlist(retweet_file))\n"
+      ],
+      "metadata": {
+        "id": "WSEY2ibDR0EY"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "print(\"Mention: \",mention)\n",
+        "print(\"Reply graph: \",reply)\n",
+        "print(\"Retweet graph: \",retweet)"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "wHN2x0F14O5E",
+        "outputId": "325a0cf9-0ed7-44a6-948e-b4247e6f6020"
+      },
+      "execution_count": null,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "Mention:  MultiDiGraph with 116419 nodes and 505311 edges\n",
+            "Reply graph:  MultiDiGraph with 38928 nodes and 115889 edges\n",
+            "Retweet graph:  MultiDiGraph with 256496 nodes and 1131744 edges\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "\n",
+        "Phase 2\n",
+        "---\n",
+        "Phase 2: For this phase we will need the other three datasets. This phase is expected to answer some challenging questions:<br>\n",
+        "(1) Are the users considered the most ‘important’ in the LA network the ones who retweet the most, or the ones who are retweeted the most?<br>\n",
+        "(2) Who are these users who retweet the most or are retweeted the most? List the top 10.<br>\n",
+        "(3) Who are the users who replied to tweets the most? List the top 5.<br>\n",
+        "(4) Some users are mentioned more than others. Who are the ten most mentioned?<br>\n",
+        "Correlate them with the most ‘important’ users in the network and also with those who\n",
+        "tweet the most or are tweeted at the most.\n"
+      ],
+      "metadata": {
+        "id": "m6thd9Pi1W41"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# most important users: this depends on finishing all of phase 1\n"
+      ],
+      "metadata": {
+        "id": "RJbMpieX1waR"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# users who retweet the most\n",
+        "# out_degree and in_degree should not be equal, yet they are: the graph was loaded as undirected, so each edge exists in both directions\n",
+        "\n",
+        "from operator import itemgetter\n",
+        "\n",
+        "out_degrees_retweet = list(retweet.out_degree())  # (node, out-degree) pairs\n",
+        "out_degrees_retweet.sort(key=itemgetter(1) , reverse = True)  # highest degree first\n",
+        "\n",
+        "\n",
+        "print(\"Retweeted the most:\",out_degrees_retweet[0]) # max\n",
+        "print(\"Top 10 who retweeted the most: \",out_degrees_retweet[:10])\n",
+        "\n",
+        "\n",
+        "## the results look a bit odd here "
+      ],
+      "metadata": {
+        "id": "0bXyNY2X16Ss",
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "outputId": "72210d46-de47-4be1-c170-6a9ca55c5a0a"
+      },
+      "execution_count": null,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "Retweeted the most: ('1', 223408)\n",
+            "Top 10 who retweeted the most:  [('1', 223408), ('88', 14062), ('2', 10873), ('14454', 6190), ('677', 5624), ('1988', 4337), ('349', 2804), ('3', 2318), ('283', 2039), ('3571', 1982)]\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# most retweeted users\n",
+        "in_degrees_retweetados = list(retweet.in_degree())  # (node, in-degree) pairs\n",
+        "in_degrees_retweetados.sort(key=itemgetter(1) , reverse = True)  # highest degree first\n",
+        "\n",
+        "print(\"Most retweeted user:\",in_degrees_retweetados[0]) # max\n",
+        "print(\"Top 10 most retweeted: \",in_degrees_retweetados[:10])\n",
+        "\n"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "ianCJXWzZ1cL",
+        "outputId": "59e53a26-7c5f-4e01-ca6e-e3e7a1fc0ff3"
+      },
+      "execution_count": null,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "Most retweeted user: ('1', 223408)\n",
+            "Top 10 most retweeted:  [('1', 223408), ('88', 14062), ('2', 10873), ('14454', 6190), ('677', 5624), ('1988', 4337), ('349', 2804), ('3', 2318), ('283', 2039), ('3571', 1982)]\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# users who replied the most\n",
+        "\n",
+        "out_degrees_retweetou = list(reply.out_degree())  # (node, out-degree) pairs from the reply graph\n",
+        "out_degrees_retweetou.sort(key=itemgetter(1) , reverse = True)  # highest degree first\n",
+        "print(\"Top 5 who replied the most: \",out_degrees_retweetou[:5])\n",
+        "\n"
+      ],
+      "metadata": {
+        "id": "bx3NbJlK2Qlv",
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "outputId": "6ec641e1-dd4d-4f74-84d1-e28302117a2d"
+      },
+      "execution_count": null,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "Top 5 who replied the most:  [('1', 25313), ('2', 2242), ('677', 1208), ('88', 1071), ('220', 470)]\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# most mentioned users\n",
+        "\n",
+        "in_degrees_mencionado= list(mention.in_degree())  # (node, in-degree) pairs\n",
+        "in_degrees_mencionado.sort(key=itemgetter(1) , reverse = True)  # highest degree first\n",
+        "print(\"Top 10 most mentioned:\",in_degrees_mencionado[:10]) # top 10\n"
+      ],
+      "metadata": {
+        "id": "vXhsFWTS2t47",
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "outputId": "94fef348-9225-4c18-fd41-35f005b91979"
+      },
+      "execution_count": null,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "Top 10 most mentioned: [('1', 98874), ('88', 11960), ('2', 7637), ('677', 3915), ('2417', 2538), ('3', 1704), ('59195', 1604), ('3998', 1594), ('7533', 1531), ('383', 1359)]\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "\n",
+        "Relating the information obtained\n",
+        "---\n",
+        "\n"
+      ],
+      "metadata": {
+        "id": "MeXcdqlF2yUt"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        ""
+      ],
+      "metadata": {
+        "id": "2UKnAvEM3Rxn"
+      }
+    }
+  ]
+}
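
The notebook's retweet cells report identical in-degree and out-degree rankings, which is what happens when a directed edge list is loaded as an undirected graph. The sketch below shows how the two rankings separate once direction is kept. It is plain Python and rests on an assumption not confirmed by the notebook: that each line of the `.edgelist` files has the form `source target weight`. The helper `degree_rankings` and the toy `sample` data are hypothetical, introduced only for illustration. With NetworkX, `nx.read_weighted_edgelist(path, create_using=nx.DiGraph)` is the analogous way to load such a file.

```python
from collections import Counter


def degree_rankings(lines, top=10):
    """Return (top_out, top_in): lists of (node, degree) from directed edge lines.

    Assumes each line looks like "source target [weight]"; this layout is an
    assumption about the Higgs edge lists, not something the notebook states.
    """
    out_deg, in_deg = Counter(), Counter()
    for line in lines:
        parts = line.split()
        if len(parts) < 2 or parts[0].startswith("#"):
            continue  # skip blank lines and comment lines
        source, target = parts[0], parts[1]  # a third column (weight) is ignored
        out_deg[source] += 1  # source performed the action (e.g. retweeted)
        in_deg[target] += 1   # target received it (e.g. was retweeted)
    return out_deg.most_common(top), in_deg.most_common(top)


# Hypothetical toy data, not taken from the Higgs files.
sample = ["a b 1", "a c 2", "d a 1", "b c 1"]
top_out, top_in = degree_rankings(sample, top=2)
print("Most active sources:", top_out)
print("Most targeted nodes:", top_in)
```

This counts distinct edges only. If the third column is a repeat count, summing it instead of adding 1 would rank users by total retweets rather than by number of distinct partners.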