GitAJov/TALAS

Based on:

https://github.com/bangkit-pukulrata/machine-learning/tree/main/model
https://github.com/tantowjy/news-classification/blob/main/website/main.py

Dataset

LLM dataset: https://www.kaggle.com/datasets/iqbalmaulana/indonesian-news-dataset
NER dataset: https://github.com/yohanesgultom/nlp-experiments/blob/master/data/ner/training_data.txt

TALAS API Documentation

Overview

TALAS is an API-based system for analyzing news with machine learning models, covering bias analysis, hoax detection, ideology detection, clustering, and named entity recognition. The API runs on Google Cloud Platform (GCP), using App Engine for compute and Cloud SQL (MySQL) for user data storage, and serves both supervised and unsupervised machine learning models.

Endpoints

1. Bias Detection Endpoint

  • URL: /bias
  • Method: POST
  • Description: Processes text to determine the bias of a news article.
  • Request:
    {
        "content": "string" // News article content
    }
  • Response:
    {
        "bias": 0 // Bias category (0 or 1)
    }
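
The request above can be sketched as a small client using only the Python standard library. The base URL is an assumption borrowed from the crawler endpoint listed later in this README (the ML endpoints may be served from a different host), and the helper names `build_bias_request` and `detect_bias` are illustrative only:

```python
import json
import urllib.request

# Assumption: same host as the crawler endpoint; adjust if the ML API lives elsewhere.
BASE_URL = "https://talas24.et.r.appspot.com"


def build_bias_request(content: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /bias request carrying the article text."""
    body = json.dumps({"content": content}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/bias",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def detect_bias(content: str, base_url: str = BASE_URL) -> int:
    """Send the request and return the documented "bias" field (0 or 1)."""
    request = build_bias_request(content, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["bias"]
```

Splitting request construction from sending keeps the payload easy to inspect without touching the network. The same pattern should apply to the other endpoints that accept a single `content` field by swapping the path.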

2. Hoax Detection Endpoint

  • URL: /hoax
  • Method: POST
  • Description: Processes text to determine whether the article contains a hoax.
  • Request:
    {
        "content": "string" // News article content
    }
  • Response:
    {
        "hoax": 0.85 // Hoax score (0 to 1)
    }
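
Because this response carries a score rather than a yes/no label, a caller has to choose a cutoff. A minimal sketch, assuming that higher scores mean more hoax-like text; the helper name `is_likely_hoax` and the default `threshold` of 0.5 are illustrative and not part of the API:

```python
def is_likely_hoax(response: dict, threshold: float = 0.5) -> bool:
    """Turn the documented "hoax" score (0 to 1) into a boolean decision."""
    score = float(response["hoax"])
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"hoax score out of range: {score}")
    # Assumption: a higher score means the article is more likely a hoax.
    return score >= threshold
```

A stricter threshold, for example `is_likely_hoax(resp, threshold=0.9)`, trades missed hoaxes for fewer false alarms; the right value depends on how the model was calibrated, which this README does not state.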

3. Ideology Detection Endpoint

  • URL: /ideology
  • Method: POST
  • Description: Processes text to determine the ideology of a news article.
  • Request:
    {
        "content": "string" // News article content
    }
  • Response:
    {
        "ideology": 1 // Article ideology (0 = conservative, 1 = liberal)
    }
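
The numeric label in this response can be mapped to a readable name on the client side. The mapping below follows the 0/1 meaning given in the response comment; the names `IDEOLOGY_LABELS` and `ideology_name` are hypothetical helpers:

```python
# Label meanings as documented in the /ideology response.
IDEOLOGY_LABELS = {0: "conservative", 1: "liberal"}


def ideology_name(response: dict) -> str:
    """Return the readable name for the "ideology" label in a response."""
    label = response["ideology"]
    try:
        return IDEOLOGY_LABELS[label]
    except KeyError:
        # Fail loudly on labels this README does not document.
        raise ValueError(f"unexpected ideology label: {label!r}") from None
```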

Unsupervised Learning Models

1. Cluster Endpoint

  • URL: /cluster
  • Method: POST
  • Description: Assigns the text to a cluster based on its content.
  • Request:
    {
        "content": "string" // News article content
    }
  • Response:
    {
        "cluster": 3 // Article cluster (0-7)
    }

2. Generate Mode Cluster

  • URL: /modeCluster
  • Method: POST
  • Description: Finds the most common cluster across a collection of news articles.
  • Request:
    [
        {
            "title": "string", // Article title
            "content": "string", // Article content
            "embedding": [0.1, 0.2] // Embedding representation
        }
    ]
  • Response:
    {
        "modeCluster": 2 // Most common cluster
    }
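
To make "most common cluster" concrete, the sketch below computes the mode of a list of per-article cluster IDs. It only illustrates the idea: how the server derives each article's cluster from the request is not described in this README, and `mode_cluster` is a hypothetical name:

```python
from collections import Counter


def mode_cluster(cluster_ids: list) -> int:
    """Return the cluster ID that occurs most often in the list."""
    if not cluster_ids:
        raise ValueError("at least one cluster ID is required")
    # most_common(1) yields a single (value, count) pair for the top value.
    return Counter(cluster_ids).most_common(1)[0][0]
```

For example, `mode_cluster([2, 3, 2, 5, 2, 3])` returns `2`, since cluster 2 appears three times.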

Large Language Model (LLM)

1. Generate Embedding Endpoint

  • URL: /embedding
  • Method: POST
  • Description: Generates embeddings for the given text.
  • Request:
    [
        {
            "title": "string", // Article title
            "content": "string" // Article content
        }
    ]
  • Response:
    {
        "embedding": [[0.1, 0.2]] // List of embeddings
    }
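
Several other endpoints in this README take articles that already include an `embedding` field, so the output of `/embedding` typically has to be merged back into the article list. A minimal sketch, assuming the embeddings come back in the same order as the submitted articles (the README does not confirm this); `attach_embeddings` is a hypothetical helper:

```python
def attach_embeddings(articles: list, embedding_response: dict) -> list:
    """Return copies of the articles with an "embedding" field added to each."""
    embeddings = embedding_response["embedding"]
    if len(embeddings) != len(articles):
        raise ValueError(
            f"expected {len(articles)} embeddings, got {len(embeddings)}"
        )
    # Assumption: embeddings[i] belongs to articles[i].
    return [
        {**article, "embedding": embedding}
        for article, embedding in zip(articles, embeddings)
    ]
```

The function builds new dictionaries rather than mutating its input, so the original article list can be reused for other calls.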

2. Generate Title Endpoint

  • URL: /title
  • Method: POST
  • Description: Generates a title from a collection of news articles.
  • Request:
    [
        {
            "title": "string", // Article title
            "content": "string", // Article content
            "embedding": [0.1, 0.2] // Embedding representation
        }
    ]
  • Response:
    {
        "title": "Generated Title" // Generated title
    }

3. Generate Summary Endpoint

  • URL: /summarize
  • Method: POST
  • Description: Generates two summaries (liberal and conservative) from a collection of news articles.
  • Request:
    [
        {
            "title": "string", // Article title
            "content": "string", // Article content
            "embedding": [0.1, 0.2] // Embedding representation
        }
    ]
  • Response:
    {
        "summary_liberalism": "string", // Liberal summary
        "summary_conservative": "string" // Conservative summary
    }

Process All Articles

  • URL: /process-all
  • Method: POST
  • Description: Groups the input articles, then generates a title, cluster/category, summaries, and a bias analysis for each group.

Request Body

[
  {
    "title": "string",
    "content": "string",
    "embedding": [0.0, 0.1, 0.2]
  }
]

Response Body

[
  {
    "title": "Generated Group Title",
    "modeCluster": "Cluster/Category Name",
    "summary_liberalism": "Liberal perspective summary",
    "summary_conservative": "Conservative perspective summary",
    "analysis": "Bias and content analysis details"
  }
]

Possible Responses

  • 200 OK: Successfully processed and grouped articles
  • 400 Bad Request: Invalid input data
  • 500 Internal Server Error: Processing error
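
Since malformed input yields a 400 response, it can help to check the payload shape before calling `/process-all`. The sketch below checks only the shape shown in the request body above; the server's actual validation rules are not documented here, and both the non-empty requirement and the name `validate_articles` are assumptions:

```python
from numbers import Real


def validate_articles(articles) -> list:
    """Return a list of problems; an empty list means the shape looks valid."""
    # Assumption: an empty list is treated as invalid input.
    if not isinstance(articles, list) or not articles:
        return ["payload must be a non-empty list of articles"]
    problems = []
    for i, article in enumerate(articles):
        if not isinstance(article, dict):
            problems.append(f"item {i}: not an object")
            continue
        for key in ("title", "content"):
            if not isinstance(article.get(key), str):
                problems.append(f"item {i}: {key!r} must be a string")
        embedding = article.get("embedding")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(embedding, list) or not all(
            isinstance(x, Real) and not isinstance(x, bool) for x in embedding
        ):
            problems.append(f"item {i}: 'embedding' must be a list of numbers")
    return problems
```

Returning every problem at once, instead of raising on the first, lets a caller report all issues in a batch of articles in a single pass.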

Named Entity Recognition (NER)

1. Main NER Page

  • URL: /
  • Method: GET
  • Description: Displays the main page for article input and NER analysis.
  • Response:
    • 200 OK: Renders the ner_home.html page.

2. Text Processing

  • URL: /process
  • Method: POST
  • Description: Processes text with a spaCy model to detect entities.
  • Request:
    • Form Data:
      input_data: "string" // Article or text to analyze

  • Response:
    • 200 OK: Returns the entity analysis results as HTML.
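
Unlike the JSON endpoints, `/process` takes form data, so the body has to be URL-encoded. A minimal sketch with the standard library; `base_url` is left as a required parameter because the README does not say where the NER app is hosted, and `build_ner_request` is a hypothetical helper:

```python
import urllib.parse
import urllib.request


def build_ner_request(text: str, base_url: str) -> urllib.request.Request:
    """Build (but do not send) a form-encoded POST /process request."""
    # urlencode percent-encodes non-ASCII characters, so the result is ASCII.
    body = urllib.parse.urlencode({"input_data": text}).encode("ascii")
    return urllib.request.Request(
        url=f"{base_url}/process",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would return the HTML page described in the response; extracting entities from that HTML is left to the caller.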

Authentication

1. Login

  • URL: /process-login
  • Method: POST
  • Request Body:
    {
        "email": "string",
        "password": "string"
    }
  • Response:
    {
        "auth": true,
        "token": "string"
    }
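
A client would usually send the credentials and then pull the token out of the response. The sketch below assumes the body is sent as JSON, as the request body above suggests, and that a failed login returns `"auth": false` or omits the token; neither detail is confirmed by this README. `build_login_request` and `extract_token` are hypothetical helpers, and `base_url` must be supplied by the caller:

```python
import json
import urllib.request
from typing import Optional


def build_login_request(email: str, password: str, base_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST /process-login request."""
    body = json.dumps({"email": email, "password": password}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/process-login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_token(response: dict) -> Optional[str]:
    """Return the token when the response reports a successful login, else None."""
    if response.get("auth") is True and isinstance(response.get("token"), str):
        return response["token"]
    return None
```

How the token should be presented on later requests (header name, scheme) is not specified in this README, so the sketch stops at extracting it.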

2. Register

  • URL: /process-regist
  • Method: POST
  • Request Body:
    {
        "username": "string",
        "email": "string",
        "password": "string"
    }
  • Response:
    • 200 OK:
      {
          "message": "Data berhasil disimpan" // "Data saved successfully"
      }
    • 500 Internal Server Error:
      {
          "message": "Terjadi kesalahan" // "An error occurred"
      }

News Endpoints

1. Fetch News List

  • URL: /article/news
  • Method: GET
  • Description: Retrieve a list of news articles

Responses

  • 200 OK
    {
      "message": "Data fetched successfully",
      "data": []
    }
  • 500 Internal Server Error
    {
      "message": "Internal server error"
    }

2. Get News Content

  • URL: /article/:title
  • Method: GET
  • Description: Retrieve content for a specific news article

Responses

  • 200 OK
    {
      "message": "Content fetched successfully",
      "data": []
    }
  • 500 Internal Server Error
    {
      "message": "Internal server error"
    }
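
Because the article title is part of the URL path, titles containing spaces or punctuation need percent-encoding. A minimal sketch, assuming the server accepts a percent-encoded title in place of `:title` (the README does not say whether it expects the raw title or a slug); `article_url` is a hypothetical helper:

```python
from urllib.parse import quote


def article_url(title: str, base_url: str) -> str:
    """Build the GET /article/:title URL with the title percent-encoded."""
    # safe="" also encodes "/", so a slash in the title cannot add a path segment.
    return f"{base_url}/article/{quote(title, safe='')}"
```

For example, `article_url("Harga BBM naik?", "http://localhost:8080")` gives `http://localhost:8080/article/Harga%20BBM%20naik%3F`.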

Crawler Endpoints

1. Run General Crawler

  • URL: https://talas24.et.r.appspot.com/api/crawler/general
  • Method: GET
  • Description: Run general crawler to update news data in the database

Response

{
  "message": "News updated successfully from general crawler"
}
