Dr. Mohammad Doostizadeh
Journal: Scientific-Specialized Journal of New Achievements in Electrical, Computer and Technology (نشریه علمی-تخصصی دستاوردهای نوین در برق،کامپیوتر و فناوری)
Volume 3, Issue 6
Date: 2023-05-21
Title: Classification of Persian news text with logistic regression algorithm
Original title (Persian): دسته‌بندی متن اخبار فارسی با الگوریتم رگرسیون لجستیک
Pages: 24-37
DOI: 10.22051/jera.2021.31891.2698
Language: FA
Authors: Hamidreza, master's student, Imam Hossein Comprehensive University; Mohammadali, assistant professor, Imam Hossein Comprehensive University
Date: 2023-05-06

Abstract: As the overall volume of data keeps increasing, textual data is also growing rapidly. Extracting information from this text is a necessity in today's information-based world, and text classification is one way of obtaining information from such massive data. In this research, we used a standard dataset of Persian news, containing five features for each of more than 86 thousand news items, to investigate the performance of the logistic regression algorithm in classifying Persian text, and we compared it with similar works. Following the steps of building a text classifier, we explain the method used in the vectorization stage and describe the importance of the preprocessing stage, especially the method used for labeling and for converting sub-labels to main labels. In the final evaluation, by tuning the algorithm's parameters and correcting the news labels, we reached the desired result of 95% on the accuracy metric for classifying the texts of the Persian news dataset.
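The steps named in the abstract (mapping sub-labels to main labels, vectorizing the text, and training a logistic regression classifier) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say which vectorization scheme, label set, or parameters were used, so the bag-of-words counts, the `SUB_TO_MAIN` map, the toy English sentences, and the learning rate and epoch count are all assumptions made for the example. It makes no claim about reproducing the reported 95% accuracy.

```python
import math
from collections import Counter

# Hypothetical sub-label -> main-label map; the dataset's real labels
# are not listed in the abstract.
SUB_TO_MAIN = {
    "football": "sport",
    "wrestling": "sport",
    "stock market": "economy",
    "banking": "economy",
}


def to_main_label(label):
    """Map a sub-label to its main label; main labels pass through unchanged."""
    return SUB_TO_MAIN.get(label, label)


def build_vocab(texts):
    """Assign a column index to every whitespace token seen in the texts."""
    tokens = sorted({tok for text in texts for tok in text.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}


def vectorize(text, vocab):
    """Bag-of-words count vector (an assumed vectorization scheme)."""
    vec = [0.0] * len(vocab)
    for tok, count in Counter(text.lower().split()).items():
        if tok in vocab:
            vec[vocab[tok]] = float(count)
    return vec


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(x, W, b):
    """Linear score w_k . x + b_k for every class k."""
    return [sum(w * v for w, v in zip(W[k], x)) + b[k] for k in range(len(W))]


def train(X, y, n_classes, lr=0.1, epochs=200):
    """Multinomial logistic regression fitted with plain stochastic gradient descent."""
    W = [[0.0] * len(X[0]) for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, target in zip(X, y):
            probs = softmax(class_scores(x, W, b))
            for k in range(n_classes):
                # Gradient of the cross-entropy loss w.r.t. the class-k score.
                err = probs[k] - (1.0 if k == target else 0.0)
                b[k] -= lr * err
                for j, v in enumerate(x):
                    if v:
                        W[k][j] -= lr * err * v
    return W, b


# Toy stand-in for the news dataset: (text, possibly fine-grained label).
raw = [
    ("the team won the football match", "football"),
    ("wrestling champion wins gold medal", "wrestling"),
    ("the striker scored in the match", "sport"),
    ("stock market index rises", "stock market"),
    ("central bank cuts interest rates", "banking"),
    ("market prices and bank rates fall", "economy"),
]
texts = [text for text, _ in raw]
labels = [to_main_label(label) for _, label in raw]
classes = sorted(set(labels))
y = [classes.index(label) for label in labels]

vocab = build_vocab(texts)
X = [vectorize(text, vocab) for text in texts]
W, b = train(X, y, len(classes))


def classify(text):
    """Return the main label with the highest score for a new text."""
    scores = class_scores(vectorize(text, vocab), W, b)
    return classes[max(range(len(scores)), key=scores.__getitem__)]


print(classify("the football match"))
```

Collapsing sub-labels before training reduces the number of classes the model has to separate, which is one plausible reason the abstract singles out label correction as important. In practice a library implementation of logistic regression with a tuned regularization strength would replace the hand-written training loop.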
